rg-rich-regions

Analysis code and inputs for studying RG/RGG tract grammar in disordered regions and generating manuscript figure panels.
Also includes CALVADOS slab simulation post-processing utilities under sims_xrgg/.



Repository layout

rg-rich-regions/
├── inputs/                  # input data files used by scripts (FASTA, lists, tables)
├── outputs/                 # generated figures and exported summaries
├── scripts/                 # main analysis + plotting scripts
├── aa_enrichment_panels/    # AA enrichment assets (aa_log2_enrichment_values.tsv)
├── kde_data/                # KDE contour/supplementary data (kept at repo root)
└── sims_xrgg/
    ├── scripts/             # slab prep + analysis scripts
    └── outputs/             # generated plots/CSVs from sims analysis

Tip: Run scripts from the repo root to avoid path issues:

python scripts/fig3.py
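
A script can also avoid depending on the current working directory by resolving paths from its own location. The following is a minimal sketch, not code from this repository: the helper name `repo_path` and the file name `example.fasta` are hypothetical, and it assumes the script sits one level below the repo root (as in scripts/).

```python
from pathlib import Path


def repo_path(script_file: str, *parts: str, levels_up: int = 1) -> Path:
    """Return a path under the repo root, given the calling script's location.

    levels_up=1 assumes the script sits directly in scripts/ (one level
    below the root); use levels_up=2 for scripts in sims_xrgg/scripts/.
    """
    root = Path(script_file).resolve().parents[levels_up]
    return root.joinpath(*parts)


# Example: a script at <repo>/scripts/fig3.py asking for a file in <repo>/inputs/.
# "example.fasta" is a placeholder name, not a real input file.
print(repo_path("/repo/scripts/fig3.py", "inputs", "example.fasta"))
```

Inside a real script, `__file__` would be passed as the first argument, so the result is the same no matter which directory the script is launched from.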

Dependencies

Recommended Python version: 3.10 or newer (3.11 is the most thoroughly tested).

Core Python libraries used across the analysis/plotting scripts:

  • numpy
  • pandas
  • matplotlib
  • scipy
  • openpyxl (for reading .xlsx inputs)
  • biopython (for FASTA parsing, if used)
  • tqdm (optional; progress bars)
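
To see quickly which of the libraries listed above are importable in the active environment, something like the following can be used. It is a minimal sketch: the function name `missing_packages` is illustrative. Note that biopython is imported as `Bio`; the other packages import under their install names.

```python
from importlib.util import find_spec

# Maps the install name to the name used in "import ..." statements.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "scipy": "scipy",
    "openpyxl": "openpyxl",
    "biopython": "Bio",  # installed as biopython, imported as Bio
    "tqdm": "tqdm",      # optional
}


def missing_packages(packages: dict[str, str]) -> list[str]:
    """Return install names whose import module cannot be found."""
    return [name for name, module in packages.items() if find_spec(module) is None]


missing = missing_packages(PACKAGES)
print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates each module without importing it, so the check is fast and has no side effects.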

If you want to auto-detect non-stdlib imports used by the repo:

python - <<'PY'
import ast
import sys
from pathlib import Path

roots = [Path("scripts"), Path("sims_xrgg/scripts")]
pyfiles = []
for r in roots:
    if r.exists():
        pyfiles += list(r.rglob("*.py"))

# Collect the top-level module name of every absolute import.
mods = set()
for f in pyfiles:
    try:
        tree = ast.parse(f.read_text(encoding="utf-8"))
    except Exception:
        continue  # skip files that cannot be read or parsed
    for n in ast.walk(tree):
        if isinstance(n, ast.Import):
            for a in n.names:
                mods.add(a.name.split(".")[0])
        elif isinstance(n, ast.ImportFrom) and n.module and n.level == 0:
            # level == 0 skips relative imports such as "from .helpers import x"
            mods.add(n.module.split(".")[0])

# sys.stdlib_module_names (Python 3.10+) replaces a hand-maintained stdlib list.
stdlib = set(sys.stdlib_module_names)
# The repo's own helper modules are not external dependencies either.
local = {f.stem for f in pyfiles}
mods = sorted(m for m in mods if m not in stdlib and m not in local)
print("\n".join(mods))
PY

Setup

Conda is recommended.

  1. Create the environment:
conda env create -f environment.yml
conda activate rg-rich-regions
  2. (Optional) Update the environment later if environment.yml changes:
conda env update -f environment.yml --prune

Example environment.yml (repo root):

name: rg-rich-regions
channels:
  - conda-forge
dependencies:
  - python=3.11
  - numpy
  - pandas
  - scipy
  - matplotlib
  - biopython
  - openpyxl
  - tqdm
  - pip
  - scikit-learn
  - statsmodels

Quick sanity check

python scripts/check_paths.py

This verifies:

  • key directories exist (inputs/, outputs/, scripts/, sims_xrgg/)
  • required input files are present
  • helper folders exist (aa_enrichment_panels/, kde_data/)
  • scripts compile and helper modules import cleanly
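
For readers curious how checks of this kind can be written, here is a minimal sketch covering the directory and compile checks. It is not the contents of scripts/check_paths.py. The function names are illustrative, and the required-input-file check is left out because the file names are not listed here.

```python
from pathlib import Path

# Directory names taken from the repository layout described above.
REQUIRED_DIRS = [
    "inputs", "outputs", "scripts", "sims_xrgg",
    "aa_enrichment_panels", "kde_data",
]


def find_missing_dirs(root: Path, names: list[str]) -> list[str]:
    """Return the names that are not existing directories under root."""
    return [n for n in names if not (root / n).is_dir()]


def find_uncompilable(script_dir: Path) -> list[Path]:
    """Return .py files under script_dir that fail to compile."""
    if not script_dir.is_dir():
        return []
    bad = []
    for path in sorted(script_dir.rglob("*.py")):
        try:
            # compile() checks syntax only; the script is not executed.
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except (SyntaxError, ValueError):
            bad.append(path)
    return bad


root = Path.cwd()  # assumes the current directory is the repo root
print("Missing directories:", find_missing_dirs(root, REQUIRED_DIRS) or "none")
print("Scripts that fail to compile:", find_uncompilable(root / "scripts") or "none")
```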

Run figure scripts

Run from repo root.

1) Main Fig 3 / RG tract analysis

python scripts/fig3.py

2) Fig S6: amino-acid enrichment summary

python scripts/xrgg_aa_analysis.py

Outputs are written to outputs/, or to the location specified inside each script.


sims_xrgg workflow

Recommended order (run from repo root):

1) Prepare slab inputs

IDR-only slabs:

python sims_xrgg/scripts/prepare_slab_idr.py

IDR+RNA slabs:

python sims_xrgg/scripts/prepare_slab_mix.py

2) Run slab analysis from trajectories

Single directory:

python sims_xrgg/scripts/run_slab_from_traj.py

Multiple directories:

python sims_xrgg/scripts/run_all_slab_from_traj.py

3) Plot slab outputs

Single directory:

python sims_xrgg/scripts/plot_slab_outputs.py

Multiple directories:

python sims_xrgg/scripts/plot_all_slab_outputs.py

4) Check slab inputs/outputs sanity

python sims_xrgg/scripts/check_slab_inputs.py

5) Scan / summarize partition coefficients

python sims_xrgg/scripts/scan_partition_coefficients.py
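
As background, a partition coefficient from a slab simulation is commonly taken as the ratio of a component's mean concentration in the dense phase to its mean concentration in the dilute phase. The sketch below illustrates that common definition on a toy density profile. It is an assumption for illustration only: scan_partition_coefficients.py may define the regions or the ratio differently, and the function name, region cutoffs and numbers are all made up.

```python
from statistics import mean


def partition_coefficient(z, conc, dense_halfwidth, dilute_start):
    """Ratio of mean concentration in the dense slab to that in the dilute phase.

    z: bin centers along the slab normal, with the slab centered at z = 0.
    conc: concentration in each bin (any consistent unit).
    dense_halfwidth: bins with |z| <= this value count as dense phase.
    dilute_start: bins with |z| >= this value count as dilute phase.
    """
    dense = [c for zi, c in zip(z, conc) if abs(zi) <= dense_halfwidth]
    dilute = [c for zi, c in zip(z, conc) if abs(zi) >= dilute_start]
    if not dense or not dilute:
        raise ValueError("no bins fall in the dense or dilute region")
    c_dilute = mean(dilute)
    if c_dilute == 0:
        return float("inf")  # nothing observed in the dilute phase
    return mean(dense) / c_dilute


# Toy profile with made-up numbers (not simulation output).
z = [-30, -20, -10, 0, 10, 20, 30]
conc = [0.5, 0.5, 20.0, 20.0, 20.0, 0.5, 0.5]
print(partition_coefficient(z, conc, dense_halfwidth=10, dilute_start=20))
```

Leaving a gap between the two cutoffs keeps the interfacial bins out of both averages.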

Simulation analysis outputs are written to sims_xrgg/outputs/ by default.
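
The post-processing steps (2 to 5) can be chained with a small driver. The sketch below is not part of the repository. It assumes the trajectories already exist, that each script runs without command-line arguments (as in the commands above), and that it is launched from the repo root. The function name `run_steps` is illustrative. By default it only prints the commands.

```python
import subprocess
import sys

# Steps 2-5 in the recommended order, using the multi-directory variants.
STEPS = [
    "sims_xrgg/scripts/run_all_slab_from_traj.py",
    "sims_xrgg/scripts/plot_all_slab_outputs.py",
    "sims_xrgg/scripts/check_slab_inputs.py",
    "sims_xrgg/scripts/scan_partition_coefficients.py",
]


def run_steps(steps, dry_run=True):
    """Run each script in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands (each a list of arguments).
    """
    commands = [[sys.executable, step] for step in steps]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code
            subprocess.run(cmd, check=True)
    return commands


run_steps(STEPS)  # dry run; pass dry_run=False to execute the scripts
```

Using `sys.executable` ensures the scripts run with the same interpreter (and conda environment) as the driver.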



Path / file-not-found errors

Run:

python scripts/check_paths.py

Then confirm:

  • required files exist in inputs/
  • aa_enrichment_panels/aa_log2_enrichment_values.tsv exists (if you run enrichment panels)
  • kde_data/ exists (if you run KDE-based panels)

Citation

Manuscript in preparation.

Model attribution

Slab simulations were performed using CALVADOS 3, developed in the Lindorff-Larsen lab (University of Copenhagen / KULL Centre). The simulation workflow used the CALVADOS software package.

Please cite:

  • Cao F, von Bülow S, Tesei G, Lindorff-Larsen K. A coarse-grained model for disordered and multi-domain proteins. Protein Science (2024). DOI: 10.1002/pro.5172
  • von Bülow S, Yasuda I, Cao F, Schulze TK, Trolle AI, Rauh AS, Crehuet R, Lindorff-Larsen K, Tesei G. Software package for simulations using the coarse-grained CALVADOS model. arXiv (2025). DOI: 10.48550/arXiv.2504.10408
