Analysis code + inputs to study RG/RGG tract grammar in disordered regions and generate manuscript figure panels.
Includes CALVADOS slab simulation post-processing utilities under sims_xrgg/.
- Repository layout
- Dependencies
- Setup
- Quick sanity check
- Run figure scripts
- sims_xrgg workflow
- Citation

## Repository layout
rg-rich-regions/
├── inputs/ # input data files used by scripts (FASTA, lists, tables)
├── outputs/ # generated figures and exported summaries
├── scripts/ # main analysis + plotting scripts
├── aa_enrichment_panels/ # AA enrichment assets (aa_log2_enrichment_values.tsv)
├── kde_data/ # KDE contour/supplementary data (kept at repo root)
└── sims_xrgg/
    ├── scripts/ # slab prep + analysis scripts
    └── outputs/ # generated plots/CSVs from sims analysis
Tip: Run scripts from the repo root to avoid path issues:
python scripts/fig3.py

## Dependencies
Recommended Python version: 3.10+ (tested best with 3.11).
Core Python libraries used across the analysis/plotting scripts:
- numpy
- pandas
- matplotlib
- scipy
- openpyxl (for reading `.xlsx` inputs)
- biopython (for FASTA parsing, if used)
- tqdm (optional; progress bars)
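
A quick way to confirm that the libraries listed above are present in the active environment is sketched below. This is a minimal illustration, not a script shipped with the repo: the function name `missing_modules` is hypothetical, and the module list assumes the usual import names (biopython is imported as `Bio`).

```python
from importlib.util import find_spec

# Import names for the libraries listed above. biopython is imported as
# "Bio"; tqdm is optional, so a missing tqdm may be acceptable.
CORE_MODULES = ["numpy", "pandas", "matplotlib", "scipy", "openpyxl", "Bio", "tqdm"]


def missing_modules(names=CORE_MODULES):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All core modules were found.")
```

`find_spec` only locates each top-level module without importing it, so the check stays fast even for heavy packages.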
If you want to auto-detect non-stdlib imports used by the repo:
python - <<'PY'
import ast
import sys
from pathlib import Path

# Collect every .py file under the two script directories.
roots = [Path("scripts"), Path("sims_xrgg/scripts")]
pyfiles = []
for r in roots:
    if r.exists():
        pyfiles += list(r.rglob("*.py"))

# Record the top-level module name of every import statement.
mods = set()
for f in pyfiles:
    try:
        tree = ast.parse(f.read_text(encoding="utf-8"))
    except Exception:
        continue  # skip files that cannot be read or parsed
    for n in ast.walk(tree):
        if isinstance(n, ast.Import):
            for a in n.names:
                mods.add(a.name.split(".")[0])
        elif isinstance(n, ast.ImportFrom) and n.module:
            mods.add(n.module.split(".")[0])

# Hand-written fallback list of common stdlib modules.
stdlib_guess = {
    "os","sys","re","math","json","csv","gzip","bz2","lzma","pathlib","glob",
    "itertools","collections","typing","argparse","subprocess","datetime",
    "functools","statistics","random","warnings","textwrap","shutil"
}
# On Python 3.10+, also exclude the interpreter's full stdlib module list.
stdlib_guess |= set(getattr(sys, "stdlib_module_names", ()))
mods = sorted(m for m in mods if m not in stdlib_guess)
print("\n".join(mods))
PY

## Setup
Conda is recommended.
- Create the environment:
  conda env create -f environment.yml
  conda activate rg-rich-regions
- (Optional) Update later if `environment.yml` changes:
  conda env update -f environment.yml --prune

Example `environment.yml` (repo root):
name: rg-rich-regions
channels:
- conda-forge
dependencies:
- python=3.11
- numpy
- pandas
- scipy
- matplotlib
- biopython
- openpyxl
- tqdm
- pip
- scikit-learn
- statsmodels

## Quick sanity check
python scripts/check_paths.py
This verifies:
- key directories exist (`inputs/`, `outputs/`, `scripts/`, `sims_xrgg/`)
- required input files are present
- helper folders exist (`aa_enrichment_panels/`, `kde_data/`)
- scripts compile and helper modules import cleanly

## Run figure scripts
Run from repo root.
python scripts/fig3.py
python scripts/xrgg_aa_analysis.py
Outputs are written to `outputs/` and/or to the locations specified inside each script.

## sims_xrgg workflow
Recommended order (run from repo root):
IDR-only slabs:
python sims_xrgg/scripts/prepare_slab_idr.py
IDR+RNA slabs:
python sims_xrgg/scripts/prepare_slab_mix.py
Single directory:
python sims_xrgg/scripts/run_slab_from_traj.py
Multiple directories:
python sims_xrgg/scripts/run_all_slab_from_traj.py
Single directory:
python sims_xrgg/scripts/plot_slab_outputs.py
Multiple directories:
python sims_xrgg/scripts/plot_all_slab_outputs.py
python sims_xrgg/scripts/check_slab_inputs.py
python sims_xrgg/scripts/scan_partition_coefficients.py
Simulation analysis outputs are written to `sims_xrgg/outputs/` by default.
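
The analysis and plotting steps above can be chained with a small driver, sketched below. It is not part of the repo: the function names `analysis_commands` and `run_steps` are hypothetical, and the sketch assumes that the scripts need no command-line arguments and that the slab trajectories already exist. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

SCRIPTS_DIR = Path("sims_xrgg/scripts")


def analysis_commands(multi=False):
    """Build the analysis and plotting commands in the recommended order.

    multi=False selects the single-directory scripts, multi=True the
    multiple-directory variants. Script names are taken from the README.
    """
    run = "run_all_slab_from_traj.py" if multi else "run_slab_from_traj.py"
    plot = "plot_all_slab_outputs.py" if multi else "plot_slab_outputs.py"
    return [[sys.executable, str(SCRIPTS_DIR / name)] for name in (run, plot)]


def run_steps(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps(analysis_commands(multi=False), dry_run=True)
```

Passing `dry_run=False` executes the steps from the current working directory, so the driver should be started from the repo root like the individual scripts.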
### Path / file-not-found errors
Run:
```bash
python scripts/check_paths.py
```
Then confirm:
- required files exist in `inputs/`
- `aa_enrichment_panels/aa_log2_enrichment_values.tsv` exists (if you run enrichment panels)
- `kde_data/` exists (if you run KDE-based panels)

## Citation
Manuscript in preparation.
Slab simulations were performed using CALVADOS 3, developed in the Lindorff-Larsen lab (University of Copenhagen / KULL Centre). The simulation workflow used the CALVADOS software package.
Please cite:
- Cao F, von Bülow S, Tesei G, Lindorff-Larsen K. A coarse-grained model for disordered and multi-domain proteins. Protein Science (2024). DOI: 10.1002/pro.5172
- von Bülow S, Yasuda I, Cao F, Schulze TK, Trolle AI, Rauh AS, Crehuet R, Lindorff-Larsen K, Tesei G. Software package for simulations using the coarse-grained CALVADOS model. arXiv (2025). DOI: 10.48550/arXiv.2504.10408