Emad-COMBINE-lab/pllm-ppi-data-leakage

pllm-ppi-data-leakage

License: AGPL v3 DOI: 10.1101/2025.04.21.649858

This repository houses the code and data for "A flaw in using pre-trained pLLMs in protein-protein interaction inference models".

How To Use this Repository

Please consult the documentation for information on how to install dependencies, run experiments, and generate all figures in the manuscript.

Where's the Data?

The compressed data total 14.5 GiB, which is too large for Git to reasonably handle. Data files can instead be downloaded over HTTP or BitTorrent. More information is available in the data folder.
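Because the download is large, it can help to confirm that the target disk has room before starting. The following is a minimal sketch, not part of the repository: the 14.5 GiB figure comes from the text above, while the `headroom` multiplier is an assumption (the uncompressed size is not stated here), and the function names are hypothetical.

```python
import shutil

GIB = 1024 ** 3
COMPRESSED_SIZE_GIB = 14.5  # compressed data size stated in this README


def required_bytes(compressed_gib=COMPRESSED_SIZE_GIB, headroom=2.0):
    """Estimate the bytes needed for the download.

    `headroom` is an assumed multiplier to leave room for extraction;
    the real uncompressed size may be larger or smaller.
    """
    return int(compressed_gib * GIB * headroom)


def has_enough_space(path, needed):
    """Return True if the filesystem holding `path` has `needed` bytes free."""
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    needed = required_bytes()
    ok = has_enough_space(".", needed)
    print(f"Estimated need: {needed / GIB:.1f} GiB; enough free space: {ok}")
```

Run it from the directory where the data will be stored, and adjust `headroom` once the extracted size is known.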

Installation & Requirements

Instructions for installation, as well as details on requirements and dependencies, are all available through the online documentation.

Install times for the various experiments in this repository vary greatly and depend on hardware (e.g., disk I/O), but installation typically takes less than 15 minutes for each code base.

Python 3 is required to build the online documentation site locally. Package dependencies are listed in the requirements.txt file in this folder. All code was tested on Linux (Debian-based distributions).

Demo

A demonstration of how to regenerate all the figures in the manuscript is available through the online documentation.

License

Code

All code files in this repository, unless otherwise specified, are licensed under the GNU AGPLv3 License.

Code for "A flaw in using pre-trained pLMs in protein-protein interaction inference models"

Copyright (C) 2025 Joseph Szymborski

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Data

All data files in this repository, unless otherwise specified, are licensed under CC BY-NC-SA 4.0:

Data for "A serious flaw in the design of pLM-based protein-protein interactions" by Joseph Szymborski is licensed under CC BY-NC-SA 4.0
