A biomedical question-answering system focused on drug information, built using BioBERT and PyTorch.
This project implements a specialized question-answering model for drug-related queries. The model uses BioBERT, a BERT (Bidirectional Encoder Representations from Transformers) variant pre-trained on biomedical text, to understand and answer questions about medications, their side effects, dosages, and uses.
- Fine-tuned BioBERT model for biomedical question answering
- Support for multiple dataset sources (BioASQ, DrugEHRQA, synthetic data)
- Interactive Q&A interface for drug information queries
- Evaluation metrics including Exact Match and F1 Score
- Built-in context about common medications
- Confidence scoring for answers
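
Exact Match and F1 for extractive QA are commonly computed SQuAD-style, on normalized answer strings. The sketch below assumes that convention; the function names are illustrative, and the project's own evaluation code may normalize answers differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Exact Match rewards only a perfect normalized match, while F1 gives partial credit when the predicted span overlaps the reference, which is why the two are usually reported together.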
- Python 3.6+
- PyTorch
- TensorFlow
- Transformers
- Pandas
- NumPy
- scikit-learn
- NLTK
- tqdm
- Matplotlib
Install dependencies with:

```bash
pip install torch tensorflow transformers pandas numpy scikit-learn nltk tqdm matplotlib
```
The model can work with several biomedical QA datasets:
- BioASQ: A biomedical semantic indexing and question answering challenge
- DrugEHRQA: A dataset focusing on drug-related questions in electronic health records
- Synthetic Dataset: A fallback option that generates realistic drug-related QA pairs
The implementation attempts to download these datasets in order of preference and falls back to synthetic data generation if necessary.
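
The fallback order described above can be sketched as a list of loaders tried in turn. Everything in this example is hypothetical: `load_bioasq`, `load_drugehrqa`, and `generate_synthetic` are placeholder names, and the two download loaders simply raise to simulate an unavailable dataset.

```python
from typing import Callable, Dict, List, Sequence, Tuple

QAExample = Dict[str, str]
Loader = Callable[[], List[QAExample]]


def load_bioasq() -> List[QAExample]:
    # Placeholder: a real loader would download and parse BioASQ here.
    raise FileNotFoundError("BioASQ data is not available")


def load_drugehrqa() -> List[QAExample]:
    # Placeholder: a real loader would read the DrugEHRQA files here.
    raise FileNotFoundError("DrugEHRQA data is not available")


def generate_synthetic() -> List[QAExample]:
    """Build template-based QA pairs whose answer is a span of the context."""
    uses = {
        "Aspirin": "relieve pain and reduce fever",
        "Metformin": "manage type 2 diabetes",
        "Lisinopril": "treat high blood pressure",
    }
    return [
        {
            "question": f"What is {drug} used for?",
            "context": f"{drug} is used to {use}.",
            "answer": use,
        }
        for drug, use in uses.items()
    ]


def load_first_available(
    loaders: Sequence[Tuple[str, Loader]],
) -> Tuple[str, List[QAExample]]:
    """Try each loader in order and return the first non-empty result."""
    errors = []
    for name, loader in loaders:
        try:
            examples = loader()
        except Exception as exc:  # any failure moves on to the next source
            errors.append(f"{name}: {exc}")
            continue
        if examples:
            return name, examples
        errors.append(f"{name}: returned no examples")
    raise RuntimeError("No dataset could be loaded: " + "; ".join(errors))


LOADERS = [
    ("bioasq", load_bioasq),
    ("drugehrqa", load_drugehrqa),
    ("synthetic", generate_synthetic),
]
source, examples = load_first_available(LOADERS)
```

Keeping the synthetic generator last in the list means it is used only when every real dataset fails to load.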
The system uses the BioBERT v1.1 model, which is a version of BERT pre-trained on biomedical text. Key components include:
- BioBERT base model with question-answering head
- Custom dataset processing for handling answer spans
- AdamW optimizer with linear learning rate scheduler
- Context-based answer extraction
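
A question-answering head produces one start logit and one end logit per token, and the answer is the span that scores best under both. The sketch below shows one common way to pick that span and derive a confidence value from it. It is an assumption that this project scores confidence as the product of start and end probabilities; the tokens and logits are made-up illustrative values, not model output.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities, shifted by the max for stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
) -> Tuple[int, int, float]:
    """Return (start, end, confidence) for the highest-scoring valid span.

    A span is valid when start <= end and it covers at most max_answer_len
    tokens. Confidence is P(start) * P(end), so it lies in (0, 1].
    """
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for i, p_start in enumerate(start_probs):
        last = min(i + max_answer_len, len(end_probs))
        for j in range(i, last):
            score = p_start * end_probs[j]
            if score > best[2]:
                best = (i, j, score)
    return best


# Illustrative values only: a tokenized context with hand-picked logits.
tokens = ["aspirin", "may", "cause", "stomach", "upset", "."]
start_logits = [0.1, 0.0, 0.2, 4.0, 0.3, -1.0]
end_logits = [0.0, 0.1, 0.0, 0.5, 4.5, -1.0]

start, end, confidence = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
```

In a real pipeline the logits would cover the question and special tokens as well, so those positions are usually masked out before the search.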
```python
interactive_qa()  # Run this function to interact with the model
```
The model can answer questions like:
- "What are the side effects of Aspirin?"
- "How should I take Metformin?"
- "Can antibiotics treat viral infections?"
- "What is Lisinopril used for?"
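
Most of these questions name a specific drug, so the built-in context can in principle be chosen by matching the drug name in the question. The following is a minimal sketch of that idea, not the project's actual lookup: `DRUG_CONTEXTS` and `find_context` are hypothetical names, and the passages are short illustrative stand-ins for the real built-in context.

```python
import re
from typing import Dict, Optional

# Hypothetical built-in context, keyed by lowercase drug name.
DRUG_CONTEXTS: Dict[str, str] = {
    "aspirin": "Aspirin is used to relieve pain and reduce fever.",
    "metformin": "Metformin is used to manage type 2 diabetes.",
    "lisinopril": "Lisinopril is an ACE inhibitor used to treat high blood pressure.",
}


def find_context(
    question: str, contexts: Dict[str, str] = DRUG_CONTEXTS
) -> Optional[str]:
    """Return the passage for the first known drug named in the question."""
    # Whole-word matching avoids hits on substrings of longer words.
    words = set(re.findall(r"[a-z]+", question.lower()))
    for drug, passage in contexts.items():
        if drug in words:
            return passage
    return None  # no known drug mentioned; the caller must supply a context
```

A question that names no known drug, such as the antibiotics example, returns `None` here, so a caller would need a general-purpose context or a fallback message for that case.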
- Expand the default context with information about more medications
- Implement negation handling for more accurate answers
- Add support for multi-document question answering
- Enhance the model with entity linking to medical knowledge bases