Welcome to the ScrapeGraphAI SDK documentation hub. This directory contains comprehensive documentation for understanding, developing, and maintaining the official Python SDK for the ScrapeGraph AI API.
Complete SDK architecture documentation including:
- Repository Structure - How the Python SDK is organized
- Python SDK Architecture - Client structure, async/sync support, models
- API Endpoints Coverage - All supported endpoints
- Authentication - API key management and security
- Testing Strategy - Unit tests, integration tests, CI/CD
- Release Process - Semantic versioning and publishing
Future: PRD and implementation plans for specific SDK features
Future: Standard operating procedures (e.g., adding new endpoints, releasing versions)
1. Read First:
   - Main README - Project overview and features
   - Python SDK README - Python SDK guide

2. Set Up Development Environment:

   ```bash
   cd scrapegraph-py

   # Install dependencies with uv (recommended)
   pip install uv
   uv sync

   # Or use pip
   pip install -e .

   # Install pre-commit hooks
   pre-commit install
   ```

3. Run Tests:

   ```bash
   cd scrapegraph-py
   pytest tests/ -v
   ```

4. Explore the Codebase:
   - Python: `scrapegraph_py/client.py` - Sync client, `scrapegraph_py/async_client.py` - Async client
   - Examples: `scrapegraph-py/examples/`
...how to add a new endpoint:
- Read: Python SDK - `scrapegraph_py/client.py`, `scrapegraph_py/async_client.py`
- Examples: Look at existing endpoint implementations
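Existing endpoint methods follow a common shape: build a request payload, POST it to the API, and return a typed response. A rough sketch of that pattern is below; the method name, path, and `_post` helper are all illustrative assumptions, not the real SDK code (see `client.py` for the actual implementation):

```python
# Hypothetical sketch of the pattern a new endpoint method typically follows.
# Names (_post, /v1/my-endpoint, my_endpoint) are illustrative only.
class Client:
    def __init__(self, api_key: str):
        self.api_key = api_key

    def _post(self, path: str, payload: dict) -> dict:
        # The real client performs an authenticated HTTP POST here;
        # this stub just echoes the request for illustration.
        return {"request_id": "demo-123", "path": path, **payload}

    def my_endpoint(self, website_url: str, user_prompt: str) -> dict:
        # 1. Build the request payload (the real SDK validates it via a Pydantic model)
        payload = {"website_url": website_url, "user_prompt": user_prompt}
        # 2. Call the API and return the parsed response
        return self._post("/v1/my-endpoint", payload)

client = Client(api_key="test-key")
result = client.my_endpoint("https://example.com", "Extract title")
print(result["request_id"])  # demo-123
```

When adding a real endpoint, mirror the same change in `async_client.py` so the sync and async surfaces stay in parity.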
...how authentication works:
- Read: Python SDK - `scrapegraph_py/client.py` (initialization)
- The SDK supports the `SGAI_API_KEY` environment variable
...how error handling works:
- Read: Python SDK - `scrapegraph_py/exceptions.py`
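SDK exception modules typically pair a base error with more specific subclasses, so callers can catch broadly or narrowly. The class names below are hypothetical illustrations of that pattern; the real names live in `scrapegraph_py/exceptions.py`:

```python
# Hypothetical hierarchy illustrating the pattern; real names are in exceptions.py.
class SDKError(Exception):
    """Base class: catch this to handle any SDK failure."""

class AuthError(SDKError):
    """Raised for invalid or missing API keys."""

class RequestError(SDKError):
    """Raised when the API rejects a request."""
    def __init__(self, status_code: int, message: str):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code

def call_api(api_key: str) -> dict:
    # Illustrative stand-in for an SDK call.
    if not api_key:
        raise AuthError("No API key provided")
    return {"request_id": "ok"}

try:
    call_api("")
except SDKError as exc:  # broad catch: any SDK error
    print(type(exc).__name__)  # AuthError
```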
...how testing works:
- Read: Python SDK - `tests/` directory, `pytest.ini`
- Run: Follow test commands in README
...how releases work:
- Read: Python SDK - `.releaserc.yml` (semantic-release config)
- GitHub Actions: `.github/workflows/` for automated releases
```bash
cd scrapegraph-py

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_smartscraper.py -v

# Run with coverage
pytest --cov=scrapegraph_py --cov-report=html tests/
```

```bash
cd scrapegraph-py

# Format code
black scrapegraph_py tests

# Sort imports
isort scrapegraph_py tests

# Lint code
ruff check scrapegraph_py tests

# Type check
mypy scrapegraph_py

# Run all checks via Makefile
make format
make lint
```

```bash
cd scrapegraph-py

# Build package
python -m build

# Publish to PyPI (automated via GitHub Actions)
twine upload dist/*
```

The SDK supports the following endpoints:
| Endpoint | Python SDK | Purpose |
|---|---|---|
| SmartScraper | ✅ | AI-powered data extraction |
| SearchScraper | ✅ | Multi-website search extraction |
| Markdownify | ✅ | HTML to Markdown conversion |
| SmartCrawler | ✅ | Sitemap generation & crawling |
| AgenticScraper | ✅ | Browser automation |
| Scrape | ✅ | Basic HTML extraction |
| Scheduled Jobs | ✅ | Cron-based job scheduling |
| Credits | ✅ | Credit balance management |
| Feedback | ✅ | Rating and feedback |
Entry Points:
- `scrapegraph_py/__init__.py` - Package exports
- `scrapegraph_py/client.py` - Synchronous client
- `scrapegraph_py/async_client.py` - Asynchronous client
Models:
- `scrapegraph_py/models/` - Pydantic request/response models
  - `smartscraper_models.py` - SmartScraper schemas
  - `searchscraper_models.py` - SearchScraper schemas
  - `crawler_models.py` - Crawler schemas
  - `markdownify_models.py` - Markdownify schemas
  - And more...
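These request models enforce types and basic validation before anything hits the network. The real models use Pydantic; as a dependency-free illustration of the same idea, here is a stdlib dataclass sketch (field names mirror the documented SmartScraper request, but treat the exact shape as an assumption):

```python
from dataclasses import dataclass, field

@dataclass
class SmartScraperRequest:
    # Sketch of a request model; the real Pydantic version lives in
    # scrapegraph_py/models/smartscraper_models.py.
    website_url: str
    user_prompt: str
    headers: dict = field(default_factory=dict)

    def __post_init__(self):
        # Minimal validation, analogous to what Pydantic validators provide.
        if not self.website_url.startswith(("http://", "https://")):
            raise ValueError("website_url must be an absolute http(s) URL")
        if not self.user_prompt.strip():
            raise ValueError("user_prompt must not be empty")

req = SmartScraperRequest("https://example.com", "Extract title")
print(req.website_url)  # https://example.com
```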
Utilities:
- `scrapegraph_py/utils/` - Helper functions
- `scrapegraph_py/logger.py` - Logging configuration
- `scrapegraph_py/config.py` - Configuration constants
- `scrapegraph_py/exceptions.py` - Custom exceptions
Configuration:
- `pyproject.toml` - Package metadata, dependencies, tool configs
- `pytest.ini` - Pytest configuration
- `Makefile` - Common development tasks
- `.releaserc.yml` - Semantic-release configuration
```
scrapegraph-py/tests/
├── test_async_client.py    # Async client tests
├── test_client.py          # Sync client tests
├── test_smartscraper.py    # SmartScraper endpoint tests
├── test_searchscraper.py   # SearchScraper endpoint tests
├── test_crawler.py         # Crawler endpoint tests
└── conftest.py             # Pytest fixtures
```
Python Example:

```python
from scrapegraph_py import Client

def test_smartscraper_basic():
    client = Client(api_key="test-key")
    response = client.smartscraper(
        website_url="https://example.com",
        user_prompt="Extract title"
    )
    assert response.request_id is not None
```

Issue: Import errors in Python SDK
- Cause: Package not installed or outdated
- Solution:
  ```bash
  cd scrapegraph-py
  pip install -e .

  # Or with uv
  uv sync
  ```
Issue: API key errors
- Cause: Invalid or missing API key
- Solution:
  - Set the `SGAI_API_KEY` environment variable
  - Or pass the `api_key` parameter directly
  - Get an API key from https://scrapegraphai.com
Issue: Type errors in Python SDK
- Cause: Using wrong model types
- Solution: Check `scrapegraph_py/models/` for the correct Pydantic models
Issue: Tests failing
- Cause: Missing test environment variables
- Solution: Set `SGAI_API_KEY` for integration tests, or use mocked tests
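For unit tests that should run without a real key, one option is to stub the client with `unittest.mock` so no network call is made. A minimal sketch (the response shape here is an assumption):

```python
from unittest import mock

# Stand-in for a real Client: smartscraper() returns a canned response
# instead of hitting the API, so no SGAI_API_KEY is needed.
client = mock.Mock()
client.smartscraper.return_value = mock.Mock(request_id="mocked-123")

response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract title",
)
print(response.request_id)  # mocked-123
client.smartscraper.assert_called_once()
```

In real test suites, `mock.patch` is typically applied to the client's HTTP layer instead of the whole client, so the request-building logic is still exercised.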
- Read relevant documentation - Understand the SDK structure
- Check existing issues - Avoid duplicate work
- Run tests - Ensure current state is green
- Create a branch - Use descriptive branch names (e.g., `feat/add-pagination-support`)
- Make changes - Write clean, documented code
- Add tests - Cover new functionality
- Run code quality checks - Format, lint, type check
- Run tests - Ensure all tests pass
- Update documentation - Update README and examples
- Commit with semantic commit messages - `feat:`, `fix:`, `docs:`, etc.
- Create pull request - Describe changes thoroughly
Python SDK:
- Black - Code formatting (line length: 88)
- isort - Import sorting (Black profile)
- Ruff - Fast linting
- mypy - Type checking (strict mode)
- Type hints - Use Pydantic models and type annotations
- Docstrings - Document public functions and classes
Follow Conventional Commits:
```
feat: add pagination support for smartscraper
fix: handle timeout errors gracefully
docs: update README with new examples
test: add unit tests for crawler endpoint
chore: update dependencies
```
This enables automated semantic versioning and changelog generation.
Update .agent/README.md when:
- Adding new SDK features
- Changing development workflows
- Updating testing procedures
Update README.md (root) when:
- Adding new endpoints
- Changing installation instructions
- Adding new features or use cases
Update SDK-specific READMEs when:
- Adding new endpoint methods
- Changing API surface
- Adding examples
- Keep examples working - Test code examples regularly
- Be specific - Include version numbers, function names
- Include error handling - Show try/except patterns
- Cross-reference - Link between related sections
- Keep changelogs - Document all changes in CHANGELOG.md
The SDK uses semantic-release for automated versioning and publishing:
- Make changes - Develop and test new features
- Commit with semantic messages -
feat:,fix:, etc. - Merge to main - Pull request approved and merged
- Automated release - GitHub Actions:
  - Determines version bump (major/minor/patch)
  - Updates version in `pyproject.toml`
  - Generates CHANGELOG.md
  - Creates GitHub release
  - Publishes to PyPI
- `feat:` → Minor version bump (0.x.0)
- `fix:` → Patch version bump (0.0.x)
- `BREAKING CHANGE:` → Major version bump (x.0.0)
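The bump rules above can be sketched as a small function. This is only an illustration of the semantics; semantic-release does the real work from the git history and `.releaserc.yml`:

```python
def next_version(current: str, commit_messages: list[str]) -> str:
    # Illustrative re-implementation of the Conventional Commits bump rules.
    major, minor, patch = map(int, current.split("."))
    if any("BREAKING CHANGE:" in m for m in commit_messages):
        return f"{major + 1}.0.0"
    if any(m.startswith("feat:") for m in commit_messages):
        return f"{major}.{minor + 1}.0"
    if any(m.startswith("fix:") for m in commit_messages):
        return f"{major}.{minor}.{patch + 1}"
    return current  # docs:, chore:, etc. trigger no release

print(next_version("1.2.3", ["fix: handle timeout errors"]))    # 1.2.4
print(next_version("1.2.3", ["feat: add pagination support"]))  # 1.3.0
```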
- Main README - Project overview
- Python SDK README - Python guide
- Cookbook - Usage examples
- API Documentation - Full API docs
For questions or issues:
- Check this documentation first
- Review SDK-specific README
- Search existing GitHub issues
- Create a new issue with:
- SDK version
- Error message
- Minimal reproducible example
Happy Coding! 🚀