
Commit 3c82ca7

VinciGit00 and claude committed
feat: align v2 wire format with scrapegraph-py PR #84
Rebase base URL, env vars, and auth header onto the new scrapegraph-py v2 SDK contract (ScrapeGraphAI/scrapegraph-py#84):

- Base URL: /api/v2 -> /v2 (default https://api.scrapegraphai.com/v2)
- Env: SGAI_API_URL (SCRAPEGRAPH_API_BASE_URL kept as legacy alias)
- Env: SGAI_TIMEOUT_S for httpx timeout (default 120s)
- Drop Authorization: Bearer; keep SGAI-APIKEY only (matches SDK)
- Update docstrings, resources, README, server.json, .agent docs to reference #84 and the /v2 base URL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 272ee65 commit 3c82ca7


5 files changed, +63 -30 lines changed

.agent/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -379,7 +379,7 @@ npx @modelcontextprotocol/inspector scrapegraph-mcp
 ## 📅 Changelog
 
 ### April 2026
-- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82)): base `https://api.scrapegraphai.com/api/v2`, Bearer + SGAI-APIKEY, new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools.
+- ✅ Migrated MCP client and tools to **API v2** ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84)): base `https://api.scrapegraphai.com/v2`, `SGAI-APIKEY` header (matches SDK wire format), new crawl/monitor/credits/history tools; removed sitemap, agentic_scrapper, status polling tools. Env vars aligned with SDK: `SGAI_API_URL`, `SGAI_TIMEOUT_S`.
 
 ### January 2026
 - ✅ Added `time_range` parameter to SearchScraper for filtering results by recency (v1-era; **ignored on API v2**)
```

.agent/system/project_architecture.md

Lines changed: 11 additions & 11 deletions
```diff
@@ -20,11 +20,11 @@
 The ScrapeGraph MCP Server is a production-ready [Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) server that provides seamless integration between AI assistants (like Claude, Cursor, etc.) and the [ScrapeGraphAI API](https://scrapegraphai.com). This server enables language models to leverage advanced AI-powered web scraping capabilities with enterprise-grade reliability.
 
 **Key Capabilities (API v2):**
-- **Scrape** (`markdownify`, `scrape`) — POST `/api/v2/scrape`
-- **Extract** (`smartscraper`) — POST `/api/v2/extract` (URL-only)
-- **Search** (`searchscraper`) — POST `/api/v2/search`
-- **Crawl** — POST/GET `/api/v2/crawl` (+ stop/resume); markdown/html crawl only
-- **Monitor, credits, history**`/api/v2/monitor`, `/credits`, `/history`
+- **Scrape** (`markdownify`, `scrape`) — POST `/v2/scrape`
+- **Extract** (`smartscraper`) — POST `/v2/extract` (URL-only)
+- **Search** (`searchscraper`) — POST `/v2/search`
+- **Crawl** — POST/GET `/v2/crawl` (+ stop/resume); markdown/html crawl only
+- **Monitor, credits, history**`/v2/monitor`, `/credits`, `/history`
 
 **Purpose:**
 - Bridge AI assistants (Claude, Cursor, etc.) with web scraping capabilities
```
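Each capability above is now addressed relative to the `/v2` base rather than `/api/v2`. A minimal sketch of that tool-to-endpoint mapping, limited to the POST routes the list names explicitly. `TOOL_ENDPOINTS` and `endpoint_url` are hypothetical names, and the sketch assumes the client appends each relative path to the base URL after stripping a trailing slash (as `ScapeGraphClient.__init__` does for `base_url`):

```python
# Hypothetical mapping of MCP tools to ScrapeGraph API v2 routes, taken from the
# "Key Capabilities (API v2)" list. Paths are relative to the v2 base URL.
DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/v2"

TOOL_ENDPOINTS: dict[str, tuple[str, str]] = {
    "markdownify": ("POST", "/scrape"),
    "scrape": ("POST", "/scrape"),
    "smartscraper": ("POST", "/extract"),
    "searchscraper": ("POST", "/search"),
}


def endpoint_url(tool: str, base_url: str = DEFAULT_API_BASE_URL) -> tuple[str, str]:
    """Return (HTTP method, absolute URL) for a tool.

    Assumption: the absolute URL is the base URL with any trailing slash
    removed, followed by the relative path.
    """
    method, path = TOOL_ENDPOINTS[tool]
    return method, base_url.rstrip("/") + path


if __name__ == "__main__":
    print(endpoint_url("smartscraper"))
```

Under the old default the same lookup would have produced `https://api.scrapegraphai.com/api/v2/extract`; only the base changes, while the relative paths stay the same.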
````diff
@@ -130,7 +130,7 @@ AI Assistant (Claude/Cursor)
 ↓ (stdio via MCP)
 FastMCP Server (this project)
 ↓ (HTTPS API calls)
-ScrapeGraphAI API (default https://api.scrapegraphai.com/api/v2)
+ScrapeGraphAI API (default https://api.scrapegraphai.com/v2)
 ↓ (web scraping)
 Target Websites
 ```
````
```diff
@@ -140,9 +140,9 @@ Target Websites
 The server follows a simple, single-file architecture:
 
 **`ScapeGraphClient` Class:**
-- HTTP client wrapper for ScrapeGraphAI API v2 ([scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82))
-- Base URL: `https://api.scrapegraphai.com/api/v2` (override with env `SCRAPEGRAPH_API_BASE_URL`)
-- Auth: `Authorization: Bearer`, `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0`
+- HTTP client wrapper for ScrapeGraphAI API v2 ([scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84))
+- Base URL: `https://api.scrapegraphai.com/v2` (override with env `SGAI_API_URL`)
+- Auth: `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0` (matches scrapegraph-py v2)
 - v2 methods include `scrape_v2`, `extract`, `search_api`, `crawl_*`, `monitor_*`, `credits`, `history`, plus compatibility wrappers used by MCP tools
 
 **FastMCP Server:**
```
```diff
@@ -391,10 +391,10 @@ If status is "completed":
 
 ### ScrapeGraphAI API
 
-**Base URL:** `https://api.scrapegraphai.com/api/v2` (configurable via `SCRAPEGRAPH_API_BASE_URL`)
+**Base URL:** `https://api.scrapegraphai.com/v2` (configurable via `SGAI_API_URL`)
 
 **Authentication:**
-- Headers: `Authorization: Bearer <key>`, `SGAI-APIKEY: <key>`
+- Headers: `SGAI-APIKEY: <key>` (matches scrapegraph-py v2 wire format)
 - Obtain API key from: [ScrapeGraph Dashboard](https://dashboard.scrapegraphai.com)
 
 **Endpoints used (v2):**
```

README.md

Lines changed: 9 additions & 3 deletions
```diff
@@ -28,9 +28,15 @@ A production-ready [Model Context Protocol](https://modelcontextprotocol.io/intr
 
 ## API v2
 
-This MCP server targets **ScrapeGraph API v2** (`https://api.scrapegraphai.com/api/v2`), aligned with
-[scrapegraph-py PR #82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82). Auth sends both
-`Authorization: Bearer` and `SGAI-APIKEY`. Override the base URL with **`SCRAPEGRAPH_API_BASE_URL`** if needed.
+This MCP server targets **ScrapeGraph API v2** (`https://api.scrapegraphai.com/v2`), aligned 1:1 with
+[scrapegraph-py PR #84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84). Auth uses the
+`SGAI-APIKEY` header. Environment variables mirror the Python SDK:
+
+- **`SGAI_API_URL`** — override the base URL (default `https://api.scrapegraphai.com/v2`)
+- **`SGAI_TIMEOUT_S`** — request timeout in seconds (default `120`)
+- **`SGAI_API_KEY`** — API key (can also be passed via MCP `scrapegraphApiKey` or `X-API-Key` header)
+
+> `SCRAPEGRAPH_API_BASE_URL` is still honored as a legacy alias for `SGAI_API_URL`.
 
 ## Key Features
 
```
server.json

Lines changed: 9 additions & 2 deletions
```diff
@@ -24,11 +24,18 @@
 "name": "SGAI_API_KEY"
 },
 {
-"description": "Override API base URL (default https://api.scrapegraphai.com/api/v2)",
+"description": "Override API base URL (default https://api.scrapegraphai.com/v2)",
 "isRequired": false,
 "format": "string",
 "isSecret": false,
-"name": "SCRAPEGRAPH_API_BASE_URL"
+"name": "SGAI_API_URL"
+},
+{
+"description": "Request timeout in seconds (default 120)",
+"isRequired": false,
+"format": "string",
+"isSecret": false,
+"name": "SGAI_TIMEOUT_S"
 }
 ]
 }
```

src/scrapegraph_mcp/server.py

Lines changed: 33 additions & 13 deletions
```diff
@@ -2,7 +2,7 @@
 """
 MCP server for ScapeGraph API integration (API v2).
 
-Aligned with scrapegraph-py v2 ([ScrapeGraphAI/scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82)):
+Aligned with scrapegraph-py v2 ([ScrapeGraphAI/scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84)):
 - markdownify: Page content via POST /scrape (markdown by default)
 - smartscraper: Structured extraction via POST /extract (url + prompt; schema optional)
 - searchscraper: Web search via POST /search (supports numResults, schema, prompt,
```
```diff
@@ -22,7 +22,11 @@
 includePatterns, excludePatterns, contentTypes, webhookUrl, contentType).
 
 Removed on v2 (no API equivalent): sitemap, agentic_scrapper, markdownify_status, smartscraper_status.
-Optional base URL override: SCRAPEGRAPH_API_BASE_URL (default https://api.scrapegraphai.com/api/v2).
+
+Environment variables (match scrapegraph-py v2):
+- SGAI_API_URL (default https://api.scrapegraphai.com/v2) — base URL override
+- SGAI_TIMEOUT_S (default 120) — request timeout in seconds
+- SCRAPEGRAPH_API_BASE_URL — legacy alias for SGAI_API_URL (still honored)
 
 ## Parameter Validation and Error Handling
 
```
```diff
@@ -83,11 +87,26 @@
 logger = logging.getLogger(__name__)
 
 MCP_SERVER_VERSION = "2.0.0"
-DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/api/v2"
+# Matches scrapegraph-py v2 (env.py): https://api.scrapegraphai.com/v2
+DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/v2"
 
 
 def _api_base_url() -> str:
-    return os.environ.get("SCRAPEGRAPH_API_BASE_URL", DEFAULT_API_BASE_URL).rstrip("/")
+    # SGAI_API_URL mirrors scrapegraph-py v2; SCRAPEGRAPH_API_BASE_URL is a legacy alias.
+    return (
+        os.environ.get("SGAI_API_URL")
+        or os.environ.get("SCRAPEGRAPH_API_BASE_URL")
+        or DEFAULT_API_BASE_URL
+    ).rstrip("/")
+
+
+def _api_timeout_s() -> float:
+    # SGAI_TIMEOUT_S mirrors scrapegraph-py v2 (default 120s).
+    val = os.environ.get("SGAI_TIMEOUT_S")
+    try:
+        return float(val) if val else 120.0
+    except ValueError:
+        return 120.0
 
 
 DEFAULT_SCREENSHOT_FORMAT: Dict[str, Any] = {
```
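The resolution rules in `_api_base_url()` and `_api_timeout_s()` are easiest to check in isolation. This sketch reproduces them as pure functions over an explicit mapping instead of `os.environ`; the `resolve_base_url` and `resolve_timeout_s` names and the `env` parameter are illustrative, not part of the server:

```python
from typing import Mapping

DEFAULT_API_BASE_URL = "https://api.scrapegraphai.com/v2"
DEFAULT_TIMEOUT_S = 120.0


def resolve_base_url(env: Mapping[str, str]) -> str:
    # Same precedence as _api_base_url(): SGAI_API_URL first, then the legacy
    # SCRAPEGRAPH_API_BASE_URL, then the default. Because the chain uses `or`,
    # a variable set to "" is treated as if it were unset.
    return (
        env.get("SGAI_API_URL")
        or env.get("SCRAPEGRAPH_API_BASE_URL")
        or DEFAULT_API_BASE_URL
    ).rstrip("/")


def resolve_timeout_s(env: Mapping[str, str]) -> float:
    # Same behavior as _api_timeout_s(): an unset, empty, or unparseable value
    # falls back to 120 seconds instead of raising.
    val = env.get("SGAI_TIMEOUT_S")
    try:
        return float(val) if val else DEFAULT_TIMEOUT_S
    except ValueError:
        return DEFAULT_TIMEOUT_S


if __name__ == "__main__":
    print(resolve_base_url({"SCRAPEGRAPH_API_BASE_URL": "http://localhost:8000/v2/"}))
    print(resolve_timeout_s({"SGAI_TIMEOUT_S": "30"}))
```

Two consequences follow from the code as written: an empty `SGAI_API_URL` falls through to the legacy alias, and a malformed timeout such as `30s` silently reverts to 120 seconds. The resulting float is passed to `httpx.Timeout`, which uses a single value for its connect, read, write, and pool timeouts.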
```diff
@@ -144,19 +163,20 @@ def _build_json_format_entry(
 
 
 class ScapeGraphClient:
-    """HTTP client for ScrapeGraphAI API v2 (see scrapegraph-py PR #82)."""
+    """HTTP client for ScrapeGraphAI API v2 (see scrapegraph-py PR #84)."""
 
     def __init__(self, api_key: str, base_url: Optional[str] = None) -> None:
         self.api_key = api_key
         self.base_url = (base_url or _api_base_url()).rstrip("/")
+        # Match scrapegraph-py v2 wire format: single SGAI-APIKEY header. We keep
+        # Content-Type/accept for broker compatibility and X-SDK-Version for telemetry.
         self.headers = {
-            "Authorization": f"Bearer {api_key}",
             "SGAI-APIKEY": api_key,
             "Content-Type": "application/json",
             "accept": "application/json",
             "X-SDK-Version": f"scrapegraph-mcp@{MCP_SERVER_VERSION}",
         }
-        self.client = httpx.Client(timeout=httpx.Timeout(120.0))
+        self.client = httpx.Client(timeout=httpx.Timeout(_api_timeout_s()))
 
     def _parse_response(self, response: httpx.Response) -> Dict[str, Any]:
         if response.status_code >= 400:
```
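After this change the only credential on the wire is `SGAI-APIKEY`. The sketch below shows what a v2 scrape request carries, using the standard library's `urllib.request.Request` in place of httpx so it runs without extra dependencies and sends nothing until passed to `urlopen`. The `build_headers` and `build_scrape_request` helpers are hypothetical, and the `{"url": ...}` body is an assumption based on the docstring's description of `/scrape` returning page content; the real payload may carry more fields.

```python
import json
import urllib.request

MCP_SERVER_VERSION = "2.0.0"


def build_headers(api_key: str) -> dict[str, str]:
    # Mirrors the headers dict in ScapeGraphClient.__init__ after this commit:
    # no "Authorization: Bearer", only SGAI-APIKEY for auth.
    return {
        "SGAI-APIKEY": api_key,
        "Content-Type": "application/json",
        "accept": "application/json",
        "X-SDK-Version": f"scrapegraph-mcp@{MCP_SERVER_VERSION}",
    }


def build_scrape_request(
    api_key: str,
    page_url: str,
    base_url: str = "https://api.scrapegraphai.com/v2",
) -> urllib.request.Request:
    # Hypothetical body: only the target URL. Nothing is sent here.
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/scrape",
        data=body,
        headers=build_headers(api_key),
        method="POST",
    )


if __name__ == "__main__":
    req = build_scrape_request("sk-test", "https://example.com")
    # urllib normalizes header names with str.capitalize(), so SGAI-APIKEY is
    # stored as "Sgai-apikey"; HTTP header names are case-insensitive.
    print(req.get_method(), req.full_url, req.header_items())
```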
```diff
@@ -608,7 +628,7 @@ def web_scraping_guide() -> str:
     """
     return """# ScapeGraph Web Scraping Guide (API v2)
 
-See [scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82) for the upstream SDK migration.
+See [scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84) for the upstream SDK migration.
 
 ## Core tools
 - **markdownify** — `POST /scrape` (markdown output)
```
```diff
@@ -627,7 +647,7 @@ def web_scraping_guide() -> str:
 1. Use **markdownify** or **scrape** before **smartscraper** when you only need readable text.
 2. Multi-page **AI** extraction: run **smartscraper** per URL, or use **monitor_create** on a schedule.
 3. Poll **smartcrawler_fetch_results** until the crawl finishes.
-4. Override API host with env **SCRAPEGRAPH_API_BASE_URL** if needed (default production v2 base URL).
+4. Override API host with env **SGAI_API_URL** if needed (default `https://api.scrapegraphai.com/v2`).
 """
 
 
```
````diff
@@ -677,7 +697,7 @@ def quick_start_examples() -> str:
 limit: 10
 ```
 
-Auth: `SGAI_API_KEY` or MCP `scrapegraphApiKey`. Optional: `SCRAPEGRAPH_API_BASE_URL`.
+Auth: `SGAI_API_KEY` or MCP `scrapegraphApiKey`. Optional: `SGAI_API_URL`, `SGAI_TIMEOUT_S`.
 """
 
 
````
```diff
@@ -691,9 +711,9 @@ def api_status() -> str:
     """
     return """# ScapeGraph API Status (MCP v2)
 
-- **MCP package version**: 2.0.0 (matches [scrapegraph-py#82](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/82) API surface)
-- **Default API base**: `https://api.scrapegraphai.com/api/v2` (override with `SCRAPEGRAPH_API_BASE_URL`)
-- **Auth headers**: `Authorization: Bearer`, `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0`
+- **MCP package version**: 2.0.0 (matches [scrapegraph-py#84](https://github.com/ScrapeGraphAI/scrapegraph-py/pull/84) API surface)
+- **Default API base**: `https://api.scrapegraphai.com/v2` (override with `SGAI_API_URL`)
+- **Auth headers**: `SGAI-APIKEY`, `X-SDK-Version: scrapegraph-mcp@2.0.0`
 
 ## Tools
 markdownify, scrape, smartscraper, searchscraper, smartcrawler_initiate, smartcrawler_fetch_results, crawl_stop, crawl_resume, generate_schema, credits, sgai_history, monitor_create, monitor_list, monitor_get, monitor_pause, monitor_resume, monitor_delete
```
