An automated scraper that pulls job listings and company data from Y Combinator's Work at a Startup platform. It avoids repeated logins by reusing a saved, authenticated browser session, and it stores everything in a local SQLite database (`jobs.db`) so that no listing is recorded twice.
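The deduplication idea can be sketched as follows. This is a minimal sketch, assuming `better-sqlite3` and a hypothetical `jobs` table whose primary key is `job_id`; the table, its columns, and the `jobId`/`company`/`title` fields on each job object are illustrative and may not match what `scraper.js` actually uses. `INSERT OR IGNORE` lets SQLite silently reject a repeated `job_id`, and the statement's `changes` count tells the caller whether a row was new.

```javascript
// Minimal dedup sketch. Assumes better-sqlite3 and a hypothetical `jobs`
// table keyed on job_id; the schema scraper.js really uses may differ.
const SCHEMA = `
  CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,   -- backend id read from the page payload
    company    TEXT NOT NULL,
    title      TEXT NOT NULL,
    scraped_at TEXT NOT NULL DEFAULT (datetime('now'))
  )`;

// Opens (or creates) the database file and makes sure the table exists.
function openJobsDb(path = "jobs.db") {
  const Database = require("better-sqlite3"); // required lazily so saveNewJobs can be tested alone
  const db = new Database(path);
  db.exec(SCHEMA);
  return db;
}

// Inserts each job unless its job_id is already stored.
// Returns the number of rows that were actually new.
function saveNewJobs(db, jobs) {
  const insert = db.prepare(
    "INSERT OR IGNORE INTO jobs (job_id, company, title) VALUES (?, ?, ?)"
  );
  let added = 0;
  for (const job of jobs) {
    // run() reports changes === 0 when the PRIMARY KEY conflict was ignored.
    added += insert.run(String(job.jobId), job.company, job.title).changes;
  }
  return added;
}
```

Wrapping the loop in `db.transaction(...)` would batch the inserts into a single write; it is left out here to keep the sketch short.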
- Deduplication: Uses `better-sqlite3` to store state, so the same job is never scraped twice.
- Robust extraction: Reads the hidden JSON payloads embedded in YC pages to get accurate backend `job_id` values.
- Filtered exports: Includes an export script (`export_radar_candidates.js`) that queries the SQLite database for intent-based hiring (e.g., GTM, DevRel, Growth, Content roles) and writes the results as a JSON payload for secondary research tools.
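Pulling ids out of an embedded JSON payload, rather than scraping rendered text, might look like the sketch below. It assumes, hypothetically, that the payload sits in a `<script type="application/json">` tag and that jobs appear as objects with an `id` field inside `jobs` arrays; the real markup and key names on Work at a Startup pages are not confirmed here, which is why `listKey` and `idKey` are parameters.

```javascript
// Hypothetical page shape: JSON inside <script type="application/json"> tags.
const SCRIPT_JSON_RE =
  /<script[^>]*type=["']application\/json["'][^>]*>([\s\S]*?)<\/script>/gi;

// Returns every embedded JSON block that parses cleanly.
function extractJsonPayloads(html) {
  const payloads = [];
  for (const match of html.matchAll(SCRIPT_JSON_RE)) {
    try {
      payloads.push(JSON.parse(match[1]));
    } catch {
      // Skip blocks that are not valid JSON instead of aborting the page.
    }
  }
  return payloads;
}

// Walks the payload and collects ids from objects inside any `listKey` array.
// "jobs" and "id" are assumed key names, not confirmed ones.
function collectJobIds(node, listKey = "jobs", idKey = "id", out = new Set()) {
  if (Array.isArray(node)) {
    for (const item of node) collectJobIds(item, listKey, idKey, out);
  } else if (node !== null && typeof node === "object") {
    for (const [key, value] of Object.entries(node)) {
      if (key === listKey && Array.isArray(value)) {
        for (const job of value) {
          if (job && job[idKey] != null) out.add(String(job[idKey]));
        }
      }
      collectJobIds(value, listKey, idKey, out); // keep searching nested objects
    }
  }
  return out;
}
```

In a Playwright script, the HTML string could come from `await page.content()`. Restricting the search to `jobs` arrays keeps company ids from being mistaken for job ids.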
- Clone the repository.
- Navigate to the `scripts/` directory, then install the dependencies and the Playwright browsers:

  ```bash
  cd scripts
  npm install
  npx playwright install
  ```

- Authenticate (first time only): run the following script and log in to YC in the browser window that opens. This creates a `state.json` file.

  ```bash
  node auth.js
  ```

- Run the scraper:

  ```bash
  node scraper.js
  ```

- Export targeted jobs:

  ```bash
  node export_radar_candidates.js
  ```

  This queries the database and produces `radar_candidates.json`, which contains the targeted companies and their matching roles.
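The intent-based filter behind the export step could work roughly like this. The keyword patterns (including expansions such as "go-to-market" and "developer advocate"), the row shape (`company`, `title`), and the output layout are all assumptions for illustration; the real `export_radar_candidates.js` may do its filtering inside the SQL query instead.

```javascript
// Assumed keyword patterns for each hiring intent. No `g` flag, so
// RegExp.prototype.test keeps no lastIndex state between calls.
const INTENT_PATTERNS = {
  gtm: /\b(gtm|go[- ]to[- ]market)\b/i,
  devrel: /\b(devrel|developer relations|developer advocate)\b/i,
  growth: /\bgrowth\b/i,
  content: /\bcontent\b/i,
};

// Returns the intent tags whose pattern matches a job title.
function intentTags(title) {
  return Object.entries(INTENT_PATTERNS)
    .filter(([, re]) => re.test(title))
    .map(([tag]) => tag);
}

// Groups matching rows by company, dropping rows with no intent match.
function buildRadarCandidates(rows) {
  const byCompany = new Map();
  for (const { company, title } of rows) {
    const tags = intentTags(title);
    if (tags.length === 0) continue;
    if (!byCompany.has(company)) byCompany.set(company, []);
    byCompany.get(company).push({ title, tags });
  }
  return [...byCompany].map(([company, roles]) => ({ company, roles }));
}
```

The word boundaries (`\b`) keep a title such as "Contentful Integrations Engineer" from being tagged as Content. The result could then be written with `fs.writeFileSync("radar_candidates.json", JSON.stringify(buildRadarCandidates(rows), null, 2))`.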
The `.gitignore` excludes `state.json` (your authentication cookies) and `jobs.db` (your local scrape history). Do not commit these files to a public repository.
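Because the cookies in `state.json` eventually expire, it can help to check them before a scrape starts. The sketch below assumes `auth.js` saves the file with Playwright's `context.storageState()`, whose `cookies` entries carry `expires` in Unix seconds (`-1` for session cookies); the `workatastartup.com` cookie domain and the helper names are assumptions. An unexpired cookie does not guarantee the server still accepts the session, so treat the check as a heuristic.

```javascript
// Assumed cookie domain for the logged-in session; adjust if needed.
const SESSION_DOMAIN = "workatastartup.com";

// True if the saved storage state holds at least one unexpired cookie
// for `domain` or one of its subdomains. Session cookies (-1) count as live.
function hasLiveCookies(state, domain, nowSeconds = Date.now() / 1000) {
  return (state.cookies || []).some((cookie) => {
    const host = cookie.domain.replace(/^\./, "");
    const matchesDomain = host === domain || host.endsWith("." + domain);
    const notExpired = cookie.expires === -1 || cookie.expires > nowSeconds;
    return matchesDomain && notExpired;
  });
}

// Reuses the saved session in a fresh browser context, failing early
// with a hint to re-run auth.js when the cookies look stale.
async function openAuthedContext(statePath = "state.json") {
  const fs = require("fs"); // required lazily so hasLiveCookies can be tested alone
  const state = JSON.parse(fs.readFileSync(statePath, "utf8"));
  if (!hasLiveCookies(state, SESSION_DOMAIN)) {
    throw new Error("Saved session looks expired; re-run `node auth.js`.");
  }
  const { chromium } = require("playwright");
  const browser = await chromium.launch();
  const context = await browser.newContext({ storageState: statePath });
  return { browser, context };
}
```

Callers should `await browser.close()` when the scrape finishes.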