Rottentomatoes Reviews Scraper collects structured critic and audience review data from Rotten Tomatoes pages, so you can analyze opinions at scale without manual copy-paste. It’s built for research workflows where consistent fields (scores, freshness, verification, spoilers, and more) matter. Use this Rotten Tomatoes reviews scraper to power dashboards, datasets, and media insights.
Created by Bitbash, built to showcase our approach to scraping and automation!
This project extracts user and critic reviews from Rotten Tomatoes movie and TV pages and returns normalized, analysis-ready records. It solves the common problem of having review data scattered across paginated pages with inconsistent presentation. It’s for developers, analysts, and researchers who need repeatable review collection for sentiment, scoring trends, and audience-vs-critic comparisons.
- Supports both critic and audience/verified audience review pages via URL inputs
- Captures structured review metadata (scores, freshness, verification, spoiler/profanity flags)
- Handles large pagination ranges with resilient retries and consistent output schema
- Produces clean datasets suitable for BI tools, ML pipelines, and monitoring jobs
- Works across multiple titles by accepting a list of review page URLs
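
When several start URLs cover the same title, the same review may be collected more than once (for example, if verified audience reviews also appear on an all-audience page). Below is a minimal sketch of merging batches by `reviewId`. The function name `mergeByReviewId` is hypothetical and is not part of the actor's code.

```javascript
// Hypothetical helper (not the actor's code): merge review arrays collected
// from several start URLs and drop duplicates by reviewId.
function mergeByReviewId(batches) {
  const byId = new Map();
  const withoutId = [];
  for (const batch of batches) {
    for (const review of batch) {
      if (!review.reviewId) {
        withoutId.push(review); // cannot be deduplicated safely, keep as-is
      } else if (!byId.has(review.reviewId)) {
        byId.set(review.reviewId, review); // first occurrence wins
      }
    }
  }
  return [...byId.values(), ...withoutId];
}
```

Records without a `reviewId` are kept rather than dropped, because they cannot be matched reliably.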

| Feature | Description |
|---|---|
| Multi-URL scraping | Process multiple Rotten Tomatoes review pages in a single run for batch collection. |
| Audience + critic coverage | Extract reviews for audience, verified audience, and critic sections based on the provided URLs. |
| Rich review metadata | Collects rating/score, freshness flags, verification, spoiler/profanity indicators, and IDs. |
| Stable pagination handling | Automatically traverses review pages to gather complete review sets reliably. |
| Retry & resilience | Built-in retries and defensive parsing to reduce failures from transient network or layout issues. |
| Clean, analysis-ready output | Produces consistent JSON records ideal for sentiment analysis and trend reporting. |
| Proxy-ready networking | Optional proxy support for safer large-scale runs and reduced rate-limit risk. |
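
The table describes pagination and resilience only at a high level. The sketch below shows one common way to traverse paginated results. It assumes a cursor-style API in which each page returns its records plus a token for the next page. `paginate` and the shape of the `fetchPage` callback are hypothetical, and this README does not document how Rotten Tomatoes actually exposes its next-page token.

```javascript
// Sketch of cursor-based pagination under an assumed API shape.
// `fetchPage` is a hypothetical callback: given a cursor (null for the first
// page) it resolves to { reviews: [...], nextCursor: string | null }.
async function* paginate(fetchPage, { maxPages = Infinity } = {}) {
  let cursor = null;
  let pages = 0;
  const seenCursors = new Set();
  while (pages < maxPages) {
    const { reviews = [], nextCursor = null } = await fetchPage(cursor);
    pages += 1;
    yield* reviews; // hand records to the caller one by one
    // Stop when there is no next page, or when a cursor repeats (loop guard).
    if (!nextCursor || seenCursors.has(nextCursor)) return;
    seenCursors.add(nextCursor);
    cursor = nextCursor;
  }
}
```

Because `paginate` is an async generator, callers can consume records with `for await` and write them as they arrive instead of buffering every page for a title. The repeated-cursor check guards against endless loops if a page keeps returning the same token.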

Each output record can include the following fields:

| Field Name | Field Description |
|---|---|
| rating | Normalized rating value (often numeric, may include halves depending on page type). |
| score | Computed score value when available (can match rating for audience entries). |
| originalScore | Raw score as shown on the page (string form for exact fidelity). |
| quote | The full review text/quote provided by the reviewer. |
| reviewId | Unique identifier for the review entry. |
| reviewUrl | Direct URL to the review when available. |
| creationDate | Display date for when the review was created/published. |
| isVerified | Indicates whether the review is verified (e.g., verified audience). |
| isSuperReviewer | Indicates elevated reviewer status where supported. |
| isTopCritic | Indicates top critic status for critic reviews where supported. |
| isFresh | Indicates “fresh” classification when provided by the page. |
| isRotten | Indicates “rotten” classification when provided by the page. |
| hasSpoilers | True if the review is flagged as containing spoilers. |
| hasProfanity | True if the review is flagged as containing profanity. |
| userDisplayName | Display name of the reviewer (audience). |
| userId | Reviewer identifier when present. |
| userRealm | Source realm/provider label when present (e.g., ticketing/account provider). |
| name | Reviewer name field (normalized convenience alias when present). |
| publicationName | Publication/outlet name for critic reviews when present. |
| criticName | Critic identity field when present. |
| avatarImageUrl | Reviewer/critic profile image URL when present. |
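
The table separates `originalScore` (the raw string) from `score` (a computed value), but it does not say how the conversion works. The sketch below shows one plausible approach. It assumes plain numbers are already on the 0 to 5 star scale used in the audience sample and that critic fractions such as `3/4` should be rescaled to 5. `parseOriginalScore` is a hypothetical name. Letter grades return `null` rather than a guessed mapping.

```javascript
// Hypothetical normalizer (assumptions, not the actor's logic):
// - plain numbers ("2.5") are taken as already on a 0-5 scale
// - fractions ("3/4", "8/10") are rescaled to 0-5 and rounded to 2 decimals
// - anything else (letter grades like "B+", empty values) returns null
function parseOriginalScore(raw) {
  if (typeof raw !== "string") return null;
  const text = raw.trim();
  const fraction = text.match(/^(\d+(?:\.\d+)?)\s*\/\s*(\d+(?:\.\d+)?)$/);
  if (fraction) {
    const numerator = Number(fraction[1]);
    const denominator = Number(fraction[2]);
    if (denominator === 0) return null; // avoid dividing by zero
    return Math.round((numerator / denominator) * 5 * 100) / 100;
  }
  if (/^\d+(?:\.\d+)?$/.test(text)) return Number(text);
  return null;
}
```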

Example output (two audience reviews):

```json
[
  {
    "rating": 5,
    "quote": "Fun, entertaining, and suspenseful! Worth every penny! I think this movie can be enjoyed by anyone not just motor sport enthusiasts. Go watch it - youll be glad you did!",
    "reviewId": "b595b8b1-32ad-49da-9856-65402a765869",
    "isVerified": true,
    "isSuperReviewer": false,
    "hasSpoilers": false,
    "hasProfanity": false,
    "score": 5,
    "creationDate": "Jun 29, 2025",
    "userDisplayName": "Donnie",
    "userRealm": "Fandango",
    "userId": "2A48F1D8-971E-4636-BB2F-367192ED6B1C",
    "originalScore": "5",
    "name": "Donnie"
  },
  {
    "rating": 2.5,
    "quote": "First half of the movie is very slow. Pacing is off. Movie could be about an hour shorter overall. Great cinematography and driving sequences but the story is stale and predictable with lots of contrived elements.",
    "reviewId": "4a85e68d-fd03-4821-9c22-b604b6102646",
    "isVerified": true,
    "isSuperReviewer": false,
    "hasSpoilers": false,
    "hasProfanity": false,
    "score": 2.5,
    "creationDate": "Jun 29, 2025",
    "userDisplayName": "Josh",
    "userRealm": "Fandango",
    "userId": "847768b1-452e-4d3a-aae3-95d308383088",
    "originalScore": "2.5",
    "name": "Josh"
  }
]
```
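
Each record carries the quote, a numeric score, and spoiler flags, so turning the output into a labeled sentiment dataset is mostly a filtering step. The sketch below is a hypothetical export helper that uses only field names from the sample above. Dropping spoiler-flagged reviews is a design choice, exposed here as an assumed `skipSpoilers` option.

```javascript
// Hypothetical export step: turn scraped records into compact rows for a
// sentiment dataset, using only fields that appear in the sample output.
function toSentimentRows(records, { skipSpoilers = true } = {}) {
  return records
    .filter((r) => typeof r.quote === "string" && r.quote.trim() !== "")
    .filter((r) => !(skipSpoilers && r.hasSpoilers)) // optional spoiler filter
    .filter((r) => typeof r.score === "number") // need a numeric label
    .map((r) => ({
      id: r.reviewId,
      text: r.quote.trim(),
      label: r.score, // star score as reported in the record
      verified: Boolean(r.isVerified),
    }));
}
```

Given the two sample records, it returns two rows labeled `5` and `2.5`.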

Directory structure:

```text
rigelbytes/rottentomatoes-reviews-scraper/
├── src/
│   ├── main.js
│   ├── router/
│   │   ├── routes.js
│   │   └── validators.js
│   ├── crawlers/
│   │   ├── reviewsCrawler.js
│   │   └── pagination.js
│   ├── extractors/
│   │   ├── parseReview.js
│   │   ├── parseCriticReview.js
│   │   ├── parseAudienceReview.js
│   │   └── normalizeFields.js
│   ├── utils/
│   │   ├── http.js
│   │   ├── retry.js
│   │   ├── logger.js
│   │   └── url.js
│   └── config/
│       ├── defaults.json
│       └── schema.json
├── data/
│   ├── input.example.json
│   └── output.sample.json
├── tests/
│   ├── parseReview.test.js
│   └── normalizeFields.test.js
├── .env.example
├── .gitignore
├── package.json
├── package-lock.json
└── README.md
```
- Media analysts use it to track audience vs critic sentiment over time, so they can spot polarization and momentum early.
- Data teams use it to build movie review datasets for NLP, so they can train sentiment and topic models with consistent fields.
- Marketing teams use it to monitor release reception across titles, so they can adjust messaging based on real review signals.
- Researchers use it to study spoiler/profanity prevalence and review behavior, so they can quantify qualitative patterns at scale.
- Product teams use it to compare freshness and scoring distributions, so they can create ranking and recommendation experiments.
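
For the audience-versus-critic use case, a simple comparison is the share of fresh critic reviews against the share of positive audience ratings. The sketch below assumes critic records carry `isFresh` and audience records carry a numeric `rating`. The 3.5-star cutoff is an assumption modeled on how Rotten Tomatoes describes its audience score, and it is exposed as a parameter. `sentimentGap` is a hypothetical helper.

```javascript
// Hypothetical comparison of critic vs audience sentiment.
// Critic side: share of records with isFresh === true.
// Audience side: share of ratings >= threshold (assumed 3.5 stars).
function sentimentGap(criticRecords, audienceRecords, threshold = 3.5) {
  // Fraction of items matching the predicate, or null when there is no data.
  const share = (items, predicate) =>
    items.length === 0 ? null : items.filter(predicate).length / items.length;

  const criticFresh = share(
    criticRecords.filter((r) => typeof r.isFresh === "boolean"),
    (r) => r.isFresh
  );
  const audiencePositive = share(
    audienceRecords.filter((r) => typeof r.rating === "number"),
    (r) => r.rating >= threshold
  );
  const gap =
    criticFresh === null || audiencePositive === null
      ? null
      : criticFresh - audiencePositive;
  return { criticFresh, audiencePositive, gap };
}
```

With four critic reviews, three of them fresh, plus the two sample audience ratings (5 and 2.5), it returns `criticFresh: 0.75`, `audiencePositive: 0.5`, and `gap: 0.25`. Tracking this gap per title over time is one way to watch for polarization.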
How do I choose the right URLs to scrape? Use full review-page URLs for the titles you want, including any query parameters that select the review type (for example, verified audience). The scraper follows pagination from those starting points and collects all accessible reviews for each URL.
Does it scrape both critic and audience reviews automatically? It scrapes what your input URLs point to. Provide critic review URLs for critic data and audience/verified audience URLs for audience data. You can include multiple URLs in one run to mix both.
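
The input URL determines which section is scraped, so it can help to check start URLs before a run. The classifier below is a hypothetical sketch. It assumes review pages end in `/reviews` and that audience pages are selected with a `type` query parameter such as `user` or `verified_audience`. These patterns are assumptions about the site, not a documented contract, so verify them against live pages.

```javascript
// Hypothetical start-URL classifier. The `type` values are assumed
// Rotten Tomatoes query parameters, not a documented contract.
const AUDIENCE_TYPES = new Set(["user", "verified_audience"]);

function classifyReviewUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on malformed input
  } catch {
    return { ok: false, reason: "not a valid URL" };
  }
  if (!/(^|\.)rottentomatoes\.com$/.test(url.hostname)) {
    return { ok: false, reason: "not a rottentomatoes.com URL" };
  }
  if (!/\/reviews\/?$/.test(url.pathname)) {
    return { ok: false, reason: "path does not end in /reviews" };
  }
  const type = url.searchParams.get("type");
  // Anything that is not a known audience type is treated as a critic page.
  const section = type && AUDIENCE_TYPES.has(type) ? "audience" : "critic";
  return { ok: true, section, type: type ?? "all" };
}
```

For example, `classifyReviewUrl("https://www.rottentomatoes.com/m/some_title/reviews?type=verified_audience")` returns `{ ok: true, section: "audience", type: "verified_audience" }`, where `some_title` is a placeholder slug.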
What should I do if I hit rate limits or partial loads? Enable proxy usage and reduce concurrency if your environment supports it. This scraper is designed with retries, but proxies and moderate request pacing improve stability on large runs.
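
The retry behavior is described only as "built-in retries". Exponential backoff with jitter is a common pattern for transient failures and rate limits, and a sketch of it appears below. `withRetry` and its options are illustrative names, not the actor's settings, and the real `src/utils/retry.js` may work differently.

```javascript
// Sketch of retry with exponential backoff and jitter.
// Option names are illustrative, not the actor's documented settings.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, { retries = 3, baseDelayMs = 500, maxDelayMs = 10000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      // Delay doubles per attempt (capped), plus up to 50% random jitter
      // so that concurrent requests do not retry in lockstep.
      const backoff = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      await sleep(backoff + Math.random() * backoff * 0.5);
    }
  }
  throw lastError;
}
```

Jitter spreads out retries from concurrent requests. This complements lowering concurrency and using proxies rather than replacing them.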
Why do some fields appear missing in certain records? Some properties are page-type specific (e.g., top critic, publication name). When the source page doesn’t expose a field for a review type, the scraper outputs only what is available while keeping the schema consistent.
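
The sample output omits keys that a page does not expose. If a downstream tool needs fixed columns (a CSV export, for example), one option is to fill absent fields with `null`. The sketch below does this using the field names from the table above. `withConsistentSchema` is hypothetical, and it also drops any keys that are not in the table.

```javascript
// Hypothetical normalizer: every record gets the same keys, with null for
// fields the source page did not expose. Key list copied from the field table.
const REVIEW_FIELDS = [
  "rating", "score", "originalScore", "quote", "reviewId", "reviewUrl",
  "creationDate", "isVerified", "isSuperReviewer", "isTopCritic", "isFresh",
  "isRotten", "hasSpoilers", "hasProfanity", "userDisplayName", "userId",
  "userRealm", "name", "publicationName", "criticName", "avatarImageUrl",
];

function withConsistentSchema(raw) {
  const record = {};
  for (const field of REVIEW_FIELDS) {
    // Preserve falsy values such as false or 0; only undefined becomes null.
    record[field] = raw[field] === undefined ? null : raw[field];
  }
  return record; // keys not listed in REVIEW_FIELDS are dropped
}
```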
- Primary Metric: Typical throughput of 250–600 reviews/minute depending on page complexity, pagination depth, and network conditions.
- Reliability Metric: 97–99% successful page fetch rate on stable networks with retries enabled; higher stability when using proxies for larger batches.
- Efficiency Metric: Low-memory streaming extraction that writes records incrementally, keeping runs stable even on long paginations.
- Quality Metric: High field completeness for audience reviews (IDs, quotes, scores, flags) with consistent normalization across records; critic-specific fields populate when present on critic pages.
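
The throughput range converts directly into a rough runtime estimate: divide the expected review count by the upper and lower rates. The helper below is a hypothetical back-of-envelope calculator that uses the quoted 250 to 600 reviews/minute. Actual rates depend on page complexity, pagination depth, and network conditions.

```javascript
// Back-of-envelope runtime estimate from the quoted throughput range.
// The default rates come from the figures above; real runs will vary.
function estimateRunMinutes(reviewCount, minRate = 250, maxRate = 600) {
  return {
    best: reviewCount / maxRate, // minutes at the fastest quoted rate
    worst: reviewCount / minRate, // minutes at the slowest quoted rate
  };
}
```

For 12,000 reviews, it estimates 20 minutes at best and 48 minutes at worst.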
