
techrunner496io/rottentomatoes-reviews-scraper


Rottentomatoes Reviews Scraper

Rottentomatoes Reviews Scraper collects structured critic and audience review data from Rotten Tomatoes pages, so you can analyze opinions at scale without manual copy-paste. It’s built for research workflows where consistent fields (scores, freshness, verification, spoilers, and more) matter. Use this Rotten Tomatoes reviews scraper to power dashboards, datasets, and media insights.



Created by Bitbash, built to showcase our approach to scraping and automation.
If you are looking for a Rotten Tomatoes reviews scraper, you’ve found your team. Let’s chat.

Introduction

This project extracts user and critic reviews from Rotten Tomatoes movie and TV pages and returns normalized, analysis-ready records. It solves the common problem of having review data scattered across paginated pages with inconsistent presentation. It’s for developers, analysts, and researchers who need repeatable review collection for sentiment, scoring trends, and audience-vs-critic comparisons.

Built for Review Intelligence

  • Supports both critic and audience/verified audience review pages via URL inputs
  • Captures structured review metadata (scores, freshness, verification, spoiler/profanity flags)
  • Handles large pagination ranges with resilient retries and a consistent output schema
  • Produces clean datasets suitable for BI tools, ML pipelines, and monitoring jobs
  • Works across multiple titles by accepting a list of review page URLs
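Because a run accepts a list of review-page URLs, it helps to trim, validate, and de-duplicate that list before any requests go out. The sketch below is illustrative only: the function name `prepareStartUrls` and the assumption that the input is a plain array of URL strings are hypothetical and not taken from this repository's `validators.js`.

```javascript
// Hypothetical pre-flight helper (not the repository's validators.js).
// Assumes the input is an array of URL strings pointing at Rotten Tomatoes.
function prepareStartUrls(rawUrls) {
  const seen = new Set();
  const valid = [];
  const rejected = [];

  for (const raw of rawUrls) {
    let url;
    try {
      url = new URL(String(raw).trim());
    } catch {
      rejected.push(raw); // not parseable as an absolute URL
      continue;
    }

    // Accept only rottentomatoes.com hosts (with or without "www.").
    const host = url.hostname.toLowerCase();
    if (host !== 'rottentomatoes.com' && !host.endsWith('.rottentomatoes.com')) {
      rejected.push(raw);
      continue;
    }

    url.hash = ''; // fragments never change what the server returns
    const key = url.toString();
    if (!seen.has(key)) {
      seen.add(key);
      valid.push(key);
    }
  }
  return { valid, rejected };
}
```

Query strings are deliberately kept, since the review type (for example, verified audience) is selected through them.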

Features

  • Multi-URL scraping: Process multiple Rotten Tomatoes review pages in a single run for batch collection.
  • Audience + critic coverage: Extract reviews for audience, verified audience, and critic sections based on the provided URLs.
  • Rich review metadata: Collects rating/score, freshness flags, verification, spoiler/profanity indicators, and IDs.
  • Stable pagination handling: Automatically traverses review pages to gather complete review sets reliably.
  • Retry & resilience: Built-in retries and defensive parsing reduce failures from transient network or layout issues.
  • Clean, analysis-ready output: Produces consistent JSON records ideal for sentiment analysis and trend reporting.
  • Proxy-ready networking: Optional proxy support for safer large-scale runs and reduced rate-limit risk.
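The "Retry & resilience" behavior is commonly implemented as exponential backoff with jitter. The following is a minimal sketch of that general technique, not the contents of this repository's `utils/retry.js`; the names `withRetry`, `backoffDelay`, and the option names are hypothetical.

```javascript
// Hypothetical retry helper illustrating exponential backoff with full jitter.
// Not the repository's utils/retry.js.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Upper bound on the delay before retry number `attempt` (0-based):
// baseDelayMs * 2^attempt, capped at maxDelayMs.
function backoffDelay(attempt, baseDelayMs, maxDelayMs) {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

async function withRetry(operation, options = {}) {
  const {
    retries = 3,            // extra attempts after the first failure
    baseDelayMs = 500,
    maxDelayMs = 10000,
    jitter = true,          // randomize delays so parallel retries don't align
    shouldRetry = () => true,
  } = options;

  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      // Give up on the last attempt or when the error is not worth retrying.
      if (attempt === retries || !shouldRetry(err)) break;
      const cap = backoffDelay(attempt, baseDelayMs, maxDelayMs);
      await sleep(jitter ? Math.random() * cap : cap);
    }
  }
  throw lastError;
}
```

In practice `shouldRetry` is where you would restrict retries to transient failures (for example network errors or HTTP 429/5xx responses) so that permanent errors such as a 404 fail fast.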

What Data This Scraper Extracts

  • rating: Normalized rating value (often numeric, may include halves depending on page type).
  • score: Computed score value when available (can match rating for audience entries).
  • originalScore: Raw score as shown on the page (string form for exact fidelity).
  • quote: The full review text/quote provided by the reviewer.
  • reviewId: Unique identifier for the review entry.
  • reviewUrl: Direct URL to the review when available.
  • creationDate: Display date for when the review was created/published.
  • isVerified: Indicates whether the review is verified (e.g., verified audience).
  • isSuperReviewer: Indicates elevated reviewer status where supported.
  • isTopCritic: Indicates top critic status for critic reviews where supported.
  • isFresh: Indicates “fresh” classification when provided by the page.
  • isRotten: Indicates “rotten” classification when provided by the page.
  • hasSpoilers: True if the review is flagged as containing spoilers.
  • hasProfanity: True if the review is flagged as containing profanity.
  • userDisplayName: Display name of the reviewer (audience reviews).
  • userId: Reviewer identifier when present.
  • userRealm: Source realm/provider label when present (e.g., ticketing/account provider).
  • name: Reviewer name field (normalized convenience alias when present).
  • publicationName: Publication/outlet name for critic reviews when present.
  • criticName: Critic identity field when present.
  • avatarImageUrl: Reviewer/critic profile image URL when present.
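The field list separates `originalScore` (the raw string shown on the page) from the numeric `rating` and `score`. The sketch below shows one way such a normalization could work. The rules are assumptions for illustration only (plain numbers pass through, fractions such as "3/4" are rescaled to a 5-point scale, letter grades stay unnormalized); they are not the documented behavior of this repository's `normalizeFields.js`, and `parseScore`/`scoreFields` are hypothetical names.

```javascript
// Hypothetical normalizer: converts a raw display score (originalScore) into a
// number on a 0-5 scale. Illustrative rules only, not normalizeFields.js.
function parseScore(originalScore) {
  if (originalScore == null) return null;
  const text = String(originalScore).trim();

  // Plain numbers such as "5" or "2.5" (audience star ratings).
  if (/^\d+(\.\d+)?$/.test(text)) return Number(text);

  // Fractions such as "3/4" or "8/10" (common in critic reviews), rescaled to 5.
  const fraction = text.match(/^(\d+(?:\.\d+)?)\s*\/\s*(\d+(?:\.\d+)?)$/);
  if (fraction) {
    const numerator = Number(fraction[1]);
    const denominator = Number(fraction[2]);
    if (denominator === 0) return null;
    // Multiply before dividing to keep common cases like 8/10 exact.
    return (numerator * 5) / denominator;
  }

  // Letter grades ("B+") and anything else are left unnormalized.
  return null;
}

// Builds the three score-related fields from one raw value.
function scoreFields(originalScore) {
  const value = parseScore(originalScore);
  return {
    rating: value,
    score: value,
    originalScore: originalScore == null ? null : String(originalScore),
  };
}
```

For audience entries this reproduces the pattern in the example output, where `rating`, `score`, and `originalScore` all carry the same value.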

Example Output

[
      {
            "rating": 5,
            "quote": "Fun, entertaining, and suspenseful! Worth every penny! I think this movie can be enjoyed by anyone not just motor sport enthusiasts. Go watch it - youll be glad you did!",
            "reviewId": "b595b8b1-32ad-49da-9856-65402a765869",
            "isVerified": true,
            "isSuperReviewer": false,
            "hasSpoilers": false,
            "hasProfanity": false,
            "score": 5,
            "creationDate": "Jun 29, 2025",
            "userDisplayName": "Donnie",
            "userRealm": "Fandango",
            "userId": "2A48F1D8-971E-4636-BB2F-367192ED6B1C",
            "originalScore": "5",
            "name": "Donnie"
      },
      {
            "rating": 2.5,
            "quote": "First half of the movie is very slow. Pacing is off. Movie could be about an hour shorter overall. Great cinematography and driving sequences but the story is stale and predictable with lots of contrived elements.",
            "reviewId": "4a85e68d-fd03-4821-9c22-b604b6102646",
            "isVerified": true,
            "isSuperReviewer": false,
            "hasSpoilers": false,
            "hasProfanity": false,
            "score": 2.5,
            "creationDate": "Jun 29, 2025",
            "userDisplayName": "Josh",
            "userRealm": "Fandango",
            "userId": "847768b1-452e-4d3a-aae3-95d308383088",
            "originalScore": "2.5",
            "name": "Josh"
      }
]
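Since the output is a plain JSON array, downstream analysis can start with nothing more than `JSON.parse`. This sketch summarizes a batch using only field names that appear in the example records; the function name `summarizeReviews` and the chosen metrics are illustrative, not part of the scraper.

```javascript
// Illustrative consumer of the output format above. Field names come from the
// example records; the summary metrics themselves are just examples.
function summarizeReviews(records) {
  const total = records.length;
  const rated = records.filter((r) => typeof r.rating === 'number');
  const ratingSum = rated.reduce((acc, r) => acc + r.rating, 0);
  const share = (predicate) =>
    total ? records.filter(predicate).length / total : null;

  return {
    total,
    averageRating: rated.length ? ratingSum / rated.length : null,
    verifiedShare: share((r) => r.isVerified === true),
    spoilerShare: share((r) => r.hasSpoilers === true),
  };
}

// Usage (path is a placeholder for wherever your run's output was saved):
// const records = JSON.parse(require('fs').readFileSync('reviews.json', 'utf8'));
// console.log(summarizeReviews(records));
```

Applied to the two sample records, the average rating works out to (5 + 2.5) / 2 = 3.75, with both reviews verified and neither flagged for spoilers.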

Directory Structure Tree

rottentomatoes-reviews-scraper/
├── src/
│   ├── main.js
│   ├── router/
│   │   ├── routes.js
│   │   └── validators.js
│   ├── crawlers/
│   │   ├── reviewsCrawler.js
│   │   └── pagination.js
│   ├── extractors/
│   │   ├── parseReview.js
│   │   ├── parseCriticReview.js
│   │   ├── parseAudienceReview.js
│   │   └── normalizeFields.js
│   ├── utils/
│   │   ├── http.js
│   │   ├── retry.js
│   │   ├── logger.js
│   │   └── url.js
│   └── config/
│       ├── defaults.json
│       └── schema.json
├── data/
│   ├── input.example.json
│   └── output.sample.json
├── tests/
│   ├── parseReview.test.js
│   └── normalizeFields.test.js
├── .env.example
├── .gitignore
├── package.json
├── package-lock.json
└── README.md

Use Cases

  • Media analysts use it to track audience vs critic sentiment over time, so they can spot polarization and momentum early.
  • Data teams use it to build movie review datasets for NLP, so they can train sentiment and topic models with consistent fields.
  • Marketing teams use it to monitor release reception across titles, so they can adjust messaging based on real review signals.
  • Researchers use it to study spoiler/profanity prevalence and review behavior, so they can quantify qualitative patterns at scale.
  • Product teams use it to compare freshness and scoring distributions, so they can create ranking and recommendation experiments.
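For the audience-vs-critic comparisons mentioned in these use cases, a simple starting point is to compute a Tomatometer-style share of fresh critic reviews and an audience-style share of ratings at 3.5 stars or higher. These are rough approximations built from scraped records and are not Rotten Tomatoes' official figures, which may apply eligibility or other rules; the function names are hypothetical.

```javascript
// Rough, illustrative approximations computed from scraped records;
// not Rotten Tomatoes' official scores.
function percent(part, whole) {
  return whole ? Math.round((part / whole) * 100) : null;
}

// Share of judged critic reviews flagged fresh (Tomatometer-style).
function criticFreshPercent(criticReviews) {
  const judged = criticReviews.filter((r) => r.isFresh === true || r.isRotten === true);
  return percent(judged.filter((r) => r.isFresh === true).length, judged.length);
}

// Share of audience ratings at or above the threshold (audience-score-style).
function audiencePositivePercent(audienceReviews, threshold = 3.5) {
  const rated = audienceReviews.filter((r) => typeof r.rating === 'number');
  return percent(rated.filter((r) => r.rating >= threshold).length, rated.length);
}

function sentimentGap(criticReviews, audienceReviews) {
  const critics = criticFreshPercent(criticReviews);
  const audience = audiencePositivePercent(audienceReviews);
  return {
    critics,
    audience,
    // Positive gap: critics warmer than audiences; negative: the reverse.
    gap: critics == null || audience == null ? null : critics - audience,
  };
}
```

Tracking `gap` per title over successive runs is one simple way to spot the polarization the first use case describes.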

FAQs

How do I choose the right URLs to scrape?
Use full review-page URLs for the titles you want, including any query parameters that select the review type (for example, verified audience). The scraper follows pagination from those starting points and collects all accessible reviews for each URL.
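Following pagination from a starting URL boils down to a loop that requests one page at a time until no "next" marker remains. The sketch below assumes a cursor-style API and an injected `fetchPage(url, cursor)` function resolving to `{ reviews, nextCursor }`; both the function and that response shape are hypothetical stand-ins, not this repository's `crawlers/pagination.js` or the site's actual API.

```javascript
// Hypothetical pagination loop (not the repository's crawlers/pagination.js).
// `fetchPage(url, cursor)` is an assumed, injected function resolving to
// { reviews: [...], nextCursor: string | null } for one page of results.
async function collectAllReviews(startUrl, fetchPage, { maxPages = 1000 } = {}) {
  const reviews = [];
  const seenIds = new Set();
  const seenCursors = new Set();
  let cursor = null;

  for (let page = 0; page < maxPages; page++) {
    const result = await fetchPage(startUrl, cursor);

    for (const review of result.reviews) {
      // De-duplicate in case consecutive pages overlap.
      if (!seenIds.has(review.reviewId)) {
        seenIds.add(review.reviewId);
        reviews.push(review);
      }
    }

    cursor = result.nextCursor;
    // Stop at the last page, or if a cursor repeats (guards against loops).
    if (!cursor || seenCursors.has(cursor)) break;
    seenCursors.add(cursor);
  }
  return reviews;
}
```

The `maxPages` cap and the repeated-cursor check are defensive choices: if the page layout changes and the "next" marker stops advancing, the loop ends instead of spinning forever.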

Does it scrape both critic and audience reviews automatically?
It scrapes what your input URLs point to. Provide critic review URLs for critic data and audience/verified audience URLs for audience data. You can include multiple URLs in one run to mix both.
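Because the review type comes from the URL, you can check what a batch will return before running it. This sketch classifies URLs using the public Rotten Tomatoes layout as we understand it at the time of writing (`/m/<slug>/reviews` for critics, `?type=top_critics`, `?type=user`, `?type=verified_audience`, and `/tv/...` equivalents). Those patterns are assumptions that may change with a site redesign, and `classifyReviewUrl` is a hypothetical name.

```javascript
// Hypothetical classifier based on the assumed public URL layout, e.g.
// /m/<slug>/reviews?type=verified_audience. Patterns may change over time.
function classifyReviewUrl(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return null;
  }

  // Movie review pages live under /m/, TV review pages under /tv/.
  if (!/^\/(m|tv)\/.+\/reviews\/?$/.test(url.pathname)) return null;

  switch (url.searchParams.get('type')) {
    case 'verified_audience':
      return 'verified_audience';
    case 'user':
      return 'audience';
    case 'top_critics':
      return 'top_critics';
    case null:
      return 'critic'; // no type parameter: the default critic listing
    default:
      return null; // unknown review type: skip rather than guess
  }
}
```

Grouping a URL list by this result makes it easy to confirm that a mixed batch really contains both critic and audience sources.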

What should I do if I hit rate limits or partial loads?
Enable proxy usage and reduce concurrency if your environment supports it. This scraper is designed with retries, but proxies and moderate request pacing improve stability on large runs.
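"Reduce concurrency" means capping how many requests are in flight at once. The following is a generic sketch of a concurrency-limited map in plain JavaScript, independent of whatever crawler the actor uses internally; `mapWithConcurrency` and its parameters are hypothetical.

```javascript
// Illustrative concurrency cap: runs `worker` over `items` with at most
// `limit` tasks in flight, preserving result order.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // shared index; safe because JavaScript runs one task at a time

  // Each runner keeps claiming the next unprocessed item until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

Lowering `limit` trades throughput for fewer simultaneous requests, which is usually the first lever to pull when a site starts rate-limiting.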

Why do some fields appear missing in certain records?
Some properties are page-type specific (e.g., top critic, publication name). When the source page doesn’t expose a field for a review type, the scraper outputs only the fields that are available, while field names stay consistent across records.
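Tools that expect a fixed set of columns (BI tools, CSV exports) can handle page-type-specific fields by projecting each record onto a column list and filling absent fields with `null`. The column list below is only an example drawn from the fields in "What Data This Scraper Extracts"; `toRow` and `COLUMNS` are hypothetical names.

```javascript
// Illustrative helper for fixed-column exports: fields the source page did not
// expose become null. The column list is an example; adjust to your needs.
const COLUMNS = [
  'reviewId', 'rating', 'originalScore', 'creationDate', 'isVerified',
  'isTopCritic', 'isFresh', 'publicationName', 'criticName', 'userDisplayName',
];

function toRow(record, columns = COLUMNS) {
  const row = {};
  for (const column of columns) {
    // `?? null` keeps real false/0 values but maps missing (undefined) fields to null.
    row[column] = record[column] ?? null;
  }
  return row;
}
```

Using `??` rather than `||` matters here: an explicit `isTopCritic: false` is meaningful data and should not be collapsed into "missing".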


Performance Benchmarks and Results

Primary Metric: Typical throughput of 250–600 reviews/minute depending on page complexity, pagination depth, and network conditions.

Reliability Metric: 97–99% successful page fetch rate on stable networks with retries enabled; higher stability when using proxies for larger batches.

Efficiency Metric: Low-memory streaming extraction that writes records incrementally, keeping runs stable even on long paginations.
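Writing records incrementally typically means appending each page's results to disk as soon as they are parsed, for example in JSON Lines format (one JSON object per line), so earlier pages never have to stay in memory. The sketch below illustrates that idea with Node's built-in `fs` module; the function names are hypothetical, and this is not how the repository necessarily persists data.

```javascript
// Illustrative incremental writer using JSON Lines (one JSON object per line).
// Function names are hypothetical, not part of this repository.
const fs = require('fs');

// Appends one batch (e.g. one page of reviews) and returns how many were written.
function appendJsonLines(filePath, records) {
  if (records.length === 0) return 0;
  const chunk = records.map((r) => JSON.stringify(r)).join('\n') + '\n';
  fs.appendFileSync(filePath, chunk, 'utf8');
  return records.length;
}

// Reads the whole file back; fine for small files and tests, whereas large
// files are better processed line by line with a stream.
function readJsonLines(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}
```

Because every line is a complete record, a run that stops partway still leaves a valid, partially filled dataset on disk.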

Quality Metric: High field completeness for audience reviews (IDs, quotes, scores, flags) with consistent normalization across records; critic-specific fields populate when present on critic pages.


Review 1

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

Review 2

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

Review 3

"Exceptional results, clear communication, and flawless delivery.
Bitbash nailed it."

Syed
Digital Strategist
★★★★★
