Source: shopify-review-scraper README (2026-05-14)

Repo: github.com/mikefutia/shopify-review-scraper · Stars: 2 · Language: JavaScript (71.6%) + CSS + HTML · License: MIT

Local Node + Playwright web app that takes a Shopify product URL and returns up to 250 customer reviews as CSV (to the user’s Downloads folder) or JSON (via REST API). Provider-aware for Okendo, Junip, and Judge.me; generic fallbacks for JSON-LD review schema, rendered review markup, and review-shaped network payloads. No API keys, no signup — runs on localhost:3000. Shipped by Mike Futia from the “Scale AI” Skool community as a free public tool. Small repo (9 commits, 2 stars at ingest), MIT-licensed, narrow purpose. Fits the agentic data-infrastructure shelf alongside ScrapeCreators and TinyFish — but at the opposite end of the scale axis: single-purpose self-hosted CLI app instead of paid multi-platform API.

Key Takeaways

  • Single-purpose, self-hosted, free. One job: pull customer reviews off a Shopify product page. Runs entirely local — clone, npm install, npx playwright install chromium, npm start, hit http://localhost:3000. No API keys, no signup, no rate limits other than what the target storefront enforces.
  • Four extraction paths in priority order. (1) Provider-specific adapters for Okendo, Junip, and Judge.me (the three review apps with the largest Shopify install base). (2) JSON-LD review schema extraction (works on stores using schema.org review markup for SEO). (3) Generic DOM scrape of visible review markup. (4) Network-response analysis for review-shaped JSON payloads emitted by other widgets. The provider-aware path is the cleanest; the generic paths cover the long tail.
  • Playwright is the load-bearing dependency. Most Shopify review widgets are client-rendered — review content is injected into the DOM after page load by a third-party JS bundle, so a curl-and-parse approach misses everything. The README states the Chromium browser is needed specifically “so it can read client-rendered review widgets.” This is the same architectural reason TinyFish runs its custom Chromium fleet — review widgets and many modern Shopify storefronts both 403 or empty-response on naive HTTP.
  • Capped at 250 reviews per request. maxReviews parameter defaults to 25, hard cap at 250. Stated rationale: “Some providers only expose reviews through paginated widgets” so larger pulls compound latency, and 250 keeps local runs predictable. For competitor-intel runs across many products, the cap is a feature — keeps you from accidentally hammering a third-party storefront from one IP.
  • Two interfaces: browser UI + REST API. The browser at localhost:3000 is the demo surface. The POST /api/scrape endpoint takes {url, maxReviews} and returns structured JSON with per-review {source, rating, title, body, author, date} plus run metadata (scrapedAt, durationMs, count, per-source breakdown). Pipe-friendly — fits naturally into Claude Code subagent loops, n8n nodes, or cron-driven backups.
  • CSV writes to the local Downloads folder, not browser-managed downloads. Deliberate choice: “CSV export is handled by the local server so it works even when browser downloads are blocked.” Useful in restricted-browser environments (managed Chrome profiles, kiosk mode) but also avoids the pop-up dialog churn for automated runs.
  • “Shopify does not provide reviews through a standard product-page API.” This is the load-bearing premise — and it’s correct as of 2026-05-14. Native Shopify Reviews app was deprecated; the platform’s first-party review surface is now scattered across third-party apps (Okendo, Junip, Judge.me, Yotpo, Loox, Stamped, etc.). There is no /products/{id}/reviews.json equivalent. Any review-intel workflow has to scrape, and a per-provider adapter pattern (what this repo implements) is the structural answer.
  • Responsible-use guidance lives in the README, not in the code. The author explicitly notes: research / internal analysis / your-own-stores use cases; respect terms, rate limits, robots.txt, and applicable laws. No --max-rate flag, no built-in delay between requests — the discipline is operator-side. Same posture as most open scraping tools (see ScrapeCreators' public-platform compliance posture).
  • Small repo, narrow target audience. 2 stars / 2 forks / 9 commits / 0 issues at ingest. Mike Futia ships this through the Scale AI Skool community (skool.com/scale-ai/about) — a paid AI-business community. The tool is more useful as a template for the “local Playwright app exposing a REST API for one scraping task” pattern than as a high-traffic public project — and that template is replicable for any platform where the data is in a client-rendered widget.
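The response shape described in the takeaways can be sanity-checked with a short Node sketch. The field names ({source, rating, title, body, author, date} plus scrapedAt / durationMs / count / sources) come from the README summary above; the sample payload itself is invented for illustration, and summarize is my naming, not the repo's:

```javascript
// Sketch: summarize a /api/scrape response (shape per the README; sample data is invented).
const sample = {
  url: "https://example-store.com/products/example-product",
  scrapedAt: "2026-05-14T12:00:00Z",
  durationMs: 4200,
  maxReviews: 25,
  count: 3,
  sources: { okendo: 3 },
  reviews: [
    { source: "okendo", rating: 5, title: "Great", body: "Love it", author: "A.", date: "2026-05-01" },
    { source: "okendo", rating: 4, title: "Good", body: "Solid", author: "B.", date: "2026-05-03" },
    { source: "okendo", rating: 2, title: "Meh", body: "Not for me", author: "C.", date: "2026-05-05" },
  ],
};

// Average rating and per-source counts — the two signals the competitor-intel
// workflows later in this note care about.
function summarize(resp) {
  const avg = resp.reviews.reduce((s, r) => s + r.rating, 0) / (resp.reviews.length || 1);
  const bySource = {};
  for (const r of resp.reviews) bySource[r.source] = (bySource[r.source] || 0) + 1;
  return { count: resp.count, avgRating: avg, bySource };
}

console.log(summarize(sample));
```

The per-source breakdown matters operationally: a result where sources shows only the generic fallbacks (rather than a named provider adapter) is a hint to double-check the extraction against the live page.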

Where this fits in the wiki

  • agents-agentic-systems/ is the right home — joins the data-infrastructure family: ScrapeCreators (paid, 20+ social platforms), TinyFish (paid, web Search/Fetch/Browser), Autobrowse (Browserbase harness). This is the free, self-hosted, single-platform-deep corner of the same category. Use it when you want a specific data slice from one platform without paying a vendor or learning a multi-API SDK.
  • Direct overlap with Apify (Cowork recipe) — Apify has Shopify-review actors too, paid per-actor-minute. mikefutia’s repo is the OSS alternative: zero marginal cost, you run the browser yourself, you own the data flow. Trade-off: no managed proxy rotation, no built-in retry/queue, no actor marketplace ecosystem.
  • Adjacent to Clawdbot’s competitive-intelligence surface. Clawdbot pulls structured business signals across 8 channels (GBP, reviews on review platforms, content, ads). For DSO and dental practice intel, customer-review sentiment from competitor practice storefronts has obvious overlap — but Clawdbot’s review surface is GBP-side, not Shopify-side. This repo would extend that intel into the e-commerce / product-review channel if a competitor sells products through a Shopify storefront.
  • Pairs with Meta Ads CLI for product-launch analysis. Meta Ads CLI surfaces what creative a competitor is running; this repo surfaces what their customers are saying about the underlying products. Pair them to triangulate which competitor SKUs are working (positive review velocity + active ad spend).
  • Pattern peer: vendor-direct tool calls. Another data point for the open cross-topic connection candidate on vendor-direct/single-platform-deep tools that bypass middleware. The shape here is self-hosted-direct instead of vendor-API-direct, but the operator economics are similar: one job, one tool, no abstraction layer.

Implementation

  • Tool/Service: Shopify Review Scraper by Mike Futia (Scale AI Skool community release).
  • Setup:
    git clone https://github.com/mikefutia/shopify-review-scraper.git
    cd shopify-review-scraper
    npm install
    npx playwright install chromium
    npm start    # serves http://localhost:3000
    Requires Node.js 20+ and npm. First playwright install pulls a Chromium binary (~150MB).
  • Cost: Free. MIT. Self-hosted — only cost is the laptop or VM running it, plus the storefronts you target if they meter or block.
  • Integration notes:
    • REST endpoint: POST /api/scrape with body {url, maxReviews}. Response shape includes {url, scrapedAt, durationMs, maxReviews, count, sources, reviews[]}.
    • Per-review fields: {source, rating, title, body, author, date}.
    • maxReviews caps at 250; defaults to 25.
    • Provider-aware adapters: Okendo, Junip, Judge.me. Other providers fall through to JSON-LD → DOM → network-response extractors in that order.
    • CSV output lands in the local Downloads folder, not via browser download dialog.
    • Run inside Claude Code as a long-lived background process and POST against it from a subagent for batch product audits.
    • No API key required — but operator-side rate limiting is your responsibility. Storefronts will 429 or shadow-block on aggressive concurrent runs.
  • Example call (from README):
    curl -X POST http://localhost:3000/api/scrape \
      -H "Content-Type: application/json" \
      -d '{"url":"https://example-store.com/products/example-product","maxReviews":25}'
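For batch runs against the endpoint, a thin sequential wrapper with an operator-side delay (which, per the notes above, the tool itself doesn't provide) might look like this. scrapeBatch, defaultPost, and the delay default are my naming and choices, not the repo's; Node 20's global fetch is assumed:

```javascript
// Sketch: sequential batch scrape with a fixed pause between requests.
// The post function is injectable so the loop can be tested without a live
// server; by default it targets the local /api/scrape endpoint documented above.
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

async function defaultPost(url, maxReviews) {
  const res = await fetch("http://localhost:3000/api/scrape", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ url, maxReviews }),
  });
  if (!res.ok) throw new Error(`scrape failed: ${res.status}`);
  return res.json();
}

async function scrapeBatch(urls, { maxReviews = 25, delayMs = 5000, post = defaultPost } = {}) {
  const out = [];
  for (const [i, url] of urls.entries()) {
    if (i > 0) await sleep(delayMs); // be polite to the target storefront
    out.push(await post(url, maxReviews));
  }
  return out;
}
```

Sequential-with-delay is deliberately conservative: all requests to one storefront come from one IP, so concurrency buys latency savings at the cost of looking like an attack.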

Open Questions

  • Provider coverage beyond the named three. Yotpo, Loox, Stamped, Reviews.io, Shopify Product Reviews (legacy) — would the generic JSON-LD / DOM / network-response fallbacks catch these reliably, or do they need provider-specific adapters too? The README says “may work partially” through generic extractors. For competitive intel on a large competitor catalog, the failure mode matters: a silent zero-reviews response from a generic-fallback path looks identical to “the product has no reviews.”
  • Pagination depth. Each provider exposes reviews differently — some embed the first N in the page payload, others lazy-load via XHR after scroll. The 250 cap implies the tool can paginate, but the README doesn’t document how deep it can go on each provider or whether some providers cap at, e.g., 50 visible reviews regardless of the parameter.
  • Anti-bot posture. Playwright is detectable, and major review providers already ship bot protection around their widgets (Okendo Stealth; Judge.me's Cloudflare-fronted endpoint on some plans). Does this repo do any anti-detection hardening — stealth plugin, fingerprint randomization, residential-proxy support — or is it bare Playwright? Bare Playwright is fine for own-store and friendly-domain use, but fragile for adversarial competitor intel. ^[inferred]
  • Rate-limiting hooks. No --delay or --rate flag visible in the README. For batch competitor sweeps, operator would need to wrap with sleep or external orchestration to avoid hammering one storefront. A future maintenance task for a fork.
  • Maintenance posture. 9 commits, 2 stars, 0 issues, 0 PRs is a tiny project. The data-extraction surface decays fast — Okendo / Junip / Judge.me ship widget updates often; provider-specific adapters need ongoing maintenance. Without active maintenance, the provider-aware path will silently fall through to generic extractors as DOM/network shapes drift.
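Until a fork adds rate-limiting hooks, one way to handle the 429s mentioned in the integration notes is to wrap each request in exponential backoff. This is a generic sketch, not something the repo ships; withBackoff and its defaults are illustrative:

```javascript
// Sketch: retry with exponential backoff on 429 responses.
// doRequest is injectable for testing; attempts/baseDelayMs are illustrative defaults.
const pause = (ms) => new Promise((r) => setTimeout(r, ms));

async function withBackoff(doRequest, { attempts = 4, baseDelayMs = 1000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    const res = await doRequest();
    if (res.status !== 429) return res; // success, or a non-retryable error
    if (i < attempts - 1) await pause(baseDelayMs * 2 ** i); // 1s, 2s, 4s, ...
  }
  throw new Error(`still rate-limited after ${attempts} attempts`);
}
```

Note this only handles honest 429s; shadow-blocking (empty-but-200 responses) would need the silent-zero-reviews check flagged in the provider-coverage question above.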

Try It

  1. Clone and run locally on your own Shopify storefront. git clone → npm install → npx playwright install chromium → npm start → paste your own product URL on localhost:3000. Verify the review count and source breakdown match what you see on the live page. This validates extraction quality against a known-good baseline before pointing it at competitors.
  2. Map your competitor SKUs to a CSV batch. Build a list of competitor Shopify product URLs (manually or via ScrapeCreators’s Shopify endpoints), then iterate the /api/scrape endpoint from a Claude Code subagent. Persist {url, count, sources, top_review_rating, top_review_body} to a Sheet for ad-creative-and-product-positioning triangulation.
  3. Pair with Meta Ads CLI for product-launch analysis. For any competitor product running active Meta ads, pull its review velocity from this tool. Reviews-per-week × average rating gives you the “is this SKU working?” signal that pairs naturally with the “is this creative working?” signal from Meta Ads CLI.
  4. Hold it behind a Hermes tool. Register POST /api/scrape as a Hermes custom tool so any agent run can pull product reviews on demand. Hermes already wraps long-lived local services this way — same pattern as ElevenLabs voice agents and Meta Ads CLI.
  5. Fork it for non-Shopify platforms. The architecture (local Express + Playwright + provider-aware extractors + REST API + CSV-to-Downloads) is generic. The same template works for BigCommerce reviews, Etsy listings, Amazon product Q&A, or any client-rendered widget on any platform. Use this repo as the scaffold, not the destination.
  6. Cross-check against the wiki’s strict-bar rule. Compliance and ToS posture varies by storefront — some Shopify storefronts explicitly forbid scraping, others tolerate research-grade traffic. Default to your own stores and friendly partners; treat competitor scraping the same way you would Reddit data — research use, no aggressive batch runs.
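The Sheet-persistence step above needs the reviews[] array flattened to rows. A minimal sketch, using the per-review field names from the README's response shape (reviewsToCsv is my naming; the quoting follows RFC 4180-style escaping):

```javascript
// Sketch: flatten a /api/scrape reviews[] array into CSV rows for Sheet persistence.
// Field order follows the README's per-review shape.
const FIELDS = ["source", "rating", "title", "body", "author", "date"];

// Quote a value only when it contains a comma, quote, or newline (RFC 4180 style).
const esc = (v) => {
  const s = String(v ?? "");
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
};

function reviewsToCsv(reviews) {
  const header = FIELDS.join(",");
  const rows = reviews.map((r) => FIELDS.map((f) => esc(r[f])).join(","));
  return [header, ...rows].join("\n");
}
```

This duplicates what the tool's own CSV-to-Downloads path does, but keeping it in your pipeline means the JSON API stays the single integration point for agent-driven runs.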

Related

  • ScrapeCreators — paid social/web scraping API; broader platform coverage, less per-platform depth
  • TinyFish — paid web infrastructure (Search/Fetch/Browser/Agent), same architectural problem
  • Autobrowse — Browserbase’s self-improving browser-agent harness; relevant for adapting this scaffold to harder targets
  • Apify (Cowork recipe) — paid actor marketplace alternative for Shopify review scraping
  • Meta Ads CLI — pairs with this tool for product+ad triangulation
  • Clawdbot — competitive intelligence across 8 channels, GBP review counterpart
  • Hermes Agent — host this as a registered tool for agent-callable review pulls
  • AI Marketing
  • Agents & Agentic Systems