SEO Patterns Learned

Source: wiki synthesis: gsc-autonomous-seo, seo-audit-skill, clawdbot-competitive-intel, blog-agent-worker

Accumulated SEO and content patterns extracted from building four interconnected projects at WEO Marketly. These are not theoretical best practices — they are lessons learned from production systems processing real dental marketing data. The patterns compound: each project’s lessons inform the others, creating a feedback loop of operational knowledge.

From GSC Autonomous SEO

  • Per-query tracking beats aggregate metrics — Hunter’s #1 requirement. A page ranking #3 for one query and #30 for another can look healthy in page-level averages while hiding a large opportunity on the weaker query. Per-query scoring surfaces it
  • Surgical enhancement limits — Never rewrite more than 40% of a page in a single pass. The 40% figure is a self-imposed threshold, not a documented Google limit; the concern is that a larger rewrite may be treated as a new page, resetting its authority signals
  • Cooldowns tied to confirmed crawl — 7 days for schema, 14 days for meta/title, 60 days for content body. The timer starts AFTER Google crawls, not after you publish. Without this, you stack changes Google hasn’t even seen yet
  • Natural re-crawling via sitemaps — Hunter explicitly rejected the Indexing API as too aggressive (Google also documents that API as intended only for job-posting and livestream pages). Sitemap submission lets Google discover changes at its own pace, which produces more natural crawl patterns
  • Per-query state in PostgreSQL — Each query gets its own row with position history, enhancement history, and cooldown state. This is transactional data that needs ACID guarantees, not a content store
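A minimal sketch of the crawl-anchored cooldown rule, assuming a per-query record shaped roughly like the PostgreSQL row described above. The field names (`lastChange`, `lastCrawledAt`) and the functions `cooldownEndsAt` and `isEligible` are hypothetical; only the 7/14/60-day windows come from these notes. It also assumes the most recent change's type sets the cooldown that blocks any further change to that query's page.

```typescript
// Change types and their cooldown windows in days, per the notes above.
type ChangeType = "schema" | "meta" | "content";

const COOLDOWN_DAYS: Record<ChangeType, number> = {
  schema: 7,
  meta: 14, // meta description and title
  content: 60, // content body
};

// Hypothetical shape of one per-query row; the real table also keeps
// position history and enhancement history.
interface QueryState {
  query: string;
  pageUrl: string;
  lastChange?: { type: ChangeType; publishedAt: Date };
  lastCrawledAt?: Date; // when Google last crawled the page, however the pipeline learns it
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns when the cooldown ends, or null if the timer has not started
// because Google has not crawled the page since the last change.
function cooldownEndsAt(state: QueryState): Date | null {
  const change = state.lastChange;
  if (!change) return new Date(0); // never changed: no cooldown
  const crawled = state.lastCrawledAt;
  if (!crawled || crawled.getTime() < change.publishedAt.getTime()) {
    return null; // Google hasn't seen the change yet, so don't stack another
  }
  return new Date(crawled.getTime() + COOLDOWN_DAYS[change.type] * DAY_MS);
}

function isEligible(state: QueryState, now: Date): boolean {
  const ends = cooldownEndsAt(state);
  return ends !== null && now.getTime() >= ends.getTime();
}
```

Returning null instead of a date makes the "published but not yet crawled" case explicit, which is exactly the stacking failure the rule exists to prevent.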

From Blog-Agent-Worker (Pulse)

  • Multi-agent sequential beats single-pass — Running Research, Write, SEO, and Edit as separate agent passes consistently outperforms a single prompt asking for everything. The research phase alone dramatically improves factual accuracy by front-loading source discovery
  • Answer Capsule + FAQ required for GEO (generative engine optimization) — AI Overviews (Google’s generative search results) tend to cite content that puts a direct 2-3 sentence answer right after the H1 and includes an FAQ section with schema markup. Pages without these are less likely to earn AI Overview citations
  • Fact density standard — ~5 statistics per 1500 words, with citations from 2024-2026 sources. Below this, content reads as opinion. Above this, it becomes an unreadable data dump
  • Content health is multi-source — Freshness + CWV (Core Web Vitals) + GA4 + GSC + links + GEO readiness. No single metric tells the full story. Content can have great engagement (GA4) but terrible load speed (CWV), or rank well (GSC) but contain decaying facts (freshness)
  • Quality gates before human checkpoints — The 117-point automated validation runs before content reaches human review. This catches mechanical issues (missing meta, broken links, schema errors) so human reviewers focus on voice and strategy
  • Model allocation saves 60%+ on API costs — Opus for creative/orchestration, Sonnet for analytical/technical, Haiku for high-volume derivatives. Using Opus for everything is wasteful; using Haiku for everything produces low-quality output
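For the GEO requirement, a sketch of generating schema.org FAQPage JSON-LD from question/answer pairs, plus a rough answer-capsule length check. The `FaqItem` type and the `buildFaqJsonLd`, `toScriptTag`, and `isAnswerCapsule` helpers are hypothetical; the `@type` values (`FAQPage`, `Question`, `Answer`) and the `mainEntity` / `acceptedAnswer` properties follow schema.org.

```typescript
interface FaqItem {
  question: string;
  answer: string;
}

// Builds a schema.org FAQPage object from question/answer pairs.
function buildFaqJsonLd(items: FaqItem[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question.trim(),
      acceptedAnswer: {
        "@type": "Answer",
        text: item.answer.trim(),
      },
    })),
  };
}

// Serializes the JSON-LD into a script tag. "<" is escaped as \u003c
// (still valid JSON) so answer text can't close the tag early.
function toScriptTag(items: FaqItem[]): string {
  const json = JSON.stringify(buildFaqJsonLd(items)).replace(/</g, "\\u003c");
  return `<script type="application/ld+json">${json}</script>`;
}

// Rough heuristic for the 2-3 sentence answer capsule after the H1.
// Splitting on ". " miscounts abbreviations such as "Dr. Smith".
function isAnswerCapsule(text: string): boolean {
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.trim().length > 0);
  return sentences.length >= 2 && sentences.length <= 3;
}
```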

From Clawdbot / Reports

  • Dual rendering pipelines must stay in sync — PDF (HTML + Playwright) and DOCX (docx library) are completely separate code paths. Every new data source requires updating both. This is the most common source of report bugs
  • Own-company tracking separately from competitors — WEO Marketly (covering both legacy WEO Media and Marketly Digital domains) is flagged isOwn: true in all modules. Without this separation, your own activity skews competitor rankings
  • Apify APIs are flaky — Every scraper integration needs a “data unavailable” fallback that renders gracefully in reports. Silent omission causes confusion (“Did we lose LinkedIn data, or do they not have LinkedIn?”)
  • 7-channel competitive position model — Content 19%, Social 19%, SEO 16%, YouTube 15%, Blog 11%, Ads 11%, GBP 9%. Social gets a breadth bonus (up to 1.15x for 4 platforms). These weights reflect dental marketing specifically — different industries would weight differently
  • Google Docs delivery by default — Eliminates the download-upload friction. The --local flag exists for development testing but production runs go straight to Google Docs
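A sketch of the 7-channel position score. The weights are the ones listed above; the rest is assumed: per-channel scores on a 0-100 scale, a linear breadth bonus (1.0 for one platform, +0.05 per additional platform, capped at 1.15 for four or more), a cap of 100 on the boosted social score, and an `isOwn` flag that keeps WEO Marketly out of the competitor ranking. `Company`, `breadthBonus`, `positionScore`, and `rankCompetitors` are hypothetical names.

```typescript
type Channel = "content" | "social" | "seo" | "youtube" | "blog" | "ads" | "gbp";

// Weights in percent, from the notes; they sum to 100.
const WEIGHTS: Record<Channel, number> = {
  content: 19, social: 19, seo: 16, youtube: 15, blog: 11, ads: 11, gbp: 9,
};

interface Company {
  name: string;
  isOwn: boolean;
  socialPlatforms: number; // number of social platforms the company is active on
  scores: Record<Channel, number>; // assumed 0-100 per channel
}

// Assumed linear bonus: 1 platform -> 1.0, 4 or more -> 1.15.
function breadthBonus(platforms: number): number {
  const extra = Math.max(0, Math.min(platforms, 4) - 1);
  return 1 + 0.05 * extra;
}

function positionScore(c: Company): number {
  let total = 0;
  for (const ch of Object.keys(WEIGHTS) as Channel[]) {
    let s = c.scores[ch];
    if (ch === "social") s = Math.min(100, s * breadthBonus(c.socialPlatforms));
    total += (WEIGHTS[ch] / 100) * s;
  }
  return total;
}

// Own-company rows are excluded so they can't skew competitor positions.
function rankCompetitors(companies: Company[]): { name: string; score: number }[] {
  return companies
    .filter((c) => !c.isOwn)
    .map((c) => ({ name: c.name, score: positionScore(c) }))
    .sort((a, b) => b.score - a.score);
}
```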

From SEOmator Audit

  • Self-registering rule pattern scales cleanly — Each rule calls defineRule(), which stores it in a global Map via registerRule(). There is no central registry file to maintain, so adding rule #252 is identical to adding rule #1
  • Category weights must sum to exactly 100 — Validated programmatically, not just documented. This catches bugs where adding a new category changes the scoring model without anyone noticing
  • Structured outputs for simple rules, regex fallbacks for dynamic objects — Anthropic’s structured output API requires additionalProperties: false on ALL nested objects, so dynamic objects whose keys aren’t known in advance (FAQ or schema.org data) can’t be described. Regex extraction of JSON from a plain-text response is the pragmatic fallback
  • Dual delivery (CLI + Electron) from shared core — The same audit engine powers both, but better-sqlite3 is a native module and needs a rebuild for each target runtime. The package.json serves a dual purpose (main for Electron, exports for CLI)
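A minimal sketch of the self-registering rule pattern together with the programmatic weight check. `defineRule()` and `registerRule()` are named in the notes; their signatures here, along with the `Rule` type, the example rule, the category names and weights, and `validateCategoryWeights`, are assumptions for illustration.

```typescript
interface AuditContext {
  html: string;
}

interface RuleResult {
  passed: boolean;
  message?: string;
}

interface Rule {
  id: string;
  category: string;
  check: (ctx: AuditContext) => RuleResult;
}

// Global registry: every defineRule() call lands here, so there is no
// central list of rules to keep in sync.
const registry = new Map<string, Rule>();

function registerRule(rule: Rule): void {
  if (registry.has(rule.id)) throw new Error(`Duplicate rule id: ${rule.id}`);
  registry.set(rule.id, rule);
}

function defineRule(rule: Rule): Rule {
  registerRule(rule);
  return rule;
}

// Example rule. In a real project each rule lives in its own module and
// registers itself when that module is imported.
defineRule({
  id: "meta-description-present",
  category: "meta",
  check: (ctx) => ({ passed: /<meta\s+name=["']description["']/i.test(ctx.html) }),
});

// Hypothetical category weights; integers keep the equality check exact.
const CATEGORY_WEIGHTS: Record<string, number> = {
  meta: 30,
  content: 40,
  technical: 30,
};

function validateCategoryWeights(weights: Record<string, number>): void {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  if (total !== 100) throw new Error(`Category weights sum to ${total}, expected 100`);
  // Every registered rule must belong to a weighted category.
  for (const rule of registry.values()) {
    if (!(rule.category in weights)) {
      throw new Error(`Rule ${rule.id} uses unweighted category ${rule.category}`);
    }
  }
}

validateCategoryWeights(CATEGORY_WEIGHTS); // fail fast at startup
```

Running the check at startup means a new category that shifts the scoring model fails loudly instead of silently changing scores.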

Cross-Project Meta-Patterns

Claude Model Selection

  • Opus — Orchestration, creative writing, complex reasoning. Worth the cost when quality directly impacts output
  • Sonnet — Analysis, SEO optimization, editing, research. Best cost/quality ratio for most tasks
  • Haiku — High-volume derivatives (social posts, email variants). Fast and cheap enough for batch operations
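One way to encode this allocation is a task-to-tier map, so agents ask for a tier instead of hard-coding model IDs. The task names, the `Tier` / `TaskType` types, and `modelFor` are hypothetical. Only `claude-sonnet-4-5` appears in these notes; the Opus and Haiku aliases below are assumptions to check against Anthropic's current model list.

```typescript
type Tier = "opus" | "sonnet" | "haiku";

// Aliases rather than date-suffixed IDs. The opus and haiku values are
// assumed and should be verified before use.
const MODEL_ALIASES: Record<Tier, string> = {
  opus: "claude-opus-4-1",
  sonnet: "claude-sonnet-4-5",
  haiku: "claude-haiku-4-5",
};

// Hypothetical task catalogue, mapped to tiers per the guidance above.
type TaskType =
  | "orchestrate"
  | "draftArticle"
  | "research"
  | "seoOptimize"
  | "edit"
  | "socialPost"
  | "emailVariant";

const TASK_TIERS: Record<TaskType, Tier> = {
  orchestrate: "opus",
  draftArticle: "opus",
  research: "sonnet",
  seoOptimize: "sonnet",
  edit: "sonnet",
  socialPost: "haiku",
  emailVariant: "haiku",
};

function modelFor(task: TaskType): string {
  return MODEL_ALIASES[TASK_TIERS[task]];
}
```

Because `TASK_TIERS` is typed as `Record<TaskType, Tier>`, adding a task without assigning a tier fails to compile, and moving a task to a cheaper tier is a one-line change.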

Batch Operations

  • Parallel sub-agents for 100+ item operations, never single-threaded
  • Each agent handles one atomic unit of work (one URL, one query, one content piece)
  • Coordinate results after all agents complete, not during
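A sketch of that batch shape: one async task per atomic item, a bounded worker pool so 100+ items don't all start at once, and results collected only after every worker finishes. The `runBatch` name, the `Outcome` type, and the default concurrency of 10 are assumptions; `processItem` stands in for whatever a sub-agent does with one URL, query, or content piece.

```typescript
type Outcome<T, R> =
  | { item: T; ok: true; value: R }
  | { item: T; ok: false; error: string };

async function runBatch<T, R>(
  items: T[],
  processItem: (item: T) => Promise<R>,
  concurrency = 10,
): Promise<Outcome<T, R>[]> {
  const results = new Array<Outcome<T, R>>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  // Claiming is synchronous, so two workers never take the same item,
  // and no worker reads another worker's results.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      const item = items[i];
      try {
        results[i] = { item, ok: true, value: await processItem(item) };
      } catch (err) {
        results[i] = { item, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const poolSize = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results; // coordinate here, after every item has finished
}
```

Failures are captured per item rather than rejecting the whole batch, so one bad URL doesn't discard the other 99 results.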

Railway Deployment

  • Edge caching requires cache-buster params when verifying deploys
  • Wait 60-90s after deploy before verifying — stale CDN responses cause false failures
  • Always rebuild frontend before pushing (CLAUDE.md deploy checklist)
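A sketch of the verification step, assuming the deployed app exposes some build marker (here a hypothetical `/version` endpoint returning the commit SHA). The `_cb` parameter name, the 75-second initial wait, and the retry settings are illustrative choices within the 60-90s guidance, not Railway settings. The fetcher is injected so Node's global `fetch` can be passed in production and a stub in tests.

```typescript
// Append a unique query param so the edge cache can't serve a stale copy.
function withCacheBuster(url: string, now: number = Date.now()): string {
  const u = new URL(url);
  u.searchParams.set("_cb", String(now));
  return u.toString();
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

type Fetcher = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>;

// Waits before the first check, then polls until the served build matches.
async function verifyDeploy(
  baseUrl: string,
  expectedSha: string,
  fetcher: Fetcher,
  { initialWaitMs = 75_000, retries = 5, retryDelayMs = 15_000 } = {},
): Promise<boolean> {
  const root = baseUrl.replace(/\/+$/, ""); // avoid "//version"
  await sleep(initialWaitMs);
  for (let attempt = 0; attempt <= retries; attempt++) {
    const res = await fetcher(withCacheBuster(`${root}/version`));
    if (res.ok && (await res.text()).trim() === expectedSha) return true;
    if (attempt < retries) await sleep(retryDelayMs);
  }
  return false;
}
```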

Persistence Choices

  • PostgreSQL — Transactional state (query tracking, enhancement history, cooldowns). Needs ACID
  • SQLite — Content storage, generation history, audit snapshots. Embedded, no server needed
  • Filesystem — Bot memory, website baselines, place ID caches. Simple key-value patterns

Anthropic API Gotchas

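  • Use model aliases (claude-sonnet-4-5) rather than date-suffixed IDs: a date-suffixed ID works only if it matches a published snapshot exactly, and a guessed or mistyped one returns 404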
  • additionalProperties: false required on ALL nested objects for structured outputs, not just root
  • Schema format: { type: 'json_schema', schema: { type: 'object', ... } }, not the OpenAI-style wrapper ({ type: 'json_schema', json_schema: { name, schema } })
  • output_config parameter for structured outputs in the JS SDK: requestParams.output_config = { format: schema }
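Two helpers for these gotchas, as a sketch. `strictify` walks a JSON Schema and sets `additionalProperties: false` on every object node, including nested properties and array items; `buildOutputConfig` wraps the result in the `{ format: { type: 'json_schema', schema } }` shape these notes describe. `extractJson` is the regex fallback for dynamic objects: it takes the first fenced block, or else the outermost brace-delimited span, from a plain-text response. All three names are hypothetical, and the request shape mirrors the notes above rather than a verified SDK signature.

```typescript
type JsonSchema = { [key: string]: unknown };

// Recursively add additionalProperties: false to every object-typed node.
// Combinators such as anyOf and $defs are not handled in this sketch.
function strictify(schema: JsonSchema): JsonSchema {
  const out: JsonSchema = { ...schema };
  if (out.type === "object") {
    out.additionalProperties = false;
    const props = out.properties as Record<string, JsonSchema> | undefined;
    if (props) {
      out.properties = Object.fromEntries(
        Object.entries(props).map(([key, value]) => [key, strictify(value)]),
      );
    }
  }
  if (out.items && typeof out.items === "object" && !Array.isArray(out.items)) {
    out.items = strictify(out.items as JsonSchema);
  }
  return out;
}

// Value for requestParams.output_config, in the shape described above.
function buildOutputConfig(schema: JsonSchema) {
  return { format: { type: "json_schema", schema: strictify(schema) } };
}

// Regex fallback for responses that can't use structured outputs.
function extractJson(text: string): unknown {
  const fenced = text.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  // Without a fence, take everything from the first "{" to the last "}".
  const candidate = fenced ? fenced[1] : text.match(/\{[\s\S]*\}/)?.[0];
  if (!candidate) throw new Error("No JSON found in response");
  return JSON.parse(candidate);
}
```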

Key Takeaways

  • Per-query SEO tracking, multi-agent content generation, and multi-source health scoring all share the same insight: granular beats aggregate
  • The most fragile pattern across all projects is dual rendering/delivery paths — always verify both when adding features
  • Model selection is a cost optimization lever: wrong model choice can 5x your API bill without improving quality
  • Cooldowns, quality gates, and compliance rules are not bureaucracy — they prevent the specific failure modes each system has experienced
  • PostgreSQL for state, SQLite for content, filesystem for caches is a pattern that works across all four projects

Try It

  1. Review each project’s specific patterns before starting related work — they encode hard-won lessons
  2. When adding a new data source to Clawdbot, verify it appears in BOTH rendering paths before merging
  3. When choosing a Claude model for a new agent, match the pattern: Opus for creative, Sonnet for analytical, Haiku for derivatives
  4. When building any new pipeline, start with the persistence choice: does the data need ACID (PostgreSQL), embedded storage (SQLite), or simple caching (filesystem)?
  5. Call the Anthropic API with the correct schema format from day one; migrating later is painful
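For step 2, one way to turn a missed rendering path into a compile error rather than a report bug is to key both renderers by the same data-source union, and to make "unavailable" an explicit state every renderer must handle. The source names, the `SourceResult` type, `renderOr`, `renderReport`, and the string-returning renderers are hypothetical stand-ins for the real PDF (HTML + Playwright) and DOCX code paths.

```typescript
// Adding a source here forces both renderer maps below to handle it.
type SourceKey = "linkedin" | "youtube" | "gbp";

type SourceResult<T> =
  | { status: "ok"; data: T }
  | { status: "unavailable"; reason: string }; // e.g. a failed Apify run

type Renderer = (result: SourceResult<unknown>) => string;

const escapeHtml = (s: string) =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");

// Shared fallback: unavailable data is stated, never silently omitted.
function renderOr(r: SourceResult<unknown>, label: string, wrap: (s: string) => string): string {
  return r.status === "ok"
    ? wrap(`${label}: ${JSON.stringify(r.data)}`)
    : wrap(`${label}: data unavailable (${r.reason})`);
}

// Stand-in for the HTML that Playwright turns into the PDF.
const pdfRenderers: Record<SourceKey, Renderer> = {
  linkedin: (r) => renderOr(r, "LinkedIn", (s) => `<p>${escapeHtml(s)}</p>`),
  youtube: (r) => renderOr(r, "YouTube", (s) => `<p>${escapeHtml(s)}</p>`),
  gbp: (r) => renderOr(r, "Google Business Profile", (s) => `<p>${escapeHtml(s)}</p>`),
};

// Stand-in for paragraphs built with the docx library.
const docxRenderers: Record<SourceKey, Renderer> = {
  linkedin: (r) => renderOr(r, "LinkedIn", (s) => s),
  youtube: (r) => renderOr(r, "YouTube", (s) => s),
  gbp: (r) => renderOr(r, "Google Business Profile", (s) => s),
};

function renderReport(
  results: Record<SourceKey, SourceResult<unknown>>,
  renderers: Record<SourceKey, Renderer>,
): string[] {
  return (Object.keys(renderers) as SourceKey[]).map((key) => renderers[key](results[key]));
}
```

Leaving `youtube` out of either map is a type error (the property is missing from the `Record`), which is the check step 2 asks for, and the explicit `unavailable` state gives readers "data unavailable" instead of a section that quietly disappears.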