SEO Patterns Learned
Source: wiki synthesis: gsc-autonomous-seo, seo-audit-skill, clawdbot-competitive-intel, blog-agent-worker
Accumulated SEO and content patterns extracted from building four interconnected projects at WEO Marketly. These are not theoretical best practices — they are lessons learned from production systems processing real dental marketing data. The patterns compound: each project’s lessons inform the others, creating a feedback loop of operational knowledge.
From GSC Autonomous SEO
- Per-query tracking beats aggregate metrics — Hunter’s #1 requirement. A page ranking #3 for one query and #30 for another looks fine at the page level but has massive hidden opportunity. Per-query scoring reveals this
- Surgical enhancement limits — Never rewrite more than 40% of a page in a single pass. Beyond that threshold, Google’s content-change detection may treat it as a new page, resetting authority signals
- Cooldowns tied to confirmed crawl — 7 days for schema, 14 days for meta/title, 60 days for content body. The timer starts AFTER Google crawls, not after you publish. Without this, you stack changes Google hasn’t even seen yet
- Natural re-crawling via sitemaps — Hunter explicitly rejected the Indexing API as too aggressive. Sitemap submission lets Google discover changes at its own pace, which builds more natural crawl patterns
- Per-query state in PostgreSQL — Each query gets its own row with position history, enhancement history, and cooldown state. This is transactional data that needs ACID guarantees, not a content store
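The crawl-anchored cooldown rule above can be sketched as a small check. This is a minimal illustration, not the project's actual code: the `Enhancement` shape and its field names are assumptions, and only the 7/14/60-day values come from the list.

```typescript
// Cooldown lengths in days, taken from the list above.
type ChangeType = "schema" | "meta" | "content";

const COOLDOWN_DAYS: Record<ChangeType, number> = {
  schema: 7,
  meta: 14,
  content: 60,
};

// ASSUMPTION: hypothetical record shape; the real table columns are not shown in this note.
interface Enhancement {
  changeType: ChangeType;
  publishedAt: Date;
  // Stays null until a Google crawl after publishedAt has been confirmed.
  crawledAt: Date | null;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns true when another change of the same type may be applied.
function cooldownElapsed(e: Enhancement, now: Date): boolean {
  // No confirmed crawl yet: the timer has not started, so the page stays locked.
  if (e.crawledAt === null) return false;
  const elapsedDays = (now.getTime() - e.crawledAt.getTime()) / MS_PER_DAY;
  return elapsedDays >= COOLDOWN_DAYS[e.changeType];
}
```

Measuring from `crawledAt` rather than `publishedAt` is the whole point: a change that was published a month ago but never crawled still counts as pending.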
From Blog-Agent-Worker (Pulse)
- Multi-agent sequential beats single-pass — Research, Write, SEO, Edit as separate agent passes consistently outperforms a single prompt asking for everything. The research phase alone dramatically improves factual accuracy by front-loading source discovery
- Answer Capsule + FAQ required for GEO — AI Overviews (Google’s generative search) heavily cite content with a direct answer after the H1 (2-3 sentences) and FAQ sections with schema markup. Missing these means missing AI Overview citations
- Fact density standard — ~5 statistics per 1500 words, with citations from 2024-2026 sources. Below this, content reads as opinion. Above this, it becomes an unreadable data dump
- Content health is multi-source — Freshness + CWV + GA4 + GSC + links + geo readiness. No single metric tells the full story. Content can have great engagement (GA4) but terrible load speed (CWV), or rank well (GSC) but have decaying facts (freshness)
- Quality gates before human checkpoints — The 117-point automated validation runs before content reaches human review. This catches mechanical issues (missing meta, broken links, schema errors) so human reviewers focus on voice and strategy
- Model allocation saves 60%+ on API costs — Opus for creative/orchestration, Sonnet for analytical/technical, Haiku for high-volume derivatives. Using Opus for everything is wasteful; using Haiku for everything is low quality
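For the FAQ half of the Answer Capsule + FAQ requirement, the schema markup is typically a schema.org `FAQPage` object. The sketch below shows one way to build it; the `FaqItem` type and the `buildFaqJsonLd` name are illustrative assumptions, not Pulse's actual helpers.

```typescript
// One question/answer pair taken from the article's FAQ section.
interface FaqItem {
  question: string;
  answer: string;
}

// Builds a schema.org FAQPage object. Serialize it with JSON.stringify and
// place it in a <script type="application/ld+json"> tag on the page.
function buildFaqJsonLd(items: FaqItem[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: {
        "@type": "Answer",
        text: item.answer,
      },
    })),
  };
}
```

Generating the markup from the same data that renders the visible FAQ keeps the two from drifting apart.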
From Clawdbot / Reports
- Dual rendering pipelines must stay in sync — PDF (HTML + Playwright) and DOCX (docx library) are completely separate code paths. Every new data source requires updating both. This is the most common source of report bugs
- Own-company tracking separately from competitors — WEO Marketly (covering both legacy WEO Media and Marketly Digital domains) is flagged `isOwn: true` in all modules. Without this separation, your own activity skews competitor rankings
- Apify APIs are flaky — Every scraper integration needs a “data unavailable” fallback that renders gracefully in reports. Silent omission causes confusion (“Did we lose LinkedIn data or do they not have LinkedIn?”)
- 7-channel competitive position model — Content 19%, Social 19%, SEO 16%, YouTube 15%, Blog 11%, Ads 11%, GBP 9%. Social gets a breadth bonus (up to 1.15x for 4 platforms). These weights reflect dental marketing specifically — different industries would weight differently
- Google Docs delivery by default — Eliminates the download-upload friction. The `--local` flag exists for development testing, but production runs go straight to Google Docs
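The 7-channel weighting above can be expressed as a weighted sum. This is a sketch under two loudly stated assumptions: each channel score is already normalized to the range 0 to 1, and the social breadth bonus grows linearly from 1.00x at one platform to the 1.15x cap at four. The note only gives the weights and the cap, so the interpolation is a guess.

```typescript
// Channel weights from the list above; they sum to 100.
const CHANNEL_WEIGHTS = {
  content: 19,
  social: 19,
  seo: 16,
  youtube: 15,
  blog: 11,
  ads: 11,
  gbp: 9,
} as const;

type Channel = keyof typeof CHANNEL_WEIGHTS;

// ASSUMPTION: every channel score is pre-normalized to 0..1.
type ChannelScores = Record<Channel, number>;

// ASSUMPTION: linear ramp of +0.05 per extra platform, capped at four platforms.
function socialBreadthBonus(platformCount: number): number {
  const counted = Math.min(Math.max(platformCount, 1), 4);
  return 1 + (counted - 1) * 0.05;
}

// Weighted sum across all channels; only the social term gets the bonus.
// The result is not capped, so a strong social presence can push it past 100.
function competitiveScore(scores: ChannelScores, socialPlatforms: number): number {
  let total = 0;
  for (const channel of Object.keys(CHANNEL_WEIGHTS) as Channel[]) {
    const multiplier = channel === "social" ? socialBreadthBonus(socialPlatforms) : 1;
    total += CHANNEL_WEIGHTS[channel] * scores[channel] * multiplier;
  }
  return total;
}
```

Keeping the weights in one constant makes it straightforward to swap in a different table for a non-dental vertical.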
From SEOmator Audit
- Self-registering rule pattern scales cleanly — Each rule calls `defineRule()` and gets stored in a global Map via `registerRule()`. No central registry file to maintain. Adding rule #252 is identical to adding rule #1
- Category weights must sum to exactly 100 — Validated programmatically, not just documented. This catches bugs where adding a new category changes the scoring model without anyone noticing
- Structured outputs for simple rules, regex fallbacks for dynamic objects — Anthropic’s structured output API requires `additionalProperties: false` on ALL nested objects. Dynamic objects like FAQ or schema.org data are incompatible. Regex JSON parsing is the pragmatic fallback
- Dual delivery (CLI + Electron) from shared core — Same audit engine powers both. But `better-sqlite3` needs a native rebuild for each target. The `package.json` serves a dual purpose (`main` for Electron, `exports` for CLI)
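The self-registering rule pattern and the weight check might look roughly like this. Only the names `defineRule()` and `registerRule()` come from the note; the `Rule` fields, the function signatures, the example rule, and `assertWeightsSumTo100` are hypothetical.

```typescript
// ASSUMPTION: hypothetical rule shape; the real project's fields are not shown here.
interface Rule {
  id: string;
  category: string;
  check: (html: string) => boolean;
}

// Global registry keyed by rule id.
const registry = new Map<string, Rule>();

function registerRule(rule: Rule): void {
  if (registry.has(rule.id)) {
    throw new Error(`Duplicate rule id: ${rule.id}`);
  }
  registry.set(rule.id, rule);
}

// Defining a rule registers it as a side effect, so no central list is needed.
function defineRule(rule: Rule): Rule {
  registerRule(rule);
  return rule;
}

// Throws unless the category weights total exactly 100.
// ASSUMPTION: weights are integers, so exact comparison is safe.
function assertWeightsSumTo100(weights: Record<string, number>): void {
  let total = 0;
  for (const category of Object.keys(weights)) {
    total += weights[category];
  }
  if (total !== 100) {
    throw new Error(`Category weights sum to ${total}, expected 100`);
  }
}

// Example rule module: loading the file is enough to register the rule.
defineRule({
  id: "meta-description-present",
  category: "meta",
  check: (html) => /<meta\s+name=["']description["']/i.test(html),
});
```

Calling `assertWeightsSumTo100` at startup turns a silent scoring drift into an immediate failure when someone adds a category without rebalancing.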
Cross-Project Meta-Patterns
Claude Model Selection
- Opus — Orchestration, creative writing, complex reasoning. Worth the cost when quality directly impacts output
- Sonnet — Analysis, SEO optimization, editing, research. Best cost/quality ratio for most tasks
- Haiku — High-volume derivatives (social posts, email variants). Fast and cheap enough for batch operations
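One way to keep these allocations from drifting is a single lookup table that every agent consults. The task names below are illustrative assumptions drawn from the examples in the list, and the function returns a tier name rather than a concrete model ID.

```typescript
type ModelTier = "opus" | "sonnet" | "haiku";

// ASSUMPTION: hypothetical task categories based on the examples listed above.
type TaskKind =
  | "orchestration"
  | "creative-writing"
  | "analysis"
  | "seo-optimization"
  | "editing"
  | "research"
  | "social-post"
  | "email-variant";

const TIER_FOR_TASK: Record<TaskKind, ModelTier> = {
  orchestration: "opus",
  "creative-writing": "opus",
  analysis: "sonnet",
  "seo-optimization": "sonnet",
  editing: "sonnet",
  research: "sonnet",
  "social-post": "haiku",
  "email-variant": "haiku",
};

// Central lookup so a tier change happens in one place.
function tierFor(task: TaskKind): ModelTier {
  return TIER_FOR_TASK[task];
}
```

Because `TIER_FOR_TASK` is typed as a full `Record`, adding a new `TaskKind` without assigning it a tier fails at compile time.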
Batch Operations
- Parallel sub-agents for 100+ item operations, never single-threaded
- Each agent handles one atomic unit of work (one URL, one query, one content piece)
- Coordinate results after all agents complete, not during
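The three rules above (parallel workers, one atomic unit each, aggregate at the end) can be sketched with plain promises. `runBatch` and the `Outcome` type are hypothetical names, and the worker stands in for whatever a sub-agent actually does.

```typescript
// Result of one unit of work: either a value or the error message.
type Outcome<T> =
  | { ok: true; item: string; value: T }
  | { ok: false; item: string; error: string };

// Starts one worker per item in parallel and resolves only after every
// worker has finished. A failing item is recorded, not allowed to abort the batch.
function runBatch<T>(
  items: string[],
  worker: (item: string) => Promise<T>,
): Promise<Outcome<T>[]> {
  const tasks = items.map((item) =>
    // Promise.resolve().then(...) also captures workers that throw synchronously.
    Promise.resolve()
      .then(() => worker(item))
      .then(
        (value): Outcome<T> => ({ ok: true, item, value }),
        (err: unknown): Outcome<T> => ({ ok: false, item, error: String(err) }),
      ),
  );
  return Promise.all(tasks);
}
```

This sketch launches everything at once. A production run over 100+ items would likely also need a concurrency cap to stay within API rate limits, which is left out here.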
Railway Deployment
- Edge caching requires cache-buster params when verifying deploys
- Wait 60-90s after deploy before verifying — stale CDN responses cause false failures
- Always rebuild frontend before pushing (CLAUDE.md deploy checklist)
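A cache-buster for deploy verification can be as small as a helper that appends a unique query parameter. The parameter name `_cb` and the helper name are arbitrary assumptions; any parameter the application ignores should work.

```typescript
// Seconds to wait after a deploy before the first check (upper end of the 60-90s range above).
const DEPLOY_SETTLE_SECONDS = 90;

// Appends a unique query parameter so the edge cache treats the request as new.
// ASSUMPTION: "_cb" is an arbitrary parameter name that the app ignores.
function withCacheBuster(url: string, token: string): string {
  // Keep any #fragment at the end so the query string stays valid.
  const hashIndex = url.indexOf("#");
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : url.slice(hashIndex);
  const separator = base.indexOf("?") === -1 ? "?" : "&";
  return `${base}${separator}_cb=${encodeURIComponent(token)}${fragment}`;
}
```

A typical call would pass a timestamp as the token, for example `withCacheBuster(deployUrl, String(Date.now()))`, once the settle period has passed.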
Persistence Choices
- PostgreSQL — Transactional state (query tracking, enhancement history, cooldowns). Needs ACID
- SQLite — Content storage, generation history, audit snapshots. Embedded, no server needed
- Filesystem — Bot memory, website baselines, place ID caches. Simple key-value patterns
Anthropic API Gotchas
- Model IDs: prefer aliases (`claude-sonnet-4-5`) over hand-typed date-suffixed versions. In these projects the date-suffixed IDs returned 404s, which is the API's response to any ID that does not exactly match a published model
- `additionalProperties: false` required on ALL nested objects for structured outputs, not just root
- Schema format: `{ type: 'json_schema', schema: { type: 'object', ... } }` — not OpenAI-style wrapper
- `output_config` parameter for structured outputs in the JS SDK: `requestParams.output_config = { format: schema }`
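Since `additionalProperties: false` has to appear on every nested object, it is easier to apply it mechanically than by hand. The helper below is a sketch: `closeObjects` and `toOutputConfig` are hypothetical names, only `properties` and `items` are traversed (not `anyOf`, `$defs`, or other keywords), and the wrapper shape simply mirrors the format quoted in the list above.

```typescript
// Minimal JSON Schema node type covering only the keywords this sketch touches.
interface SchemaNode {
  type?: string;
  properties?: Record<string, SchemaNode>;
  items?: SchemaNode;
  additionalProperties?: boolean;
  [keyword: string]: unknown;
}

// Returns a copy of the schema with additionalProperties: false on every object node.
// The input schema is not mutated.
function closeObjects(node: SchemaNode): SchemaNode {
  const copy: SchemaNode = { ...node };
  if (node.properties) {
    const closedProps: Record<string, SchemaNode> = {};
    for (const key of Object.keys(node.properties)) {
      closedProps[key] = closeObjects(node.properties[key]);
    }
    copy.properties = closedProps;
  }
  if (node.items) {
    copy.items = closeObjects(node.items);
  }
  if (node.type === "object") {
    copy.additionalProperties = false;
  }
  return copy;
}

// Wraps a schema in the { format: { type: 'json_schema', schema } } shape described above.
function toOutputConfig(schema: SchemaNode) {
  return { format: { type: "json_schema", schema: closeObjects(schema) } };
}
```

Running every schema through `closeObjects` before sending it removes the most common cause of structured-output rejections noted here, at the cost of ruling out genuinely dynamic objects, which still need the regex fallback.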
Key Takeaways
- Per-query SEO tracking, multi-agent content generation, and multi-source health scoring all share the same insight: granular beats aggregate
- The most fragile pattern across all projects is dual rendering/delivery paths — always verify both when adding features
- Model selection is a cost optimization lever: wrong model choice can 5x your API bill without improving quality
- Cooldowns, quality gates, and compliance rules are not bureaucracy — they prevent the specific failure modes each system has experienced
- PostgreSQL for state, SQLite for content, filesystem for caches is a pattern that works across all four projects
Related
- gsc-autonomous-seo — Source of per-query and cooldown patterns
- blog-agent-worker — Source of multi-agent and content quality patterns
- clawdbot-competitive-intel — Source of reporting and competitive patterns
- seo-audit-skill — Source of rule architecture patterns
- ecosystem-architecture — How patterns combine into a unified system
- essential-mcp-servers — MCP infrastructure patterns
- _index — Automation patterns for intelligence
- marketing-automation-use-cases — Broader marketing automation context
Try It
- Review each project’s specific patterns before starting related work — they encode hard-won lessons
- When adding a new data source to Clawdbot, verify it appears in BOTH rendering paths before merging
- When choosing a Claude model for a new agent, match the pattern: Opus for creative, Sonnet for analytical, Haiku for derivatives
- When building any new pipeline, start with the persistence choice: does the data need ACID (PostgreSQL), embedded storage (SQLite), or simple caching (filesystem)?
- Run the Anthropic API with the correct schema format from day one — migrating later is painful