METHODOLOGY

How the scans work

ARC Report scans each of its 1,015 tracked brands once per day, starting at 02:00 UTC, using an HTTP-only scanner that makes roughly 25 requests per brand. There is no browser automation, and no page content is scraped beyond the homepage and one product page. Everything published is reproducible from the requests described below. Last completed scan: 2026-06-20 06:31 UTC.

Machine-readable version: /methodology.md

The 9 tracked agents

User-Agent | Company | Used by
GPTBot | OpenAI | ChatGPT training
ChatGPT-User | OpenAI | ChatGPT live browsing
ClaudeBot | Anthropic | Claude training
Claude-Web | Anthropic | Claude live browsing
PerplexityBot | Perplexity | Perplexity / Comet
Google-Extended | Google | AI Mode / Gemini
Amazonbot | Amazon | Buy For Me
Bingbot | Microsoft | Copilot / Bing
CCBot | Common Crawl | Open training data

1. robots.txt parsing

We fetch /robots.txt and parse it with a standards-compliant parser. For each of the 9 agents we record the effective rule: explicitly allowed, explicitly blocked (Disallow), or no_rule (the agent is not mentioned, which by web convention means allowed). Because robots.txt is a plain-text policy declaration, these verdicts are read directly from the file's text and are high-confidence.
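The three-way classification can be sketched as follows. This is a minimal illustration, not the scanner's actual parser, and it rests on several assumptions. The effective rule is evaluated for the site root `/`. Groups that name the agent are merged, and their rules are matched longest-path-first, with allow winning ties, as RFC 9309 describes. A catch-all `User-agent: *` group does not count as a mention. The helper names `parse_groups`, `rule_matches`, and `effective_rule` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# The 9 tracked agents, as listed in the table above.
AGENTS = ["GPTBot", "ChatGPT-User", "ClaudeBot", "Claude-Web", "PerplexityBot",
          "Google-Extended", "Amazonbot", "Bingbot", "CCBot"]


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)             # lower-cased product tokens
    rules: list[tuple[str, str]] = field(default_factory=list)  # ("allow"|"disallow", path)


def parse_groups(text: str) -> list[Group]:
    """Split robots.txt into groups; consecutive User-agent lines share one group."""
    groups: list[Group] = []
    current = None
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if current is None or not last_was_agent:
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
            last_was_agent = False
    return groups


def rule_matches(rule_path: str, path: str) -> bool:
    """Prefix match supporting '*' wildcards and an optional trailing '$' anchor."""
    if not rule_path:
        return False  # an empty Allow/Disallow value matches nothing
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.match(pattern + ("$" if anchored else ""), path) is not None


def effective_rule(text: str, agent: str, path: str = "/") -> str:
    """Return 'allowed', 'blocked', or 'no_rule' for one agent."""
    token = agent.lower()
    matching = [g for g in parse_groups(text) if token in g.agents]
    if not matching:
        return "no_rule"  # agent never named; a '*' group is deliberately ignored here
    best = None  # (length of matched rule path, directive)
    for group in matching:
        for directive, rule_path in group.rules:
            if not rule_matches(rule_path, path):
                continue
            # Longest match wins; on a tie, allow beats disallow (RFC 9309).
            if (best is None or len(rule_path) > best[0]
                    or (len(rule_path) == best[0] and directive == "allow")):
                best = (len(rule_path), directive)
    return "blocked" if best and best[1] == "disallow" else "allowed"
```

Running `{agent: effective_rule(robots_txt, agent) for agent in AGENTS}` over one brand's file yields the nine per-agent verdicts. Because the input is only the file's text, the same input always produces the same verdicts.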

2. Live HTTP access tests

Policy and enforcement differ, so we also send real requests with each agent's User-Agent string against the homepage and one product page, comparing each response to a Chrome-baseline request:

3. Structured data, protocol files, infrastructure
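This step feeds the structured-data component of the ARC Score. The sketch below detects its three HTML-level inputs (JSON-LD, Schema.org Product, Open Graph) on one fetched page, using only Python's standard-library `html.parser` and `json`. It is a minimal sketch under stated assumptions, not the scanner's code. The function names are hypothetical, and "Schema.org Product" is assumed to mean a JSON-LD `@type` of `Product` (including inside `@graph`). Microdata, sitemap, product-feed, and protocol-file checks are not shown.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD script bodies and og:* meta tags from one HTML page."""

    def __init__(self):
        super().__init__()
        self.json_ld_blocks: list[str] = []
        self.open_graph: dict[str, str] = {}
        self._in_json_ld = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attr.get("property") or "").startswith("og:"):
            self.open_graph[attr["property"]] = attr.get("content") or ""

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.json_ld_blocks.append("".join(self._buffer))
            self._in_json_ld = False


def _types(node):
    """Yield every @type in a JSON-LD node, walking lists and @graph."""
    if isinstance(node, list):
        for item in node:
            yield from _types(item)
    elif isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, list):
            yield from t
        elif isinstance(t, str):
            yield t
        yield from _types(node.get("@graph", []))


def detect_structured_data(html: str) -> dict[str, bool]:
    parser = StructuredDataParser()
    parser.feed(html)
    parser.close()
    types = set()
    for block in parser.json_ld_blocks:
        try:
            types.update(_types(json.loads(block)))
        except json.JSONDecodeError:
            continue  # malformed JSON-LD still counts as present but yields no types
    return {
        "json_ld": bool(parser.json_ld_blocks),
        "schema_product": "Product" in types,
        "open_graph": bool(parser.open_graph),
    }
```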

The two-scan confirmation rule

Published changes follow a two-tier confirmation system.

Tier 1 (immediate): robots.txt rule changes. These are text-file diffs; if the rule changed, it changed.

Tier 2 (requires confirmation): HTTP access verdicts, blocked-agent counts, CDN/WAF detection, and structured-data presence. These are inferences that can flicker with timeouts, inconsistent WAF behavior, or CDN caching, so a Tier 2 change must appear in two consecutive daily scans before it is published to the changelog.

Scanner failures (timeouts, HTTP 429) are never published as brand changes.
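The confirmation rule behaves like a small per-field state machine. In the minimal sketch below, the field names (`robots_rule:…`, `http_access:…`) and the `ChangeGate` class are hypothetical. One choice is our own assumption: a scanner failure resets any pending Tier 2 change, so the two confirming scans must be strictly consecutive.

```python
from dataclasses import dataclass, field

# Hypothetical field-name prefixes; only robots.txt rules are Tier 1.
TIER1_PREFIXES = ("robots_rule:",)
_NOTHING = object()  # sentinel meaning "no value yet"


@dataclass
class FieldState:
    published: object = _NOTHING  # last value written to the changelog
    pending: object = _NOTHING    # Tier 2 value seen in the previous scan only


@dataclass
class ChangeGate:
    states: dict[str, FieldState] = field(default_factory=dict)

    def observe(self, name: str, value: object, scan_failed: bool = False) -> bool:
        """Feed today's observation for one field; return True if it should be published."""
        state = self.states.setdefault(name, FieldState())
        if scan_failed:
            # Timeouts and 429s are never published and (by assumption) reset the streak.
            state.pending = _NOTHING
            return False
        if value == state.published:
            state.pending = _NOTHING  # a Tier 2 flicker reverted before confirmation
            return False
        if name.startswith(TIER1_PREFIXES) or value == state.pending:
            state.published, state.pending = value, _NOTHING
            return True
        state.pending = value  # first sighting of a Tier 2 change; wait for the next scan
        return False
```

A one-day flip such as allowed → blocked → allowed never reaches the changelog. A robots.txt edit, by contrast, is published on the day it is first seen.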

ARC Score v1.0

A 0–100 score summarizing how accessible a brand is to AI agents, computed from the latest scan. The component breakdown is always shown alongside the number.

Component | Points | Computation
Agent access breadth | 50 | Mean per-agent access over the 9 agents (allowed / no_rule = 1.0, inconclusive = 0.5, restricted = 0.25, blocked = 0), multiplied by 50.
Structured data quality | 25 | JSON-LD 7 · Schema.org Product 7 · Open Graph 4 · sitemap 4 · product feed 3.
Protocol files | 15 | llms.txt present 6 (+3 if it contains links) · agents.txt 3 · UCP 3.
Scan stability | 10 | Share of the 9 per-agent checks returning a conclusive verdict in the latest scan, multiplied by 10. Measures confidence, not access.
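The v1.0 weights are simple enough to state as code. This minimal sketch assumes a hypothetical `ScanSummary` that carries one access verdict per agent, because the table does not say how robots.txt and live-HTTP verdicts are combined. It also treats every verdict other than `inconclusive` as conclusive. The feature keys are illustrative names.

```python
from dataclasses import dataclass, field

SCORE_VERSION = "1.0"
# Weights copied from the v1.0 table above.
ACCESS_WEIGHT = {"allowed": 1.0, "no_rule": 1.0, "inconclusive": 0.5,
                 "restricted": 0.25, "blocked": 0.0}
STRUCTURED_POINTS = {"json_ld": 7, "schema_product": 7, "open_graph": 4,
                     "sitemap": 4, "product_feed": 3}


@dataclass
class ScanSummary:
    """Hypothetical per-brand scan summary; field names are illustrative."""
    verdicts: dict[str, str]                           # agent -> access verdict (9 entries)
    structured: set[str] = field(default_factory=set)  # keys of STRUCTURED_POINTS found
    llms_txt: bool = False
    llms_txt_has_links: bool = False
    agents_txt: bool = False
    ucp: bool = False


def arc_score_v1(scan: ScanSummary) -> dict:
    n = len(scan.verdicts)
    access = 50 * sum(ACCESS_WEIGHT[v] for v in scan.verdicts.values()) / n
    structured = sum(STRUCTURED_POINTS[k] for k in scan.structured)
    protocol = (6 + 3 * scan.llms_txt_has_links) if scan.llms_txt else 0
    protocol += 3 * scan.agents_txt + 3 * scan.ucp
    # Assumption: every verdict except "inconclusive" counts as conclusive.
    conclusive = sum(v != "inconclusive" for v in scan.verdicts.values())
    stability = 10 * conclusive / n
    components = {"access": access, "structured": structured,
                  "protocol": protocol, "stability": stability}
    # The breakdown travels with the total, as the page promises.
    return {"score_version": SCORE_VERSION, "components": components,
            "total": sum(components.values())}
```

Consider a brand with six allowed agents, one no_rule, one blocked, and one inconclusive, plus JSON-LD, Open Graph, a sitemap, and an llms.txt without links. It works out to 41.67 + 15 + 6 + 8.89 ≈ 71.56.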

Versioning policy: the formula above is frozen as Score v1.0. Any change to weights or inputs ships as a new version with a changelog entry on this page, and the score version is included in all data downloads (score_version) and MCP responses so historical comparisons stay meaningful.

Known limitations

Corrections & disputes

Think a data point is wrong? See the reliability page for the dispute process, or email hello@arcreport.ai with subject [DATA DISPUTE] your-domain.com.