Robots.txt Checker
View and validate any website's robots.txt file. See all crawl rules, test whether a specific URL is blocked for Googlebot or other crawlers, and check sitemap declarations.
What is robots.txt?
robots.txt is a plain-text file served from the root of a website (for example, https://example.com/robots.txt) that tells search engine crawlers which pages or sections they may or may not crawl. It is part of the Robots Exclusion Protocol, and compliant crawlers such as Googlebot and Bingbot check it before requesting other URLs on the site.
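As a rough illustration of how a crawler reads these rules, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` module. The file content, the `load_rules` helper, and the example.com URLs are assumptions made up for this example; they are not taken from any real site or from this tool's implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(SAMPLE)
print(rules.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
print(rules.can_fetch("Googlebot", "https://example.com/blog/"))        # True
print(rules.site_maps())  # ['https://example.com/sitemap.xml']
```

`site_maps()` needs Python 3.8 or later. The standard-library parser is a simplification: it does not necessarily evaluate rules the same way Googlebot does, so treat the results as indicative only.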
Does robots.txt prevent Google from indexing a page?
robots.txt prevents crawling: a compliant Googlebot will not fetch the page. It does not prevent indexing, though. Google can still index a URL it has discovered through links, even without crawling it. To prevent indexing, use a noindex meta tag or an X-Robots-Tag header. Both require the page to be crawlable, because the crawler has to fetch the page to see the directive.
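To check whether a page actually carries a noindex signal, something like the following could be used. It is a minimal sketch: `has_noindex` is a hypothetical helper, it expects the response headers and HTML to be fetched already, and the regular expression is a simplification that assumes `name` appears before `content` inside the meta tag. A real checker would use a proper HTML parser.

```python
import re

# Simplified pattern: assumes name="robots" comes before content="...".
META_ROBOTS = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def has_noindex(headers: dict[str, str], html: str) -> bool:
    """Return True if the X-Robots-Tag header or a robots meta tag says noindex."""
    # Header names are case-insensitive, so compare in lowercase.
    header = next(
        (value for key, value in headers.items() if key.lower() == "x-robots-tag"),
        "",
    )
    if "noindex" in header.lower():
        return True
    meta = META_ROBOTS.search(html)
    return bool(meta and "noindex" in meta.group(1).lower())


print(has_noindex({"X-Robots-Tag": "noindex"}, ""))                       # True
print(has_noindex({}, '<meta name="robots" content="noindex, nofollow">'))  # True
print(has_noindex({}, "<p>No directives here</p>"))                       # False
```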
What is the difference between Allow and Disallow?
Disallow: /path blocks a crawler from accessing any URL whose path starts with /path. Allow: /path explicitly permits access to a path that would otherwise be blocked by a broader Disallow rule. When both kinds of rule match a URL, the most specific (longest) rule wins, and an Allow rule takes precedence over a Disallow rule of equal specificity.
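The precedence described above can be sketched as a longest-match lookup. This is an illustrative simplification, not the code any crawler actually runs: `is_allowed` is a hypothetical function, the rules are assumed to belong to a single user-agent group, and wildcard characters (`*` and `$`) are not handled.

```python
def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Decide whether `path` may be crawled.

    `rules` holds (directive, path_prefix) pairs for one user-agent group,
    e.g. ("Disallow", "/private/").
    """
    best_len = -1
    allowed = True  # no matching rule means the path may be crawled
    for directive, prefix in rules:
        # An empty rule value matches nothing; skip non-matching prefixes too.
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on a tie, Allow wins over Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [("Disallow", "/private/"), ("Allow", "/private/public/")]
print(is_allowed(rules, "/private/public/page.html"))  # True
print(is_allowed(rules, "/private/secret.html"))       # False
print(is_allowed(rules, "/blog/"))                     # True
```

Because the decision depends on rule length rather than position, the order in which Allow and Disallow lines appear in the group does not change the result in this sketch.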
Can robots.txt block all crawlers?
Yes. Disallow: / under User-agent: * blocks all compliant crawlers from the entire site, unless the file contains a more specific group for a particular crawler. Note that robots.txt is advisory: malicious bots and scrapers ignore it. It only affects well-behaved crawlers like Googlebot that voluntarily follow the protocol.
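For a quick sanity check of a block-all rule, the standard-library parser can be fed the two lines directly. The user-agent names and URLs below are placeholders chosen for this example; the result only reflects what a compliant crawler would decide, since nothing enforces the file.

```python
from urllib.robotparser import RobotFileParser

# The two lines that block every compliant crawler from the whole site.
BLOCK_ALL = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(BLOCK_ALL)

# Placeholder crawler names and URLs, for illustration only.
googlebot_ok = parser.can_fetch("Googlebot", "https://example.com/any/page")
other_ok = parser.can_fetch("ExampleBot", "https://example.com/")

print(googlebot_ok)  # False
print(other_ok)      # False
```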