Robots.txt Complete Guide: How to Control What Google Crawls

JAY

Author

May 27, 2026

Learn how robots.txt works, how to write rules for specific bots, and common mistakes that accidentally block Google from crawling your site.

Your robots.txt file is one of the most powerful, and most misunderstood, files on your website. Get it wrong and you can accidentally block Google from crawling your entire site. Get it right and you save crawl budget and keep crawlers out of low-value areas such as admin pages.

What Is robots.txt?

robots.txt is a plain text file served from your site root, for example yoursite.com/robots.txt. It tells search engine crawlers which URL paths they may crawl and which to skip, following the Robots Exclusion Protocol, a voluntary standard honored by well-behaved crawlers such as Googlebot, Bingbot, and DuckDuckBot.
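
Because the file always sits at the root of a host, its address can be worked out from any page URL. Here is a minimal Python sketch; robots_url is a hypothetical helper name and the yoursite.com addresses are placeholders. Each hostname, including a subdomain, is covered by its own robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the host that serves page_url.

    robots_url is a hypothetical helper name used only for illustration.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (with port, if any); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yoursite.com/blog/post?page=2"))
print(robots_url("https://shop.yoursite.com/cart"))
```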

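Important: robots.txt controls crawling, not indexing. A blocked page can still appear in search results, usually as a bare URL without a description, if other sites link to it. To keep a page out of the index, use a noindex meta tag and leave the page crawlable, because a crawler that is blocked from fetching the page never sees the tag.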

Common robots.txt Rules

To allow everything: set User-agent to * and Allow to /, and include your Sitemap URL.
To block a directory: add a Disallow line for each path, such as /admin/ or /private/.
To block AI training bots like GPTBot: set User-agent to GPTBot and Disallow to /, then repeat the group for each other AI crawler you want to block.
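
As a rough illustration, these rules can be written out as file contents and checked with Python's standard urllib.robotparser module. The two sample files, the yoursite.com URLs, and the helper name allowed are assumptions made for this sketch. Python's parser applies the first matching rule in a group, while Google uses the most specific match, so the two can disagree on files that mix Allow and Disallow in one group.

```python
from urllib.robotparser import RobotFileParser

# Sample file 1 (assumed): allow everything and advertise the sitemap.
ALLOW_ALL = """\
User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
"""

# Sample file 2 (assumed): block two directories for every bot,
# and block GPTBot from the whole site.
BLOCK_SOME = """\
User-agent: *
Disallow: /admin/
Disallow: /private/

User-agent: GPTBot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Hypothetical helper: parse robots_txt and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(allowed(ALLOW_ALL, "Googlebot", "https://yoursite.com/blog/"))
print(allowed(BLOCK_SOME, "Googlebot", "https://yoursite.com/admin/users"))
print(allowed(BLOCK_SOME, "Googlebot", "https://yoursite.com/blog/"))
print(allowed(BLOCK_SOME, "GPTBot", "https://yoursite.com/blog/"))
```

In this sketch the GPTBot group takes over for that crawler, so it is blocked everywhere, while other bots are only kept out of the two listed directories.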

Common Mistakes to Avoid

Blocking CSS and JavaScript — Googlebot needs to render your pages to understand them
Using robots.txt for sensitive data — it is a public file anyone can read
Forgetting the Sitemap line — include it to help crawlers discover all your pages
Case errors — /Admin/ and /admin/ are treated as different paths
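
Two of these mistakes are easy to check with Python's standard urllib.robotparser module. The sketch below uses a made-up sample file and placeholder yoursite.com URLs. It shows that a rule for /admin/ does not cover /Admin/, and that a file with no Sitemap line reports no sitemaps.

```python
from urllib.robotparser import RobotFileParser

# Made-up sample file: one lowercase rule and no Sitemap line.
SAMPLE = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# Paths are case-sensitive, so only the lowercase directory is blocked.
lower_blocked = not parser.can_fetch("Googlebot", "https://yoursite.com/admin/")
upper_blocked = not parser.can_fetch("Googlebot", "https://yoursite.com/Admin/")
print(lower_blocked, upper_blocked)

# site_maps() returns None when the file has no Sitemap line (Python 3.8+).
print(parser.site_maps())
```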

Generate Your robots.txt Free

Use the Anonymiz Robots.txt Generator to build a valid file without memorizing syntax. Choose directories to block, add your sitemap URL, and download the ready-to-upload file in seconds.

Written by

JAY

JAY founded Anonymiz in 2013 and has personally built and maintained every one of its 110+ privacy and web utility tools since — from the referrer-stripping dereferer engine to the DNS leak and WebRTC leak testers. All technical infrastructure, tool logic, and site content are handled directly
