DataToolings

Robots.txt Generator

Generate a robots.txt file for your website. Control which bots can crawl your pages, block specific paths, and add your sitemap URL.

What Is robots.txt?

The robots.txt file sits at the root of your website (e.g. https://example.com/robots.txt) and uses the Robots Exclusion Protocol to tell search engine crawlers and other bots which pages they are allowed or not allowed to access. It is not a security mechanism — it's a polite instruction that well-behaved bots follow voluntarily.
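A complete robots.txt can be very short. A minimal file that blocks one directory for all bots and advertises a sitemap might look like this (the paths and sitemap URL are illustrative):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```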

How to Use

  • Add a rule group for each bot or use * for all bots
  • Disallow paths you want to block (e.g. /admin/)
  • Use Allow to carve exceptions out of a broader Disallow (e.g. allow /api/public/ within a blocked /api/)
  • Add your sitemap URL so crawlers can discover all your pages
  • Copy the output and upload to your web root as robots.txt
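Before uploading, you can sanity-check how Allow and Disallow interact using Python's standard-library urllib.robotparser. This is just a quick local check, and note one quirk: Python's parser applies the first matching rule rather than the longest match (which Google uses), so the more specific Allow line is placed first here:

```python
from urllib.robotparser import RobotFileParser

# Block /api/ but allow /api/public/. The more specific Allow line
# comes first because Python's parser applies the first rule that
# matches the URL path, not the longest match.
rules = """\
User-agent: *
Allow: /api/public/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/api/internal"))     # blocked
print(rp.can_fetch("*", "https://example.com/api/public/docs"))  # allowed
```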

Common Bot Names

  • * — all bots
  • Googlebot — Google web crawler
  • Bingbot — Microsoft Bing crawler
  • GPTBot — OpenAI's training data crawler
  • facebookexternalhit — Facebook link preview bot
  • Twitterbot — Twitter/X card preview bot
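Each bot gets its own rule group, starting with a User-agent line; unmatched bots fall back to the * group. A sketch combining two of the names above (the blocked paths are illustrative):

```
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /search-results/

# Fallback rules for every other bot
User-agent: *
Disallow: /admin/
```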

FAQ

Does robots.txt prevent indexing?

Disallowing a page only prevents crawling — Google can still index a URL it has seen elsewhere (e.g. via a link) even if it cannot crawl it. To prevent indexing entirely, use a noindex meta tag or HTTP header, and keep the page crawlable: if robots.txt blocks the URL, the crawler never sees the noindex directive.
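In markup, the meta tag goes in the page's head:

```
<head>
  <meta name="robots" content="noindex">
</head>
```

The equivalent HTTP response header, useful for non-HTML resources such as PDFs, is X-Robots-Tag: noindex.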

Should I block GPTBot?

GPTBot is OpenAI's web crawler used to collect training data. If you do not want your content used for AI training, add a rule to disallow GPTBot. This is increasingly common for content creators and publishers.
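A group like this opts your entire site out of GPTBot crawling:

```
User-agent: GPTBot
Disallow: /
```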

What is Crawl-delay?

Crawl-delay asks a bot to wait N seconds between requests, reducing server load from aggressive crawlers. It is a non-standard extension: some crawlers such as Bingbot honor it, but Googlebot ignores it entirely — use Google Search Console to manage Google's crawl rate instead.
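For bots that honor it, a 10-second delay looks like this:

```
User-agent: Bingbot
Crawl-delay: 10
```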

Is robots.txt case-sensitive?

Bot names are matched case-insensitively under the Robots Exclusion Protocol (RFC 9309), so Googlebot and googlebot are treated the same. Path rules, however, are case-sensitive: Disallow: /Admin/ does not block /admin/. Always match the exact case of your URL paths.