Robots.txt Generator
Generate a robots.txt file for your website. Control which bots can crawl your pages, block specific paths, and add your sitemap URL.
What Is robots.txt?
The robots.txt file sits at the root of your website (e.g. https://example.com/robots.txt) and uses the Robots Exclusion Protocol to tell search engine crawlers and other bots which pages they are allowed or not allowed to access. It is not a security mechanism — it's a polite instruction that well-behaved bots follow voluntarily.
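A minimal example makes the protocol concrete. The following file (the /private/ path is a placeholder) lets every bot crawl the whole site except one directory:

```
# Applies to every crawler
User-agent: *
Disallow: /private/
```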
How to Use
- Add a rule group for each bot, or use * for all bots
- Disallow paths you want to block (e.g. /admin/)
- Allow paths override a broader Disallow (e.g. allow /api/public/ within a blocked /api/)
- Add your sitemap URL so crawlers can discover all your pages
- Copy the output and upload it to your web root as robots.txt
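Putting those steps together, a generated file might look like this sketch (example.com, /admin/, and /api/ are placeholders):

```
# Rules for all bots
User-agent: *
Disallow: /admin/
Disallow: /api/
Allow: /api/public/

# Help crawlers discover every page
Sitemap: https://example.com/sitemap.xml
```

The more specific (longer) Allow rule wins over the broader Disallow, so /api/public/ stays crawlable while the rest of /api/ is blocked.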
Common Bot Names
- * — all bots
- Googlebot — Google web crawler
- Bingbot — Microsoft Bing crawler
- GPTBot — OpenAI's training data crawler
- facebookexternalhit — Facebook link preview bot
- Twitterbot — Twitter/X card preview bot
FAQ
Does robots.txt prevent indexing?
Disallowing a page only prevents crawling — Google can still index a URL it has seen elsewhere (e.g. via a link) even if it cannot crawl it. To prevent indexing entirely, use a noindex meta tag or HTTP header.
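For example, adding this tag to a page's head asks search engines not to index it. Note that the page must remain crawlable (not disallowed in robots.txt), or Google will never see the directive:

```
<meta name="robots" content="noindex">
```

For non-HTML resources such as PDFs, the equivalent HTTP response header X-Robots-Tag: noindex does the same job.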
Should I block GPTBot?
GPTBot is OpenAI's web crawler used to collect training data. If you do not want your content used for AI training, add a rule to disallow GPTBot. This is increasingly common for content creators and publishers.
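The rule group is short; this blocks GPTBot from the entire site:

```
User-agent: GPTBot
Disallow: /
```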
What is Crawl-delay?
Crawl-delay tells a bot to wait N seconds between requests. It reduces server load from aggressive crawlers. Note: Googlebot ignores this directive — use Google Search Console to set a crawl rate for Google instead.
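For example, asking a bot to wait ten seconds between requests looks like this (Bingbot is one crawler that documents support for the directive):

```
User-agent: Bingbot
Crawl-delay: 10
```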
Is robots.txt case-sensitive?
Paths are case-sensitive: /admin/ and /Admin/ are different rules, so always match the exact case of your URL paths. Bot names, by contrast, are matched case-insensitively under the current Robots Exclusion Protocol standard (RFC 9309), so Googlebot and googlebot select the same rule group.
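For example, these two rules match different paths:

```
User-agent: *
Disallow: /Admin/   # blocks /Admin/ but not /admin/
Disallow: /admin/   # blocks /admin/ but not /Admin/
```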