DataToolings

Robots.txt Generator

Generate a robots.txt file for your website. Control which bots can crawl your pages, block specific paths, and add your sitemap URL.

What Is robots.txt?

The robots.txt file sits at the root of your website (e.g. https://example.com/robots.txt) and uses the Robots Exclusion Protocol to tell search engine crawlers and other bots which pages they are allowed or not allowed to access. It is not a security mechanism — it's a polite instruction that well-behaved bots follow voluntarily.
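A complete robots.txt can be very short. A minimal file that blocks one directory for all bots and advertises a sitemap might look like this (the paths and sitemap URL are illustrative):

```
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
```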

How to Use

  • Add a rule group for each bot or use * for all bots
  • Disallow paths you want to block (e.g. /admin/)
  • Use Allow to carve exceptions out of a broader Disallow (e.g. allow /api/public/ within a blocked /api/)
  • Add your sitemap URL so crawlers can discover all your pages
  • Copy the output and upload to your web root as robots.txt
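Before uploading, you can sanity-check how Allow and Disallow interact using Python's standard-library urllib.robotparser. This is just a quick local check, and note one quirk: Python's parser applies the first matching rule rather than the longest match (which Google uses), so the more specific Allow line is placed first here:

```python
from urllib.robotparser import RobotFileParser

# Block /api/ but allow /api/public/. The more specific Allow line
# comes first because Python's parser applies the first rule that
# matches the URL path, not the longest match.
rules = """\
User-agent: *
Allow: /api/public/
Disallow: /api/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/api/internal"))     # blocked
print(rp.can_fetch("*", "https://example.com/api/public/docs"))  # allowed
```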

Common Bot Names

  • * — all bots
  • Googlebot — Google web crawler
  • Bingbot — Microsoft Bing crawler
  • GPTBot — OpenAI's training data crawler
  • facebookexternalhit — Facebook link preview bot
  • Twitterbot — Twitter/X card preview bot
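Each bot gets its own rule group, starting with a User-agent line; unmatched bots fall back to the * group. A sketch combining two of the names above (the blocked paths are illustrative):

```
# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /search-results/

# Fallback rules for every other bot
User-agent: *
Disallow: /admin/
```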

FAQ

Does robots.txt prevent indexing?

Disallowing a page only prevents crawling — Google can still index a URL it has seen elsewhere (e.g. via a link) even if it cannot crawl it. To prevent indexing entirely, use a noindex meta tag or HTTP header, and keep the page crawlable: if robots.txt blocks the URL, the crawler never sees the noindex directive.
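In markup, the meta tag goes in the page's head:

```
<head>
  <meta name="robots" content="noindex">
</head>
```

The equivalent HTTP response header, useful for non-HTML resources such as PDFs, is X-Robots-Tag: noindex.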

Should I block GPTBot?

GPTBot is OpenAI's web crawler used to collect training data. If you do not want your content used for AI training, add a rule to disallow GPTBot. This is increasingly common for content creators and publishers.
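A group like this opts your entire site out of GPTBot crawling:

```
User-agent: GPTBot
Disallow: /
```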

What is Crawl-delay?

Crawl-delay asks a bot to wait N seconds between requests, reducing server load from aggressive crawlers. It is a non-standard extension: some crawlers such as Bingbot honor it, but Googlebot ignores it entirely — use Google Search Console to manage Google's crawl rate instead.
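For bots that honor it, a 10-second delay looks like this:

```
User-agent: Bingbot
Crawl-delay: 10
```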

Is robots.txt case-sensitive?

Bot names are matched case-insensitively under the Robots Exclusion Protocol (RFC 9309), so Googlebot and googlebot are treated the same. Path rules, however, are case-sensitive: Disallow: /Admin/ does not block /admin/. Always match the exact case of your URL paths.