Question 1

What is a robots.txt file?

Accepted Answer

A robots.txt file sits at the root of your domain (e.g. https://example.com/robots.txt) and tells search engine crawlers which pages or directories they may or may not access. It is a plain-text file following the Robots Exclusion Protocol, a long-standing de facto standard that was formalized as RFC 9309 in 2022 and is supported by Google, Bing, and other major search engines.
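As a rough illustration of how a crawler reads these rules, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module. The rules, the bot name "ExampleBot", and the URLs are hypothetical and chosen only for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse from memory, no network request

# "ExampleBot" is a made-up crawler name; it falls under the "*" group.
print(parser.can_fetch("ExampleBot", "https://example.com/admin/login"))  # False
print(parser.can_fetch("ExampleBot", "https://example.com/blog/post"))    # True
```

A well-behaved crawler performs a check like this before requesting a URL. Compliance is voluntary, so the file is a request to crawlers rather than an access control.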

Question 2

Does robots.txt actually block indexing?

Accepted Answer

No. It only controls whether a crawler may fetch a URL, not whether the URL appears in search results. A disallowed page can still be indexed if other sites link to it. To prevent indexing, use a <meta name="robots" content="noindex"> tag or an X-Robots-Tag HTTP response header on the target page, and do not disallow that page in robots.txt: a crawler that is blocked from fetching the page never sees the noindex directive.
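A noindex directive can only take effect if the crawler is allowed to fetch the page that carries it. The sketch below checks both conditions together. The function name noindex_will_be_seen is hypothetical, and the header handling is simplified: it assumes a plain comma-separated X-Robots-Tag value and ignores forms that target a specific bot.

```python
from urllib.robotparser import RobotFileParser


def noindex_will_be_seen(robots_lines, user_agent, url, response_headers):
    """Return True only if the crawler may fetch the URL and the
    response carries a noindex directive (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    if not parser.can_fetch(user_agent, url):
        return False  # blocked from crawling, so the header is never read
    tag = response_headers.get("X-Robots-Tag", "")
    directives = [d.strip().lower() for d in tag.split(",")]
    return "noindex" in directives


headers = {"X-Robots-Tag": "noindex, nofollow"}
url = "https://example.com/drafts/page"

blocked = ["User-agent: *", "Disallow: /drafts/"]
open_rules = ["User-agent: *", "Disallow:"]

print(noindex_will_be_seen(blocked, "ExampleBot", url, headers))     # False
print(noindex_will_be_seen(open_rules, "ExampleBot", url, headers))  # True
```

In the first call the Disallow rule hides the header from the crawler, so the page could still end up indexed through external links.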

Question 3

Should I use Allow: / or leave Allow empty to permit all crawling?

Accepted Answer

Neither is required. Leaving both Allow and Disallow empty (or only specifying "User-agent: *") already permits full crawling, because that is the default. An explicit "Allow: /" is harmless but redundant. Allow rules are mainly useful when paired with a broader Disallow rule, such as "Disallow: /private/" together with "Allow: /private/press-kit/", to carve out a sub-path from the block.
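The carve-out works because RFC 9309 and Google's documented behavior give precedence to the most specific (longest) matching rule, with Allow winning a tie. The sketch below is a simplified model of that precedence, not a full parser: the function is_allowed is hypothetical, it matches plain path prefixes only, and it ignores the * and $ wildcards.

```python
def is_allowed(rules, path):
    """Decide whether `path` may be crawled.

    rules: list of (directive, path_prefix) tuples,
           e.g. ("disallow", "/private/").
    Simplified longest-match model; wildcards are not handled.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is permitted
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # empty values and non-matching prefixes are skipped
        is_allow = directive.lower() == "allow"
        # The longer prefix wins; on equal length, Allow wins.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("disallow", "/private/"),
    ("allow", "/private/press-kit/"),
]

print(is_allowed(rules, "/private/press-kit/logo.png"))  # True
print(is_allowed(rules, "/private/notes.txt"))           # False
print(is_allowed(rules, "/blog/post"))                   # True
```

Some parsers apply rules in file order instead of by length, so listing the Allow line before the broader Disallow line is a reasonable precaution.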

Robots.txt Generator

What is a robots.txt file and why does it matter?

How to use this generator

Quick presets

Privacy

Frequently Asked Questions