What is Robots.txt?
Robots.txt is a text file at the root of a website that tells search engine crawlers which pages or sections they may and may not crawl, controlling how search engines access and discover content on the site.
Why It Matters
Not everything on a website should be crawled by search engines. Admin pages, internal search results, staging environments, duplicate filtered views, and user account pages all waste crawl budget if Google tries to access them. Robots.txt is the first file a search engine reads when it visits a site — it sets the rules for what the crawler is allowed to do.
For large sites, robots.txt is a critical crawl budget management tool. An ecommerce site with thousands of filter combinations can generate millions of crawlable URLs from parameter-based pages. Without robots.txt blocking these, Google spends its crawl budget on worthless URL variations instead of the actual product and category pages that should be indexed.
How It Works
Robots.txt uses simple directives:
- User-agent — Specifies which crawler the rules apply to. `User-agent: *` applies to all crawlers; `User-agent: Googlebot` applies only to Google. Different rules can target different crawlers.
- Disallow — Prevents crawling of specified paths. `Disallow: /admin/` blocks the entire admin directory; `Disallow: /search?` blocks internal search result pages. The crawler will not request these URLs.
- Allow — Permits crawling of specific paths within a disallowed directory. `Allow: /admin/public/` within a disallowed `/admin/` lets Google access public admin pages.
- Sitemap — Points to the XML sitemap location. `Sitemap: https://example.com/sitemap.xml` ensures crawlers can find the sitemap regardless of other directives.
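Taken together, the directives combine into a single file served at the site root (e.g. https://example.com/robots.txt). A minimal sketch using the paths from the list above:

```
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Disallow: /search?

Sitemap: https://example.com/sitemap.xml
```

Google resolves conflicts by applying the most specific (longest) matching rule, so the `Allow: /admin/public/` line overrides the broader `Disallow: /admin/` for URLs inside that subdirectory.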
Common Mistakes
Using robots.txt to hide pages from Google's index. Robots.txt prevents crawling, not indexing. If other sites link to a disallowed page, Google may index the URL anyway — it just cannot see the content, resulting in a blank listing. To prevent indexing, use a noindex meta tag or X-Robots-Tag header. The page must be crawlable for Google to see the noindex directive.
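For reference, the two forms of the noindex directive look like this (illustrative snippets: the meta tag goes in the page's HTML head, while the header is set by the server and also works for non-HTML files such as PDFs):

```
<meta name="robots" content="noindex">

X-Robots-Tag: noindex
```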
The second mistake is a misconfigured robots.txt that blocks important content. A single overly broad Disallow rule can prevent Google from crawling the entire site, and blocking CSS or JavaScript files prevents Google from rendering pages correctly. Always verify robots.txt changes with a testing tool, such as the robots.txt report in Google Search Console, before deploying.
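One way to spot-check a change before it ships is Python's standard-library robots.txt parser. A minimal sketch, assuming the hypothetical rules below; note that Python's parser applies rules in file order (first match wins), which is why the Allow line precedes the broader Disallow here:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration; substitute your own draft.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Verify that important pages stay crawlable and blocked paths stay blocked.
print(parser.can_fetch("*", "https://example.com/products/widget"))   # True
print(parser.can_fetch("*", "https://example.com/admin/settings"))    # False
print(parser.can_fetch("*", "https://example.com/admin/public/faq"))  # True
```

Running checks like these against a list of known-important URLs catches an accidental site-wide Disallow before it reaches production.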
How I Use This
My SEO automation audits robots.txt configuration — checking for overly broad disallow rules, blocked resources that prevent rendering, and missing sitemap references. The advanced SEO audit cross-references robots.txt rules against the site's actual URL structure to identify crawl budget waste and unintentional blocking.
Related Terms
Crawl Budget
Crawl budget is the number of pages a search engine will crawl on your site within a given timeframe — determined by your server's capacity and the perceived value of your content. Managing crawl budget ensures Google spends its limited crawling resources on the pages that matter.
Indexation
Indexation is the process by which search engines discover, crawl, process, and store web pages in their index — making them eligible to appear in search results. A page that is not indexed cannot rank, regardless of its content quality or optimisation.
Technical SEO
Technical SEO is the foundation layer of search engine optimisation — the crawlability, indexability, site speed, and structural elements that determine whether search engines can find, understand, and rank your pages.
XML Sitemap
An XML sitemap is a file that lists all the important URLs on a website in a format search engines can read — helping Google discover, crawl, and understand the site's structure, especially for large sites, new sites, or pages with limited internal linking.