Webmaster & Technical SEO · 10 min · Updated 2026-03-01

DNS Infrastructure and Technical SEO: What Every Webmaster Should Know

Most technical SEO checklists cover sitemaps, robots.txt, and structured data. Fewer cover the infrastructure layer where many crawlability and indexability problems actually originate. DNS resolution latency adds to your Time to First Byte before a single line of HTML is served. A mis-set X-Robots-Tag response header can silently deindex pages that are perfectly accessible to users. A canonical HTTP header conflicting with an in-page canonical tag creates ambiguous signals that Googlebot has to resolve with imperfect heuristics. Technical SEO at the infrastructure level is where changes have the highest leverage and the lowest visibility — problems here are invisible to users but fully visible to crawlers. This guide covers the infrastructure-layer signals that shape how Googlebot discovers, crawls, and indexes your site.

How DNS Resolution Affects TTFB and Crawl Efficiency

Every request Googlebot makes to your site begins with a DNS lookup. Before a single byte of your page is fetched, Googlebot's infrastructure must resolve your domain to an IP address. This lookup adds latency to every crawl request — and while Googlebot caches DNS resolutions, that cache has a finite TTL aligned with the TTL you set on your DNS records.

DNS TTL and crawler behaviour

If your A record TTL is 86400 (24 hours), Googlebot resolves your domain once and caches the result for up to 24 hours. This is efficient for stable infrastructure. If your TTL is 300 (5 minutes), Googlebot resolves your domain much more frequently — adding DNS lookup overhead to a larger proportion of crawl requests.

For most sites, a TTL of 3600 (1 hour) on A records is a reasonable balance: fast enough for planned infrastructure changes to propagate within an hour, without generating excessive DNS query overhead for crawlers.
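A quick way to confirm the TTL crawlers actually see is to read the second field of a `dig` answer. The sketch below parses a hard-coded sample line so it runs offline; in a real check you would substitute the output of `dig +noall +answer example.com A` for your own domain.

```shell
# Sample `dig +noall +answer` output; the TTL is the second field.
# (In practice: answer=$(dig +noall +answer example.com A))
answer="example.com.  3600  IN  A  93.184.216.34"
ttl=$(printf '%s\n' "$answer" | awk '{print $2}')
echo "A record TTL: ${ttl}s"
```

If the reported TTL differs from what your DNS console shows, an upstream resolver cache or a CDN-managed record is usually the reason.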

💡 Tip: The DNS TTL that matters most for crawlers is on your A record (or CNAME if you use one). Use our DNS Lookup tool to verify the current TTL on your records and cross-reference with how frequently you need to make infrastructure changes. See our DNS TTL guide for the full TTL optimisation playbook.

DNS resolution latency and TTFB

TTFB (Time to First Byte) is not itself a Core Web Vitals metric, but it sets the floor for Largest Contentful Paint, which is — and server responsiveness feeds into how Google evaluates and crawls your site. DNS resolution time is included in TTFB when the resolver cache is cold: the total time from request to first byte is DNS lookup + TCP connection + TLS handshake + server processing + first byte transmitted.

On a cold DNS cache, resolution from a distant resolver can add 50–200ms before the TCP connection even begins. Reducing this means:

  • Using a fast authoritative DNS provider — major providers like Cloudflare DNS, Route 53, and Google Cloud DNS have globally distributed anycast infrastructure that responds faster than single-location authoritative nameservers
  • Keeping TTLs at a reasonable level — very low TTLs force more frequent re-resolution from cold caches
  • Avoiding unnecessary CNAME chains — each CNAME in a chain requires an additional DNS lookup. www.example.com → cdn.example.net → actual-ip costs two lookups instead of one
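To make the CNAME cost concrete, the sketch below counts the extra hops in a resolution chain. The chain is a hard-coded sample in dig's answer format (hypothetical hostnames); in a real audit you would substitute the output of `dig +noall +answer www.example.com`.

```shell
# Each CNAME line in the answer section is an extra lookup before
# the final A record resolves.
chain="www.example.com.    300  IN  CNAME  cdn.example.net.
cdn.example.net.    300  IN  CNAME  lb.example-cdn.net.
lb.example-cdn.net.  60  IN  A      203.0.113.10"
hops=$(printf '%s\n' "$chain" | grep -c 'CNAME')
echo "CNAME hops before the A record: $hops"
```

Two hops here means a cold-cache resolver performs three lookups where a flattened record would need one.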

Nameserver consistency

Googlebot — like all DNS resolvers — queries your authoritative nameservers and may receive different results from different nameservers if your zone is in an inconsistent state. During nameserver migrations or propagation windows, some of Googlebot's queries may hit old nameservers and some may hit new ones, producing inconsistent crawl results. Use our DNS Check tool during any nameserver change to verify all authoritative nameservers return consistent records before Googlebot's next crawl cycle.

HTTP Status Codes and What Googlebot Does With Each

Every HTTP response code sends a specific signal to Googlebot about how to treat the content at that URL. Getting these right is foundational technical SEO.

Status Code | Googlebot Behaviour | SEO Implication
200 OK | Crawls and indexes content | Normal — content is eligible for indexing
301 Moved Permanently | Follows redirect, transfers PageRank, updates index | Preferred for permanent URL changes
302 Found (Temporary) | Follows redirect, may not transfer PageRank, keeps original in index | Use only for genuinely temporary redirects
304 Not Modified | Reuses its cached copy — content unchanged since last fetch | Efficient for static content; reduces crawl overhead
404 Not Found | Removes URL from index after repeated 404s | Use for genuinely deleted pages; don't use for soft-blocks
410 Gone | Removes URL from index faster than 404 | Use when a page is permanently deleted and should be removed immediately
429 Too Many Requests | Backs off crawl rate | Server is signalling overload; Googlebot respects this
500 Server Error | Retries later; repeated 500s may cause deindexing | Fix server errors promptly — sustained 500s damage crawl coverage
503 Service Unavailable | Retries later; with a Retry-After header, pauses crawling | Use for maintenance windows; include Retry-After to signal duration

The soft 404 problem: Returning 200 OK for a page that displays a "not found" or "no results" message is a soft 404. Googlebot may index these pages as thin content or eventually detect them as duplicate content. Always return genuine 404 or 410 status codes for pages that don't exist — don't redirect them to your homepage (which creates a different problem) or return 200 with error content.

The 410 vs 404 decision: Both eventually result in deindexing, but 410 signals permanence and typically results in faster removal. Use 410 for pages that are definitively gone and will never return. Use 404 for pages that may be restored or whose removal is uncertain.
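A crude way to surface soft 404s in an audit is to flag URLs that return 200 while the body reads like an error page. The sketch below runs on hard-coded sample values; in practice `status` and `body` would come from something like `curl -s -o body.html -w '%{http_code}'` for each URL in your crawl list.

```shell
# Soft-404 heuristic: the status says OK, the body says otherwise.
status=200
body="Sorry, the page you requested was not found."
if [ "$status" = "200" ] && printf '%s' "$body" | grep -qiE 'not found|no results'; then
  verdict="possible soft 404"
else
  verdict="ok"
fi
echo "$verdict"
```

The phrase list is site-specific — extend the pattern with whatever your error templates actually say.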

The Canonical HTTP Header: When to Use It and When It Conflicts

The canonical HTTP header (Link: <url>; rel="canonical") is a server-level signal that tells Googlebot which URL is the preferred version of a page — functionally equivalent to the in-page <link rel="canonical"> tag, but delivered at the HTTP response level rather than in the HTML.

Code
# Check canonical header on any URL
curl -sI https://example.com/page | grep -i "link:"
# Link: <https://example.com/page>; rel="canonical"

Use our HTTP Headers Check tool to inspect canonical headers across your URLs without command-line access.

When the canonical HTTP header is the right choice

  • Non-HTML content: PDFs, Word documents, and other non-HTML files cannot contain a <link rel="canonical"> tag. The HTTP header is the only canonical mechanism available for these file types.
  • Paginated content served dynamically: When page parameters are added to URLs server-side, the canonical header can be set uniformly at the response level without modifying each HTML page.
  • Proxied or CDN-served content: CDNs can inject canonical headers at the edge, allowing canonical signals to be applied to content you don't directly control at the HTML level.
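As a sketch of the non-HTML case (nginx syntax, hypothetical paths), a canonical header pointing a PDF at its HTML counterpart can be attached at the server level:

```nginx
# Hypothetical paths: canonicalise a PDF to its HTML equivalent.
location = /whitepaper.pdf {
    add_header Link '<https://example.com/whitepaper>; rel="canonical"';
}
```

Apache (`Header set Link ...`) and most CDNs' header-injection rules can express the same thing.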

The conflict problem

When a canonical HTTP header and an in-page canonical tag point to different URLs, Googlebot receives contradictory signals. Google treats conflicting canonicals as ambiguous and falls back on its own heuristics to pick the preferred URL — which may not be the one you intended.

Common scenarios that create conflicts:

  • CMS plugins that add in-page canonicals while the CDN or reverse proxy injects a different canonical header
  • URL normalisation rules at the CDN adding https://www.example.com/page as the canonical while the CMS is sending https://example.com/page
  • A/B testing tools that modify page URLs while the original page retains its own canonical tag

⚠️ Warning: Canonical conflicts are invisible to users and to many SEO audits that only check in-page tags. Always verify both the HTTP header and the in-page canonical using our HTTP Headers Check tool — check the raw response headers alongside the page source for every URL you are actively canonicalising.
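The conflict check itself is scriptable. The sketch below compares the two signals on hard-coded sample responses; in practice `header` would come from `curl -sI` and `html` from `curl -s` against the same URL.

```shell
# Extract the canonical URL from the HTTP header and from the page source.
header='Link: <https://example.com/page>; rel="canonical"'
html='<link rel="canonical" href="https://www.example.com/page">'
from_header=$(printf '%s' "$header" | sed 's/.*<\([^>]*\)>.*/\1/')
from_page=$(printf '%s' "$html" | sed 's/.*href="\([^"]*\)".*/\1/')
if [ "$from_header" != "$from_page" ]; then
  echo "canonical conflict: header=$from_header page=$from_page"
fi
```

This naive parse assumes one Link header and one canonical tag per response; real pages may need a proper HTML parser.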

X-Robots-Tag: Server-Level Indexing Control

The X-Robots-Tag HTTP response header applies the same directives as the HTML <meta name="robots"> tag, but at the response level. This makes it the only mechanism for controlling indexing of non-HTML resources.

Code
X-Robots-Tag: noindex
X-Robots-Tag: noindex, nofollow
X-Robots-Tag: noindex, noarchive
X-Robots-Tag: googlebot: noindex    ← bot-specific directive
Code
# Check X-Robots-Tag on any URL
curl -sI https://example.com/document.pdf | grep -i "x-robots"

When X-Robots-Tag is essential

PDFs and documents you don't want indexed: A PDF hosted on your server will be indexed by default. If it contains sensitive information, internal documentation, or content that shouldn't appear in search results, an X-Robots-Tag: noindex header is the correct control mechanism.

Staging environments served over public URLs: If your staging environment is accidentally publicly accessible and crawlable, an X-Robots-Tag: noindex applied server-wide at the staging domain prevents the content from appearing in search results without requiring per-page robots meta tags.

URL parameter variants you're not canonicalising: If tracking parameters or session IDs generate unique URLs that aren't canonicalised, applying X-Robots-Tag: noindex via server rules prevents parameter variants from accumulating in the index.
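For the PDF case, a server-wide rule is the usual implementation. A sketch in nginx syntax (adapt to your stack — Apache's `Header set` or a CDN edge rule can do the same):

```nginx
# Apply noindex to every PDF the server returns.
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex";
}
```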

The silent deindex risk

The X-Robots-Tag header is frequently the cause of unexplained deindexing. A misconfigured server rule, a CDN configuration change, or a deployment that incorrectly applies a staging environment configuration to production can add noindex to every response — silently removing your entire site from search results without any visible error to users.

⚠️ Warning: After any significant infrastructure change — CDN configuration update, server migration, new reverse proxy — immediately verify that your production pages do not carry an unintended X-Robots-Tag: noindex header. Use our HTTP Headers Check on a sample of key URLs, and monitor Google Search Console's Coverage report for sudden drops in indexed pages.
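The verification itself reduces to a grep over response headers. The sketch below uses a hard-coded sample response; in practice you would pipe `curl -sI` output for each key URL into the same test as a post-deploy smoke check.

```shell
# Fail loudly if any response header carries noindex.
headers="HTTP/2 200
content-type: text/html; charset=utf-8
x-robots-tag: noindex"
if printf '%s\n' "$headers" | grep -qi '^x-robots-tag:.*noindex'; then
  result="WARNING: noindex header on production response"
else
  result="ok"
fi
echo "$result"
```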

Redirect Hygiene for Crawl Budget and PageRank

Redirects are an unavoidable part of managing a site over time, but poorly managed redirects create two compounding problems: crawl budget waste and PageRank dilution.

Crawl budget

Googlebot allocates a crawl budget to each site — a limit on how many URLs it crawls per day based on your server's crawl capacity and your site's perceived importance. Every URL in a redirect chain consumes crawl budget: an intermediate redirect URL costs a crawl even though it serves no indexable content.

For large sites with thousands of redirected URLs, redirect chains can consume a meaningful share of crawl budget, reducing the number of actual content pages Googlebot crawls per day.

PageRank flow through redirects

301 redirects pass PageRank to the destination URL, but chains weaken the signal with each hop. A link pointing to /old-page that chains through /interim then /www-interim before reaching /new-page delivers less equity to /new-page than a direct 301 would.

The redirect hygiene standard

  • Every redirect chain longer than one hop should be collapsed to a direct redirect pointing at the final destination
  • Every URL in your XML sitemap should return 200 OK — not redirect
  • Internal links should point directly to canonical URLs, not to redirected URLs
  • 302 (temporary) redirects used for permanent moves should be converted to 301
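Collapsing a chain means mapping each legacy URL straight to the final destination. The sketch below walks a hard-coded sample map (the /old-page → /interim → /www-interim → /new-page chain described earlier); a real audit would build the map from a crawl export or `curl -sIL` traces.

```shell
# Follow a redirect map until a URL resolves to itself (the final destination).
next() {
  case "$1" in
    /old-page)    echo /interim ;;
    /interim)     echo /www-interim ;;
    /www-interim) echo /new-page ;;
    *)            echo "$1" ;;
  esac
}
url=/old-page
hops=0
while [ "$(next "$url")" != "$url" ]; do
  url=$(next "$url")
  hops=$((hops + 1))
done
echo "final: $url after $hops hops (collapse the chain to one 301)"
```

The fix is then a single rule per legacy URL: /old-page 301s directly to /new-page, and the intermediate rules are retired.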

For the complete redirect chain audit and fix workflow, see our Redirect Chains and Loops guide, which covers detection methods, the correct fix sequence, and post-fix verification.

HTTPS, Mixed Content, and HSTS

HTTPS is a confirmed Google ranking signal and, in practice, a prerequisite for HTTP/2 and HTTP/3 — browsers only negotiate these protocols over TLS. But HTTPS implementation errors can create crawlability and trust problems that are less obvious than a failed certificate.

Mixed content

Mixed content occurs when an HTTPS page loads resources (images, scripts, stylesheets, iframes) over HTTP. Browsers block active mixed content (scripts, iframes) entirely and warn on passive mixed content (images). Googlebot's renderer applies the same blocking rules, so HTTP-loaded resources may simply never load during rendering — and the page may be indexed without them.

Code
# Check for mixed content in page source (basic scan)
curl -s https://example.com | grep -o 'src="http://' | wc -l
curl -s https://example.com | grep -o 'href="http://' | wc -l

HSTS and its SEO implications

The Strict-Transport-Security (HSTS) header instructs browsers to always connect to your domain over HTTPS, even if the user types http:// or follows an HTTP link. Once a browser has seen the header, it upgrades connections to HTTPS internally, skipping the HTTP request entirely — which eliminates the HTTP-to-HTTPS redirect, and its latency, for returning visitors.

From a crawl perspective, the benefit is less direct: Google has not documented that Googlebot honours HSTS the way browsers do, so keep a working single-hop 301 from HTTP to HTTPS in place regardless. HSTS complements that redirect for real users; it does not replace it for crawlers.

Code
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

The preload directive signals consent for inclusion in the HSTS preload list shipped with major browsers (you submit the domain separately at hstspreload.org). Once preloaded, browsers enforce the HTTPS-only policy even on a user's very first visit.

Certificate expiry and crawl interruption

An expired SSL certificate does not just affect users — it breaks Googlebot's ability to crawl your HTTPS site entirely. Googlebot will not crawl URLs that produce SSL errors. If your certificate expires, your crawl coverage drops to zero for HTTPS URLs within hours of the expiry. Monitor certificate expiry proactively — see our SSL Certificate Expiry Monitoring guide for the full monitoring setup.
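Expiry monitoring reduces to a date subtraction. The sketch below uses hard-coded sample dates and GNU `date` syntax; in practice `not_after` would come from `openssl s_client -connect example.com:443 </dev/null 2>/dev/null | openssl x509 -noout -enddate`.

```shell
# Days remaining before the certificate expires (sample dates, UTC; GNU date).
not_after="2026-06-01"
today="2026-03-01"
days=$(( ( $(date -u -d "$not_after" +%s) - $(date -u -d "$today" +%s) ) / 86400 ))
echo "certificate expires in $days days"
if [ "$days" -lt 60 ]; then
  echo "renew soon"
fi
```

Wire a check like this into a daily cron job and alert well inside your CA's renewal window.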

Server Response Headers That Affect Crawling

Beyond the headers discussed above, several additional response headers directly influence crawler behaviour.

Content-Type

The Content-Type header tells Googlebot what type of content it has received. If a page returns Content-Type: text/html but delivers binary content, or vice versa, Googlebot may fail to parse the content correctly. Ensure Content-Type accurately reflects the content delivered and includes the charset where applicable: Content-Type: text/html; charset=utf-8.

Cache-Control and Vary

Cache-Control headers affect how Googlebot and intermediate caches handle your content. For pages that change frequently, Cache-Control: no-cache or short max-age values signal to Googlebot that the content should be re-fetched rather than served from cache. For static assets, long max-age values reduce crawl overhead.

The Vary header tells caches (and crawlers) which request headers affect the response. Vary: User-Agent signals that the content changes based on the requesting client — relevant for mobile-first indexing if you serve different content to different user agents. Googlebot uses Vary to understand whether it needs to crawl a URL separately as both desktop and mobile Googlebot.

Server header and fingerprinting

The Server response header identifies your web server software and version (e.g., Server: nginx/1.24.0). This is not an SEO signal, but it is security-relevant — exposing version information aids attackers in targeting known vulnerabilities. Consider setting a generic Server header or suppressing it entirely.

Infrastructure Audit Checklist

Use this checklist to audit the infrastructure layer of your technical SEO.

DNS

  • A record TTL is set to 3600 or lower — fast enough to allow planned changes to propagate within hours
  • No unnecessary CNAME chains adding extra DNS lookup overhead
  • All authoritative nameservers return consistent records — verify with DNS Check
  • DNS propagation confirmed after any recent nameserver or record changes

HTTP Status Codes

  • No soft 404s — pages that don't exist return genuine 404 or 410, not 200
  • Deleted pages return 410 where immediate deindexing is desired
  • No unintended 500 errors on production — monitor server error rates
  • Maintenance windows use 503 with a Retry-After header

Canonical Signals

  • Canonical HTTP header and in-page canonical tag agree on every page — verify with HTTP Headers Check
  • No canonical conflicts introduced by CDN, reverse proxy, or CMS plugin interactions
  • PDFs and non-HTML resources have appropriate canonical headers if they should be indexed

X-Robots-Tag

  • Production pages do not carry X-Robots-Tag: noindex — verify after every deployment
  • Staging environments are blocked from indexing via X-Robots-Tag or IP restriction
  • PDFs and documents that shouldn't be indexed have X-Robots-Tag: noindex

Redirects

  • No redirect chains longer than one hop — use Redirect Checker to audit
  • XML sitemap contains only URLs returning 200 OK
  • Internal links point directly to canonical destination URLs
  • No 302 redirects used for permanent moves

HTTPS and Security

  • SSL certificate is valid and not expiring within 60 days
  • No mixed content on any HTTPS pages
  • HSTS header is set with appropriate max-age
  • HTTP to HTTPS redirect is a single-hop 301

Frequently Asked Questions

Q: Does DNS provider choice affect SEO?

Indirectly, yes. DNS resolution speed contributes to TTFB, which in turn sets the floor for Largest Contentful Paint, a Core Web Vitals metric. An authoritative DNS provider with globally distributed anycast infrastructure (Cloudflare DNS, Route 53, Google Cloud DNS) resolves faster from more geographic locations than a single-region provider. The impact is typically small relative to server processing time, but on sites optimising every millisecond of TTFB, nameserver choice matters.

Q: Can a CDN introduce technical SEO problems?

Yes, in several ways. CDNs can inject or modify HTTP response headers — including X-Robots-Tag, Link: rel="canonical", and Cache-Control — that conflict with what your origin server sends. CDNs can also cache error pages (including 404s) and serve them with a 200 status code (creating soft 404s), or cache redirect responses and serve stale redirect chains. Audit your CDN configuration specifically for header injection rules and cache behaviours that affect SEO-sensitive responses.

Q: How do I check if my site has canonical conflicts?

Run our HTTP Headers Check on each URL and compare the Link: rel="canonical" header in the HTTP response against the <link rel="canonical"> tag in the page source. Both should point to the same URL. Discrepancies indicate a conflict that needs to be resolved — identify whether the CDN, reverse proxy, or CMS is the source of the conflicting signal.

Q: Does HTTPS affect Googlebot's crawl rate?

HTTPS itself doesn't directly affect crawl rate, but HTTPS performance does. TLS handshake overhead adds latency to every HTTPS connection. If your TLS configuration is slow (weak cipher suite selection, missing session resumption, no OCSP stapling), it contributes to higher TTFB and potentially slower perceived server response — which can affect how aggressively Googlebot crawls.

Q: My Search Console shows a sudden drop in indexed pages — what should I check first?

Check these in order: (1) X-Robots-Tag: noindex on production pages — a deployment may have accidentally applied a staging configuration to production; (2) robots.txt changes — a Disallow: / at the wrong level blocks all crawling; (3) canonical conflicts that redirect all pages to a single URL; (4) SSL certificate expiry or errors preventing HTTPS crawling; (5) DNS resolution failures if pages are inaccessible. Use HTTP Headers Check and DNS Check to verify each layer.

Next Steps

Start with the Infrastructure Audit Checklist above — run through each category against your live production environment. Use HTTP Headers Check to verify your response headers, Redirect Checker to audit redirect chains, and DNS Check to confirm nameserver consistency.

For the redirect chain audit in full detail, see our Redirect Chains and Loops guide.

For SSL certificate health monitoring — keeping your HTTPS crawling available and avoiding sudden crawl dropouts from certificate expiry — see our SSL Certificate Expiry Monitoring guide.

Browse all webmaster guides on DNSnexus for related technical SEO topics.