Webmaster & Technical SEO · 10 min · Updated 2026-03-01

DNS Infrastructure and Technical SEO: What Every Webmaster Should Know

Most technical SEO checklists cover sitemaps, robots.txt, and structured data. Fewer cover the infrastructure layer where many crawlability and indexability problems actually originate. DNS resolution latency adds to your Time to First Byte before a single line of HTML is served. A mis-set X-Robots-Tag response header can silently deindex pages that are perfectly accessible to users. A canonical HTTP header conflicting with an in-page canonical tag creates ambiguous signals that Googlebot has to resolve with imperfect heuristics. Technical SEO at the infrastructure level is where changes have the highest leverage and the lowest visibility — problems here are invisible to users but fully visible to crawlers. This guide covers the infrastructure-layer signals that shape how Googlebot discovers, crawls, and indexes your site.

How DNS Resolution Affects TTFB and Crawl Efficiency

Every request Googlebot makes to your site begins with a DNS lookup. Before a single byte of your page is fetched, Googlebot's infrastructure must resolve your domain to an IP address. This lookup adds latency to every crawl request — and while Googlebot caches DNS resolutions, that cache has a finite TTL aligned with the TTL you set on your DNS records.

DNS TTL and crawler behaviour

If your A record TTL is 86400 (24 hours), Googlebot resolves your domain once and caches the result for up to 24 hours. This is efficient for stable infrastructure. If your TTL is 300 (5 minutes), Googlebot resolves your domain much more frequently — adding DNS lookup overhead to a larger proportion of crawl requests.

For most sites, a TTL of 3600 (1 hour) on A records is a reasonable balance: fast enough for planned infrastructure changes to propagate within an hour, without generating excessive DNS query overhead for crawlers.
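A quick way to confirm the TTL crawlers actually see is to read the second field of a `dig` answer. The sketch below parses a hard-coded sample line so it runs offline; in a real check you would substitute the output of `dig +noall +answer example.com A` for your own domain.

```shell
# Sample `dig +noall +answer` output; the TTL is the second field.
# (In practice: answer=$(dig +noall +answer example.com A))
answer="example.com.  3600  IN  A  93.184.216.34"
ttl=$(printf '%s\n' "$answer" | awk '{print $2}')
echo "A record TTL: ${ttl}s"
```

If the reported TTL differs from what your DNS console shows, an upstream resolver cache or a CDN-managed record is usually the reason.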

💡 Tip: The DNS TTL that matters most for crawlers is on your A record (or CNAME if you use one). Use our DNS Lookup tool to verify the current TTL on your records and cross-reference with how frequently you need to make infrastructure changes. See our DNS TTL guide for the full TTL optimisation playbook.

DNS resolution latency and TTFB

TTFB (Time to First Byte) is not itself a Core Web Vitals metric, but it sets the floor for Largest Contentful Paint, which is — and server responsiveness feeds into how Google evaluates and crawls your site. DNS resolution time is included in TTFB when the resolver cache is cold: the total time from request to first byte is DNS lookup + TCP connection + TLS handshake + server processing + first byte transmitted.

On a cold DNS cache, resolution from a distant resolver can add 50–200ms before the TCP connection even begins. Reducing this means:

  • Using a fast authoritative DNS provider — major providers like Cloudflare DNS, Route 53, and Google Cloud DNS have globally distributed anycast infrastructure that responds faster than single-location authoritative nameservers
  • Keeping TTLs at a reasonable level — very low TTLs force more frequent re-resolution from cold caches
  • Avoiding unnecessary CNAME chains — each CNAME in a chain requires an additional DNS lookup. www.example.com → cdn.example.net → actual-ip costs two lookups instead of one
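To make the CNAME cost concrete, the sketch below counts the extra hops in a resolution chain. The chain is a hard-coded sample in dig's answer format (hypothetical hostnames); in a real audit you would substitute the output of `dig +noall +answer www.example.com`.

```shell
# Each CNAME line in the answer section is an extra lookup before
# the final A record resolves.
chain="www.example.com.    300  IN  CNAME  cdn.example.net.
cdn.example.net.    300  IN  CNAME  lb.example-cdn.net.
lb.example-cdn.net.  60  IN  A      203.0.113.10"
hops=$(printf '%s\n' "$chain" | grep -c 'CNAME')
echo "CNAME hops before the A record: $hops"
```

Two hops here means a cold-cache resolver performs three lookups where a flattened record would need one.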

Nameserver consistency

Googlebot — like all DNS resolvers — queries your authoritative nameservers and may receive different results from different nameservers if your zone is in an inconsistent state. During nameserver migrations or propagation windows, some of Googlebot's queries may hit old nameservers and some may hit new ones, producing inconsistent crawl results. Use our DNS Check tool during any nameserver change to verify all authoritative nameservers return consistent records before Googlebot's next crawl cycle.

HTTP Status Codes and What Googlebot Does With Each

Every HTTP response code sends a specific signal to Googlebot about how to treat the content at that URL. Getting these right is foundational technical SEO.

Status Code | Googlebot Behaviour | SEO Implication
200 OK | Crawls and indexes content | Normal — content is eligible for indexing
301 Moved Permanently | Follows redirect, transfers PageRank, updates index | Preferred for permanent URL changes
302 Found (Temporary) | Follows redirect, may not transfer PageRank, keeps original in index | Use only for genuinely temporary redirects
304 Not Modified | Reuses its cached copy — content unchanged since last fetch | Efficient for static content; reduces crawl overhead
404 Not Found | Removes URL from index after repeated 404s | Use for genuinely deleted pages; don't use for soft-blocks
410 Gone | Removes URL from index faster than 404 | Use when a page is permanently deleted and should be removed immediately
429 Too Many Requests | Backs off crawl rate | Server is signalling overload; Googlebot respects this
500 Server Error | Retries later; repeated 500s may cause deindexing | Fix server errors promptly — sustained 500s damage crawl coverage
503 Service Unavailable | Retries later; with a Retry-After header, pauses crawling | Use for maintenance windows; include Retry-After to signal duration

The soft 404 problem: Returning 200 OK for a page that displays a "not found" or "no results" message is a soft 404. Googlebot may index these pages as thin content or eventually detect them as duplicate content. Always return genuine 404 or 410 status codes for pages that don't exist — don't redirect them to your homepage (which creates a different problem) or return 200 with error content.

The 410 vs 404 decision: Both eventually result in deindexing, but 410 signals permanence and typically results in faster removal. Use 410 for pages that are definitively gone and will never return. Use 404 for pages that may be restored or whose removal is uncertain.
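A crude way to surface soft 404s in an audit is to flag URLs that return 200 while the body reads like an error page. The sketch below runs on hard-coded sample values; in practice `status` and `body` would come from something like `curl -s -o body.html -w '%{http_code}'` for each URL in your crawl list.

```shell
# Soft-404 heuristic: the status says OK, the body says otherwise.
status=200
body="Sorry, the page you requested was not found."
if [ "$status" = "200" ] && printf '%s' "$body" | grep -qiE 'not found|no results'; then
  verdict="possible soft 404"
else
  verdict="ok"
fi
echo "$verdict"
```

The phrase list is site-specific — extend the pattern with whatever your error templates actually say.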

The Canonical HTTP Header: When to Use It and When It Conflicts

The canonical HTTP header (Link: <url>; rel="canonical") is a server-level signal that tells Googlebot which URL is the preferred version of a page — functionally equivalent to the in-page <link rel="canonical"> tag, but delivered at the HTTP response level rather than in the HTML.

Code
# Check canonical header on any URL
curl -sI https://example.com/page | grep -i "link:"
# Link: <https://example.com/page>; rel="canonical"

Use our HTTP Headers Check tool to inspect canonical headers across your URLs without command-line access.

When the canonical HTTP header is the right choice

  • Non-HTML content: PDFs, Word documents, and other non-HTML files cannot contain a <link rel="canonical"> tag. The HTTP header is the only canonical mechanism available for these file types.
  • Paginated content served dynamically: When page parameters are added to URLs server-side, the canonical header can be set uniformly at the response level without modifying each HTML page.
  • Proxied or CDN-served content: CDNs can inject canonical headers at the edge, allowing canonical signals to be applied to content you don't directly control at the HTML level.
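As a sketch of the non-HTML case (nginx syntax, hypothetical paths), a canonical header pointing a PDF at its HTML counterpart can be attached at the server level:

```nginx
# Hypothetical paths: canonicalise a PDF to its HTML equivalent.
location = /whitepaper.pdf {
    add_header Link '<https://example.com/whitepaper>; rel="canonical"';
}
```

Apache (`Header set Link ...`) and most CDNs' header-injection rules can express the same thing.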

The conflict problem

When a canonical HTTP header and an in-page canonical tag point to different URLs, Googlebot receives contradictory signals. Google treats conflicting canonicals as ambiguous and falls back on its own heuristics to pick the preferred URL — which may not be the one you intended.

Common scenarios that create conflicts:

  • CMS plugins that add in-page canonicals while the CDN or reverse proxy injects a different canonical header
  • URL normalisation rules at the CDN adding https://www.example.com/page as the canonical while the CMS is sending https://example.com/page
  • A/B testing tools that modify page URLs while the original page retains its own canonical tag

⚠️ Warning: Canonical conflicts are invisible to users and to many SEO audits that only check in-page tags. Always verify both the HTTP header and the in-page canonical using our HTTP Headers Check tool — check the raw response headers alongside the page source for every URL you are actively canonicalising.
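The conflict check itself is scriptable. The sketch below compares the two signals on hard-coded sample responses; in practice `header` would come from `curl -sI` and `html` from `curl -s` against the same URL.

```shell
# Extract the canonical URL from the HTTP header and from the page source.
header='Link: <https://example.com/page>; rel="canonical"'
html='<link rel="canonical" href="https://www.example.com/page">'
from_header=$(printf '%s' "$header" | sed 's/.*<\([^>]*\)>.*/\1/')
from_page=$(printf '%s' "$html" | sed 's/.*href="\([^"]*\)".*/\1/')
if [ "$from_header" != "$from_page" ]; then
  echo "canonical conflict: header=$from_header page=$from_page"
fi
```

This naive parse assumes one Link header and one canonical tag per response; real pages may need a proper HTML parser.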

X-Robots-Tag: Server-Level Indexing Control

The X-Robots-Tag HTTP response header applies the same directives as the HTML <meta name="robots"> tag, but at the response level. This makes it the only mechanism for controlling indexing of non-HTML resources.

Code
X-Robots-Tag: noindex
X-Robots-Tag: noindex, nofollow
X-Robots-Tag: noindex, noarchive
X-Robots-Tag: googlebot: noindex    ← bot-specific directive
Code
# Check X-Robots-Tag on any URL
curl -sI https://example.com/document.pdf | grep -i "x-robots"

When X-Robots-Tag is essential

PDFs and documents you don't want indexed: A PDF hosted on your server will be indexed by default. If it contains sensitive information, internal documentation, or content that shouldn't appear in search results, an X-Robots-Tag: noindex header is the correct control mechanism.

Staging environments served over public URLs: If your staging environment is accidentally publicly accessible and crawlable, an X-Robots-Tag: noindex applied server-wide at the staging domain prevents the content from appearing in search results without requiring per-page robots meta tags.

URL parameter variants you're not canonicalising: If tracking parameters or session IDs generate unique URLs that aren't canonicalised, applying X-Robots-Tag: noindex via server rules prevents parameter variants from accumulating in the index.
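For the PDF case, a server-wide rule is the usual implementation. A sketch in nginx syntax (adapt to your stack — Apache's `Header set` or a CDN edge rule can do the same):

```nginx
# Apply noindex to every PDF the server returns.
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex";
}
```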

The silent deindex risk

The X-Robots-Tag header is frequently the cause of unexplained deindexing. A misconfigured server rule, a CDN configuration change, or a deployment that incorrectly applies a staging environment configuration to production can add noindex to every response — silently removing your entire site from search results without any visible error to users.

⚠️ Warning: After any significant infrastructure change — CDN configuration update, server migration, new reverse proxy — immediately verify that your production pages do not carry an unintended X-Robots-Tag: noindex header. Use our HTTP Headers Check on a sample of key URLs, and monitor Google Search Console's Coverage report for sudden drops in indexed pages.
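The verification itself reduces to a grep over response headers. The sketch below uses a hard-coded sample response; in practice you would pipe `curl -sI` output for each key URL into the same test as a post-deploy smoke check.

```shell
# Fail loudly if any response header carries noindex.
headers="HTTP/2 200
content-type: text/html; charset=utf-8
x-robots-tag: noindex"
if printf '%s\n' "$headers" | grep -qi '^x-robots-tag:.*noindex'; then
  result="WARNING: noindex header on production response"
else
  result="ok"
fi
echo "$result"
```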

Redirect Hygiene for Crawl Budget and PageRank

Redirects are an unavoidable part of managing a site over time, but poorly managed redirects create two compounding problems: crawl budget waste and PageRank dilution.

Crawl budget

Googlebot allocates a crawl budget to each site — a limit on how many URLs it crawls per day based on your server's crawl capacity and your site's perceived importance. Every URL in a redirect chain consumes crawl budget: an intermediate redirect URL costs a crawl even though it serves no indexable content.

For large sites with thousands of redirected URLs, redirect chains can consume a meaningful share of crawl budget, reducing the number of actual content pages Googlebot crawls per day.

PageRank flow through redirects

301 redirects pass PageRank to the destination URL, but chains weaken the signal with each hop. A link pointing to /old-page that chains through /interim then /www-interim before reaching /new-page delivers less equity to /new-page than a direct 301 would.

The redirect hygiene standard

  • Every redirect chain longer than one hop should be collapsed to a direct redirect pointing at the final destination
  • Every URL in your XML sitemap should return 200 OK — not redirect
  • Internal links should point directly to canonical URLs, not to redirected URLs
  • 302 (temporary) redirects used for permanent moves should be converted to 301
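Collapsing a chain means mapping each legacy URL straight to the final destination. The sketch below walks a hard-coded sample map (the /old-page → /interim → /www-interim → /new-page chain described earlier); a real audit would build the map from a crawl export or `curl -sIL` traces.

```shell
# Follow a redirect map until a URL resolves to itself (the final destination).
next() {
  case "$1" in
    /old-page)    echo /interim ;;
    /interim)     echo /www-interim ;;
    /www-interim) echo /new-page ;;
    *)            echo "$1" ;;
  esac
}
url=/old-page
hops=0
while [ "$(next "$url")" != "$url" ]; do
  url=$(next "$url")
  hops=$((hops + 1))
done
echo "final: $url after $hops hops (collapse the chain to one 301)"
```

The fix is then a single rule per legacy URL: /old-page 301s directly to /new-page, and the intermediate rules are retired.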

For the complete redirect chain audit and fix workflow, see our Redirect Chains and Loops guide, which covers detection methods, the correct fix sequence, and post-fix verification.

HTTPS, Mixed Content, and HSTS

HTTPS is a confirmed Google ranking signal and, in practice, a prerequisite for HTTP/2 and HTTP/3 — browsers only negotiate these protocols over TLS. But HTTPS implementation errors can create crawlability and trust problems that are less obvious than a failed certificate.

Mixed content

Mixed content occurs when an HTTPS page loads resources (images, scripts, stylesheets, iframes) over HTTP. Browsers block active mixed content (scripts, iframes) entirely and warn on passive mixed content (images). Googlebot's renderer applies the same blocking rules, so HTTP-loaded resources may simply never load during rendering — and the page may be indexed without them.

Code
# Check for mixed content in page source (basic scan)
curl -s https://example.com | grep -o 'src="http://' | wc -l
curl -s https://example.com | grep -o 'href="http://' | wc -l

HSTS and its SEO implications

The Strict-Transport-Security (HSTS) header instructs browsers to always connect to your domain over HTTPS, even if the user types http:// or follows an HTTP link. Once a browser has seen the header, it upgrades connections to HTTPS internally, skipping the HTTP request entirely — which eliminates the HTTP-to-HTTPS redirect, and its latency, for returning visitors.

From a crawl perspective, the benefit is less direct: Google has not documented that Googlebot honours HSTS the way browsers do, so keep a working single-hop 301 from HTTP to HTTPS in place regardless. HSTS complements that redirect for real users; it does not replace it for crawlers.

Code
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

The preload directive signals consent for inclusion in the HSTS preload list shipped with major browsers (you submit the domain separately at hstspreload.org). Once preloaded, browsers enforce the HTTPS-only policy even on a user's very first visit.

Certificate expiry and crawl interruption

An expired SSL certificate does not just affect users — it breaks Googlebot's ability to crawl your HTTPS site entirely. Googlebot will not crawl URLs that produce SSL errors. If your certificate expires, your crawl coverage drops to zero for HTTPS URLs within hours of the expiry. Monitor certificate expiry proactively — see our SSL Certificate Expiry Monitoring guide for the full monitoring setup.
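Expiry monitoring reduces to a date subtraction. The sketch below uses hard-coded sample dates and GNU `date` syntax; in practice `not_after` would come from `openssl s_client -connect example.com:443 </dev/null 2>/dev/null | openssl x509 -noout -enddate`.

```shell
# Days remaining before the certificate expires (sample dates, UTC; GNU date).
not_after="2026-06-01"
today="2026-03-01"
days=$(( ( $(date -u -d "$not_after" +%s) - $(date -u -d "$today" +%s) ) / 86400 ))
echo "certificate expires in $days days"
if [ "$days" -lt 60 ]; then
  echo "renew soon"
fi
```

Wire a check like this into a daily cron job and alert well inside your CA's renewal window.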

Server Response Headers That Affect Crawling

Beyond the headers discussed above, several additional response headers directly influence crawler behaviour.

Content-Type

The Content-Type header tells Googlebot what type of content it has received. If a page returns Content-Type: text/html but delivers binary content, or vice versa, Googlebot may fail to parse the content correctly. Ensure Content-Type accurately reflects the content delivered and includes the charset where applicable: Content-Type: text/html; charset=utf-8.

Cache-Control and Vary

Cache-Control headers affect how Googlebot and intermediate caches handle your content. For pages that change frequently, Cache-Control: no-cache or short max-age values signal to Googlebot that the content should be re-fetched rather than served from cache. For static assets, long max-age values reduce crawl overhead.

The Vary header tells caches (and crawlers) which request headers affect the response. Vary: User-Agent signals that the content changes based on the requesting client — relevant for mobile-first indexing if you serve different content to different user agents. Googlebot uses Vary to understand whether it needs to crawl a URL separately as both desktop and mobile Googlebot.

Server header and fingerprinting

The Server response header identifies your web server software and version (e.g., Server: nginx/1.24.0). This is not an SEO signal, but it is security-relevant — exposing version information aids attackers in targeting known vulnerabilities. Consider setting a generic Server header or suppressing it entirely.

Infrastructure Audit Checklist

Use this checklist to audit the infrastructure layer of your technical SEO.

DNS

  • A record TTL is set to 3600 or lower — fast enough to allow planned changes to propagate within hours
  • No unnecessary CNAME chains adding extra DNS lookup overhead
  • All authoritative nameservers return consistent records — verify with DNS Check
  • DNS propagation confirmed after any recent nameserver or record changes

HTTP Status Codes

  • No soft 404s — pages that don't exist return genuine 404 or 410, not 200
  • Deleted pages return 410 where immediate deindexing is desired
  • No unintended 500 errors on production — monitor server error rates
  • Maintenance windows use 503 with a Retry-After header

Canonical Signals

  • Canonical HTTP header and in-page canonical tag agree on every page — verify with HTTP Headers Check
  • No canonical conflicts introduced by CDN, reverse proxy, or CMS plugin interactions
  • PDFs and non-HTML resources have appropriate canonical headers if they should be indexed

X-Robots-Tag

  • Production pages do not carry X-Robots-Tag: noindex — verify after every deployment
  • Staging environments are blocked from indexing via X-Robots-Tag or IP restriction
  • PDFs and documents that shouldn't be indexed have X-Robots-Tag: noindex

Redirects

  • No redirect chains longer than one hop — use Redirect Checker to audit
  • XML sitemap contains only URLs returning 200 OK
  • Internal links point directly to canonical destination URLs
  • No 302 redirects used for permanent moves

HTTPS and Security

  • SSL certificate is valid and not expiring within 60 days
  • No mixed content on any HTTPS pages
  • HSTS header is set with appropriate max-age
  • HTTP to HTTPS redirect is a single-hop 301

Frequently Asked Questions

Q: Does DNS provider choice affect SEO?

Indirectly, yes. DNS resolution speed contributes to TTFB, which in turn sets the floor for Largest Contentful Paint, a Core Web Vitals metric. An authoritative DNS provider with globally distributed anycast infrastructure (Cloudflare DNS, Route 53, Google Cloud DNS) resolves faster from more geographic locations than a single-region provider. The impact is typically small relative to server processing time, but on sites optimising every millisecond of TTFB, nameserver choice matters.

Q: Can a CDN introduce technical SEO problems?

Yes, in several ways. CDNs can inject or modify HTTP response headers — including X-Robots-Tag, Link: rel="canonical", and Cache-Control — that conflict with what your origin server sends. CDNs can also cache error pages (including 404s) and serve them with a 200 status code (creating soft 404s), or cache redirect responses and serve stale redirect chains. Audit your CDN configuration specifically for header injection rules and cache behaviours that affect SEO-sensitive responses.

Q: How do I check if my site has canonical conflicts?

Run our HTTP Headers Check on each URL and compare the Link: rel="canonical" header in the HTTP response against the <link rel="canonical"> tag in the page source. Both should point to the same URL. Discrepancies indicate a conflict that needs to be resolved — identify whether the CDN, reverse proxy, or CMS is the source of the conflicting signal.

Q: Does HTTPS affect Googlebot's crawl rate?

HTTPS itself doesn't directly affect crawl rate, but HTTPS performance does. TLS handshake overhead adds latency to every HTTPS connection. If your TLS configuration is slow (weak cipher suite selection, missing session resumption, no OCSP stapling), it contributes to higher TTFB and potentially slower perceived server response — which can affect how aggressively Googlebot crawls.

Q: My Search Console shows a sudden drop in indexed pages — what should I check first?

Check these in order: (1) X-Robots-Tag: noindex on production pages — a deployment may have accidentally applied a staging configuration to production; (2) robots.txt changes — a Disallow: / at the wrong level blocks all crawling; (3) canonical conflicts that redirect all pages to a single URL; (4) SSL certificate expiry or errors preventing HTTPS crawling; (5) DNS resolution failures if pages are inaccessible. Use HTTP Headers Check and DNS Check to verify each layer.

Next Steps

Start with the Infrastructure Audit Checklist above — run through each category against your live production environment. Use HTTP Headers Check to verify your response headers, Redirect Checker to audit redirect chains, and DNS Check to confirm nameserver consistency.

For the redirect chain audit in full detail, see our Redirect Chains and Loops guide.

For SSL certificate health monitoring — keeping your HTTPS crawling available and avoiding sudden crawl dropouts from certificate expiry — see our SSL Certificate Expiry Monitoring guide.

Browse all webmaster guides on DNSnexus for related technical SEO topics.