DNS Monitoring Best Practices for Reliability

Why DNS Is Critical Infrastructure

Every interaction a user has with your application begins with a DNS lookup. Before a single byte of HTML is transferred, before any TLS handshake occurs, a DNS query must resolve your domain name to an IP address. If that resolution fails or returns the wrong answer, nothing else matters -- your application is effectively unreachable regardless of whether the servers behind it are running perfectly.

Despite this fundamental dependency, DNS monitoring is frequently overlooked. Teams invest heavily in application performance monitoring, server health checks, and uptime monitoring, but treat DNS as a static configuration that never changes and never breaks. That assumption is dangerous.

DNS failures come in several forms, each with distinct characteristics and consequences. Complete resolution failures mean your domain simply stops working -- browsers display connection errors and APIs return nothing. Incorrect resolution, where queries return the wrong IP address, can silently redirect traffic to the wrong server or, in the worst case, to an attacker-controlled host. Slow resolution adds latency to every single request, compounding across page loads that may trigger dozens of DNS lookups for different subdomains and third-party resources.

The cascading nature of DNS failures makes them particularly damaging. A misconfigured nameserver doesn't just affect your website -- it can break email delivery, API integrations, CDN routing, certificate validation, and every other service tied to your domain. A single DNS change gone wrong can simultaneously take down every digital service your organisation operates.

Proper DNS monitoring is not optional for any organisation that depends on its online presence. It's the foundation layer that everything else sits upon, and monitoring it effectively requires understanding what can go wrong and how to detect it before your users do.

Common DNS Failures and Their Impact

Understanding the failure modes helps you design monitoring that catches problems early. Here are the most common DNS failures that affect production services.

Nameserver outages

Your authoritative nameservers are the ultimate source of truth for your domain's DNS records. If all of them become unavailable simultaneously, resolvers worldwide will start returning SERVFAIL errors once their cached records expire. The severity depends on your TTL values -- short TTLs mean faster failure but also faster recovery, while long TTLs provide a buffer but delay the propagation of legitimate changes.

Accidental record deletion or modification

Human error remains the leading cause of DNS incidents. A team member modifying DNS records in a web console can easily delete the wrong record, change an A record to point to a decommissioned server, or introduce a typo in a CNAME target. These mistakes are trivially easy to make and can be devastatingly difficult to diagnose without monitoring in place.

Registrar and delegation issues

Domain expiration is a surprisingly common cause of catastrophic outages, even for large organisations. When a domain expires, the registrar may suspend its delegation so that it stops resolving entirely, or worse, repoint it to a parking page. Similarly, if the NS delegation records at the registrar don't match your actual nameservers, resolution can fail unpredictably.

DNS cache poisoning

Cache poisoning attacks inject false records into recursive resolvers, redirecting traffic intended for your domain to a malicious server. While DNSSEC helps mitigate this, adoption remains incomplete, and monitoring for unexpected resolution results is an important complementary defence.

Provider-level failures

Even major DNS hosting providers experience outages. If your DNS is hosted with a single provider and that provider has an incident, your domain becomes unresolvable. This risk can be mitigated with secondary DNS providers, but monitoring is essential to detect when a failover is needed and to verify it worked correctly.

TTL misconfiguration

Setting TTLs too high means that incorrect records persist in caches for hours or days after you fix them. Setting them too low increases query volume and makes your service more vulnerable to nameserver outages. Monitoring should track TTL values alongside the records themselves to ensure they remain within your intended ranges.
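As a minimal sketch of that check, the Python below compares observed TTLs against an intended range for each record type. The TTL_POLICY values and the ttl_violations helper are hypothetical, chosen only for illustration. The observed TTLs are assumed to come from your authoritative nameservers, since a recursive resolver reports the time remaining in its cache rather than the configured value.

```python
from typing import Dict, Tuple

# Hypothetical policy: acceptable (minimum, maximum) TTL in seconds per record type.
TTL_POLICY: Dict[str, Tuple[int, int]] = {
    "A": (60, 3600),
    "MX": (300, 86400),
    "NS": (3600, 172800),
}


def ttl_violations(
    observed: Dict[str, int],
    policy: Dict[str, Tuple[int, int]] = TTL_POLICY,
) -> Dict[str, str]:
    """Return {record_type: reason} for every observed TTL outside its range."""
    problems: Dict[str, str] = {}
    for rtype, ttl in observed.items():
        if rtype not in policy:
            continue  # no intended range defined for this type
        low, high = policy[rtype]
        if ttl < low:
            problems[rtype] = f"TTL {ttl}s is below the minimum of {low}s"
        elif ttl > high:
            problems[rtype] = f"TTL {ttl}s is above the maximum of {high}s"
    return problems


# Example with made-up observations: the A record TTL is too low.
print(ttl_violations({"A": 30, "MX": 3600}))
```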

What DNS Records to Monitor

Effective DNS monitoring means watching the specific record types that matter to your services. Each type serves a different purpose, and failures in each have different consequences.

A and AAAA records

These are the most fundamental records, mapping your domain to IPv4 (A) and IPv6 (AAAA) addresses. Monitor these for your primary domain, www subdomain, and any application subdomains (app.example.com, api.example.com). Verify that the returned IP addresses match your expected infrastructure. A DNS lookup tool is invaluable for quick spot-checks, but automated monitoring should run continuously.
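One way to automate the "matches your expected infrastructure" check is to compare each resolved address against a list of known network ranges. The sketch below uses only the Python standard library. EXPECTED_NETWORKS holds reserved documentation prefixes as stand-ins for your real ranges, and the helper names are assumptions for illustration. Note that resolve goes through the system resolver, so it reflects whatever that resolver (and any local hosts file) returns.

```python
import ipaddress
import socket
from typing import Iterable, List

# Hypothetical infrastructure ranges (reserved documentation prefixes).
EXPECTED_NETWORKS = ("203.0.113.0/24", "2001:db8::/32")


def resolve(hostname: str) -> List[str]:
    """Resolve a hostname via the system resolver, returning unique addresses."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def unexpected_addresses(
    resolved: Iterable[str],
    expected_networks: Iterable[str] = EXPECTED_NETWORKS,
) -> List[str]:
    """Return the resolved addresses that fall outside every expected network."""
    networks = [ipaddress.ip_network(net) for net in expected_networks]
    unexpected = []
    for raw in resolved:
        addr = ipaddress.ip_address(raw)
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        if not any(addr in net for net in networks):
            unexpected.append(raw)
    return unexpected
```

In a monitor you might call unexpected_addresses(resolve("app.example.com")) on a schedule and alert whenever the result is non-empty.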

CNAME records

CNAME records create aliases, often used to point subdomains to CDN endpoints or load balancers. A broken CNAME chain -- where the target no longer exists or has changed -- can silently take a subdomain offline. Monitor both the CNAME record itself and the resolution of its target to catch chain breaks.
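To illustrate what "monitoring the chain" means, the sketch below walks a CNAME chain held in plain dictionaries and reports whether it ends in an address, breaks, or loops. The record data and the follow_cname_chain helper are hypothetical; a real monitor would fill these dictionaries from live DNS queries and would normalise case and trailing dots first.

```python
from typing import Dict, List, Tuple


def follow_cname_chain(
    name: str,
    cnames: Dict[str, str],
    addresses: Dict[str, List[str]],
    max_depth: int = 8,
) -> Tuple[List[str], str]:
    """Follow CNAMEs from `name`; return (chain, status).

    status is "ok", "broken" (target has no records), "loop", or "too_long".
    """
    chain = [name]
    seen = {name}
    current = name
    for _ in range(max_depth):
        if current in addresses:
            return chain, "ok"
        target = cnames.get(current)
        if target is None:
            return chain, "broken"  # dangling alias: nothing to resolve
        chain.append(target)
        if target in seen:
            return chain, "loop"
        seen.add(target)
        current = target
    return chain, "too_long"


# Made-up example: www points at a CDN alias whose final target has vanished.
cnames = {"www.example.com": "cdn.example.net", "cdn.example.net": "edge.example.net"}
print(follow_cname_chain("www.example.com", cnames, addresses={}))
```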

MX records

Mail exchange records control where email for your domain is delivered. If MX records are deleted, modified, or pointed to the wrong server, incoming email can be delayed, bounced, or delivered to a host you don't control. These failures are often slow to surface: sending servers typically queue and retry for hours or days before returning a bounce, so neither the sender nor the recipient may notice straight away. Use an MX lookup tool regularly, and set up automated monitoring that alerts on any change to your MX configuration.

TXT records

TXT records serve multiple critical functions: SPF records for email authentication, DKIM keys for email signing, DMARC policies for email security, and domain verification tokens for various services. An accidentally deleted SPF record can cause your legitimate emails to be flagged as spam across the internet. An SPF checker helps verify your email authentication configuration, but continuous monitoring ensures it stays correct over time.
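A simple continuous check for the SPF case is to confirm that exactly one TXT record at the domain starts with v=spf1, since a missing record removes the policy and more than one is treated as an error by SPF evaluators. The sketch below works on TXT strings you have already fetched. The spf_status name and the sample records are hypothetical.

```python
from typing import List


def spf_status(txt_records: List[str]) -> str:
    """Classify a domain's TXT records as "ok", "missing", or "multiple" SPF."""
    spf = []
    for record in txt_records:
        value = record.strip().lower()
        # The version tag must be exactly "v=spf1", followed by a space or nothing.
        if value == "v=spf1" or value.startswith("v=spf1 "):
            spf.append(record)
    if not spf:
        return "missing"
    if len(spf) > 1:
        return "multiple"
    return "ok"


# Made-up TXT records for illustration.
records = [
    "v=spf1 include:_spf.example.com -all",
    "some-service-verification=abc123",
]
print(spf_status(records))
```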

NS records

Nameserver records define which servers are authoritative for your domain. These rarely change, which means unauthorised changes are a strong signal of either a configuration error or a security incident. Monitor NS records and alert immediately on any modification.

SOA records

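The Start of Authority record contains metadata about your DNS zone, including a serial number that is conventionally incremented with each change. Monitoring the SOA serial number gives you a high-level signal that something in your zone has changed, prompting a more detailed investigation of what specifically was modified. Some managed DNS providers do not update the serial on every edit, so treat it as a complement to record-level checks rather than a replacement for them.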
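When comparing serials, a plain greater-than test is not quite right, because SOA serials are 32-bit values that wrap around and are compared using the serial number arithmetic defined in RFC 1982. The sketch below shows that comparison plus a hypothetical classify_serial helper that a monitor might use. The status labels are assumptions for illustration.

```python
def serial_is_newer(new: int, old: int) -> bool:
    """True if `new` is later than `old` under RFC 1982 arithmetic (32-bit)."""
    if new == old:
        return False
    # Distance travelled forwards from old to new, allowing for wrap-around.
    forward = (new - old) % 2**32
    return forward < 2**31


def classify_serial(previous: int, current: int, change_expected: bool) -> str:
    """Label a serial observation for alerting (labels are illustrative)."""
    if current == previous:
        return "unchanged"
    if not serial_is_newer(current, previous):
        return "went_backwards"
    return "expected_change" if change_expected else "unexpected_change"


# Made-up serials in the common YYYYMMDDnn style.
print(classify_serial(2024010101, 2024010102, change_expected=False))
```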

CAA records

Certificate Authority Authorization records specify which certificate authorities are permitted to issue TLS certificates for your domain. If CAA records are deleted or modified, a CA you don't use is no longer prevented from issuing for your domain, which removes one barrier for an attacker trying to obtain a valid certificate. Monitor these as part of your broader certificate security posture.

Detecting DNS Hijacking and Unauthorised Changes

DNS hijacking -- where an attacker gains control of your DNS records and redirects traffic to their servers -- is one of the most dangerous attacks a domain can suffer. Because the hijacked domain still appears legitimate (it resolves, the URL looks correct), users and even automated systems may not notice until significant damage has been done.

How hijacking occurs

Attackers typically gain DNS control through one of several vectors: compromised registrar accounts (often through weak passwords or social engineering), exploited vulnerabilities in DNS management APIs, compromised hosting provider accounts, or BGP hijacking that redirects queries to rogue nameservers. In high-profile cases, attackers have even used fraudulent registrar transfer requests to take control of domains.

Monitoring for hijacking indicators

Effective detection relies on continuously verifying that your DNS records match your expected configuration. Set up monitors that check:

A/AAAA record IP addresses -- Alert if the resolved IP addresses change to values outside your known infrastructure ranges
NS record values -- Any change to your nameserver delegation is a critical event requiring immediate investigation
SOA serial numbers -- Unexpected serial number changes indicate zone modifications you may not have authorised
WHOIS data -- Changes to registrant information, nameservers in WHOIS, or registrar transfer status can indicate an account compromise
Certificate transparency logs -- New certificates issued for your domain that you didn't request may indicate an attacker is preparing a hijacked endpoint with valid TLS

Multi-location verification

Hijacking attacks sometimes target specific geographic regions by poisoning resolvers in particular networks. Monitoring from a single location might miss an attack that only affects users in another country. Use DNS monitoring that queries from multiple geographic locations to detect region-specific manipulation.
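As a rough sketch of how results from several vantage points could be compared, the function below finds the locations whose answer differs from the most common one. The location names and addresses are made up, and outlier_locations is a hypothetical helper. Bear in mind that geo-aware DNS and CDNs legitimately return different answers by region, so for such records each answer should be checked against the expected set for that region instead.

```python
from collections import Counter
from typing import Dict, FrozenSet, List


def outlier_locations(answers: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return locations whose answer set differs from the majority answer."""
    if not answers:
        return []
    # On a tie, the answer seen first is treated as the majority.
    majority, _count = Counter(answers.values()).most_common(1)[0]
    return sorted(loc for loc, result in answers.items() if result != majority)


# Made-up results: one region sees a different address.
answers = {
    "london": frozenset({"203.0.113.10"}),
    "virginia": frozenset({"203.0.113.10"}),
    "singapore": frozenset({"198.51.100.66"}),
}
print(outlier_locations(answers))
```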

Baseline and anomaly detection

Establish a baseline of your expected DNS configuration and alert on any deviation. This should include not just the record values but also the response metadata: TTL values, the responding nameserver, and response times. Sudden changes in any of these parameters warrant investigation even if the record values themselves appear unchanged.
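The baseline comparison can be as simple as a set difference per record. The sketch below assumes records are keyed by a "name/TYPE" string and stored as sets of values. That layout, the diff_records name, and the sample data are all illustrative choices rather than a prescribed format. The same approach extends to TTLs or responding nameservers by adding them as further keys.

```python
from typing import Dict, Set


def diff_records(
    baseline: Dict[str, Set[str]],
    observed: Dict[str, Set[str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """Return, per record key, the values missing from or added to the baseline."""
    changes: Dict[str, Dict[str, Set[str]]] = {}
    for key in sorted(set(baseline) | set(observed)):
        expected = baseline.get(key, set())
        actual = observed.get(key, set())
        if expected != actual:
            changes[key] = {
                "missing": expected - actual,      # in baseline, not seen live
                "unexpected": actual - expected,   # seen live, not in baseline
            }
    return changes


# Made-up data: one nameserver has been swapped for an unknown host.
baseline = {"example.com/NS": {"ns1.example.net.", "ns2.example.net."}}
observed = {"example.com/NS": {"ns1.example.net.", "ns1.unknown.test."}}
print(diff_records(baseline, observed))
```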

DNSSEC validation

If you have deployed DNSSEC, monitor that your signatures remain valid and that the chain of trust from the root zone to your domain is intact. DNSSEC validation failures can indicate either a configuration problem on your side or an active attack. Either way, they require immediate attention.
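One narrow but useful piece of this is watching signature expiry. RRSIG records carry an expiration timestamp, usually displayed as a 14-digit UTC value (YYYYMMDDHHmmSS), and a zone whose signatures lapse will fail validation. The sketch below only checks that time window. It does not verify signatures or the chain of trust, and the rrsig_status name and three-day warning threshold are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def rrsig_time_left(expiration: str, now: datetime) -> timedelta:
    """Time remaining until an RRSIG expiration given as YYYYMMDDHHmmSS (UTC)."""
    expires = datetime.strptime(expiration, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return expires - now


def rrsig_status(
    expiration: str,
    now: datetime,
    warn_within: timedelta = timedelta(days=3),  # hypothetical threshold
) -> str:
    """Return "expired", "expiring_soon", or "ok" for one signature."""
    left = rrsig_time_left(expiration, now)
    if left <= timedelta(0):
        return "expired"
    if left <= warn_within:
        return "expiring_soon"
    return "ok"


# Made-up timestamps for illustration.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
print(rrsig_status("20240111000000", now))
```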

Monitoring DNS Propagation

When you make a legitimate DNS change -- migrating to a new hosting provider, switching CDN endpoints, or updating MX records -- you need to verify that the change propagates correctly across the global DNS infrastructure. Propagation is not instantaneous, and monitoring it prevents you from declaring a migration complete before all users are actually reaching the new destination.

How propagation works

DNS propagation is not really "propagation" in the broadcast sense. It's the gradual expiration and refresh of cached records across thousands of recursive resolvers worldwide. When you change a record, resolvers that have cached the old value will continue serving it until their cached copy expires (governed by the TTL). Only then will they query your authoritative nameserver for the fresh record.

Factors affecting propagation time

The primary factor is the TTL of the record being changed. A record with a 3600-second (one hour) TTL should take at most one hour to propagate after the change, assuming all resolvers respect the TTL. Some resolvers, however, apply minimum TTL floors or cache records longer than specified. This is why 24 to 48 hours is commonly quoted as a conservative window for global propagation, even though the vast majority of traffic shifts within the first few hours.

Pre-migration TTL reduction

A common best practice before planned DNS changes is to reduce the TTL well in advance. If your records currently have a 24-hour TTL, reduce them to 300 seconds (5 minutes) at least 48 hours before the planned change. This ensures that by the time you make the actual change, most resolvers will have cached the record with the shorter TTL, leading to much faster propagation of the new value. After the migration is confirmed successful, you can increase the TTL back to a longer duration.
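The timing works out as simple arithmetic: a resolver that cached the record just before you lowered the TTL can hold it for one full old TTL, so that is the minimum wait before the real change. After the change, the worst case for a well-behaved resolver is one new TTL. The sketch below computes both. The function names and dates are hypothetical, and the 48-hour guidance above simply adds margin on top of the 24-hour minimum for resolvers that cache longer than they should.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_change(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Earliest time by which caches holding the old, long TTL should have expired."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_cutover(changed_at: datetime, new_ttl_seconds: int) -> datetime:
    """Latest time a TTL-respecting resolver should still serve the old value."""
    return changed_at + timedelta(seconds=new_ttl_seconds)


# Made-up schedule: TTL lowered from 24 hours to 300 seconds on 1 March.
lowered = datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)
print("Do not change records before:", earliest_safe_change(lowered, 86400))

changed = datetime(2024, 3, 3, 9, 0, tzinfo=timezone.utc)
print("Old value should be gone by:", worst_case_cutover(changed, 300))
```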

Monitoring propagation progress

During a migration, query your records from resolvers in multiple geographic regions at regular intervals. Track what percentage of resolvers are returning the new value versus the old one. This gives you a real-time propagation percentage that tells you when it's safe to decommission old infrastructure. Your DNS monitoring setup should be able to query from diverse locations and compare results against expected values.
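A propagation percentage of this kind is straightforward to compute once you have each resolver's answer. The sketch below assumes answers are collected as sets of values keyed by a resolver label. The labels, addresses, and function names are made up for illustration.

```python
from typing import Dict, List, Set


def propagation_percent(answers: Dict[str, Set[str]], new_values: Set[str]) -> float:
    """Percentage of resolvers whose answer exactly matches the new record set."""
    if not answers:
        return 0.0
    switched = sum(1 for result in answers.values() if result == new_values)
    return 100.0 * switched / len(answers)


def stale_resolvers(answers: Dict[str, Set[str]], new_values: Set[str]) -> List[str]:
    """Resolvers still returning something other than the new record set."""
    return sorted(name for name, result in answers.items() if result != new_values)


# Made-up sample: three of four resolvers have picked up the new address.
new = {"203.0.113.20"}
answers = {
    "resolver-eu": {"203.0.113.20"},
    "resolver-us": {"203.0.113.20"},
    "resolver-ap": {"203.0.113.20"},
    "resolver-sa": {"203.0.113.10"},
}
print(propagation_percent(answers, new), stale_resolvers(answers, new))
```

Sampling this at regular intervals and waiting for it to hold at 100 for longer than the old TTL is one reasonable signal that the old infrastructure is no longer receiving traffic.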

Handling split-horizon DNS

If you use split-horizon DNS (returning different records to internal versus external queries), ensure your monitoring covers both perspectives. A change that propagates correctly from the external internet view might not be reflected in internal DNS, or vice versa. Monitor from both inside and outside your network to catch discrepancies.

Setting Up DNS Alerts

Monitoring without alerting is just data collection. The value of DNS monitoring is realised when it triggers timely notifications that allow your team to respond before users are significantly impacted.

What to alert on

Not every DNS monitoring check warrants a page to your on-call engineer. Prioritise alerts based on impact:

Critical (immediate page): Complete resolution failure (NXDOMAIN or SERVFAIL for your primary domain), NS record changes, an A record pointing to an unknown IP, MX record deletion
High (alert within minutes): A/AAAA record changes, CNAME target changes, TXT record modifications (SPF, DKIM, DMARC), unusually high resolution latency
Medium (notify during business hours): TTL changes, SOA serial number changes without corresponding record changes, secondary nameserver discrepancies
Low (weekly report): Minor TTL drift, propagation completion status after planned changes, DNS query volume anomalies
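The tiers above translate naturally into a lookup table that a monitor can consult when an event fires. In the sketch below, the event names are hypothetical identifiers invented for illustration, and defaulting unknown events to "medium" is an assumption rather than a recommendation from the list.

```python
from typing import Dict, Tuple

# Hypothetical event identifiers mapped to the tiers described above.
SEVERITY_BY_EVENT: Dict[str, str] = {
    "resolution_failure": "critical",
    "ns_changed": "critical",
    "a_unknown_ip": "critical",
    "mx_deleted": "critical",
    "a_changed": "high",
    "cname_target_changed": "high",
    "txt_changed": "high",
    "high_latency": "high",
    "ttl_changed": "medium",
    "soa_serial_only_changed": "medium",
    "secondary_ns_mismatch": "medium",
    "ttl_drift": "low",
    "propagation_complete": "low",
    "query_volume_anomaly": "low",
}

ACTION_BY_SEVERITY: Dict[str, str] = {
    "critical": "page on-call immediately",
    "high": "alert within minutes",
    "medium": "notify during business hours",
    "low": "include in weekly report",
}


def classify(event_type: str) -> Tuple[str, str]:
    """Return (severity, action); unknown events default to medium (an assumption)."""
    severity = SEVERITY_BY_EVENT.get(event_type, "medium")
    return severity, ACTION_BY_SEVERITY[severity]


print(classify("ns_changed"))
```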

Reducing false positives

DNS monitoring can be noisy if not configured carefully. Transient resolution failures happen -- recursive resolvers occasionally time out, networks have brief hiccups, and anycast routing can cause temporary inconsistencies. To reduce false positives:

Require failures from multiple monitoring locations before alerting
Implement a brief confirmation delay (30 to 60 seconds) before firing critical alerts
Query multiple authoritative nameservers and only alert if the discrepancy is consistent
Whitelist expected changes during planned maintenance windows
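These rules can be combined into a single decision function. The sketch below assumes each check round is recorded as a mapping from monitoring location to pass or fail, with rounds spaced 30 to 60 seconds apart so that requiring two consecutive bad rounds provides the confirmation delay. The should_alert name, parameters, and default thresholds are illustrative assumptions. The same shape works for the authoritative nameserver check if you use nameserver names as the keys.

```python
from typing import Dict, List


def should_alert(
    rounds: List[Dict[str, bool]],
    min_failing_locations: int = 2,
    confirm_rounds: int = 2,
    in_maintenance: bool = False,
) -> bool:
    """Alert only when enough locations fail in each of the latest rounds.

    Each round maps a location name to True (check passed) or False (failed).
    """
    if in_maintenance or len(rounds) < confirm_rounds:
        return False
    recent = rounds[-confirm_rounds:]
    return all(
        sum(1 for ok in result.values() if not ok) >= min_failing_locations
        for result in recent
    )


# Made-up history: a single-location blip, then a confirmed multi-location failure.
blip = [{"london": True, "virginia": False, "singapore": True}]
outage = [
    {"london": False, "virginia": False, "singapore": True},
    {"london": False, "virginia": False, "singapore": False},
]
print(should_alert(blip), should_alert(outage))
```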

Integration with incident management

DNS alerts should feed into your broader incident management workflow. When a critical DNS alert fires, it should create an incident, notify the appropriate team, and begin the response process automatically. Integrating DNS monitoring with your SSL monitoring provides a more complete picture, since DNS and TLS issues often occur together during infrastructure changes or attacks.

Alert routing

Route DNS alerts to the team that manages your DNS infrastructure, which may be different from the team that manages your application servers. In many organisations, DNS is managed by a platform or infrastructure team, and they need to receive these alerts directly rather than having them filtered through a general on-call rotation.

DNS Monitoring Checklist

Use this checklist to evaluate whether your DNS monitoring covers the essential bases. If you can tick every item, you have a robust DNS monitoring posture. If not, each unchecked item represents a gap that could allow a DNS-related incident to go undetected.

Record monitoring

A and AAAA records for all production domains and subdomains are monitored
MX records are monitored with alerts on any change
TXT records (SPF, DKIM, DMARC) are monitored for deletions and modifications
NS records are monitored with critical-severity alerts
CNAME records are monitored, including resolution of their targets
CAA records are monitored if you use them

Resolution monitoring

Resolution is tested from at least three geographic regions
Both authoritative and recursive resolver responses are checked
Resolution latency is tracked with thresholds for alerting
DNSSEC validation is monitored if deployed

Security monitoring

Registrar account has two-factor authentication and registrar lock enabled
NS delegation at the registrar matches your intended nameservers
Domain expiration date is monitored with renewal reminders
Certificate transparency logs are watched for unexpected certificate issuance
WHOIS registrant data is monitored for unauthorised changes

Operational practices

DNS changes follow a change management process with review and approval
TTLs are reduced before planned migrations and restored afterwards
Propagation is monitored during and after DNS changes
DNS monitoring alerts integrate with your incident management workflow
Regular audits compare live DNS against intended configuration

DNS is invisible when it works and catastrophic when it doesn't. By treating it as the critical infrastructure it is and monitoring it with the same rigour you apply to your application servers, you can catch an entire category of silent, hard-to-diagnose failures before your users do. Combined with solid uptime monitoring across the rest of your stack, DNS monitoring helps ensure that the very first step of every user's interaction with your service works reliably.
