
For almost 48 hours now we've been experiencing sporadic DNS outages for A and CNAME records. We have mostly been tracking the issue via ping and nslookup on Windows.

Lookups for 'www' (A), 'img1' (CNAME), 'store' (A) have come back as not found (Windows ping or nslookup says it just can't find the host) - and on one online DNS tester I even saw an NXDOMAIN response.

I'm pretty sure that somewhere in the DNS 'chain' there is a cached NXDOMAIN response coming back that's still getting cached after 46 hours now.

I've even seen a case where - using nslookup - I did a lookup on a CNAME record img1.example.com 10 times within 5 seconds and got negative and positive responses from the same Verizon DNS server within a second.

Like I said, this has happened for 48 hours now. The 'outage' occurs only briefly, for a few minutes, but has been seen from at least 4 different geographical locations/networks.

I thought the bad record would have cleared itself out by now, but I'm hoping that by finding the offending DNS server I can at least try to contact them - or find out whose fault it is.

Answers to obvious questions

  • DNS is currently at GoDaddy and has not been changed at all
  • The domain has been active on GoDaddy's hosted DNS (ns41.domaincontrol.com) for 3 years
  • Problem observed on several different networks: Verizon DSL, Comcast cable, Verizon EVDO, and the site24x7 website
  • It's even happening with CNAME records pointing to Amazon S3 (i.e. 100% not a webserver problem and 100% a DNS problem)
  • I'm not an expert, but the problem has been confirmed by two people who know more than I do. One thinks the most likely issue is a cached NXDOMAIN response somewhere.

Should we just wait up to 4 days before changing DNS providers? Is there a tool of some sort to trace where the DNS is coming from and find the actual server which is caching the NXDOMAIN response - or perhaps a service to just test hundreds/thousands of DNS servers for their responses?

2 Answers

2

I think you may have a conceptual issue with how DNS works.

Only DNS servers performing recursive resolution cache lookups. The DNS servers that the affected users on "verizon DSL, comcast cable, verizon EVDO, site24x7 website" are using are the ones caching lookups.

The root DNS servers, .com servers, and the servers authoritative for your domain aren't caching lookups, because they're not providing recursive resolution service.

It's possible (likely, actually, from what I'm seeing in Google searches) that GoDaddy is sporadically returning NXDOMAINs for your domain, and those NXDOMAINs are being cached by recursive resolvers. (Per RFC 2308, they should be cached for at most the TTL of the zone's SOA record or the SOA MINIMUM field, whichever is lower.)
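
As a rough illustration of that RFC 2308 rule, the sketch below computes the longest time a resolver should keep an NXDOMAIN. The function name and the sample numbers are made up for illustration; the real values come from the SOA record that an authoritative server includes in the authority section of a negative response.

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    """Upper bound, in seconds, on how long an NXDOMAIN may be cached.

    RFC 2308: the negative TTL is the lesser of the TTL on the SOA
    record itself and the MINIMUM field inside that SOA record.
    """
    if soa_record_ttl < 0 or soa_minimum < 0:
        raise ValueError("TTL values must be non-negative")
    return min(soa_record_ttl, soa_minimum)


# Hypothetical values: SOA record TTL of 1 hour, SOA MINIMUM of 1 day.
print(negative_cache_ttl(3600, 86400))  # 3600
```

If the numbers for your zone work out to an hour or so, a single bad answer cannot explain 46 hours of failures by itself; the NXDOMAIN would have to be handed out again and again.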

Apparently, GoDaddy's "free" DNS service isn't too highly regarded. I don't use it, personally, so I can't comment on it.

There is no central "list" of DNS servers providing recursive resolution for you to "test against". (I have one here in my house, and I could spin a few more up on VMs if I needed to...) You need a reliable provider to be authoritative for your domain, and you just have to hope that everybody else in the world honors TTLs and acts as "good DNS citizens".


Edit:

"Recursive resolution" is the process by which a DNS server resolves a record for which it is not authoritative. The process starts with the root DNS servers, and proceeds recursively (that is, a process that loops back on itself) through all the authoritative DNS servers for the domains specified in the query until the last DNS server is reached and the desired resource record (or a negative response) is returned.

For a three-level query, like "www.example.com", the following occurs (I am leaving out the fact that, all along the way, the ISP DNS server is checking its cache in lieu of issuing queries to remote DNS servers and putting the results it receives into its cache, to make this clearer and a bit more simplistic):

  • Your PC issues a query to your specified DNS server (at your ISP, for example).

  • The ISP DNS server verifies that it doesn't have a response in cache, and then queries one of the root DNS servers.

  • The root DNS server, only being authoritative for the root, responds with a list of DNS servers authoritative for the gTLD specified in the query (.com, .net, .tv, .fu, etc). The protocol continues as such, with the full query always being sent to each successive DNS server throughout this process. Since it's not possible to know which DNS server will be authoritative for any given query and we want to minimize the number of round-trips, we always send the full domain in each query.

  • The ISP DNS server queries one of the DNS servers returned as authoritative for the gTLD specified.

  • The gTLD DNS server, being authoritative only for the gTLD and holding just the delegations for the second-level domains beneath it (microsoft.com, example.com, etc), responds with a list of DNS servers authoritative for the second-level domain.

  • The ISP DNS server queries one of the DNS servers returned as authoritative for the second-level domain.

  • The DNS server authoritative for the second-level domain, which also holds the third-level names in it (www.microsoft.com, ftp.example.com, etc), returns the record requested.

  • The ISP DNS server returns the record your PC queried back to your PC.
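
The walk described in the steps above can be sketched as a toy simulation. Nothing here touches the network: the server names, the delegation table, and the 192.0.2.10 address are all invented for illustration, and TTLs are ignored to keep it short. The point is only to show that the resolver (the "ISP DNS server") is the one that follows referrals and caches both positive and negative results.

```python
ROOT = "root-server"

# Made-up data: what each authoritative server knows.
SERVERS = {
    "root-server": {"delegations": {"com.": "com-server"}, "records": {}},
    "com-server": {"delegations": {"example.com.": "example-ns"}, "records": {}},
    "example-ns": {"delegations": {}, "records": {"www.example.com.": "192.0.2.10"}},
}


def ask(server, qname):
    """Return ("answer", value), ("referral", next_server) or ("nxdomain", None)."""
    data = SERVERS[server]
    if qname in data["records"]:
        return ("answer", data["records"][qname])
    for zone, next_server in data["delegations"].items():
        if qname == zone or qname.endswith("." + zone):
            return ("referral", next_server)
    return ("nxdomain", None)


def resolve(qname, cache):
    """Resolve qname the way a caching resolver would.

    Returns (result, path) where path lists the servers that were queried;
    an empty path means the result came straight from the cache.
    """
    if qname in cache:
        return cache[qname], []
    server, path = ROOT, []
    while True:
        path.append(server)
        kind, value = ask(server, qname)  # the full name is sent every time
        if kind == "referral":
            server = value
            continue
        cache[qname] = (kind, value)  # negative answers are cached too
        return cache[qname], path


cache = {}
print(resolve("www.example.com.", cache))   # walks root, .com, example.com
print(resolve("www.example.com.", cache))   # served from cache, empty path
print(resolve("img9.example.com.", cache))  # NXDOMAIN, now cached as well
```

In this model the only place a wrong NXDOMAIN can originate is the last server in the path; the resolver merely remembers what it was told.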

Typically ISPs offer recursive resolution services to their Customers. The DNS servers at hosting providers that are authoritative for Customer hosted domains generally don't provide recursive service (and will return the root servers if queried for domains they aren't authoritative for).

  • we're using godaddy's DNS as part of a dedicated server plan. whether it's FREE or not i guess is irrelevant from what you're saying. if they really are returning NXDOMAIN then they are the weakest link and changing providers will fix the issue. we've only seen this in the last 48 hours though and not before - and it's real users seeing it. we have constant traffic coming to the site so the caching rate should be very high. i'm not quite clear what you mean about recursive DNS. there's obviously some kind of chain of caching - my PC, my router, verizon DNS, etc. should i raise my TTL?
    – Simon
    Dec 4, 2009 at 2:27
  • So most likely the GoDaddy name servers haven't replicated the domain properly amongst themselves. What is the domain?
    – Sim
    Dec 4, 2009 at 2:31
  • rollingrazor.com - it's been up for over a year on that DNS provider. the oddest behavior i saw earlier today was on DNS server 68.238.64.12 (local ISP) when performing multiple lookups. i got NXDOMAIN, then the correct CNAME resolution, and then NXDOMAIN again within 10 seconds. the problem fixes itself and then several hours later it resolves incorrectly again
    – Simon
    Dec 4, 2009 at 2:35
  • Jeff Atwood and the Stack Overflow crew didn't like GoDaddy's DNS service much: blog.stackoverflow.com/2009/09/new-dns-provider Personally, I use the provider that Jeff went with, Dynamic Network Services: dynamicnetworkservices.com
    Dec 4, 2009 at 2:36
  • Not to put too fine a point on it but the servers that are authoritative for the root domain (.) are technically the only "root" servers (a.root-servers.net, b.root-servers.net, etc.). The servers responsible for the gTLDs (.com, .edu, etc.) are not technically root servers as they don't exist at the root. These servers (a.gtld-servers.net, b.gtld-servers.net, etc.) exist one level below the root servers in the DNS hierarchy.
    – joeqwerty
    Dec 4, 2009 at 4:13
0

You really need to address the root of the problem. This is what I would do:

  1. Perform a whois query for the domain in question.

  2. Write down the name servers listed for the domain.

  3. Perform an NS nslookup against each of the name servers listed in whois and make sure they return the same list of name servers that whois listed.

  4. Query each name server from step 2 for the domain in question and make sure they all return the correct info. If any of the name servers return an NXDOMAIN response then you've found the culprit.

Any name servers that are listed in whois that aren't listed when you query the name servers individually need to be removed from whois.

Conversely, any name servers returned from your NS nslookup that aren't listed in whois need to be removed as name servers.
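
Step 4 can be sketched with nothing but the Python standard library by sending a plain UDP query straight to each authoritative server and reading the response code. This is a minimal sketch, not a full DNS client: it assumes you already have the IP addresses of the name servers from whois, it only inspects the RCODE in the header, and the helper names (build_query, rcode_of, query_rcode, find_culprits) are invented here.

```python
import socket
import struct

RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A, 2 = NS, 5 = CNAME)."""
    # Header: ID, flags (0 = recursion not desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    question = labels + b"\x00" + struct.pack("!HH", qtype, 1)  # class IN
    return header + question


def rcode_of(response):
    """Extract the response code from the low 4 bits of the flags field."""
    flags = struct.unpack("!H", response[2:4])[0]
    code = flags & 0x000F
    return RCODES.get(code, "RCODE%d" % code)


def query_rcode(server_ip, name, qtype=1, timeout=3.0):
    """Send one UDP query straight to server_ip and return its RCODE as text."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qtype), (server_ip, 53))
            response, _ = sock.recvfrom(512)
        except socket.timeout:
            return "TIMEOUT"
    return rcode_of(response)


def find_culprits(server_ips, name, attempts=10, ask=query_rcode):
    """Return {server_ip: NXDOMAIN count} for servers that ever answered NXDOMAIN."""
    culprits = {}
    for ip in server_ips:
        bad = sum(1 for _ in range(attempts) if ask(ip, name) == "NXDOMAIN")
        if bad:
            culprits[ip] = bad
    return culprits


# Example call; 192.0.2.1 and 192.0.2.2 are placeholder addresses, so
# substitute the real addresses of the name servers listed in whois:
# print(find_culprits(["192.0.2.1", "192.0.2.2"], "www.example.com"))
```

Each server is asked several times because the failures described in the question are intermittent; a single clean answer does not clear a server.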

  • I'd highly recommend using dig to do the querying, rather than nslookup. That aside, I've found enough complaints out there to make me think that GoDaddy has load-balancers on their DNS server such that you won't be able to find the errant DNS servers as they're hidden behind the load-balancers.
    Dec 4, 2009 at 3:20
  • Don't think dig is available for Windows.
    – fpmurphy
    Dec 4, 2009 at 4:39
  • @fpmurphy dig is available for Windows; try a Google search. It is certainly included as part of the BIND for Windows software, but it requires a resolv.conf file to be created and a number of BIND DLLs to be available.
    – Sim
    Dec 4, 2009 at 5:10
