It looks like I'm bumping this message, but I'm not, honest. I hope that by the time I find a fix, this thread will hold enough information to help anyone else fix a similar problem.
DNS name resolution worked fine until the routine system restart early on 18 March 2004. Since then, dnscache has reported this error whenever it needs to query a remote DNS server:
2004-03-18 00:47:00.270946500 starting
2004-03-18 00:48:15.478242500 query 1 c0a8014c:0400:0069 1 servertest.e-smith.com.
2004-03-18 00:48:15.478266500 tx 0 1 servertest.e-smith.com. . c0a80108
2004-03-18 00:48:20.405080500 query 2 c0a8014c:0400:0069 1 servertest.e-smith.com.
2004-03-18 00:48:20.405104500 tx 0 1 servertest.e-smith.com. . c0a80108
2004-03-18 00:48:25.415432500 query 3 c0a8014c:0400:006a 1 servertest.e-smith.com.my.domain.
2004-03-18 00:48:25.415458500 tx 0 1 servertest.e-smith.com.my.domain. . c0a80108
2004-03-18 00:48:30.425058500 query 4 c0a8014c:0400:006a 1 servertest.e-smith.com.my.domain.
2004-03-18 00:48:30.425085500 tx 0 1 servertest.e-smith.com.my.domain. . c0a80108
2004-03-18 00:49:14.564942500 servfail servertest.e-smith.com. input/output error
2004-03-18 00:49:14.564967500 sent 1 40
2004-03-18 00:49:19.494778500 servfail servertest.e-smith.com. input/output error
2004-03-18 00:49:19.494803500 sent 2 40
2004-03-18 00:49:24.504774500 servfail servertest.e-smith.com.my.domain. input/output error
2004-03-18 00:49:24.504800500 sent 3 54
2004-03-18 00:49:29.514765500 servfail servertest.e-smith.com.my.domain. input/output error
2004-03-18 00:49:29.514792500 sent 4 54
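The hex fields in these lines are easier to read once decoded. As I understand dnscache's log format, addresses, ports and query IDs are written in hexadecimal: a query line is "query serial ip:port:id type name" and a tx line is "tx gluelessness type name control-domain server-ip ...". Here's a rough Python sketch that decodes them; the function names are my own, and it assumes the timestamps have already been converted with tai64nlocal, as in the excerpt above.

```python
import ipaddress


def hex_ip(h: str) -> str:
    """Turn an 8-digit hex IPv4 address from the dnscache log into dotted-quad form."""
    return str(ipaddress.IPv4Address(int(h, 16)))


def decode(line: str) -> dict:
    """Decode one dnscache log line.

    Assumes the TAI64N stamp was already converted by tai64nlocal,
    so each line starts with "<date> <time> <event>".
    """
    _date, _time, event, *rest = line.split()
    if event == "query":
        # query <serial> <client-ip>:<client-port>:<query-id> <qtype> <name>
        serial, client, qtype, name = rest
        ip_h, port_h, id_h = client.split(":")
        return {"event": event, "serial": int(serial), "client": hex_ip(ip_h),
                "port": int(port_h, 16), "id": int(id_h, 16),
                "qtype": int(qtype), "name": name}
    if event == "tx":
        # tx <gluelessness> <qtype> <name> <control-domain> <server-ip> ...
        glue, qtype, name, control, *servers = rest
        return {"event": event, "gluelessness": int(glue), "qtype": int(qtype),
                "name": name, "control": control,
                "servers": [hex_ip(s) for s in servers]}
    if event == "servfail":
        # servfail <name> <error text>
        name, *error = rest
        return {"event": event, "name": name, "error": " ".join(error)}
    # Anything else (starting, sent, ...) is returned undecoded
    return {"event": event, "fields": rest}
```

Applied to the lines above, it gives a client of 192.168.1.76 (port 1024), and every tx line targets 192.168.1.8 with "." as the control domain, i.e. a private address rather than one of the public root servers.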
Obviously something changed on 17 March, before that restart, but I have no idea what. ClamAV had been installed and running for several weeks, and nothing else of significance had been installed on the server. I'll check through the logs and see what I can find.
The following reference, http://dqd.com/~mayoff/notes/djbdns/dnscache-log.html, lists these possible causes of that error (all of which sound more like configuration or permissions problems):
Some of the errors that can make dnscache do this:
- failure to allocate storage for a received DNS packet
- failure to create a UDP socket
- failure to set the O_NONBLOCK flag on the UDP socket
- failure to bind the UDP socket to a port
- failure to transmit a packet to any of up to 16 nameservers and receive a response packet with an rcode of 0 (no error) or 3 (NXDOMAIN), with four attempts per nameserver
- failure to create a TCP socket
- failure to set the O_NONBLOCK flag on the TCP socket
- failure to bind the TCP socket to a port
- failure to connect the TCP socket to any of up to 16 nameservers (one attempt per nameserver), transmit a query to the nameserver, and receive a response packet with an rcode of 0 (no error) or 3 (NXDOMAIN)
Trouble is, I can ping most of the nameservers listed in /etc/dnsroots.global, though that file on SME 6 is actually out of date (as I understand it, at least four or five of the listed IPs need changing).
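Pinging a nameserver only shows that it answers ICMP echo; dnscache talks to it over UDP (and TCP) port 53, which a firewall can block independently. A closer test of the "transmit a packet ... and receive a response packet with an rcode of 0 or 3" condition is to send a real DNS query. The Python sketch below is a hypothetical helper, not part of SME or djbdns: it hand-builds an RFC 1035 query for the root NS records, sends it over UDP, and reports the response code. The function names, the 3-second timeout and the assumption that /etc/dnsroots.global holds one IP address per line are mine.

```python
import random
import socket
import struct


def build_query(name: str, qtype: int = 2, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (RFC 1035): one question, recursion not requested."""
    # Header: ID, flags=0 (standard query, RD clear), QDCOUNT=1, AN/NS/AR=0
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    qname = b""
    for label in name.strip(".").split("."):
        if label:  # the root name "." has no labels
            qname += bytes([len(label)]) + label.encode("ascii")
    qname += b"\x00"  # zero-length root label ends the name
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def response_rcode(packet: bytes, query_id: int) -> int:
    """Return the RCODE of a reply after checking its ID and QR (response) bit."""
    if len(packet) < 12:
        raise ValueError("reply shorter than a DNS header")
    rid, flags = struct.unpack("!HH", packet[:4])
    if rid != query_id or not flags & 0x8000:
        raise ValueError("not a response to our query")
    return flags & 0x000F


def probe(server_ip: str, port: int = 53, timeout: float = 3.0) -> str:
    """Send one UDP query for '.' NS to server_ip and describe the outcome."""
    query_id = random.randrange(0x10000)
    packet = build_query(".", qtype=2, query_id=query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server_ip, port))
            reply, _ = sock.recvfrom(4096)
        except OSError as exc:  # timeouts and ICMP errors both land here
            return "no answer (%s)" % exc
    try:
        rcode = response_rcode(reply, query_id)
    except ValueError as exc:
        return "bad reply (%s)" % exc
    # dnscache accepts rcode 0 (no error) or 3 (NXDOMAIN)
    return "ok" if rcode in (0, 3) else "rcode %d" % rcode


def probe_file(path: str, timeout: float = 3.0) -> dict:
    """Probe every address in a dnsroots-style file (assumed one IP per line)."""
    results = {}
    with open(path) as f:
        for line in f:
            ip = line.strip()
            if ip:
                results[ip] = probe(ip, timeout=timeout)
    return results
```

Calling probe_file("/etc/dnsroots.global") on the server should return a dict mapping each listed IP to "ok", "rcode N" or "no answer (...)". If the root servers answer from another machine but not from the SME box, that would point towards something local (firewall rules, for example) rather than the out-of-date file.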
-- Jason