Koozali.org: home of the SME Server
		Obsolete Releases => SME Server 7.x => Topic started by: mmccarn on November 02, 2006, 06:34:18 PM
		
			
			- 
				I have one site that repeatedly shows the following symptoms:
 
 1. Too many connections: 40 >= 40. Waiting one second
 SMTP connections reach the configured maximum "Instances" for smtpd, resulting in a continuous stream of "Too many connections: 40 >= 40. Waiting one second" in /var/log/qpsmtpd/current.
 
 2. Server Overload
 The server response slows to a crawl at this time.
 
 3. Duplicate Email
 Many emails received while the number of connections is overloaded like this are received two, three, or even more times.  Examination of the headers shows that the messages have been re-sent by the sending email host several hours apart.  The senders get notices from their mail servers saying that their email couldn't be sent because our server disconnected unexpectedly, and that the email will be retried at a later time.  I believe this happens if qpsmtpd & spamd take too long to respond "OK" to the sending server - the sending server disconnects and decides to retry the transmission later.
 
 4. Deny smtp connections using iptables:
 The system stays in this overloaded condition until I scan /var/log/qpsmtpd/* for all hosts that have been denied by either 'dnsrbl' or 'check_earlytalker', add them to iptables with a DENY rule, then run 'service qpsmtpd restart'.  The system is usually so overloaded that I cannot use "smtpd DenyHosts" and "signal-event email-update" - the "email-update" never finishes (I've left it running for a couple hours).  I have figured out a bunch of grep / sed commands to let me manually add "deny" rules to the appropriate "Inbound_TCP_####" chain. (I know that these rules will go away if I reboot; that's OK)
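 The manual log scan described above could be sketched roughly as follows. This is an illustration in Python, not the grep / sed commands actually used. The log line layout, the sample lines and the function names `denied_hosts` and `drop_rules` are assumptions. The chain name is passed in as a parameter because the real "Inbound_TCP_####" suffix is not given. The sketch uses "-j DROP", since iptables has no built-in DENY target.

```python
import re

# Plugins whose rejections mark a host as worth blocking (from the post above).
PLUGINS = ("dnsbl", "check_earlytalker")

# Loose IPv4 matcher; the real qpsmtpd log layout is an assumption here.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def denied_hosts(log_lines):
    """Return the sorted, de-duplicated IPs found on lines naming a plugin."""
    hosts = set()
    for line in log_lines:
        if any(plugin in line for plugin in PLUGINS):
            match = IPV4.search(line)
            if match:
                hosts.add(match.group(0))
    return sorted(hosts)


def drop_rules(hosts, chain):
    """Build one iptables command string per host; nothing is executed here."""
    return [
        f"iptables -I {chain} -s {ip} -p tcp --dport 25 -j DROP"
        for ip in hosts
    ]


# Hypothetical log lines, for illustration only.
sample = [
    "12345 dnsbl plugin: 192.0.2.10 listed, denying",
    "12346 check_earlytalker plugin: remote host 198.51.100.7 talked early",
    "12347 spamassassin plugin: 203.0.113.5 scored 2.1",
    "12348 dnsbl plugin: 192.0.2.10 listed, denying",
]

# The chain name is kept as the placeholder used in the post.
for rule in drop_rules(denied_hosts(sample), "Inbound_TCP_####"):
    print(rule)
```

 Printing the commands instead of running them leaves a chance to review the list before anything touches the firewall.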
 
 --------------------------------------------------------------------
 
 I believe that this behavior results from a denial of service attack against my server that uses a sort of reverse-tarpit scheme to connect to my server then hold the connection open using a TCP window size of 0 bytes, but I don't know how to prove this.  If I am correct this seems guaranteed to succeed as a DoS technique since qpsmtpd allows each host to connect before beginning to apply the various plugins -- unless there is a qpsmtpd plugin that can manage a timed database of hosts who are not even allowed to open TCP connections - a sort of rbl list that is tied to iptables instead of qpsmtpd.
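 The "timed database of hosts" idea could look something like the sketch below. It is not an existing qpsmtpd plugin; the class name `TimedBlocklist` and its methods are hypothetical, and tying it to iptables (adding a rule on `block`, removing rules for whatever `purge` returns) is left out.

```python
import time


class TimedBlocklist:
    """Remember blocked hosts and forget them after `ttl` seconds."""

    def __init__(self, ttl, clock=time.time):
        self.ttl = ttl
        self.clock = clock      # injectable so expiry can be tested
        self._expires = {}      # ip -> expiry timestamp

    def block(self, ip):
        """Block `ip`, or extend an existing block, for `ttl` seconds."""
        self._expires[ip] = self.clock() + self.ttl

    def is_blocked(self, ip):
        """Return True while the block on `ip` has not yet expired."""
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self.clock() >= expiry:
            del self._expires[ip]   # lazy expiry on lookup
            return False
        return True

    def purge(self):
        """Drop all expired entries and return the IPs that were removed."""
        now = self.clock()
        expired = [ip for ip, t in self._expires.items() if now >= t]
        for ip in expired:
            del self._expires[ip]
        return expired
```

 In such a design, the addresses returned by `purge` would be the ones whose firewall rules get deleted, so a wrongly listed host is only locked out for a bounded time.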
 
 Hardware
 Dell Dimension L933r
 256MB RAM
 40GB IDE HDD
 
 Software
 SME 7 Final installed from scratch
 Email forwarded to internal exchange server
 smeserver-sme7admin-1.1.0-1.noarch.rpm
 smeserver-spamassassin-features-0.0.2-0.noarch.rpm
 smeserver-saco-qmHandle-1.3.1-1.noarch.rpm
 smeserver-sarg-1.4.1-5.i386.rpm
 smeserver-sysmon-5.0-3.noarch.rpm
 sysstat-5.0.5-11.rhel4.i386.rpm
 mondo-2.0.9-2.rhel4.i586.rpm
 
 I plan to upgrade the RAM to 512MB to see if that has any effect, and would appreciate any insights from the community on what else I could do to resolve this issue.
 
 - Michael
- 
				Michael,
 
 My server is also getting hammered at the moment, in bursts around an hour apart. I am not yet reaching the maximum number of SMTP connections, and I think the reason is that I am running a 3GHz processor and 1.5GB of RAM, so I can process the connections quickly.
 
 What I am noticing though is that the number of DNS queries that the server is having to make is killing my upload bandwidth (512Kb) during the mail bursts.
 
 There has been a huge increase in spam and virus emails in the past week and I am wondering if the DNSBL providers are getting hammered with DNS queries which in turn is slowing down their response time, hence connections are held until a response is received.
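 To see why DNS traffic grows with spam volume: a DNSBL check is an ordinary DNS lookup of the connecting address, with its octets reversed, under the list's zone, so each incoming connection costs one query per configured list. A minimal sketch, with hypothetical zone names and a hypothetical function name, and ignoring any caching by the local resolver:

```python
def dnsbl_query_names(ip, zones):
    """Return the DNS names looked up to test an IPv4 address against each zone."""
    reversed_octets = ".".join(reversed(ip.split(".")))
    return [f"{reversed_octets}.{zone}" for zone in zones]


# Hypothetical zones, for illustration only.
zones = ["dnsbl.example.org", "rbl.example.net"]

for name in dnsbl_query_names("192.0.2.10", zones):
    print(name)
```

 If a list is slow to answer, the connection being checked stays open for as long as the lookup takes.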
 
 Do a tcpdump on port 53 and see what is happening.
 
 I have been considering changing the order in which qpsmtpd calls the plugins. Currently the order is:
 
 
 
 auth/auth_cvm_unix_local 
 check_earlytalker
 count_unrecognized_commands
 # bcc disabled
 check_relay
 check_norelay
 whitelist_soft
 require_resolvable_fromhost
 check_basicheaders
 rhsbl
 dnsbl
 check_badmailfrom
 check_badrcptto_patterns
 check_badrcptto
 check_spamhelo
 check_goodrcptto extn -
 rcpt_ok
 virus/pattern_filter
 tnef2mime
 spamassassin
 virus/clamav
 queue/qmail-queue
 
 
 We have done a lot of checking before we go to the DNSBL. I think the DNSBL should be near the beginning, either before or after check_earlytalker. If the IP is on the DNSBLs we are using then it should be dropped straight away.
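 The proposed reordering amounts to moving one entry in the list. The sketch below only shows that list manipulation on an abbreviated copy of the plugin list. The helper name `move_after` is hypothetical. How SME Server generates the plugin file is not shown, and whether the move changes when a rejection happens depends on the stage at which each plugin acts, which this sketch does not model.

```python
def move_after(plugins, name, anchor):
    """Return a copy of `plugins` with `name` placed right after `anchor`."""
    reordered = [p for p in plugins if p != name]
    reordered.insert(reordered.index(anchor) + 1, name)
    return reordered


# Abbreviated copy of the plugin order listed above.
current = [
    "auth/auth_cvm_unix_local",
    "check_earlytalker",
    "count_unrecognized_commands",
    "check_relay",
    "rhsbl",
    "dnsbl",
    "check_badmailfrom",
]

for plugin in move_after(current, "dnsbl", "check_earlytalker"):
    print(plugin)
```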
 
 Jon
- 
				I don't see much going on on port 53 -- but I'm not having any trouble right now, either.  I'll check it again the next time I get "hit".
 
 I only have a 384K SDSL connection; if your 512 is saturated with DNS traffic, my 384K could be, too... Luckily, I have more bandwidth on the way!
 
 I am seeing quite a few connections denied by my temporary iptables "deny" rules (3400 attempts on Monday, about 1800 each on Tue, Wed & Today...)  I suppose this would "solve" the problem temporarily whether it's my suspected denial-of-service attack or a dnsbl overload.
 
 I thought that tinydns would cache the dnsbl queries, reducing dns bandwidth requirements.  Is this not the case?
 
 Have you ever tried the "greylisting" plugin?  It looks like it would significantly reduce the load on all of the other plugins, and is included with SME 7 but I've been afraid to turn it on since I've never used it...
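 For anyone unfamiliar with the technique, greylisting in general works like this: the first delivery attempt from an unknown (client IP, sender, recipient) combination gets a temporary failure, and a retry after a minimum delay is accepted. The sketch below shows only that general idea. It is not the plugin shipped with SME 7; the class name, the 300-second delay and the return values are assumptions.

```python
class Greylist:
    """Temporarily refuse unknown (ip, sender, recipient) triplets."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay   # seconds a new triplet must wait
        self._first_seen = {}        # triplet -> time of first attempt

    def check(self, ip, sender, recipient, now):
        """Return "defer" until `min_delay` has passed since the first attempt."""
        key = (ip, sender.lower(), recipient.lower())
        first = self._first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "defer"
        return "accept"
```

 A deferred message is refused early in the SMTP conversation, before any of the content scanners run, which is where the load reduction would come from. Legitimate servers retry later, so their mail is delayed rather than lost.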
- 
				mmccarn
 
 > SME 7 Final installed from scratch
 
 Have you done a
 yum update
 Reconfigure & Reboot
 
 
 > smeserver-spamassassin-features-0.0.2-0.noarch.rpm
 
 What is that rpm for and where did you get it from ?
 
 
 > I plan to upgrade the RAM to 512MB to see if that has any effect
 
 256MB is minimal for SME 7; I'd even up it to 1GB if you can
 
 You could tweak the settings, see
 config show smtpd
 config show qpsmtpd
 add/reduce RBL lists ?
 disable RHSBL if enabled
 adjust qmail/qpsmtpd concurrency settings ?
- 
				
 > smeserver-spamassassin-features-0.0.2-0.noarch.rpm
 
 > What is that rpm for and where did you get it from ?
 
 
 
 From here...
 
 http://mirror.contribs.org/smeserver/contribs/michaelw/sme7/
- 
				In addition to Ray's help... a sideways thought on CPU cycles. Yes, you 
 don't seem to have very much RAM but what about CPU horsepower? (933)
 
 SpamAssassin uses up quite a bit of CPU. With such volumes of traffic you
 might consider switching off anything to do with SA, get all the connections
 and possible traffic through the server in real time. Yes, you'd need to rely
 on the workstations picking out any malware but perhaps the choking of
 the server might be addressed. The other RBL etc plugins do quite a bit
 of incredibly effective filtering... here on my site they are 100% effective
 and I haven't yet (!?) had the urge to turn on SA (ever).
 
 Just a thought.
- 
				> Have you done a
 > yum update
 > Reconfigure & Reboot

 Yes; I forgot to mention that.
 
 > smeserver-spamassassin-features-0.0.2-0.noarch.rpm
 >
 > What is that rpm for and where did you get it from ?

 This rpm from Michael Weinberger is "an RPM that enables site-wide Bayesian Filter, enables blacklist testing and introduces spamassassin properties BayesAutoLearnThresholdSpam and BayesAutoLearnThresholdNonspam".  For more info, see  Spam Filter (http://forums.contribs.org/index.php?topic=32158.0)
 
 smtpd=service
 Authentication=disabled
 DenyHosts=58.246.197.21
 Instances=25
 InstancesPerIP=5
 MaximumDateOffset=0
 PatternsScan=disabled
 Proxy=disabled
 TCPPort=25
 TCPProxyPort=25
 VirusScan=enabled
 access=public
 status=enabled
 tnef2mime=enabled
 I've changed "Instances" from 40 to 25, which seems to work fine: as long as things are working OK I haven't seen more than 9 concurrent connections.  I tried "Instances=80", but that totally killed me!
 qpsmtpd=service
 Bcc=disabled
 BccUser=maillog
 DNSBL=enabled
 LogLevel=8
 MaxScannerSize=25000000
 RBLList=sbl-xbl.spamhaus.org,whois.rfc-ignorant.org,dnsbl.njabl.org,relays.ordb.org
 RHSBL=enabled
 RequireResolvableFromHost=yes
 SBLList=dsn.rfc-ignorant.org
 access=public
 status=enabled
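 
 A quick piece of arithmetic on the smtpd settings above, assuming "Instances" is the total number of concurrent connections and "InstancesPerIP" is the cap for a single remote address (the property names suggest this, but it is an assumption): the number of hosts that must each hold their full allowance to fill every slot is small. The function name is hypothetical.

```python
import math


def hosts_to_saturate(instances, instances_per_ip):
    """Smallest number of remote IPs that can occupy every connection slot."""
    return math.ceil(instances / instances_per_ip)


# Values from the smtpd settings shown above.
print(hosts_to_saturate(25, 5))
```

 With Instances=25 and InstancesPerIP=5 the answer is 5, so a handful of stalled hosts would be enough to produce the "Too many connections" symptom.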
 
 Ray: Why do you recommend disabling RHSBL checks?
 
 piran: I agree that it is likely that disabling SA would solve my problem...but what is the effect on the amount of SPAM delivered to end-users?  The email is passed to an internal Exchange server that is running AVG Antivirus for Exchange as well as [oxymoron]Microsoft Intelligent[/oxymoron] Message Filtering, so my users might not get hit too hard...
 
 If Ray ("up it to 1GB if you can") and piran ("what about CPU horsepower") are correct, should the SME 7 minimum hardware requirements (400MHz CPU, 128MB RAM) be changed?  (OK, I'll answer this one myself: The SME 7 Manual (http://smeserver.sourceforge.net/sme70/manual#h72-1) says of the Minimum Hardware Requirements: "Note that we do not believe such a system will provide satisfactory performance for features such as webmail, remote access via PPTP, Virus and Spam Scanners, which are cpu intensive will not perform well on this platform."  It's amazing what you learn if you read the text instead of just glancing at the pictures and charts!).
- 
				mmccarn
 
 > Why do you recommend disabling RHSBL checks?
 
 There were some comments on devinfo (I think) about the effectiveness or otherwise of RHSBL; there are issues that you should be aware of if you choose to enable it.
 
 DNS RBL is OK to enable, but you should be aware of the consequences too, i.e. potential loss of email messages from legitimate senders wrongly listed on RBLs, particularly if you use non-conservative lists.
 
 Neither of these is enabled by default on sme7.
 
 It was also related to the comment
 "What I am noticing though is that the number of DNS queries that the server is having to make is killing my upload bandwidth (512Kb) during the mail bursts."
 
 So disabling one of those lookup features may help improve speed, a case of try it and see.
- 
				I thought that I had better reply to this to report that my issues have been solved.
 
 I had initially thought that at times the large number of DNS queries that the server was making, in response to the large number of spam emails I was receiving (> 2000/hour), was killing my upload bandwidth (512Kb). Internet access became impossible and mail was being delayed due to hosts not being found.
 
 Yesterday I finally figured out what was going on. It was my ADSL router that was choking the connections. It was basically running out of memory and cpu to process all the connections through the NAT firewall on the router.
 
 Here in NZ almost all ISPs use PPPoA for ADSL connections, and bridging to disable the NAT firewall is not an option.
 
 However my ADSL router has an option called Half-bridge mode, which when enabled uses the DHCP server in the router to forward the public IP to the external interface of the server, bypassing the NAT firewall.
 
 Since I have reconfigured the ADSL router to half-bridge mode and the server external interface to use DHCP to get the IP address I have had no issues with the router choking the bandwidth.
 
 Jon
- 
				I tried upgrading RAM only to find that my new RAM caused freeze-ups (Aargh!)
 
 Of my two systems, I'm running one with RHSBL and Spamassassin disabled, and the other with an extensive IPTables "block" list for smtp servers that show up as early talkers or that are denied by DNSBL.
 
 Either solution seems to keep the server humming along.
 
 (both of these systems already had public IPs).
- 
				> ..... and the server external interface to use DHCP to get the IP address ....
 > Jon
 
 
 How did you do that Jon?
 
 Rob
- 
				Rob,
 
 You need to log into the console as admin and reconfigure the external interface to use option 2
 
 2.    Use DHCP (send ethernet address as client identifier)
 
 The ADSL router in Half-bridge mode will use the DHCP server in the router to forward the public IP to the external interface of the SME, bypassing the NAT firewall on the router.
 
 Jon
- 
				A followup note on my original problems in order to help others with limited hardware resources:
 1. I never found any RAM that worked well in these systems, although I didn't look too hard once the system performance stabilized.
 2. One system (PIII/256MB) has run OK for 35 days now with RHSBL disabled but with spamassassin enabled.
 3. A second system (PIII/192MB) has run OK for 35 days with both RHSBL and spamassassin disabled.
 4. I re-enabled spamassassin on the 192MB system 2 days ago, resulting in a clear increase in CPU, swap, & memory use, but the system has been fine so far.
 5. This client is heavily involved in US politics, so my original problems could well have been related to heavier-than-normal email or other traffic related to the 2006 US elections.
 
- 
				> Rob,
 >
 > You need to log into the console as admin and reconfigure the external interface to use option 2
 >
 > 2.    Use DHCP (send ethernet address as client identifier)
 >
 > The ADSL router in Half-bridge mode will use the DHCP server in the router to forward the public IP to the external interface of the SME, bypassing the NAT firewall on the router.
 >
 > Jon
 
 
 So an ifconfig at the console will show the public IP as the IP for the 'external' NIC?
- 
				Rob,
 
 Correct. If you like give me a call 021 326419.
 
 Jon
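 
 One way to double-check that the address shown for the external NIC really is a public one and not an address from the router's private range is sketched below. The helper name `is_public` and the sample addresses are illustrative only; reading the address from the interface is not shown.

```python
import ipaddress


def is_public(address):
    """Return True if `address` is a globally routable IP address."""
    return ipaddress.ip_address(address).is_global


# Illustrative addresses: two private ones and one public one.
for candidate in ("192.168.1.2", "10.0.0.5", "8.8.8.8"):
    print(candidate, is_public(candidate))
```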