Obsolete Releases > SME 8.x Contribs
a script to visually analyze email headers using country GeoIP
Knuddi:
Just an interesting image of the countries that produce spam based on the volume I see at ScanMailX:
https://www.scanmailx.com/index.php?option=com_content&view=article&id=6&Itemid=34&lang=en
This is updated hourly and yes, USA is often scoring quite high :-)
purvis:
Charlie,
there are ALREADY lines of code to catch the addresses you mentioned.
Here are all the lines that filter out private IP addresses:
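--- Code: ---
# the leading "-" anchors each pattern to the start of the address
# skip 127.0.0.1 and the ranges 192.168.x.x, 10.x.x.x and 169.254.x.x
tempcount=$(echo "-$item" | LC_ALL=C grep -c "\-127\.0\.0\.1\|\-192\.168\.\|\-10\.\|\-169\.254\.")
if [ $tempcount -gt 0 ];then continue;fi
# skip the private range 172.16.0.0 - 172.31.255.255
tempcount=$(echo "-$item" |LC_ALL=C grep -c "\-172\.1[6-9]\.\|\-172\.2[0-9]\.\|\-172\.3[0-1]\.")
if [ $tempcount -gt 0 ];then continue;fi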
--- End code ---
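For comparison, the same test can be written with bash's own case pattern matching, which avoids forking a subshell and starting grep for every address. This is only a sketch: the function name is_private_ip is hypothetical, and the patterns simply mirror the grep expressions above.

--- Code: ---
#!/bin/bash
# is_private_ip ADDR  (hypothetical helper)
# Returns 0 (true) when ADDR is in one of the ranges filtered by the grep
# lines above, 1 otherwise. Uses only bash pattern matching, no grep.
is_private_ip() {
    case "$1" in
        127.0.0.1|192.168.*|10.*|169.254.*)
            return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[0-1].*)
            return 0 ;;
    esac
    return 1
}
--- End code ---

Inside the loop it would replace each echo/grep pair with: if is_private_ip "$item"; then continue; fi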
There are other IP address ranges, listed on the page below, that I am not sure about and that may need culling out as well.
http://en.wikipedia.org/wiki/List_of_assigned_/8_IPv4_address_blocks
Originally my code compared numeric values of the IP addresses, which meant converting each address into a number and using somewhat more complicated formulas and comparisons. I did not run timing tests to see which is faster, grep matching or numeric comparison, but grep looked easier to read and edit from a programming point of view.
I will do some further testing of the other IP addresses listed on the page above, and if I find anything worthwhile, I will not hesitate to post it quickly.
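For reference, the numeric approach can be sketched with bash arithmetic alone: convert the dotted quad to a 32-bit integer and compare it against network/prefix pairs using a bit mask. The function names are hypothetical, and the block list is an assumption: besides the private ranges already filtered, it adds a few other special-purpose blocks (0.0.0.0/8, 100.64.0.0/10, the whole 127.0.0.0/8, 224.0.0.0/4 multicast and 240.0.0.0/4 reserved). Check it against the IANA special-purpose address registry before relying on it.

--- Code: ---
#!/bin/bash
# ip_to_int ADDR  (hypothetical helper)
# Prints the 32-bit integer value of a dotted-quad IPv4 address.
# Assumes ADDR is already validated: four octets 0-255 without leading
# zeros (bash would read "010" as octal).
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Special-purpose blocks as network/prefix-length (assumed list, see text).
reserved_blocks=(0.0.0.0/8 10.0.0.0/8 100.64.0.0/10 127.0.0.0/8
    169.254.0.0/16 172.16.0.0/12 192.168.0.0/16 224.0.0.0/4 240.0.0.0/4)

# is_reserved_ip ADDR: returns 0 if ADDR falls inside any block above.
is_reserved_ip() {
    local ip block net bits mask
    ip=$(ip_to_int "$1")
    for block in "${reserved_blocks[@]}"; do
        net=$(ip_to_int "${block%/*}")
        bits=${block#*/}
        # mask with the top $bits bits set, e.g. /12 -> 0xFFF00000
        mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
        if (( (ip & mask) == net )); then
            return 0
        fi
    done
    return 1
}
--- End code ---

A mask test covers a whole block in one comparison, so ranges that do not end on an octet boundary (such as 100.64.0.0/10 or 172.16.0.0/12) need no extra patterns.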
Stefano:
purvis, thank you for your effort, but I'd suggest using Perl... it has modules that can help you and, most important, you don't have to reinvent the wheel
google -> "perl mail header parser" ;-)
purvis:
The script above was designed to get the job done of recognizing unwanted countries, not to be efficient.
There will be only two more scripts. Both are written to trim the fat, cut down on functions, and increase speed wherever possible.
I think I have identified two places where the script code, as it stands now, might be improved.
One is building the semailheader string, where the email header lines are joined back together without their line breaks (wraps). Header lines are supposed to be kept short (RFC 5322 recommends no more than 78 characters), so almost every email header contains some lines that were wrapped. If sed could be the only tool used to read the email file and rearrange the header, with one sed command performing multiple actions, that one line would increase the speed a lot.
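A sketch of that single-sed idea, assuming Unix (LF) line endings: one sed invocation stops at the blank line that ends the header and joins each continuation line (one starting with a space or tab) onto the line before it. The function name unfold_header is hypothetical. Unlike the original tr step, it does not turn tabs inside a line into spaces.

--- Code: ---
#!/bin/bash
# unfold_header FILE  (hypothetical helper)
# Prints only the header of FILE (up to and including the first empty line),
# with folded continuation lines joined onto the line they belong to.
unfold_header() {
    # /^$/q           : stop at the blank line that ends the header
    # :a / $!N        : append the next line to the pattern space
    # s/.../ /        : if that line starts with blanks, join it with one space
    # ta              : after a join, loop to look for another continuation
    # P / D           : print the finished first line, keep the rest, repeat
    sed -e '/^$/q' \
        -e ':a' \
        -e '$!N' \
        -e 's/\n[[:blank:]][[:blank:]]*/ /' \
        -e 'ta' \
        -e 'P' \
        -e 'D' "$1"
}
--- End code ---

It would stand in for the sed | tr | sed pipeline that builds semailheader, e.g. semailheader=$(unfold_header "$filename"). It also streams line by line instead of pulling the whole header into the pattern space first.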
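The other is speeding up grep. For better or worse, I use LC_ALL=C to speed grep up somewhat: it makes grep treat the input as plain bytes rather than multibyte locale characters, and since the patterns here are plain ASCII the matches should not change. I do not know how this affects other locales; you can remove the "LC_ALL" settings wherever they appear if you like.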
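Instead of prefixing every grep with LC_ALL=C, the setting can be scoped once per function. A minimal sketch, assuming bash, whose local builtin accepts declare options such as -x; the function count_received is a hypothetical example, not part of the script.

--- Code: ---
#!/bin/bash
# count_received FILE  (hypothetical example)
# Counts "Received:" lines in FILE using the C locale.
# "local -x" exports LC_ALL=C to every command started inside this function
# (grep included) and restores the caller's previous value on return.
count_received() {
    local -x LC_ALL=C
    grep -c '^Received:' "$1"
}
--- End code ---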
This script is more of a no-hand-holding version: it brings all the code above into one simple function, removes as much variable passing as my experience allowed, and modifies lines of code in ways that will hopefully increase speed.
There is no Perl here. I wanted something almost anyone can learn and modify with the basic utilities found on nearly every Linux machine.
On my older machine (single processor, single slow drive), this routine processes about 28.5 emails per second, or 1,710 per minute. The test emails are identical, short in body but with a substantial header; 10,000 of them were processed from a directory holding 100,000 such files.
But this script is not really the final desired result. It just provides a tool that lets somebody scan some emails and add to or take away from the code in a simple programming fashion. Almost anybody should be able to remove and add code to get the results they want.
Not this script, but the next one I post will hopefully be the final one: a script that checks a single email file for whether it contains an IP address from an unwanted country. We will see where that goes when I write it today, but here is this one. I am all about speed, so if a noticeable speed-up is possible inside the function in this script, I am all ears. No Perl in this code, please. The next script will have a nearly identical function, and that function is where any speed-up needs to happen, using basic Linux utilities. As you can tell, this bash script is much more compact than the one above.
--- Code: ---
#!/bin/bash
safelistofcountries=" US, \| UM, \| GOV, \| MIL, "
wanipaddressofthisemailserver=""
function lookforrejectedcountry {
local iplocationrejected=0
local tempcount=0
local ipaddresses=""
local ipchecked=""
local semailheader=""
local filename=""
local item=""
filename=$1
ipchecked="-127.0.0.1-$wanipaddressofthisemailserver-"
# read the header (up to the first blank line), turn tabs into spaces and join wrapped lines
semailheader=$(sed -e '/^$/q' "$filename" | tr -s '\t' '\040'| sed -e ':a;N;$!ba;s/\n / /g')
ipaddresses=$(echo "$semailheader" | LC_ALL=C grep "Received:" | LC_ALL=C grep -E -o '((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])')
if [ -z "$ipaddresses" ];then return 0;fi
set $ipaddresses
for item
do
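# skip addresses already checked; match "-IP-" as a fixed string so 1.2.3.4 is not mistaken for 1.2.3.45
tempcount=$(echo "$ipchecked" | LC_ALL=C grep -F -c -- "-$item-")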
if [ $tempcount -gt 0 ];then continue;fi
ipchecked="$ipchecked$item-"
tempcount=$(echo "-$item" | LC_ALL=C grep -c "\-127\.0\.0\.1\|\-192\.168\.\|\-10\.\|\-169\.254\.")
if [ $tempcount -gt 0 ];then continue;fi
tempcount=$(echo "-$item" |LC_ALL=C grep -c "\-172\.1[6-9]\.\|\-172\.2[0-9]\.\|\-172\.3[0-1]\.")
if [ $tempcount -gt 0 ];then continue;fi
tempcount=$(geoiplookup "$item" | sed -e 's/.*://' | LC_ALL=C grep -i -c "$safelistofcountries")
if [ $tempcount -eq 0 ]
then
iplocationrejected=1
stemp=$(geoiplookup "$item"); echo "rejected $item $stemp"
fi
done
return $iplocationrejected
}
### -----------------------------------------------------------------------------------------------
### main program starts here
###
/usr/bin/renice 19 -p $$ > /dev/null
/usr/bin/ionice -c3 -n7 -p $$ > /dev/null
echo "working ! reading emails ......................"
echo ""
countfile=0
IFS=$(echo -en "\n\b")
for filename in $(find /home/e-smith/files/users/*/Maildir/*/ -name "1*" -type f )
do
# echo $filename
lookforrejectedcountry "$filename"
if [ $? = 1 ];then echo "$filename";echo "rejected because of country";echo "";fi
let countfile+=1
done
echo ""
echo "total emails read $countfile"
echo ""
exit 0
--- End code ---
purvis:
This is pretty much my final work on finding IP addresses in emails that do not come from accepted countries.
A lot of work and testing went into this baby to make it fly as a bash script.
I made it as flexible as possible for others while keeping the code efficient.
Basically, the bash script is nothing more than a function written for another program to call.
I did make it possible to override the default country list inside the script by passing an optional second parameter.
You call this script from another program by passing it a filename and a list of country abbreviations.
Do not forget to enclose both parameters in double quotes (the example filename contains a space).
ex: bashscript "/home/e-smith/files/users/john doe/new/1abcde.eml" "US,GB,IT,"
Put a comma after every country abbreviation, even the last one; countries are matched including the trailing comma.
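The trailing comma matters because the script greps the list for the two-letter code followed by a comma. A pure-bash sketch of the same membership test, assuming the list format above ("US,GB,IT,"), is below; it also puts a comma in front so a code can only match a whole entry. The function name country_in_list is hypothetical.

--- Code: ---
#!/bin/bash
# country_in_list CODE LIST  (hypothetical helper)
# LIST has every code followed by a comma, e.g. "US,GB,IT,".
# Returns 0 if CODE is one of the entries, 1 otherwise.
country_in_list() {
    local code=$1 list=$2
    # Prepend a comma to the list and wrap the code in commas,
    # so only a complete entry can match.
    case ",$list" in
        *",$code,"*) return 0 ;;
    esac
    return 1
}
--- End code ---

With a list like "US,GB,IT" (no final comma), IT is not found, which is exactly why the last abbreviation needs its comma too.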
This code runs much faster, and I am sure there is still room for improvement.
There is a single "break" statement; it stops the loop, and so the script, at the first IP address whose country is not in the list supplied to the script.
You can remove the "break" statement, but then processing takes longer, because the script checks the full list of IP addresses taken from the Received: lines of the email file's header, and the exit code becomes the number of rejected addresses.
The code is short:
--- Code: ---
#!/bin/bash
if [ ! -f "$1" ];then exit 255 ;fi
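# header only (lines up to the first blank line), unfolded by formail, then every IPv4 address on Received: lines
ipaddresses=$(grep -B10000 -m1 '^$' "$1" | formail -cX "" | LC_ALL=C grep "^Received:" | \
LC_ALL=C grep -E -o '((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])')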
ipaddresses=$(echo "$ipaddresses" | sort -r -u )
if [ -z "$ipaddresses" ];then exit 254;fi
listofcountries="US,UM,GOV,MIL,"
if [ ! -z "$2" ];then listofcountries=$2;fi
iplocationrejected=0
for item in $ipaddresses
do
set ${item//./ }
let "ipdecnumber=$1 * 256 ** 3 + $2 * 256 ** 2 + $3 * 256 + $4"
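# skip 127.0.0.1 and the private ranges 10/8, 169.254/16, 172.16/12 and 192.168/16 (bounds in decimal)
case $1 in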
127) if [ $ipdecnumber -eq 2130706433 ];then continue;fi
;;
10) if [[ $ipdecnumber -gt 167772159 && $ipdecnumber -lt 184549376 ]];then continue;fi
;;
169) if [[ $ipdecnumber -gt 2851995647 && $ipdecnumber -lt 2852061184 ]];then continue;fi
;;
172) if [[ $ipdecnumber -gt 2886729727 && $ipdecnumber -lt 2887778304 ]];then continue;fi
;;
192) if [[ $ipdecnumber -gt 3232235519 && $ipdecnumber -lt 3232301056 ]];then continue;fi
;;
esac
stemp="$(geoiplookup $ipdecnumber | sed -e 's/.*: //g' -e 's/,.*//'),"
if [ $(echo $listofcountries | grep -c "$stemp") -eq 0 ]
then
let iplocationrejected+=1
break # comment out the break to not exit on the first hit of a country not in the list
fi
done
exit $iplocationrejected
--- End code ---
Here is some calling code from another bash script.
I will leave it up to you to turn the following code into a better routine, for example with a case statement.
--- Code: ---
/test/emailrrr "/home/e-smith/files/ibays/data/files/emails/$filename" "US,GB,IT,"
retval=$?
if [ $retval -eq 0 ];then echo "no ip address found outside of countries wanted";fi
if [ $retval -gt 0 ];then
if [ $retval -eq 255 ];then echo "file not found $filename";fi
if [ $retval -eq 254 ];then echo "no ip addresses found in received headers";fi
if [ $retval -lt 254 ];then echo "$retval ip address(s) outside countries range";fi
fi
--- End code ---
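Taking up that suggestion, here is one way the calling code might look with a case statement. It is a sketch: /test/emailrrr and the ibays path come from the example above, while the function name report_country_check is hypothetical.

--- Code: ---
#!/bin/bash
# report_country_check RETVAL FILENAME  (hypothetical helper)
# Turns the checker's exit status into a message:
#   0       no IP address outside the wanted countries
#   1..253  number of offending IP addresses (always 1 while "break" is active)
#   254     no IP addresses found in Received: headers
#   255     file not found
report_country_check() {
    local retval=$1 filename=$2
    case $retval in
        0)   echo "no ip address found outside of countries wanted" ;;
        254) echo "no ip addresses found in received headers" ;;
        255) echo "file not found $filename" ;;
        *)   echo "$retval ip address(s) outside countries range" ;;
    esac
}

# Usage, with the checker saved as /test/emailrrr:
# /test/emailrrr "/home/e-smith/files/ibays/data/files/emails/$filename" "US,GB,IT,"
# report_country_check $? "$filename"
--- End code ---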