robots/slurpers/spiders/site-downloaders

Craig Jensen

robots/slurpers/spiders/site-downloaders
« on: October 31, 2003, 02:29:46 AM »
Hi.

I have been studying the options available through my SME 5.6 server to 'control' those who insist on downloading my entire site with their slurping applications.  I have noted a few that offer limited help.  

I have lately been stopping httpd for a given time and then re-starting it, just so I can use a bit of MY OWN bandwidth for a while :-)  (BTW, does anyone else find this to be more and more of a problem?)

Please if you will, enlighten me on methods you have used that efficiently 'choke' these bandwidth hogs...

Thank you for your replies.

Craig Jensen

Reinhold

Re: robots/slurpers/spiders/site-downloaders
« Reply #1 on: October 31, 2003, 02:51:33 AM »
Craig:

For those automatic grabbing tools you may want to make a robots.txt file...
for more info have a look at http://www.searchengineworld.com/robots/robots_tutorial.htm
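
Something as simple as this in the web root will do for the polite ones (the robot name and the path are only examples):

# shut one particular robot out completely
User-agent: WebZIP
Disallow: /

# and keep all other well-behaved robots out of the heavy downloads area
User-agent: *
Disallow: /downloads/

Bear in mind that the rude grabbers simply ignore robots.txt, so this only helps against the ones that play by the rules.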

Michael Smith

Re: robots/slurpers/spiders/site-downloaders
« Reply #2 on: October 31, 2003, 10:33:08 AM »
Not a problem here, because I'm not making anything available that anybody would want much of ... no binaries!

Kirk Ferguson

Re: robots/slurpers/spiders/site-downloaders
« Reply #3 on: October 31, 2003, 08:10:50 PM »
Hello.  We're using a .htaccess in the primary, something like what is discussed at this site:

http://www.webmasterworld.com/forum13/687-3-15.htm

We copied 90e-smithAccess10primary to /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/90e-smithAccess10primary

In the copy we added:

Options +FollowSymLinks

and changed:

AllowOverride None

to:

AllowOverride All

Then we expanded the templates and restarted httpd (commands below).
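
From memory, the commands were roughly these (the name of the httpd init script may differ on your SME version):

mkdir -p /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf
cp /etc/e-smith/templates/etc/httpd/conf/httpd.conf/90e-smithAccess10primary \
   /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/
# edit the copy as described above, then regenerate httpd.conf and restart
/sbin/e-smith/expand-template /etc/httpd/conf/httpd.conf
/etc/rc.d/init.d/httpd-e-smith restart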

This allows you to choose which spiders you want to allow, and to redirect those you don't want to a 404 page or another site.
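
For what it's worth, here is a minimal sketch of the sort of .htaccess we mean (the User-Agent names are only examples and it assumes mod_rewrite is loaded; the webmasterworld thread has a much longer list):

# refuse requests from a few known site-grabbers, matched on User-Agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC]
RewriteRule .* - [F]

The [F] just sends a 403 Forbidden; point the RewriteRule at another URL instead if you would rather redirect them somewhere else.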

Craig Jensen

Re: robots/slurpers/spiders/site-downloaders
« Reply #4 on: November 01, 2003, 12:47:59 AM »
Thank you for the responses.  I can see use for each option mentioned.

Thanks again

Craig Jensen

Charlie Brady

Re: robots/slurpers/spiders/site-downloaders
« Reply #5 on: November 01, 2003, 02:03:13 AM »
Kirk Ferguson wrote:

> Hello.  We're using a .htaccess in the primary, something
> like what is discussed at this site:

There's never any need to use .htaccess, since you can use template fragments.

> This allows you to choose which spiders you want to allow,
> and to redirect those you don't want to a 404 page or another
> site.

You can do that with a custom template fragment. Using .htaccess means that you have your spider information in two places rather than one, and if you are not careful, might open up a security hole.
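
As a rough sketch (the fragment name here is made up, and the User-Agent list is only an example), you could create /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/92LimitSpiders containing plain Apache directives such as:

# tag requests from a few site-grabbers, matched on User-Agent
SetEnvIfNoCase User-Agent "WebZIP" bad_bot
SetEnvIfNoCase User-Agent "HTTrack" bad_bot
SetEnvIfNoCase User-Agent "WebCopier" bad_bot

# and refuse them everywhere
<Location />
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Location>

then expand the httpd.conf template and restart httpd. The spider handling lives in one fragment under templates-custom, so it stays in one place and survives template regeneration.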

Charlie

Kirk Ferguson

Re: robots/slurpers/spiders/site-downloaders
« Reply #6 on: November 01, 2003, 02:58:26 AM »
Charlie,

Thanks for the help.   So I should try to include the commands (or something like them) in a custom template fragment rather than using that .htaccess file?  

Should this be part of the httpd.conf section for the primary?  

I sure don't need any more security worries, but the server this runs on is on a slow line and, for some reason, seems to attract more spiders than the other servers I work with.

Kirk