Koozali.org: home of the SME Server

Contribs.org Forums => Koozali SME Server 10.x => Topic started by: holck on September 10, 2023, 10:27:01 AM

Title: How to block ill-behaved spiders?
Post by: holck on September 10, 2023, 10:27:01 AM
I want to block all requests from the "Bytespider" robot from Bytedance, but I can't make it work.

I've made a custom template fragment in /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/VirtualHosts/26RewriteTraceAndTrack like this:
Code: [Select]
{
    $OUT =<<'HERE';
    RewriteEngine on
    RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
    RewriteRule .* - [F]
# Block Bytespider from Bytedance
    RewriteCond %{HTTP_USER_AGENT} "Bytespider"
    RewriteRule .* - [F,L]
#

HERE
}
Then I ran signal-event e-smith-apache-update. The generated /etc/httpd/conf/httpd.conf looks all right, but the block doesn't seem to have any effect.
Title: Re: How to block ill-behaved spiders?
Post by: ReetP on September 10, 2023, 02:28:30 PM
Not sure, but the rule might need to come earlier, like here:

https://stackoverflow.com/questions/51972679/how-to-block-a-specific-user-agent-in-apache

Similar to this, straight after RewriteEngine on (JP will undoubtedly point out how wrong I am!):

Code: [Select]
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT}  ^.*BingPreview.*$
RewriteRule . - [R=403,L]

It probably needs the regex too, e.g.:

Code: [Select]
^.*Bytespider.*$
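
For what it's worth, a RewriteCond pattern is an unanchored regular expression, so a bare Bytespider and ^.*Bytespider.*$ should match the same User-Agent strings. A rough way to convince yourself is to try both patterns with grep -E. Assumptions: the User-Agent string below is only an illustrative sample, matches is a hypothetical helper, and grep -E uses POSIX extended regexes rather than Apache's PCRE (for patterns this simple the two behave the same).

```shell
# Sample User-Agent string, for illustration only (an assumption, not from a real log).
ua='Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)'

# matches PATTERN STRING: print "match" if STRING matches PATTERN, else "no match".
# Hypothetical helper; grep -E stands in for Apache's regex engine here.
matches() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo "match"
    else
        echo "no match"
    fi
}

matches 'Bytespider' "$ua"            # bare substring pattern
matches '^.*Bytespider.*$' "$ua"      # anchored pattern with wildcards
matches 'Bytespider' 'curl/7.29.0'    # an unrelated agent
```

The first two calls print "match" and the third prints "no match", so the extra anchors and wildcards do not change which agents are caught.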

If you have webapps installed run

Code: [Select]
signal-event webapps-update

Test the conf with

Code: [Select]
apachectl -t
Let us know.
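
Note that apachectl -t only checks the syntax of the configuration. To see whether the rule actually fires, one option is to send a request with a spoofed User-Agent and look at the status code. A minimal sketch, assuming curl is installed; ua_status is a hypothetical helper name and the URL in the usage line is a placeholder for your own site:

```shell
# ua_status URL AGENT: print only the HTTP status code the server returns
# when the request is sent with AGENT as its User-Agent header.
#   -s            silent, no progress meter
#   -o /dev/null  discard the response body
#   -w ...        print the status code after the transfer
#   -A AGENT      set the User-Agent header
ua_status() {
    curl -s -o /dev/null -w '%{http_code}\n' -A "$2" "$1"
}
```

Usage: ua_status http://localhost/ Bytespider should print 403 if the block is active, while the same call with an ordinary browser agent should print whatever the site normally returns.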
Title: Re: How to block ill-behaved spiders?
Post by: holck on September 10, 2023, 09:56:42 PM
Sorry, my bad, it works smoothly.

I hadn't noticed that all requests from Bytespider now get a 403 reply, just as they should.

Sorry!
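
For anyone wanting to confirm the same thing from the logs, here is a small sketch that counts the response codes served to a given agent. Assumptions: the access log is in Apache's "combined" format (so the status code is the 9th whitespace-separated field), the log lives at /var/log/httpd/access_log, and status_counts_for_agent is a hypothetical helper name.

```shell
# status_counts_for_agent AGENT LOGFILE: for log lines mentioning AGENT,
# print one "STATUS COUNT" pair per distinct HTTP status code.
# Assumes the "combined" log format, where field 9 is the status code.
status_counts_for_agent() {
    grep -F -- "$1" "$2" \
        | awk '{ print $9 }' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Usage: status_counts_for_agent Bytespider /var/log/httpd/access_log. If the block is working, the requests logged after the change should all show up under 403.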
Title: Re: How to block ill-behaved spiders?
Post by: Jean-Philippe Pialasse on September 11, 2023, 07:19:26 PM
Another approach would be a local rule for fail2ban.
I usually deploy one with a long list of spiders. You might need to remove some according to your needs (it could block yum in some cases if you have a repo on your server).