Koozali.org: home of the SME Server

How to block ill-behaved spiders?

Offline holck

« on: September 10, 2023, 10:27:01 AM »
I want to block all requests from the "Bytespider" robot from Bytedance, but I can't make it work.

I've made a custom template fragment in /etc/e-smith/templates-custom/etc/httpd/conf/httpd.conf/VirtualHosts/26RewriteTraceAndTrack like this:
Code:
{
    $OUT =<<'HERE';
    RewriteEngine on
    RewriteCond %{REQUEST_METHOD} ^(TRACE|TRACK)
    RewriteRule .* - [F]
# Block Bytespider from Bytedance
    RewriteCond %{HTTP_USER_AGENT} "Bytespider"
    RewriteRule .* - [F,L]
#

HERE
}
I then ran signal-event e-smith-apache-update. The generated /etc/httpd/conf/httpd.conf looks all right.
But the block doesn't seem to have any effect.
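One way to see what the server is actually answering is to count status codes per user agent in the access log. This is a minimal sketch, assuming the log is at /var/log/httpd/access_log and uses the common/combined format (status code in the 9th whitespace-separated field); the function name is made up for illustration.

```shell
# Count HTTP status codes for log lines matching a user-agent pattern.
# Usage: ua_status_counts PATTERN LOGFILE
# Assumes common/combined log format, where field 9 is the status code.
ua_status_counts() {
    grep -- "$1" "$2" | awk '{print $9}' | sort | uniq -c
}

# Example (log path is an assumption, adjust to your setup):
#   ua_status_counts Bytespider /var/log/httpd/access_log
```

If the block is active, the output should be dominated by 403 rather than 200.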

Offline ReetP

Re: How to block ill-behaved spiders?
« Reply #1 on: September 10, 2023, 02:28:30 PM »
Not sure, but the rule might need to come earlier, as in this example:

https://stackoverflow.com/questions/51972679/how-to-block-a-specific-user-agent-in-apache

Something similar to this, straight after RewriteEngine on (JP will undoubtedly point out how wrong I am!):

Code:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT}  ^.*BingPreview.*$
RewriteRule . - [R=403,L]

It probably needs the regex too, e.g.:

Code:
^.*Bytespider.*$
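For what it's worth, RewriteCond patterns are unanchored regular expressions, so a bare Bytespider and ^.*Bytespider.*$ should match the same strings. A rough way to check this, using grep -E as a stand-in for Apache's regex engine (an approximation, since Apache uses PCRE) and a sample User-Agent string made up for illustration:

```shell
# Sample User-Agent string, for illustration only.
ua='Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)'

# Print "yes" if the extended regex in $1 matches the string in $2, else "no".
matches() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then echo yes; else echo no; fi
}

matches 'Bytespider' "$ua"                 # unanchored substring pattern
matches '^.*Bytespider.*$' "$ua"           # explicitly anchored variant
matches 'Bytespider' 'Mozilla/5.0 (X11; Linux x86_64) Firefox/117.0'
```

The first two calls print yes and the last prints no, so the extra ^.* and .*$ do not change which requests are caught.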

If you have webapps installed, run

Code:
signal-event webapps-update

Test the conf with

Code:
apachectl -t
Let us know.

Offline holck

Re: How to block ill-behaved spiders?
« Reply #2 on: September 10, 2023, 09:56:42 PM »
Sorry, my bad, it works smoothly.

I hadn't noticed that all requests from Bytespider now get a 403 reply, just as they should.
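A quick way to confirm this kind of block without waiting for the spider is to send a request with a spoofed User-Agent. A minimal sketch, assuming curl is installed; the function name and the host name in the example are placeholders:

```shell
# Print the HTTP status code a server returns for a given User-Agent.
# Usage: ua_status URL USER_AGENT
ua_status() {
    curl -s -o /dev/null -w '%{http_code}' -A "$2" "$1"
}

# Example (hypothetical host name):
#   ua_status http://www.example.com/ 'Bytespider'     # blocked agent, expect 403
#   ua_status http://www.example.com/ 'Mozilla/5.0'    # ordinary agent, expect 200
```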

Sorry!

Offline Jean-Philippe Pialasse

  • aka Unnilennium
    • http://smeserver.pialasse.com
Re: How to block ill-behaved spiders?
« Reply #3 on: September 11, 2023, 07:19:26 PM »
Another approach would be a local rule for fail2ban.
I usually deploy one with a long list of spiders. You might need to remove some according to your needs (it could block yum in some cases if you host a repo on your server).