agroportal traffic analysis #173

Closed
syphax-bouazzouni opened this issue Jan 6, 2022 · 7 comments

syphax-bouazzouni commented Jan 6, 2022

On 06/01/2022 we ran into this issue (see the capture below) when accessing http://agroportal.lirmm.fr/

image

The cause

This may be because we are being spammed by bot IPs (see the capture below of the number of requests sent to Agroportal on 06/01/2022).

image

Logs

And here is what we can see in the access log:

image

We can observe that they target these two routes:

  • /feedback
  • /login
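
The per-IP and per-route counts described above could be reproduced with something like the following minimal sketch. It assumes the access log uses Apache's "combined" format, which this thread does not confirm; the sample lines, IP addresses and the names LOG_PATTERN, parse_line and count_requests are hypothetical and only serve the illustration.

```python
import re
from collections import Counter

# Assumed Apache "combined" layout:
# IP identity user [time] "METHOD path protocol" status size "referer" "user agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def count_requests(lines):
    """Count requests per client IP and per path (query string stripped)."""
    per_ip, per_path = Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that do not follow the assumed format
        per_ip[entry["ip"]] += 1
        per_path[entry["path"].split("?", 1)[0]] += 1
    return per_ip, per_path


# Hypothetical sample lines (documentation-range IPs), not taken from the real log.
SAMPLE_LINES = [
    '192.0.2.10 - - [06/Jan/2022:10:00:01 +0100] "GET /feedback?location=x HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [06/Jan/2022:10:00:02 +0100] "GET /login?redirect=y HTTP/1.1" 200 734 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [06/Jan/2022:10:00:03 +0100] "GET /feedback HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [06/Jan/2022:10:00:04 +0100] "GET /ontologies HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    per_ip, per_path = count_requests(SAMPLE_LINES)
    print(per_ip.most_common(5))
    print(per_path.most_common(5))
```

To run it on the real file, the lines of the access log would be passed to count_requests instead of SAMPLE_LINES.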

jonquet commented Jan 10, 2022

Note that a robots.txt file already exists:
http://agroportal.lirmm.fr/robots.txt
http://bioportal.lirmm.fr/robots.txt

@syphax-bouazzouni Could you check whether these files differ from the ones I customized a few months ago? I seem to remember having listed more bots.


jonquet commented Jan 18, 2022

Received today from STI-RX:
I have just set up mod_security on the agroportal Apache server. All rules are disabled except the monitoring of /feedback, with fairly strict criteria.

The module configuration is under /etc/httpd/modsecurity.d and /etc/httpd/conf.d/mod_security.conf,
and for /feedback an entry has been added to /etc/httpd/conf.d/10-appliance.ontoportal.org_non-tls.conf


jonquet commented Jan 18, 2022

The 'right' robots.txt file (the one edited in February 2021) is here:
https://github.com/ontoportal-lirmm/bioportal_web_ui/blob/development/config/robots/appliance.txt


syphax-bouazzouni commented Jan 24, 2022

Summary of the latest updates

STI-RX changes:

They added mod_security and enabled it only for /feedback.
The configuration is in /etc/httpd/conf.d/10-appliance.ontoportal.org_non-tls.conf
image

Did it work?

It seems not, as shown by the following screenshot of the /var/log/httpd/appliance.ontoportal.org_non-tls_access.log file for 24/Jan/2022 (obtained with the command grep "24/Jan/2022" appliance.ontoportal.org_non-tls_access.log | less).
image

What next?

In the above screenshot of the access log, we can see that the requests come from robots, and more specifically from AhrefsBot.
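
To go one step beyond the grep on the date, the user agents behind the requests to a given route can be tallied. The sketch below is only an illustration: it assumes the log lines end with the quoted user agent (as in Apache's "combined" format), and the sample lines, the function name agents_for and the exact AhrefsBot user-agent string are assumptions, not copied from the real log.

```python
import re
from collections import Counter

# Captures the timestamp, the request path and the trailing quoted user agent.
LINE_RE = re.compile(
    r'\[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+)[^"]*" .*"(?P<agent>[^"]*)"$'
)


def agents_for(lines, day, path_prefix):
    """Count user agents of requests made on `day` whose path starts with `path_prefix`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # line does not follow the assumed format
        if match["time"].startswith(day) and match["path"].startswith(path_prefix):
            counts[match["agent"]] += 1
    return counts


# Assumed user-agent string, used here only as sample data.
AHREFS = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"

# Hypothetical sample lines, not taken from the real log.
SAMPLE_LINES = [
    f'192.0.2.10 - - [24/Jan/2022:09:00:01 +0100] "GET /feedback HTTP/1.1" 200 512 "-" "{AHREFS}"',
    f'192.0.2.11 - - [24/Jan/2022:09:00:05 +0100] "GET /feedback?location=x HTTP/1.1" 200 512 "-" "{AHREFS}"',
    '198.51.100.7 - - [24/Jan/2022:09:01:00 +0100] "GET /ontologies HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    f'192.0.2.10 - - [23/Jan/2022:23:59:59 +0100] "GET /feedback HTTP/1.1" 200 512 "-" "{AHREFS}"',
]

if __name__ == "__main__":
    for agent, hits in agents_for(SAMPLE_LINES, "24/Jan/2022", "/feedback").most_common():
        print(hits, agent)
```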

So the next step is to disallow robots on some paths, such as /feedback.

Update robots.txt file
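
As a rough illustration of this kind of change, the sketch below checks a set of Disallow rules with Python's standard urllib.robotparser before deploying them. The rules shown are hypothetical and are not the content of the real appliance.txt; they only assume that /feedback and /login are disallowed for every crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the real file is the
# appliance.txt linked earlier in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feedback
Disallow: /login
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for agent, path in [
        ("AhrefsBot", "/feedback"),
        ("Googlebot", "/login"),
        ("Googlebot", "/ontologies"),
    ]:
        verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
        print(agent, path, verdict)
```

Note that robots.txt is only advisory: it stops crawlers that choose to honor it, which is why the access log still has to be checked afterwards.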

Did it work?

Yes, it worked. Below is the screenshot of the /var/log/httpd/appliance.ontoportal.org_non-tls_access.log file for 26/Jan/2022; we can see that there are no more requests to /feedback or from AhrefsBot.

image

However, Googlebot is still there (which is not a problem in itself) and indexes unwanted paths like /ajax.
So we need to:


jonquet commented Jan 26, 2022

Similar to the /ajax exclusion, we can add an exclusion for /javascripts, which is also very frequent in the logs.

Screenshot 2022-01-26 at 16 26 22


syphax-bouazzouni commented Jan 27, 2022

27/01/2022 update

A new bot to disallow: DataForSeoBot
image
Also add Disallow rules for /javascripts and /widgets/
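
Since the list of bots and paths keeps growing, one option is to generate the file from two lists and check the result before publishing it. This is only a sketch under assumptions: render_robots_txt is a hypothetical helper, blocking the listed bots entirely (Disallow: /) is an assumption about how they are excluded, and the lists are reconstructed from this thread rather than copied from the deployed file.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(blocked_bots, blocked_paths):
    """Build robots.txt text: block listed bots entirely, and listed paths for all other crawlers."""
    lines = []
    for bot in blocked_bots:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    lines.append("User-agent: *")
    lines += [f"Disallow: {path}" for path in blocked_paths]
    return "\n".join(lines) + "\n"


# Assumed lists, reconstructed from the discussion in this issue.
BLOCKED_BOTS = ["AhrefsBot", "DataForSeoBot"]
BLOCKED_PATHS = ["/feedback", "/login", "/ajax", "/javascripts", "/widgets/"]

robots_txt = render_robots_txt(BLOCKED_BOTS, BLOCKED_PATHS)

# Re-parse the generated text to confirm that the rules behave as intended.
checker = RobotFileParser()
checker.parse(robots_txt.splitlines())

if __name__ == "__main__":
    print(robots_txt)
    print(checker.can_fetch("DataForSeoBot", "/ontologies"))
    print(checker.can_fetch("Googlebot", "/widgets/form_complete"))
```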

@syphax-bouazzouni

Issue resolved: there are no more bots calling /feedback or /login in the access logs.
