ModSecurity: collections_remove_stale: Failed deleting collection #576
Original reporter: zertux |
zertux: Can this process be disabled, or is it necessary for ModSecurity's operation? |
I am having the exact same issue here on 2.7.5 |
Seeing similar behaviour on FreeBSD 8.4, Apache/2.2.25, mod_sec 2.7.5. mod_sec debug.log shows a couple of these: |
Hi, is there any update on this issue? Let me know if I can provide any further information. |
Same problem with 2.7.6 :-( |
Same issue. Is there a particular way to disable this behavior altogether? |
Same issue 2.7.7, any news? |
Soon after that error message, we get a few "Failed reading DBM file" and Apache is unreachable until the process is killed. |
We're using ZFS on FreeBSD. How about the other people affected by this bug? |
Same problem with Redhat, CentOS & Windows |
Thanks! |
I think we are having the same issue on ModSecurity 2.7.7 on FreeBSD 9.2 UFS with a reasonably busy server (but not SUPER busy, e.g. 100 httpd clients but lots of heavy database work). A few times a week, "something" happens that makes all Apache processes seem to spin on disk activity. Disk I/O by Apache shoots up to crazy levels (100MB/s) with CPU time spent mostly in system. In Apache status, children get stuck in "Logging" (L) state. It stays that way for 10-20 minutes, then calms down. I narrowed it down to the use of collections. The problem goes away immediately and completely when I disable an anti-bruteforce rule which stores a block flag/timer in a collection. When I kill Apache and re-enable the rule, the problem is back. When looking in
I now think that this condition probably happens when some clients hit the collection concurrently. The issue has become more problematic since switching to a twin-site SAN (I imagine that a sync write takes longer to finish because of the speed of light), so it might depend on a certain I/O latency to happen. The fact that the issue always disappears by itself after a number of minutes (but NOT by restarting Apache!) leads me to think that it probably ends when some collection item expires. Unfortunately, I have no reliable way to reproduce the problem, besides using collections in production and waiting until the site is crippled. |
Still no answer? |
I don't have this problem myself anymore since we're now not using persistent collections on that server anymore. However, I found that the server was using Could other people experiencing this issue try the following:
I wonder if this makes the issue go away for you. |
Finding ip.pag in SecDataDir and clearing it out helps, but I'm not sure why Apache cannot do it itself, despite the fact that it creates that file. |
@antoxxa Thanks. Had the exact same problem with an ip.pag file that had grown to 115MB. (FreeBSD 9.3, Apache 2.2) |
Do we know if this is being worked on? Seems like a bug like this that's been open for about 2 years would have been investigated by now. |
Same issue here with version 2.8.0, but the error message is different: ModSecurity: collection_retrieve_ex: Failed deleting collection (name "ip", key "217.254.155.4_6246985e080008849486b7f63d4773c0b4b3f2dd") ... I also tried deleting the ip.pag file in the SecDataDir, but it does not help. |
Has anyone seen this issue in 2.9.0? I don't have the time to test it. |
"ModSecurity: collections_remove_stale: Failed deleting collection XXX" 2.8.0, apache 2.2 mpm worker fixed thread count, suphp. Suggestions on solving this bug are welcome. Modsec devs, you might want to run a special separate cleaning thread/process or create another solution maybe? |
Hi @celesteking, You can clean up your sdbm using a separate process: https://github.com/SpiderLabs/modsec-sdbm-util Still, you may want to reduce the timeout and the amount of data saved in the collections. If you try to save a lot of data for an extended period of time, it eventually fails. There are some studies on using memcache to centralize the storage here: https://github.com/SpiderLabs/ModSecurity/tree/memcache_collections |
That might work, thanks, we're gonna incorporate that util to run periodically. |
Speaking of which, when modsec is in DetectionOnly mode, will the stale/expired entries be removed? |
We are also having this issue with mod_security 2.9.0, any update/plan on this bug? |
I've had to resort to deleting the file in a cronjob nightly. O_o |
If an IP address is detected exceeding the maximum number of requests per second, requests from that IP address are blocked by mod_security. According to the mod_security notes, a collection's default expiry time is one hour. |
@zcwang3 Did you ever figure out what causes this issue? Just noticed that it's happening on one of our servers. Modsec 2.9.0. The ip.pag file has hit 3.1G |
@dhaupin One of my colleagues found that "collections_remove_stale: Failed deleting collection" happens when mod_security uses the apr_sdbm_delete() method to delete expired keys. mod_security tries to delete an sdbm record by key through the Apache Portable Runtime Utility Library, but we do not know what causes the failure. The issue lies in mod_security's storage; an alternative would be to switch to other storage such as Redis or Memcached. Unfortunately, according to "Implement Redis support as Collection backend on libmodsecurity", that is not implemented. Someone suggested using a cron job to delete the pag file every day and then restart Apache, but that does not make sense to me. This issue has existed for 3 years without a solution, and we have found other similar issues, such as "IP persistence storage seems to not clean up". We are debating whether it is reasonable to keep using mod_security in production. Any other suggestion to fix this issue is appreciated! The related source code is as follows: |
Hi @zcwang3, We have already implemented support for LMDB [#1141] and an in-memory back end in ModSecurity version 3. Support for Redis [#1139] and Memcache [#1140] is on our backlog, to be implemented soon. I think the root cause of this problem is that we are abusing the apr_sdbm implementation. IMHO it was not designed to be used so intensively. That is why we advise reducing the expiration time to a lower value, thereby reducing the number of records in a given collection. Anything that you can change in the rules in that sense will reduce the size of the collection. The persistent collection keeps an in-memory cache and from time to time syncs the memory content with the sdbm storage. Notice that each process of your web server [worker, if you prefer] has a different in-memory collection that is synced with the sdbm from time to time. The error message that you saw is most likely not related to the size (on disk) of your collection. It is just saying that it is trying to delete a collection that was already deleted, probably by another process [that is one possibility]. The part of the code that calls the memory cleanup is available here: There was initial/experimental support for memcache built on top of 2.x, which is available here: |
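The "already deleted by another process" possibility described above can be sketched with a toy in-memory store. This is only an illustration, not ModSecurity or APR code: `toy_store`, `toy_put`, `toy_delete`, `snapshot_stale`, `delete_snapshot`, and `simulate_two_workers` are hypothetical names, and the array merely stands in for the shared sdbm file. Two workers each collect the expired keys first and delete them afterwards, so the second worker's delete fails even though nothing is corrupted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RECORDS 8
#define KEY_LEN 64

enum { TOY_OK = 0, TOY_NOT_FOUND = 1, TOY_FULL = 2 };

/* Hypothetical stand-in for one record in the shared sdbm file. */
typedef struct {
    char key[KEY_LEN];
    long expiry_time;
    int in_use;
} toy_record;

typedef struct {
    toy_record records[MAX_RECORDS];
} toy_store;

/* Store a key with its expiry time in the first free slot. */
int toy_put(toy_store *s, const char *key, long expiry_time) {
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (!s->records[i].in_use) {
            snprintf(s->records[i].key, KEY_LEN, "%s", key);
            s->records[i].expiry_time = expiry_time;
            s->records[i].in_use = 1;
            return TOY_OK;
        }
    }
    return TOY_FULL;
}

/* Delete by key; reports TOY_NOT_FOUND when the record is already gone. */
int toy_delete(toy_store *s, const char *key) {
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (s->records[i].in_use && strcmp(s->records[i].key, key) == 0) {
            s->records[i].in_use = 0;
            return TOY_OK;
        }
    }
    return TOY_NOT_FOUND;
}

/* Copy the keys that are expired at `now` into `out`; returns how many. */
int snapshot_stale(const toy_store *s, long now, char out[][KEY_LEN]) {
    int n = 0;
    for (int i = 0; i < MAX_RECORDS; i++) {
        if (s->records[i].in_use && s->records[i].expiry_time <= now) {
            snprintf(out[n++], KEY_LEN, "%s", s->records[i].key);
        }
    }
    return n;
}

/* Delete each snapshotted key; returns how many deletes failed. */
int delete_snapshot(toy_store *s, char keys[][KEY_LEN], int n) {
    int failures = 0;
    for (int i = 0; i < n; i++) {
        if (toy_delete(s, keys[i]) != TOY_OK) {
            failures++;
        }
    }
    return failures;
}

/* Both workers snapshot before either deletes, so the second one
 * tries to remove a record that no longer exists. */
int simulate_two_workers(void) {
    toy_store store;
    char worker_a[MAX_RECORDS][KEY_LEN];
    char worker_b[MAX_RECORDS][KEY_LEN];

    memset(&store, 0, sizeof store);
    int rc1 = toy_put(&store, "192.0.2.1_session", 100); /* expired at now = 200 */
    int rc2 = toy_put(&store, "192.0.2.2_session", 500); /* still valid */
    assert(rc1 == TOY_OK && rc2 == TOY_OK);
    (void)rc1;
    (void)rc2;

    int na = snapshot_stale(&store, 200, worker_a);
    int nb = snapshot_stale(&store, 200, worker_b);

    int failures = delete_snapshot(&store, worker_a, na);
    failures += delete_snapshot(&store, worker_b, nb);
    return failures;
}
```

Calling `simulate_two_workers()` from a `main` reports one failed delete: the first worker removes the expired record and the second finds nothing left to remove. Whether the real "Failed deleting collection" message always has this harmless cause is exactly what the thread leaves open.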
Felipe, |
Hi @marcstern, I don't think so; the warning is just a warning. It does not mean that there is an error. In other circumstances the warning may be more meaningful. |
Not sure what kind of MPM is used in case of corruption, but with threaded MPMs the file locking does not work for threads belonging to the same process. In that case threads can be performing different operations on the sdbm even though those operations presume they are protected by a lock. I have a patch (#1224) that solves persistent storage data corruption issues, so it could probably fix this one as well if the issue is observed with threaded (worker, event) MPMs. |
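The locking gap described above can be illustrated outside Apache. The following is a minimal sketch assuming POSIX threads; it is not the code from #1224, and `shared_store`, `remove_stale_worker`, and `run_locked_sweep` are hypothetical names. Several threads of one process try to remove the same expired record, and an in-process mutex keeps the lookup and the delete together so that only one thread performs the delete.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 8

/* Hypothetical shared state: one expired record plus counters. */
typedef struct {
    pthread_mutex_t lock;     /* in-process mutex shared by all threads */
    int record_present;       /* 1 while the expired record still exists */
    int successful_deletes;
    int failed_deletes;
} shared_store;

/* Each thread looks the record up and deletes it if it is there.
 * Both steps happen under the mutex, so no other thread can remove
 * the record between the lookup and the delete. */
void *remove_stale_worker(void *arg) {
    shared_store *s = arg;

    pthread_mutex_lock(&s->lock);
    int seen = s->record_present;          /* lookup step */
    if (seen) {
        if (s->record_present) {           /* delete step */
            s->record_present = 0;
            s->successful_deletes++;
        } else {
            s->failed_deletes++;           /* cannot happen while the lock is held */
        }
    }
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Run NUM_THREADS sweeps concurrently and report the counters.
 * Returns the number of threads that were actually started. */
int run_locked_sweep(int *deletes, int *failures) {
    shared_store s;
    pthread_t threads[NUM_THREADS];
    int started = 0;

    s.record_present = 1;
    s.successful_deletes = 0;
    s.failed_deletes = 0;
    int rc = pthread_mutex_init(&s.lock, NULL);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[started], NULL, remove_stale_worker, &s) != 0) {
            break;
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_mutex_destroy(&s.lock);

    *deletes = s.successful_deletes;
    *failures = s.failed_deletes;
    return started;
}
```

Build with a POSIX threads flag such as `-pthread`. With the mutex in place, exactly one thread deletes the record and the others find nothing to do; without it, two threads could both pass the lookup before either deletes, which is the situation the comment above attributes to threaded MPMs.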
Hello, I had the same problem (debian 8.5, modsecurity 2.8.0). |
This issue is resolved by @mturk's #1224 pull request. The issue is that multiple threads are trying to delete the same collection. I dropped in some debug logging to prove this. I added the following (line offsets may be a bit off from master):
@@ -676,6 +699,11 @@ int collections_remove_stale(modsec_rec *msr, const char *col_name) {
             }
             if (expiry_time <= now) {
+                if (msr->txcfg->debuglog_level >= 9) {
+                    msr_log(msr, 9, "collections_remove_stale: Attempting to delete collection (name \"%s\", "
+                        "key \"%s\")", log_escape(msr->mp, col_name),
+                        log_escape_ex(msr->mp, key.dptr, key.dsize - 1));
+                }
                 rc = apr_sdbm_delete(dbm, key);
                 if (rc != APR_SUCCESS) {
                     msr_log(msr, 1, "collections_remove_stale: Failed deleting collection (name \"%s\", "
@@ -698,13 +726,14 @@ int collections_remove_stale(modsec_rec *msr, const char *col_name) {
I ran a simple test using serial requests and saw this logging output (scroll horizontally to see session ID):
As you can see, the same |
…ging of collection delete problem in audit log when log level < 9 in audit log [Issue #576 - Marc Stern]
The fix in PR 1224 has been outstanding for a while with no comments. These entries make parsing our audit logs for real threats very very difficult without heavily modifying our tools. Is there at least a way to suppress the warnings for this in the audit log using a configuration directive until some fix is applied to the threading issues? |
@alexgoldsilver, #1380 was recently merged into our mainline. |
You may want to have a look at #1224: enabling the global mutex for collections should resolve this problem. |
MODSEC-428: When the server is under a load of 1500 concurrent users, ModSecurity starts to show the error below in Apache's error log and the server's CPU usage goes up to dangerous levels.
[Wed Oct 02 21:21:52 2013] [error] [client 109.127.81.10] ModSecurity: collections_remove_stale: Failed deleting collection (name "ip", key "109.127.81.138_9e4f93d096e2ca3744251c41dde47a1a7b26fa75"): Internal error [hostname "www.regayzanko.com"] [uri "/php5-fcgi/setting/student_collage_list.php"] [unique_id "UkxkMMaIMEoAADg9zw0AAAgr"]
I can see that this issue was patched (see https://www.modsecurity.org/tracker/browse/MODSEC-97), but I am using the latest ModSecurity version, 2.7.5, so I do not know why it is happening.