sr_watch is leaking in flow test because of pclean tests #208
Comments
Note1: It is easy to reproduce: make pclean fail (e.g. in f91) by not adding the next header (e.g. in f90), then let sr_watch and the pclean shovel continue to run. One can also run a big sample test, which takes much longer to reproduce the problem.
Note1: In the end, the issue was not that easy to reproduce, as neither test supplied any clues about the problem. I will restart the same test, but without the .inc reject enabled.
Note1: I managed to reproduce the problem with a 100 000 limit and a 2-day run.
Note3: This is the diff of the flow check numbers taken a few hours apart. I am still unsure of the relevance of those results, except that the senders and the pclean module are still active after everything is done. Digging into the logs of those components will help me understand.
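A rough sketch of how two flow check snapshots could be compared to see which components are still moving. The component names and numbers below are made up for illustration and are not taken from the actual flow check output.

```python
def diff_counts(before, after):
    """Return {component: delta} for every component whose count changed."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        delta = after.get(name, 0) - before.get(name, 0)
        if delta != 0:
            changed[name] = delta
    return changed


# Hypothetical snapshots taken a few hours apart (illustrative values only).
before = {"sr_watch": 1000, "sr_sender": 500, "pclean": 480}
after = {"sr_watch": 1000, "sr_sender": 520, "pclean": 495}

for name, delta in diff_counts(before, after).items():
    print(f"{name}: {delta:+d}")
```

Components that do not show up in the diff were idle between the two snapshots; anything listed is still producing or consuming messages.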
Note3: After I stopped all processes I counted 837830 post_log entries, which should mean that there were 837830 inotify events (unsure?).
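For reference, a count like this can be obtained with something along these lines. It is only a sketch: it assumes each post shows up as one log line containing the string post_log, and the sample lines are invented rather than real sr_watch output.

```python
from typing import Iterable


def count_marker(lines: Iterable[str], marker: str = "post_log") -> int:
    """Count the lines that contain the marker substring."""
    return sum(1 for line in lines if marker in line)


# Invented log lines, only to show the counting; real logs will look different.
sample = [
    "2019-01-01 00:00:01 [INFO] post_log notice for a.txt",
    "2019-01-01 00:00:02 [DEBUG] heartbeat",
    "2019-01-01 00:00:03 [INFO] post_log notice for b.txt",
]
print(count_marker(sample))
```

To run it over a real log, pass an open file object instead of the list, since a file iterates line by line.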
Then, close to 94K messages were ignored and I still don't know why. Here is the code being executed in sr_winnow (aka sr_subscribe):
Note1: Here is a script to trace the propagation of one file:
Here is the route of that file with timing:
The routing seems OK in this particular case: winnow ignores a message that is already there, and the file goes through the flow test in 1m30. Ignored messages do not seem to be related to the current issue. I should now find a file that does not complete pclean to see how the system behaves in that case.
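A minimal sketch of this kind of per-file trace, assuming a simplified log format of "ISO-timestamp component message" per line. That format, the component names, and the sample lines are all assumptions for illustration, not the real log layout.

```python
from datetime import datetime


def trace_file(log_lines, filename):
    """Return (component, seconds since first sighting) for lines mentioning filename."""
    hits = []
    for line in log_lines:
        if filename not in line:
            continue
        # Assumed layout: "<timestamp> <component> <rest of message>"
        stamp, component, _rest = line.split(maxsplit=2)
        hits.append((datetime.fromisoformat(stamp), component))
    hits.sort()
    if not hits:
        return []
    start = hits[0][0]
    return [(comp, (ts - start).total_seconds()) for ts, comp in hits]


# Invented log lines merged from several components.
logs = [
    "2019-01-01T10:00:00 sr_watch post sample.txt",
    "2019-01-01T10:00:40 sr_winnow accept sample.txt",
    "2019-01-01T10:00:05 sr_watch post other.txt",
    "2019-01-01T10:01:30 pclean_f90 done sample.txt",
]
for component, offset in trace_file(logs, "sample.txt"):
    print(f"{offset:6.1f}s  {component}")
```

A file that never completes pclean would show up as a trace whose last component is not the final pclean step.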
Note1: I added a switch to disable the summaries in the flow_check and tried to accelerate the computation
The snippet
The results
The snippet
The results
Note1: I continued yesterday's analysis:
Note2: I refined my script to classify watch posts by file extension.
Note3: What I see now is that it is not only a problem with the pclean tests, and I need to investigate what was going on after the shovels stopped.
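The classification by extension could look roughly like the following. This is a sketch that assumes the posted paths have already been extracted from the sr_watch log; the sample paths are invented.

```python
from collections import Counter
from pathlib import PurePosixPath


def classify_by_extension(paths):
    """Count posted paths per final file extension ('(none)' when there is none)."""
    counts = Counter()
    for p in paths:
        suffix = PurePosixPath(p).suffix or "(none)"
        counts[suffix] += 1
    return counts


# Invented paths; .hlink and .slink mimic the extensions created by the pclean tests.
posts = [
    "/data/a.txt",
    "/data/a.txt.hlink",
    "/data/b.slink",
    "/data/c",
    "/data/d.txt",
]
for ext, n in classify_by_extension(posts).most_common():
    print(f"{ext}: {n}")
```

Only the last suffix is used, so a.txt.hlink is counted under .hlink, which separates the files generated by the tests from the original posts.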
Note 1: I think I am done with this task. Basically, I removed the retry functionality from the pclean plugin and added a message delay in the config. This stabilized the flow test, which had been unmanageable because the retry kept popping up and duplicating a lot of messages.
I discovered this after fixing the error and warning summary (#206), as I am now more confident in the counts with big sample tests. It seems that pclean f90 generates inotify events, which are translated into sr_watch posts, when it executes random tests that create new files with new extensions (S P C (this one is disabled), hlink, slink, ...). These are then propagated and checked in other pclean tests...
Here is the output of flow check: