-
Notifications
You must be signed in to change notification settings - Fork 491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance problems with "Assign Role" or "Remove Assigned Role" #2036
Comments
On my laptop on commit ab0b268 (pre-4.1 release) I loaded up production data and played around with granting and revoking a role on a dataverse ("togo") with 32 datasets and 194 files. To me it's unacceptably slow even with Solr is down, taking nearly 10 seconds:
Obviously, at ~25 seconds it's crazy slow when Solr is up, which is related to #50. I'm not sure what our target is for response time for the operation to complete but I guess we'll try to make it as fast as we can. ab0b268 contains the script I'm testing with. |
Based on the profiler (screenshot below), I'm pretty sure the slowdowns are in:
I'm not sure how to make the instantiation of datasets faster. This issue was noted by @raprasad in #1822 and he suggested storing a version of the dataset as JSON. I could dig into query counts like he did but I trust him. :) As for repetitive commits to Solr, I could look into batching the operations, similar in spirit to the batching we've talked about in the past. |
@pdurbin is an index created automatically for primary keys? |
The profiler shows significantly better performance when we reduce the number of calls to findDvObject so we pass around the DvObject itself more often, rather than looking it up over and over. The profiler also suggests that batching the commits to Solr may also yield a good performance gain but that's for a different commit. The index API endpoint "missing" is removed as part of this commit since it relies on code that was rewritten as part of this commit but it was little used anyway since there's a newer "skipIndexed" boolean for other index endpoints for catching up an index. We can revisit if we want to re-add the "missing" endpoint. Finally the createSolrDoc method was used to SearchUtil and tests were written for it. Here are the numbers for before and after this commit from running scripts/issues/2036/grant-role-then-revoke | Solr Up | Solr Down ------- | --------- | ------- grant | 0m25.653s | 0m9.699s revoke | 0m23.539s | 0m9.968s | Solr Up | Solr Down ------- | --------- | ------- grant | 0m7.422s | 0m10.348s revoke | 0m6.758s | 0m10.341s
In 75e8dba in a new branch I reduced the number of calls to findDvObject but I'd like to review it with @scolapasta before merging it. See the long commit message for details but it end with much shorter times than what I reported at #2036 (comment) before I started working on this issue. The numbers are from running scripts/issues/2036/grant-role-then-revoke
What lead me here was the profiler, which now no longer suggests that findDvObject is the culprit. Rather, the next thing to look at would be somehow batching the commits to Solr, but this was not the lower hanging fruit (findDvObject was): This screenshot can be compared with the "before" screenshot at #2036 (comment) |
The profiler shows significantly better performance when we reduce the number of calls to findDvObject so we pass around the DvObject itself more often, rather than looking it up over and over. The profiler also suggests that batching the commits to Solr may also yield a good performance gain but that's for a different commit. The index API endpoint "missing" is removed as part of this commit since it relies on code that was rewritten as part of this commit but it was little used anyway since there's a newer "skipIndexed" boolean for other index endpoints for catching up an index. We can revisit if we want to re-add the "missing" endpoint. Finally the createSolrDoc method was used to SearchUtil and tests were written for it. Here are the numbers for before and after this commit from running scripts/issues/2036/grant-role-then-revoke | Solr Up | Solr Down ------- | --------- | ------- grant | 0m25.653s | 0m9.699s revoke | 0m23.539s | 0m9.968s | Solr Up | Solr Down ------- | --------- | ------- grant | 0m7.422s | 0m10.348s revoke | 0m6.758s | 0m10.341s
@scolapasta as we discussed 75e8dba seemed safe enough to merge into the 4.0.2 branch (which will become 4.1) so I'm passing this to QA to check for any regressions around search permissions. I'll continue to look at the code locally however and may commit again if I can figure out a good way to batch the Solr commits. |
@sbarbosadataverse Can you check this out for performance and pass it back to me to check the other stuff? Thx. |
@kcondon I've been dealing with permission performance all week on production so of course I'll check;)) |
To reflect the fact that I see this issue as lower priority for 4.2.1 than #2648 I'm removing the "Critical" label. |
As we just decided in a meeting, this issue is like #2472 in that we think the performance improvements would be great but we don't want to hold up 4.2.1 so I'm pulling it out of that milestone. Recent changes have all been in branches so there is nothing to QA for 4.2.1. No changes. |
I demoed to @kcondon a fix I just pushed to a separate branch in 1770995 that makes the indexing of permissions asynchronous when granting or revoking roles. Per the output below, the time to grant was reduced from 14.5 seconds to 0.2 seconds. @scolapasta I'm going put this in the 4.2.3 milestone so you can review the code and decide if you want it in. I know in the past we've had to back out of async attempts but this one is smaller and more focused so I hope it doesn't cause a problem. Please note that this commit does not attempt to address the "6-8 seconds to look up each name" issues reported above which is an entirely different part of the code. It sounds like @scolapasta @michbarsinai and @landreev chatted about this the other day. Perhaps it should be treated as its own issue. Below is the before and after of my fix. The madagascar dataverse contains 46 datasets and a total of 472 files as of my testing. I'm not actually sure why this one is so slow. I speculated to @sekmiller that perhaps it's because it has a greater than average number of role assignments. BEFORE
AFTER
|
@kcondon @scolapasta as we discussed this morning I just merged the 2036-async-grant branch into 4.2.3. It contains a fairly small commit (1770995) that I described previously at #2036 (comment) . In short, the change is make the indexing part of changing role assignments asynchronous. Passing to QA. Some testing around assigning and removing roles would be appreciated, especially paying attention to how long it takes for known slow cases like the madagascar dataverse. I have not looked into memory usage, as previously mentioned at #2036 (comment) as "characteristic 6GB heap allocation/ free". Please note that while " 6-8 seconds to look up each name" has been mentioned at #2036 (comment) this is an entirely different part of the code and probably deserves its own issue. @scolapasta and @landreev discussed that slowness with @michbarsinai when he was recently in town. |
6-8s to look up user |
When we try to remove an Assigned Role it spins and when we refresh the page it does nothing. Discovered while meeting with @mcrosas when looking into AJPS.
The text was updated successfully, but these errors were encountered: