Make adding all/multiple users to a given set more efficient #1846
Conversation
Here are some initial observations after looking at the code. I haven't had a chance to test it yet though.
Please stop using `foreach`. In Perl, `for` and `foreach` are aliases for each other, and there is no advantage whatsoever to using the longer word.
Also, please don't capitalize the first letter of a variable name. That is inconsistent with almost all other variable names in the code, and is just not standard for variable names in any language. Generally that is reserved for package/class names.
See the comments in the code, most of which are remarks about perlcritic and best practices in perl.
Force-pushed from ec03bfb to 6e381b1 (Compare)
I'm testing an additional small optimization, passing only an array of the problem_id values instead of the full problem records.
So looking at your approach here, I see why this doesn't give as significant a speed improvement as you expected. Really, you just moved the for loop that iterated over the users from the WeBWorK::ContentGenerator::Instructor package to the DB package. From the perspective of database access, nothing has really changed: each row is still put into the database separately. To obtain a real speed improvement, you would need to go deeper and use the lower layer's `insert_records` method. To test this, naively change lines 2079-2086 that you currently have in DB.pm to a single batched call.
Note that my suggestion still doesn't take it down to only one database access. But it does reuse the prepared SQL::Abstract statement, and re-preparing that statement for every row contributes a lot to the slowdown. To really take it down to one database insertion, methods would need to be added to Std.pm.
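For illustration, here is a minimal sketch of the difference, assuming the `insert_records` method on the lower-layer table object (the method this PR ends up using); the surrounding variable names are placeholders, not the actual DB.pm code.

```perl
# Before: each record is inserted on its own, so the INSERT statement is
# prepared again for every single row.
$db->addUserProblem($_) for @userProblemRecords;

# After (sketch): hand the whole list to the lower layer in one call, so
# the prepared INSERT statement is built once and reused for every row.
eval { $db->{problem_user}->insert_records(\@userProblemRecords); };
die "batch insert failed: $@" if $@;
```

This still issues one INSERT per row on the server side, but it avoids the repeated statement preparation identified above as the main cost.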
So all of this error checking is for naught. None of it is reported back to the instructor doing the assigning in the case that a problem is actually already assigned to a user. The only thing that does happen is that if an extreme error occurs (probably due to database failure or something), the content generator itself dies. If a problem is already assigned to a user, the call quietly fails, and the instructor never knows it. The return values of these calls are simply discarded. Clearly, this needs work. Not in this pull request, but how this has been handled does.
@drgrice1 Thanks for pointing out how to insert multiple records "almost" at once. It might not be a single call, but reusing the prepared statement certainly makes a huge difference. Using `insert_records`, runtime for the same test cases is now 13-17 seconds. We are about three times faster at this than before.

I put in a count-based test to avoid doing the insert if there are already some problem records for the given user and set. There should not be any "reasonable" case where some user did not have the set assigned before the call and yet had some subset of problems assigned.

Yes, the error handling is just breaking the page and displaying a single error message. And yes, I was aware that I essentially moved the loop over problems (for a given user) from the content generator into DB.pm.
Yeah, it seems that your first approach did help some. But we want better! Also, it seems that you can still catch the database "set already assigned" exception by wrapping the insert in an eval.
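In Perl that kind of trapping is normally done with an `eval` block. A minimal sketch, assuming the error text contains something recognizable for the duplicate case (the string match and the reporting are illustrative, not the codebase's actual handling):

```perl
# Try the assignment; trap a duplicate-record failure instead of letting
# it kill the content generator.
eval { $db->addUserSet($userSet) };
if (my $err = $@) {
    if ($err =~ /exists/i) {
        # The set was already assigned: report it rather than dying.
        push @errors, 'Set already assigned to user ' . $userSet->user_id;
    }
    else {
        die $err;    # rethrow anything unexpected (e.g. a real DB failure)
    }
}
```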
Force-pushed from 48a068b to 4f0d255 (Compare)
@drgrice1 - I would appreciate a bit more advice. There are two more linked ideas which might further improve performance. I am not sure they are a good idea, as they would group DB changes for multiple students into "grouped transactions" (even though at present they are not really transactions), so a failure could possibly affect several students.

The downside is that if something goes wrong (e.g. the process getting killed or some DB error breaking the chain of work in the middle), the database records for a group of students could be left inconsistent. (For example, the entire group could get user-set records while some do not get the problem records, so their assignments would be broken.) In the current approach, damage would likely be restricted to the records of a single student. What do you think - is the possible speedup enough of a benefit to take the risk of bigger problems when something fails? If we were to start using InnoDB and transactions, grouping everything in a "real" transaction would probably avoid the issue, at least to a great extent.
I think it is worth experimenting with at least. The thing is that if something goes wrong, like a DB error or the process being killed, the situation is the same either way. Those are extreme failures where you can't prevent data corruption if that is the result. The assignments are going to be broken anyway.
I think the speed benefits are worth it. The database failures should be rare in most cases. Transactions with InnoDB would be nice!
Thanks - I will try to work on coding it next week. I'll try to set it up so that, if transactions are supported, the work is done as a transaction.
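A minimal sketch of that conditional-transaction idea using standard DBI calls; the helper name, and the way the handle and work are passed in, are assumptions rather than the PR's actual code:

```perl
# Run the supplied work inside a transaction when the driver supports
# one.  begin_work fails (and is trapped) when transactions are not
# available; on myISAM a rollback simply has no effect.
sub with_transaction {
    my ($dbh, $work) = @_;
    my $in_txn = eval { $dbh->begin_work; 1 };
    eval { $work->() };
    if (my $err = $@) {
        eval { $dbh->rollback } if $in_txn;
        die $err;    # propagate the original failure
    }
    $dbh->commit if $in_txn;
    return;
}

# Usage sketch: group one batch of assignments per transaction.
# with_transaction($dbh, sub { assign_batch_of_users(@userIDs) });
```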
Force-pushed from ec4e360 to dc130f2 (Compare)
Now coded to handle inserts of problems for multiple students as groups, and also to create the UserSet records as a single batch. Runtime on the same test case is now 4-6 seconds, typically under 5 seconds. The total performance improvement relative to the old code before this PR is close to a factor of 10.
I tried to code something to add transaction support and group the revised "batches" into transactions. Your comments on the approach taken would be appreciated. It seems to work fine both in a course where the course tables are myISAM (the current default) and in a test course created after I set the default storage engine to InnoDB. In the course using InnoDB I tested after importing the class list (again 862 students) and with an assignment with 25 problems, as before.
Yeah, InnoDB is not as good when you have many database writes occurring in separate queries. One of InnoDB's weak points in general, as I understand it, is writing large amounts of data to the database. However, if the approach of adding multiple sets and problems in batches is close in speed between the two engines, then InnoDB does have the huge advantage of being able to roll back the changes when something goes wrong, potentially preventing database corruption. With myISAM, if a failure occurs part of the way through the job, there is no way to roll back, and that could leave the database in an unstable state. I will try to look at this soon.
Although I didn't time it exactly, in a course with about 500 users and a set of 25 problems, this sped up the assignment about 10 times.
@pstaab This was not working after the changes to develop.
I tested the new one and it still works. With a course of about 800 users and a set of 25 problems, this went from 4+ minutes to 1+ minutes. I wonder if the following will work to get the git stuff straightened out. Can you delete the local branch of …
@pstaabp: Your suggested method does work. I have used that before to change the branch for a pull request. It is really only the name of the pull request branch that matters, and that can't be changed.
Force-pushed from dc130f2 to 2c9e76f (Compare)
This is updated to a single clean commit onto the current develop, and I added the transaction style modifications. I have updated (in the newer commits):
I have not tested the changes to the WebworkWebservice code, as I am not sure how it can be tested.
There are a few things that should be changed, but overall this is looking good.
Force-pushed from 659c7e7 to d1e5dca (Compare)
I have encountered a problem that is introduced by this pull request. This occurs with dropped users on the UsersAssignedToSet.pm page. If you check one of these users for an assignment on that page and then click on "Save", it gives an error. The same happens from the 'Instructor Tools' page (which uses Assigner.pm) if you select a dropped user and a set, and click on "Assign selected users to selected sets". On the develop branch this works.
There is another problem here. If you go to the UsersAssignedToSet.pm page, click on 'Assign to All Current Users', and then click on that again, you get an error.
Force-pushed from 69edca9 to f277520 (Compare)
Make adding all/multiple users to a given set more efficient, and add basic transaction support, used when assigning sets/problems to multiple users.

- Add code to allow inserting multiple records for a single user and set into the problem_user table as a batch, which reduces the number of test calls made to the database and inserts the records as a batch so the prepared statement gets reused.
- Add assignSetToGivenUsers so that assignSetsToUsers can use it instead of looping over the users and calling assignSetToUser.
- Add assignMultipleProblemsToGivenUsers, which allows inserting problem records for a given set for many users in one call. Provide the setID and an array of problem_id values instead of an array of globalProblem records.
- Modify addUserMultipleProblems in lib/WeBWorK/DB.pm to use eval { $self->{problem_user}->insert_records([@problemList]); } so that a single DB call makes all the inserts. Test that no records for this user and set exist before making that call.
- Have the UsersAssignedToSet code use the new code to add users to sets more efficiently (grouped DB operations).
- Have the WebworkWebservice use the new methods when processing assignSetToUsers and addProblem calls.

These changes greatly speed up the process of "assign set to all users".
Corrections by @drgrice1
Co-authored-by: Glenn Rice <[email protected]>
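A rough sketch of what the batched helper described above might look like; `newUserProblem` is an existing WeBWorK::DB constructor, but the loop structure and error handling here are assumptions, not the committed code:

```perl
# Build user_problem records for every (user, problem) pair, then insert
# them all through one reused prepared statement.
sub assignMultipleProblemsToGivenUsers {
    my ($self, $userIDs, $setID, @problemIDs) = @_;

    my @records;
    for my $userID (@$userIDs) {
        for my $problemID (@problemIDs) {
            my $record = $self->newUserProblem;
            $record->user_id($userID);
            $record->set_id($setID);
            $record->problem_id($problemID);
            push @records, $record;
        }
    }

    # Single batched call into the lower layer (see the commit message).
    eval { $self->{problem_user}->insert_records(\@records); };
    die "failed to assign problems for set $setID: $@" if $@;
    return;
}
```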
Force-pushed from f277520 to 78c7364 (Compare)
@drgrice1 reported several bugs:
Fixed. The code will also now exit early when there is nothing to assign, and the suggested changes have been applied.
Yep. Those issues are fixed. Thanks for this. This looks good now.
Checked this again. I'm not seeing any of the errors noted above and the improvement is still about 75%. Will merge.
The updated version of this PR is far more efficient than the earlier versions. Thanks to @drgrice1 for pointing out how to do multiple inserts in a single call to the lower layer, which at least gets the prepared statement reused.

The improvement, based on some very rough benchmarking for an assignment with 25 problems, is almost 10-fold once the inserts for all students are batched into a single group.

The improvement is almost certainly a function of the size of the assignment, as it is mainly due to inserting the actual user-problem records sequentially through a single reused prepared statement.
Add code to allow adding multiple records to the problem_user table (for a single set for a single user) as a batch, which allows reducing the number of test calls made to the database. It would be better to be able to add all records in a single database call, but at present the underlying database abstraction layer does not seem to have any support for inserting multiple records in one call.

For a large sample course (862 user accounts) and test assignments with 25 problems each, I observed the "until rendered" time of assigning a set to all users as taking 44-47 seconds with the old code and 38-44 seconds with the new code (on my laptop, using a Docker-based setup). That is only about a 10% improvement. I had hoped for better, since the number of test calls made in the process decreased from 50 per assignment (2 per problem) to 1 per assignment, and there are fewer subroutine calls in the process, but clearly that was not the largest bottleneck.
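For reference, the single test call per assignment might look roughly like this; `listUserProblems` is an existing WeBWorK::DB method, but using it as the guard is an assumption standing in for the actual check:

```perl
# One existence check per assignment, instead of two test calls per
# problem: skip the batch insert if this user already has any problem
# records for the set.
my @existing = $db->listUserProblems($userID, $setID);
unless (@existing) {
    eval { $db->{problem_user}->insert_records(\@userProblemRecords); };
    die "assigning set $setID to $userID failed: $@" if $@;
}
```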