Make adding all/multiple users to a given set more efficient #1846
Conversation
Here are some initial observations after looking at the code. I haven't had a chance to test it yet though.
Please stop using `foreach`. In Perl, `for` and `foreach` are aliases for each other, and there is no advantage whatsoever to using the longer word.
Also, please don't capitalize the first letter of a variable name. That is inconsistent with almost all other variable names in the code, and is just not standard for variable names in any language. Generally that is reserved for package/class names.
See the comments in the code, most of which are remarks about perlcritic and best practices in perl.
Force-pushed from ec03bfb to 6e381b1 (Compare)
I'm testing an additional small optimization, passing only an array of the problem_id values instead of the full problem records.
So looking at your approach here, I see why this doesn't give as significant a speed improvement as you expected. Really, you just moved the for loop that iterated over the users from the WeBWorK::ContentGenerator::Instructor package to the DB package. From the perspective of database access, nothing has really changed: each row is still put into the database separately. To obtain a real speed improvement, you would need to go deeper and use the lower layer's `insert_records` method. To test this, naively change lines 2079-2086 that you currently have in DB.pm to a single batched call.
Note that my suggestion still doesn't take it down to only one database access. But it does reuse the prepared SQL::Abstract statement, and re-preparing that statement for every row contributes a lot to the slowdown. To really take it down to one database insertion, methods would need to be added to Std.pm.
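For illustration, here is a minimal sketch of the difference, assuming the `insert_records` method on the lower-layer table object (the method this PR ends up using); the surrounding variable names are placeholders, not the actual DB.pm code.

```perl
# Before: each record is inserted on its own, so the INSERT statement is
# prepared again for every single row.
$db->addUserProblem($_) for @userProblemRecords;

# After (sketch): hand the whole list to the lower layer in one call, so
# the prepared INSERT statement is built once and reused for every row.
eval { $db->{problem_user}->insert_records(\@userProblemRecords); };
die "batch insert failed: $@" if $@;
```

This still issues one INSERT per row on the server side, but it avoids the repeated statement preparation identified above as the main cost.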
So all of this error checking is for naught. None of it is reported back to the instructor doing the assigning in the case that a problem is actually already assigned to a user. The only thing that does happen is that if an extreme error occurs (probably due to database failure or something), the content generator itself dies. If a problem is already assigned to a user, the call quietly fails, and the instructor never knows it. The return values of these calls are simply discarded. Clearly, this needs work. Not in this pull request, but how this has been handled does.
@drgrice1 Thanks for pointing out how to insert multiple records "almost" at once. It might not be a single call, but reusing the prepared statement certainly makes a huge difference. Using `insert_records`, runtime for the same test cases is now 13-17 seconds. We are about three times faster at this than before.

I put in a count-based test to avoid doing the insert if there are already some problem records for the given user and set. There should not be any "reasonable" case where some user did not have the set assigned before the call and yet had some subset of problems assigned.

Yes, the error handling is just breaking the page and displaying a single error message. And yes, I was aware that I essentially moved the loop over problems (for a given user) from the content generator into DB.pm.
Yeah, it seems that your first approach did help some. But we want better! Also, it seems that you can still catch the database "set already assigned" exception by wrapping the insert in an eval.
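In Perl that kind of trapping is normally done with an `eval` block. A minimal sketch, assuming the error text contains something recognizable for the duplicate case (the string match and the reporting are illustrative, not the codebase's actual handling):

```perl
# Try the assignment; trap a duplicate-record failure instead of letting
# it kill the content generator.
eval { $db->addUserSet($userSet) };
if (my $err = $@) {
    if ($err =~ /exists/i) {
        # The set was already assigned: report it rather than dying.
        push @errors, 'Set already assigned to user ' . $userSet->user_id;
    }
    else {
        die $err;    # rethrow anything unexpected (e.g. a real DB failure)
    }
}
```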
Force-pushed from 48a068b to 4f0d255 (Compare)
@drgrice1 - I would appreciate a bit more advice. There are two more linked ideas which might further improve performance. I am not sure they are a good idea, as they would group DB changes for multiple students into "grouped transactions" (even though at present they are not really transactions), so a failure could possibly affect several students.

The downside is that if something goes wrong (e.g. the process getting killed or some DB error breaking the chain of work in the middle), the database records for a group of students could be left inconsistent. (For example, the entire group could get user-set records while some do not get the problem records, so their assignments would be broken.) In the current approach, damage would likely be restricted to the records of a single student. What do you think - is the possible speedup enough of a benefit to take the risk of bigger problems when something fails? If we were to start using InnoDB and transactions, grouping everything in a "real" transaction would probably avoid the issue, at least to a great extent.
I think it is worth experimenting with at least. The thing is that if something goes wrong, like a DB error or the process being killed, the situation is the same either way. Those are extreme failures where you can't prevent data corruption if that is the result. The assignments are going to be broken anyway.
I think the speed benefits are worth it. The database failures should be rare in most cases. Transactions with InnoDB would be nice!
Thanks - I will try to work on coding it next week. I'll try to set it up so that, if transactions are supported, the work is done as a transaction.
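A minimal sketch of that conditional-transaction idea using standard DBI calls; the helper name, and the way the handle and work are passed in, are assumptions rather than the PR's actual code:

```perl
# Run the supplied work inside a transaction when the driver supports
# one.  begin_work fails (and is trapped) when transactions are not
# available; on myISAM a rollback simply has no effect.
sub with_transaction {
    my ($dbh, $work) = @_;
    my $in_txn = eval { $dbh->begin_work; 1 };
    eval { $work->() };
    if (my $err = $@) {
        eval { $dbh->rollback } if $in_txn;
        die $err;    # propagate the original failure
    }
    $dbh->commit if $in_txn;
    return;
}

# Usage sketch: group one batch of assignments per transaction.
# with_transaction($dbh, sub { assign_batch_of_users(@userIDs) });
```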
Force-pushed from ec4e360 to dc130f2 (Compare)
Now coded to handle inserts of problems for multiple students as groups, and also to create the UserSet records as a single batch. Runtime on the same test case is now 4-6 seconds, typically under 5 seconds. The total performance improvement relative to the old code before this PR is close to a factor of 10.
I tried to code something to add transaction support and group the revised "batches" into transactions. Your comments on the approach taken would be appreciated. It seems to work fine both in a course where the course tables are myISAM (the current default) and in a test course created after I set the default storage engine to InnoDB. In the course using InnoDB I tested after importing the class list (again 862 students) and with an assignment with 25 problems, as before.
Yeah, InnoDB is not as good when you have many database writes occurring in separate queries. One of InnoDB's weak points in general, as I understand it, is writing large amounts of data to the database. However, if the approach of adding multiple sets and problems in batches is close in speed between the two engines, then InnoDB does have the huge advantage of being able to roll back the changes when something goes wrong, potentially preventing database corruption. With myISAM, if a failure occurs part of the way through the job, there is no way to roll back, and that could leave the database in an unstable state. I will try to look at this soon.
Although I didn't time it exactly, in a course with about 500 users and a set of 25 problems, this sped up the assignment about 10 times.
@pstaab This was not working after the changes to develop.
I tested the new one and it still works. With a course of about 800 users and a set of 25 problems, this went from 4+ minutes to 1+ minutes. I wonder if the following will work to get the git stuff straightened out. Can you delete the local branch of …
@pstaabp: Your suggested method does work. I have used that before to change the branch for a pull request. It is really only the name of the pull request branch that matters, and that can't be changed.
Force-pushed from dc130f2 to 2c9e76f (Compare)
This is updated to a single clean commit onto the current develop, and I added the transaction style modifications. I have updated (in the newer commits):
I have not tested the changes to the WebworkWebservice code, as I am not sure how it can be tested.
There are a few things that should be changed, but overall this is looking good.
Force-pushed from 659c7e7 to d1e5dca (Compare)
I have encountered a problem that is introduced by this pull request. This occurs with dropped users on the UsersAssignedToSet.pm page. If you check one of these users for an assignment on that page and then click on "Save", it gives an error. The same happens from the 'Instructor Tools' page (which uses Assigner.pm) if you select a dropped user and a set, and click on "Assign selected users to selected sets". On the develop branch this works.
There is another problem here. If you go to the UsersAssignedToSet.pm page, click on 'Assign to All Current Users', and then click on that again, you get an error.
Force-pushed from 69edca9 to f277520 (Compare)
Make adding all/multiple users to a given set more efficient, and add basic transaction support, used when assigning sets/problems to multiple users.

- Add code to allow inserting multiple records for a single user and set into the problem_user table as a batch, which reduces the number of test calls made to the database and inserts the records as a batch so the prepared statement gets reused.
- Add assignSetToGivenUsers so that assignSetsToUsers can use it instead of looping over the users and calling assignSetToUser.
- Add assignMultipleProblemsToGivenUsers, which allows inserting problem records for a given set for many users in one call. Provide the setID and an array of problem_id values instead of an array of globalProblem records.
- Modify addUserMultipleProblems in lib/WeBWorK/DB.pm to use eval { $self->{problem_user}->insert_records([@problemList]); } so that a single DB call makes all the inserts. Test that no records for this user and set exist before making that call.
- Have the UsersAssignedToSet code use the new code to add users to sets more efficiently (grouped DB operations).
- Have the WebworkWebservice use the new methods when processing assignSetToUsers and addProblem calls.

These changes greatly speed up the process of "assign set to all users".
Corrections by @drgrice1
Co-authored-by: Glenn Rice <[email protected]>
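A rough sketch of what the batched helper described above might look like; `newUserProblem` is an existing WeBWorK::DB constructor, but the loop structure and error handling here are assumptions, not the committed code:

```perl
# Build user_problem records for every (user, problem) pair, then insert
# them all through one reused prepared statement.
sub assignMultipleProblemsToGivenUsers {
    my ($self, $userIDs, $setID, @problemIDs) = @_;

    my @records;
    for my $userID (@$userIDs) {
        for my $problemID (@problemIDs) {
            my $record = $self->newUserProblem;
            $record->user_id($userID);
            $record->set_id($setID);
            $record->problem_id($problemID);
            push @records, $record;
        }
    }

    # Single batched call into the lower layer (see the commit message).
    eval { $self->{problem_user}->insert_records(\@records); };
    die "failed to assign problems for set $setID: $@" if $@;
    return;
}
```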
Force-pushed from f277520 to 78c7364 (Compare)
@drgrice1 reported several bugs:
Fixed. The code will also now exit early when there is nothing to assign, and the suggested changes have been applied.
Yep. Those issues are fixed. Thanks for this. This looks good now.
Checked this again. I'm not seeing any of the errors noted above and the improvement is still about 75%. Will merge.
The updated version of this PR is far more efficient than the earlier versions. Thanks to @drgrice1 for pointing out how to do multiple inserts in a single call to the lower layer, which at least gets the prepared statement reused.

The improvement, based on some very rough benchmarking for an assignment with 25 problems, is almost 10-fold once the inserts for all students are batched into a single group.

The improvement is almost certainly a function of the size of the assignment, as it is mainly due to inserting the actual user-problem records sequentially through a single reused prepared statement.
Add code to allow adding multiple records to the problem_user table (for a single set for a single user) as a batch, which allows reducing the number of test calls made to the database. It would be better to be able to add all records in a single database call, but at present the underlying database abstraction layer does not seem to have any support for inserting multiple records in one call.

For a large sample course (862 user accounts) and test assignments with 25 problems each, I observed the "until rendered" time of assigning a set to all users as taking 44-47 seconds with the old code and 38-44 seconds with the new code (on my laptop, using a Docker-based setup). That is only about a 10% improvement. I had hoped for better, since the number of test calls made in the process decreased from 50 per assignment (2 per problem) to 1 per assignment, and there are fewer subroutine calls in the process, but clearly that was not the largest bottleneck.
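For reference, the single test call per assignment might look roughly like this; `listUserProblems` is an existing WeBWorK::DB method, but using it as the guard is an assumption standing in for the actual check:

```perl
# One existence check per assignment, instead of two test calls per
# problem: skip the batch insert if this user already has any problem
# records for the set.
my @existing = $db->listUserProblems($userID, $setID);
unless (@existing) {
    eval { $db->{problem_user}->insert_records(\@userProblemRecords); };
    die "assigning set $setID to $userID failed: $@" if $@;
}
```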