Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make adding all/multiple users to a given set more efficient #1846

Merged
merged 2 commits into from
Apr 11, 2023

Conversation

taniwallach
Copy link
Member

@taniwallach taniwallach commented Dec 1, 2022

The updated version of this PR is far more efficient than the earlier versions. Thanks to @drgrice1 for pointing out how to do multiple inserts in a single call to the lower layer which at least gets the prepared statement to be reused.

The improvement based on some very rough bench-marking for an assignment with 25 problems is almost 10 fold, once the inserts for all students are batched into a single group.

The improvement is almost certainly a function of the size of the assignment as the improvement is mainly due to getting the actual user-problem records to be inserted sequentially using one request of the prepared statement.


Add code to allow adding multiple records to the problem_user table (for a single set for a single user) as a batch, which allows reducing the number of test calls made to the database. It would be better to be able to add all records in a single database call, but at present the underlying database abstraction layer does not seem to have any support for inserting multiple records in one call.

For a large sample course (862 user accounts) and test assignments with 25 problems each, I observed the "until rendered" time of assigning a set to all users as taking 44-47 seconds with the old code and 38-44 seconds with the new code (on my laptop, using a Docker based setup). That seems to be about a 10% increase in performance. I had hoped for better, as the number of test calls made in the process decreased from 50 per assignment (2 per problem) to 1 per assignment, and there are less calls to subroutines in the process, but clearly that is not that larger bottleneck.

Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some initial observations after looking at the code. I haven't had a chance to test it yet though.

Please stop using foreach. In Perl for and foreach are aliases for each other and there is no advantage whatsoever to using the longer word.

Also, please don't capitalize the first letter of a variable name. That is inconsistent with almost all other variable names in the code, and is just not standard for variable names in any language. Generally that is reserved for package/class names.

See the comments in the code, most of which are remarks about perlcritic and best practices in perl.

lib/WeBWorK/ContentGenerator/Instructor.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
@taniwallach
Copy link
Member Author

I'm testing an additional small optimization, passing only an array of the the problem_id values, as that is the only thing the new assignMultipleProblemsToUser loop really needs. There is no need to pass the original array of the problem records and extract the problem_id each time.

@drgrice1
Copy link
Member

drgrice1 commented Dec 1, 2022

So looking at your approach here, I see why this doesn't cause as significant of a speed improvement as you expected. Really, you just moved the for loop that iterated over the users from the WeBWorK::ContentGenerator::Instructor package to the DB package. From the perspective of database access, that isn't really anything different. It is still putting each row into the database separately. To obtain a real speed improvement, you would need to go deeper and use the insert_fields method (or the insert_fields_i method for iterator usage -- which might actually help here) of the DB::Schema::NewSQL::Std package. Well, maybe not that deep, the insert_records (or insert_records_i) method of that same package would be deep enough.

To test this, naively change lines 2079-2086 that you have currently in DB.pm to eval { $self->{problem_user}->insert_records([ @problemList ]); };. Make sure you don't try to include users that already have the set assigned, because obviously there is no error checking here, but you should see a significant improvement. Getting the error checking to work per problem will take some more investigation, and may require using insert_records_i instead.

@drgrice1
Copy link
Member

drgrice1 commented Dec 1, 2022

Note that my suggestion still doesn't take it down to only one database access. But it does reuse the prepared SQL::Abstract statement, which does contribute a lot to slowing down things. To really take it down to one database insertion, methods would need to be added to Std.pm.

@drgrice1
Copy link
Member

drgrice1 commented Dec 1, 2022

So all of this error checking is for naught. None of it is reported back to the instructor doing the assigning in the case that a problem is actually already assigned to a user. The only thing that does happen is if an extreme error occurs (probably due to database failure or something), the content generator itself dies. If a problem is assigned to a user, it quietly fails, and the instructor would never know it. The return values of assignSetsToAllUsers and assignSetsToUsers are completely ignored even though the return values have this information.

Clearly, this needs work. Not this pull request, but how this has been handled.

@taniwallach
Copy link
Member Author

@drgrice1 Thanks for pointing out how to insert multiple records "almost" at once. It might not be a single call, but reusing the prepared statement certainly makes a huge difference. Using eval { $self->{problem_user}->insert_records([ @problemList ]); }; makes a huge improvement.

Runtime for the same test cases is now 13-17 seconds. We are about three times faster at this than before.

I put in a count based test to avoid doing the insert if there are already some problem records for the given user and set. There should not be any "reasonable" case where some user did not have the set assigned before the call and had some subset of problems assigned. Yes, the error is just breaking the page and displaying a single error message.

Yes, I was aware that I essentially moved the loop over problems (for a given user) from lib/WeBWorK/ContentGenerator/Instructor.pm to lib/WeBWorK/DB.pm but that enabled skipping many repeated database calls used as tests, which I hoped would help some.

@taniwallach taniwallach changed the title Attempt to make adding all users to a set a bit more efficient Make adding all/multiple users to a given set more efficient Dec 1, 2022
@drgrice1
Copy link
Member

drgrice1 commented Dec 1, 2022

Yeah, it seems that your first approach did help some. But we want better!

Also, it seems that you can still catch the database "set already assigned" exception using the insert_record method. However, you can't pinpoint which problem was already assigned. It succeeds in assigned problems up until it finds one that is assigned, and then throws that exception. That is probably fine though since none of it is actually looked at anyway.

@taniwallach
Copy link
Member Author

taniwallach commented Dec 2, 2022

@drgrice1 - I would appreciate a bit more advice.

There are two more linked ideas which might further improve performance. I am not sure they are a good idea, as it would be grouping DB changes for multiple students into "grouped transactions" (even though at present they are not really transactions) so problems would possible effect several students if something went wrong.

  1. I could extended the code of assignMultipleProblemsToUser and addUserMultipleProblems so that data for multiple students could be handled together one call thereby grouping many more insertions into each insert_records call.
  2. If doing so, the addUserSet calls would be similarly grouped (same group of students).

The downside is that if something goes wrong (ex. process getting killed or some DB error breaking the chain of work in the middle) the database records for a group of students will be messed up. (Ex. The entire group could all get user-set records, but some might not get the problem records, so their assignments would be broken.) In the current approach, damage would likely be restricted to the records for a single student.

What do you think - is the possible speedup enough of a benefit to take the risk of bigger problems when something fails?

If we were to start using InnoDB and transactions, grouping everything in a "real" transaction would probably avoid the issue, at least to a great extent.

@taniwallach taniwallach added Do Not Merge Yet PR to allow others to inspect -- not ready for prime time Enhancement enhances the software labels Dec 2, 2022
@drgrice1
Copy link
Member

drgrice1 commented Dec 2, 2022

I think it is worth experimenting with at least. The thing is that if something goes wrong like a DB error or process being killed, that will still all be the same. Those are extreme failures where you can't stop data corruption if that is a result. The assignments are going to be broken anyway.

@drgrice1
Copy link
Member

drgrice1 commented Dec 2, 2022

I think the speed benefits are worth it. The database failures should be rare in most cases. Transactions with InnoDB would be nice!

@taniwallach
Copy link
Member Author

I think the speed benefits are worth it. The database failures should be rare in most cases. Transactions with InnoDB would be nice!

Thanks - I will try to work on coding it next week. I'll try to set it up so if transactions are supported it will be done as a transaction.

@taniwallach
Copy link
Member Author

taniwallach commented Dec 4, 2022

I think the speed benefits are worth it. The database failures should be rare in most cases.

Now coded to handle inserts of problems for multiple students as groups, and also to create the UserSet records as a single batch.

Runtime on the same test case is now 4-6 seconds, typically under 5 seconds. Total performance improvement relative the old code before this PR is close to a factor of 10 better.

Transactions with InnoDB would be nice!

I tried to code something to add transaction support and group the revised "batches" into transactions.

See taniwallach@ff4c7ec

Your comments on the approach taken would be appreciated.

It seems to work fine in both a course where the course tables are myISAM (as the current default) and a test course created after I set $database_storage_engine = 'innodb'; in conf/site.conf and restarted the server. (I manually checked that the tables were in fact created as InnoDB tables.)

In the course using InnoDB I tested after importing the class list (again 862 students) and with an assignment with 25 problems, as before.

  • For the old develop code, "assign to all users" took 1.2 minutes (in each of 3 tests).
    • I conclude that InnoDB is clearly significantly slower when many single row inserts are being done.
    • I take that to indicate that we should assume that single row inserts are slower with InnoDB overall.
  • Using the code from this PR and the additional code from commit adding transaction support - "assign set to all users" took 4-6 seconds (in each of 5 tests), similar to that for the MyISAM based course without the additional transaction support, and more the 10 times faster than the old code for the InnoDB tables.
    • Once the inserts are handled a large batches, there is no noticeable different in the run-time of MyISAM and InnoDB tables for these "assign set to all users" operation.

@drgrice1
Copy link
Member

drgrice1 commented Dec 4, 2022

Yeah, innoDB is not as good when you have many database writes occurring in separate queries. One of innoDB's weakpoints in general is writing large amounts of data to the database as I understand. However, if with the approach of adding multiple sets and problems is close in speed between the two engines, then innoDB does have the huge advantage of being able to roll back the changes when something goes wrong. Potentially preventing database corruption. With myISAM if a failure occurs part of the way through the job, there is no way rollback, and that could leave the database in an unstable state.

I will try to look at this soon.

Copy link
Member

@pstaabp pstaabp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I didn't time it exactly, a course with about 500 users and a set of 25 problems, this sped up the assignment about 10 times.

@taniwallach
Copy link
Member Author

@pstaab This was not working after the changes to develop.
I have a branch at https://github.com/taniwallach/webwork2/tree/group-problems-adds-v2 which I think is now working.
One of the problems was that creating an empty set was failing before the last commit there.
Can you take a look at it. It is still not finalized.

@pstaabp
Copy link
Member

pstaabp commented Apr 6, 2023

I tested the new one and still works. With a course of about 800 users and a set of 25 problems, this went from 4+mins to 1+mins.

I wonder if the following will work to get the git stuff straightened out. Can you delete the local branch of group-problems-add, then rename the v2 one to this name, then force-push?

@drgrice1
Copy link
Member

drgrice1 commented Apr 6, 2023

@pstaabp: Your suggested method does work. I have used that before to change the branch for a pull request. It is really only the name of the pull request branch that matters and can't be changed.

@taniwallach taniwallach force-pushed the group-problems-adds branch from dc130f2 to 2c9e76f Compare April 6, 2023 17:59
@taniwallach taniwallach requested review from pstaabp and drgrice1 April 6, 2023 18:00
@taniwallach
Copy link
Member Author

taniwallach commented Apr 6, 2023

This is updated to a single clean commit onto the current develop, and I added the transaction style modifications.

I have updated (in the newer commits):

  • lib/WebworkWebservice/SetActions.pm
  • lib/WeBWorK/ContentGenerator/Instructor/UsersAssignedToSet.pm
    to make use of the same type of changes. In particular, adding many users to a set manually will now use the more efficient approach.

I have not tested the changes to the WebworkWebservice code, as I am not sure how it can be tested.

@taniwallach taniwallach removed the Do Not Merge Yet PR to allow others to inspect -- not ready for prime time label Apr 6, 2023
Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few things that should be changed, but overall this is looking good.

lib/WeBWorK/Utils/Instructor.pm Outdated Show resolved Hide resolved
lib/WeBWorK/Utils/Instructor.pm Outdated Show resolved Hide resolved
lib/WeBWorK/Utils/Instructor.pm Outdated Show resolved Hide resolved
lib/WeBWorK/Utils/Instructor.pm Outdated Show resolved Hide resolved
lib/WeBWorK/Utils/Instructor.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
@taniwallach taniwallach force-pushed the group-problems-adds branch 3 times, most recently from 659c7e7 to d1e5dca Compare April 10, 2023 12:41
@drgrice1
Copy link
Member

I have encountered a problem that is introduced by this pull request. This occurs with dropped users on the UsersAssignedToSet.pm page. If you check one of these users for an assignment on that page, and then click on "Save" it gives the error assignSetToGivenUsers: error during addMultipleUserSets: argument 1 must be of type WeBWorK::DB::Record::UserSet at /opt/webwork/webwork2/lib/WeBWorK/Utils/Instructor.pm line 395.

The same happens from the 'Instructor Tools' page (which uses Assigner.pm) if you select a dropped user and a set, and click on "Assign" selected users to selected sets.

On the develop branch this works.

@drgrice1
Copy link
Member

There is another problem here. If you go to the UsersAssignedToSet.pm page, and click on 'Assign to All Current Users', and then click on that again you get the error assignSetToGivenUsers: error during addMultipleUserSets: argument 1 must be of type WeBWorK::DB::Record::UserSet at /opt/webwork/webwork2/lib/WeBWorK/Utils/Instructor.pm line 395. The problem is that @userSetsToAdd in assignSetToGivenUsers is the empty array because the set has already been assigned to all users. But the code doesn't exit and proceeds on to the addMultipleUserSets call anyway, and that fails.

lib/WeBWorK/DB.pm Outdated Show resolved Hide resolved
@taniwallach taniwallach force-pushed the group-problems-adds branch 2 times, most recently from 69edca9 to f277520 Compare April 10, 2023 19:12
taniwallach and others added 2 commits April 10, 2023 22:40
and add basic transaction support and use it when assigning sets/problems
to multiple users.

Add code to allow inserting multiple records for a single
user and set into the problem_user table as a batch, which
reduces the number of test calls made to the database,
and inserts them a batch so the prepared statement gets reused.

Make assignSetToGivenUsers so that assignSetsToUsers can use that
instead of looping over the users and calling assignSetToUser.

Add assignMultipleProblemsToGivenUsers which allows
inserting problems records for a given set to many users in one call.

Provide the setID and an array of problem_id values instead of an
array of globalProblem records.

Modify addUserMultipleProblems of lib/WeBWorK/DB.pm to use
   eval { $self->{problem_user}->insert_records([@problemList]); };
so that a single DB call makes all the inserts.
Test that no records for this user and set exist before making
that call.

These changes greatly speed up the process of "assign set to all users".

Have the UsersAssignedToSet code use the code to add users to
sets more efficiently (grouped DB operations).

Have the WebworkWebservice use the new methods when processing
assignSetToUsers and addProblem calls.
Corrections by @drgrice1

Co-authored-by: Glenn Rice <[email protected]>
@taniwallach taniwallach force-pushed the group-problems-adds branch from f277520 to 78c7364 Compare April 10, 2023 19:41
@taniwallach
Copy link
Member Author

@drgrice1 reported several bugs:

I have encountered a problem that is introduced by this pull request. This occurs with dropped users on the UsersAssignedToSet.pm page. If you check one of these users for an assignment on that page, and then click on "Save" it gives the error assignSetToGivenUsers: error during addMultipleUserSets: argument 1 must be of type WeBWorK::DB::Record::UserSet at /opt/webwork/webwork2/lib/WeBWorK/Utils/Instructor.pm line 395.

The same happens from the 'Instructor Tools' page (which uses Assigner.pm) if you select a dropped user and a set, and click on "Assign" selected users to selected sets.

Fixed. The assignSetToGivenUsers method now has an additional argument $alwaysInclude which is usually true (1) but it set to false in the call from assignSetToAllUsers. As a result, the test $ce->status_abbrev_has_behavior($User->status, "include_in_assignment") which was originally only in assignSetToAllUsers will be used to skip dropped users only in the call to assignSetToGivenUsers from assignSetToAllUsers.

The code will also now exit when @usersToProcess is empty to avoid requesting that the database layer handle a request with an empty list of additions.

The suggested changes to lib/WeBWorK/DB.pm were merged. Thanks for fixing the bug and improving the tests.

Copy link
Member

@drgrice1 drgrice1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Those issues are fixed. Thanks for this. This looks good now.

@pstaabp
Copy link
Member

pstaabp commented Apr 11, 2023

Checked this again. I'm not seeing any of the errors noted above and the improvement is still about 75%. Will merge.

@pstaabp pstaabp merged commit 607a817 into openwebwork:develop Apr 11, 2023
@taniwallach taniwallach deleted the group-problems-adds branch April 12, 2023 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement enhances the software
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants