-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bulk organization user creation/addition feature #3651
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3651 +/- ##
===========================================
+ Coverage 69.01% 69.32% +0.31%
===========================================
Files 54 54
Lines 7478 7623 +145
===========================================
+ Hits 5161 5285 +124
- Misses 2317 2338 +21 ☔ View full report in Codecov by Sentry. |
Reviewing this with @bensteinberg (running locally), we noticed a few minor details with the CSV handling form.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, @christiansmith and I just looked at this together. I see this is added to the test for permissions, but otherwise has no tests.
Also, we were just discussing this with @rebeccacremona, who pointed out that we might prefer to assemble all the work and run it in a single database query, rather than going line-by-line. It might also make sense to fail if there are any validation errors, rather than create some users and not others.
@christiansmith @bensteinberg Pushed a couple of commits for the desired behavior and also added a test. I am checking with Clare about below to see what her preference would be. My initial approach was that it is good to process the good data and ignore the rest, as in something is better than nothing if it makes sense. Will check with Becky on the point of running a single db query.
|
@christiansmith @bensteinberg I made some changes to the PR since you last looked, I'd appreciate it if you could look again. Clare preferred to process the valid users and notify the requestor for those that were invalid, and I updated the PR description and UI messaging to reflect that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! This feature has been a lot of hard work, and it shows!
I noticed one tiny bug: if the CSV includes email addresses of existing users and uses capital letters, we will not find them and will try to create a new user with the capitalized email address (which will fail).
I think that bug is worth fixing before merging. But that is the only one I saw!
I also left some thoughts on readability, a few tweaks a person might consider making, a few things we might want to think about in the future, etc.
Please take or leave that feedback 🙂 I don't think any of it, even if it's something we want to do, should stand in the way of this being merged 🙂.
|
||
# validate the rows | ||
seen = set() | ||
row_count = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed something fun. If you want, you actually don't need the seen
set or the row_count
integer! if email in self.user_data
is the same as if email in seen
, and row_count
is the same as len(self.user_data)
🙂 But if you prefer the aesthetics of seen
and row_count
, no objections here!
expires_at=expires_at | ||
) | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I regret am having a little trouble reading this section. I think it is partly because of variable names.
created_user_affiliations
is a list of affiliation objects (which makes sense to me) but, the similarly named updated_user_affiliations
is a list of user objects.
similarly with preexisting_affiliations_set
and all_user_affiliations
; both are sets, and both are sets of user objects.
I am confused whether updated_user_affiliations
and all_user_affiliations
are in fact needed. Is this true? updated_user_affiliations
is only used to make all_user_affiliations
, and all_user_affiliations
is the same as set(self.updated_users.values())
? If that's true, I think this would be easier to read if you used that directly.
|
||
# create or update the affiliations of existing users | ||
# affiliations that already exist | ||
preexisting_affiliations = (UserOrganizationAffiliation.objects.filter(user__in=updated_user_affiliations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a thought about readability. Do you think it would be easier to read if, right here after you get the existing affiliation objects, you did the update? Then, you could do the calculations to figure out if any new affiliation objects need to be created, and then create them, in its own section with its own comment. I think that might be clearer than doing them together, but you may disagree 🙂
preexisting_affiliations = (UserOrganizationAffiliation.objects.filter(user__in=updated_user_affiliations, | ||
organization=organization)) | ||
|
||
preexisting_affiliations_set = set(affiliation.user for affiliation in preexisting_affiliations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, here's a good Django + database thing to know about! Since you are iterating through the UserOrganizationAffiliation
queryset and fetching each object's related user
object, this will make one database query per object. Django includes a utility for situations like this, select_related
: https://docs.djangoproject.com/en/5.1/ref/models/querysets/#select-related. If you call UserOrganizationAffiliation.objects.filter(...).select_related('user')
, then Django will get all those user objects in one single query!
email_new_user(*args, obj, email_template, extra_context) | ||
else: | ||
send_user_email(obj.raw_email, email_template, extra_context) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding this error handling!
As mentioned, I am a little worried that, when we are actually calling out to the mailgun API and actually sending emails, that it will be too slow to do this inline, resulting in users who upload large CSVs seeing a 502 despite the request succeeding (like happens with large link batches). I think we might need to make this async.
Possibly to be discussed!! Even if we do, that definitely doesn't have to be part of this PR!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will test in stage to see at what point we start seeing failures. Depending on that, I can add a limit to the rows.
This feature adds a new button to the organization user management view (/manage/organization-users):
Upon clicking, the bulk upload form comes up:
User makes the organization, affiliation expiration, and CSV file selections. Form cannot be submitted without an organization selection or with a non .csv file extension.
It is also invalid if it doesn't include all of the column headers, if it doesn't have any users at all, if it doesn't include the email data for a given user, if it has duplicate users, and if an email address is invalid.
The affiliation field can be toggled just like in the single user flow.
Once submitted, users that don't exist will be created; those that do will have their organization affiliation updated. New users will receive new user email with account activation email. Existing users will receive the user added to organization email.
A generic success message will be displayed at the end if there are no validation errors:
If some of the users get processed, but some can't due to those being an admin or registrar, below will be displayed:
If all users in the CSV are admins or registrars, the below error message will be displayed:
This PR also fixes a bug where the existing user email wasn't displaying the organization name.
Disclaimer: Any behavior and the wording here are open to feedback. This was my initial take on it :)
EDIT:
As of Nov 20th, I put this feature behind a feature flag so we can do some more testing on it in stage before a possible production deployment.