Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bulk organization user creation/addition feature #3651

Merged
merged 32 commits into from
Dec 4, 2024

Conversation

teovin
Copy link
Contributor

@teovin teovin commented Nov 7, 2024

This feature adds a new button to the organization user management view (/manage/organization-users):

image

Upon clicking, the bulk upload form comes up:

image

User makes the organization, affiliation expiration, and CSV file selections. Form cannot be submitted without an organization selection or with a non .csv file extension.

It is also invalid if it doesn't include all of the column headers, if it doesn't have any users at all, if it doesn't include the email data for a given user, if it has duplicate users, and if an email address is invalid.

image

image

image

image

image

image

image

The affiliation field can be toggled just like in the single user flow.

image

Once submitted, users that don't exist will be created; those that do will have their organization affiliation updated. New users will receive new user email with account activation email. Existing users will receive the user added to organization email.

image

image

A generic success message will be displayed at the end if there are no validation errors:

image

If some of the users get processed, but some can't due to those being an admin or registrar, below will be displayed:

image

If all users in the CSV are admins or registrars, the below error message will be displayed:

image

This PR also fixes a bug where the existing user email wasn't displaying the organization name.

image

Disclaimer: Any behavior and the wording here are open to feedback. This was my initial take on it :)

EDIT:

As of Nov 20th, I put this feature behind a feature flag so we can do some more testing on it in stage before a possible production deployment.

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

Attention: Patch coverage is 91.11111% with 12 lines in your changes missing coverage. Please review.

Project coverage is 69.32%. Comparing base (1c1fd71) to head (cf5882d).
Report is 68 commits behind head on develop.

Files with missing lines Patch % Lines
perma_web/perma/views/user_management.py 83.72% 7 Missing ⚠️
perma_web/perma/forms.py 94.50% 5 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3651      +/-   ##
===========================================
+ Coverage    69.01%   69.32%   +0.31%     
===========================================
  Files           54       54              
  Lines         7478     7623     +145     
===========================================
+ Hits          5161     5285     +124     
- Misses        2317     2338      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@teovin teovin marked this pull request as ready for review November 7, 2024 17:06
@teovin teovin requested a review from a team as a code owner November 7, 2024 17:06
@teovin teovin requested review from bensteinberg and removed request for a team November 7, 2024 17:06
@christiansmith
Copy link

Reviewing this with @bensteinberg (running locally), we noticed a few minor details with the CSV handling form.

  1. The CSV upload fails without a header row or with incorrect column names.
    1. We could provide a template for users to download and edit
    2. We could also handle the case of a missing header row and validate the structure of the provided data rows.
  2. When there is a validation error, instructions should remain on the screen along with the error message.
  3. If a malformed CSV is uploaded, it would be good to have better feedback for the user than the current stack trace.

Copy link
Contributor

@bensteinberg bensteinberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, @christiansmith and I just looked at this together. I see this is added to the test for permissions, but otherwise has no tests.

Also, we were just discussing this with @rebeccacremona, who pointed out that we might prefer to assemble all the work and run it in a single database query, rather than going line-by-line. It might also make sense to fail if there are any validation errors, rather than create some users and not others.

@teovin
Copy link
Contributor Author

teovin commented Nov 15, 2024

@christiansmith @bensteinberg Pushed a couple of commits for the desired behavior and also added a test. I am checking with Clare about below to see what her preference would be. My initial approach was that it is good to process the good data and ignore the rest, as in something is better than nothing if it makes sense. Will check with Becky on the point of running a single db query.

It might also make sense to fail if there are any validation errors, rather than create some users and not others.

@teovin
Copy link
Contributor Author

teovin commented Nov 18, 2024

@christiansmith @bensteinberg I made some changes to the PR since you last looked, I'd appreciate it if you could look again. Clare preferred to process the valid users and notify the requestor for those that were invalid, and I updated the PR description and UI messaging to reflect that.

@teovin teovin requested a review from bensteinberg November 19, 2024 14:27
@bensteinberg bensteinberg requested review from rebeccacremona and removed request for bensteinberg November 22, 2024 15:41
Copy link
Contributor

@rebeccacremona rebeccacremona left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! This feature has been a lot of hard work, and it shows!

I noticed one tiny bug: if the CSV includes email addresses of existing users and uses capital letters, we will not find them and will try to create a new user with the capitalized email address (which will fail).

I think that bug is worth fixing before merging. But that is the only one I saw!

I also left some thoughts on readability, a few tweaks a person might consider making, a few things we might want to think about in the future, etc.

Please take or leave that feedback 🙂 I don't think any of it, even if it's something we want to do, should stand in the way of this being merged 🙂.

perma_web/perma/forms.py Outdated Show resolved Hide resolved

# validate the rows
seen = set()
row_count = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed something fun. If you want, you actually don't need the seen set or the row_count integer! if email in self.user_data is the same as if email in seen, and row_count is the same as len(self.user_data) 🙂 But if you prefer the aesthetics of seen and row_count, no objections here!

perma_web/perma/forms.py Outdated Show resolved Hide resolved
perma_web/perma/forms.py Outdated Show resolved Hide resolved
perma_web/perma/forms.py Outdated Show resolved Hide resolved
expires_at=expires_at
)
)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I regret am having a little trouble reading this section. I think it is partly because of variable names.

created_user_affiliations is a list of affiliation objects (which makes sense to me) but, the similarly named updated_user_affiliations is a list of user objects.

similarly with preexisting_affiliations_set and all_user_affiliations; both are sets, and both are sets of user objects.

I am confused whether updated_user_affiliations and all_user_affiliations are in fact needed. Is this true? updated_user_affiliations is only used to make all_user_affiliations, and all_user_affiliations is the same as set(self.updated_users.values())? If that's true, I think this would be easier to read if you used that directly.


# create or update the affiliations of existing users
# affiliations that already exist
preexisting_affiliations = (UserOrganizationAffiliation.objects.filter(user__in=updated_user_affiliations,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a thought about readability. Do you think it would be easier to read if, right here after you get the existing affiliation objects, you did the update? Then, you could do the calculations to figure out if any new affiliation objects need to be created, and then create them, in its own section with its own comment. I think that might be clearer than doing them together, but you may disagree 🙂

preexisting_affiliations = (UserOrganizationAffiliation.objects.filter(user__in=updated_user_affiliations,
organization=organization))

preexisting_affiliations_set = set(affiliation.user for affiliation in preexisting_affiliations)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, here's a good Django + database thing to know about! Since you are iterating through the UserOrganizationAffiliation queryset and fetching each object's related user object, this will make one database query per object. Django includes a utility for situations like this, select_related: https://docs.djangoproject.com/en/5.1/ref/models/querysets/#select-related. If you call UserOrganizationAffiliation.objects.filter(...).select_related('user'), then Django will get all those user objects in one single query!

email_new_user(*args, obj, email_template, extra_context)
else:
send_user_email(obj.raw_email, email_template, extra_context)
except Exception as e:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this error handling!

As mentioned, I am a little worried that, when we are actually calling out to the mailgun API and actually sending emails, that it will be too slow to do this inline, resulting in users who upload large CSVs seeing a 502 despite the request succeeding (like happens with large link batches). I think we might need to make this async.

Possibly to be discussed!! Even if we do, that definitely doesn't have to be part of this PR!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will test in stage to see at what point we start seeing failures. Depending on that, I can add a limit to the rows.

@teovin teovin merged commit 44802ea into harvard-lil:develop Dec 4, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants