Aggregate Wrapped user data #874

Open
wants to merge 17 commits into
base: master

Conversation

NIDHI2023
Contributor

@NIDHI2023 NIDHI2023 commented Oct 13, 2024

Summary

  • Building on this PR, adds to the Wrapped data the total time TAs spent helping students, each user's most-visited OH ID, and each user's most-visited TA ID
  • Adds an extra check when pushing to the Wrapped collection: the user must be active
    • Active means every user needs at least one OH visit, and TAs additionally need at least one session where they answered a question.
  • Also rounds some of the time-related numbers so results are integers. In testing, this made the numbers slightly inexact, but they were never far off and were always integers.

Test Plan

  • Inactive students/TAs are not added to the Wrapped collection
  • Total time is always an integer, with no negative numbers or infinity
  • Favorite OH and TA are accurate in test collections
  • Nothing appears to break with undefined values, and the collection is populated
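
The integer and range checks in this test plan could be captured in a small helper. This is only a sketch: `toWholeMinutes` is a hypothetical name that does not appear in the PR, and treating missing or non-finite input as 0 is an assumption.

```typescript
// Hypothetical helper (not from the PR): coerce a time value to a
// finite, non-negative integer so the Wrapped data never stores NaN,
// Infinity, or a negative duration.
function toWholeMinutes(value: number | undefined): number {
    // Assumption: missing or non-finite values are treated as 0.
    if (value === undefined || !Number.isFinite(value)) {
        return 0;
    }
    // Round to the nearest integer, then clamp negatives to 0.
    return Math.max(0, Math.round(value));
}
```

Rounding trades a little accuracy for guaranteed integers, which matches the behavior described in the summary.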

Notes

  • In the test database, there were not that many visits (which makes sense), so the personality was Independent for everyone. However, if this also happens on the real data, perhaps the thresholds could be lowered so users get a wider variety of personalities.

Breaking Changes

None

  • I have updated the documentation accordingly.
  • My PR adds a @ts-ignore

@NIDHI2023 NIDHI2023 requested a review from a team as a code owner October 13, 2024 01:39
@dti-github-bot
Member

dti-github-bot commented Oct 13, 2024

[diff-counting] Significant lines: 524.

Contributor

@rgu0114 rgu0114 left a comment

left a couple comments

officeHourCounts[answererId] = new Map<string, number>();
taCounts[answererId] = new Map<string, number>();
// Checking if ta already showed up as student and now as an answerer
} else if (userStats[answererId] && userStats[answererId]?.timeHelpingStudents === undefined) {
Contributor

Is this else if redundant?

Contributor Author

I was thinking of the case where, while iterating through questions, a user who is actually a TA comes up first as the asker, so they get an instance created without timeHelpingStudents (it is undefined). If they come up later as an answerer, indicating that they are a TA, the if statement only sets all the fields when the TA doesn't already have an instance, so I think they wouldn't actually get the timeHelpingStudents field without this extra check.
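
A minimal sketch of that case follows. The `UserStats` shape and the `markAsAnswerer` helper are assumptions made for illustration, not the PR's actual code.

```typescript
// Assumed shape; the real stats object in the PR has more fields.
interface UserStats {
    numVisits: number;
    timeHelpingStudents?: number; // only defined for TAs
}

const userStats: Record<string, UserStats> = {};

// Hypothetical helper: make sure an answerer has the TA field, whether
// or not they were already seen as an asker.
function markAsAnswerer(answererId: string): void {
    const existing = userStats[answererId];
    if (!existing) {
        // First time this user is seen at all: create with the TA field.
        userStats[answererId] = { numVisits: 0, timeHelpingStudents: 0 };
    } else if (existing.timeHelpingStudents === undefined) {
        // Seen earlier only as an asker: add the missing TA field.
        existing.timeHelpingStudents = 0;
    }
}
```

Without the second branch, a TA who first appeared as an asker would keep `timeHelpingStudents` undefined.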

Contributor

oh i see

src/scripts/wrapped-fa24.ts
stats.favTaId = Array.from(taCounts[userId].entries()).reduce((a, b) => a[1] < b[1] ? b : a)[0];
}
stats.numVisits = officeHourCounts[userId].size;

Contributor

yea for this I think it's ok if not every session has TAs assigned

Contributor

To answer your question, let's use the most common answererId for a student as a proxy for their favorite TA, since many courses in QMI don't assign TAs to office hours. We can work backwards from there to determine favorite class and day of the week, since the class should correspond with the favorite TA. We can get the class pretty easily from sessionId, but we may need to run a separate script to label each session with a day of the week, so we'll leave day of the week for later.
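
Picking the most common answererId could look roughly like this. `mostFrequentKey` is a hypothetical helper, and it assumes the per-student counts live in a `Map<string, number>` as in the diff above. Unlike a `reduce` with no initial value, it returns undefined for an empty map instead of throwing.

```typescript
// Hypothetical helper: return the key with the highest count, or
// undefined when the map is empty. On ties, the first key seen wins.
function mostFrequentKey(counts: Map<string, number>): string | undefined {
    let best: string | undefined;
    let bestCount = -Infinity;
    for (const [key, count] of Array.from(counts.entries())) {
        if (count > bestCount) {
            best = key;
            bestCount = count;
        }
    }
    return best;
}
```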

@@ -13,6 +13,9 @@ console.log('Firebase admin initialized!');
const db = admin.firestore();

// Firestore Timestamps for the query range
// EDIT: possibly make these dates as new constants in constants.ts,
// would make it easier to edit for other years

const startDate = admin.firestore.Timestamp.fromDate(new Date('2023-08-20'));
Contributor

remember to change these dates to sp24 and fa24

@NIDHI2023 NIDHI2023 changed the title from "[Draft] Aggregate Wrapped user data" to "Aggregate Wrapped user data" Oct 21, 2024
Contributor

@rgu0114 rgu0114 left a comment

At this point, now that the functionality is more or less there, I think it'll be helpful to start thinking about code style and readability, which should also make verification a bit easier as we get into the last few weeks of development.

  1. Separating concerns into helper functions, for example async functions handling processSessions(answererId, sessionId) and updateWrappedDocuments().
  2. Awaiting each read one at a time inside a loop can be slow, and we should start thinking a bit more about performance for this script. It might be better to do something like the following:
    const sessionPromises = questionsSnapshot.docs.map(doc => sessionsRef.doc(doc.data().sessionId).get());
    const sessionDocs = await Promise.all(sessionPromises);
    Of course, this is harder when we're (more intuitively) iterating through questions, each of which has its own sessionId. One possibility is to pre-process the questions, grouping those with the same sessionId, so we can do this type of batch processing.
  3. Using a for...of loop is generally better for async operations. So for the loop at the end where we use userStats to update the Wrapped collection, you might do something like for (const [userId, stats] of Object.entries(userStats)) {.
  4. For all error scenarios, especially since we'll be rolling this out to production soon, let's collect all userIds in a separate list and log those out at the end so we can investigate any issues more easily.
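
The pre-processing idea in point 2 could be sketched as follows. The `Question` shape, `groupBySession`, and `fetchSessions` are illustrative assumptions; `fetchSession` stands in for whatever performs the actual read (for example, something like `sessionsRef.doc(id).get()`).

```typescript
// Assumed minimal question shape; real documents have more fields.
interface Question {
    askerId: string;
    sessionId: string;
}

// Group questions by sessionId so each session is fetched only once.
function groupBySession(questions: Question[]): Map<string, Question[]> {
    const groups = new Map<string, Question[]>();
    for (const question of questions) {
        const group = groups.get(question.sessionId) ?? [];
        group.push(question);
        groups.set(question.sessionId, group);
    }
    return groups;
}

// Fetch every distinct session concurrently and index the results by id.
async function fetchSessions<T>(
    sessionIds: string[],
    fetchSession: (id: string) => Promise<T>
): Promise<Map<string, T>> {
    const entries = await Promise.all(
        sessionIds.map(async (id): Promise<[string, T]> => [id, await fetchSession(id)])
    );
    return new Map(entries);
}
```

With the questions grouped first, the number of reads drops from one per question to one per distinct session.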

Try to refactor the code and make these changes incrementally, testing as you go along.
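
Points 3 and 4 might combine into something like the sketch below. The `writeWrapped` callback is a hypothetical stand-in for the Firestore write of one user's Wrapped document, and the generic signature is an assumption, not the script's real interface.

```typescript
// Hypothetical writer signature; in the script this would wrap the
// write of a single user's Wrapped document.
type WriteWrapped<S> = (userId: string, stats: S) => Promise<void>;

// Write each user's stats in turn and collect the ids that failed,
// instead of aborting on the first error.
async function updateWrappedDocuments<S>(
    userStats: Record<string, S>,
    writeWrapped: WriteWrapped<S>
): Promise<string[]> {
    const failedUserIds: string[] = [];
    for (const [userId, stats] of Object.entries(userStats)) {
        try {
            await writeWrapped(userId, stats);
        } catch (error) {
            failedUserIds.push(userId);
        }
    }
    // Log all failures together at the end for easier investigation.
    if (failedUserIds.length > 0) {
        console.error('Failed to update Wrapped for:', failedUserIds);
    }
    return failedUserIds;
}
```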

Contributor

@rgu0114 rgu0114 left a comment

A couple small things to improve code maintainability

If a user is only a student, they need to have at least one OH visit.
If a user is a TA, they need to have at least one TA session AND at least one OH visit as a student.
*/
if ((stats.numVisits > 0)
Contributor

This is a confusing conditional. Let's make it something like:

const hasVisits = stats.numVisits > 0;
const isTaActive = stats.timeHelpingStudents !== undefined || (TAsessions[userId]?.length > 0);
const hasFavoriteTa = stats.favTaId !== "";

if (hasVisits && isTaActive && hasFavoriteTa) {

}
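
For comparison, the rule stated in the code comment (students need at least one OH visit; TAs need at least one TA session and at least one OH visit as a student) could be written as a single predicate. This is a sketch, not the PR's code: `isActiveUser`, `ActivityStats`, and the use of `timeHelpingStudents` to detect TAs are all assumptions.

```typescript
// Assumed subset of the stats fields used by the activity check.
interface ActivityStats {
    numVisits: number;
    timeHelpingStudents?: number; // assumed to be defined only for TAs
}

// Students need at least one OH visit. TAs need at least one TA session
// and at least one OH visit as a student.
function isActiveUser(stats: ActivityStats, taSessionCount: number): boolean {
    const hasVisits = stats.numVisits > 0;
    const isTa = stats.timeHelpingStudents !== undefined;
    const hasTaSessions = taSessionCount > 0;
    return isTa ? hasVisits && hasTaSessions : hasVisits;
}
```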
}

if (!officeHourSessions[askerId]?.includes(sessionId)) {
Contributor

this might lead to a weird edge case error if officeHourSessions[askerId] is undefined since we're using optional chaining (?.). Let's make this:
officeHourSessions[askerId] = officeHourSessions[askerId] || [];
if (!officeHourSessions[askerId].includes(sessionId)) {
    officeHourSessions[askerId].push(sessionId);
}
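
The same pattern could be wrapped in a small helper so it is not repeated for askers and answerers. `addSessionOnce` is a hypothetical name, and the `Record<string, string[]>` type for `officeHourSessions` is an assumption.

```typescript
const officeHourSessions: Record<string, string[]> = {};

// Hypothetical helper: record a session for a user at most once.
function addSessionOnce(userId: string, sessionId: string): void {
    // Initialize the list first so includes never runs on undefined.
    const sessions = officeHourSessions[userId] || [];
    officeHourSessions[userId] = sessions;
    if (!sessions.includes(sessionId)) {
        sessions.push(sessionId);
    }
}
```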

stats.favTaId = Array.from(taCounts[userId].entries()).reduce((a, b) => a[1] < b[1] ? b : a)[0];
}

if (stats.favTaId && stats.favTaId !== "") {
Contributor

this block of code getting the user's favorite class and title seems correct and precise to me, but is it a bit inefficient? For instance, if we've already identified the favorite TA, there should only be one class/title and we don't need to do all this ordering. Idk, there might just be some edge cases I haven't thought of.

Contributor Author

I mainly included this extra if statement because I wasn't sure whether a user could encounter the same TA in two classes. But if we can confirm that a student always has exactly one class in common with their favorite TA, and that this is a guaranteed precondition, I definitely agree that this isn't needed.


3 participants