Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Submissions - too many emails being sent #2404

Closed
clari182 opened this issue Nov 10, 2023 · 15 comments
Closed

Submissions - too many emails being sent #2404

clari182 opened this issue Nov 10, 2023 · 15 comments
Assignees
Labels
Priority: High This should be assigned and move in front of current feature developments Type: Bug Something isn't working

Comments

@clari182
Copy link
Collaborator

Editors are receiving emails from submissions they didn't submit

@clari182 clari182 self-assigned this Nov 10, 2023
@smcgregor smcgregor added the Priority: Critical Should be worked next, or now. All hands on deck! label Nov 11, 2023
@smcgregor
Copy link
Collaborator

I am also receiving duplicate email notifications for the new incidents. I first received incidents 600 through 603 Nov 10, 2023, 10:57 PM, then again today at 10:52 AM

I am going to stop builds from running until this is diagnosed.

Adding the critical label.

@smcgregor
Copy link
Collaborator

I checked the builds. Last night failed on the Deploy Site step: https://app.netlify.com/sites/aiid/deploys/654f1883958de3000880cf4f

10:57:28 PM: /plugins/netlify-plugin-process-notifications (onSuccess event)  
10:57:28 PM: ───────────────────────────────────────────────────────────────────
10:57:28 PM: ​
10:57:28 PM: Processing pending notifications...
10:58:46 PM: Plugin error: since "onSuccess" happens after deploy, the build has already succeeded and cannot fail anymore. This plugin should either:
10:58:46 PM: - use utils.build.failPlugin() instead of utils.build.failBuild() to clarify that the plugin failed, but not the build.
10:58:46 PM: - use "onPostBuild" instead of "onSuccess" if the plugin failure should make the build fail too. Please note that "onPostBuild" (unlike "onSuccess") happens before deploy.
10:58:46 PM: ​
10:58:46 PM: Plugin "/plugins/netlify-plugin-process-notifications" failed  
10:58:46 PM: ─────────────────────────────────────────────────────────────────
10:58:46 PM: ​
10:58:46 PM:   Error message
10:58:46 PM:   ApolloError: Error message
10:58:46 PM:   Response not successful: Received status code 400
10:58:46 PM: ​
10:58:46 PM:   Plugin details
10:58:46 PM:   Package:        /plugins/netlify-plugin-process-notifications
10:58:46 PM:   Version:        0.0.1
10:58:46 PM: ​
10:58:46 PM:   Error location
10:58:46 PM:   In "onSuccess" event in "/plugins/netlify-plugin-process-notifications" from netlify.toml

The subsequent build made it through this step and displayed,

10:51:41 AM: Processing pending notifications...
10:53:03 AM: 58 notifications were processed!

But when I look at MongoDB, I am not sure whether all the notification types have been saved. Incident 598 has 5 entries {submission-promoted, new-incidents, entity, incident-updated, new-report-incident} in MongoDB for the notifications collection, while incident 601 has 3 {submission-promoted, new-incidents, entity}.

@smcgregor
Copy link
Collaborator

Steps needed before starting the build:

  1. Identify whether building the site is likely to trigger a third duplicate set of notifications
  2. Address (1) in the event it will produce a duplicate
  3. Improve the build script so that the notification process is more atomic. It should be very unlikely that we send notifications and fail to record that the notifications have been sent

@smcgregor
Copy link
Collaborator

The cron triggering build is in the codebase, which means if I remove the cron values it will trigger a build. The cron triggers the build by hitting a netlify endpoint, and netlify then runs the build. I changed the GitHub secret, NETLIFYBUILDURL, to be a placeholder value. After we address the issue, we can restore the secret to the value found here https://app.netlify.com/sites/aiid/configuration/deploys

Then the build will run again.

@clari182
Copy link
Collaborator Author

I think this is a different issue (but still critical).
The notifications are only markes as processed: true if the email has been sent correctly. It looks like the emails were sent but Mongo updates failed and the entries were still markes as processed: false on the second run.
I'm not sure why this could happen I'll have to look into it. @cesarvarela or @pdcp1 do you know why this could happen? Could it be related with the amount of pending notifications?

Maybe we could change the logic, and set the notifications to true before sending the emails, and revert to false if the email fails. That way we wouldn't encounter this issue again.

Otherwise, regarding submitters receiving emails for submission they didn't create the fix is ready for review.

@kepae
Copy link
Collaborator

kepae commented Nov 13, 2023

@clari182 how many notifications are pending/processed: false currently? Do we know if the most recent build successfully set the notification status this time, even if it ended up sending duplicates?

This might be something we want to monitor over time in Atlas or SendGrid.

@clari182
Copy link
Collaborator Author

@kepae there are no more processed: false notifications in prod at the moment, so the last build has correctly set the notifications status. On the next deploy it shouldn't send duplicates but only new notifications that might be created before deploy.

I also found another location where duplicate subscriptions/notifications were created. I think this is an edge case but needs fixing anyway.

@kepae
Copy link
Collaborator

kepae commented Nov 14, 2023

Thanks, per our discussion today of resuming builds and pushing the fixes to production, I'll be updating NETLIFYBUILDURL in the action secrets and pushing production (#2395).

Be on the lookout for notifications. If all goes well, we can downgrade this priority and evaluate if its fixed. :-)

@kepae kepae added Type: Bug Something isn't working and removed Priority: Critical Should be worked next, or now. All hands on deck! labels Nov 15, 2023
@smcgregor
Copy link
Collaborator

I received duplicate emails again and it looks like the last production build failed consistent with the prior build failure so I am going to be receiving a third notification. I am killing the NETLIFYBUILDURL URL again.

@smcgregor smcgregor added the Priority: Critical Should be worked next, or now. All hands on deck! label Nov 24, 2023
@smcgregor
Copy link
Collaborator

Turned off build and added critical label.

@kepae
Copy link
Collaborator

kepae commented Nov 28, 2023

The Netlify post-build plugin to process notifications was disabled in #2437.
I also restored the NETLIFYBUILDURL.

This way we can push code changes to the rest of the site while we continue debugging.

@kepae kepae added Priority: High This should be assigned and move in front of current feature developments and removed Priority: Critical Should be worked next, or now. All hands on deck! labels Dec 11, 2023
@kepae
Copy link
Collaborator

kepae commented Dec 21, 2023

@clari182 @pdcp1 following the migrations we sent, we look to be in good shape without duplicates.

Anything else before we re-enable notifications with #2503?

I'd like to help mitigate our debugging in the future by adding date created information: #2511

@pdcp1
Copy link
Collaborator

pdcp1 commented Dec 21, 2023

@kepae I'm OK re-enabling the notifications with #2503
In addition to that, #2459 is pending review and merge and I think it's important to have it in Production as well.

I'd like to help mitigate our debugging in the future by adding date created information: #2511

Great suggestion about adding dates 👍

@pdcp1
Copy link
Collaborator

pdcp1 commented Jan 23, 2024

@kepae @smcgregor I think we can close this issue if no duplicated emails were sent in the past weeks.

@kepae kepae closed this as completed Jan 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: High This should be assigned and move in front of current feature developments Type: Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants