Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add new notification dispatcher mechanism #975

Open
wants to merge 32 commits into
base: stable
Choose a base branch
from

Conversation

linuxbandit
Copy link
Member

I tried to account for all the questions in README.md

@linuxbandit linuxbandit requested a review from WikiRik April 13, 2023 20:38
@WikiRik
Copy link
Member

WikiRik commented Apr 13, 2023

Nice work! I'll try to take a look at this tomorrow while I'm at the headoffice, but after that it might be a while before I can really check this out

@linuxbandit linuxbandit force-pushed the feat/add-notification-dispatcher branch from c26a688 to 0821a96 Compare April 24, 2023 19:56
Copy link
Member

@WikiRik WikiRik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agora has been keeping me busy so I didn't have the time yet to play around with it. But this looks great!

Did you already think about when we want to replace the mailer with this? I guess this might be after we migrated prod server to use the new infrastructure? Since that might also make it easier for us to set up a temporary staging server to test this on a server before it goes to prod.

dispatcher/README.md Outdated Show resolved Hide resolved
1. [x] include traefik configuration to have the mailhog and rabbit on a subdomain instead of `domain:port`
1. [x] When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to: we need to mark both the queue and messages as durable
1. [ ] Add auto-retry (DLQ). rabbit is smart and doesn't let me process a message again unless i force it.. https://devcorner.digitalpress.blog/rabbitmq-retries-the-new-full-story/
1. [ ] add the telegram queue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to have users interact with the Telegram queue initially already or is it first only for us to get monitoring alerts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which way interact?
yes at first i was thinking of using it for us, to see how it scales in small numbers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interact was not the correct word. I just meant receiving messages

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say at the beginning is for us

# TODO: send a notification to someone about adding a template
print(f"template {msg['template']}.jinja2 not found")
return
# TODO: check if there is an auto-delete after 30 minutes for stuck un-acked messages
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to be alerted when messages are stuck? Or is that something we can see in the rabbit stats as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to alert someone to add the template. In rabbit we can see the messages that are un-acked (so it was tried to be processed, but then left there)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still have to add the mechanism that the messages are rerouted to retry at some point in the future (presumably when "a person" adds the missing template

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just the missing template can we not just reject the message and send an error back that the template does not exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for messages that are missing some values

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh i thought i replied

the point is that for the dispatcher to reject a message it needs to be synchronous (like it is now) whereas the usage of a message queue is exactly to avoid this synchronicity. Which means that as it currently is, the errors are 2: "missing template" (solvable by human intervention, so the message gets requeued with a delay and operator gets informed to add a new template); and "missing parameter" (unsolvable as there's the need to recreate information, so the message is discarded and a new ticket is created for programmers to fix it). Keep in mind, missing parameter means that there was not a thorough integration testing

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Fabri on the lack of integration testing: I would accept that some emails are never sent and that we quickly add the proper template, thanks to an automated ticket/a message somewhere else. (Not sure what the proper error reporting channel can be)

I don't think that we need a mechanism to resend emails that never got sent in a first version, or ideally ever.

How can we easily develop this integration test? Setup one account and 3 events, and send an application, modify, cancel, confirm, and see that we do get one email for each action?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey flooo!
yes I think it's how you say, bootstrap the 3 containers and send simple API commands after a small provisioning

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the proper reporting channel would be slack at the beginning i'd say (we have the small wrapper notify.sh iirc)

@WikiRik
Copy link
Member

WikiRik commented Apr 24, 2023

And additionally just as a note; we're now introducing Python as a language for one of the microservices while the rest all uses JavaScript. I don't think it's a big issue since Python is broadly used and the dispatcher will likely not need to get regular new features, but I did want to mention it. Holds for the inclusion of the terraform scripts as well, and redis in gsuite-wrapper; as we expand and use more technologies it also can make it more difficult for new developers to get properly onboarded. We'll have to keep that balance in mind, but I think we're still on the good side of it. Replacing elixir with python is definitely a better situation for future maintainers

@linuxbandit
Copy link
Member Author

Replacing elixir with python is definitely a better situation for future maintainers

yes that was definitely a consideration. Python is simple and especially suitable because this is a web-worker; not an API. If it needed some HTTP method then I would probably have done it with node like the gsuite-wrapper

About the redis in the wrapper: in the end we don't use it though, right? (which is a bad choice imho, but this is not the thread for discussing)
About terraform: well it's simplifying stuff for us, given that we don't have newcomers (and to be fair, it's less scary than going to the console)

@WikiRik
Copy link
Member

WikiRik commented Apr 25, 2023

About the redis in the wrapper: in the end we don't use it though, right? (which is a bad choice imho, but this is not the thread for discussing)

I thought we did, and I do see the use case for it so we should use it.

About terraform: well it's simplifying stuff for us, given that we don't have newcomers (and to be fair, it's less scary than going to the console)

That actually is true. Newcomers need to learn stuff either way and building on current terraform scripts might be easier

That's why I mentioned I believe that we're only adding new technologies for good reasons but we should not forget to take into account what the impact for newcomers might be when we do so

Copy link

@HassanCehef HassanCehef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments on testing and on follow-up tasks.
You left TODOs for the error management already: there will go my next batch of questions.

as for improving the templates to make them maintainable, the source is https://github.com/AEGEE/mailer/blob/stable/lib/omsmailer_web/templates/page/confirm_email.html.eex, so I agree with your decision to purely copy them here.

dispatcher/docker/Dockerfile Outdated Show resolved Hide resolved
# TODO: send a notification to someone about adding a template
print(f"template {msg['template']}.jinja2 not found")
return
# TODO: check if there is an auto-delete after 30 minutes for stuck un-acked messages

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Fabri on the lack of integration testing: I would accept that some emails are never sent and that we quickly add the proper template, thanks to an automated ticket/a message somewhere else. (Not sure what the proper error reporting channel can be)

I don't think that we need a mechanism to resend emails that never got sent in a first version, or ideally ever.

How can we easily develop this integration test? Setup one account and 3 events, and send an application, modify, cancel, confirm, and see that we do get one email for each action?

<td align="center" style="padding: 56px 56px 28px 56px;" valign="top">
<a style="color: #3498DB; text-decoration: none;" href="my.aegee.eu"
target="_blank"><img style="width: 56px; height: 56px; border: 0;" alt="Logo"
src="https://my.aegee.eu/images/logo.png">

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

23.65Kb for an image that we display in a 56*56 pixels box. We should host a more compressed version logo_email.png, or even consider base64'ing the image in the email.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lulz

creates between 1 and 8 fake emails and puts in the queue
OR
tests all templates it finds in the folder
"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
Works by creating a queue that the dispatcher/main.py can read from, without setting up a local RabbitMq.
"""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that you may think that as I define the queues, I am somehow creating them; but this program simply sends to the queues so we still need a local rabbit. I define the queues because it's how rabbit works

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed, I thought the rabbit was created in-process in send.py, but no, we expect one to exist already. My suggestion is wrongly worded, but it would still be worth to explain that.

if(file_count != templates_tested):
print()
print()
print(f"Tested {templates_tested} templates but the folder contains {file_count}")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we add a "os.Exit(1)" here?

I guess what I'm looking for, is a section in the README "how to dev locally?". And I'd say, if you modify the python, then you want to set ALL_TEMPLATES_TEST=false, and RANDOM_AMOUNT_TEST=true, but if you modify the templates, you surely want ALL_TEMPLATES_TEST=true.

When testing all templates, I think we should mention that we catch some errors (invalid jinja syntax, template not found) but that a lot of things are not covered by this test:

  • validity of HTML generated: does it look legible? (also on mobile, and without images)
  • validity of content: do we send the proper content: does a new candidature email contain the string " A new candidature to position"? I wonder if for this purpose (checking the content), some sort of golden tests would be nice: we don't have dynamic (dates) content, while we use random names for the fields, we could make generate_fake_payload static/deterministic.
  • finally, that the service calling the dispatcher does pass enough data to the mailer, so that no field is empty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i still don't understand your points here, I read them once again but I am lost :) can you rephrase? also your bullet list to me seems unrelated to the first part of the comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair.

could we add a "os.Exit(1)" here?

right now, send.py always returns 0, regardless of whether tests passed or not. When we detect that one test case failed, we should return to the OS an exit code !=0 to notify it of a test failure. This can then be integrated with other tooling (in a fish terminal, this will show a red prompt for instance, to catch attention)

I guess what I'm looking for, is a section in the README "how to dev locally?". ...

We should explain the usefulness of each option, and when one wants to use ALL_TEMPLATES_TEST and when one wants to use RANDOM_AMOUNT_TEST.

When testing all templates,...

I would like to see written what this test can catch, and what this test doesn't cover at all. This way, a developer knows that they need to perform some extra manual testing to ensure that they didn't break anything: this script won't do it all for them (which is totally fine, as long as it's explicit)

@@ -0,0 +1,81 @@

<!DOCTYPE html>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pick this file, but this applies to all: it's a lot of copy and paste, and I guess we'd like to better template this in the future?
It's always the same logic of title/body: 2 to 5 lines change between each file.

The second point, also for a follow-up task, is to remove the massive tableS (plural 😭 ) and use proper HTML to align the content. Our emails are not content heavy, so this makes a nice follow-up item.

The two points are intertwined, but can be done in independent PRs: the HTML fix ideally would only be done once the HTML code has been factored in a single file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep in mind (but you know, as per your comment in the PR thread) this is only a blind copy paste from the mailer templates that were in elixir

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I did some research after posting this.
I don't expect action here, but I'd leave it open, until tracked in Jira.

@linuxbandit linuxbandit force-pushed the feat/add-notification-dispatcher branch from 1f1d20d to bdf6d32 Compare September 20, 2023 06:35
@linuxbandit
Copy link
Member Author

linuxbandit commented Sep 20, 2023

I rebased as I know nobody is working on it (@HassanCehef fyi, just in case you were trying it)

@linuxbandit
Copy link
Member Author

with the last commits it's ready to be used now (at least code-wise). I will have to test it with gmail, or anyone can now anyway

had to downgrade python of one minor version to make it work
with the simple library slack_notifications
@WikiRik
Copy link
Member

WikiRik commented Sep 23, 2023

Before I review this; can you also make a PR on core that changes the mailer function to this? So I can deploy and test it on my own hosted instance. No need to do all the other microservices, they all use the same mailer function so I can just copy-paste it locally

@linuxbandit
Copy link
Member Author

linuxbandit commented Sep 23, 2023

just pushed the branch feat/featureflag, I will give you the tokens you need in private. But I may as well simplify it for you, so look at feat/featureflag-rik branch

Copy link
Member

@WikiRik WikiRik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go through all mails manually, but I already took a quick look at the file names to see if all mails are present. Just two comments

The new network_board_welcome email is missing; https://github.com/AEGEE/mailer/blob/stable/lib/omsmailer_web/templates/page/network_board_welcome.html.eex (I still need to integrate that into network, but I worked on the email already)

@@ -0,0 +1,71 @@

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filename should be membership_expired

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't find the corresponding entry anywhere in core 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use it in the 'new' core, so I'm fine with removing it. We can add a new template when we add the functionality again

@WikiRik
Copy link
Member

WikiRik commented Sep 29, 2023

I still need to go through the summeruniversity emails but I did all other submodules. These are the notes of which I'm pretty sure are not this way in current mailer;

  • custom.html works for discount codes, but not statutory massmailer. Likely something with markdown HTML, since I think the replacement data is done by statutory (not sure on that)
  • events_applied.html has the fields 'AEGEE experience', 'Ideal SU' and 'Motivation' which don't always exist on 'regular' events applications (but all three do for SUs). events_edited.html is fine

Apart from that; in the current mailer empty fields are just empty (as in '') but in dispatcher they show up as 'None'. Likewise 'true' and 'false' are 'True' and 'False'. But that's just python vs javascript. Either way we should make that nicer in a separate PR. It helps that mailhog is now easy to use for development

@WikiRik
Copy link
Member

WikiRik commented Sep 30, 2023

A bug in summeruniversity also allowed me to test the Slack notification message. I got the following error; IsADirectoryError: [Errno 21] Is a directory: '/usr/app/src/../tokens.json'. Might be my fault, might be because Docker volumes can't be a file but have to be a directory. Haven't looked into this

Copy link
Member

@WikiRik WikiRik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments about the various templates. Most comments are issues in current mailer as well or things for future PRs. I'll try to make some Jira issues from those later

return f"Agora {faker.sentence(nb_words=1)}"

# from constants.js (at least for core..)
MAIL_SUBJECTS = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the subject of the email. You have also set macros.head in all emails. Which one do you want to use?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! They are misaligned. But also, keep in mind that the macros.head gives the HTML title of the email, which of course should be the subject. However if they're not in sync, it is not a problem; also as a reminder (and this is valid for some other comments below), I was merely copy-pasting-translating the already-inefficient templates

<tbody>
<tr>
<td align="center" style="padding: 56px 56px 28px 56px;" valign="top">
<a style="color: #3498DB; text-decoration: none;" href="my.aegee.eu"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<a style="color: #3498DB; text-decoration: none;" href="my.aegee.eu"
<a style="color: #3498DB; text-decoration: none;" href="https://my.aegee.eu"

This should be fixed in all templates. Without the https:// it wanted to redirect to mailhog.{server}/my.aegee.eu

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i left it out of the batch because it's a bigger change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing general remarks about the templates in this file, since it's the first one.

my.aegee.eu is hardcoded in a lot of these which we should fix in a future PR so the proper domain is used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by proper you mean https, or also a variable (for our dev server)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable

<html lang="en">

{% import "snippets/macros.jinja2" as macros %}
{{ macros.head("MyAEGEE: New candidature submitted") }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are not being used, I have not reviewed them yet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think they are only used by the sender?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a related comment, we're using the subjects from constants instead of this title

<tbody>
<tr>
<td align="center" style="padding: 28px 56px;" valign="top">
<div style="font-family: &quot;lato&quot;, &quot;Helvetica Neue&quot;, Helvetica, Arial, sans-serif; line-height: 28px;font-size: 16px; line-height: 24px; color: #A7ADB5;">This email was autogenerated by MyAEGEE mailer.</div>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a future PR; might be nice to reference the helpdesk here for if people have questions

Comment on lines +49 to +50
If you have been accepted, the organizers of the Summer University will be in touch with you soon and provide you with more information.
If you have been rejected, there might be other Summer Universities with open calls. If there are any at this moment you can find them on MyAEGEE.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future PR; we should make this conditional

<td align="center" style="padding: 0 56px 28px 56px;" valign="top">
<div>
<a href="https://my.aegee.eu/summeruniversity/boardview" style="background-color:#1468C5;border-radius:50px;color:#ffffff;display:inline-block;font-family: 'lato', 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:18px;font-weight: bold;line-height:40px;text-align:center;text-decoration:none;width:270px;-webkit-text-size-adjust:none;"
target="_blank">Go to the Summer University board view</a>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future PR; the length of this text is not good for the button

dispatcher/templates/summeruniversity_submitted.jinja2 Outdated Show resolved Hide resolved
dispatcher/tests/__init__.py Outdated Show resolved Hide resolved
@linuxbandit
Copy link
Member Author

The discussion is getting huge :D

in short what i have to do is to check
this and this right? because for the rest, it's optimisation of the templates that can wait another PR

Do i forget anything?

@WikiRik
Copy link
Member

WikiRik commented Sep 30, 2023

I believe so, the massmailer and token.json file/folder. And then in the python files there's cleanup and error handling and other TODOs still there. Not sure how much you want to tackle in this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants