-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add new notification dispatcher mechanism #975
base: stable
Are you sure you want to change the base?
Conversation
Nice work! I'll try to take a look at this tomorrow while I'm at the headoffice, but after that it might be a while before I can really check this out |
c26a688
to
0821a96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agora has been keeping me busy so I didn't have the time yet to play around with it. But this looks great!
Did you already think about when we want to replace the mailer with this? I guess this might be after we migrated prod server to use the new infrastructure? Since that might also make it easier for us to set up a temporary staging server to test this on a server before it goes to prod.
1. [x] include traefik configuration to have the mailhog and rabbit on a subdomain instead of `domain:port` | ||
1. [x] When RabbitMQ quits or crashes it will forget the queues and messages unless you tell it not to: we need to mark both the queue and messages as durable | ||
1. [ ] Add auto-retry (DLQ). rabbit is smart and doesn't let me process a message again unless i force it.. https://devcorner.digitalpress.blog/rabbitmq-retries-the-new-full-story/ | ||
1. [ ] add the telegram queue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to have users interact with the Telegram queue initially already or is it first only for us to get monitoring alerts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in which way interact?
yes at first i was thinking of using it for us, to see how it scales in small numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interact was not the correct word. I just meant receiving messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say at the beginning is for us
dispatcher/dispatcher/main.py
Outdated
# TODO: send a notification to someone about adding a template | ||
print(f"template {msg['template']}.jinja2 not found") | ||
return | ||
# TODO: check if there is an auto-delete after 30 minutes for stuck un-acked messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to be alerted when messages are stuck? Or is that something we can see in the rabbit stats as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to alert someone to add the template. In rabbit we can see the messages that are un-acked (so it was tried to be processed, but then left there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still have to add the mechanism that the messages are rerouted to retry at some point in the future (presumably when "a person" adds the missing template
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's just the missing template can we not just reject the message and send an error back that the template does not exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for messages that are missing some values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh i thought i replied
the point is that for the dispatcher to reject a message it needs to be synchronous (like it is now) whereas the usage of a message queue is exactly to avoid this synchronicity. Which means that as it currently is, the errors are 2: "missing template" (solvable by human intervention, so the message gets requeued with a delay and operator gets informed to add a new template); and "missing parameter" (unsolvable as there's the need to recreate information, so the message is discarded and a new ticket is created for programmers to fix it). Keep in mind, missing parameter means that there was not a thorough integration testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Fabri on the lack of integration testing: I would accept that some emails are never sent and that we quickly add the proper template, thanks to an automated ticket/a message somewhere else. (Not sure what the proper error reporting channel can be)
I don't think that we need a mechanism to resend emails that never got sent in a first version, or ideally ever.
How can we easily develop this integration test? Setup one account and 3 events, and send an application, modify, cancel, confirm, and see that we do get one email for each action?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey flooo!
yes I think it's how you say, bootstrap the 3 containers and send simple API commands after a small provisioning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the proper reporting channel would be slack at the beginning i'd say (we have the small wrapper notify.sh iirc)
And additionally just as a note; we're now introducing Python as a language for one of the microservices while the rest all uses JavaScript. I don't think it's a big issue since Python is broadly used and the dispatcher will likely not need to get regular new features, but I did want to mention it. Holds for the inclusion of the terraform scripts as well, and redis in gsuite-wrapper; as we expand and use more technologies it also can make it more difficult for new developers to get properly onboarded. We'll have to keep that balance in mind, but I think we're still on the good side of it. Replacing elixir with python is definitely a better situation for future maintainers |
yes that was definitely a consideration. Python is simple and especially suitable because this is a web-worker; not an API. If it needed some HTTP method then I would probably have done it with node like the gsuite-wrapper About the redis in the wrapper: in the end we don't use it though, right? (which is a bad choice imho, but this is not the thread for discussing) |
I thought we did, and I do see the use case for it so we should use it.
That actually is true. Newcomers need to learn stuff either way and building on current terraform scripts might be easier That's why I mentioned I believe that we're only adding new technologies for good reasons but we should not forget to take into account what the impact for newcomers might be when we do so |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments on testing and on follow-up tasks.
You left TODOs for the error management already: there will go my next batch of questions.
as for improving the templates to make them maintainable, the source is https://github.com/AEGEE/mailer/blob/stable/lib/omsmailer_web/templates/page/confirm_email.html.eex, so I agree with your decision to purely copy them here.
dispatcher/dispatcher/main.py
Outdated
# TODO: send a notification to someone about adding a template | ||
print(f"template {msg['template']}.jinja2 not found") | ||
return | ||
# TODO: check if there is an auto-delete after 30 minutes for stuck un-acked messages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Fabri on the lack of integration testing: I would accept that some emails are never sent and that we quickly add the proper template, thanks to an automated ticket/a message somewhere else. (Not sure what the proper error reporting channel can be)
I don't think that we need a mechanism to resend emails that never got sent in a first version, or ideally ever.
How can we easily develop this integration test? Setup one account and 3 events, and send an application, modify, cancel, confirm, and see that we do get one email for each action?
<td align="center" style="padding: 56px 56px 28px 56px;" valign="top"> | ||
<a style="color: #3498DB; text-decoration: none;" href="my.aegee.eu" | ||
target="_blank"><img style="width: 56px; height: 56px; border: 0;" alt="Logo" | ||
src="https://my.aegee.eu/images/logo.png"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
23.65Kb for an image that we display in a 56*56 pixels box. We should host a more compressed version logo_email.png, or even consider base64'ing the image in the email.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lulz
creates between 1 and 8 fake emails and puts in the queue | ||
OR | ||
tests all templates it finds in the folder | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
Works by creating a queue that the dispatcher/main.py can read from, without setting up a local RabbitMq. | |
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that you may think that as I define the queues, I am somehow creating them; but this program simply sends to the queues so we still need a local rabbit. I define the queues because it's how rabbit works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed, I thought the rabbit was created in-process in send.py, but no, we expect one to exist already. My suggestion is wrongly worded, but it would still be worth to explain that.
if(file_count != templates_tested): | ||
print() | ||
print() | ||
print(f"Tested {templates_tested} templates but the folder contains {file_count}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we add a "os.Exit(1)" here?
I guess what I'm looking for, is a section in the README "how to dev locally?". And I'd say, if you modify the python, then you want to set ALL_TEMPLATES_TEST=false, and RANDOM_AMOUNT_TEST=true, but if you modify the templates, you surely want ALL_TEMPLATES_TEST=true.
When testing all templates, I think we should mention that we catch some errors (invalid jinja syntax, template not found) but that a lot of things are not covered by this test:
- validity of HTML generated: does it look legible? (also on mobile, and without images)
- validity of content: do we send the proper content: does a new candidature email contain the string " A new candidature to position"? I wonder if for this purpose (checking the content), some sort of golden tests would be nice: we don't have dynamic (dates) content, while we use random names for the fields, we could make generate_fake_payload static/deterministic.
- finally, that the service calling the dispatcher does pass enough data to the mailer, so that no field is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i still don't understand your points here, I read them once again but I am lost :) can you rephrase? also your bullet list to me seems unrelated to the first part of the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair.
could we add a "os.Exit(1)" here?
right now, send.py always returns 0, regardless of whether tests passed or not. When we detect that one test case failed, we should return to the OS an exit code !=0 to notify it of a test failure. This can then be integrated with other tooling (in a fish terminal, this will show a red prompt for instance, to catch attention)
I guess what I'm looking for, is a section in the README "how to dev locally?". ...
We should explain the usefulness of each option, and when one wants to use ALL_TEMPLATES_TEST and when one wants to use RANDOM_AMOUNT_TEST.
When testing all templates,...
I would like to see written what this test can catch, and what this test doesn't cover at all. This way, a developer knows that they need to perform some extra manual testing to ensure that they didn't break anything: this script won't do it all for them (which is totally fine, as long as it's explicit)
@@ -0,0 +1,81 @@ | |||
|
|||
<!DOCTYPE html> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pick this file, but this applies to all: it's a lot of copy and paste, and I guess we'd like to better template this in the future?
It's always the same logic of title/body: 2 to 5 lines change between each file.
The second point, also for a follow-up task, is to remove the massive tableS (plural 😭 ) and use proper HTML to align the content. Our emails are not content heavy, so this makes a nice follow-up item.
The two points are intertwined, but can be done in independent PRs: the HTML fix ideally would only be done once the HTML code has been factored in a single file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep in mind (but you know, as per your comment in the PR thread) this is only a blind copy paste from the mailer templates that were in elixir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, I did some research after posting this.
I don't expect action here, but I'd leave it open, until tracked in Jira.
removing the auto unack, the program will not remove the message from the queue if it can't process it
not sure how it was not present at all
1f1d20d
to
bdf6d32
Compare
I rebased as I know nobody is working on it (@HassanCehef fyi, just in case you were trying it) |
with the last commits it's ready to be used now (at least code-wise). I will have to test it with gmail, or anyone can now anyway |
had to downgrade python of one minor version to make it work with the simple library slack_notifications
Before I review this; can you also make a PR on core that changes the mailer function to this? So I can deploy and test it on my own hosted instance. No need to do all the other microservices, they all use the same mailer function so I can just copy-paste it locally |
just pushed the branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll go through all mails manually, but I already took a quick look at the file names to see if all mails are present. Just two comments
The new network_board_welcome email is missing; https://github.com/AEGEE/mailer/blob/stable/lib/omsmailer_web/templates/page/network_board_welcome.html.eex (I still need to integrate that into network, but I worked on the email already)
@@ -0,0 +1,71 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filename should be membership_expired
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't find the corresponding entry anywhere in core 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use it in the 'new' core, so I'm fine with removing it. We can add a new template when we add the functionality again
I still need to go through the summeruniversity emails but I did all other submodules. These are the notes of which I'm pretty sure are not this way in current mailer;
Apart from that; in the current mailer empty fields are just empty (as in '') but in dispatcher they show up as 'None'. Likewise 'true' and 'false' are 'True' and 'False'. But that's just python vs javascript. Either way we should make that nicer in a separate PR. It helps that mailhog is now easy to use for development |
A bug in summeruniversity also allowed me to test the Slack notification message. I got the following error; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments about the various templates. Most comments are issues in current mailer as well or things for future PRs. I'll try to make some Jira issues from those later
return f"Agora {faker.sentence(nb_words=1)}" | ||
|
||
# from constants.js (at least for core..) | ||
MAIL_SUBJECTS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the subject of the email. You have also set macros.head in all emails. Which one do you want to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! They are misaligned. But also, keep in mind that the macros.head gives the HTML title of the email, which of course should be the subject. However if they're not in sync, it is not a problem; also as a reminder (and this is valid for some other comments below), I was merely copy-pasting-translating the already-inefficient templates
<tbody> | ||
<tr> | ||
<td align="center" style="padding: 56px 56px 28px 56px;" valign="top"> | ||
<a style="color: #3498DB; text-decoration: none;" href="my.aegee.eu" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<a style="color: #3498DB; text-decoration: none;" href="my.aegee.eu" | |
<a style="color: #3498DB; text-decoration: none;" href="https://my.aegee.eu" |
This should be fixed in all templates. Without the https:// it wanted to redirect to mailhog.{server}/my.aegee.eu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i left it out of the batch because it's a bigger change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing general remarks about the templates in this file, since it's the first one.
my.aegee.eu is hardcoded in a lot of these which we should fix in a future PR so the proper domain is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by proper you mean https, or also a variable (for our dev server)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable
<html lang="en"> | ||
|
||
{% import "snippets/macros.jinja2" as macros %} | ||
{{ macros.head("MyAEGEE: New candidature submitted") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are not being used, I have not reviewed them yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think they are only used by the sender?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a related comment, we're using the subjects from constants instead of this title
<tbody> | ||
<tr> | ||
<td align="center" style="padding: 28px 56px;" valign="top"> | ||
<div style="font-family: "lato", "Helvetica Neue", Helvetica, Arial, sans-serif; line-height: 28px;font-size: 16px; line-height: 24px; color: #A7ADB5;">This email was autogenerated by MyAEGEE mailer.</div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a future PR; might be nice to reference the helpdesk here for if people have questions
If you have been accepted, the organizers of the Summer University will be in touch with you soon and provide you with more information. | ||
If you have been rejected, there might be other Summer Universities with open calls. If there are any at this moment you can find them on MyAEGEE. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future PR; we should make this conditional
<td align="center" style="padding: 0 56px 28px 56px;" valign="top"> | ||
<div> | ||
<a href="https://my.aegee.eu/summeruniversity/boardview" style="background-color:#1468C5;border-radius:50px;color:#ffffff;display:inline-block;font-family: 'lato', 'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:18px;font-weight: bold;line-height:40px;text-align:center;text-decoration:none;width:270px;-webkit-text-size-adjust:none;" | ||
target="_blank">Go to the Summer University board view</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future PR; the length of this text is not good for the button
Co-authored-by: Rik Smale <[email protected]>
I believe so, the massmailer and token.json file/folder. And then in the python files there's cleanup and error handling and other TODOs still there. Not sure how much you want to tackle in this PR |
I tried to account for all the questions in README.md