
make canarie-api independent of cron and proxy #18

Merged Aug 29, 2023 (18 commits)

Conversation

mishaschwartz (Collaborator):

  • Separate the application, cron job, and proxy (nginx) into separate containers so that they can be run independently.
  • Add option to independently enable/disable the two cron jobs that can be run (monitor, parse logs).
  • Do not handle log rotation for nginx anymore. Nginx should handle this on its own.

@fmigneault left a comment:

I like the idea of splitting everything up as it should be.
Let's see how the tests do. Maybe more items will have to be validated regarding the log parsing strategy.

self-note for deploy

  • major revision 1.0.0 required

Comment on lines 17 to 26
volumes:
  - access_log:/opt/local/src/access_file.log

proxy:
  image: nginx
  volumes:
    - access_log:/var/log/nginx/access.log

volumes:
  access_log:
Collaborator:

Considering log rotation that could contain multiple files, it would probably be better to mount a dedicated directory.

I think it would be wise also to provide an example (i.e.: what is expected) of a configured log rotation of nginx here using the docker service definition for reference (e.g.: https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/config/proxy/docker-compose-extra.yml#L3-L8, but not using driver: json-file so the parsing still works in CanarieAPI).

I think the log rotation aspect could be improved (not necessarily in this PR, but setup for it).
Log parsers should probably count everything in the "current log" and add them to call counts from older logs, which are gradually wiped.
When the cron job is called, only log entries after the latest update would be parsed to avoid re-counting the same entries twice.
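A rough sketch of the dedicated-directory idea (service and volume names are assumptions loosely following the snippet under review; the actual rotation setup would live on the nginx side):

```yaml
services:
  canarieapi:
    volumes:
      - nginx_logs:/logs:ro          # the parser only needs read access

  proxy:
    image: nginx
    volumes:
      - nginx_logs:/var/log/nginx    # access.log plus any rotated access.log.N files

volumes:
  nginx_logs:
```

Mounting the whole directory rather than a single file means rotated files stay visible to the parser without remounting.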

Collaborator Author:

Thanks, that all makes a lot of sense! But... from our previous discussion I got the sense that the log monitoring isn't used much, if at all. Do you think that it is worth the effort to make the log rotation work really well if there is a chance we might drop support for that feature in the near future?

If you think there is a use-case for it still I'll leave it in and make the changes you suggest. Let me know what you think.

Collaborator:

Specifically for the case of nginx requests, there are a lot of logs. There must be some form of log rotation, or the proxy will very quickly run into full disk error. This is the reason the referenced logging definition was originally added.

I don't mind not putting too much effort on parsing the log rotation, but I think it could be added easily.
The issue I can see with the PR is that since CanarieAPI is no longer in charge of managing the rotation, logs that will be parsed when the cron is called will not be in sync with whichever rotation is applied on the proxy docker. But like you said, those stats are not used that much, so maybe not that much of an issue.
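The incremental idea described above can be sketched roughly like this (function and state names are hypothetical, not CanarieAPI's actual API): persist the byte offset already parsed between cron runs, and treat a shrinking file as a rotation:

```python
import os


def read_new_entries(path, state):
    """Return log lines added since the last call, tolerating rotation.

    `state` is a dict persisted between cron runs holding the byte offset
    already parsed. If the file shrank (i.e. it was rotated and restarted),
    reset to 0 so entries in the fresh file are not skipped. Known limit:
    entries written between the last run and the rotation are lost.
    """
    offset = state.get("offset", 0)
    if os.path.getsize(path) < offset:  # rotation happened, start over
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        lines = [line.decode("utf-8") for line in f]
        state["offset"] = f.tell()  # remember how far we parsed
    return lines
```

Each cron invocation would then only count the returned lines and add them to the running totals accumulated from older logs.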

canarieapi-cron Outdated
Comment on lines 1 to 2
* * * * * root [ -n "$CANARIE_PARSE_LOGS" ] && /usr/local/bin/python3 -c 'from canarieapi import logparser; logparser.cron_job()' 2>&1 | /usr/bin/tee -a /var/log/logparser_cron.log >/proc/1/fd/1 2>/proc/1/fd/1
* * * * * root [ -n "$CANARIE_MONITOR" ] && /usr/local/bin/python3 -c 'from canarieapi import monitoring; monitoring.cron_job()' 2>&1 | /usr/bin/tee -a /var/log/monitoring_cron.log >/proc/1/fd/1 2>/proc/1/fd/1
Collaborator:

Just to make sure I understand what is happening with those redirects.
Only the logs from canarie-api-cron itself will be visible (i.e.: the logger.info(...) calls within parse_log), and not the logs from nginx?

Collaborator Author:

Yes, that's right. This is essentially a way of making cron logs visible on the stdout of the main process (the one running in docker). The problem is that even if you run cron in the foreground with cron -f, stdout and stderr are still redirected to the cron log files. This is a way of getting around that and sending the logs to the actual stdout (as one might expect from a docker container).
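Broken down piece by piece, the redirect trick in the cron entries above works like this (shown reflowed with comments for readability only; a real crontab entry must stay on a single line):

```sh
* * * * * root [ -n "$CANARIE_PARSE_LOGS" ] &&   # no-op unless the job is enabled
    /usr/local/bin/python3 -c '...' 2>&1 |       # merge the job's stderr into its stdout
    /usr/bin/tee -a /var/log/logparser_cron.log  # keep a copy in the cron log file
    >/proc/1/fd/1 2>/proc/1/fd/1                 # forward everything to PID 1's stdout,
                                                 # which is what `docker logs` shows
```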

Collaborator:

Ok, sounds good.
This docker will also need the log rotation config applied to avoid it blowing up the docker-compose logs.

Comment on lines 20 to 21
if not os.path.isfile(filename):
    return {}
Collaborator:

I think we should raise instead of failing silently, and also log an error.

Collaborator Author:

Sure. I mostly did this because the tests assume that this file is present (it was previously created as an empty file in the rotate_logs function). I can update the tests, though, so that they don't fail by default.
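A sketch of the raising variant being suggested (the line-counting body is only a stand-in for the real parsing done in canarieapi/logparser.py):

```python
import logging
import os

logger = logging.getLogger(__name__)


def parse_log(filename):
    """Parse an access log, raising instead of silently returning {}."""
    if not os.path.isfile(filename):
        # Log an error and raise so a misconfigured log path is caught
        # early instead of being hidden by an empty result.
        logger.error("Cannot parse log file [%s]: file not found", filename)
        raise FileNotFoundError(f"Log file not found: {filename}")
    stats = {}
    with open(filename, encoding="utf-8") as log_file:
        for _line in log_file:
            # Stand-in for real per-line parsing: just count lines.
            stats["lines"] = stats.get("lines", 0) + 1
    return stats
```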

cron-entrypoint Outdated
@@ -0,0 +1,5 @@
#!/bin/sh
Collaborator:

The repository root is starting to be cluttered with too many docker-specific definitions.
Can you move them under a docker directory?


To run the monitoring job, add the following to a crontab file::

* * * * * python3 -c 'from canarieapi import monitoring; monitoring.cron_job()' 2>&1
Collaborator:

The commands differ from the sample cron definitions.

Here, you could use .. literalinclude:: canarieapi-cron to avoid maintaining both places
(https://devopstutodoc.readthedocs.io/en/stable/documentation/doc_generators/sphinx/rest_sphinx/code/literalinclude/literalinclude.html). Use :lines: to select each instruction individually.
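For reference, the suggested directive could look something like this (the relative path and line selection are assumptions depending on where the docs live):

```rst
.. literalinclude:: ../canarieapi-cron
   :lines: 2
```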

Collaborator Author:

The example here would be for running cron on the host machine (not on docker) so the stdout/stderr redirects and the conditional execution isn't as necessary.

I'm assuming, based on the rest of this file, that this documentation is intended as "basic usage when running canarie-api locally" and is different from the docker usage. I didn't notice that there were docker specific usage documentation but if I've missed it let me know.

Collaborator:

No, there was no docker specific usage.
Even this doc is outdated because ../bin/gunicorn is not even in the repo anymore (just run gunicorn installed in the python env).

It's ok to leave those examples as is. Personally, I debug calling the python commands directly instead of using cron. I don't expect many usages outside of Docker except for developers.


codecov bot commented Apr 17, 2023

Codecov Report

Patch coverage: 70.90% and project coverage change: -0.91% ⚠️

Comparison is base (4070852) 80.58% compared to head (386028d) 79.68%.
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #18      +/-   ##
==========================================
- Coverage   80.58%   79.68%   -0.91%     
==========================================
  Files          11       11              
  Lines         577      576       -1     
  Branches       89       95       +6     
==========================================
- Hits          465      459       -6     
+ Misses         84       83       -1     
- Partials       28       34       +6     
Files Changed Coverage Δ
canarieapi/schema.py 73.17% <65.38%> (-10.62%) ⬇️
canarieapi/logparser.py 69.44% <68.42%> (-3.18%) ⬇️
canarieapi/api.py 89.69% <87.50%> (-0.21%) ⬇️
canarieapi/default_configuration.py 100.00% <100.00%> (ø)


@mishaschwartz mishaschwartz marked this pull request as draft April 18, 2023 19:13
@mishaschwartz

I'm making this back into a draft for now so that I can think about it a bit more.

The main issue is that if nginx runs in its own container, the logs are streamed to stdout/stderr and so are not written directly to a file that could be rotated.

One option would be to make a custom nginx image that writes logs to a file instead but then there's not much point in running this as a separate service since one of the main advantages is being able to inspect the logs for individual processes.

Another option is to mount the container logs from the host machine as a volume, but the log format that docker produces isn't necessarily stable, so it would be a pain to rely on them.

@tlvu

tlvu commented Apr 19, 2023

Sorry for being late to the party. Some thoughts; ignore them if too naive, since I do not know the complexity of the code here.

  • Separate the application, cron job, and proxy (nginx) into separate containers so that they can be run independently.

The proxy will be a vanilla nginx image?

The parse-logs cron job shares a data volume with the nginx proxy container to parse the nginx logs?

The monitor cron job runs from inside the nginx proxy container? I think it should run inside the proxy container so it can effectively test that the proxy has a working connection to all the other containers.

The cron job does not have to be started by the proxy container itself. It can docker exec into the proxy container to do its job as well, maybe something like https://github.com/bird-house/birdhouse-deploy/blob/7dcb2081cdcd876ac9df10cbd4869e2e6a0b3d3a/birdhouse/pavics-compose.sh#L133
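As a sketch of that approach (the container name and schedule are assumptions), a host crontab entry could exec the job inside the running proxy container:

```sh
# host /etc/crontab entry; "proxy" is a hypothetical container name
* * * * * root docker exec proxy python3 -c 'from canarieapi import monitoring; monitoring.cron_job()'
```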

  • Add option to independently enable/disable the two cron jobs that can be run (monitor, parse logs).

Could be useful for other usage of Canarie-Api. I think PAVICS maybe will always want both of these cronjobs?

  • Do not handle log rotation for nginx anymore. Nginx should handle this on its own.

Would this affect the ease of log parsing? If nginx rotates the logs itself, maybe the log parser has to know this schedule and parse the logs before they are rotated?

@mishaschwartz

@tlvu

The proxy will be a vanilla nginx image?

Yeah it could be, or at least could start as a vanilla nginx image that could be built on.

The parse-logs cron job shares a data volume with the nginx proxy container to parse the nginx logs?

Yes, that is one solution, but it doesn't fully solve the problem of log rotation.

The monitor cron job runs from inside the nginx proxy container? I think it should run inside the proxy container so it can effectively test that the proxy has a working connection to all the other containers.

Right now, the monitor cron job does run inside the same container as the proxy. This PR would change it so that the monitor cron job runs in a different container.

The cron job does not have to be started by the proxy container itself. It can docker exec into the proxy container to do its job as well, maybe something like https://github.com/bird-house/birdhouse-deploy/blob/7dcb2081cdcd876ac9df10cbd4869e2e6a0b3d3a/birdhouse/pavics-compose.sh#L133

Yeah, that's a good idea. I was thinking about adding another docker-proxy component that can allow the cron job container to query the nginx logs parsed by docker. This could also be used to rotate the logs in the nginx container with an exec.

  • Add option to independently enable/disable the two cron jobs that can be run (monitor, parse logs).

Could be useful for other usage of Canarie-Api. I think PAVICS maybe will always want both of these cronjobs?

Ok. I thought from our emails that the log parsing job was not needed/not used now that the funding from canarie is done. Correct me if I'm wrong.

  • Do not handle log rotation for nginx anymore. Nginx should handle this on its own.

Would this affect the ease of log parsing? If nginx rotates the logs itself, maybe the log parser has to know this schedule and parse the logs before they are rotated?

It makes it slightly more complex, but not too bad really. My main concern with this is that canarie-api ties the log rollover to the monitoring very strictly, and a sysadmin may want to rotate logs differently. I think this can be handled gracefully by canarie-api without too much trouble.

@tlvu

tlvu commented Apr 19, 2023

Ok. I thought from our emails that the log parsing job was not needed/not used now that the funding from canarie is done. Correct me if I'm wrong.

Oops, sorry, yes: log parsing for the stats counter is not required for PAVICS. Francis said maybe we still need to provide the stats until Sept 2023. Might want to double-check with David Huard.

@fmigneault

@mishaschwartz

The main issue is that if nginx runs in its own container, the logs are streamed to stdout/stderr and so are not written directly to a file that could be rotated.

Aren't logs stored in /var/log/nginx?
They should be because of these definitions:
https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/config/proxy/nginx.conf.template#L5
https://github.com/bird-house/birdhouse-deploy/blob/master/birdhouse/config/proxy/nginx.conf.template#L18-L22

I think the solution would be to mount a volume on /var/log/nginx within proxy that will be shared with canarieapi for monitoring.

@mishaschwartz

Aren't logs stored in /var/log/nginx?

Yes, but that is defined in birdhouse-deploy and isn't mentioned anywhere in this repo. It should, at the very least, be documented here.

I think the solution would be to mount a volume on /var/log/nginx within proxy that will be shared with canarieapi for monitoring.

Yes, that is the current solution.

@mishaschwartz

Yes, that is the current solution.

Oops, sorry, I realize I haven't pushed those edits yet.

@mishaschwartz

@tlvu @fmigneault To be honest, I think it has become more effort than it is worth to both keep the nginx log-parsing feature working and isolate the processes (nginx, canarie, cron).

If you're ok with me removing the feature that parses nginx logs, I can do that (I suggest we wait until September to roll that out, so that the Canarie requirements are finished). I vote to do that.

If you want that feature kept but allow it to be disabled optionally, then I suggest we close this PR and go with #19 instead.

@fmigneault

@mishaschwartz
I don't mind having the flag to disable it, but I don't want to remove it entirely.
However, I would like the responses to return something indicating that the features are disabled when those flags are off.

@mishaschwartz

mishaschwartz commented Apr 19, 2023

However, I would like the responses to return something indicating that the features are disabled when those flags are off.

This is implemented in the changes to api.py (see here and in #19)

I don't mind having the flag to disable it, but I don't want to remove it entirely.

Ok, then I propose we forget about this PR and #19. I would really like there to be an option where we have a service that just monitors the stack and provides information about the other running services, and nothing else.

I think that canarie-api will do for now, but I will introduce an alternative for the birdhouse stack later on (I will work on other things first). CRIM and Ouranos can continue using canarie-api as it is, and other deployments can use the alternative monitoring app if they prefer something simpler.

@fmigneault

@mishaschwartz
Can you describe what is blocking you?
I can give it a try. I don't have the impression that the change is that complicated, but maybe there's something I'm not seeing with static code analysis.

@mishaschwartz

You're right, it's not difficult. I've pushed an example solution (not tested just yet, I'll get to it tonight).

I propose an alternative because I don't want to require a component in the birdhouse stack where some of its features are mandatory but not others. I've thought about it a bit more and I think we can come to a solution if we make the monitoring job mandatory but the log parsing job optional.

I'll push a few more changes later tonight, add some clarifying documentation, and we can discuss further from there.

@tlvu

tlvu commented Apr 19, 2023

If you're ok with me removing the feature that parses nginx logs I can do that (I suggest we wait until September to roll that out so that the Canarie requirements are finished). I vote to do that.

You have my vote.

I think log parsing should be generalized to all components in the stack, not just nginx. There is an issue opened for that here: bird-house/birdhouse-deploy#218

@mishaschwartz mishaschwartz marked this pull request as ready for review April 20, 2023 17:43
@mishaschwartz mishaschwartz requested a review from fmigneault May 5, 2023 18:12
@mishaschwartz

Another reason this is useful: if canarie-api fails (and exits), the proxy goes down as well (which is not good).

@fmigneault left a comment:

Needs to pass linting and test validations.

Looks good overall from static code analysis. I'll need to see it working within birdhouse-deploy for further validation.

service_stats.append(("invocations", cron_info["invocations"]))
monitor_info.append(("lastInvocationsUpdate", cron_info["last_log_update"]))

monitor_info.extend([
    ("lastAccess", cron_info["last_access"]),
Collaborator:

Maybe consider having "Disabled" instead of the default "Never" so that it is more obvious that this status is expected when PARSE_LOGS = False was specified?

Collaborator Author:

I think I was not clear on what "lastAccess" referred to in this case. If it refers to the last time this route was accessed by a user, then it should be displayed conditionally, only when monitoring is enabled. I've moved it up into the if block (above) to reflect this.

Collaborator:

It refers more to when the logs were last accessed (aka checked). So, more like a "last updated".
I don't think the last access time of the endpoint is relevant for anyone. It is usually used to validate if the logs were parsed recently or are ~1min old (or more depending on crontab config). The "Never" is usually an indication that something is misconfigured, or that the instance encountered some kind of error.

I want to avoid an expected "Never" (because log parsing is disabled) to be misinterpreted as something going wrong in the service. Alternatively, you could output parseLogsEnabled: true|false directly in the response. That would be even more explicit than the "Disabled" previously suggested.
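For example, the explicit flag could look like this in the response body (all field names other than the suggested parseLogsEnabled are illustrative only):

```json
{
  "monitoring": {
    "parseLogsEnabled": false,
    "lastAccess": "Never"
  }
}
```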

Collaborator:

This still remains to be addressed.

Comment on lines -15 to +19
"filename": "/opt/local/src/CanarieAPI/stats.db",
"access_log": "/var/log/nginx/access_file.log",
"log_pid": "/var/run/nginx.pid"
"filename": "/data/stats.db",
"access_log": "/logs/nginx-access.log"
Collaborator:

This causes breaking changes.
Need to consider MAJOR revision.

Collaborator Author:

I agree

@fmigneault

@tlvu
I don't know if there is a specific setting to adjust in the Ouranosinc organization.
I currently need to press "approve and run" for the CI to run each time @mishaschwartz applies new changes.
It would be more efficient if he doesn't need to wait for me to approve the runs to get the results.

@tlvu

tlvu commented Jun 13, 2023

@tlvu I don't know if there is a specific setting to adjust in the Ouranosinc organization. I currently need to press "approve and run" for the CI to run each time @mishaschwartz applies new changes. It would be more efficient if he doesn't need to wait for me to approve the runs to get the results.

Sorry, I haven't been following this PR. What "approve and run" button are you talking about?

@fmigneault

@tlvu

[image: screenshot of the "approve and run" button]

@tlvu

tlvu commented Jun 14, 2023

@fmigneault This is odd, I do not see what you see (that "approve and run" button).

This is my view on github.com

[Screenshot from 2023-06-13 20-37-29]

So, I am not sure what I can do to help! The reviewer that is requesting changes is you, by the way.

@mishaschwartz

@tlvu This repo is likely set up to not allow workflow runs from outside contributors (https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks). If you add me to the repo as a member, my changes to this PR should trigger the workflow runs automatically.

@fmigneault

@tlvu

This is odd, I do not see what you see (that "approve and run" button).

It will show only if @mishaschwartz pushes another commit. For now, the latest commit was already approved by me, so you see the CI results of the corresponding run.

@mishaschwartz

@tlvu I pushed another commit if you want to take another look. Thanks

@tlvu

tlvu commented Jun 14, 2023

@tlvu This repo is likely set up to not allow workflow runs from outside contributors (https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks). If you add me to the repo as a member, my changes to this PR should trigger the workflow runs automatically.

Indeed! You're added to this repo with write permission now.

If it still does not resolve the problem of the workflow not firing, you should probably push directly to this repo instead of to your fork.

@tlvu

tlvu commented Jun 14, 2023

@tlvu I pushed another commit if you want to take another look. Thanks

Yes I see that "approve and run" button now and just clicked it.

Hopefully, now that you're added with write permission to this repo, the workflow will fire automatically on each push.

@mishaschwartz

@fmigneault @tlvu Only the labeler is failing, and I'm pretty sure that's because I'm merging this from my own fork.

See for details: actions/labeler#12

What do you want me to do? Are you willing to approve this without the labeler running?

@tlvu

tlvu commented Jun 19, 2023

@fmigneault @tlvu Only the labeler is failing, and I'm pretty sure that's because I'm merging this from my own fork.

See for details: actions/labeler#12

What do you want me to do? Are you willing to approve this without the labeler running?

I'll let @fmigneault approve the change since he knows CanarieAPI more than I do. However, I don't think the labeler is critical.

@fmigneault fmigneault self-requested a review June 19, 2023 16:30
fmigneault
fmigneault previously approved these changes Jun 19, 2023
@fmigneault left a comment:

Just minor items to add.
Good to merge after.
You can ignore the labeler action.

CHANGES.rst Outdated
@@ -7,7 +7,8 @@ CHANGES
------------------------------------------------------------------------------------

* Separate the application, cron job, and proxy (nginx) into separate containers so that they can be run independently.
* Add option to independently enable/disable the parse_logs cron job.
* Add option to independently enable/disable the parse_logs cron job by setting the PARSE_LOGS configuration option to
Collaborator:

Missing quotes for parse_logs and PARSE_LOGS.

@mishaschwartz

@fmigneault This is good to merge, I think (I don't have permission to merge it myself).

@fmigneault fmigneault merged commit a5e1c8c into Ouranosinc:master Aug 29, 2023
@mishaschwartz mishaschwartz deleted the more-flexibility branch August 29, 2023 18:04