Create cron utility extension #3437
Comments
@pandemicsyn would #5962 resolve this issue? I think it would, since it could essentially do what I suggested here: https://gitlab.com/meltano/meltano/-/issues/3522#note_962685280. Though I'm not sure about having Meltano itself edit the user's crontab, rather than Meltano printing the crontab lines to stdout. |
Yep, I think #5962 would solve this. I'd probably lean toward not using user crontab entries and instead generating a Meltano cron file that can be dropped into /etc/cron.d, but that seems like a user-preference thing that could be controlled through the cron export plugin's config. |
Based on your input, I made a small python script to transform meltano schedules (both job and elt schedule, in json format) into a crontab file: I am currently using it this way:
It relies on a Happy if it can help (and open to feedback if you have any ofc). |
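The script referred to above is not reproduced on this page. As a rough illustration of the idea only, here is a minimal Python sketch that turns a schedule listing into crontab lines. The JSON shape (a top-level `schedules` object with `job` and `elt` lists whose entries carry `name` and `cron_interval`) and the function name `schedules_to_cron_lines` are assumptions made for this example, not a documented contract.

```python
import json
import shlex


def schedules_to_cron_lines(schedule_json: str, project_dir: str) -> list[str]:
    """Build one crontab line per schedule.

    ASSUMPTION: the input looks like
    {"schedules": {"job": [...], "elt": [...]}} and every entry has
    "name" and "cron_interval" keys. Adjust to the real output format.
    """
    schedules = json.loads(schedule_json)["schedules"]
    lines = []
    for kind in ("job", "elt"):
        for schedule in schedules.get(kind, []):
            # Quote anything that ends up in a shell command.
            command = f"meltano schedule run {shlex.quote(schedule['name'])}"
            lines.append(
                f"{schedule['cron_interval']} cd {shlex.quote(project_dir)} && {command}"
            )
    return lines
```

The result is a plain list of strings, so it can be printed to stdout or written to a file, which matches the "print the crontab lines" preference voiced earlier in the thread.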
Ideally, this would be able to schedule and deschedule/disable the active list of scheduled items, perhaps with an option to specify which environment to schedule for. |
The pattern ideally would also work for:
|
I don't know if this is out of scope for this feature request, but it might be valuable to have an option to integrate with cron monitoring / dead man's switch tools like https://healthchecks.io/. It might also be valuable for setups based on the proposal at #5962. The simplest way I can think of would be to add an option for an extra command that will be added to the cron entry. So instead of generating
meltano generates the following cron
|
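The before/after cron entries from the comment above did not survive on this page. A minimal sketch of the "extra command" idea, assuming a hypothetical helper `with_ping` and a placeholder ping URL, could look like this:

```python
import shlex


def with_ping(command: str, ping_url: str) -> str:
    """Append a monitoring ping that runs only if `command` succeeds.

    `ping_url` is a placeholder supplied by the user (for example a
    check URL from a dead man's switch service); nothing here is part
    of Meltano itself.
    """
    ping = f"curl -fsS -m 10 --retry 5 -o /dev/null {shlex.quote(ping_url)}"
    # `&&` means the ping is skipped when the scheduled command fails,
    # so the monitoring service notices the missing check-in.
    return f"{command} && {ping}"
```

Using `&&` rather than `;` is the design choice that matters here: a failed run sends no ping, which is what lets the monitoring service raise an alert.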
A simple first step towards #3437.
TODO:
- [ ] Add tests.
- [ ] Open an issue to add a utility plugin using the new plugin SDK that invokes `meltano schedule list --format=cron`, and installs/uninstalls the output.
- [ ] Open an issue to add `on_success` and `on_failure` hooks to the cron plugin, as suggested in #3437 (comment).
Brain dump about the cron utility extension:
meltano invoke cron [list] # List the Meltano schedule cron entries
meltano invoke cron install [schedule UUID...] # Install the Meltano schedule cron entries into the current user's crontab
meltano invoke cron uninstall [schedule UUID...] -a --all # Uninstall the Meltano schedule cron entries from the current user's crontab
Template cron entry:
cron_entry = f"{interval} (cd {meltano_project_dir} && export PATH={path} {venv_export} && {command}) 2>&1 | /usr/bin/logger -t meltano # <UUID>"
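To make the template concrete, here is a small sketch that fills it in with shell quoting applied to the directory and `PATH` values. The function name `build_cron_entry` and its parameters are illustrative only and are not taken from the extension's code.

```python
import shlex


def build_cron_entry(
    interval: str,
    project_dir: str,
    path: str,
    command: str,
    schedule_id: str,
    venv_export: str = "",
) -> str:
    """Fill in the template cron entry from the brain dump.

    `venv_export` is an optional, already shell-safe fragment (for
    example an extra variable assignment); it is left empty by default.
    """
    inner = (
        f"cd {shlex.quote(project_dir)} "
        f"&& export PATH={shlex.quote(path)} {venv_export}"
    ).rstrip()
    # The trailing comment carries the schedule identifier so the entry
    # can be found again when updating or removing it.
    return (
        f"{interval} ({inner} && {command}) "
        f"2>&1 | /usr/bin/logger -t meltano # {schedule_id}"
    )
```

One caveat worth keeping in mind when generating such lines: cron treats an unescaped `%` in the command field specially, so commands containing `%` need escaping before they are written to a crontab.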
Each Meltano cron entry should have an inline comment with a UUID identifying the schedule entry, which allows it to be updated or removed later on. Cron does not formally permit inline comments, but in practice they work fine: they're treated as part of the command, which is passed to the shell, and the shell ignores everything after the `#`.
If running as root, we could drop the entries into
If not running as root, we should install it as a user crontab using the crontab command: subprocess.run(('crontab', '-'), input='new crontab content (will overwrite old content)', text=True)
For consistency and simplicity, we can probably get by with just the
The Meltano crontab entries should have a header and footer that allow this extension to identify them, thereby preventing the extension from messing up other entries in the user crontab. If the header/footer is found, we should operate within it. Otherwise we should append a new Meltano section to the crontab.
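The header/footer rule described above can be sketched as a pure function over the crontab text. The marker strings and the name `replace_meltano_section` are hypothetical, chosen only for this example.

```python
# Hypothetical marker comments; the real extension may use different text.
HEADER = "# BEGIN MELTANO SECTION"
FOOTER = "# END MELTANO SECTION"


def replace_meltano_section(crontab: str, entries: list[str]) -> str:
    """Return `crontab` with the Meltano section replaced or appended.

    Lines outside the header/footer pair are left untouched, so other
    entries in a shared user crontab are preserved.
    """
    lines = crontab.splitlines()
    section = [HEADER, *entries, FOOTER]
    try:
        start = lines.index(HEADER)
        end = lines.index(FOOTER, start)
    except ValueError:
        # No complete section found: append a new one at the end.
        return "\n".join(lines + section) + "\n"
    # Section found: operate only within it.
    lines[start : end + 1] = section
    return "\n".join(lines) + "\n"
```

In practice the input would come from `crontab -l` and the result would be fed back through `crontab -`. Keeping the text manipulation separate from those subprocess calls makes it easy to test without touching a real crontab.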
Install/uninstall should install/uninstall all schedules listed by
Because we may have to share a crontab with other entries (if not running as root), we cannot set environment variables for the whole crontab, nor can we depend on them having been set in any particular way. Notably this means that the command in each cron entry may be executed by any shell by default. We will want to get around this issue by explicitly running each entry with
With those set it should be the case that simply running
Each cron entry command is going to be very long using the above approach, and quoting the command properly may be tricky. Putting the command content into a file is a common approach to keep it clean, but that introduces some additional issues. Notably, we need somewhere to store these files. With |
@aaronsteers - Note to self, review above ☝️ @edgarrmondragon, @pandemicsyn - fyi above for discussion/review. |
I, personally, would have preferred we not actually attempt to install/manage per-user crontabs and just printed out the entries instead (because I think that might be more friendly when folks are deploying this via ansible/salt/etc), but I think if we're going to actually install and manage user crontab entries then what @WillDaSilva laid out seems solid 🤘
Big +1. Not a huge fan of our behavior being very different for these two paths. I'd also rather we be consistent and predictable and use the per-user crontab entries across the board.
This should probably replace the cron entry AND that wrapping script (in case it got deleted?). Definitely not required, but might be nice if |
@pandemicsyn That will be supported by running We probably also want to support
Good idea. Supporting
Not sure what you mean by "wrapping script" here. |
@WillDaSilva I meant these:
|
@pandemicsyn @aaronsteers My proposal (and current implementation) relies on each schedule having an associated UUID. Currently schedules provided by Meltano do not have any unique identifier. I see 3 options we can take:
Which approach do you think we should take? I'm partial to the third approach. |
Looks like we'll be forced into putting the command and env vars into a script, and managing that. The maximum length for a cron command is 999 characters. |
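A minimal sketch of that approach follows: the environment variables and the command go into a small wrapper script, and the cron entry only needs to call the script. The function name `write_wrapper_script`, the one-script-per-schedule layout, and the `.sh` naming are assumptions for illustration, not the extension's actual behaviour.

```python
import shlex
from pathlib import Path


def write_wrapper_script(
    script_dir: Path,
    schedule_name: str,
    project_dir: str,
    env: dict[str, str],
    command: str,
) -> Path:
    """Write an executable wrapper script and return its path.

    ASSUMPTION: one script per schedule, named after the schedule.
    """
    script_dir.mkdir(parents=True, exist_ok=True)
    lines = ["#!/bin/sh", f"cd {shlex.quote(project_dir)} || exit 1"]
    # Sort for a stable file, and quote values so spaces survive.
    lines += [f"export {key}={shlex.quote(value)}" for key, value in sorted(env.items())]
    lines.append(f"exec {command}")
    script = script_dir / f"{schedule_name}.sh"
    script.write_text("\n".join(lines) + "\n")
    script.chmod(0o700)  # owner-only: the script may contain secrets
    return script
```

The cron entry then shrinks to an interval plus the script path, which keeps it short regardless of how many variables the schedule needs.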
We probably want to break the "begin/end Meltano section" blocks in the crontab up by project. We can use the project ID in the header and footer comments. Additionally, inline comments with the schedule ID are no longer required, as we can identify the schedules in the crontab based on which script they're calling. The script path is |
@aaronsteers @pandemicsyn I've created https://github.com/meltano/cron-ext, and opened meltano/cron-ext#1 The license for the repository is currently MIT. Is that suitable, or should it be Apache-2.0 instead? We can start to use that repository to track features/bugs/issues for the cron extension going forwards. For instance, multi-project support by putting the project ID in the section header and footer is probably worth pulling into its own issue. |
The schedule name is "supposed to be" unique. Although in practice, not everything will complain if a schedule with a duplicate name is found:
The most likely way to end up with a dupe is if someone manually edits the YAML. If we encounter a dupe we should raise an error. Even without that, I'm not sure you, strictly speaking, need a UUID. You could hash some subset of the schedule fields.
I'll defer to @aaronsteers for that, but FWIW I think most of the other repos are Apache-licensed. |
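Both suggestions (raise on duplicate names, derive an identifier by hashing a subset of fields) can be sketched in a few lines. The field names `name` and `interval`, the 12-character truncation, and both function names are choices made for this example only.

```python
import hashlib
import json
from collections import Counter


def check_unique_names(schedules: list[dict]) -> None:
    """Raise ValueError if two schedules share a name."""
    counts = Counter(schedule["name"] for schedule in schedules)
    dupes = sorted(name for name, count in counts.items() if count > 1)
    if dupes:
        raise ValueError(f"Duplicate schedule names: {', '.join(dupes)}")


def schedule_fingerprint(schedule: dict, fields=("name", "interval")) -> str:
    """Derive a stable identifier from a subset of schedule fields.

    The same field values always give the same identifier, so nothing
    extra has to be stored alongside the schedule.
    """
    subset = {field: schedule.get(field) for field in fields}
    # sort_keys and fixed separators keep the serialisation deterministic.
    payload = json.dumps(subset, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

A trade-off of the hashing approach is that editing any hashed field changes the identifier, so an updated schedule looks like a new one. Hashing only the name avoids that, provided names are enforced to be unique.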
Closing as complete now that meltano/cron-ext#1 has been merged, and the outstanding features/fixes discussed here have all been logged at https://github.com/meltano/cron-ext/issues |
@WillDaSilva and @pandemicsyn - Just circling back here to confirm the above. MIT and Apache 2.0 are both okay. Apache 2.0 is indeed preferred. |
Migrated from GitLab: https://gitlab.com/meltano/meltano/-/issues/3522
Originally created by @alxthm on 2022-05-24 14:54:47
For simple Meltano deployments, it would be great to have a cron-based orchestrator using the `meltano schedule` configuration, without having to set up an Airflow instance with a db, users, etc. The Meltano UI already makes it possible to manually run Meltano schedules, so this would be a nice complement 😄