Adding sink() to init_etl() #100

Closed
pbchase opened this issue Mar 14, 2023 · 4 comments

@pbchase
Contributor

pbchase commented Mar 14, 2023

We want to write all of the R log output to text files so that every run, including a crash, leaves a complete log. We will do these things:

  • Read an environment variable named SINK_DIR from the env file. It sets the root folder for the logged files.
  • Add SINK_DIR to the example env file, setting its value to ~/rcc_sink_log
  • Add sink() to init_etl() so that existing calls to init_etl() get the new functionality without modification
  • Test for the environment variable. If it is missing, use ~/rcc_sink_log for the sink logging hierarchy. This will place folders named 2023, 2024, etc. in ~/rcc_sink_log.
  • Write files into a hierarchical structure with a year folder at the top level.
  • Name the files YYYYMMDD-HHMMSS-script_name.log
  • The default file path is $SINK_DIR/$YEAR/filename. An example of this with full defaults for the script foo_hotness run on 2023-03-14 would be ~/rcc_sink_log/2023/20230314-123456-foo_hotness.log.
  • Revise the cron example to show the mounting of a volume to persist the log files
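
The naming scheme in the list above can be sketched in shell as follows. This only illustrates the proposed path layout; it is not the R implementation inside init_etl(). The function name sink_log_path and its optional timestamp argument are hypothetical and exist only to make the example reproducible.

```shell
#!/bin/sh
# Hypothetical helper: print the log path for a script under the layout
#   $SINK_DIR/$YEAR/YYYYMMDD-HHMMSS-script_name.log
# Usage: sink_log_path script_name [YYYYMMDD-HHMMSS]
sink_log_path() {
  script_name=$1
  # An explicit timestamp makes the output reproducible; default to now.
  stamp=${2:-$(date +%Y%m%d-%H%M%S)}
  # Fall back to ~/rcc_sink_log when SINK_DIR is unset or empty.
  root=${SINK_DIR:-$HOME/rcc_sink_log}
  # The year folder is the first four characters of the timestamp.
  year=$(printf '%s' "$stamp" | cut -c1-4)
  printf '%s/%s/%s-%s.log\n' "$root" "$year" "$stamp" "$script_name"
}
```

With SINK_DIR unset, `sink_log_path foo_hotness 20230314-123456` prints the home directory followed by `/rcc_sink_log/2023/20230314-123456-foo_hotness.log`, which matches the example in the list.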

Note, the discussion about this feature has underscored the value of using docker-compose and yaml files to run these containers. That will be addressed in another issue.

@ljwoodley
Contributor

Would this be an option? This is an example with render_report.R run locally. Output attached.

docker run --rm --env-file .env \
  -v $(pwd):/home/rocker/redcapcustodian \
  redcapcustodian /bin/bash -c "Rscript report/render_report.R \
  report/sample_report.Rmd > /home/rocker/redcapcustodian/$(date +%Y%m%d-%H%M%S)-sample_report.log 2>&1"

20240311-192047-sample_report.log
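
A possible variation on this redirect, sketched under assumptions: a small host-side wrapper that captures stdout and stderr of any command into the year-based hierarchy proposed at the top of the issue. The name run_logged is hypothetical. Because the log is written on the host, this variant would not need the container to mount a volume for the log file.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and capture stdout and stderr in
#   ${SINK_DIR:-$HOME/rcc_sink_log}/YYYY/YYYYMMDD-HHMMSS-script_name.log
# Usage: run_logged script_name command [args...]
run_logged() {
  script_name=$1
  shift
  stamp=$(date +%Y%m%d-%H%M%S)
  # Derive the year from the same timestamp so the folder and file agree.
  year=$(printf '%s' "$stamp" | cut -c1-4)
  log_dir=${SINK_DIR:-$HOME/rcc_sink_log}/$year
  mkdir -p "$log_dir" || return 1
  # The function returns the exit status of the wrapped command.
  "$@" > "$log_dir/$stamp-$script_name.log" 2>&1
}
```

For example, `run_logged sample_report docker run --rm --env-file .env redcapcustodian Rscript report/render_report.R report/sample_report.Rmd` would write the container's output to a file such as `~/rcc_sink_log/2024/20240311-192047-sample_report.log`.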

@ChemiKyle
Contributor

ChemiKyle commented Mar 12, 2024

Separate goal of switching out docker run ... crons to docker compose:

docker-compose.yaml

version: '3.8'

services:
  redcapcustodian:
    container_name: redcapcustodian
    env_file:
      - /path/to/default.env
    environment:
      - FOO=${FOO}
    image: redcapcustodian
    restart: "no"
    volumes:
      - /path/to/host/dir:/mnt/homedir

new cron format:

* * * * * root docker compose -f /path/to/redcapcustodian/docker-compose.yaml --env-file /path/to/override.env run --rm redcapcustodian Rscript report/render_report.R report/sample_report.Rmd

Note that values in override.env do not seem to replace variables read via Sys.getenv unless those variables are also listed in the "environment" section of the compose file. Passing -e KEY=var after run in the docker compose command does work (--env-file cannot be placed after run). This is not the case for docker compose up.
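
One way to work with this behavior, sketched under assumptions: convert the lines of an override file into -e flags that can be placed after run. The helper name env_flags is hypothetical, and the sketch assumes the file holds only simple KEY=value lines with no spaces or quotes in the values.

```shell
#!/bin/sh
# Hypothetical helper: turn KEY=value lines from an env file into
# "-e KEY=value" flags, skipping blank lines and comment lines.
# Assumes values contain no whitespace or quoting.
env_flags() {
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
    sed 's/^/-e /' |
    paste -s -d ' ' -
}
```

The flags could then be spliced in unquoted, for example `docker compose -f /path/to/redcapcustodian/docker-compose.yaml run --rm $(env_flags /path/to/override.env) redcapcustodian Rscript report/render_report.R report/sample_report.Rmd`. The unquoted expansion is what limits this to values without whitespace.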

@ljwoodley
Contributor

Is this proceeding with the docker compose solution?

@ljwoodley ljwoodley self-assigned this Apr 1, 2024
@pbchase
Contributor Author

pbchase commented Apr 30, 2024

On 2024-04-30, @ljwoodley, @pbc, and @saipavan10-git discussed using Apache Airflow (https://airflow.apache.org/docs/apache-airflow/stable/index.html) to manage our redcapcustodian tasks. @ljwoodley will do a proof of concept that runs one workflow under Airflow and produces a log file.
