Problem: filesystem watcher is untested #874

Open
djjuhasz opened this issue Mar 4, 2024 · 2 comments
@djjuhasz
Collaborator

djjuhasz commented Mar 4, 2024

Describe the problem

Enduro has code for a filesystem watcher that should support watching a local directory for new transfers to trigger the Enduro preservation workflow. I don't believe the filesystem watcher code is ever tested end to end, though: we have no automated end-to-end tests for Enduro, and the dev environment does not include a filesystem watcher.

Possible solutions

  1. Add a filesystem watcher to the Enduro development environment to facilitate user testing
  2. Add an automated end-to-end test using a filesystem watcher to trigger processing
  3. Remove the filesystem watcher from Enduro
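
As a rough illustration of what option 2 would need to exercise, here is a minimal sketch of a polling directory watcher in Python. It is not Enduro's watcher code; `scan_new_entries`, `run_once`, and the `start_workflow` callback are hypothetical names, and the callback only stands in for whatever would start the preservation workflow.

```python
import os
import tempfile
from pathlib import Path


def scan_new_entries(watched_dir, seen):
    """Return entry names in watched_dir that are not in `seen`, and record them."""
    current = sorted(os.listdir(watched_dir))
    new = [name for name in current if name not in seen]
    seen.update(new)
    return new


def run_once(watched_dir, seen, start_workflow):
    """Poll the directory once; call start_workflow (hypothetical) per new transfer."""
    new = scan_new_entries(watched_dir, seen)
    for name in new:
        start_workflow(Path(watched_dir) / name)
    return len(new)


if __name__ == "__main__":
    # End-to-end style check: deposit a file, confirm it triggers exactly once.
    with tempfile.TemporaryDirectory() as tmp:
        seen, started = set(), []
        assert run_once(tmp, seen, started.append) == 0
        (Path(tmp) / "transfer-1.zip").write_bytes(b"demo")
        assert run_once(tmp, seen, started.append) == 1
        assert run_once(tmp, seen, started.append) == 0
```

An automated test along these lines would deposit a transfer into the watched directory and assert that processing starts exactly once for it.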

Additional context

  • The original Enduro project includes a filesystem-watched directory in the dev environment. I assume the filesystem watcher in (this) Enduro is inherited from the original project. The original project implementation could be a helpful resource if we choose to implement a filesystem watcher in this project.
  • Removing the filesystem watcher (option 3) is pretty drastic and may block users of the original Enduro from using this project.
@sallain
Collaborator

sallain commented Mar 4, 2024

Option 1 seems like the best first step; at the very least we could confirm that Enduro would work for a potential user who uses local directories rather than MinIO. Is it possible to get that set up on one of our test sites?

FYI @fiver-watson this relates to what we discussed on Friday

@djjuhasz djjuhasz self-assigned this Mar 5, 2024
djjuhasz added a commit that referenced this issue Apr 2, 2024
Fixes #874

- Add `/home/enduro/sips` to the Enduro docker image
- Add an Enduro watcher on `/home/enduro/sips`
@djjuhasz
Collaborator Author

djjuhasz commented Apr 4, 2024

I believe Enduro works like this when using Minio and a3m:

  1. The "enduro" and "enduro-internal" containers are identical, except that the "enduro" container enables API authentication while "enduro-internal" does not. When each container starts, it starts one watcher for each watcher configured in "enduro.toml"
  2. When a file is added to the Minio "sips" bucket a message is added to the Redis queue
  3. Either an "enduro" or "enduro-internal" watcher, at random, will take the Redis message from the queue and start the "processing workflow" in Temporal
  4. The a3m-worker picks up the "processing workflow" job from the Temporal queue, downloads the transfer from Minio to the local container filesystem, does some pre-processing to create a SIP, and calls a3m to process the SIP into an AIP
  5. The a3m-worker makes an HTTP request to the Enduro API to upload the AIP to the Minio "aips" bucket (the AIP is sent as a bitstream in the HTTP PUT request)
  6. The a3m-worker makes another HTTP request to move the AIP to the Minio "perma-aips-1" bucket
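
Steps 2 and 3 amount to competing consumers on a shared queue: each message is taken by exactly one watcher, and which one gets it is not predictable. A minimal sketch of that behaviour, assuming an in-process `queue.Queue` as a stand-in for the Redis queue (`run_watchers` is a hypothetical name, not Enduro code):

```python
import queue
import threading


def run_watchers(messages, watcher_names):
    """Let several watchers drain one queue; return (watcher, message) pairs."""
    q = queue.Queue()  # stand-in for the Redis queue (assumption)
    for msg in messages:
        q.put(msg)

    handled = []
    lock = threading.Lock()

    def watch(name):
        # Each watcher takes messages until the queue is empty.
        while True:
            try:
                msg = q.get_nowait()
            except queue.Empty:
                return
            with lock:
                handled.append((name, msg))

    threads = [threading.Thread(target=watch, args=(n,)) for n in watcher_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


if __name__ == "__main__":
    pairs = run_watchers(["sip-a", "sip-b", "sip-c"], ["enduro", "enduro-internal"])
    # Every message is handled exactly once; the watcher that took it may vary.
    assert sorted(msg for _, msg in pairs) == ["sip-a", "sip-b", "sip-c"]
```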

This architecture is pretty confusing and could be simplified, but it works. Getting Enduro to work with a filesystem watcher runs into a problem between steps 3 and 4, though. The enduro or enduro-internal container watches for the transfer deposit, but the a3m-worker container also needs access to the transfer to process it. In the Minio workflow, Minio is the bridge between the enduro, enduro-internal and a3m-worker containers. For the filesystem watcher we need another way to pass the transfer from the watched directory (in the enduro OR enduro-internal container) to the a3m-worker container.
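
One possible shape for such a bridge, sketched in Python under the assumption that both containers can mount some shared storage (possibly at different mount points): the watcher side copies the transfer into the shared location and passes along a relative key, and the worker side resolves that key against its own mount. `stage_transfer` and `resolve_transfer` are hypothetical names, and this is not how Enduro currently works.

```python
import shutil
import tempfile
from pathlib import Path


def stage_transfer(source, shared_dir):
    """Watcher side: copy a transfer into shared storage and return its key.

    The key is relative, so it stays valid even if the worker mounts the
    shared storage at a different path. Fails if the target already exists.
    """
    source, shared_dir = Path(source), Path(shared_dir)
    shared_dir.mkdir(parents=True, exist_ok=True)
    target = shared_dir / source.name
    if source.is_dir():
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)
    return source.name


def resolve_transfer(key, shared_mount):
    """Worker side: turn the key back into a path under the worker's mount."""
    return Path(shared_mount) / key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        watched, shared = Path(tmp) / "sips", Path(tmp) / "shared"
        watched.mkdir()
        (watched / "transfer-1.zip").write_bytes(b"demo")
        key = stage_transfer(watched / "transfer-1.zip", shared)
        assert resolve_transfer(key, shared).read_bytes() == b"demo"
```

Passing a key rather than an absolute path mirrors how the Minio workflow passes an object reference instead of a container-local file path.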

The answer to https://stackoverflow.com/questions/31693529/how-to-share-storage-between-kubernetes-pods raises some good points about why sharing storage between Kubernetes pods may be problematic (Enduro mostly only has one container per pod, so we can treat “pod” and “container” as effectively synonymous in this case).
