Cloud metadata #1130

Merged
wlandau merged 36 commits into main from 1109 on Aug 28, 2023
Conversation

wlandau (Member) commented Aug 28, 2023

Summary

In this PR, tar_make(), tar_make_clustermq(), and tar_make_future() gain the ability to continuously upload metadata to the cloud. As with local writes to files such as _targets/meta/meta, the metadata is uploaded every seconds_meta seconds unless a deployment = "main" target is currently blocking the main process. The metadata files live in AWS S3 or GCP GCS, depending on the new repository_meta option in tar_option_set() in _targets.R, and they go to the bucket you configure with the resources option in tar_option_set(). (repository_meta defaults to repository, so there is no need to manually opt in to this feature.) On another machine, you can manage the cloud metadata with the new functions tar_meta_download(), tar_meta_sync(), tar_meta_upload(), and tar_meta_delete().
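For concreteness, here is a minimal sketch of a _targets.R file that uses this feature. The bucket and prefix names (and the example targets) are hypothetical; repository_meta is spelled out even though it defaults to repository:

```r
# _targets.R: minimal sketch. "my-bucket" and "my-project" are hypothetical.
library(targets)
tar_option_set(
  repository = "aws",      # upload target data to AWS S3
  repository_meta = "aws", # redundant (defaults to repository), shown for clarity
  resources = tar_resources(
    aws = tar_resources_aws(
      bucket = "my-bucket",
      prefix = "my-project"
    )
  )
)
list(
  tar_target(data, airquality),
  tar_target(model, lm(Ozone ~ Temp, data = data))
)
```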

These changes align targets with the idea that in cloud computing, you are renting the machines you work with, and you want your EC2 instances and EBS volumes to disappear as soon as possible. With all the metadata and all the target data in a bucket, the local file system that ran the pipeline is free to vanish as soon as the pipeline finishes. You could even set the data store (tar_config_set(store = "...")) to a node-specific temporary directory to be kinder to shared file systems (e.g. EFS) on enterprise architectures (FYI @rpodcast). Then on a different machine, simply pull the code, then pull the metadata with tar_meta_download(). At that point, you can read objects, check the progress of a running pipeline, and even rerun the pipeline there if the original run finished. In other words, targets pipelines adopt a decentralized model similar to that of Git/GitHub (although they can't realistically go quite that far).
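Here is a sketch of that workflow on a second machine, continuing the hypothetical pipeline above. It assumes the same _targets.R (with the same bucket settings) is already present in the working directory; the node-local store is optional:

```r
# Sketch of resuming work on a different machine.
library(targets)
tar_config_set(store = tempfile()) # optional: node-local temporary data store
tar_meta_download()                # pull the metadata files from the bucket
tar_progress()                     # inspect the status of each target
tar_read(model)                    # read a target's value from the cloud
tar_make()                         # resume/rerun the pipeline if appropriate
```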

Unfortunately, after this PR, cloud targets with custom prefixes may need to rerun. This is because I needed to shift target data to a PREFIX/objects/ location in order to make room for metadata in PREFIX/meta. But I think the change is worth the inconvenience, especially given that the solution to #1108 already invalidates existing targets.
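For illustration, the resulting bucket layout looks roughly like this (the exact key names under each prefix are my assumption; only the objects/ and meta/ locations are named above):

```
PREFIX/meta/meta      # pipeline metadata (new in this PR)
PREFIX/objects/data   # target data, previously stored directly under PREFIX/
PREFIX/objects/model
```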

wlandau merged commit b7d8183 into main Aug 28, 2023
wlandau deleted the 1109 branch August 28, 2023 17:16