support --output - to pull to stdout #346

Open
ndeloof opened this issue Jan 7, 2022 · 20 comments
Labels: enhancement (New feature or request), question (Further information is requested)

Comments

@ndeloof

ndeloof commented Jan 7, 2022

My use case is to rely on the oras CLI to restore a data cache stored as a tar.gz.
On pull, I'd like to pipe the downloaded artifact directly to tar xz.
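
For illustration, the desired flow would look something like this (hypothetical invocation, since --output - is not supported yet; registry, repository, and target directory are placeholders):

oras pull registry.example.com/cache/deps:v1 --output - | tar -xzf - -C ./cache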

@SteveLasker
Contributor

Looks super interesting. Please open a PR for the proposal.

@sajayantony added the help wanted (Extra attention is needed) label Apr 19, 2022
@shizhMSFT
Contributor

This is an interesting one. How would you handle multiple files?

@FeynmanZhou
Member

restore a data cache

Hi @ndeloof

Could you please elaborate more on your use case?

Actually, ORAS CLI v0.15 will provide oras manifest fetch and oras blob fetch, which might meet your need. You can check out this doc for details.
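
For example, assuming an artifact with a single tar.gz layer (reference and digest are placeholders), the fetch could look like:

oras manifest fetch localhost:5000/cache:v1
oras blob fetch --output - localhost:5000/cache@sha256:<digest> | tar -xzf -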

@FeynmanZhou added this to the future milestone Sep 7, 2022
@TerryHowe
Member

Seems like the output of this should be a tgz so that multiple files can be supported.

@shizhMSFT
Contributor

shizhMSFT commented Mar 15, 2023

One possible UX: oras pull localhost:5000/json-artifact:v1 --output - | jq, where oras returns an error if there are multiple blobs associated with the target manifest.

@shizhMSFT modified the milestones: future, v1.1.0 Mar 22, 2023
@shizhMSFT modified the milestones: v1.1.0, v1.2.0 Mar 22, 2023
@shizhMSFT added the enhancement (New feature or request) label and removed the help wanted (Extra attention is needed) label Mar 22, 2023
@ProbstDJakob

Are there any plans to implement this for the input as well, so that something like the following would be possible:

command-a | command-b | oras push localhost:5000/json-artifact:v1 -

...

oras pull localhost:5000/json-artifact:v1 --output - | jq

@shizhMSFT modified the milestones: v1.2.0, future Sep 12, 2023
@shizhMSFT added the question (Further information is requested) label Sep 12, 2023
@qweeah
Contributor

qweeah commented Sep 12, 2023

@ProbstDJakob There is a plan to provide a piped-command user experience in v1.2.0 by standardizing output; see #638.

Still, I have questions about the commands below:

command-a | command-b | oras push localhost:5000/json-artifact:v1 - 

Since the layer content comes from stdin, not from a file:

  1. What is the file name of the generated layer?
  2. How should we name the layer if a user runs oras pull localhost:5000/json-artifact:v1?
oras pull localhost:5000/json-artifact:v1 --output - | jq

What if localhost:5000/json-artifact:v1 contains multiple layers?

@ProbstDJakob

oras push could receive an additional option as follows:

--from-stdin[=file-path[:type]]
    oras will read data from stdin and write it to `file-path` within the image. If `file-path` has not
    been supplied, it defaults to `./stdin.blob` with the type `application/octet-stream`. This option can be
    used in conjunction with other files supplied via `<file>[:type] [...]`, but does not have to be. The only
    restriction is that no additional `-` file may be supplied.

...

<file>[:type] [...]
    The files to include within the image. The special file `-[:type]` is equivalent to using the option
    `--from-stdin=./stdin.blob[:type]`, where if no type has been supplied the type
    `application/octet-stream` will be used.
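
For illustration, the proposed option could be used like this (hypothetical syntax, not implemented; the reference and media types are examples):

command-a | command-b | oras push localhost:5000/json-artifact:v1 --from-stdin=./data.json:application/json
command-a | oras push localhost:5000/json-artifact:v1 -:application/json other.txt:text/plain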

Regarding your second question, I am not that familiar with how OCI images work, so I am currently unable to answer it, but I am willing to study the docs and elaborate further if the answer above doesn't address it implicitly.

For oras pull there might be a similar option:

--to-stdout[:<single|tar>][=file-path,...]
    Instead of writing the content of the image to a directory, the content will be written to stdout.

    When supplying `--to-stdout:single[=file-path]`, the file found at `file-path` within the image will be
    written to stdout without converting it to an archive. If no `file-path` has been supplied and the image
    contains exactly one file, that file will be written out; otherwise the command will fail. The command
    will also fail if more than one `file-path` has been supplied.

    When supplying `--to-stdout:tar[=file-path,...]`, the files found at `file-path,...` will be written to
    stdout combined into an uncompressed tar archive. If no files have been supplied, all files within the
    image will be included in the archive.

    Aliases:
    `--to-stdout=<file-path>` => `--to-stdout:single=<file-path>`
    `--to-stdout=<file-path,...>` => `--to-stdout:tar=<file-path,...>`
    `--to-stdout` => `--to-stdout:single`
    
    This option is mutually exclusive with the `--output` option.

Regarding the penultimate line, I am not quite confident that defaulting to single is the right choice, but I think most users would pipe a single file rather than a whole archive.
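
For illustration, the proposed --to-stdout option and its aliases could be used like this (hypothetical syntax, not implemented):

oras pull localhost:5000/json-artifact:v1 --to-stdout | jq                    # image contains exactly one file
oras pull localhost:5000/json-artifact:v1 --to-stdout=config.json | jq        # a specific file
oras pull localhost:5000/json-artifact:v1 --to-stdout:tar | tar -tvf -        # all files as an uncompressed tar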

@guettli

guettli commented Nov 28, 2023

Just for the record, I found this solution to stream the content of an artifact to stdout:

oras blob fetch -o- ghcr.io/foo/test@$(oras manifest fetch ghcr.io/foo/test:0.0.1  | yq '.layers[0].digest')

I pushed the tgz like this:

oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image.tgz

This solves my use case, but it would be great to do that without yq (in a single oras call).
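
For reference, since the fetched manifest is JSON, the same workaround should also work with jq in place of yq (untested sketch):

oras blob fetch -o- ghcr.io/foo/test@$(oras manifest fetch ghcr.io/foo/test:0.0.1 | jq -r '.layers[0].digest')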

@qweeah
Contributor

qweeah commented Nov 29, 2023

I pushed the tgz like this:

oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image.tgz

@guettli This is very interesting. May I know what's stored inside the image.tgz and how it is generated?

If you provide a folder rather than a file, oras push can pack it and oras pull can unpack it automatically. If your end-to-end scenario fits this, you may try:

oras push ghcr.io/foo/test:0.0.1 --artifact-type application/vnd.foo.machine-image.v1 image # pack and push all files in folder image
oras pull ghcr.io/foo/test:0.0.1 -o pulled # pull and unpack files into folder pulled/image

@guettli

guettli commented Nov 29, 2023

@qweeah thank you for asking. The tgz contains a Linux root file system. We booted Ubuntu on a VM, installed some tools, applied some configuration, and then created a tgz, so that we have a constant custom image. The image is about 1.8 GB and contains 100k files.

I am happy to store the tgz as a blob in an artifact. Nice to know that you could use oras for tar/untar, too, but at the moment I don't see a big benefit.

One drawback of the current method: We can't create the artifact via streaming. AFAIK something like this is not supported yet:

tar -czf- .... | oras push ...

@qweeah what benefit would we have if we used oras instead of tar/untar?

@qweeah
Contributor

qweeah commented Nov 29, 2023

Before uploading any blob to a registry, its digest must be specified.

Unless you can get the digest before archiving is done, streamed uploading is not possible.
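
For context, this constraint comes from the registry push protocol: per the OCI distribution spec, a blob upload is only completed by a final request that carries the digest of the entire blob. A simplified sketch (registry and repository are placeholders, and it assumes the returned Location already contains a query string):

# start an upload session and capture the Location header
location=$(curl -siX POST "https://registry.example/v2/myrepo/blobs/uploads/" | grep -i '^location:' | cut -d' ' -f2 | tr -d '\r')
# the digest of the finished archive is required to complete the upload
digest="sha256:$(sha256sum image.tgz | cut -d' ' -f1)"
curl -X PUT "${location}&digest=${digest}" --data-binary @image.tgz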

@qweeah
Contributor

qweeah commented Nov 29, 2023

@qweeah what benefit would we have if we used oras instead of tar/untar?

Well, rather than using oras manifest fetch + oras blob fetch, you can use a single oras pull command to do the pulling.

@ProbstDJakob

Before uploading any blob to a registry, its digest must be specified.

Unless you can get the digest before archiving is done, streamed uploading is not possible.

To circumvent this, oras could buffer the input stream in memory up to, for example, 64 MiB. Once this threshold is reached, oras pauses reading new input, first writes the buffered 64 MiB into a temporary file with narrow access rights, and then resumes reading from the input, piping it directly into the file. After reaching EOF, oras could calculate the digest either from the in-memory buffer or, if the content was too large, from the file, then pack it and upload the image.

The in-memory buffering would only be for performance (and security) reasons and would mostly be a nice-to-have feature.
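
As a rough shell sketch of the same idea outside of oras (spool stdin to a temporary file, then let oras compute the digest and push the finished file; the in-memory buffer stage is omitted, and the reference and media type are placeholders):

#!/usr/bin/env sh
set -eu
tmp=$(mktemp)                    # mktemp creates the spool file with user-only permissions
trap 'rm -f "$tmp"' EXIT
cat > "$tmp"                     # stream stdin to disk until EOF
oras push registry.example:5000/cache:v1 "$tmp":application/gzip

Invoked as, for example: tar -czf- . | ./spool-and-push.sh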

@qweeah
Contributor

qweeah commented Nov 29, 2023

Before uploading any blob to a registry, its digest must be specified.
Unless you can get the digest before archiving is done, streamed uploading is not possible.

To circumvent this, oras could buffer the input stream in memory up to, for example, 64 MiB. Once this threshold is reached, oras pauses reading new input, first writes the buffered 64 MiB into a temporary file with narrow access rights, and then resumes reading from the input, piping it directly into the file. After reaching EOF, oras could calculate the digest either from the in-memory buffer or, if the content was too large, from the file, then pack it and upload the image.

The in-memory buffering would only be for performance (and security) reasons and would mostly be a nice-to-have feature.

It's not something oras can circumvent; you cannot get the checksum of the blob before tar finishes writing.

@ProbstDJakob

I know; that is why I proposed the solution with buffering/writing to a temporary file. That way the digest can be calculated after tar finishes, without having to create/delete a temporary file oneself, and streaming is therefore supported.

@qweeah
Contributor

qweeah commented Nov 29, 2023

Yes, the digest calculation can be done while packing, and this optimization has already been applied in oras-go.

The question is that, after getting the digest, the oras CLI still needs to go through the archive file again to upload it.
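
In shell terms, the digest can indeed be computed while the archive is being written, but the finished file still has to be read a second time for the actual upload (sketch; names are placeholders):

tar -cz . | tee image.tgz | sha256sum | cut -d' ' -f1                  # digest is known the moment tar finishes
oras push registry.example:5000/cache:v1 image.tgz:application/gzip    # but the archive is read again here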

@ProbstDJakob

Sorry for the late response. Maybe I do not know enough about how oras works, but wouldn't the proposed solution be equivalent to supplying files as arguments, except that instead of including the files from the arguments, the only file to include is the buffer/temporary file?

Maybe the following pseudo-script will help you understand my suggestion:

uint8[64MiB] buffer;

read(into=buffer, from=stdin);
Readable inputData;

if (peek(stdin) == EOF) {
  inputData = buffer;
} else {
  File tmpFile = tmpFileCreate();
  write(to=tmpFile, from=buffer);
  readAll(into=tmpFile, from=stdin);

  seek(origin=START, offset=0, file=tmpFile);
  inputData = tmpFile;
}

call oras push registry.example:5000 inputData # Yes the CLI is not able to accept buffers, but I hope you get what I intend to say

@qweeah
Contributor

qweeah commented Dec 11, 2023

@ProbstDJakob Apart from the seek operation, what you described is already implemented here.

P.S. I think this discussion has drifted too far from this issue, so I have created #1200 where we can continue.

@ProbstDJakob

ProbstDJakob commented Dec 11, 2023

The following script is a real world example where streaming could come in handy.

Background

We fully manage the life cycle of an OpenShift cluster via a GitLab pipeline. When creating a cluster with the openshift-install tool, some files such as the Terraform state and kubeconfigs are created. Those files are needed during the whole life cycle of the cluster (not only in the current pipeline), so they need to be stored persistently. In our case we use the existing GitLab registry and oras to create an image.

Current way to pull the artifacts from the registry

#!/usr/bin/env sh
set -eu

# [...] some preparations

tempDir="$(mktemp -d)"
oras pull --output "$tempDir" "$ENCRYPTED_OPENSHIFT_INSTALL_ARTIFACTS_IMAGE"
sops --decrypt --input-type binary --output-type binary "$tempDir/openshift-install-artifacts.tar.gz.enc" \
  | tar -xzC "$CI_PROJECT_DIR"
rm -rf "$tempDir"

Possible way to pull the artifacts from the registry with pipelining

#!/usr/bin/env sh
set -eu

# [...] some preparations

oras pull --output - "$ENCRYPTED_OPENSHIFT_INSTALL_ARTIFACTS_IMAGE" \
  | sops --decrypt --input-type binary --output-type binary /dev/stdin \
  | tar -xzC "$CI_PROJECT_DIR"

This way there is no need to create a temporary directory or to know what the file is called within the image (not a problem for us, since we named it within the same repo).

Counterpart

See #1200 (comment)
