Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3 arguments #8

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

S3 arguments #8

wants to merge 9 commits into from

Conversation

kimpham54
Copy link
Collaborator

@kimpham54 kimpham54 commented Dec 19, 2024

new arguments for you

./bin/docker-run.sh s3-file -h
./bin/docker-run.sh s3 -h

Testing steps:

  1. install as usual,
python3 -m venv myenv
source myenv/bin/activate
export PYTHONPATH="${PYTHONPATH}:src"
pip install -r requirements.txt
python src/jp2_remediator/main.py -h

  • move test files in repo
  • mkdir logs
  • uncomment in .bin/docker-run.sh the option to pass aws credentials if you want to use aws
  • create aws credentials and place in .env file in bin directory:
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_SESSION_TOKEN=

5. ./bin/docker-build.sh

  1. docker entrypoint/docker exec

container has to be alive, e.g. docker run --rm --mount type=bind,source=${PWD},target=/data -it --entrypoint /bin/bash artifactory.huit.harvard.edu/lts/jp2_remediator $@

./bin/docker-run.sh file /data/[your_test_images_folder]/[testfile.jp2]
./bin/docker-run.sh directory /data/[your_test_images_folder]

INPUT BUCKET/PREFIX > OUTPUT BUCKET/PREFIX
only input bucket name is required, other arguments are optional and would refer to the default input bucket as output

./bin/docker-run.sh s3 \
[input-bucket-name] \
--input-prefix [input/paths] \
--output-bucket [output-bucket-name] \
--output-prefix [output/paths]

INPUT FILE > OUTPUT FILE

./bin/docker-run.sh s3-file \
lts-jp2-remediation-dev \
--input-key [input-path/to/file.jp2] \
--output-bucket lts-jp2-remediation-dev \
--output-key [output-path/to/file.jp2] 

INPUT FILE > OUTPUT PREFIX
specifying output prefix instead of output key, appends _modified_yyyymmdd.jp2 to single file:

./bin/docker-run.sh s3-file \
lts-jp2-remediation-dev \
--input-key [input-path/to/file.jp2] \
--output-bucket [output-bucket-name] \
--output-prefix [output/paths]
  • if the "folder"/path doesn't exist in the specified --output-prefix for both s3 and s3-file, it will get created
  • a custom filename can be used in --output-key, or it defaults to appending _modified_yyyymmdd.jp2


# Download the file from S3
download_path = f"/tmp/{os.path.basename(input_key)}"
print(f"Downloading file: {input_key} from bucket: {input_bucket}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace all print statements with self.logger.info(...)

"s3-file", help="Process a single JP2 file in S3"
)
s3_file_parser.add_argument(
"input_bucket", help="Name of the AWS S3 bucket containing the JP2 file"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change from:

"input_bucket", help="Name of the AWS S3 bucket containing the JP2 file"

to:

"--input_bucket", help="Name of the AWS S3 bucket containing the JP2 file", required=True

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this still be required after adding the -- flag? and it will be --input-bucket instead of --input_bucket

s3_file_parser.add_argument(
"--input-key", help="Key (path) of the JP2 file in the S3 bucket", required=True
)
s3_file_parser.add_argument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make required

s3_file_parser.add_argument(
"--output-bucket", help="Name of the AWS S3 bucket to upload the modified file (optional)"
)
s3_file_parser.add_argument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove this argument

Copy link
Collaborator Author

@kimpham54 kimpham54 Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to leave it as an option or remove completely? if it is optional but not used, it defaults to the input bucket

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep this one, after discussion with andrew

s3_file_parser.add_argument(
"--output-prefix", help="Prefix for the uploaded file in the output bucket (optional)", default=""
)
s3_file_parser.add_argument(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make required

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right now it is optional, is there a situation where you wouldn't want to use the --output-prefix option, such as if you wanted to just put it in the same input directory, or same bucket?

@kimpham54
Copy link
Collaborator Author

  • take out s3 (bucket only) option, not needed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants