Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build: use rclone for sync #9842

Merged
merged 7 commits into from
Jan 25, 2023
Merged

Build: use rclone for sync #9842

merged 7 commits into from
Jan 25, 2023

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Dec 23, 2022

To test this, you need to re-build your docker containers.

Closes #9448

@stsewd stsewd force-pushed the rclone branch 6 times, most recently from 67fb57d to 4382b7c Compare January 11, 2023 23:24
@stsewd stsewd marked this pull request as ready for review January 11, 2023 23:39
@stsewd stsewd requested a review from a team as a code owner January 11, 2023 23:39
@stsewd stsewd requested a review from humitos January 11, 2023 23:39
Comment on lines 91 to 92
# TODO: Fail or let the caller decide what to do?
check=True,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this one, we could manually check the exit code or catch the exception

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to understand the doubt here. What are the different scenarios and their pros/cons?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the command exists with a status different than 0 with check=True it will raise an exception, if false (the default), then it won't raise an exception and continue normally (but you can still see check for the exit code manually)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wrap it into a try/except block and raise a custom exception that we can handle from outside properly.

@humitos
Copy link
Member

humitos commented Jan 12, 2023

@stsewd

Excellent PR description 💯

We need to install rclone in our builders for this to work.

Do you want to create an issue in -ops and assign it to me? or do you want to modify the Salt recipe yourself?

Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did an initial and quick review and it looks good 💯 ! I have to come back and review some parts more in-depth yet, tho.

What's the plan to understand if it's faster, and how much, than the current approach? I suppose we should dump the current metrics we have in NR somewhere else. Then, enable this feature flag for some of those projects, dump the new data and compare.

readthedocs/projects/tasks/builds.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Show resolved Hide resolved
readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Show resolved Hide resolved
Comment on lines +27 to +31
provider = "AWS"
# If a custom endpoint URL is given and
# we are running in DEBUG mode, use minio as provider.
if self.endpoint_url and settings.DEBUG:
provider = "minio"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be a setting instead of having this hardcoded here? Like RTD_RCLONE_PROVIDER so we can define different values depending on the environment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already possible by using a different storage backend.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rclone class should be attached to the storage backend chosen, so I don't think there need for a different setting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will ask this in a different way. Why do we need a settings.DEBUG here? I'd like to avoid using this setting in a conditional and be able to define this in a setting from the environment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minio is only used during development, doesn't seem to be a reason why we wouldn't use minion during development.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MinIO is fine. My point is about avoid using settings.DEBUG in the code.

readthedocs/builds/storage.py Show resolved Hide resolved
readthedocs/storage/rclone.py Show resolved Hide resolved
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I'm suggesting thinking about changing the approach here and instead of using subprocess.run to execute the command from the host, I'd propose to do it from inside the container instead. That will solve lot of issues regarding symlinks and accessing un-authorized files from the host where our Python application is running.

readthedocs/storage/rclone.py Show resolved Hide resolved
readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
"--",
*args,
]
env = os.environ.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason why we want to use the same environment than where the process is running? can't we just pass only the variable from self.env_vars only?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you pass additional env vars, they override ALL the env variables, this means that other env vars like PATH will be undefined.

readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
env.update(self.env_vars)
log.info("Executing rclone command.", command=command)
log.debug("env", env=env)
result = subprocess.run(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subprocess.run will execute the command from the host machine. Wouldn't it be better to execute the command from inside the container instead? That will give us a lot more security, and reduce the attack vector since users won't be able to access any file from the host at all.

I think it's possible since we have the files to be uploaded in the host, and we only need to pass the correct environment variables only to that particular command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this with Eric some weeks ago, to make it secure it should be run from another container, otherwise the user could manipulate the executable to expose the secret env vars we pass to it. I'll be +1 on exploring that idea later.

Copy link
Member

@humitos humitos Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! 💯 We should have some integrity checking before executing or similar. Sounds good to explore in a future iteration 👍🏼

Comment on lines 91 to 92
# TODO: Fail or let the caller decide what to do?
check=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should wrap it into a try/except block and raise a custom exception that we can handle from outside properly.

readthedocs/storage/rclone.py Outdated Show resolved Hide resolved
readthedocs/storage/rclone.py Show resolved Hide resolved
@stsewd stsewd merged commit c48e5eb into main Jan 25, 2023
@stsewd stsewd deleted the rclone branch January 25, 2023 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Build: improve upload step
2 participants