Build: use rclone for sync #9842
readthedocs/storage/rclone.py
# TODO: Fail or let the caller decide what to do?
check=True,
Not sure about this one: we could check the exit code manually or catch the exception.
I'm not sure I understand the concern here. What are the different scenarios and their pros/cons?
If the command exits with a non-zero status and check=True, it raises an exception (subprocess.CalledProcessError). If check=False (the default), no exception is raised and execution continues normally, but you can still check the exit code manually.
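A minimal illustration of the two behaviors described above, using a throwaway Python one-liner that exits with status 3 as a stand-in for a failing rclone call (the command is an assumption for the example, not part of the PR):

```python
import subprocess
import sys

# Stand-in for a failing rclone call: a Python one-liner that exits with status 3.
failing_command = [sys.executable, "-c", "import sys; sys.exit(3)"]

# check=False (the default): no exception, the caller inspects returncode.
result = subprocess.run(failing_command, check=False)
returncode_without_check = result.returncode

# check=True: a non-zero exit status raises CalledProcessError.
raised_returncode = None
try:
    subprocess.run(failing_command, check=True)
except subprocess.CalledProcessError as exc:
    raised_returncode = exc.returncode
```

Either way the exit code is available; the difference is whether the caller has to remember to look at it.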
I think we should wrap it in a try/except block and raise a custom exception that callers can handle properly.
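A minimal sketch of this suggestion, assuming a hypothetical RcloneError exception and run_command helper (neither name comes from the PR). It converts subprocess.CalledProcessError into the custom exception and keeps the original as its cause:

```python
import subprocess
import sys


class RcloneError(Exception):
    """Hypothetical custom exception raised when an rclone command fails."""

    def __init__(self, command, returncode, stderr):
        self.command = command
        self.returncode = returncode
        self.stderr = stderr
        super().__init__(f"Command {command!r} failed with exit code {returncode}")


def run_command(command):
    """Run command, translating CalledProcessError into RcloneError."""
    try:
        return subprocess.run(
            command,
            check=True,
            capture_output=True,
            text=True,
        )
    except subprocess.CalledProcessError as exc:
        # Chain the original exception so the full traceback is preserved.
        raise RcloneError(command, exc.returncode, exc.stderr) from exc


# Usage: callers only need to know about RcloneError.
try:
    run_command([sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(2)"])
except RcloneError as error:
    failed_returncode = error.returncode
    failed_stderr = error.stderr
```

Note that a missing executable raises FileNotFoundError rather than CalledProcessError, so callers may want to handle that case as well.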
Excellent PR description 💯
Do you want to create an issue in -ops and assign it to me? Or do you want to modify the Salt recipe yourself?
I did a quick initial review and it looks good 💯! I still need to come back and review some parts more in depth, though.
What's the plan for determining whether it's faster than the current approach, and by how much? I suppose we should export the current metrics we have in NR somewhere else, then enable this feature flag for some of those projects, export the new data, and compare.
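One simple way to compare the exported numbers, sketched under the assumption that sync durations (in seconds) for the same set of projects are available for both approaches. median_speedup is a hypothetical helper and the sample values are illustrative only, not real measurements:

```python
from statistics import median


def median_speedup(old_durations, new_durations):
    """Ratio of median durations (old / new); a value > 1 means the new approach is faster."""
    if not old_durations or not new_durations:
        raise ValueError("both samples must be non-empty")
    return median(old_durations) / median(new_durations)


# Illustrative numbers only: old sync vs. rclone sync durations for three builds.
example = median_speedup([10.0, 12.0, 11.0], [5.0, 6.0, 4.0])
```

Using the median rather than the mean keeps a few unusually slow builds from dominating the comparison.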
provider = "AWS"
# If a custom endpoint URL is given and
# we are running in DEBUG mode, use minio as provider.
if self.endpoint_url and settings.DEBUG:
    provider = "minio"
Shouldn't this be a setting, like RTD_RCLONE_PROVIDER, instead of being hardcoded here, so we can define different values depending on the environment?
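A minimal sketch of the suggested RTD_RCLONE_PROVIDER setting. SimpleNamespace stands in for django.conf.settings, get_rclone_provider is a hypothetical helper, and the "AWS" fallback mirrors the current default:

```python
from types import SimpleNamespace

DEFAULT_PROVIDER = "AWS"


def get_rclone_provider(settings):
    """Return the rclone provider from settings, falling back to AWS."""
    return getattr(settings, "RTD_RCLONE_PROVIDER", DEFAULT_PROVIDER)


# Stand-ins for the settings of two environments.
production_settings = SimpleNamespace()  # setting not defined, default applies
development_settings = SimpleNamespace(RTD_RCLONE_PROVIDER="minio")
```

With this approach the value comes from each environment's settings, and the code no longer branches on settings.DEBUG.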
This is already possible by using a different storage backend.
The rclone class should be attached to the chosen storage backend, so I don't think there's a need for a separate setting.
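A sketch of what attaching the rclone class to the storage backend could look like. All class names, the bucket name, and the endpoint URL are hypothetical. Each backend picks its own rclone class, so the provider follows from the backend and no settings.DEBUG check is needed:

```python
class BaseRClone:
    """Hypothetical base class; subclasses declare which rclone provider to use."""

    provider = None

    def __init__(self, bucket_name, endpoint_url=None):
        self.bucket_name = bucket_name
        self.endpoint_url = endpoint_url


class RCloneS3Remote(BaseRClone):
    provider = "AWS"


class RCloneMinIORemote(BaseRClone):
    provider = "minio"


class S3MediaStorage:
    """Hypothetical production backend: plain AWS S3, no custom endpoint."""

    bucket_name = "media"
    endpoint_url = None
    rclone_class = RCloneS3Remote

    def get_rclone(self):
        # The backend, not settings.DEBUG, decides which rclone class is used.
        return self.rclone_class(self.bucket_name, endpoint_url=self.endpoint_url)


class MinIOMediaStorage(S3MediaStorage):
    """Hypothetical development backend pointing at a local MinIO server."""

    endpoint_url = "http://localhost:9000"  # placeholder URL
    rclone_class = RCloneMinIORemote
```

Switching the configured storage backend per environment is then enough to switch providers.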
OK, I'll ask this a different way. Why do we need settings.DEBUG here? I'd like to avoid using this setting in a conditional and instead be able to define this through a setting from the environment.
MinIO is only used during development; I don't see a reason why we wouldn't use MinIO during development.
MinIO is fine. My point is about avoiding settings.DEBUG in the code.
This is great! I suggest we consider changing the approach here: instead of using subprocess.run to execute the command on the host, I'd propose running it from inside the container. That would solve a lot of issues related to symlinks and to accessing unauthorized files on the host where our Python application is running.
"--",
*args,
]
env = os.environ.copy()
Is there any reason we want to use the same environment as the running process? Can't we pass only the variables from self.env_vars?
When you pass env to subprocess.run, it replaces ALL the environment variables of the child process, so other variables like PATH would be undefined. That's why we copy os.environ first and then update it with self.env_vars.
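A small demonstration of this subprocess.run behavior, using a Python one-liner as the child process and a hypothetical FOO variable in place of self.env_vars:

```python
import os
import subprocess
import sys

# Child script that reports whether PATH is visible and what FOO is set to.
child = "import os; print(os.environ.get('PATH') is not None, os.environ.get('FOO'))"

# Passing only the extra variables replaces the whole environment: PATH is gone.
only_extra = subprocess.run(
    [sys.executable, "-c", child],
    env={"FOO": "bar"},
    capture_output=True,
    text=True,
).stdout.strip()

# Copying os.environ first and then updating keeps PATH and adds FOO.
env = os.environ.copy()
env.update({"FOO": "bar"})
merged = subprocess.run(
    [sys.executable, "-c", child],
    env=env,
    capture_output=True,
    text=True,
).stdout.strip()
```

Here only_extra reports that PATH is missing, while merged sees both PATH (inherited from the parent) and FOO.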
env.update(self.env_vars)
log.info("Executing rclone command.", command=command)
log.debug("env", env=env)
result = subprocess.run(
subprocess.run will execute the command on the host machine. Wouldn't it be better to execute the command from inside the container instead? That would give us much more security and reduce the attack surface, since users wouldn't be able to access any file on the host at all.
I think it's possible, since we have the files to be uploaded on the host and only need to pass the correct environment variables to that particular command.
We discussed this with Eric a few weeks ago: to make it secure, it should run from a separate container; otherwise the user could tamper with the executable to expose the secret env vars we pass to it. I'm +1 on exploring that idea later.
Good point! 💯 We should have some kind of integrity check before executing it, or something similar. Sounds good to explore in a future iteration 👍🏼
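A minimal sketch of one possible integrity check, assuming the expected SHA-256 of the rclone binary is pinned ahead of time (for example, when the image is built). The helper names and the fake binary are hypothetical. If the user controls the file, there is still a window between the check and the execution, so this would complement, not replace, running rclone from a separate container:

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_checksum(path, expected_sha256):
    """True if the file at path still has the pinned checksum."""
    return sha256_of(path) == expected_sha256


# Usage sketch with a fake binary in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    fake_rclone = os.path.join(tmpdir, "rclone")
    with open(fake_rclone, "wb") as fh:
        fh.write(b"original binary")
    pinned = sha256_of(fake_rclone)  # recorded ahead of time in a real setup

    trusted = matches_pinned_checksum(fake_rclone, pinned)

    # Simulate a user tampering with the executable.
    with open(fake_rclone, "ab") as fh:
        fh.write(b" + injected code")
    still_trusted = matches_pinned_checksum(fake_rclone, pinned)
```

The command would only be executed when the check returns True.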
since right now we are re-uploading all files.
To test this, you need to rebuild your Docker containers.
Closes #9448