
[WIP] Update upload/download workflow for MiqS3Session #17798

Conversation

NickLaMuro
Member

This is an update to the changes made in #17689, scoped only to the EvmDatabaseOps changes.

The changes here are necessary to allow #17652 to function again; that PR will be updated to include these changes as well.

Background

The FTP and S3 support for database backups took two different approaches.

For the S3 route, the plan was to treat it as a pseudo mount, but that meant the upload for the backup and the download for the restore had to be done manually (unlike with the other mounted filesystems).

The plan for file splitting (and eventually FTP) was to use the file splitter to pipe the results into the FTP endpoint. That would work the same way for the mounts (sans the FTP bit), since they are just filesystems and the splitter dumps the files locally as it goes.
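For reference, a minimal sketch of that splitting idea (the chunk size, method name, and pg_dump invocation here are illustrative, not the actual FileSplitter API):

require "open3"

CHUNK_SIZE = 500 * 1024 * 1024 # 500MB pieces; size picked arbitrarily for the sketch

# Pipe a pg_dump straight through a splitter, writing numbered chunk files
# into `dir` (which could be a local path or a mounted filesystem).
def split_dump_into(dir, base_name = "db_backup")
  Open3.popen2("pg_dump", "--format=custom", "vmdb_production") do |_stdin, stdout, wait|
    part = 0
    until stdout.eof?
      File.open(File.join(dir, format("%s.%05d", base_name, part += 1)), "wb") do |chunk|
        remaining = CHUNK_SIZE
        while remaining.positive? && (buf = stdout.read([remaining, 1 << 20].min))
          chunk.write(buf)
          remaining -= buf.bytesize
        end
      end
    end
    raise "pg_dump failed" unless wait.value.success?
  end
end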

Links

Steps for Testing/QA

As with my other PRs, I have been using the gist from above to test the changes against all of the current filesystems in the console. This does not, however, test MiqS3Session, since I was previously unaware it existed, so testing that will need to happen separately.

@NickLaMuro
Member Author

@jerryk55 @carbonin Here is the change I am thinking of as a stopgap. It works for me, but it would need some tests for the S3 bit. I am fine with a different approach to move forward as well.

Can discuss tomorrow.

@NickLaMuro
Member Author

cc @Fryguy @roliveri

@carbonin
Member

carbonin commented Aug 3, 2018

I agree with the idea that backing up locally and then uploading afterwards is suboptimal, given the file size argument.

At first glance I don't like the idea of the specific class showing through in this part of the code; in the ideal case, the caller should be able to use any of the sessions identically. As a separate design issue, I don't think the actual upload or download should be done in the with_mount_session method, but I guess we could rename/refactor that concern away. (Also, I know you mentioned this was a fairly serious hack, so I apologize for nit-picking on the first pass.)

With all that in mind, it seems to me that the method of uploading/downloading the file (especially if there is some "trick" to it, like splitting) might be better suited to living in the actual session object as the implementation of #add/#download. After working with @jerryk55 on his PRs I grew to like the interface of MiqGenericMountSession and would like to push filesystem-specific logic there.

Again, this is just a first thought, and we can discuss more in the meeting. I already feel like this could be difficult because of the way the file splitting effectively "streams" the backup file to the "mount", but maybe we can alter the interface of #add in some way to make it work for both the mounted (NFS, SMB and friends) and the other (S3, Cinder, FTP) types of stores.
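To make that concrete, a rough sketch of what pushing the upload into the session might look like (this assumes the aws-sdk-s3 gem and an already-configured @s3 client; parse_uri is a hypothetical helper, and none of this is actual MiqS3Session code):

require "aws-sdk-s3"

class MiqS3Session < MiqGenericMountSession
  # Instead of copying the file to a mount point, #add uploads it to the bucket.
  def add(source, dest_uri)
    bucket, key = parse_uri(dest_uri) # hypothetical helper
    # upload_file transparently switches to a multipart upload for large files
    Aws::S3::Resource.new(:client => @s3).bucket(bucket).object(key).upload_file(source)
    dest_uri
  end
end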

@NickLaMuro
Member Author

@carbonin A few comments inline as a rebuttal. I would rather have these talking points on paper for myself, or I will forget them:

Also I know you mentioned this was a fairly serious hack so I apologize for nit-picking on the first pass

No problem. I left out many details in the description since I had mentioned them elsewhere (out of band), but I will comment on my rationale in the subsequent points:

At a first glance I don't like the idea of the specific class showing through in this part of the code as in the ideal case...

I get this, but this is the special case here, and it is specifically what broke when running this with splits. In previous versions of the code, the rake task did not need to know that the file output happened, because that responsibility was passed on to PostgresAdmin, and with_mount_session was ONLY being used to set up a mount as a filesystem (if needed), with the assumption that regular filesystem operations would then just work.

As the person who refactored this into with_mount_session (I would have been duplicating a lot of code for dump otherwise), I fully admit that it is named poorly. In reality, it probably should be called something like with_mount_session_if_needed, or parse_upload_options_and_mount_if_needed. The code that resulted from #17689 speaks exactly to that point: sometimes there isn't a session at all, which is specifically why the "gross and disgusting" &. operator was used. (Note: I hate the &. operator and will continually refuse to use it...)
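For illustration, a simplified sketch of the shape I am describing (not the real EvmDatabaseOps code; the option keys and session methods are approximations):

def with_mount_session(opts)
  # no session is created for plain local paths, hence the nil guard below
  session = MiqGenericMountSession.new_session(opts[:mount_settings]) if opts[:mount_settings]
  backup_file = session ? session.uri_to_local_path(opts[:uri]) : opts[:local_file]
  yield backup_file
ensure
  session&.disconnect # or `session.disconnect if session`, for those of us who hate `&.`
end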

With all that in mind, it seems to me that the method of uploading/downloading the file (especially if there is some "trick" to it like splitting) might be better suited to living in the actual session object as the implementation of #add/#download

I humbly disagree.

MiqS3Session is the odd duck here and breaks form from the other MiqGenericMountSession subclasses, as it isn't actually mounting a filesystem. You can even see that in its implementation: it is the only one of the four subclasses that needs to override those methods. So while the code in this PR is made more "hacky" as a result, that is because the pattern was already broken in MiqS3Session in the first place, and adjustments had to be made here to account for it being "special".

... I grew to like the interface of MiqGenericMountSession and would like to push filesystem-specific logic there.

Again, my big issue with this is that S3 is NOT a "filesystem", nor can it easily quack like one.
For the FileDepot code, however, it makes perfect sense for S3 to be included as a subclass, because that implementation was already making use of the upload features in the places it is used.

The use case that existed here, however, pretty much assumed that PostgresAdmin would be "uploading" the file itself, so it is more like we are doing:

# (mount_point here stands in for wherever the session mounted the store)
Dir.chdir(mount_point) do
  `pg_dump ...` # (or `pg_basebackup`/`pg_restore`)
end

Again, FTP has the exact same problem as S3, in that it can't really act like a filesystem. And since we weren't using the FileDepot code here, but the mounting code (which made sense for the filesystems we were supporting at the time), we didn't need to make use of #add/#download.

I already feel like this could be difficult because of the way the file splitting is effectively "streaming" the backup file to the "mount", but maybe we can alter the interface of #add in some way to make that work for both the mounted (NFS, SMB + friends) and the other (S3, Cinder, FTP) types of stores.

The problem with using #add is that it doesn't interface well with using an external program to generate the data and streaming it: it is designed to take an existing artifact and upload it (or the inverse for #download). As mentioned above, we were using the mounting code specifically as a mounting interface, and letting PostgresAdmin handle the upload/download with the mount already in place on the filesystem.
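The difference in shapes (paths, option keys, and the mount_point variable here are just illustrative):

# artifact-based: the entire dump must already exist on disk before #add runs,
# briefly doubling the storage needed for a potentially very large file
session.add("/tmp/db_backup.tar.gz", "s3://bucket/backups/db_backup.tar.gz")

# mount-based: PostgresAdmin writes straight into the mounted path,
# so no second copy is ever made
PostgresAdmin.backup(:local_file => File.join(mount_point, "db_backup.tar.gz"))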

I will say, I 100% agree with your concern about FileSplitter being hard to extend, and there is definitely some work that could be done there (fully open to feedback). In reality, maybe it should be renamed to FileStreamer with "file splitting capabilities", since that is really what it is going to be used as.

I do, again, think we should have a distinction between "mountables" and "upload endpoints", as neither S3 nor FTP is designed to be "mountable". They technically "can" be mounted, via s3fs for example, but those are really hacks that seem to be unreliable and are not officially supported. S3, on the other hand, supports multipart upload, which is much more in line with how its API and interface are designed than trying to fit it into the "mounting" model.
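For instance, a streaming multipart upload needs no mount at all. A minimal sketch, assuming the aws-sdk-s3 gem (5MB is S3's minimum part size; the bucket and key names are made up):

require "aws-sdk-s3"

def stream_to_s3(io, bucket, key, part_size = 5 * 1024 * 1024)
  s3     = Aws::S3::Client.new
  upload = s3.create_multipart_upload(:bucket => bucket, :key => key)
  parts  = []
  number = 0
  while (chunk = io.read(part_size))
    part = s3.upload_part(:bucket => bucket, :key => key, :upload_id => upload.upload_id,
                          :part_number => number += 1, :body => chunk)
    parts << { :etag => part.etag, :part_number => number }
  end
  s3.complete_multipart_upload(:bucket => bucket, :key => key, :upload_id => upload.upload_id,
                               :multipart_upload => { :parts => parts })
end

# usage: stream the dump to S3 without ever touching the local disk
IO.popen(["pg_dump", "--format=custom", "vmdb_production"]) do |out|
  stream_to_s3(out, "my-backups", "db_backup.dump")
end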

@NickLaMuro NickLaMuro force-pushed the s3_specific_backup_upload_for_evm_database_ops branch 2 times, most recently from d4991a0 to d251995 on August 3, 2018 17:29
In EvmDatabaseOps, this change makes it so only the S3 mount will
attempt an upload/download on the necessary actions, and the rest will
remain as they were.
@NickLaMuro NickLaMuro force-pushed the s3_specific_backup_upload_for_evm_database_ops branch from d251995 to 24cbe08 on August 3, 2018 17:33
@miq-bot
Member

miq-bot commented Aug 3, 2018

Checked commit NickLaMuro@24cbe08 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 2 offenses detected

lib/evm_database_ops.rb

@NickLaMuro
Member Author

Going to set this as [WIP] for now, since our direction on this might change in the future.

We might switch from piping to a subprocess to streaming back into the main process, and the interface might change a bit from what we have here.
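Very roughly, the difference between the two (commands and sizes illustrative):

# piping to a subprocess: the split happens outside the Ruby process
system("pg_dump --format=custom vmdb_production | split -b 500M - db_backup.")

# streaming back into the main process: Ruby reads the dump's stdout directly
# (reads a whole chunk into memory at once, purely for brevity here)
IO.popen(["pg_dump", "--format=custom", "vmdb_production"]) do |out|
  part = 0
  while (chunk = out.read(500 * 1024 * 1024))
    File.binwrite(format("db_backup.%05d", part += 1), chunk)
  end
end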

@NickLaMuro NickLaMuro changed the title Update upload/download workflow for MiqS3Session [WIP] Update upload/download workflow for MiqS3Session Aug 3, 2018
@miq-bot miq-bot added the wip label Aug 3, 2018
@NickLaMuro
Member Author

As I am pretty confident at this point about the way forward, and the comment above will be addressed by this PR:

ManageIQ/manageiq-gems-pending#361

I am going to close this PR as a result. It is referenced elsewhere if we need it, but I doubt we will go in this direction.

Plus it is taking up yet another tab in my browser...

@NickLaMuro NickLaMuro closed this Aug 14, 2018