[EvmDatabaseOps] Fix .validate_free_space target #18745

NickLaMuro · 2019-05-08T00:43:26Z

Currently, db_opts[:local_file] will always be a FIFO with the new file_storage mechanisms, so .validate_free_space only checks the tmp dir, which usually isn't very large and isn't the target destination for the DB backup/dump.

This fix moves the check to a place where we can access the file_storage object directly and target the directory of the mounted filesystem that will receive the backup/dump.

To that end, the check now only applies to file storage classes that are "mountable", since checking against non-mountable storages doesn't make sense (s3/swift don't have this capability and are assumed to be "infinite" anyway, and I don't think you can query FTP for available storage either, so I think the same assumption is fair to apply there).
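
As a rough illustration of the "mountable only" guard described above, here is a minimal sketch. The two `Miq*` classes are empty stand-ins that only mirror the class hierarchy, and `FakeObjectStorage` and `mountable?` are hypothetical names invented for this example; the real change may be structured differently.

```ruby
# Empty stand-ins: only the inheritance relationship matters here.
class MiqGenericMountSession; end
class MiqSmbSession < MiqGenericMountSession; end

# Hypothetical non-mountable storage (think s3/swift/FTP).
class FakeObjectStorage; end

# Class#<= returns true for the same class or a subclass and nil for
# unrelated classes, so normalize the result to a strict boolean.
def mountable?(file_storage)
  (file_storage.class <= MiqGenericMountSession) ? true : false
end

mountable?(MiqSmbSession.new)     # mounted share: run the free space check
mountable?(FakeObjectStorage.new) # treated as "infinite": skip the check
```

Comparing classes with `<=` rather than listing storage types means any future subclass of the generic mount session would be covered without touching the guard.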

Links

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1703278

Steps for Testing/QA

I think this should work (in theory), but currently looking into a way to validate this or possibly add a spec to confirm this functionality is working as expected. Once this is done, I will remove the [WIP] label.

lib/evm_database_ops.rb

carbonin

This looks good to me assuming you've tested it out. Not sure there's really a way to put this in a spec test...

lib/evm_database_ops.rb

djberg96

Looks like there's no spec for the validate_free_space singleton method in https://github.com/ManageIQ/manageiq/blob/master/spec/lib/evm_database_ops_spec.rb, should probably add one.

NickLaMuro · 2019-05-08T21:18:06Z

@djberg96 @carbonin Thanks guys! Working on getting my old test suite up and running again (so others can validate against it as well), but it turns out I left a lot of stuff un-committed since I last used it, and am working out some of the setup kinks currently. Will post again once that is more easily consumable by others.

This looks good to me assuming you've tested it out.

Well... if @djberg96 's comment is any indication, I haven't... yet...

More below.

Not sure there's really a way to put this in a spec test...

Looks like there's no spec for the validate_free_space singleton method...

In response to Nick, that is what I was afraid of, and I am unsure of the best way to go about testing it. I currently also don't have a way of validating against a DB that goes over the limit locally, since the one that I seed in my test DB only ends up being about 10MB, and the /tmp dir on the appliance VM I am running is configured to 1GB.

This is the main reason this is still marked as [WIP] since I would like to validate this in some way or another before shipping.

djberg96 · 2019-05-08T21:51:01Z

For testing this library may help: https://github.com/fakefs/fakefs

NickLaMuro · 2019-05-08T22:24:11Z

@djberg96 I don't know that it will, for a few reasons.

First off, as mentioned in the Caveats section of the FakeFS README, it is not a "filesystem replacement": it doesn't do anything under the hood to emulate a file system at the filesystem/shell level, it just stubs some constants. Unfortunately, the code that validate_free_space eventually calls is just a shell out:

manageiq/lib/evm_database_ops.rb

Lines 19 to 23 in a68811e

    
    free_space = begin
      output = MiqUtil.runcmd("df -P #{parent_directory}")
      data_line = output.split("\n")[1] if output.kind_of?(String)
      data_line.split[3].to_i * 1024 if data_line
    end

So I don't see that being too much of a help without having to work around that lib some more.
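
To make the parsing in that shell out easier to follow, here is a small standalone sketch of the same logic applied to a canned string instead of a live `df` call. The helper name `free_space_from_df` and the sample output are made up for illustration, and it assumes `df -P` reports 1024-byte blocks, as the multiplication in the quoted code implies.

```ruby
# Convert the "Available" column of `df -P` output into bytes.
# Returns nil when the output is not a String or has no data line.
def free_space_from_df(output)
  return nil unless output.kind_of?(String)

  data_line = output.split("\n")[1] # line 0 is the header
  return nil unless data_line

  data_line.split[3].to_i * 1024    # column 3 is "Available"
end

# Made-up sample, not captured from a real appliance.
sample = <<~DF
  Filesystem     1024-blocks    Used Available Capacity Mounted on
  /dev/sda1         10190100 4238472   5412300      44% /
DF

free_space_from_df(sample) # 5412300 blocks of 1024 bytes each
free_space_from_df("")     # => nil, no data line to read
```

Splitting on whitespace would misread a filesystem name that contains spaces, which is one reason a library call could be more robust than parsing text.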

The (other) big thing that we really need to test here is the interaction of validate_free_space with what we are working with, which is both a FIFO that we create to stream the output from pg_basebackup/pg_dump and a target file and/or remote endpoint (and we only really want to validate against mounted file systems).

And that is really where my trouble testing this lies, because it ends up requiring a complex setup, and as @carbonin mentioned, I'm not sure it is really something we can simulate in a spec...

That said, I have just updated my test suite gist:

https://gist.github.com/NickLaMuro/8438015363d90a2d2456be650a2b9bc6

And that should be a good way to simulate things and make sure the check doesn't cause syntax errors and such. I am going to apply the patch and make sure things work as expected, and will report back when I do.

NickLaMuro · 2019-05-08T22:35:04Z

Quick Note: I am running the test suite with the patch applied and everything seems to be working as expected still.

I would still like to find a failure case that simulates when the DB is too large for /tmp, but perfectly fine for the share that it is being sent to. I think resizing the VM's temp dir to be super small is probably the best way to do that, but I will have to look into how I can do that via Vagrant/Virtualbox...

Will update when I figure that out.

djberg96 · 2019-05-08T23:44:14Z

Not a fan of shelling out in general, and there's the sys-filesystem library if you're interested.

Poking around a bit there's a way to create a file and make its own filesystem, though I've never tried it. Would this help?

https://en.wikipedia.org/wiki/Loop_device

NickLaMuro · 2019-05-09T00:55:47Z

... and there's the sys-filesystem library if you're interested.

And I wonder who the author of that might be? 😉

Not a fan of shelling out in general...

That makes two of us! 😄

However, this fix is scheduled for a backport, so I don't want to be making a change for that in this PR. Too much extra risk and kinda out of scope for this PR, but a definite possibility in the future.

NickLaMuro · 2019-05-09T00:57:36Z

However... this loop device stuff has piqued my interest... looking 👀

djberg96 · 2019-05-09T01:12:44Z

More info: https://www.thegeekdiary.com/how-to-create-virtual-block-device-loop-device-filesystem-in-linux/

Note: Already noticing some differences on Mac with the dd command.

NickLaMuro · 2019-05-09T01:30:17Z

Heh, found that link already 😄

djberg96 · 2019-05-09T01:32:41Z

More info for Macs: https://apple.stackexchange.com/questions/9284/does-mac-have-something-similar-to-a-linux-loop-device-alternative-to-losetup

NickLaMuro · 2019-05-14T03:12:00Z

Pushed a [WIP] commit so the two of you can take a look (if you want) of where I am heading with this. I still need to add Linux support and probably some more documentation around how to use this library, but wanted to show working-ish support for OSX.

Load and fire the 🍅 's!

NickLaMuro · 2019-05-15T16:28:42Z

Bah....

creating /home/travis/build/ManageIQ/manageiq/tmp/fake_fs.img...
losetup: /dev/loop0: failed to set up loop device: Permission denied
umount: /dev/loop0: not mounted

@djberg96 @carbonin Okay, decision time...

So the above shows up in travis because we don't have (I assume sudo) permissions on Travis, which makes sense. Question is: Do we want to enable them just to add these specs I have added?

I have already tested this elsewhere, and while this is #cool™, I am not married to this code, and if it is just going to be super difficult to get this to run on CI, it might not be worth it.

Thoughts?

djberg96 · 2019-05-15T17:07:06Z

@NickLaMuro Possibly silly question, but is this something we could wrap using https://ruby-doc.org/core-2.6.1/Process/UID.html within the spec?

NickLaMuro · 2019-05-15T17:10:13Z

@djberg96 worth a shot. Let me try some things out.

NickLaMuro · 2019-05-15T17:12:45Z

Of note, however, I think we are not using the sudo flag in our .travis.yml, which is probably what is required to use that:

https://stackoverflow.com/a/26304928

NickLaMuro · 2019-05-15T18:40:02Z

Process::UID doesn't seem to work (see recent commit message). I think what I tried in the last commit is the best I can do, since I don't think that anybody is willing to change the sudo config in Travis for this one change.

NickLaMuro · 2019-05-15T19:30:44Z

Huh... that last commit actually worked...

Well, @carbonin and @djberg96 , thoughts? (Edit: Specifically, should we go forward with these specs, or should I remove all of them and just leave it as a code change only... or or... do you have a different approach you would prefer for the specs?)

djberg96 · 2019-05-15T20:18:28Z

@NickLaMuro I'm ok with it. I'd definitely like to keep the specs (as long as they won't blow up on my Mac).

carbonin · 2019-05-16T17:34:38Z

spec/lib/evm_database_ops_spec.rb

+      #
+      # This "stubs the stub" to have it act like a MiqSmbSession, when in
+      # fact, it isn't.
+      allow(file_storage).to receive(:class).and_return(MiqSmbSession)


Is there any reason this can't live in the definition of the double? For example, will it break other specs? It seems like a reasonable stub in any case.

Since I am fixing the specs for this (without my patch):

Yes, this is necessary to be here. It causes a bunch of other specs to fail if you do it globally.

carbonin · 2019-05-16T17:37:24Z

spec/support/fake_tmpdir_helper.rb

+# have a "fake tmp dir".  It will do all of the setup and cleanup within the
+# block.
+#
+# Note:  This class is most likely **NOT** thread safe currently


Will this be a problem for parallel tests? I mean probably not right now because there is only one spec using it ...

carbonin · 2019-05-16T17:39:32Z

spec/support/fake_tmpdir_helper.rb

+        def attach_tmp_fs
+          @dev_disk = `losetup -f`.chomp.strip
+          cmds = [
+            "sudo losetup #{@dev_disk} #{tmp_fs_file}",


Does this work running locally on a linux workstation? I'd imagine the suite would halt if you didn't run it as root, right?

Just to answer this:

The assumption (which holds on Vagrant and on Travis, the two places I tested) is that using sudo with those users doesn't require entering a password. So to your question, I assume if sudo does require a password on your machine, then yes, the scenario you described probably would happen.

So another good reason to can this stuff for now 😄

jrafanie · 2019-05-16T18:00:14Z

Well, @carbonin and @djberg96 , thoughts? (Edit: Specifically, should we go forward with these specs, or should I remove all of them and just leave it as a code change only... or or... do you have a different approach you would prefer for the specs?)

While I'm super impressed you got this to work, I'm concerned by the complexity of the spec helper. Filesystem test contamination is a thing we've hit before where parallel tests are accessing the filesystem at the same time. I'm not sure the value these tests add would exceed the complexity cost of maintaining it.

I'm super late to reviewing this but was camcorder an option? If we could record the interactions against a live system without enough disk space and play it back for regression testing, we'd only have to deal with the maintenance of a system to record the cassettes against. It could be an awful idea, I don't know.

NickLaMuro · 2019-05-16T22:04:14Z

So @jrafanie and I had a bit of a talk today, and decided against adding tests at this time.

We discussed possibly adding better testing using camcorder. But for right now, we just decided it makes sense to just test manually and fix the existing tests.

NickLaMuro · 2019-05-16T22:22:35Z

Also, I left the old code for reference here:

https://github.com/NickLaMuro/manageiq/tree/fix_database_backup_diskspace_check_old

Currently, `db_opts[:local_file]` will always be a FIFO with the new file_storage mechanisms, so `.validate_free_space` only checks the tmp dir, which usually isn't very large and isn't the target destination for the DB backup/dump.

This fix moves the check to a place where we can access the file_storage object directly and target the directory of the mounted filesystem that will receive the backup/dump.

To that end, the check now only applies to file storage classes that are "mountable", since checking against non-mountable storages doesn't make sense (s3/swift don't have this capability and are assumed to be "infinite" anyway, and I don't think you can query FTP for available storage either, so I think the same assumption is fair to apply there).

* * *

Quick note: In a couple of spots in the tests, the following is added:

    allow(file_storage).to receive(:class).and_return(MiqSmbSession)

Despite `file_storage` being defined above with:

    let(:file_storage) { double("MiqSmbSession", :disconnect => nil) }

This is now required after moving validate_free_space inside the with_file_storage method and wrapping it in a class check of:

    if file_storage.class <= MiqGenericMountSession

This "stubs the stub" to have it act like a MiqSmbSession, when in fact, it isn't. Unfortunately, this can't be used globally since it causes a bunch of other test failures.

miq-bot · 2019-05-17T02:56:10Z

Checked commits NickLaMuro/manageiq@88f99d7~...cb13d74 with ruby 2.3.3, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 0 offenses detected
Everything looks fine. 👍

NickLaMuro · 2019-05-17T15:12:19Z

@jrafanie @carbonin @djberg96 Are we good with this now?

More info regarding the current state of things: #18745 (comment)

djberg96 · 2019-05-17T15:34:46Z

👍


simaishi · 2019-06-10T13:36:13Z

Hammer backport details:

$ git log -1
commit c3c65b640900e0e5381c91a43ba414cb647d770a
Author: Nick Carboni <[email protected]>
Date:   Fri May 17 11:39:09 2019 -0400

    Merge pull request #18745 from NickLaMuro/fix_database_backup_diskspace_check
    
    [EvmDatabaseOps] Fix .validate_free_space target
    
    (cherry picked from commit 6642bf629f9e642e3cc8665d6ca52580f3e2c01d)
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1717025

These are some test cases for:

https://bugzilla.redhat.com/show_bug.cgi?id=1703278

which provide some failing tests for when `validate_free_space` checks the wrong directory, and help validate that the fix here:

ManageIQ/manageiq#18745

does what it is supposed to.

Thanks a bunch to Dan Berger for the assist on this and for pointing me in the direction of loop back devices. It turned out to be a much better solution than what I was considering with vagrant directly.

How it works
------------

This patch effectively does three things:

First, it adds a provision step on VM boot to add a loop back device (or a file that acts like a file system) to represent a "fake tmp" directory that is under 2MB in size after file system overhead. The process for building this can be found in the "mount_fake_tmp" provision step, but the file can be written to just like a normal mountable file system, and since it is so small it won't fit even our ~2MB database dumps. Of note, something that was required was the `chmod +t`, which adds the sticky bit on to the dir. This is essential for the next part which...

Secondly, adds a patch to `tmpdir` which forces `Dir.tmpdir` to return `/fake_tmp` instead of `/tmp`. For this part we override both the `@@systmpdir` var and the `ENV['TMPDIR']` value, since either one could be used in determining the `Dir.tmpdir` depending on the value of `$SAFE`. Since usually we aren't using `@@systmpdir`, the `chmod +t ...` becomes important here since it allows the `File::Stat#sticky?` check to pass, allowing the `/fake_tmp` dir to be a valid option, and not fall back to `/tmp` instead.

Finally, we inject the patch into some tests to make sure that those tests still function when the "tmp" directory has a file system that is smaller than the resulting dump/backup. These new tests fail without the patch from above, and pass with it.

(transferred from https://gist.github.com/NickLaMuro/8438015363d90a2d2456be650a2b9bc6/bf256dd6a2f385a78cf945e4efea5084f63812e1)
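
The `Dir.tmpdir` redirection described in this commit message can be approximated without a loop back device. The sketch below is not the gist's patch: it only sets `ENV['TMPDIR']` (it does not touch `@@systmpdir`), and `with_tmpdir` is a hypothetical helper name. It relies on the standard library behavior that `Dir.tmpdir` accepts a `TMPDIR` candidate that is a writable directory and is either not world-writable or has the sticky bit set.

```ruby
require "tmpdir"

# Run the block with Dir.tmpdir pointed at `dir`, then restore TMPDIR.
# Assigning nil to an ENV key removes it, so an unset TMPDIR stays unset.
def with_tmpdir(dir)
  original = ENV["TMPDIR"]
  ENV["TMPDIR"] = dir
  yield
ensure
  ENV["TMPDIR"] = original
end

# Dir.mktmpdir creates a private (0700) directory, so it passes the
# world-writable/sticky check without needing `chmod +t`.
Dir.mktmpdir("fake_tmp") do |fake_tmp|
  with_tmpdir(fake_tmp) do
    Dir.tmpdir # resolves to fake_tmp inside this block
  end
end
```

A world-writable directory such as the `/fake_tmp` mount would additionally need the sticky bit, which matches the `chmod +t` note in the message above.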

Since a patch was required to fix an issue with hammer:

ManageIQ/manageiq#18745

certain specs do not work with the rake tasks, and cause a decent number of errors. As a result, the simplest way forward was to switch to using a master appliance for the time being.

However, the downside to this is that "master" isn't an available option to download from vagrantup.com:

https://app.vagrantup.com/manageiq

So downloading and installing an appliance from:

http://releases.manageiq.org/

is required for this to work. A more pragmatic solution in the future might be to allow setting a custom box, but defaulting to a box type that is available from vagrantup.com.

Steps to download/install manageiq/master:
------------------------------------------

Since the `manageiq/master` appliance isn't available as a valid release from https://app.vagrantup.com/manageiq/ , you can use the following script to download a versioned copy of it:

NickLaMuro/miq_tools#13

And add it to vagrant by running the following:

    $ ./miq_vagrant_master/cli --version 20190629

miq-bot added the wip label May 8, 2019

carbonin reviewed May 8, 2019

djberg96 reviewed May 8, 2019

djberg96 suggested changes May 8, 2019

NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from eddbaca to 7f73880 Compare May 8, 2019 22:31

NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from fe63477 to 1155e07 Compare May 14, 2019 03:17

NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from e9ed8cf to 02e08af Compare May 16, 2019 16:06

NickLaMuro changed the title ~~[WIP][EvmDatabaseOps] Fix .validate_free_space target~~ [EvmDatabaseOps] Fix .validate_free_space target May 16, 2019

miq-bot removed the wip label May 16, 2019

carbonin reviewed May 16, 2019

[EvmDatabaseOpsSpec] Remove unneeded stubs

88f99d7

NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from 02e08af to 295c04e Compare May 16, 2019 21:32

NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from 295c04e to cb13d74 Compare May 17, 2019 02:55

carbonin approved these changes May 17, 2019

carbonin self-assigned this May 17, 2019

carbonin added bug changelog/yes core hammer/yes labels May 17, 2019

carbonin merged commit 6642bf6 into ManageIQ:master May 17, 2019

carbonin added this to the Sprint 112 Ending May 27, 2019 milestone May 17, 2019

simaishi added hammer/backported and removed hammer/yes labels Jun 10, 2019
