Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EvmDatabaseOps] Fix .validate_free_space target #18745

Merged

Conversation

NickLaMuro
Copy link
Member

Currently, db_opts[:local_file] will always be a FIFO with the new file_storage mechanisms, so .validate_free_space will only check against the tmp dir, which usually isn't very large and not the target destination for the DB backup/dump.

This fix moves the check to a place were we can get access to the file_storage object directly and target the directory of the mounted filesystem that will receive the backup/dump.

To that end, the check now only applies to file storage classes that are "mountable", since checking against non-mountable storages don't make sense (s3/swift don't have this capability, and they are assumed to be "infinite" anyway, and I don't think you can query FTP for available storage as well so I think the same assumption is also fair to apply)

Links

Steps for Testing/QA

I think this should work (in theory), but currently looking into a way to validate this or possibly add a spec to confirm this functionality is working as expected. Once this is done, I will remove the [WIP] label.

@miq-bot miq-bot added the wip label May 8, 2019
lib/evm_database_ops.rb Outdated Show resolved Hide resolved
Copy link
Member

@carbonin carbonin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me assuming you've tested it out. Not sure there's really a way to put this in a spec test...

lib/evm_database_ops.rb Outdated Show resolved Hide resolved
lib/evm_database_ops.rb Outdated Show resolved Hide resolved
Copy link
Contributor

@djberg96 djberg96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's no spec for the validate_free_space singleton method in https://github.com/ManageIQ/manageiq/blob/master/spec/lib/evm_database_ops_spec.rb, should probably add one.

@NickLaMuro
Copy link
Member Author

@djberg96 @carbonin Thanks guys! Working on getting my old test suite up and running again (so others can validate against it as well), but it turns out I left a lot of stuff un-committed since I last used it, and am working out some of the setup kinks currently. Will post again once that is more easily consumable by others.

This looks good to me assuming you've tested it out.

Well... if @djberg96 's comment is any indication, I haven't... yet...

More below.

Not sure there's really a way to put this in a spec test...

Looks like there's no spec for the validate_free_space singleton method...

In response to Nick, that is what I was afraid of and am unsure the best way to go about testing it. I currently also don't have a way of validating against a DB that goes over the limit locally since the once that I seed in my test DB only ends up to be about 10MB, and the /tmp dir on the appliance VM I am running is configured to 1Gig.

This is the main reason this is still marked as [WIP] since I would like to validate this in some way or another before shipping.

@djberg96
Copy link
Contributor

djberg96 commented May 8, 2019

For testing this library may help: https://github.com/fakefs/fakefs

@NickLaMuro
Copy link
Member Author

NickLaMuro commented May 8, 2019

@djberg96 I don't know that it will, for a few reasons.

First off, and it is mentioned in the Caveats section of the README for FakeFS is that it is not a "Filesystem replacement", and doesn't do anything under the hood to emulate a filesystem/shell level to emulate a file system, just stub some constants. Unfortunately, the code that validate_free_space eventually calls is just a shell out:

free_space = begin
output = MiqUtil.runcmd("df -P #{parent_directory}")
data_line = output.split("\n")[1] if output.kind_of?(String)
data_line.split[3].to_i * 1024 if data_line
end

So I don't see that being too much of a help without having to work around that lib some more.

The (other) big thing that we really need to test here is the interaction of validate_free_space with what we are working with, which is both a FIFO that we create to stream the output from pg_basebackup/pg_basedump and a target file and/or remote endpoint (and we only really want to validate against mounted file systems).

And that is really where my trouble testing this lies, because it ends up requiring a complex setup to do, and as @carbonin mentioned, not sure it is really something we can simulate this in a spec...


That said, I have just updated my test suite gist:

https://gist.github.com/NickLaMuro/8438015363d90a2d2456be650a2b9bc6

And that should be a good way to simulate things and make sure the check doesn't cause syntax errors and such. I am going to apply the patch and make sure things work as expected, and will report back when I do.

@NickLaMuro NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from eddbaca to 7f73880 Compare May 8, 2019 22:31
@NickLaMuro
Copy link
Member Author

Quick Note: I am running the test suite with the patch applied and everything seems to be working as expected still.

I would still like to find a failure case that simulates when the DB is too large for /tmp, but perfectly fine for share that it is being sent to. I think resizing the VM's temp dir to be super small is probably the best way to do that, but I will have to look into how I can do that via Vagrant/Virtualbox...

Will update when I figure that out.

@djberg96
Copy link
Contributor

djberg96 commented May 8, 2019

Not a fan of shelling out in general, and there's the sys-filesystem library if you're interested.

Poking around a bit there's a way to create a file and make its own filesystem, though I've never tried it. Would this help?

https://en.wikipedia.org/wiki/Loop_device

@NickLaMuro
Copy link
Member Author

... and there's the sys-filesystem library if you're interested.

And I wonder who the author of that might be? 😉

Not a fan of shelling out in general...

That makes two of us! 😄

However, this fix is scheduled for a backport, so I don't want to be making a change for that in this PR. Too much extra risk and kinda out of scope for this PR, but a definite possibly in the future.

@NickLaMuro
Copy link
Member Author

However... this loop device stuff has piqued my interest... looking 👀

@djberg96
Copy link
Contributor

djberg96 commented May 9, 2019

More info: https://www.thegeekdiary.com/how-to-create-virtual-block-device-loop-device-filesystem-in-linux/

Note: Already noticing some differences on Mac with the dd command.

@NickLaMuro
Copy link
Member Author

Heh, found that link already 😄

@djberg96
Copy link
Contributor

djberg96 commented May 9, 2019

@NickLaMuro
Copy link
Member Author

NickLaMuro commented May 14, 2019

Pushed a [WIP] commit so the two of you can take a look (if you want) of where I am heading with this. I still need to add Linux support and probably some more documentation around how to use this library, but wanted to show working-ish support for OSX.

Load and fire the 🍅 's!

@NickLaMuro NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from fe63477 to 1155e07 Compare May 14, 2019 03:17
@NickLaMuro
Copy link
Member Author

Bah....

creating /home/travis/build/ManageIQ/manageiq/tmp/fake_fs.img...
losetup: /dev/loop0: failed to set up loop device: Permission denied
umount: /dev/loop0: not mounted

@djberg96 @carbonin Okay, decision time...

So the above shows up in travis because we don't have (I assume sudo) permissions on Travis, which makes sense. Question is: Do we want to enable them just to add these specs I have added?

I have already tested this else where, and while this is #cool™, I am not married to this code, and if it is just going to be super difficult to get this to run on CI, it might not be worth it.

Thoughts?

@djberg96
Copy link
Contributor

@NickLaMuro Possibly silly question, but is this something we could wrap using https://ruby-doc.org/core-2.6.1/Process/UID.html within the spec?

@NickLaMuro
Copy link
Member Author

@djberg96 worth a shot. Let me try some things out.

@NickLaMuro
Copy link
Member Author

Of note, however, I think we are not using the sudo flag in our .travis.yml, which is what is required to probably use that:

https://stackoverflow.com/a/26304928

@NickLaMuro
Copy link
Member Author

Process::UID doesn't seem to work (see recent commit message). I think what I tried in the last commit is the best I can do, since I don't think that anybody is willing to change the sudo config in Travis for this one change.

@NickLaMuro
Copy link
Member Author

NickLaMuro commented May 15, 2019

Huh... that last commit actually worked...

Well, @carbonin and @djberg96 , thoughts? (Edit: Specifically, should we go forward with these specs, or should I remove all of them and just leave it as a code change only... or or... do you have a different approach you would prefer for the specs?)

@djberg96
Copy link
Contributor

@NickLaMuro I'm ok with it. I'd definitely like to keep the specs (as long as they won't blow up on my Mac).

@NickLaMuro NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from e9ed8cf to 02e08af Compare May 16, 2019 16:06
@NickLaMuro NickLaMuro changed the title [WIP][EvmDatabaseOps] Fix .validate_free_space target [EvmDatabaseOps] Fix .validate_free_space target May 16, 2019
@miq-bot miq-bot removed the wip label May 16, 2019
#
# This "stubs the stub" to have it act like a MiqSmbSession, when in
# fact, it isn't.
allow(file_storage).to receive(:class).and_return(MiqSmbSession)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason this can't live in the definition of the double? For example, will it break other specs? It seems like a reasonable stub in any case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I am fixing the specs for this (without my patch):

Yes, this is necessary to be here. It causes a bunch of other specs to fail if you do it globally.

# have a "fake tmp dir". It will do all of the setup and cleanup within the
# block.
#
# Note: This class is most likely **NOT** thread safe currently
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be a problem for parallel tests? I mean probably not right now because there is only one spec using it ...

def attach_tmp_fs
@dev_disk = `losetup -f`.chomp.strip
cmds = [
"sudo losetup #{@dev_disk} #{tmp_fs_file}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work running locally on a linux workstation? I'd imagine the suite would halt if you didn't run it as root, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to answer this:

The assumption (which is with sudo on Vagrant on Travis, and my two places I tested) is that using it with those users doesn't require entering a password. So to your question, I assume if sudo does require a password on your machine, then yes, the scenario you described probably would happen.

So another good reason to can this stuff for now 😄

@jrafanie
Copy link
Member

Well, @carbonin and @djberg96 , thoughts? (Edit: Specifically, should we go forward with these specs, or should I remove all of them and just leave it as a code change only... or or... do you have a different approach you would prefer for the specs?)

While I'm super impressed you got this to work, I'm concerned by the complexity of the spec helper. Filesystem test contamination is a thing we've hit before where parallel tests are accessing the filesystem at the same time. I'm not sure the value these tests add would exceed the complexity cost of maintaining it.

I'm super late to reviewing this but was camcorder an option? If we could record the interactions against a live system without enough disk space and play it back for regression testing, we'd only have to deal with the maintenance of a system to record the cassettes against. It could be an awful idea, I don't know.

@NickLaMuro NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from 02e08af to 295c04e Compare May 16, 2019 21:32
@NickLaMuro
Copy link
Member Author

So @jrafanie and I had a bit of a talk today, and decided against adding tests at this time.

We discussed possibly adding better testing using camcoder. But for right now, we just decided it makes sense to just test manually and fix the existing tests.

@NickLaMuro
Copy link
Member Author

Also, I left the old code for reference here:

https://github.com/NickLaMuro/manageiq/tree/fix_database_backup_diskspace_check_old

Currently, `db_opts[:local_file]` will always be a FIFO with the new
file_storage mechanisms, so `.validate_free_space` will only check
against the tmp dir, which usually isn't very large and not the target
destination for the DB backup/dump.

This fix moves the check to a place were we can get access to the
file_storage object directly and target the directory of the mounted
filesystem that will receive the backup/dump.

To that end, the check now only applies to file storage classes that are
"mountable", since checking against non-mountable storages don't make
sense (s3/swift don't have this capability, and they are assumed to be
"infinite" anyway, and I don't think you can query FTP for available
storage as well so I think the same assumption is also fair to apply)

* * *

Quick note:

In a couple of spots in the tests, the following is added:

    allow(file_storage).to receive(:class).and_return(MiqSmbSession)

Despite `file_storage` being defined above with:

    let(:file_storage) { double("MiqSmbSession", :disconnect => nil) }

This is now required after moving validate_free_space to inside the
with_file_storage method, and wrapping it in a class check of:

    if file_storage.class <= MiqGenericMountSession

This "stubs the stub" to have it act like a MiqSmbSession, when in fact,
it isn't.  Unfortunately, this can't be used globally since it causes a
bunch of other tests failures.
@NickLaMuro NickLaMuro force-pushed the fix_database_backup_diskspace_check branch from 295c04e to cb13d74 Compare May 17, 2019 02:55
@miq-bot
Copy link
Member

miq-bot commented May 17, 2019

Checked commits NickLaMuro/manageiq@88f99d7~...cb13d74 with ruby 2.3.3, rubocop 0.69.0, haml-lint 0.20.0, and yamllint 1.10.0
2 files checked, 0 offenses detected
Everything looks fine. 👍

@NickLaMuro
Copy link
Member Author

@jrafanie @carbonin @djberg96 Are we good with this now?

More info regarding the current state of things: #18745 (comment)

@djberg96
Copy link
Contributor

👍

@carbonin carbonin self-assigned this May 17, 2019
@carbonin carbonin merged commit 6642bf6 into ManageIQ:master May 17, 2019
@carbonin carbonin added this to the Sprint 112 Ending May 27, 2019 milestone May 17, 2019
simaishi pushed a commit that referenced this pull request Jun 10, 2019
…ce_check

[EvmDatabaseOps] Fix .validate_free_space target

(cherry picked from commit 6642bf6)

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1717025
@simaishi
Copy link
Contributor

Hammer backport details:

$ git log -1
commit c3c65b640900e0e5381c91a43ba414cb647d770a
Author: Nick Carboni <[email protected]>
Date:   Fri May 17 11:39:09 2019 -0400

    Merge pull request #18745 from NickLaMuro/fix_database_backup_diskspace_check
    
    [EvmDatabaseOps] Fix .validate_free_space target
    
    (cherry picked from commit 6642bf629f9e642e3cc8665d6ca52580f3e2c01d)
    
    Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1717025

NickLaMuro added a commit to NickLaMuro/manageiq-db_backup-tests that referenced this pull request Mar 30, 2021
This is some test cases for:

https://bugzilla.redhat.com/show_bug.cgi?id=1703278

which provides some failing tests for when `validate_free_space` checks
the wrong directory, and helps validate that the fix here:

ManageIQ/manageiq#18745

Does what it is supposed to.

Thanks a bunch to Dan Berger for the assist on this and pointing me in
the direction to use loop back devices.  Turned out to be a much better
solution then what I was considering with vagrant directly.

How it works
------------

This patch effectively does three things:

First, it adds a provision step on VM boot to add a loop back device (or
a file that acts like a file system) to represent a "fake tmp" directory
that is under 2MB in size after file system overhead.  The process for
building this can be found in the "mount_fake_tmp" provision step, but
the file is writable to just like a normal mountable file system, and
since it is so small it won't fit even our ~2MB database dumps.

Of note, something that was require was the `chmod +t`, which adds the
sticky bit on to the dir.  This is essential for the next part which...

Secondly, adds a patch to `tmpdir` which forces `Dir.tmpdir` to return
`/fake_tmp` instead of `/tmp`.  This part we override both the
`@@systmpdir` var and the `ENV['TMPDIR']` values since either one could
be used in determining the `Dir.tmpdir` depending on the value of
`$SAFE`.

Since usually we aren't using `@@systmpdir`, the `chmod +t ...` becomes
important here since it allows the `File::Stat#sticky?` check to pass,
allowing `/fake_tmp` dir to be a vaild option, and not fall back to
`/tmp` instead.

Finally, we inject the patch into some tests to make sure that those
tests still function when the "tmp" directory has a file that is smaller
then the resulting dump/backup.  These new tests fail without the patch
from above, and pass with it.


(transferred from https://gist.github.com/NickLaMuro/8438015363d90a2d2456be650a2b9bc6/bf256dd6a2f385a78cf945e4efea5084f63812e1)
NickLaMuro added a commit to NickLaMuro/manageiq-db_backup-tests that referenced this pull request Mar 30, 2021
Since a patch was required to fix an issue with hammer:

ManageIQ/manageiq#18745

Certain specs do not work with the rake tasks, and causes a decent
amount of errors.

As a result, the simplest way forward was to just simply switch to using
a master appliance for the time being.  However, the downside to this is
that "master" isn't an available option to download from vagrantup.com:

https://app.vagrantup.com/manageiq

So downloading and installing an appliance from:

http://releases.manageiq.org/

is required for this to work.  A more pragmatic solution in the future
might be to allow setting a custom box, but defaulting to a box type
that available from vagrantup.com.

Steps to download/install manageiq/master:
------------------------------------------

Since the `manageiq/master` appliance isn't available as valid release
from https://app.vagrantup.com/manageiq/ , you can use the following
script to download a versioned copy of it:

NickLaMuro/miq_tools#13

And add it to vagrant by running the following:

    $ ./miq_vagrant_master/cli --version 20190629
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants