Check for sufficient disk space before upgrade #1381

Merged
stevendanna merged 1 commit into master from ssd/pg-upgrade-disk-check
Aug 21, 2017

Conversation

stevendanna
Contributor

@stevendanna stevendanna commented Aug 18, 2017

The pg_upgrade tool copies all of the database files as part of the
upgrade. While this could be avoided with the --link flag, the
--link flag would also prevent falling back to the old pg cluster in
case of a problem.

To avoid upgrade failures, we check that the disk for the new data
directory has enough free space for another copy of the database plus
some headroom to ensure we have some room to operate after the
upgrade.

Available disk space is currently queried via an FFI call into libc's
statvfs function. The downside of the FFI approach is making sure our
representation of the statvfs struct is correct on all of our
supported platforms. A more straightforward approach might be to parse
df output; however, the first and the last columns of the df
output are paths which could theoretically contain spaces making it a
bit of a pain to parse. However, if all of our platforms support the
--output flag, this problem can be easily avoided.

Signed-off-by: Steven Danna [email protected]

@stevendanna stevendanna requested a review from a team August 18, 2017 11:04
@stevendanna stevendanna force-pushed the ssd/pg-upgrade-disk-check branch from f0d86d4 to 95f9c43 Compare August 18, 2017 11:05
@stevendanna
Contributor Author

A number of our supported platforms don't have the --output flag in the version of df they ship:

sdanna@thrace ~/oc/environments/opscode-ci > knife ssh 'tags:chef-server AND tags:tester' -a ipaddress -x steven 'df --output=avail /tmp | tail -1'
148.100.110.197 df: unrecognized option '--output=avail'
148.100.110.197 Try `df --help' for more information.
172.31.10.57    22051380
148.100.110.187 19400044
148.100.110.192 19308500
172.31.10.53    22838020
10.194.10.101   24936076
10.194.12.119   df: unrecognized option '--output=avail'
10.194.12.119   Try `df --help' for more information.
172.31.10.55    23012484
148.100.110.3   df: unrecognized option '--output=avail'
148.100.110.3   Try `df --help' for more information.
10.194.12.212   df: unrecognized option `--output=avail'
10.194.12.212   Try `df --help' for more information.
10.194.14.186   24734896
10.194.12.178   25403788
10.194.11.218   df: unrecognized option '--output=avail'
10.194.11.218   Try `df --help' for more information.
10.194.12.75    27019252
10.194.10.212   df: unrecognized option '--output=avail'
10.194.10.212   Try `df --help' for more information.

If we wanted to parse df output instead of using the FFI, it is still possible; there are just a few edge cases related to spaces that we'd have to decide how to handle.
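
As a rough illustration of the df-parsing alternative, here is a minimal sketch that reads the POSIX-format output of `df -Pk <path>`. It is not the approach the PR takes (the PR uses the FFI call). The function name `available_kb_from_df` and the sample output are made up for this example. The heuristic locates the run of numeric columns instead of splitting on whitespace, so spaces in the first and last columns do not shift the fields. It can still be fooled if a filesystem name itself contains a similar run of numbers, which is one of the edge cases mentioned above.

```ruby
# Heuristic parser for `df -Pk <path>` output (a sketch, not the PR's code).
# Assumed column layout (POSIX format):
#   Filesystem 1024-blocks Used Available Capacity Mounted on
# The first and last columns are paths that may contain spaces, so we look
# for the "<blocks> <used> <available> <capacity>%" run in the middle.
DF_NUMERIC_COLUMNS = /\s(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%\s/

def available_kb_from_df(output)
  # Skip the header line and take the last non-empty data line.
  data_line = output.lines.drop(1).map(&:chomp).reject(&:empty?).last
  raise ArgumentError, "no data line in df output" if data_line.nil?

  match = DF_NUMERIC_COLUMNS.match(data_line)
  raise ArgumentError, "unrecognized df output: #{data_line}" if match.nil?

  match[3].to_i # third numeric column is "Available"
end

# Made-up sample with spaces in both the device name and the mount point.
sample = <<~DF
  Filesystem 1024-blocks Used Available Capacity Mounted on
  /dev/mapper/my volume 41152832 17101452 22051380 44% /mnt/pg data
DF
puts available_kb_from_df(sample)
```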

@stevendanna
Contributor Author

All of our platforms support the -s, -k, and (if we want to use it) the --apparent-size argument to du:

sdanna@thrace ~/oc/environments/opscode-ci > knife ssh 'tags:chef-server AND tags:tester' -a ipaddress -x steven 'du -sk /home/steven'
148.100.110.187 728      /home/steven
148.100.110.192 84       /home/steven
148.100.110.197 84       /home/steven
172.31.10.57    32       /home/steven
148.100.110.3   32       /home/steven
10.194.10.101   36       /home/steven
10.194.12.119   36       /home/steven
10.194.12.212   56       /home/steven
10.194.12.178   28       /home/steven
172.31.10.55    20       /home/steven
172.31.10.53    28       /home/steven
10.194.14.186   56       /home/steven
10.194.11.218   56       /home/steven
10.194.10.212   24       /home/steven
10.194.12.75    28       /home/steven
sdanna@thrace ~/oc/environments/opscode-ci > knife ssh 'tags:chef-server AND tags:tester' -a ipaddress -x steven 'du --apparent-size -sk /home/steven'
148.100.110.3   14       /home/steven
148.100.110.187 697      /home/steven
148.100.110.197 54       /home/steven
148.100.110.192 51       /home/steven
172.31.10.57    18       /home/steven
10.194.10.101   20       /home/steven
10.194.12.119   18       /home/steven
10.194.12.178   18       /home/steven
10.194.12.212   10       /home/steven
172.31.10.53    7        /home/steven
172.31.10.55    2        /home/steven
10.194.14.186   42       /home/steven
10.194.11.218   33       /home/steven
10.194.10.212   10       /home/steven
10.194.12.75    6        /home/steven
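
For reference, the `du -sk` output shown above is a size in kilobytes followed by the path, so only the first field needs to be parsed. The sketch below is one possible way to wrap it. It uses Ruby's standard `Open3` instead of the `Mixlib::ShellOut` call used in the PR, and the helper names `parse_du_kb` and `du_kb` are hypothetical. Passing the arguments as an array avoids shell interpretation of paths that contain spaces.

```ruby
require "open3"

# Parse the summary line printed by `du -sk <path>`: "<kilobytes>\t<path>".
# Only the first field is used, so spaces in the path do not matter.
def parse_du_kb(output)
  size, _path = output.strip.split(/\s+/, 2)
  unless size.to_s.match?(/\A\d+\z/)
    raise ArgumentError, "unexpected du output: #{output.inspect}"
  end

  size.to_i
end

# Run du with an argument array so the path is never passed through a shell.
# (Hypothetical helper; the PR itself shells out via Mixlib::ShellOut.)
def du_kb(path)
  stdout, stderr, status = Open3.capture3("du", "-sk", path)
  raise "du failed: #{stderr.strip}" unless status.success?

  parse_du_kb(stdout)
rescue Errno::ENOENT
  raise "The du utility is not available. Unable to check disk usage"
end
```

Keeping the parsing separate from the command invocation makes the parser easy to check against captured output such as the listings above.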

@stevendanna stevendanna force-pushed the ssd/pg-upgrade-disk-check branch from 95f9c43 to 94aa3eb Compare August 18, 2017 12:38
Contributor

@srenatus srenatus left a comment

I've confused myself a bit with those numbers 😄

Looks good to me

# TODO(ssd) 2017-08-18: Do we need to worry about sparse files
# here? If so, can we expect the --apparent-size flag to exist on
# all of our platforms.
command = Mixlib::ShellOut.new("du -sk #{path}")
Contributor

How about du -P?

this sounds compelling:

-P
Use a standard, portable, output format

Contributor

Nevermind, you've already checked that. (Just now read the message)

Contributor Author

-P is a flag to df rather than du. Unfortunately, as far as I can see, the -P flag doesn't solve the "mount point and file system with spaces" problem.

Contributor

yeah sorry for the noise, ignore me

raise "du failed"
end
rescue Errno::ENOENT
raise "The du utility is not available. Unable to check disk usage"
Contributor

Does this block upgrading? Can I override the disk usage check failure if I believe I know what I'm doing?

Contributor Author

Currently it does. We could use an environment variable to skip the check perhaps?
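
A minimal sketch of what such an override could look like follows. Nothing in this thread says it was implemented: the variable name `SKIP_PG_UPGRADE_DISK_CHECK` and both helper functions are invented for illustration only.

```ruby
# Hypothetical environment variable name, invented for this sketch.
SKIP_DISK_CHECK_VAR = "SKIP_PG_UPGRADE_DISK_CHECK"

# True when the operator has explicitly opted out of the disk space check.
# `env` defaults to ENV but accepts any Hash-like object, which eases testing.
def skip_disk_check?(env = ENV)
  %w[1 true yes].include?(env.fetch(SKIP_DISK_CHECK_VAR, "").strip.downcase)
end

# Returns :ok when there is enough space, :skipped when the check failed but
# was overridden, and raises otherwise.
def ensure_disk_space!(sufficient, env = ENV)
  return :ok if sufficient
  return :skipped if skip_disk_check?(env)

  raise "Insufficient free space on disk to complete upgrade."
end
```

Accepting only a small set of explicit values means an empty or mistyped variable does not silently disable the check.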

# @param path [String] Path to a directory on disk
# @return [Integer] KB used by directory on disk
#
def self.du(path)
Contributor

Would it be hard to statfs for this, too? I've got the impression that this would let us stay clear of some potential compatibility issues...

Contributor

Nevermind 😄

return dir if ::File.exists?(dir)
return dir if ::File.expand_path(dir) == "/"

dir_or_existing_parent(::File.expand_path("#{dir}/.."))
Contributor

Hehe that's clever

else
Chef::Log.error("Insufficient free space on disk to complete upgrade.")
Chef::Log.error("The current postgresql data directory contains #{old_data_dir_size} KB of data but only #{free_disk_space} KB is available on disk.")
Chef::Log.error("The upgrade process requires at least #{free_disk_space/0.90} KB.")
Contributor

I'm confused about the division here.... so, we want free disk space to be... uhm...

old_data_dir_size < (free_disk_space * 0.9) # * 1/0.9
old_data_dir_size * 1/0.9 < free_disk_space

so we want free_disk_space to be 1.1111 * old_data_dir_size, don't we?

old_data_dir_size * 1/0.9 < free_disk_space # * 1/0.9
old_data_dir_size * 1/0.9 * 1/0.9 < free_disk_space / 0.90

so, free_disk_space/0.90 = 1.2345679 * old_data_dir_size?

Contributor Author

I think I've just goofed and this should have been old_data_dir_size/0.9. That is, I used the wrong variable.
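
To make the arithmetic in this exchange concrete, here is a small sketch. It assumes the check is `old_data_dir_size < free_disk_space * 0.9`, as inferred in the review comment above; the constant and function names are illustrative and not taken from the PR.

```ruby
# Fraction of free space the old data directory may occupy (assumed from the
# review discussion: the copy must fit in 90% of the free space).
HEADROOM_FACTOR = 0.9

# The check itself: a second copy of the data must fit with headroom left.
def sufficient_space?(old_data_dir_size_kb, free_disk_space_kb)
  old_data_dir_size_kb < free_disk_space_kb * HEADROOM_FACTOR
end

# The figure the error message should report: divide the *data directory
# size* (not the free space) by the headroom factor, rounding up.
def required_free_kb(old_data_dir_size_kb)
  (old_data_dir_size_kb / HEADROOM_FACTOR).ceil
end

old_size = 1_000_000
puts "required: #{required_free_kb(old_size)} KB"
```

Dividing `free_disk_space` by 0.9, as in the snippet under review, would report a number that depends on how much space happens to be free and not on how much data needs to be copied.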


@ksubrama ksubrama left a comment

I double checked the FFI interface. Looks great to me.

end

def dir_or_existing_parent(dir)
return dir if ::File.exists?(dir)

nit: exist?
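
For context on this nit: `File.exists?` is a deprecated alias that was removed in Ruby 3.2, so `File.exist?` is the safer spelling. Below is a self-contained sketch of the helper with that change applied, assembled from the fragments quoted in this review; the merged code may differ. Presumably the helper exists because the new data directory may not have been created yet when free space is queried, so the check walks up to the nearest ancestor that does exist.

```ruby
require "tmpdir"

# Return `dir` if it exists, otherwise the nearest existing ancestor.
# Stops at the filesystem root so the recursion always terminates.
def dir_or_existing_parent(dir)
  return dir if ::File.exist?(dir)
  return dir if ::File.expand_path(dir) == "/"

  dir_or_existing_parent(::File.expand_path("#{dir}/.."))
end

# Usage: a path that does not exist yet resolves to its existing ancestor.
Dir.mktmpdir do |tmp|
  puts dir_or_existing_parent(File.join(tmp, "not", "yet", "created"))
end
```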

class Statfs
#
# Statfs provides a simple interface to the statvfs system call.
# Since the statvfs struct varies a bit across platforms, this likly

sp: likely

The `pg_upgrade` tool copies all of the database files as part of the
upgrade.  While this could be avoided with the `--link` flag, the
`--link` flag would also prevent falling back to the old pg cluster in
case of a problem.

To avoid upgrade failures, we check that the disk for the new data
directory has enough free space for another copy of the database plus
some headroom to ensure we have some room to operate after the
upgrade.

Available disk space is currently queried via an FFI call into libc's
statvfs function. The downside of the FFI approach is making sure our
representation of the statvfs struct is correct on all of our
supported platforms. A more straightforward approach might be to parse
`df` output; however, we've gone with the FFI call for now to avoid
some edge cases in parsing `df` output on versions of `df` without an
`--output` flag.

Signed-off-by: Steven Danna <[email protected]>
@stevendanna stevendanna force-pushed the ssd/pg-upgrade-disk-check branch from c92c1ac to 9f03aff Compare August 18, 2017 18:00
@stevendanna stevendanna changed the title WIP: Check for sufficient disk space before upgrade Check for sufficient disk space before upgrade Aug 18, 2017
Contributor

@ryancragun ryancragun left a comment

🌮

@stevendanna stevendanna merged commit 2c99ee2 into master Aug 21, 2017
@stevendanna stevendanna deleted the ssd/pg-upgrade-disk-check branch November 27, 2017 10:16