Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort: -S/--buffer-size does not support memory percentages % #3500

Open
zacchiro opened this issue May 6, 2022 · 10 comments · May be fixed by #7181
Open

sort: -S/--buffer-size does not support memory percentages % #3500

zacchiro opened this issue May 6, 2022 · 10 comments · May be fixed by #7181
Assignees
Labels

Comments

@zacchiro
Copy link

zacchiro commented May 6, 2022

GNU sort v. Rust sort:

$ /usr/bin/sort --buffer-size=50% < /dev/null
$ target/release/coreutils sort --buffer-size=50% < /dev/null
sort: invalid --buffer-size argument '50%'

Relevant excerpts from GNU sort manpage:

       -S, --buffer-size=SIZE
              use SIZE for main memory buffer

[...]

       SIZE may be followed by the following multiplicative suffixes: % 1% of memory, b 1, K 1024 (default), and so on for M, G, T, P, E, Z, Y.

(this could be a good first issue)

@sylvestre sylvestre added the good first issue For newcomers! label May 6, 2022
@miDeb
Copy link
Contributor

miDeb commented May 6, 2022

There is an even bigger problem here, which is that sort precomputes data for each line that will be sorted (this makes comparing them faster). Due to that the exact amount of memory usage depends on the number of lines in a chunk (shorter lines -> more lines in a chunk -> more memory usage).
Right now, when you specify a buffer size, the actual memory usage will be less or more than that. I think we should first solve that problem.

@zacchiro
Copy link
Author

zacchiro commented May 7, 2022

@miDeb I wasn't aware of that issue, but it looks to me the two can be fixed independently from one another.

jack-t added a commit to jack-t/coreutils that referenced this issue May 7, 2022
Addresses issue uutils#3500, which calls for sort to permit percentage values
for the buffer-size flag. Portability is an issue here, but I don't
think this change constrains `sort`'s portability any more than its
dependencies already do.

To calculate a fraction of the system's memory, naturally we need to
retrieve the system's total memory. GNU's equivalent -- see
`physmem_total` in `lib/physmem.c` in gnulib -- has a series of `ifdef`s
to decide how to retrive the value. Each platform has its own
considerations, and the result is a pretty serious mess.

This patch omits that mess. In exchange, it will only work on platforms
with a functioning libc. The thing is, `sort` already includes libc,
which is why I said this patch doesn't introduce new constraints on
`sort`'s portability.
@sallto
Copy link

sallto commented Oct 11, 2022

I will take a look at this one (only the percentage parsing for now)

sallto added a commit to sallto/coreutils that referenced this issue Oct 17, 2022
Fixes issue uutils#3500 by adding support for percentags of total memory for the buffer-size flag. The sysinfo crate is used to determine the total memory of a system.

Usage of sysinfo instead of custom code makes this part very multi-platform compatible (See sysinfo documentation for supported OSes). As well as avoiding introducing unsafe code from libc's sysinfo.
sallto added a commit to sallto/coreutils that referenced this issue Oct 18, 2022
Fixes issue uutils#3500 by adding support for percentags of total memory for the buffer-size flag. The sysinfo crate is used to determine the total memory of a system.

Usage of sysinfo instead of custom code makes this part very multi-platform compatible (See sysinfo documentation for supported OSes). As well as avoiding introducing unsafe code from libc's sysinfo.
sallto added a commit to sallto/coreutils that referenced this issue Oct 21, 2022
Fixes issue uutils#3500 by adding support for percentags of total memory for the buffer-size flag. The sysinfo crate is used to determine the total memory of a system.

Usage of sysinfo instead of custom code makes this part very multi-platform compatible (See sysinfo documentation for supported OSes). As well as avoiding introducing unsafe code from libc's sysinfo.
sallto added a commit to sallto/coreutils that referenced this issue Oct 23, 2022
Fixes issue uutils#3500 by adding support for percentags of total memory for the buffer-size flag. The sysinfo crate is used to determine the total memory of a system.

Usage of sysinfo instead of custom code makes this part very multi-platform compatible (See sysinfo documentation for supported OSes). As well as avoiding introducing unsafe code from libc's sysinfo.
@zeroishero
Copy link

Is this open for contribution?

@tertsdiepraam
Copy link
Member

Sure, go ahead!

@zeroishero
Copy link

Is it ok to use sysinfo crate to get memory size?

@tertsdiepraam
Copy link
Member

@zeroishero I think we already use that crate in some utils, so should be fine!

@zeroishero
Copy link

Ok, then. I can probably do this. I will start it soon.

@zeroishero
Copy link

@tertsdiepraam I don't see sysinfo being used. I meant this crate:
https://crates.io/crates/sysinfo

@tertsdiepraam
Copy link
Member

tertsdiepraam commented Nov 9, 2023

Hmmm, I was sure we were using that somewhere, but maybe that was some other repo. If it's hard to get the info some other way, it might be fine to use that. You can make a PR with that first and then we'll figure out what to do. The crate is by Guillaume Gomez, so I definitely trust it 😄

@cakebaker cakebaker linked a pull request Nov 10, 2023 that will close this issue
@cakebaker cakebaker linked a pull request Dec 4, 2023 that will close this issue
@jfinkels jfinkels self-assigned this Jan 20, 2025
jfinkels added a commit to jfinkels/coreutils that referenced this issue Jan 20, 2025
Add support for parsing percent arguments to the `-S` option. The given
percentage specifies a percentage of the total physical memory. For
Linux, the total physical memory is read from `/proc/meminfo`. The
feature is not yet implemented for other systems.

In order to implement the feature, the `uucore::parser::parse_size`
function was updated to recognize strings of the form `NNN%`.

Fixes uutils#3500
@jfinkels jfinkels linked a pull request Jan 20, 2025 that will close this issue
jfinkels added a commit to jfinkels/coreutils that referenced this issue Jan 20, 2025
Add support for parsing percent arguments to the `-S` option. The given
percentage specifies a percentage of the total physical memory. For
Linux, the total physical memory is read from `/proc/meminfo`. The
feature is not yet implemented for other systems.

In order to implement the feature, the `uucore::parser::parse_size`
function was updated to recognize strings of the form `NNN%`.

Fixes uutils#3500
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment