Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

paths[with spaces and square brackets] cannot be listed in %files #1749

Closed
hroncok opened this issue Aug 6, 2021 · 12 comments · Fixed by #2103, #2151 or #2206
Closed

paths[with spaces and square brackets] cannot be listed in %files #1749

hroncok opened this issue Aug 6, 2021 · 12 comments · Fixed by #2103, #2151 or #2206
Assignees
Labels
Milestone

Comments

@hroncok
Copy link
Contributor

hroncok commented Aug 6, 2021

Consider this spec file:

Name:           reproducer
Version:        0
Release:        0
Summary:        ...
License:        MIT
BuildArch:      noarch

%description
...

%prep

%install
touch '%{buildroot}/file[with spaces]'

%files
"/file[with spaces]"

RPM 4.16 and 4.17 will fail to build it with:

Processing files: reproducer-0-0.noarch
error: File not found: .../rpmbuild/BUILDROOT/reproducer-0-0.x86_64/file[with
error: Path is outside buildroot: spaces]

It seems that even thou the path is in " quotes, the spaces split it into multiple paths.

@pmatilai pmatilai added the bug label Aug 17, 2021
@pmatilai
Copy link
Member

Looks like a bug someplace.

Wildcards usually get you around obstacles where other escaping methods fail:
/file?with?spaces? works, except of course if you have multiple similarly named files, some using [ ] and others { } etc...

@dmnks dmnks self-assigned this Aug 17, 2021
@dmnks
Copy link
Contributor

dmnks commented Aug 17, 2021

Seems to be caused by __glob_pattern_p() where a filename containing square brackets is interpreted as a glob (which then messes up our glob parsing logic later in the process, resulting in the filename being split into two at the space char). Will take a closer look.

@hroncok
Copy link
Contributor Author

hroncok commented Aug 17, 2021

The semantics of escaping stuff in %files is not very clear. See also this thread: http://lists.rpm.org/pipermail/rpm-list/2021-June/002048.html

@dmnks
Copy link
Contributor

dmnks commented Aug 17, 2021

Yeah, documentation also seems to be lacking here. Thanks for the report!

@hroncok
Copy link
Contributor Author

hroncok commented Dec 7, 2021

Seems to be caused by __glob_pattern_p() where a filename containing square brackets is interpreted as a glob (which then messes up our glob parsing logic later in the process, resulting in the filename being split into two at the space char). Will take a closer look.

Including { and } behaves similarly.
It would be really useful if we could escape the glob characters somehow.

@gotmax23
Copy link
Contributor

Seems to be caused by __glob_pattern_p() where a filename containing square brackets is interpreted as a glob (which then messes up our glob parsing logic later in the process, resulting in the filename being split into two at the space char). Will take a closer look.

Including { and } behaves similarly. It would be really useful if we could escape the glob characters somehow.

Here1 is the full RHBZ ticket for the issue that Miro's referring to.

@dmnks
Copy link
Contributor

dmnks commented Jan 24, 2022

As far as semantics, what we should aim for, I think, is that it's similar to a typical shell (like Bash): the entries in a %files section are pathnames to be matched against the buildroot tree and are subject to expansion using the conventional glob characters. Similarly to a shell, it should be possible to escape these characters.

As far as the current implementation, the glob chars are supported, but the backslash escape char only works partially; it avoids the expansion of the pathname in processBinaryFile(), but it is then passed to the low-level system functions like lstat(2) (via addFile()) unchanged, i.e. with the escape chars still in place, which obviously fails to match the desired filename. In addition, the backslash doesn't work with spaces; there's a check that runs before pathname expansion that checks whether the given pathname is absolute; this check doesn't understand escaping, though, so it errors out, as seen in the original comment above.

As an alternative to the backslash, pathnames can be enclosed in quotes. These do work for space chars, however, for a change, they do not work for glob chars.

As an example, for the following pathnames in %files, rpmbuild behaves as expected, similarly to a shell:

  • /foo[bar] is expanded and if it doesn't yield any matches, it's matched verbatim
  • /foo bar is interpreted as two paths
  • "/foo bar" is matched verbatim

However, the following works unlike a shell and thus is arguably incorrect (or at least unconventional):

  • "/foo[bar]" is expanded
  • /foo\ bar is interpreted as two paths
  • /foo\[bar\] is matched verbatim (i.e. with literal backslashes), whereas the expected semantics here would be "this is not a glob, treat it as a literal /foo[bar]")

Therefore, a combination of glob and space chars can't be properly escaped; neither backslashes nor quotes will do the trick here:

  • "/foo[bar baz]" is expanded
  • /foo\[bar\ baz\] is interpreted as two paths
  • "/foo\[bar baz\]" is matched verbatim (i.e. with literal backslashes)

In other words, the root cause is threefold (same order as above):

  • Quotes don't escape globs (just spaces)
  • Backslashes don't escape spaces (just globs)
  • Backslashes are not stripped after determining whether the pathname is a glob

Either the first or the last one, if addressed, would resolve this issue. The second one is orthogonal to this particular problem, but would also be nice to have eventually.

Next, I'm going to look closer at the code and the logic around this to see if implementing some of the suggested fixes would be safe and not cause some breakage elsewhere.

@hroncok
Copy link
Contributor Author

hroncok commented Jan 24, 2022

That is a very good analysis of the problem, @dmnks. Thank you!

@ffesti
Copy link
Contributor

ffesti commented Jan 27, 2022

Is this issue related to #1294 ?

@dmnks
Copy link
Contributor

dmnks commented Jan 27, 2022

Possibly, even though there it seems that quoting doesn't work with %doc entries (whereas it does for plain entries). I'll have a look at it later.

dmnks added a commit to dmnks/rpm that referenced this issue Apr 19, 2022
We already interpret the backslash as the escape char in rpmIsGlob() so
just unescape the file name that fails the check before passing it to
lstat(2) via addFile().

Literal backslashes (i.e. escaped themselves) are supported.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 20, 2022
This is already partially implemented using rpmIsGlob() which interprets
the backslash as an escape character, we just need to make sure any such
characters are discarded as soon as they've been consumed, i.e. before
the file name is passed to lstat(2) via addFile().

Support for literal (i.e. escaped) backslash is included.

Note that, while rpmGlob() understands escapes too, they're only
applicable to spaces and quotes during string tokenization, not to
globs, and changing its behavior would break the API, so we need to
handle glob escapes separately.  For this reason, we also need to have
an rpmIsGlob() check when processing special directories like %doc, so
add that.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 20, 2022
This was already partially implemented using rpmIsGlob() which
interprets the backslash as an escape character, we just need to make
sure any such characters are discarded as soon as they've been consumed,
i.e. before the file name is passed to lstat(2) via addFile().

Support for literal (i.e. escaped) backslash is included.

Note that, while rpmGlob() understands escapes too, they're only
applicable to spaces and quotes during string tokenization, not to
globs, and changing its behavior would break the API, so we need to
handle glob escapes separately.  For this reason, we also need to have
an rpmIsGlob() check when processing special directories like %doc, so
add that.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 20, 2022
This was already partially implemented using rpmIsGlob() which
interprets the backslash as an escape character, we just need to make
sure any such characters are discarded as soon as they've been consumed,
i.e. before the file name is passed to lstat(2) via addFile().

Support for literal (i.e. escaped) backslash is included.

Note that, while rpmGlob() understands escapes too, they're only
applicable to spaces and quotes during string tokenization, not to
globs, and changing its behavior would break the API, so we need to
handle glob escapes separately.  For this reason, we also need to use an
rpmIsGlob() check when processing special directories like %doc, so add
that.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 21, 2022
This was already partially implemented using rpmIsGlob() which
interprets the backslash as an escape character, we just need to make
sure any such characters are discarded as soon as they've been consumed,
i.e. before the file name is passed to lstat(2) via addFile().

Support for literal (i.e. escaped) backslash is included.

Note that, while rpmGlob() understands escapes too, they're only
applicable to spaces and quotes during string tokenization, not to
globs, and changing its behavior would break the API, so we need to
handle glob escapes separately.  For this reason, we also need to use
rpmIsGlob() when processing special directories like %doc, so do that.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 22, 2022
Turn off the special meaning of glob characters within a file name if
they're preceded by a backslash.  This was already partially implemented
using rpmIsGlob() which interprets the backslash as an escape character,
we just need to make sure any backslashes are discarded as soon as
they've been consumed, i.e. before the file name is passed to lstat(2)
via addFile(), so that it actually works.

Note that, while rpmGlob() understands escapes too, they're only
applicable to spaces and quotes during string tokenization, not to
globs, and changing its behavior would break the API, so we need to
handle glob escapes separately.  For this reason, we also need to use
rpmIsGlob() when processing special directories like %doc, so do that.

For curly braces specifically, we need to push the brace detection logic
from rpmIsGlob() down to __glob_pattern_p() so that backslashes are also
applicable to those.  That's probably where it belongs, anyway.  Since
we're now passing it the flags directly, we need to make sure the
behavior stays the same in the other few places where __glob_pattern_p()
is called, by turning off the GLOB_BRACE flag there.  This part is
basically a revert of commit 630a097,
although functionally equivalent.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 22, 2022
Turn off the special meaning of glob characters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is a change in quote semantics that should bring it closer to user
expectations as well as providing a more convenient way of escaping a
lot of characters at once.  The negative impact on existing spec files
should be minimal, if any, since the only reason to quote a file name
right now would be to escape spaces, and that continues to work as
expected.

One use case that could break is a file that's quoted but contains globs
which would previously be expanded (e.g. "/foo[abc]").  Depending on the
actual build root contents, this may cause an unpredictable and/or
silent change in the resulting file list at build time, compared to the
previous builds of the same package.

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 25, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is a change in quote semantics that should bring it closer to user
expectations as well as providing a more convenient way of escaping a
lot of characters at once.  The negative impact on existing spec files
should be minimal, if any, since the only reason to quote a file name
right now would be to escape spaces, and that continues to work as
expected.

One use case that could break is a file that's quoted but contains
metacharacters which would previously be expanded (e.g. "/foo[abc]").
Depending on the actual build root contents, this may cause an
unpredictable and/or silent change in the resulting file list at build
time, compared to the previous builds of the same package.

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 25, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is a change in quote semantics that should bring it closer to user
expectations as well as providing a more convenient way of escaping a
lot of characters at once.  The negative impact on existing spec files
should be minimal, if any, since the only reason to quote a file name
right now would be to escape spaces, and that continues to work as
expected.

One use case that could break is a file that's quoted but contains
metacharacters which would previously be expanded (e.g. "/foo[abc]").
Depending on the actual build root contents, this may cause an
unpredictable and/or silent change in the resulting file list at build
time, compared to the previous builds of the same package.

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 25, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is a change in quote semantics that should bring it closer to user
expectations as well as providing a more convenient way of escaping a
lot of characters at once.  The negative impact on existing spec files
should be minimal, if any, since the only reason to quote a file name
right now would be to escape spaces, and that continues to work as
expected.

One use case that could break is a file that's quoted but contains
metacharacters which would previously be expanded (e.g. "/foo[abc]").
Depending on the actual build root contents, this may cause an
unpredictable and/or silent change in the resulting file list at build
time, compared to the previous builds of the same package.

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 25, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is a change in quote semantics that should bring it closer to user
expectations as well as providing a more convenient way of escaping a
lot of characters at once.  The negative impact on existing spec files
should be minimal, if any, since the only reason to quote a file name
right now would be to escape spaces, and that continues to work as
expected.

One use case that could break is a file that's quoted but contains
metacharacters which would previously be expanded (e.g. "/foo[abc]").
Depending on the actual build root contents, this may cause an
unpredictable and/or silent change in the resulting file list at build
time, compared to the previous builds of the same package.

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 25, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is a change in quote semantics that should bring it closer to user
expectations as well as providing a more convenient way of escaping a
lot of characters at once.  The negative impact on existing spec files
should be minimal, if any, since the only reason to quote a file name
right now would be to escape spaces, and that continues to work as
expected.

One use case that could break is a file that's quoted but contains
metacharacters which would previously be expanded (e.g. "/foo[abc]").
Depending on the actual build root contents, this may cause an
unpredictable and/or silent change in the resulting file list at build
time, compared to the previous builds of the same package.

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 26, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes the file name in the spec
file (i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped.

This is an extension of the existing (though poorly understood)
quotation semantics which historically has only worked for space
escaping, so this change should mostly be backward-compatible with
existing spec files, with the exception of misuse such as a combination
of spaces and globs in a quoted name, which never worked correctly, due
to the fact that rpmGlob() splits such names and tries to glob them
separately (see commit 13799b8 for some
more detail).

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a static string of characters to escape, hence accepting two
argument types to choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 27, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes a file name in the spec file
(i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.  Quotes
within quotes of the same type can be escaped with a backslash.

This is an extension of the existing (and possibly misleading) quotation
semantics only meant for space escaping and should therefore be backward
compatible with existing spec files, with the exception of quoted names
with globs in them, but those shouldn't really exist in the wild since
combining spaces and globs never worked correctly (rpmGlob() does its
own splitting by space and then tries to glob each part separately).

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a set of bytes to escape, hence accepting two argument types to
choose from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue Apr 28, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes a file name in the spec file
(i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.

This is an extension of the existing (and possibly misleading) quotation
semantics only meant for space escaping and should therefore be backward
compatible with existing spec files, with the exception of quoted names
with globs in them, but those shouldn't really exist in the wild since
combining spaces and globs never worked correctly (rpmGlob() does its
own splitting by space and then tries to glob each part separately).

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a set of bytes to escape, so accept two argument types to choose
from.

Fixes: rpm-software-management#1749
dmnks added a commit to dmnks/rpm that referenced this issue May 23, 2022
Turn off the special meaning of metacharacters within a file name
enclosed in single or double quotes.  Do it by automatically escaping
such characters at parse time, as that internally reduces the scenario
to the one where the user manually escapes a file name in the spec file
(i.e. fewer code paths to worry about), as well as allowing for
implementing partial quoting (e.g. /foo"*") in the future if we so
choose.  Currently, only fully quoted file names are supported.

This is an extension of the existing (and possibly misleading) quotation
semantics only meant for space escaping and should therefore be backward
compatible with existing spec files, with the exception of quoted names
with globs in them, but those shouldn't really exist in the wild since
combining spaces and globs never worked correctly (rpmGlob() does its
own splitting by space and then tries to glob each part separately).

Note that we must keep the behavior of rpmEscapeSpaces() exactly as it
is and since isspace(3) is locale-dependent, we need to allow for it to
still be used through rpmEscapeChars() while also allowing the caller to
specify a set of bytes to escape, so accept two argument types to choose
from.

Fixes: rpm-software-management#1749
pmatilai pushed a commit that referenced this issue Jun 27, 2022
Make sure any quoted metacharacters in a pattern are unescaped before
it's passed to addFile() and subsequently lstat(2).  Do this by always
letting such patterns through rpmGlob() which now handles backslashes
properly since commit 4030062.  Don't
expand globs twice in processSpecialDir(), though.

Axe the public rpmIsGlob() which is no longer needed, we'll be bumping
the soname in 4.19.

Construct absolute file names directly without going through
rpmGenPath() so that literal (double) percent signs are not expanded
twice and can be used the same way as in other parts of a spec file.
This means URLs are not handled anymore but we're dealing with local
files here so that should be fine.

Add the "cp" binary to the test environment, it's needed for the %doc
directives which copy stuff around.

Fixes: #1749
Repository owner moved this from In Progress to Done in RPM Jun 27, 2022
@dmnks dmnks moved this from Done to In Progress in RPM Jun 27, 2022
@mcurlej mcurlej moved this from In Progress to Done in RPM Jun 28, 2022
@mcurlej mcurlej moved this from Done to In Progress in RPM Jun 28, 2022
@dmnks
Copy link
Contributor

dmnks commented Jun 29, 2022

I'm reopening this issue as there's still outstanding work to be done in terms of making the quotes behave as expected (i.e. escaping the enclosed content), which I have almost ready. I'll open another PR soon.

@dmnks dmnks reopened this Jun 29, 2022
Repository owner moved this from In Progress to Todo in RPM Jun 29, 2022
@mcurlej mcurlej moved this from Todo to In Progress in RPM Jul 12, 2022
@dmnks dmnks linked a pull request Aug 2, 2022 that will close this issue
@dmnks dmnks linked a pull request Aug 16, 2022 that will close this issue
Repository owner moved this from In Progress to Done in RPM Aug 17, 2022
@dmnks dmnks reopened this Aug 17, 2022
Repository owner moved this from Done to Todo in RPM Aug 17, 2022
@dmnks dmnks moved this from Todo to In Progress in RPM Aug 17, 2022
@dmnks dmnks linked a pull request Aug 17, 2022 that will close this issue
@dmnks dmnks removed a link to a pull request Aug 17, 2022
@pmatilai pmatilai modified the milestones: 4.18.0, 4.19.0 Sep 21, 2022
@dmnks dmnks linked a pull request Sep 27, 2022 that will close this issue
Repository owner moved this from In Progress to Done in RPM Oct 4, 2022
@kloczek
Copy link
Contributor

kloczek commented Sep 8, 2023

Quite possible that this PR broke shell globs in %files
#2649

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment