Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pants package :: can corrupt directory target output by leaving stale/deleted contents #18849

Closed
huonw opened this issue Apr 27, 2023 · 3 comments · Fixed by #18930
Closed

pants package :: can corrupt directory target output by leaving stale/deleted contents #18849

huonw opened this issue Apr 27, 2023 · 3 comments · Fixed by #18930
Labels

Comments

@huonw
Copy link
Contributor

huonw commented Apr 27, 2023

Describe the bug

A packageable target can sometimes be a directory. For instance, pex_binary(..., layout="packed") (or layout="loose"). If someone has run pants package :: once to output this target, makes a change that removes files, and then reruns pants package ::, the removed files are left there.

In theory, writing to dist/ should exactly synchronise the contents of a target-that-outputs-directory (removing stale contents). This will need to be done at an appropriate level, e.g. for a packed pex pants package path/to:target, it'll output dist/path.to/target.pex/..., but it should only synchronise dist/path.to/target.pex/, not removing anything from dist/ or dist/path.to/.

This likely applies to other methods of writing to dist/, like export-codegen.

Reproducer:

cd $(mktemp -d)
cat > pants.toml <<EOF
[GLOBAL]
pants_version = "2.15.0"
backend_packages = ["pants.backend.python"]

[anonymous-telemetry]
enabled = false
EOF

cat > BUILD <<EOF
python_sources(name="sources")
pex_binary(name="pex", layout="packed", dependencies=[":sources"])
EOF

touch a.py b.py

# initial export:
pants package :pex
tree dist

rm b.py
pants package :pex
# BUG: b.py is still in dist/pex.pex/
tree dist

# clear out and start again to validate the expected contents (no b.py)
rm -rf dist
pants package :pex
tree dist

Output:

  1. initial export
    08:43:37.96 [INFO] Wrote dist/pex.pex
    dist
    └── pex.pex
        ├── PEX-INFO
        ├── __main__.py
        ├── __pex__
        │   └── __init__.py
        ├── a.py
        └── b.py
    
    2 directories, 5 files
    
  2. BUG: b.py is still in dist/pex.pex/
    08:43:38.33 [WARN] Unmatched glob from //b.py:sources's `source` field: "b.py"
    
    Do the file(s) exist? If so, check if the file(s) are in your `.gitignore` or the global `pants_ignore` option, which may result in Pants not being able to see the file(s) even though they exist on disk. Refer to https://www.pantsbuild.org/v2.15/docs/troubleshooting#pants-cannot-find-a-file-in-your-project.
    08:43:38.33 [INFO] Wrote dist/pex.pex
    dist
    └── pex.pex
        ├── PEX-INFO
        ├── __main__.py
        ├── __pex__
        │   └── __init__.py
        ├── a.py
        └── b.py
    
    2 directories, 5 files
    
  3. clear out and start again ...
    08:43:38.74 [INFO] Wrote dist/pex.pex
    dist
    └── pex.pex
        ├── PEX-INFO
        ├── __main__.py
        ├── __pex__
        │   └── __init__.py
        └── a.py
    
    2 directories, 4 files
    

Pants version
2.15.0

OS
macOS

Additional info

Potentially vaguely related: #17758, #18809.

@huonw huonw added the bug label Apr 27, 2023
@jsirois
Copy link
Contributor

jsirois commented Apr 27, 2023

@huonw its quibbling, but Pants has always just populated dist/ back to v0 days and never synchronized. You're really describing new semantics you desire / a new feature.

@huonw
Copy link
Contributor Author

huonw commented Apr 27, 2023

Feel free to reframe it as a feature request (although, from a user perspective, the end result of silently corrupting a target-that-outputs-a-directory is very surprising, and is very different to target-that-outputs-a-file).

@jsirois
Copy link
Contributor

jsirois commented Apr 27, 2023

Ah, I read too fast. Ok, yeah - not removing the complete dir then replacing it is odd. Pex used alone does that; so Pants is doing something weird here.

@huonw huonw changed the title pants package :: can leave stale/deleted contents in directory targets pants package :: can corrupt directory target output by leaving stale/deleted contents Apr 28, 2023
huonw added a commit that referenced this issue May 1, 2023
This applies a workaround that fixes #18809, for 2.16: before this PR,
repeated commands that write the exact same contents to `dist/` will
fail, if those contents include a symlink. After this patch, they will
succeed. For instance, `pants export-codegen ::` twice if any codegen
creates a symlink.

The particular problem of failing when re-materialising an entry only
surfaces with symlinks, because directories are created in "exists okay"
mode, and files are truncated if they already exist.

However, directories and files _do_ have problems when being
materialised over an entry of a different kind (#17758), but fixing that
seems like a broader issue, and likely too large to target 2.16 at this
point. After the change in this PR, we're at least back to the behaviour
in 2.15:

- directly rerunning commands that write to the workspace will always
succeed
- rerunning after changes may or may not (and, if it does, may or may
not give a valid result: #18849)

I've started on a potential fix for #17758 and #18849 in #18871, but, as
mentioned, it felt like it was getting too large and too "feature"-y to
land for 2.16. If/when a fix along those lines lands, this workaround
can likely be reverted.
huonw added a commit to huonw/pants that referenced this issue May 1, 2023
…8873)

This applies a workaround that fixes pantsbuild#18809, for 2.16: before this PR,
repeated commands that write the exact same contents to `dist/` will
fail, if those contents include a symlink. After this patch, they will
succeed. For instance, `pants export-codegen ::` twice if any codegen
creates a symlink.

The particular problem of failing when re-materialising an entry only
surfaces with symlinks, because directories are created in "exists okay"
mode, and files are truncated if they already exist.

However, directories and files _do_ have problems when being
materialised over an entry of a different kind (pantsbuild#17758), but fixing that
seems like a broader issue, and likely too large to target 2.16 at this
point. After the change in this PR, we're at least back to the behaviour
in 2.15:

- directly rerunning commands that write to the workspace will always
succeed
- rerunning after changes may or may not (and, if it does, may or may
not give a valid result: pantsbuild#18849)

I've started on a potential fix for pantsbuild#17758 and pantsbuild#18849 in pantsbuild#18871, but, as
mentioned, it felt like it was getting too large and too "feature"-y to
land for 2.16. If/when a fix along those lines lands, this workaround
can likely be reverted.
huonw added a commit that referenced this issue May 2, 2023
#18873) (#18878)

This applies a workaround that fixes #18809, for 2.16: before this PR,
repeated commands that write the exact same contents to `dist/` will
fail, if those contents include a symlink. After this patch, they will
succeed. For instance, `pants export-codegen ::` twice if any codegen
creates a symlink.

The particular problem of failing when re-materialising an entry only
surfaces with symlinks, because directories are created in "exists okay"
mode, and files are truncated if they already exist.

However, directories and files _do_ have problems when being
materialised over an entry of a different kind (#17758), but fixing that
seems like a broader issue, and likely too large to target 2.16 at this
point. After the change in this PR, we're at least back to the behaviour
in 2.15:

- directly rerunning commands that write to the workspace will always
succeed
- rerunning after changes may or may not (and, if it does, may or may
not give a valid result: #18849)

I've started on a potential fix for #17758 and #18849 in #18871, but, as
mentioned, it felt like it was getting too large and too "feature"-y to
land for 2.16. If/when a fix along those lines lands, this workaround
can likely be reverted.
@huonw huonw closed this as completed in a24a59e May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants