Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle runs of delimiters in filenames #14172

Merged
merged 4 commits into from
Jul 21, 2023
Merged

Conversation

di
Copy link
Member

@di di commented Jul 20, 2023

Fixes the issue raised here: #14155 (comment)

Fixes #13991.

@di di requested a review from a team as a code owner July 20, 2023 20:55
@di
Copy link
Member Author

di commented Jul 20, 2023

(This also removes one of our usages of pkg_resources.safe_name, towards #13991)

@takluyver
Copy link
Contributor

I think this probably works, but I don't know precisely what rules packages generating wheels (principally setuptools) have been using for the filename, e.g. if they might have .. or ._ in the filename.

The name normalization spec has a one line function to fully normalize a name from any valid starting point. Is there any reason to do it differently for the filename prefix?

def normalize(name):
    return re.sub(r"[-_.]+", "-", name).lower()

@di
Copy link
Member Author

di commented Jul 21, 2023

We don't actually want to normalize the filename prefix, as it should already be normalized. But to handle backwards compatibility with tools that might not be replacing - and . with _, or lowercasing the prefix, we have to do a minimal normalization here, which we'll eventually want to remove via #14156:

filename_prefix = filename_prefix.lower().replace(".", "_").replace("-", "_")

If we remove consecutive runs when normalizing the prefix, this would mean any filenames that have any amount of consecutive runs of delimiters in the filename would become valid, because the comparison between the prefix and the project name would not take repeating delimiters into account. Since these filenames are currently invalid, we haven't had an upload with a ._, _., or .. in the filename for quite some time:

warehouse=> select upload_time, filename from release_files where filename ilike '%\.\_%.whl' order by upload_time desc limit 1;
        upload_time         |                   filename
----------------------------+-----------------------------------------------
 2021-11-03 20:54:45.340789 | hydrotools._restclient-3.0.5-py3-none-any.whl
(1 row)

warehouse=> select upload_time, filename from release_files where filename ilike '%\_\.%.whl' order by upload_time desc limit 1;
        upload_time         |                     filename
----------------------------+---------------------------------------------------
 2021-05-03 00:44:10.904378 | cmc_csci046_.yilinli_trees-1.0.1-py3-none-any.whl
(1 row)

warehouse=> select upload_time, filename from release_files where filename ilike '%\.\.%.whl' order by upload_time desc limit 1;
        upload_time         |                       filename
----------------------------+-------------------------------------------------------
 2020-11-06 20:00:06.720522 | example_..._pkg_megankuooo1234-0.0.1-py3-none-any.whl
(1 row)

So TL;DR: I'm not sure we want to become less restrictive by normalizing out character runs in the filename before comparison to the project name.

@takluyver
Copy link
Contributor

takluyver commented Jul 21, 2023 via email

Copy link
Member

@miketheman miketheman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go. Accept the suggestion if you like, it's okay if you don't!

tests/unit/forklift/test_legacy.py Outdated Show resolved Hide resolved
@di di enabled auto-merge (squash) July 21, 2023 17:40
@di di merged commit 81db47b into main Jul 21, 2023
@di di deleted the handle-runs-of-delimters-in-filenames branch July 21, 2023 17:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace deprecated usage of pkg_resources
3 participants