Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NormalizedMatcher fails to parse dev versions like 0.8.1dev #200

Open
con-f-use opened this issue Jun 6, 2023 · 9 comments
Open

NormalizedMatcher fails to parse dev versions like 0.8.1dev #200

con-f-use opened this issue Jun 6, 2023 · 9 comments

Comments

@con-f-use
Copy link

con-f-use commented Jun 6, 2023

Describe the bug

dislib.version.NormaizeMatcher errors when called with something like 0.8.1dev. I.e. development version specifiers that either omit the trailing number or the dot separator.

PEP 440 specifies that development release separators may omit the separator and number entirely:

Development release separators

Development releases allow a ., -, or a _ separator as well as omitting the separator all together. The normal form of this is with the . separator. This allows versions such as 1.2-dev2 or 1.2dev2 which normalize to 1.2.dev2.
Implicit development release number

Development releases allow omitting the numeral in which case it is implicitly assumed to be 0. The normal form for this is to include the 0 explicitly. This allows versions such as 1.2.dev which is normalized to 1.2.dev0.

To Reproduce

$ python -c 'import distlib, distlib.version; print(f"{distlib.__version__=}"); print(distlib.version.NormalizedMatcher("mypackage<0.8.1dev"))'
distlib.__version__='0.3.6'
Traceback (most recent call last):
 File "<string>", line 1, in <module>
 File "/tmp/bla/.direnv/python-3.10.10/lib/python3.10/site-packages/distlib/version.py", line 125, in __init__
   vn, prefix = self.version_class(s), False
 File "/tmp/bla/.direnv/python-3.10.10/lib/python3.10/site-packages/distlib/version.py", line 33, in __init__
   self._parts = parts = self.parse(s)
 File "/tmp/bla/.direnv/python-3.10.10/lib/python3.10/site-packages/distlib/version.py", line 267, in parse
   result = _normalized_key(s)
 File "/tmp/bla/.direnv/python-3.10.10/lib/python3.10/site-packages/distlib/version.py", line 188, in _pep_440_key
   raise UnsupportedVersionError('Not a valid version: %s' % s)
distlib.version.UnsupportedVersionError: Not a valid version: 0.8.1dev

Expected behavior

$ python -c 'import distlib.version; print(distlib.version.NormalizedMatcher("mypackage<0.8.1dev"))'
mypackage<0.8.1.dev0
@con-f-use con-f-use changed the title Noralize matcher fails to parse dev versions like 0.8.1dev NormaizeMatcher fails to parse dev versions like 0.8.1dev Jun 6, 2023
@con-f-use
Copy link
Author

con-f-use commented Jun 6, 2023

I suspect there is more fineprint in PEP 440, that distlib just ignores. E.g. that all letters should be matched case-insenitively, or:

Pre-release spelling

Pre-releases allow the additional spellings of alpha, beta, c, pre, and preview for a, b, rc, rc, and rc respectively. This allows versions such as 1.1alpha1, 1.1beta2, or 1.1c3 which normalize to 1.1a1, 1.1b2, and 1.1rc3

or:

Post release spelling

Post-releases allow the additional spellings of rev and r. This allows versions such as 1.0-r4 which normalizes to 1.0.post4. As with the pre-releases the additional spellings should be considered equivalent to their normal forms.

@vsajip vsajip changed the title NormaizeMatcher fails to parse dev versions like 0.8.1dev NormalizedMatcher fails to parse dev versions like 0.8.1dev Jun 6, 2023
@vsajip vsajip closed this as completed in 6181c59 Jun 6, 2023
@vsajip
Copy link
Collaborator

vsajip commented Jun 6, 2023

This was auto-closed via commit message, but I'm not sure if it's actually fixed yet. I'm not sure if PEP440 had revisions after distlib implemented an early version of it, or whether those parts of the spec got missed in error. I'll reopen and close once the tests complete satisfactorily.

@vsajip vsajip reopened this Jun 6, 2023
@vsajip
Copy link
Collaborator

vsajip commented Jun 6, 2023

OK, tests all seem to be passing. You might want to try the latest version in this repository and report your findings. (Only distlib/version.py changed, apart from some tests. If you are using Python 3.12 on mac OS, you might also want to use the latest distlib/util.py.)

@con-f-use
Copy link
Author

con-f-use commented Jun 6, 2023

PEP 440 even specifies a regex for the version spec, that is much more complex than the one in distlib.

VERSION_PATTERN = r"""
    v?
    (?:
        (?:(?P<epoch>[0-9]+)!)?                           # epoch
        (?P<release>[0-9]+(?:\.[0-9]+)*)                  # release segment
        (?P<pre>                                          # pre-release
            [-_\.]?
            (?P<pre_l>(a|b|c|rc|alpha|beta|pre|preview))
            [-_\.]?
            (?P<pre_n>[0-9]+)?
        )?
        (?P<post>                                         # post release
            (?:-(?P<post_n1>[0-9]+))
            |
            (?:
                [-_\.]?
                (?P<post_l>post|rev|r)
                [-_\.]?
                (?P<post_n2>[0-9]+)?
            )
        )?
        (?P<dev>                                          # dev release
            [-_\.]?
            (?P<dev_l>dev)
            [-_\.]?
            (?P<dev_n>[0-9]+)?
        )?
    )
    (?:\+(?P<local>[a-z0-9]+(?:[-_\.][a-z0-9]+)*))?       # local version
"""

_regex = re.compile(
    r"^\s*" + VERSION_PATTERN + r"\s*$",
    re.VERBOSE | re.IGNORECASE,
)

For implementation, you should probably use this verbatim and use the matching groups and string.lower to construct the normalized version.

@con-f-use
Copy link
Author

con-f-use commented Jun 6, 2023

OK, tests all seem to be passing. You might want to try the latest version in this repository and report your findings. (Only distlib/version.py changed, apart from some tests. If you are using Python 3.12 on mac OS, you might also want to use the latest distlib/util.py.)

It's better but not fixed:

python -c 'from distlib.version import NormalizedMatcher; print(NormalizedMatcher("0.8.1DEV"))'
0.8.1DEV

PEP 440 states that it should be normalized to lower case with dot separator and zero at the end, i.e.: 0.8.1DEV should normalize to 0.8.1.dev0. Same goes for 0.8.1rev, 0.8.1-rev0 and 0.8.1r, which should all normalize to the correct `0.8.1.post0.

Case sensitivity

All ascii letters should be interpreted case insensitively within a version and the normal form is lowercase. This allows versions such as 1.1RC1 which would be normalized to 1.1rc1.

Implicit pre-release number

Pre releases allow omitting the numeral in which case it is implicitly assumed to be 0. The normal form for this is to include the 0 explicitly. This allows versions such as 1.2a which is normalized to 1.2a0.

Post release separators

Post releases allow a ., -, or _ separator as well as omitting the separator all together. The normal form of this is with the . separator. This allows versions such as 1.2-post2 or 1.2post2 which normalize to 1.2.post2. Like the pre-release separator this also allows an optional separator between the post release signifier and the numeral. This allows versions like 1.2.post-2 which would normalize to 1.2.post2.

Post release spelling

Post-releases allow the additional spellings of rev and r. This allows versions such as 1.0-r4 which normalizes to 1.0.post4. As with the pre-releases the additional spellings should be considered equivalent to their normal forms.

@vsajip
Copy link
Collaborator

vsajip commented Jun 6, 2023

I'm pretty sure that regex wasn't in the spec when distlib implemented its version code, or I'd have used it (parts of it might even predate PEP440, it was around the time of Python 3.3 - it started off with taking the PEP3.3 aborted packaging code, which included versions). I'm not sure I have time at the moment to do a change as big as the whole regex (and then work though all the tests etc. as necessary), though I may look at that later. NormalizedMatcher isn't supposed to suggest a normalised version, but it should match a normalized version:

>>> from distlib.version import NormalizedMatcher as NM
>>> m = NM('0.8.1dev')
>>> m.match('0.8.1.dev0')
True

@vsajip
Copy link
Collaborator

vsajip commented Jun 6, 2023

Have you found any more errors in the version parsing? The NormalizedMatcher is not supposed to suggest a canonical form via its __str__, so that's not actually a bug (though the failure to parse valid PEP440 versions most definitely is).

@con-f-use
Copy link
Author

con-f-use commented Jun 6, 2023

I don't think I found any further bugs, but if you wanted to add an actual normalizer, it might go something like this:

import re

VERSION_PATTERN = r"""
    v?
    (?:
        (?:(?P<epoch>[0-9]+)!)?                           # epoch
        (?P<release>[0-9]+(?:\.[0-9]+)*)                  # release segment
        (?P<pre>                                          # pre-release
            [-_\.]?
            (?P<pre_l>(a|b|c|rc|alpha|beta|pre|preview))
            [-_\.]?
            (?P<pre_n>[0-9]+)?
        )?
        (?P<post>                                         # post release
            (?:-(?P<post_n1>[0-9]+))
            |
            (?:
                [-_\.]?
                (?P<post_l>post|rev|r)
                [-_\.]?
                (?P<post_n2>[0-9]+)?
            )
        )?
        (?P<dev>                                          # dev release
            [-_\.]?
            (?P<dev_l>dev)
            [-_\.]?
            (?P<dev_n>[0-9]+)?
        )?
    )
    (?:\+(?P<local>[a-z0-9]+(?:[-_\.][a-z0-9]+)*))?       # local version
"""

_regex = re.compile(
    r"^\s*" + VERSION_PATTERN + r"\s*$",
    re.VERBOSE | re.IGNORECASE,
)

def normalize(maybe_version):
    segments  = _regex.match(maybe_version.lower())
    if segments is None:
        raise ValueError("Not a valid PEP 440 version")

    epoch = segments["epoch"]
    epoch = "" if epoch is None else f"{int(epoch)}!"

    release = ".".join(str(int(r)) for r in segments["release"].split("."))

    pre = ""
    if segments["pre"] is not None:
        pre = int(segments["pre_n"] or 0)
        pre = f".pre{pre}"

    post = ""
    if segments["post"] is not None:
        if segments["post_n1"] is None:
            post = int(segments["post_n2"] or 0)
        else:
            post = int(segments["post_n1"] or 0)
        post = f".post{post}"

    dev = ""
    if segments["dev"] is not None:
        dev = int(segments["dev_n"] or 0)
        dev = f".dev{dev}"

    local = ""
    if segments["local"] is not None:
        local = f"+{segments['local']}"

    return f"{epoch}{release}{pre}{post}{dev}{local}"

# Smoke Tests:
assert normalize("8.0.1DEV") == "8.0.1.dev0"
assert normalize("1r7") == "1.post7"
assert normalize("1") == "1"
assert normalize("0000.0002-rc12345") == "0.2.pre12345"
assert normalize("017!0_dEv") == "17!0.dev0"
assert normalize("17+gAefF123_a_7.7af") == "17+gaeff123_a_7.7af"
assert normalize("1-post_") == "1.post0"

(No guaranty it actually works in all edge cases)

I can make a pull request if you're interested. I think it would be functionality that might be very useful to users.

To be honest, I find the PEP 404 normalization rules a bit weird, because things like 1r7 and 1beta7 all normalize to 1.pre7, but are supposed to be ordered like so:

Summary of permitted suffixes and relative ordering
[...]

Within a numeric release (1.0, 2.7.3), the following suffixes are permitted and MUST be ordered as shown:

.devN, aN, bN, rcN, , .postN

So PEP 440 definitely has deep pitfalls. And yeah, like you, I find it pretty strange, that accepted and established PEPs are subject to change (that one as late as Oct. 2022 - eight years after it was accepted), despite it saying, its state was "Final - Accepted and implementation complete, or no longer active".

@vsajip
Copy link
Collaborator

vsajip commented Jun 6, 2023

That's helpful of you to flesh it out for me, thanks very much. There's actually a suggest() API in distlib.version which I can update to look more like this, but I'll need to check what else needs changing (docs, tests etc.) so don't worry about making a PR right now, as I'm in the middle of some other stuff and have limited time available. I still prioritise bug fixes, as you've seen (I hope) but enhancements can wait till I have more time. Cheers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants