Ensure we can detect simple "computed" Python package versions in setup.py #2263

Open
pombredanne opened this issue Sep 30, 2020 · 4 comments

Comments

@pombredanne
Member

pombredanne commented Sep 30, 2020

Description

It is a common pattern for a setup.py to compute its version rather than declare it as a literal string.
We cannot detect the version in this case. This is even the case in the current scancode-toolkit.
See for instance:
https://github.com/nexB/scancode-toolkit/blob/c3c92ff121632ea5db835f1c460c7d483a91a5d6/setup.py#L101
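
As a minimal sketch of why a literal-only approach misses these versions: parsing a setup.py with `ast` yields a string constant for a literal `name`, but a computed `version` shows up as a non-constant node. The `mypkg` package name and the `setup_keyword_literals` helper are hypothetical and are not part of scancode.

```python
import ast

# A hypothetical setup.py whose version is computed from a package attribute.
SETUP_PY = '''
from setuptools import setup
import mypkg

setup(
    name="mypkg",
    version=mypkg.__version__,
)
'''


def setup_keyword_literals(source):
    """
    Return a mapping of setup() keyword name -> literal string value,
    or None when the value is computed and cannot be read statically.
    The source is only parsed, never imported or executed.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == 'setup'):
            for kw in node.keywords:
                if kw.arg is None:
                    # skip **kwargs style arguments
                    continue
                value = kw.value
                if isinstance(value, ast.Constant) and isinstance(value.value, str):
                    result[kw.arg] = value.value
                else:
                    result[kw.arg] = None
    return result


literals = setup_keyword_literals(SETUP_PY)
print(literals)
```

Here `name` is recovered as a string while `version` comes back as None, which is the gap this ticket is about.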

In simpler cases we should be able to detect the convention of using a "dunder" __version__ field such as in six:

We also need to clean up the mess we are about to create with the new breakout into separate repos for #2233 and the use of setuptools_scm, which would completely remove any version references from setup.py/.cfg: https://github.com/nexB/typecode/blob/a337e7484ec563f47c2e6d3ce650448d69b13549/setup.cfg
See aboutcode-org/typecode#3

One approach to a solution may be in the many possible tools listed in #253
See also https://packaging.python.org/guides/single-sourcing-package-version/

A good test is with this list of download URLs:

@pombredanne
Member Author

We have a possible solution with this https://github.com/pyserial/pyserial/blob/d867871e6aa333014a77498b4ac96fdd1d3bf1d8/setup.py#L34

def find_version(*file_paths):
    """
    Search the file for a version string.
    file_path contain string path components.
    Reads the supplied Python module as text without importing it.
    """
    version_file = read(*file_paths)
    version_match = re.search(r"^__version__ = ['\"]([^'\"]*)['\"]",
                              version_file, re.M)
    if version_match:
        return version_match.group(1)
    raise RuntimeError("Unable to find version string.")


version = find_version('serial', '__init__.py')


setup(
    name="pyserial",
    description="Python Serial Port Extension",
    version=version,
....

which is MIT-licensed https://github.com/pyserial/pyserial/blob/master/LICENSE.txt
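
The snippet above relies on a `read` helper that is not shown. A self-contained sketch of the same idea follows, assuming a plain UTF-8 text read. The `read_text` helper and the `demo_pkg` directory are made up for illustration, and the regex is relaxed slightly to allow flexible whitespace around `=`.

```python
import os
import re
import tempfile


def read_text(*path_segments):
    """Return the text content of the file at the joined path segments."""
    with open(os.path.join(*path_segments), encoding='utf-8') as fp:
        return fp.read()


def find_version(*path_segments):
    """
    Return the __version__ string assigned at the start of a line in the
    module at the joined path segments. The module is read as text and is
    never imported.
    """
    content = read_text(*path_segments)
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]*)['\"]", content, re.M)
    if match:
        return match.group(1)
    raise RuntimeError('Unable to find version string.')


# Build a throwaway package on disk to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = os.path.join(tmp, 'demo_pkg')
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, '__init__.py'), 'w', encoding='utf-8') as fp:
        fp.write('"""Demo package."""\n__version__ = "1.2.3"\n')
    found = find_version(tmp, 'demo_pkg', '__init__.py')

print(found)
```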

@pombredanne
Member Author

Using the list of test packages listed in the ticket description, with this:

$ scancode -p --json-pp - ~/tmp/setup/ --only-findings -n3

we get these results package-scan.json.txt

A more fully featured scan with:

$ scancode -p --json-pp - ~/tmp/setup/ --only-findings -n3 --consolidate --license --copyright  --license-text --info --classify --json-pp initial.json.txt 

has these results
Uploading initial.json.txt…

@pombredanne
Member Author

Another possibility would be to ensure that when we detect a PKG-INFO and a setup.py at the same level in an sdist, we merge the package records into a single one, for instance:

  • six-1.14.0.tar.gz-extract/six-1.14.0/PKG-INFO
  • six-1.14.0.tar.gz-extract/six-1.14.0/setup.py

This would not replace the need to detect dunder versions, but when we have an sdist, PKG-INFO contains an already resolved version (including any version produced by running code or by a dynamic lookup of a dunder version).
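
A rough sketch of that merge, assuming package records are plain dicts (the real scancode models are richer) and using the stdlib `email` parser, since PKG-INFO uses RFC 822-style headers. The `merge_version` helper and the sample PKG-INFO text are illustrative and are not copied from the actual six sdist.

```python
from email.parser import Parser

# Illustrative PKG-INFO content, not copied from the real sdist.
PKG_INFO = """\
Metadata-Version: 1.2
Name: six
Version: 1.14.0
Summary: Python 2 and 3 compatibility utilities
"""


def merge_version(setup_py_record, pkg_info_text):
    """
    Return a copy of `setup_py_record` where a missing version is filled
    in from the Version header of the PKG-INFO text.
    """
    metadata = Parser().parsestr(pkg_info_text)
    merged = dict(setup_py_record)
    if not merged.get('version'):
        merged['version'] = metadata.get('Version')
    return merged


# A setup.py record where the computed version could not be detected.
record = merge_version({'name': 'six', 'version': None}, PKG_INFO)
print(record)
```

A version already detected in setup.py is left untouched. Only a missing one falls back to PKG-INFO.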

@pombredanne
Member Author

I have a patch that needs testing inspired by @rob-smallshire and heavily modified from:
https://github.com/pyserial/pyserial/blob/d867871e6aa333014a77498b4ac96fdd1d3bf1d8/setup.py#L34
SPDX-License-Identifier: BSD-3-Clause
(C) 2001-2020 Chris Liechti [email protected]

diff --git a/src/packagedcode/pypi.py b/src/packagedcode/pypi.py
index e339a27..39fc666 100644
--- a/src/packagedcode/pypi.py
+++ b/src/packagedcode/pypi.py
@@ -31,21 +31,20 @@
 import json
 import logging
 import os
+import re
 import sys
 
 import attr
-from six import string_types
-
+import dparse
+from dparse import filetypes
 from pkginfo import BDist
 from pkginfo import Develop
 from pkginfo import SDist
 from pkginfo import UnpackedSDist
 from pkginfo import Wheel
-
-import dparse
-from dparse import filetypes
-
+from packageurl import PackageURL
 import saneyaml
+from six import string_types
 
 from commoncode import filetype
 from commoncode import fileutils
@@ -53,7 +52,6 @@
 from packagedcode import models
 from packagedcode.utils import build_description
 from packagedcode.utils import combine_expressions
-from packageurl import PackageURL
 
 try:
     # Python 2
@@ -63,13 +61,12 @@
     # Python 3
     unicode = str  # NOQA
 
-
 """
 Detect and collect Python packages information.
 
 """
 
-TRACE = False
+TRACE = True
 
 
 def logger_debug(*args):
@@ -315,7 +312,7 @@
     sha256 = None
     if '_meta' in data:
         for name, meta in data['_meta'].items():
-            if name=='hash':
+            if name == 'hash':
                 sha256 = meta.get('sha256')
 
     package_dependencies = parse_with_dparse(location)
@@ -347,19 +344,31 @@
     for statement in tree.body:
         # We only care about function calls or assignments to functions named `setup`
         if (isinstance(statement, ast.Expr)
-                or isinstance(statement, ast.Call)
-                or isinstance(statement, ast.Assign)
-                and isinstance(statement.value, ast.Call)
-                and isinstance(statement.value.func, ast.Name)
-                and statement.value.func.id == 'setup'):
+            or isinstance(statement, ast.Call)
+            or isinstance(statement, ast.Assign)
+            and isinstance(statement.value, ast.Call)
+            and isinstance(statement.value.func, ast.Name)
+            and statement.value.func.id == 'setup'
+        ):
+
             # Process the arguments to the setup function
             for kw in statement.value.keywords:
                 arg_name = kw.arg
+
                 if isinstance(kw.value, ast.Str):
                     setup_args[arg_name] = kw.value.s
-                if isinstance(kw.value, ast.List):
-                    # We collect the elements of a list if the element is not a function call
-                    setup_args[arg_name] = [elt.s for elt in kw.value.elts if not isinstance(elt, ast.Call)]
+
+                elif isinstance(kw.value, (ast.List, ast.Tuple, ast.Set,)):
+                    # We collect the elements of a list, tuple or set if the
+                    # element is not a function call
+                    value = [
+                        elt.s for elt in kw.value.elts
+                        if not isinstance(elt, ast.Call)
+                    ]
+                    setup_args[arg_name] = value
+
+                # TODO:  what if isinstance(kw.value, ast.Dict)
+                # or an expression like a call to version=get_version() or version=__version__
 
     package_name = setup_args.get('name')
     if not package_name:
@@ -367,7 +376,8 @@
 
     description = build_description(
         setup_args.get('summary', ''),
-        setup_args.get('description', ''))
+        setup_args.get('description', ''),
+    )
 
     parties = []
     author = setup_args.get('author')
@@ -383,6 +393,15 @@
                 url=homepage_url
             )
         )
+    elif author_email:
+        parties.append(
+            models.Party(
+                type=models.party_person,
+                email=author_email,
+                role='author',
+                url=homepage_url
+            )
+        )
 
     declared_license = OrderedDict()
     license_setuptext = setup_args.get('license')
@@ -394,9 +413,14 @@
 
     other_classifiers = [c for c in classifiers if not c.startswith('License')]
 
+    detected_version = setup_args.get('version')
+    if not detected_version:
+        # search for possible dunder versions here and elsewhere
+        detected_version = detect_version_attribute(location)
+
     return PythonPackage(
         name=package_name,
-        version=setup_args.get('version'),
+        version=detected_version,
         description=description or None,
         homepage_url=setup_args.get('url') or None,
         parties=parties,
@@ -404,6 +428,196 @@
         keywords=other_classifiers,
     )
 
+#########################################################
+# code inspired and heavily modified from:
+# https://github.com/pyserial/pyserial/blob/d867871e6aa333014a77498b4ac96fdd1d3bf1d8/setup.py#L34
+# SPDX-License-Identifier:    BSD-3-Clause
+# (C) 2001-2020 Chris Liechti <[email protected]>
+
+
+def find_pattern(location, pattern):
+    """
+    Search the file at `location` for a `pattern` regex and return the first
+    captured group or None if not found. Reads the supplied location as text
+    without importing it.
+    """
+    with io.open(location, encoding='utf8') as fp:
+        content = fp.read()
+
+    match = re.search(pattern, content)
+    if match:
+        return match.group(1).strip()
+
+
+def find_dunder_version(location):
+    """
+    Return a "dunder" __version__ string or None from searching the module file
+    at `location`.
+    """
+    pattern = re.compile(r"^__version__\s*=\s*['\"]([^'\"]*)['\"]", re.M)
+    match = find_pattern(location, pattern)
+    logger_debug('find_dunder_version:', 'location:', location, 'match:', match)
+    return match
+
+
+def find_plain_version(location):
+    """
+    Return a plain version attribute string or None from searching the module
+    file at `location`.
+    """
+    pattern = re.compile(r"^version\s*=\s*['\"]([^'\"]*)['\"]", re.M)
+    match = find_pattern(location, pattern)
+    logger_debug('find_plain_version:', 'location:', location, 'match:', match)
+    return match
+
+
+def find_setup_py_dunder_version(location):
+    """
+    Return a "dunder" __version__ expression string used as a setup(version)
+    argument or None from searching the setup.py file at `location`.
+
+    For instance:
+        setup(
+            version=six.__version__,
+        ...
+    would return six.__version__.
+    """
+    pattern = re.compile(r"^\s*version\s*=\s*(.*__version__)", re.M)
+    match = find_pattern(location, pattern)
+    logger_debug('find_setup_py_dunder_version:', 'location:', location, 'match:', match)
+    return match
+
+
+def detect_version_attribute(setup_location):
+    """
+    Return a detected version from a setup.py file at `setup_location` if used
+    as a version argument of the setup() function.
+    Also search neighbor files for __version__ and common patterns.
+    """
+    # search for possible dunder versions here and elsewhere
+    setup_version_arg = find_setup_py_dunder_version(setup_location)
+    setup_py__version = find_dunder_version(setup_location)
+    logger_debug(
+        '    detect_dunder_version:',
+        'setup_location:', setup_location,
+    )
+    logger_debug('    setup_version_arg:', repr(setup_version_arg),)
+    logger_debug('    setup_py__version:', repr(setup_py__version),)
+    if setup_version_arg == '__version__' and setup_py__version:
+        version = setup_py__version or None
+        logger_debug('    detect_dunder_version: A:', version)
+        return version
+
+    # here we have a more complex __version__ location
+    # we start by adding the possible paths and file name 
+    # and we look at these in sequence
+
+    candidate_locs = []
+
+    if setup_version_arg and '.' in setup_version_arg:
+        segments = setup_version_arg.split('.')[:-1]
+    else:
+        segments = []
+
+    special_names = (
+        '__init__.py',
+        '__main__.py',
+        '__version__.py',
+        '__about__.py',
+        '__version.py',
+        '_version.py',
+        'version.py',
+        'VERSION.py',
+        'package_data.py',
+    )
+
+    setup_py_dir = fileutils.parent_directory(setup_location)
+    src_dir = os.path.join(setup_py_dir, 'src')
+    has_src = os.path.exists(src_dir)
+
+    if segments:
+        for n in special_names:
+            candidate_locs.append(segments + [n])
+        if has_src:
+            for n in special_names:
+                candidate_locs.append(['src'] + segments + [n])
+
+        if len(segments) > 1:
+            heads = segments[:-1]
+            tail = segments[-1]
+            candidate_locs.append(heads + [tail + '.py'])
+            if has_src:
+                candidate_locs.append(['src'] + heads + [tail + '.py'])
+
+        else:
+            seg = segments[0]
+            candidate_locs.append([seg + '.py'])
+            if has_src:
+                candidate_locs.append(['src'] + [seg + '.py'])
+
+    candidate_locs = [os.path.join(setup_py_dir, *cand_loc_segs)
+        for cand_loc_segs in candidate_locs]
+
+    for fl in get_module_scripts(
+        location=setup_py_dir,
+        max_depth=4,
+        interesting_names=special_names,
+    ):
+        candidate_locs.append(fl)
+
+    for loc in candidate_locs:
+        logger_debug('    can loc:', loc)
+
+    version = detect_version_in_locations(
+        candidate_locs=candidate_locs,
+        detector=find_dunder_version
+    )
+
+    if version:
+        return version
+
+    return detect_version_in_locations(
+        candidate_locs=candidate_locs,
+        detector=find_plain_version,
+    )
+
+
+def detect_version_in_locations(candidate_locs, detector=find_plain_version):
+    """
+    Return the first version found in a location from `candidate_locs` using the
+    `detector` callable. Or None.
+    """
+    for loc in candidate_locs:
+        if os.path.exists(loc):
+            logger_debug('detect_version_in_locations:', 'loc:', loc)
+            # here the file exists try to get a dunder version
+            version = detector(loc)
+            logger_debug('detect_version_in_locations:', 'detector', detector, 'version:', version)
+            if version:
+                return version
+        
+
+
+def get_module_scripts(location, max_depth=1, interesting_names=()):
+    """
+    Yield interesting Python script paths that have a name in
+    `interesting_names` by walking the `location` directory recursively up to
+    `max_depth` path segments extending from the root `location`.
+    """
+
+    location = location.rstrip(os.path.sep)
+    current_depth = max_depth
+    for top, _dirs, files in os.walk(location):
+        if current_depth == 0:
+            break
+        for f in files:
+            if f in interesting_names:
+                path = os.path.join(top, f)
+                logger_debug('get_module_scripts:', 'path', path)
+                yield path
+
+        current_depth -= 1
+
 
 # FIXME: use proper library for parsing these
 def parse_metadata(location):

It gets things mostly right for all of these:

certifi-2020.6.20.tar.gz
cffi-1.14.0.tar.gz
chardet-3.0.4.tar.gz
docutils-0.16.tar.gz
flit-2.3.0.tar.gz
flit_core-2.3.0.tar.gz
idna-2.9.tar.gz
master.zip
numpy-1.19.2.zip
paho-mqtt-1.5.0.tar.gz
pexpect-4.6.0.tar.gz
pycparser-2.20.tar.gz
pyserial-3.4.tar.gz
pytoml-0.1.21.tar.gz
requests-2.24.0.tar.gz
six-1.14.0.tar.gz
urllib3-1.25.9.tar.gz
wheel-0.34.2.tar.gz
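
One thing to check when testing the patch: `get_module_scripts` decrements its depth counter once per directory visited by `os.walk`, not once per directory level, so `max_depth=4` stops after four directories in total. Below is a sketch of a depth-based alternative, assuming that files directly under `location` count as depth 1. The `get_module_scripts_by_depth` name and the demo tree are hypothetical.

```python
import os
import tempfile


def get_module_scripts_by_depth(location, max_depth=1, interesting_names=()):
    """
    Yield paths of files named in `interesting_names` found under
    `location`, descending at most `max_depth` directory levels.
    Files directly in `location` are at depth 1.
    """
    if max_depth < 1:
        return
    location = location.rstrip(os.path.sep)
    base_depth = location.count(os.path.sep)
    for top, dirs, files in os.walk(location):
        depth = top.count(os.path.sep) - base_depth + 1
        if depth >= max_depth:
            # prune in place so os.walk does not descend any further
            dirs[:] = []
        else:
            dirs.sort()
        for name in sorted(files):
            if name in interesting_names:
                yield os.path.join(top, name)


names = ('__init__.py', '_version.py', 'version.py')
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'pkg', 'sub'))
    for segments in (
        ['version.py'],
        ['pkg', '__init__.py'],
        ['pkg', 'sub', '_version.py'],
    ):
        with open(os.path.join(root, *segments), 'w') as fp:
            fp.write('')
    found_by_depth = {
        depth: [
            os.path.relpath(path, root)
            for path in get_module_scripts_by_depth(root, depth, names)
        ]
        for depth in (1, 2, 3)
    }

print(found_by_depth)
```

Pruning `dirs` in place is the documented way to stop a top-down `os.walk` from descending into subdirectories.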

pombredanne added a commit that referenced this issue Oct 1, 2020
* use __version__ and related conventions to improve license detection
  in setup.py scripts. Most detectable version that are fetched from an
  attribute are now detected. Some complex cases cannot be detected as
  they do not use conventions.
* also add a "Party" when only the email is present.

Signed-off-by: Philippe Ombredanne <[email protected]>
pombredanne added a commit that referenced this issue Oct 2, 2020
Improve setup.py version collection #2263