# Integration of pyflakes-enhanced-AST into coala

| Metadata | |
| -------- | ----------------------------------------------- |
| cEP      | 25                                               |
| Version | 1.0 |
| Title | Integration of pyflakes-enhanced-AST into coala |
| Authors | Ankit Joshi <mailto:[email protected]> |
| Status | Proposed |
| Type | Feature |

## Abstract

This document describes how to create a metabear to integrate
pyflakes-enhanced-AST into coala and demonstrates its use by
creating a simple dependent bear.

## Introduction

Flake8 is the most commonly used linter framework for static code
analysis in Python. It wraps the pyflakes, pycodestyle and mccabe
tools. These tools are implemented against different interfaces, and
flake8 combines their capabilities behind a single command.

Flake8 also provides a plugin mechanism. Additional linters are
supported through the flake8 plugin system and may be based on the
Python AST or built using pycodestyle's internal logical-line-based
checker API (see [Hacking](https://github.com/openstack-dev/hacking)).

However, there has been no option to use the pyflakes-enhanced-AST
based checker API to create custom linters. Thus, the full potential of
the enhanced AST is not utilized: a lot of rework is required just to
do the basic traversal and collection of important nodes. Pyflakes
already provides a basic API that performs this traversal, so a
developer using the enhanced AST only needs to implement the new logic
of the plugin and does not have to worry about the fidelity of the
basic node handlers.

The reason for choosing pyflakes over equivalents such as
[Astroid](https://github.com/PyCQA/astroid) lies in its simplicity and
the large amount of re-usable analysis it provides. Astroid is more
detailed and powerful, which also makes it more complicated to use.
In addition, there is a much larger ecosystem of flake8 plugins.

## Proposed Change

Here is a brief overview of the architecture:

1. There will be a separate repository named `coala-pyflakes` which will
be installable via `pip install pyflakes-bears`.
2. The repository will contain a new package `pyflakes-bears` which
will house all bears that use the pyflakes-enhanced-AST.
3. `pyflakes-bears` will contain a metabear `PyFlakesASTBear`
which will be used by all other plugin bears.
4. The repository will also contain another package `pyflakes-generic-plugins`
that will be independent of coala and can be run using flake8.

## Management of the new `coala-pyflakes` repository

The repository will use a mechanism similar to the one used in
`coala-bears` to test all its bears.
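
As an illustration, a bear without dependencies could be tested the way
`coala-bears` does it, using coala's `verify_local_bear` helper. This is
a minimal sketch: `SomeSimpleBear` and its import path are placeholders,
and bears that declare `BEAR_DEPS` need the richer
`LocalBearTestHelper` based setup instead:

```python
from coalib.testing.LocalBearTestHelper import verify_local_bear

# SomeSimpleBear stands in for any pyflakes-bears bear without
# dependencies; the module path is hypothetical.
from pyflakes_bears.SomeSimpleBear import SomeSimpleBear

good_file = 'import os\nprint(os.getcwd())\n'
bad_file = 'from __future__ import division\n'

# verify_local_bear builds a unittest.TestCase that runs the bear over
# each file and asserts that only the invalid files yield results.
SomeSimpleBearTest = verify_local_bear(SomeSimpleBear,
                                       valid_files=(good_file,),
                                       invalid_files=(bad_file,))
```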

## Implementation of PyFlakesASTBear

1. PyFlakesASTBear will be a LocalBear returning results of type
HiddenResult, since these results are not meant for the user but
for the dependent bears. The result structure is described by the
PyFlakesResult class.
2. PyFlakesResult consists of a snapshot of the module at class,
function, module, generator, and doctest level, thus providing
the developer multiple views of a file and allowing them to retrieve
results as needed.
3. Apart from scope information, the metabear also returns the list of
messages generated by pyflakes itself. These messages can be used
by the developer to add an additional filter layer that the user must
resolve before the plugin executes. This is primarily done because
pyflakes detects additional syntax errors which are not caught by the
Python interpreter and may affect the working of a plugin. This check
is optional and may or may not be incorporated by the developer.
4. The helper function `get_node_location` returns the location of a
node in the source file.
5. The helper function `get_nodes` yields all the nodes of a
particular type present in a scope.

Here is a prototype of the implementation of PyFlakesASTBear:

```python
# Imports omitted for brevity

class PyFlakesResult(HiddenResult):

    def __init__(self, origin, deadScopes, pyflakes_messages):

        Result.__init__(self, origin, message='')

        self.module_scope = self.get_scopes(ModuleScope, deadScopes)
        self.class_scopes = self.get_scopes(ClassScope, deadScopes)
        self.function_scopes = self.get_scopes(FunctionScope, deadScopes)
        self.generator_scopes = self.get_scopes(GeneratorScope, deadScopes)
        self.doctest_scopes = self.get_scopes(DoctestScope, deadScopes)
        self.pyflakes_messages = pyflakes_messages

    def get_scopes(self, scope_type, scopes):
        return list(filter(lambda scope: isinstance(scope, scope_type),
                           scopes))

    def get_node_location(self, node):
        return (node.source.lineno,
                node.source.lineno + node.source.depth - 1,
                node.source.col_offset)

    def get_nodes(self, scope, node_type):
        for _, node in scope.items():
            if isinstance(node, node_type):
                yield node


class PyFlakesASTBear(LocalBear):
    AUTHORS = {'The coala developers'}
    AUTHORS_EMAILS = {'[email protected]'}
    LICENSE = 'AGPL-3.0'

    def run(self, filename, file):
        """
        Generates the pyflakes-enhanced-AST for the input file and
        returns the dead scopes as a HiddenResult.

        :return:
            One HiddenResult containing a dictionary with keys being
            type of scope and values being a list of scopes generated
            from the file.
        """
        tree = ast.parse(''.join(file))
        result = Checker(tree, filename=filename, withDoctest=True)

        yield PyFlakesResult(self, result.deadScopes, result.messages)
```

## Demonstrating the use of PyFlakesASTBear by creating a simple bear

NoFutureImportBear provides a hassle-free solution to the problem of
finding all the `__future__` imports present in the source code.

Here is a prototype of NoFutureImportBear, which uses PyFlakesASTBear:

```python
# Imports omitted for brevity

class NoFutureImportBear(LocalBear):
    """
    Uses the pyflakes-enhanced-AST to detect use of __future__ imports.
    """
    BEAR_DEPS = {PyFlakesASTBear}

    def run(self, filename, file,
            dependency_results=dict()
            ):
        for result in dependency_results.get(PyFlakesASTBear.name, []):
            for node in result.get_nodes(result.module_scope[0],
                                         FutureImportation):
                yield Result.from_values(
                    origin=self,
                    message='Future import %s found' % node.name,
                    file=filename,
                    line=result.get_node_location(node)[0])
```

## Demonstrating the use of PyFlakesASTBear by creating a fixing bear

The same bear can be extended into a hassle-free solution to the problem
of removing all the `__future__` imports present in the source code.

Here is a prototype of the fixing version of NoFutureImportBear, which
uses PyFlakesASTBear:

```python
# Imports omitted for brevity

class NoFutureImportBear(LocalBear):
    """
    Uses the pyflakes-enhanced-AST to remove __future__ imports.
    """
    BEAR_DEPS = {PyFlakesASTBear}

    def remove_future_imports(self, file, lineno,
                              max_col_offset, affected_lines):
        """
        Removes all __future__ imports from the input line.
        """
        diff = Diff(file)
        line = file[lineno - 1]
        # If the line ends with a \ line continuation
        if line.rstrip()[-1] == '\\':
            next_line = file[lineno]
            semicolon_index = next_line.find(';')
            diff.delete_line(lineno)
            if semicolon_index == -1:
                diff.delete_line(lineno + 1)
            else:
                replacement = next_line[semicolon_index + 1:].lstrip()
                diff.modify_line(lineno + 1, replacement)
        else:
            semicolon_indices = [i for i, a in enumerate(line) if a == ';']
            if len(semicolon_indices) == 0:
                diff.delete_line(lineno)
            else:
                for index in semicolon_indices:
                    if index > max_col_offset:
                        replacement = line[index + 1:].lstrip()
                        diff.modify_line(lineno, replacement)
                        break

        return diff

    def run(self, filename, file,
            dependency_results=dict()
            ):
        """
        Uses PyFlakesASTBear to get only FutureImportation nodes.
        """
        # Preprocess all the lines containing __future__ imports
        affected_lines = set()
        # Get the maximum column offset of FutureImportation for a line
        max_column_for_a_line = dict()
        for result in dependency_results.get(PyFlakesASTBear.name, []):
            for node in result.get_nodes(result.module_scope[0],
                                         FutureImportation):
                lineno = result.get_node_location(node)[0]
                col_offset = result.get_node_location(node)[2]
                if lineno in max_column_for_a_line:
                    max_column_for_a_line[lineno] = max(
                        max_column_for_a_line[lineno],
                        col_offset)
                else:
                    max_column_for_a_line[lineno] = col_offset
                affected_lines.add(lineno)

        for line in affected_lines:
            corrected = self.remove_future_imports(
                file, line, max_column_for_a_line[line],
                affected_lines
            )
            yield Result.from_values(
                origin=self,
                message='Future import(s) found',
                file=filename,
                diffs={filename: corrected},
                line=line)
```

## A generic implementation for detecting `__future__` imports

This implementation does not require coala to be installed and is
designed so that it can be used with other tools like flake8. It will
be housed in `pyflakes-generic-plugins`.

Here is a prototype for NoFutureImport, which uses the pyflakes AST:

```python
# Imports omitted for brevity
__version__ = '0.1'

CODE = 'F482'


class NoFutureImport(object):
    name = 'no_future'
    version = __version__

    def __init__(self, tree, filename):
        self.tree = tree
        self.filename = filename
        self._checker = None
        self._module_scope = None

    @property
    def checker(self):
        if self._checker is None:
            self._checker = Checker(self.tree, self.filename)
        return self._checker

    @checker.setter
    def checker(self, checker):
        self._checker = checker

    @property
    def module_scope(self):
        if self._module_scope is None:
            self._module_scope = list(
                filter(lambda scope: isinstance(scope, ModuleScope),
                       self.checker.deadScopes))[0]
        return self._module_scope

    def set_pyflakes_data(self, checker):
        self.checker = checker

    def run(self):
        for _, node in self.module_scope.items():
            if isinstance(node, FutureImportation):
                message = ('{code}: Future import {name} found'
                           .format(name=node.name,
                                   code=CODE))
                yield (node.source.lineno, node.source.col_offset,
                       message, NoFutureImport)
```
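
For flake8 to pick up the plugin, it has to be registered as a
setuptools entry point in the `flake8.extension` group. Below is a
minimal sketch of how this could look for the
`pyflakes-generic-plugins` package; the module path
`pyflakes_generic_plugins.no_future` is an assumption, not a decided
layout:

```python
# setup.py of pyflakes-generic-plugins (sketch, hypothetical layout)
from setuptools import setup, find_packages

setup(
    name='pyflakes-generic-plugins',
    version='0.1',
    packages=find_packages(),
    install_requires=['flake8', 'pyflakes'],
    entry_points={
        'flake8.extension': [
            # flake8 maps the error code prefix to the plugin class.
            'F482 = pyflakes_generic_plugins.no_future:NoFutureImport',
        ],
    },
)
```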
