Merge branches 'fix-values-module-indent', 'fix-readme-usage-snippet', 'optimize-any-value-parsing-1', 'fix-readme-usage-snippet-syntax-error', 'fix-setup-comment', 'improve-readme-text', 'optimize-some-if-statements', 'fix-formatter-issue-1', 'fix-formatter-issue-2', 'fix-formatter-issue-3', 'rewrite-https-hyperlinks', and 'fix-selectors-parsing'
amn · Sep 30, 2024 · 7ffb99f
13 parents 3fd2325 + 94f1ac9 + 7680929 + 2f5275e + 62b56e3 + 0d1923d + 83d3b56 + 29a482e + 28ab7dc + 27d21ca + 1b79b9b + b0402c2 + 9e4ce7b
commit 7ffb99f
Showing 7 changed files with 246 additions and 229 deletions.
diff --git a/README.md b/README.md
@@ -29,8 +29,8 @@ It should go without saying that whether you choose to install the package with
 The code snippet below demonstrates obtaining of a _parse tree_ (in the `stylesheet` variable) by parsing the file `example.css`:
 
 ```python
-from csspring.parsing import normalize_input, parse_stylesheet
-stylesheet = parse_stylesheet(normalize_input(open('example.css', newline=''))))) # The `newline=''` argument prevents default re-writing of newline sequences in input — per the CSS Syntax spec., parsing does filtering of newline sequences so no rewriting by `open` is necessary or desirable
+from csspring.parsing import parse_stylesheet
+stylesheet = parse_stylesheet(open('example.css', newline='')) # The `newline=''` argument prevents default re-writing of newline sequences in input — per the CSS Syntax spec., parsing does filtering of newline sequences so no rewriting by `open` is necessary or desirable
 ```
 
 ## Documentation
@@ -71,15 +71,15 @@ Parsing is offered only in the form of Python modules — no "command-line" prog
 
 ### Why?
 
-We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application, a software _library_ in the classical sense of the term, or a true _API_ if you will.
+We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application — a software _library_ in the classical sense of the term, or a true _API_ if you will.
 
 For instance, the popular [Less](http://lesscss.org) software seems to rather effortlessly parse CSS [3] text, but it invariably re-arranges white-space in the output, without giving the user any control over the latter. Less is not _transparent_ like that — there is no way to use it with recovery of the originally parsed text from the parse tree — parsing with Less is a one-way street for at least _some_ applications (specifically those that "transform" CSS but need to preserve all of the original input as-is).
 
 In comparison, this library was written to preserve _all_ input, _as-is_. This became one of the requirements defining the library, contributing to its _raison d'être_.
 
 ### Why Python?
 
-As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up" - if it ever adopts a top layer exposing its features with a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so in the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools) the utility of the parser is tightly bound to the capabilities of e.g. the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of a C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all — its output, the parse tree, is normally fed to another component in a larger application. Python is currently ubiquitous and attractive looking at a set of metrics that are relevant here. The collective amount of Python code is currently growing steadily, which drives adoption, which makes the prospect of offering CSS parsing written in specifically Python ever more enticing.
+As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up" - if it ever adopts a top layer exposing its features with a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so in the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools) the utility of the parser is tightly bound to the capabilities of e.g. the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of a C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all — its output, the parse tree, is normally fed to another component in a larger application. Python is ubiquitous and attractive on a number of metrics relevant to us. The collective amount of Python code is growing steadily, which drives adoption, both becoming factors for choosing to offer CSS parsing written in specifically Python.
 
 Another factor for choosing Python was the fact we couldn't find any _sufficiently capable_ CSS parsing libraries written specifically as [reusable] Python module(s). While there _are_ a few CSS parsing libraries available, none declared compliance with or de-facto support CSS 3 (including features like nested rules etc). In comparison, this library was written in close alignment with CSS 3 standard specification(s) (see [the compliance declaration](#compliance)).
 

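Note on the README snippet above: the comment about `newline=''` can be checked with the standard library alone. The following is a minimal sketch, not part of the commit; the `read_text` helper and the temporary file are hypothetical and exist only for illustration. It shows that the built-in `open` translates `\r\n` to `\n` by default when reading, and returns newline sequences untouched when `newline=''` is passed, which is the behaviour the snippet relies on.

```python
import os
import tempfile

def read_text(path: str, **open_kwargs) -> str:
    """Read a whole file as text, forwarding keyword arguments to `open` (hypothetical helper)."""
    with open(path, **open_kwargs) as file:
        return file.read()

# Write a tiny stylesheet with CRLF line endings, in binary mode so the bytes are exact
with tempfile.NamedTemporaryFile('wb', suffix='.css', delete=False) as file:
    file.write(b'a { color: red }\r\nb { color: blue }\r\n')
    path = file.name

try:
    translated = read_text(path)             # default mode: every '\r\n' is rewritten to '\n'
    untouched = read_text(path, newline='')  # newline sequences are returned as written
finally:
    os.remove(path)
```

With `newline=''`, the parser sees the same newline sequences that are stored in the file, so it can do the filtering itself as the README comment describes.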
diff --git a/expand/csspring/syntax/tokenizing.py b/expand/csspring/syntax/tokenizing.py
@@ -9,6 +9,7 @@
 from ..utils import CP, BufferedPeekingReader, is_surrogate_code_point_ordinal, IteratorReader, join, parser_error, PeekingUnreadingReader
 
 from abc import ABC
+import builtins
 from collections.abc import Callable, Iterable, Iterator
 from dataclasses import dataclass
 from decimal import Decimal
@@ -46,7 +47,7 @@ def next(n: int) -> str:
     def consume(n: int) -> None:
         """Consume the next code point from the stream.
 
-        Consuming removes a [filtered] code point from the stream. If no code points are available for consumption (the stream is "exhausted"), an empty string signifying the so-called EOF ("end of file", see https://drafts.csswg.org/css-syntax/#eof-code-point) value, is consumed instead.
+        Consuming removes a [filtered] code point from the stream. If no code points are available for consumption (the stream is "exhausted"), an empty string signifying the so-called EOF ("end of file", see http://drafts.csswg.org/css-syntax/#eof-code-point) value, is consumed instead.
         """
         nonlocal consumed # required for the `+=` to work for mutable non-locals like lists (despite the fact that the equivalent `extend` does _not_ require the statement)
         consumed += input.read(n) or [ FilteredCodePoint('', source='') ]
@@ -502,3 +503,26 @@ def is_non_printable_code_point(cp: CP) -> bool:
 def is_whitespace(cp: CP) -> bool:
     """See http://drafts.csswg.org/css-syntax/#whitespace."""
     return is_newline(cp) or cp in ('\t', ' ')
+
+# Map of values by token type, for types of tokens which do _not_ have the `value` attribute
+token_values = { # For the `token_value` procedure to work as intended, subtypes should be listed _before_ their supertype(s)
+    OpenBraceToken: '{',
+    OpenBracketToken: '[',
+    OpenParenToken: '(',
+    CloseBraceToken: '}',
+    CloseBracketToken: ']',
+    CloseParenToken: ')',
+    ColonToken: ':',
+    CommaToken: ',',
+    SemicolonToken: ';',
+    CDCToken: '-->',
+    CDOToken: '<!--',
+}
+
+def token_value(type: builtins.type[Token]) -> str:
+    """Get the value of a token by its type for types of tokens that do _not_ feature a `value` attribute.
+
+    :param type: Type of token to get the value of
+    :returns: The value common to the type of tokens
+    """
+    return next(value for key, value in token_values.items() if issubclass(type, key))
diff --git a/setup.py b/setup.py
@@ -22,6 +22,6 @@ def run(self, *args, **kwargs) -> None:
         subprocess.check_call(('make', '-C', self.build_lib, '-f', os.path.realpath('Makefile')))
 
 class BuildCommand(setuptools.command.build.build):
-    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands # Makes the `build_make` command a sub-command of the `build_command`, which has the effect of the former being invoked when the latter is invoked (which is invoked in turn when the wheel must be built, through the `bdist_wheel` command)
+    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands # Makes the `build_make` command a sub-command of the `build` command, which has the effect of the former being invoked when the latter is invoked (which is invoked in turn when the wheel must be built, through the `bdist_wheel` command)
 
 setup(cmdclass={ 'build': BuildCommand, 'build_make': MakeCommand })
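
Note on the `token_value` lookup added in `tokenizing.py`: the comment on `token_values` says subtypes should be listed before their supertypes. The sketch below suggests why, under stated assumptions. The token classes here are empty stand-ins, not the library's real classes, and `WideCommaToken` is a hypothetical subtype invented for illustration. Because the lookup returns the first entry whose key passes `issubclass`, and a `dict` iterates in insertion order, a supertype listed first would shadow its subtypes.

```python
import builtins

# Stand-in token classes (assumptions for illustration, not the library's definitions)
class Token: pass
class CommaToken(Token): pass
class SemicolonToken(Token): pass
class WideCommaToken(CommaToken): pass  # hypothetical subtype of `CommaToken`

token_values = {
    WideCommaToken: '\uff0c',  # subtype listed first, so it wins over `CommaToken`
    CommaToken: ',',
    SemicolonToken: ';',
}

def token_value(type: builtins.type[Token]) -> str:
    """Return the value of the first entry whose key is `type` or one of its supertypes."""
    # The parameter shadows the built-in `type`, hence `builtins.type` in the annotation
    return next(value for key, value in token_values.items() if issubclass(type, key))

subtype_value = token_value(WideCommaToken)
supertype_value = token_value(CommaToken)
```

In this sketch, a type with no matching entry (for example the bare `Token`) makes `next` raise `StopIteration`, since no default is passed to it.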