Commit

Merge branches 'fix-values-module-indent', 'fix-readme-usage-snippet', 'optimize-any-value-parsing-1', 'fix-readme-usage-snippet-syntax-error', 'fix-setup-comment', 'improve-readme-text', 'optimize-some-if-statements', 'fix-formatter-issue-1', 'fix-formatter-issue-2', 'fix-formatter-issue-3', 'rewrite-https-hyperlinks', and 'fix-selectors-parsing'
amn committed Sep 30, 2024
13 parents 3fd2325 + 94f1ac9 + 7680929 + 2f5275e + 62b56e3 + 0d1923d + 83d3b56 + 29a482e + 28ab7dc + 27d21ca + 1b79b9b + b0402c2 + 9e4ce7b commit 7ffb99f
Showing 7 changed files with 246 additions and 229 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -29,8 +29,8 @@ It should go without saying that whether you choose to install the package with
The code snippet below demonstrates obtaining a _parse tree_ (in the `stylesheet` variable) by parsing the file `example.css`:

```python
-from csspring.parsing import normalize_input, parse_stylesheet
-stylesheet = parse_stylesheet(normalize_input(open('example.css', newline=''))))) # The `newline=''` argument prevents default re-writing of newline sequences in input — per the CSS Syntax spec., parsing does filtering of newline sequences so no rewriting by `open` is necessary or desirable
+from csspring.parsing import parse_stylesheet
+stylesheet = parse_stylesheet(open('example.css', newline='')) # The `newline=''` argument prevents default re-writing of newline sequences in input — per the CSS Syntax spec., parsing does filtering of newline sequences so no rewriting by `open` is necessary or desirable
```
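
As a side note on the `newline=''` argument in the snippet above, the following is a minimal sketch using only the Python standard library (it does not touch `csspring`; the temporary file and its contents are made up for illustration). It shows that the default text mode rewrites `\r\n` to `\n`, while `newline=''` leaves newline sequences as they are in the file:

```python
import os
import tempfile

# Write a tiny stylesheet with Windows-style (CRLF) line endings, as raw bytes.
with tempfile.NamedTemporaryFile('wb', suffix='.css', delete=False) as file:
    file.write(b'a {\r\n  color: red;\r\n}\r\n')
    path = file.name

try:
    with open(path) as translated:  # default: newline sequences are translated to '\n'
        default_text = translated.read()
    with open(path, newline='') as untranslated:  # newline sequences are passed through untouched
        raw_text = untranslated.read()
finally:
    os.remove(path)

print(repr(default_text))  # 'a {\n  color: red;\n}\n'
print(repr(raw_text))      # 'a {\r\n  color: red;\r\n}\r\n'
```

A parser that aims to preserve its input as-is would presumably want the second form, which matches the reasoning given in the README comment.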

## Documentation
@@ -71,15 +71,15 @@ Parsing is offered only in the form of Python modules — no "command-line" prog

### Why?

-We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application, a software _library_ in the classical sense of the term, or a true _API_ if you will.
+We wanted a "transparent" CSS parser — one that could be used in different configurations without it imposing limitations that would strictly speaking go beyond parsing. Put differently, we wanted a parser that does not assume any particular application a software _library_ in the classical sense of the term, or a true _API_ if you will.

For instance, the popular [Less](http://lesscss.org) software seems to rather effortlessly parse CSS [3] text, but it invariably re-arranges white-space in the output, without giving the user any control over the latter. Less is not _transparent_ like that — there is no way to use it with recovery of the originally parsed text from the parse tree — parsing with Less is a one-way street for at least _some_ applications (specifically those that "transform" CSS but need to preserve all of the original input as-is).

In comparison, this library was written to preserve _all_ input, _as-is_. This became one of the requirements defining the library, contributing to its _raison d'être_.

### Why Python?

-As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up" - if it ever adopts a top layer exposing its features with a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so in the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools) the utility of the parser is tightly bound to the capabilities of e.g. the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of a C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all — its output, the parse tree, is normally fed to another component in a larger application. Python is currently ubiquitous and attractive looking at a set of metrics that are relevant here. The collective amount of Python code is currently growing steadily, which drives adoption, which makes the prospect of offering CSS parsing written in specifically Python ever more enticing.
+As touched upon in [the disclaimer above](#disclaimer), the parser was written "from the bottom up" - if it ever adopts a top layer exposing its features with a "command line" tool, said layer will invariably have to tap into the rest of it, the library, and so in the very least a library is offered. Without a command-line tool (implying switches and other facilities commonly associated with command-line tools) the utility of the parser is tightly bound to the capabilities of e.g. the programming language it was written in, since the language effectively functions as the interface to the library (you can hardly use a library offered in the form of a C code without a C compiler and/or a dynamic linker). A parser is seldom used in isolation, after all — its output, the parse tree, is normally fed to another component in a larger application. Python is ubiquitous and attractive on a number of metrics relevant to us. The collective amount of Python code is growing steadily, which drives adoption, both becoming factors for choosing to offer CSS parsing written in specifically Python.

Another factor for choosing Python was the fact we couldn't find any _sufficiently capable_ CSS parsing libraries written specifically as [reusable] Python module(s). While there _are_ a few CSS parsing libraries available, none declared compliance with, or de facto supported, CSS 3 (including features like nested rules). In comparison, this library was written in close alignment with CSS 3 standard specification(s) (see [the compliance declaration](#compliance)).

26 changes: 25 additions & 1 deletion expand/csspring/syntax/tokenizing.py
@@ -9,6 +9,7 @@
from ..utils import CP, BufferedPeekingReader, is_surrogate_code_point_ordinal, IteratorReader, join, parser_error, PeekingUnreadingReader

from abc import ABC
+import builtins
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass
from decimal import Decimal
@@ -46,7 +47,7 @@ def next(n: int) -> str:
def consume(n: int) -> None:
    """Consume the next code point from the stream.
-    Consuming removes a [filtered] code point from the stream. If no code points are available for consumption (the stream is "exhausted"), an empty string signifying the so-called EOF ("end of file", see https://drafts.csswg.org/css-syntax/#eof-code-point) value, is consumed instead.
+    Consuming removes a [filtered] code point from the stream. If no code points are available for consumption (the stream is "exhausted"), an empty string signifying the so-called EOF ("end of file", see http://drafts.csswg.org/css-syntax/#eof-code-point) value, is consumed instead.
    """
    nonlocal consumed # required for the `+=` to work for mutable non-locals like lists (despite the fact that the equivalent `extend` does _not_ require the statement)
    consumed += input.read(n) or [ FilteredCodePoint('', source='') ]
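
The comment on `nonlocal consumed` and the EOF fallback in `consume` can be illustrated in isolation. The following is a minimal sketch, not the tokenizer's actual code: `make_reader`, both inner functions, and the use of plain strings in place of `FilteredCodePoint` objects are assumptions made for illustration only.

```python
def make_reader(source: list[str]):
    consumed: list[str] = []

    def consume_with_extend(n: int) -> None:
        # No `nonlocal` needed: `extend` mutates the list in place and never rebinds the name.
        consumed.extend(source[:n] or [''])
        del source[:n]

    def consume_with_augmented_assignment(n: int) -> None:
        # `+=` is an assignment, so without `nonlocal` the name would be treated as
        # local to this function and the statement would raise UnboundLocalError.
        nonlocal consumed
        consumed += source[:n] or ['']  # the empty string stands in for EOF when nothing is left
        del source[:n]

    return consumed, consume_with_extend, consume_with_augmented_assignment

consumed, by_extend, by_augmented = make_reader(['a', 'b'])
by_extend(1)     # consumes 'a'
by_augmented(1)  # consumes 'b'
by_augmented(1)  # source exhausted: the EOF stand-in is consumed instead
print(consumed)  # ['a', 'b', '']
```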
@@ -502,3 +503,26 @@ def is_non_printable_code_point(cp: CP) -> bool:
def is_whitespace(cp: CP) -> bool:
    """See http://drafts.csswg.org/css-syntax/#whitespace."""
    return is_newline(cp) or cp in ('\t', ' ')
+
+# Map of values by token type, for types of tokens which do _not_ have the `value` attribute
+token_values = { # For the `token_value` procedure to work as intended, subtypes should be listed _before_ their supertype(s)
+    OpenBraceToken: '{',
+    OpenBracketToken: '[',
+    OpenParenToken: '(',
+    CloseBraceToken: '}',
+    CloseBracketToken: ']',
+    CloseParenToken: ')',
+    ColonToken: ':',
+    CommaToken: ',',
+    SemicolonToken: ';',
+    CDCToken: '-->',
+    CDOToken: '<!--',
+}
+
+def token_value(type: builtins.type[Token]) -> str:
+    """Get the value of a token by its type for types of tokens that do _not_ feature a `value` attribute.
+    :param type: Type of token to get the value of
+    :returns: The value common to the type of tokens
+    """
+    return next(value for key, value in token_values.items() if issubclass(type, key))
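
To see why the comment on `token_values` asks for subtypes to be listed before their supertypes, here is a minimal, self-contained sketch of the same lookup technique. The classes below are stand-ins rather than the library's real token classes, and `FancyOpenBraceToken` with its `'{{'` value is purely hypothetical. The parameter is named `token_type` here to avoid shadowing the built-in `type`, which the diff works around by importing `builtins`.

```python
class Token: ...
class OpenBraceToken(Token): ...
class CloseBraceToken(Token): ...
class FancyOpenBraceToken(OpenBraceToken): ...  # hypothetical subtype, for illustration only

# Dictionaries preserve insertion order, so the first matching key wins.
token_values = {
    FancyOpenBraceToken: '{{',  # hypothetical value; listed before its supertype on purpose
    OpenBraceToken: '{',
    CloseBraceToken: '}',
}

def token_value(token_type: type[Token]) -> str:
    # Return the value of the first entry whose key is `token_type` or one of its supertypes.
    return next(value for key, value in token_values.items() if issubclass(token_type, key))

print(token_value(FancyOpenBraceToken))  # {{
print(token_value(OpenBraceToken))       # {
print(token_value(CloseBraceToken))      # }
```

If `OpenBraceToken` were listed first, `token_value(FancyOpenBraceToken)` would return `'{'`, since `issubclass` also matches supertypes. Note as well that `next` without a default raises `StopIteration` for a type with no matching entry, such as `Token` itself in this sketch.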
2 changes: 1 addition & 1 deletion setup.py
@@ -22,6 +22,6 @@ def run(self, *args, **kwargs) -> None:
        subprocess.check_call(('make', '-C', self.build_lib, '-f', os.path.realpath('Makefile')))

class BuildCommand(setuptools.command.build.build):
-    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands # Makes the `build_make` command a sub-command of the `build_command`, which has the effect of the former being invoked when the latter is invoked (which is invoked in turn when the wheel must be built, through the `bdist_wheel` command)
+    sub_commands = [ ('build_make', None) ] + setuptools.command.build.build.sub_commands # Makes the `build_make` command a sub-command of the `build` command, which has the effect of the former being invoked when the latter is invoked (which is invoked in turn when the wheel must be built, through the `bdist_wheel` command)

setup(cmdclass={ 'build': BuildCommand, 'build_make': MakeCommand })