Sorting Turtle output? #1890
Comments
Would be happy with such an option, and it would be useful with other formats also. My only concern is that our current approach for adding options (i.e. using `**kwargs`) is a bit ad hoc. It would be nice to consider an option that provides validation and type safety somehow.
Assuming you mean validation of URIs, informed opinion on Stack Overflow is that URI validation is best left to a dedicated library, which would suggest the necessity of acquiring a dependency, but I guess that can be handled via the pip
No, I mean something that provides a way to validate the options independently, maybe something like `plugins.get("json-ld", Serializer).make_opts(...) -> JSONLDOptions`, and then make it possible to pass serializer options into `serialize`. Not entirely sure it is that critical, though.
Ah, so. Yes, that would address the needs of those wishing to exert more control over the JSONLD serializer.
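The options-object idea floated above could look roughly like the following. This is only a sketch: `TurtleOptions`, its `sort` and `indent` fields, and this `make_opts` helper are hypothetical names for illustration and are not part of rdflib.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class TurtleOptions:
    """Hypothetical option set for a Turtle serializer (not an rdflib class)."""

    sort: bool = False
    indent: int = 4

    def __post_init__(self) -> None:
        # Dataclasses do not enforce annotations at runtime, so check by hand.
        if not isinstance(self.sort, bool):
            raise TypeError("sort must be a bool")
        # bool is a subclass of int, so reject it explicitly for indent.
        if isinstance(self.indent, bool) or not isinstance(self.indent, int):
            raise TypeError("indent must be an int")
        if self.indent < 0:
            raise ValueError("indent must be non-negative")


def make_opts(**kwargs: Any) -> TurtleOptions:
    """Build options from loose keyword arguments, rejecting unknown names."""
    known = {f.name for f in fields(TurtleOptions)}
    unknown = sorted(set(kwargs) - known)
    if unknown:
        raise TypeError(f"unknown serializer option(s): {', '.join(unknown)}")
    return TurtleOptions(**kwargs)
```

With something like this, a misspelled option such as `make_opts(sorted=True)` fails immediately with a `TypeError` instead of being silently swallowed by `**kwargs`.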
Interesting. I had thought that typing for keyword arguments would be compatible with a

The following program runs fine through Python, print

    #!/usr/bin/env python3
    # This software was developed at the National Institute of Standards
    # and Technology by employees of the Federal Government in the course
    # of their official duties. Pursuant to title 17 Section 105 of the
    # United States Code this software is not subject to copyright
    # protection and is in the public domain. NIST assumes no
    # responsibility whatsoever for its use by other parties, and makes
    # no guarantees, expressed or implied, about its quality,
    # reliability, or any other characteristic.
    #
    # We would appreciate acknowledgement if the software is used.
    from typing import Any


    def g(x: int, *args: Any, y: str = "", **kwargs: Any) -> None:
        if y == "a":
            print(x)


    def f(x: int, *args: Any, **kwargs: Any) -> None:
        g(x, *args, **kwargs)


    f(1)
    f(2, True)
    f(3, True, y="a")
    f(4, True, y=True)
    f(5, True, y=77)
    f(6, True, y=b"c")

If another keyword parameter is added to

    def f(x: int, *args: Any, z: str = "", **kwargs: Any) -> None:
        g(x, *args, **kwargs)

and adding this last call for

    f(7, True, y=b"c", z=4444)

Summarizing for my original request: if the "front"-most

Or, I'm totally missing something else with how the serializer plugin framework handles things. Please do note if that's so, I haven't reviewed that code in depth.
This patch adds a test to start specifying what sorting Turtle output would look like. This is intended to start discussion about expectations of blank node sorting, and to set an initial interface for triggering sorted output with a propagated keyword argument in `Graph.serialize()`. This patch will fail CI, but should not fail for code-style reasons. The new test script was reviewed with black, flake8, isort, and mypy (--strict). References: * RDFLib#1890 Signed-off-by: Alex Nelson <[email protected]>
Some work I'm doing would benefit from seeing this implemented. So, to resume this conversation, I've filed PR 1978. I have a general question on the API for I've personally found it difficult to understand when arguments in functions are specified as positional with defaults, versus being keyword arguments, when there isn't a In case the motivation of this question is unclear, Patch 2 in PR 1978 tries adding

The PR also leaves open some questions about what would be expected where sort order matters. Blank node sorting seems to be a tricky topic that may involve recursion to solve. Prefixes that sort differently from expanded IRIs aren't in the unit test, but will also be worth clarifying with the hard-coded graph serialization. Feedback is welcome.
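To make the blank-node concern concrete, here is a minimal sketch of one possible ordering rule. It is not what PR 1978 or any rdflib serializer does. It assumes triples are plain tuples of N-Triples-style strings, that blank nodes start with `_:`, and it only looks one level deep, so nested blank nodes are masked rather than recursed into. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def bnode_signatures(triples: List[Triple]) -> Dict[str, str]:
    """Describe each blank node by its outgoing predicate/object pairs."""
    outgoing: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if is_bnode(s):
            # Nested blank nodes are masked; a fuller solution would recurse.
            masked = "_:" if is_bnode(o) else o
            outgoing.setdefault(s, []).append(f"{p} {masked}")
    return {node: " ; ".join(sorted(pairs)) for node, pairs in outgoing.items()}


def sort_triples(triples: List[Triple]) -> List[Triple]:
    """Sort triples so the order does not depend on blank node labels."""
    sigs = bnode_signatures(triples)

    def term_key(term: str) -> Tuple[int, str]:
        # IRIs and literals sort first; blank nodes sort by their signature.
        if is_bnode(term):
            return (1, sigs.get(term, ""))
        return (0, term)

    return sorted(triples, key=lambda t: (term_key(t[0]), t[1], term_key(t[2])))
```

Two blank nodes with identical signatures still tie under this rule, and Python's stable sort then falls back to input order. That is exactly the kind of case where recursion or a canonicalization step would be needed.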
FYI you might be interested in https://github.com/tgbugs/pyontutils/tree/master/ttlser and https://github.com/tgbugs/pyontutils/blob/master/ttlser/docs/ttlser.md.
@tgbugs Thank you for the reference! I see it is in a many-utilities repository. Is your vision that that code be something you continue to maintain in your code base, or instead to be something that integrates into, or overrides, the associated PR 1978? I'm fine with not reliving the whole "Hack, hack, oh we need a spec." experience, and I knew at least blank node handling was going to need some degree of style spec.
@ajnelson-nist The spec is stable, and there are a bunch of tests that I have written to ensure sane behavior. I intend to upstream as much of the code in ttlser as possible. The implementation that I have is almost completely configurable using I can also say that there are many hidden gotchas that any new implementation of such functionality is likely to re-encounter.
This patch aligns the type signatures on `Serializer` subclasses, including renaming the arbitrary-keywords dictionary to always be `**kwargs`. This is in part to prepare for the possibility of adding `*args` as a positional-argument delimiter. References: * RDFLib#1890 (comment) Signed-off-by: Alex Nelson <[email protected]>
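As an illustration of the "`*args` as a positional-argument delimiter" idea in the patch description, the sketch below compares two hypothetical signatures. `serialize_loose` and `serialize_strict` are made-up names and are not rdflib's actual `Serializer.serialize` signature. `inspect.signature` reports how each parameter may be bound.

```python
import inspect
from typing import Any, Callable, Dict


def serialize_loose(destination: Any = None, sort: bool = False, **kwargs: Any) -> bool:
    # "sort" can be bound positionally here, e.g. serialize_loose(None, True).
    return sort


def serialize_strict(
    destination: Any = None, *args: Any, sort: bool = False, **kwargs: Any
) -> bool:
    # Everything after *args is keyword-only; extra positionals land in args.
    return sort


def kinds(func: Callable[..., Any]) -> Dict[str, str]:
    """Map each parameter name to the name of its inspect.Parameter kind."""
    return {
        name: param.kind.name
        for name, param in inspect.signature(func).parameters.items()
    }
```

Here `kinds(serialize_loose)["sort"]` is `POSITIONAL_OR_KEYWORD`, while `kinds(serialize_strict)["sort"]` is `KEYWORD_ONLY`. A call such as `serialize_strict(None, True)` leaves `sort` at its default, because the stray `True` is collected into `args`.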
* 7.1.1 post release (#2953)
* Fix Black formatting in ./admin/get_merged_prs.py (#2954)
* build(deps-dev): bump ruff from 0.7.0 to 0.7.1 (#2955) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.0...0.7.1) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ashley Sommer <[email protected]>
* Fix defined namespace warnings (#2964)
* Fix defined namespace warnings Current docs-generation tests are polluted by lots of warnings that occur when Sphinx tries to read various parts of DefinedNamespace.
* Fix tests that no longer need incorrect exceptions handled.
* fix black formatting in test file
* Undo typing changes, so this works on current pre-3.9 branch
* better handling for any/all double-underscore properties
* Don't include __slots__ in dir().
* test: earl test passing
* Annotate Serializer.serialize and descendants (#2970) This patch aligns the type signatures on `Serializer` subclasses, including renaming the arbitrary-keywords dictionary to always be `**kwargs`. This is in part to prepare for the possibility of adding `*args` as a positional-argument delimiter. References: * #1890 (comment) Signed-off-by: Alex Nelson <[email protected]>
* build(deps): bump orjson from 3.10.10 to 3.10.11 (#2966) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.10 to 3.10.11. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.10...3.10.11) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.7.1 to 0.7.2 (#2969) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.1 to 0.7.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.1...0.7.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.7.2 to 0.7.3 (#2979) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.2 to 0.7.3. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.2...0.7.3) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.7.3 to 0.8.0 (#2994) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.3 to 0.8.0. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.3...0.8.0) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump orjson from 3.10.11 to 3.10.12 (#2991) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.11 to 3.10.12. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.11...3.10.12) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* added Node as an exported name from the root package location. Updated linting commands section in the developer section to use ruff check. (#2981)
* build(deps-dev): bump wheel from 0.45.0 to 0.45.1 (#2992) Bumps [wheel](https://github.com/pypa/wheel) from 0.45.0 to 0.45.1. - [Release notes](https://github.com/pypa/wheel/releases) - [Changelog](https://github.com/pypa/wheel/blob/main/docs/news.rst) - [Commits](pypa/wheel@0.45.0...0.45.1) --- updated-dependencies: - dependency-name: wheel dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Nicholas Car <[email protected]>
* feat: sort longturtle blank nodes (#2997)
* feat: sort longturtle blank nodes in the object position by their cbd string
* fix: #2767
* build(deps-dev): bump pytest from 8.3.3 to 8.3.4 (#2999) Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.3 to 8.3.4. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@8.3.3...8.3.4) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump poetry from 1.8.4 to 1.8.5 (#3001) Bumps [poetry](https://github.com/python-poetry/poetry) from 1.8.4 to 1.8.5. - [Release notes](https://github.com/python-poetry/poetry/releases) - [Changelog](https://github.com/python-poetry/poetry/blob/1.8.5/CHANGELOG.md) - [Commits](python-poetry/poetry@1.8.4...1.8.5) --- updated-dependencies: - dependency-name: poetry dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.8.0 to 0.8.2 (#3003) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.0 to 0.8.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.0...0.8.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.8.2 to 0.8.3 (#3010) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.2 to 0.8.3. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.2...0.8.3) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump berkeleydb from 18.1.11 to 18.1.12 (#3009) Bumps [berkeleydb](https://www.jcea.es/programacion/pybsddb.htm) from 18.1.11 to 18.1.12. --- updated-dependencies: - dependency-name: berkeleydb dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> # Conflicts: # poetry.lock
* build(deps): bump orjson from 3.10.12 to 3.10.13 (#3018) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.12 to 3.10.13. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.12...3.10.13) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.8.4 to 0.8.6 (#3025) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.4 to 0.8.6. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.4...0.8.6) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort (#3008)
* feat: use the RGDA1 canonicalization algorithm + lexical n-triples sort to produce deterministic longturtle serialisation
* chore: normalise usage of format
* chore: apply black
* fix: double up of semicolons when subject is a blank node
* fix: lint
* jsonld: Do not merge nodes with different invalid URIs (#3011) When parsing JSON-LD with invalid URIs in the `@id`, the `generalized_rdf: True` option allows parsing these nodes as blank nodes instead of outright rejecting the document. However, all nodes with invalid URIs were mapped to the same blank node, resulting in incorrect data. For example, without this patch, the new test fails with:
```
AssertionError: Expected:
@prefix schema: <https://schema.org/> .
<https://example.org/root-object> schema:author [ schema:familyName "Doe" ; schema:givenName "Jane" ; schema:name "Jane Doe" ], [ schema:familyName "Doe" ; schema:givenName "John" ; schema:name "John Doe" ] .
Got:
@prefix schema: <https://schema.org/> .
<https://example.org/root-object> schema:author <> .
<> schema:familyName "Doe" ; schema:givenName "Jane", "John" ; schema:name "Jane Doe", "John Doe" .
```
* Fixed incorrect ASK behaviour for dataset with one element (#2989)
* Pass base uri to serializer when writing to file. (#2977) Co-authored-by: Nicholas Car <[email protected]>
* Dataset documentation improvements (#3012)
* example printout improvements
* added BN graph creation
* updated tests var names & added one subtest
* typos & improved formatting
* updated Graph & Dataset docco
* typo fix
* fix code-in-comment syntax
* fix code-in-comment syntax 2
* fix code-in-comment syntax - ellipses
* fix code-in-comment syntax - sort print loop output
* blacked
* ruff fixes
* Poetry 2.0.0 pyproject.toml file
* move to PEP621 (Poetry 2.0.0) pyproject.toml
* require poetry 2.0.0
* require poetry 2.0.0
* add in requirement for poetry-plugin-export
* change from --sync to sync command
* further pyproject.toml format updates
* add poetry plugin to requirements-poetry.in
* fix pre-commit poetry version to 2.0.0
* remove testing artifact
* update license to 2025
* add me to contributors
* remove outdated --check arg
* typo
* test add back in precommit args
* test remove precommit args
* match ruff version to pre-commit autoupdate PR #3026; add back in --check
* re-remove --check
* add David to CONTRIBUTORS
* ruff in pyproject.toml to match pre-commit
* updates for David's comments
* fix Dataset docc ReST formatting
* remove ConjunctiveGraph example; add Dataset example; add JSON-LS serialization example
* Add RDFLib Path to SHACL path utility and corresponding tests (#2990)
* shacl path parser: Add additional test case
* shacl utilities: Add new SHACL path building utility with corresponding tests --------- Co-authored-by: Nicholas Car <[email protected]> # Conflicts: # rdflib/extras/shacl.py
* fix: typing and import issues
* fix: line length as int
* fix: ruff version conflict
* fix: berkeleydb pin to 18.1.10 for python 3.8 compatibility
* 3a not 2a

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Alex Nelson <[email protected]>
Co-authored-by: Nicholas Car <[email protected]>
Co-authored-by: Ashley Sommer <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Nelson <[email protected]>
Co-authored-by: joecrowleygaia <[email protected]>
Co-authored-by: Val Lorentz <[email protected]>
Co-authored-by: jcbiddle <[email protected]>
Co-authored-by: Sander Van Dooren <[email protected]>
Co-authored-by: Nicholas Car <[email protected]>
Co-authored-by: Matt Goldberg <[email protected]>
Hi all, probably esp. @nicholascar,

PR 1425 added the `ttl2` serializer, which since became `longturtle`. One of the goals was to reduce Git difference noise.

I unfortunately find that most of the noise in Git diffs comes from Turtle output not being sorted. This is an effect both on nodes with IRIs and with blank nodes that (if I'm guessing right) may be sorting based on their skolemized IDs.

Could an option to either of the Turtle serializers be added to sort the output, for those willing to pay the `.serialize` time cost? I personally like Git-tracking demonstration results, so in many of my uses I'm willing to pay the time cost of sorting graphs.
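The Git-noise concern can be reproduced without any RDF library. The sketch below is an illustration only: it treats a serialization as a list of N-Triples-style lines (made-up example data, no blank nodes) and uses the standard library's `difflib` to compare two runs that emit the same statements in a different order, first as emitted and then after a plain lexical sort.

```python
import difflib
from typing import List

# The same three statements, as two hypothetical serializer runs might emit them.
run_1: List[str] = [
    '<http://example.org/a> <http://example.org/p> "1" .',
    '<http://example.org/b> <http://example.org/p> "2" .',
    '<http://example.org/c> <http://example.org/p> "3" .',
]
run_2: List[str] = [run_1[2], run_1[0], run_1[1]]


def diff_lines(old: List[str], new: List[str]) -> List[str]:
    """Return unified-diff lines; an empty list means no textual change."""
    return list(difflib.unified_diff(old, new, lineterm=""))


# Unsorted output produces a diff even though the graph is unchanged.
unsorted_noise = diff_lines(run_1, run_2)

# Sorting both runs the same way makes the diff disappear.
sorted_noise = diff_lines(sorted(run_1), sorted(run_2))
```

Blank nodes are deliberately left out here: their labels can change between runs, so a lexical sort alone would not be enough for them.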