-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: sort longturtle blank nodes #2997
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
@ajnelson-nist would be good to see if you can test this with some of the data you work with and confirm that this PR fixes the blank node sorting issues in your workflows. |
nicholascar
approved these changes
Nov 29, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff @edmondchuc!
Merged
8 tasks
edmondchuc
added a commit
that referenced
this pull request
Jan 15, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string * fix: #2767
edmondchuc
added a commit
that referenced
this pull request
Jan 15, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string * fix: #2767
edmondchuc
added a commit
that referenced
this pull request
Jan 15, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string * fix: #2767
edmondchuc
added a commit
that referenced
this pull request
Jan 16, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string * fix: #2767
edmondchuc
added a commit
that referenced
this pull request
Jan 16, 2025
* feat: sort longturtle blank nodes in the object position by their cbd string * fix: #2767
nicholascar
added a commit
that referenced
this pull request
Jan 16, 2025
* 7.1.1 post release (#2953) * Fix Black formatting in ./admin/get_merged_prs.py (#2954) * build(deps-dev): bump ruff from 0.7.0 to 0.7.1 (#2955) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.0...0.7.1) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ashley Sommer <[email protected]> * Fix defined namespace warnings (#2964) * Fix defined namespace warnings Current docs-generation tests are polluted by lots of warnings that occur when Sphinx tries to read various parts of DefinedNamespace. * Fix tests that no longer need incorrect exceptions handled. * fix black formatting in test file * Undo typing changes, so this works on current pre-3.9 branch * better handling for any/all double-underscore properties * Don't include __slots__ in dir(). * test: earl test passing * Annotate Serializer.serialize and descendants (#2970) This patch aligns the type signatures on `Serializer` subclasses, including renaming the arbitrary-keywords dictionary to always be `**kwargs`. This is in part to prepare for the possibility of adding `*args` as a positional-argument delimiter. References: * #1890 (comment) Signed-off-by: Alex Nelson <[email protected]> * build(deps): bump orjson from 3.10.10 to 3.10.11 (#2966) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.10 to 3.10.11. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.10...3.10.11) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump ruff from 0.7.1 to 0.7.2 (#2969) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.1 to 0.7.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.1...0.7.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump ruff from 0.7.2 to 0.7.3 (#2979) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.2 to 0.7.3. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.2...0.7.3) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump ruff from 0.7.3 to 0.8.0 (#2994) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.3 to 0.8.0. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.3...0.8.0) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump orjson from 3.10.11 to 3.10.12 (#2991) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.11 to 3.10.12. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.11...3.10.12) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * added Node as an exported name from the root package location. Updated linting commands section in the developer section to use ruff check. (#2981) * build(deps-dev): bump wheel from 0.45.0 to 0.45.1 (#2992) Bumps [wheel](https://github.com/pypa/wheel) from 0.45.0 to 0.45.1. - [Release notes](https://github.com/pypa/wheel/releases) - [Changelog](https://github.com/pypa/wheel/blob/main/docs/news.rst) - [Commits](pypa/wheel@0.45.0...0.45.1) --- updated-dependencies: - dependency-name: wheel dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Nicholas Car <[email protected]> * feat: sort longturtle blank nodes (#2997) * feat: sort longturtle blank nodes in the object position by their cbd string * fix: #2767 * build(deps-dev): bump pytest from 8.3.3 to 8.3.4 (#2999) Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.3 to 8.3.4. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@8.3.3...8.3.4) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump poetry from 1.8.4 to 1.8.5 (#3001) Bumps [poetry](https://github.com/python-poetry/poetry) from 1.8.4 to 1.8.5. - [Release notes](https://github.com/python-poetry/poetry/releases) - [Changelog](https://github.com/python-poetry/poetry/blob/1.8.5/CHANGELOG.md) - [Commits](python-poetry/poetry@1.8.4...1.8.5) --- updated-dependencies: - dependency-name: poetry dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump ruff from 0.8.0 to 0.8.2 (#3003) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.0 to 0.8.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.0...0.8.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump ruff from 0.8.2 to 0.8.3 (#3010) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.2 to 0.8.3. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.2...0.8.3) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps): bump berkeleydb from 18.1.11 to 18.1.12 (#3009) Bumps [berkeleydb](https://www.jcea.es/programacion/pybsddb.htm) from 18.1.11 to 18.1.12. --- updated-dependencies: - dependency-name: berkeleydb dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> # Conflicts: # poetry.lock * build(deps): bump orjson from 3.10.12 to 3.10.13 (#3018) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.12 to 3.10.13. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.12...3.10.13) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * build(deps-dev): bump ruff from 0.8.4 to 0.8.6 (#3025) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.4 to 0.8.6. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.4...0.8.6) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort (#3008) * feat: use the RGDA1 canonicalization algorithm + lexical n-triples sort to produce deterministic longturtle serialisation * chore: normalise usage of format * chore: apply black * fix: double up of semicolons when subject is a blank node * fix: lint * jsonld: Do not merge nodes with different invalid URIs (#3011) When parsing JSON-LD with invalid URIs in the `@id`, the `generalized_rdf: True` option allows parsing these nodes as blank nodes instead of outright rejecting the document. However, all nodes with invalid URIs were mapped to the same blank node, resulting in incorrect data. For example, without this patch, the new test fails with: ``` AssertionError: Expected: @Prefix schema: <https://schema.org/> . <https://example.org/root-object> schema:author [ schema:familyName "Doe" ; schema:givenName "Jane" ; schema:name "Jane Doe" ], [ schema:familyName "Doe" ; schema:givenName "John" ; schema:name "John Doe" ] . Got: @Prefix schema: <https://schema.org/> . <https://example.org/root-object> schema:author <> . <> schema:familyName "Doe" ; schema:givenName "Jane", "John" ; schema:name "Jane Doe", "John Doe" . ``` * Fixed incorrect ASK behaviour for dataset with one element (#2989) * Pass base uri to serializer when writing to file. (#2977) Co-authored-by: Nicholas Car <[email protected]> * Dataset documentation improvements (#3012) * example printout improvements * added BN graph creation * updated tests var names & added one subtest * typos & improved formatting * updated Graph & Dataset docco * typo fix * fix code-in-comment syntax * fix code-in-comment syntax 2 * fix code-in-comment syntax - ellipses * fix code-in-comment syntax - sort print loop output * blacked * ruff fixes * Poetry 2.0.0 pyproject.toml file * move to PEP621 (Poetry 2.0.0) pyproject.toml * require poetry 2.0.0 * require poetry 2.0.0 * add in requirement for poetry-plugin-export * change from --sync to sync command * further pyproject.toml format updates * add poetry plugin to requirements-poetry.in * fix pre-commit poetry version to 2.0.0 * remove testing artifact * update license to 2025 * add me to contributors * remove outdated --check arg * typo * test add back in precommit args * test remove precommit args * match ruff version to pre-commit autoupdate PR #3026; add back in --check * re-remove --check * add David to CONTRIBUTORS * ruff in pyproject.toml to match pre-commit * updates for David's comments * fix Dataset docc ReST formatting * remove ConjunctiveGraph example; add Dataset example; add JSON-LS serialization example * Add RDFLib Path to SHACL path utility and corresponding tests (#2990) * shacl path parser: Add additional test case * shacl utilities: Add new SHACL path building utility with corresponding tests --------- Co-authored-by: Nicholas Car <[email protected]> # Conflicts: # rdflib/extras/shacl.py * fix: typing and import issues * fix: line length as int * fix: ruff version conflict * fix: berkeleydb pin to 18.1.10 for python 3.8 compatibility * 3a not 2a --------- Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Alex Nelson <[email protected]> Co-authored-by: Nicholas Car <[email protected]> Co-authored-by: Ashley Sommer <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Alex Nelson <[email protected]> Co-authored-by: joecrowleygaia <[email protected]> Co-authored-by: Val Lorentz <[email protected]> Co-authored-by: jcbiddle <[email protected]> Co-authored-by: Sander Van Dooren <[email protected]> Co-authored-by: Nicholas Car <[email protected]> Co-authored-by: Matt Goldberg <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary of changes
Fixes #1890 - Sorting Turtle output?
This change improves git diffing Turtle data serialized with the
longturtle
serializer. Previously, blank nodes were not sorted, and round-trips using RDFLib'sturtle
parser andlongturtle
serializer would have blank node objects flip flop around, making it difficult to read real changes using git diff.This PR fixes the above by implementing a sort on values where triples in the object position are blank nodes. The blank nodes are sorted by grabbing their concise-bounded description graph and sorting it as a string in their
longturtle
serialization.This adds an additional cost to the
longturtle
serializer, but I think the cost is worth it if we are after a deterministic output with blank nodes.Note that this depends on RDF data parsed using the
turtle
parser and its behaviour of how it assigns blank nodes. Exact serialization with data added via the graph object cannot be guaranteed. This would require implementing RDF Canonicalization to guarantee the same blank node identifiers in the graph.Once RDF Canonicalization is implemented, sorting by the blank node identifier directly will be enough to guarantee deterministic serialization, and the expensive CBD sorting can be removed.
Update: looks like top-level blank nodes with no inbound relationships don't get sorted. I think this makes sense since we're only applying the sort to blank nodes in the object position. If we cared about sorting blank nodes in the subject position, we can apply the same kind of sorting based on the text serialization of the CBD onto the subject blank nodes.
rdflib/rdflib/plugins/serializers/longturtle.py
Lines 106 to 112 in 08dd4b7
Fixes #2767 - Bug in
longturtle
serializationThis PR also fixes the missing trailing whitespace in the special case described by @mschiedon.
This fix was previously fixed in PR #2700 but was inadvertently reverted in PR #2731 when I was trying to fix the ruff linting rule in the test case. To get around the ruff linting rule, I've now moved the target result of the test case into a text file.
Checklist
the same change.
./examples
.so maintainers can fix minor issues and keep your PR up to date.