feat: proposed Dataset API changes #3060
base: 8.x
* 7.1.1 post release (RDFLib#2953)
* Fix Black formatting in ./admin/get_merged_prs.py (RDFLib#2954)
* build(deps-dev): bump ruff from 0.7.0 to 0.7.1 (RDFLib#2955) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.0 to 0.7.1. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.0...0.7.1) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ashley Sommer <[email protected]>
* Fix defined namespace warnings (RDFLib#2964)
* Fix defined namespace warnings Current docs-generation tests are polluted by lots of warnings that occur when Sphinx tries to read various parts of DefinedNamespace.
* Fix tests that no longer need incorrect exceptions handled.
* fix black formatting in test file
* Undo typing changes, so this works on current pre-3.9 branch
* better handling for any/all double-underscore properties
* Don't include __slots__ in dir().
* test: earl test passing
* Annotate Serializer.serialize and descendants (RDFLib#2970) This patch aligns the type signatures on `Serializer` subclasses, including renaming the arbitrary-keywords dictionary to always be `**kwargs`. This is in part to prepare for the possibility of adding `*args` as a positional-argument delimiter. References: * RDFLib#1890 (comment) Signed-off-by: Alex Nelson <[email protected]>
* build(deps): bump orjson from 3.10.10 to 3.10.11 (RDFLib#2966) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.10 to 3.10.11. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.10...3.10.11) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.7.1 to 0.7.2 (RDFLib#2969) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.1 to 0.7.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.1...0.7.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.7.2 to 0.7.3 (RDFLib#2979) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.2 to 0.7.3. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.2...0.7.3) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.7.3 to 0.8.0 (RDFLib#2994) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.7.3 to 0.8.0. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.7.3...0.8.0) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump orjson from 3.10.11 to 3.10.12 (RDFLib#2991) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.11 to 3.10.12. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.11...3.10.12) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* added Node as an exported name from the root package location. Updated linting commands section in the developer section to use ruff check. (RDFLib#2981)
* build(deps-dev): bump wheel from 0.45.0 to 0.45.1 (RDFLib#2992) Bumps [wheel](https://github.com/pypa/wheel) from 0.45.0 to 0.45.1. - [Release notes](https://github.com/pypa/wheel/releases) - [Changelog](https://github.com/pypa/wheel/blob/main/docs/news.rst) - [Commits](pypa/wheel@0.45.0...0.45.1) --- updated-dependencies: - dependency-name: wheel dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Nicholas Car <[email protected]>
* feat: sort longturtle blank nodes (RDFLib#2997)
* feat: sort longturtle blank nodes in the object position by their cbd string
* fix: RDFLib#2767
* build(deps-dev): bump pytest from 8.3.3 to 8.3.4 (RDFLib#2999) Bumps [pytest](https://github.com/pytest-dev/pytest) from 8.3.3 to 8.3.4. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](pytest-dev/pytest@8.3.3...8.3.4) --- updated-dependencies: - dependency-name: pytest dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump poetry from 1.8.4 to 1.8.5 (RDFLib#3001) Bumps [poetry](https://github.com/python-poetry/poetry) from 1.8.4 to 1.8.5. - [Release notes](https://github.com/python-poetry/poetry/releases) - [Changelog](https://github.com/python-poetry/poetry/blob/1.8.5/CHANGELOG.md) - [Commits](python-poetry/poetry@1.8.4...1.8.5) --- updated-dependencies: - dependency-name: poetry dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.8.0 to 0.8.2 (RDFLib#3003) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.0 to 0.8.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.0...0.8.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.8.2 to 0.8.3 (RDFLib#3010) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.2 to 0.8.3. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.2...0.8.3) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps): bump berkeleydb from 18.1.11 to 18.1.12 (RDFLib#3009) Bumps [berkeleydb](https://www.jcea.es/programacion/pybsddb.htm) from 18.1.11 to 18.1.12. --- updated-dependencies: - dependency-name: berkeleydb dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> # Conflicts: # poetry.lock
* build(deps): bump orjson from 3.10.12 to 3.10.13 (RDFLib#3018) Bumps [orjson](https://github.com/ijl/orjson) from 3.10.12 to 3.10.13. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.12...3.10.13) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* build(deps-dev): bump ruff from 0.8.4 to 0.8.6 (RDFLib#3025) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.4 to 0.8.6. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.4...0.8.6) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort (RDFLib#3008)
* feat: use the RGDA1 canonicalization algorithm + lexical n-triples sort to produce deterministic longturtle serialisation
* chore: normalise usage of format
* chore: apply black
* fix: double up of semicolons when subject is a blank node
* fix: lint
* jsonld: Do not merge nodes with different invalid URIs (RDFLib#3011) When parsing JSON-LD with invalid URIs in the `@id`, the `generalized_rdf: True` option allows parsing these nodes as blank nodes instead of outright rejecting the document. However, all nodes with invalid URIs were mapped to the same blank node, resulting in incorrect data. For example, without this patch, the new test fails with:
  ```
  AssertionError: Expected:
  @prefix schema: <https://schema.org/> .

  <https://example.org/root-object> schema:author [ schema:familyName "Doe" ; schema:givenName "Jane" ; schema:name "Jane Doe" ], [ schema:familyName "Doe" ; schema:givenName "John" ; schema:name "John Doe" ] .

  Got:
  @prefix schema: <https://schema.org/> .

  <https://example.org/root-object> schema:author <> .

  <> schema:familyName "Doe" ; schema:givenName "Jane", "John" ; schema:name "Jane Doe", "John Doe" .
  ```
* Fixed incorrect ASK behaviour for dataset with one element (RDFLib#2989)
* Pass base uri to serializer when writing to file. (RDFLib#2977) Co-authored-by: Nicholas Car <[email protected]>
* Dataset documentation improvements (RDFLib#3012)
* example printout improvements
* added BN graph creation
* updated tests var names & added one subtest
* typos & improved formatting
* updated Graph & Dataset docco
* typo fix
* fix code-in-comment syntax
* fix code-in-comment syntax 2
* fix code-in-comment syntax - ellipses
* fix code-in-comment syntax - sort print loop output
* blacked
* ruff fixes
* Poetry 2.0.0 pyproject.toml file
* move to PEP621 (Poetry 2.0.0) pyproject.toml
* require poetry 2.0.0
* require poetry 2.0.0
* add in requirement for poetry-plugin-export
* change from --sync to sync command
* further pyproject.toml format updates
* add poetry plugin to requirements-poetry.in
* fix pre-commit poetry version to 2.0.0
* remove testing artifact
* update license to 2025
* add me to contributors
* remove outdated --check arg
* typo
* test add back in precommit args
* test remove precommit args
* match ruff version to pre-commit autoupdate PR RDFLib#3026; add back in --check
* re-remove --check
* add David to CONTRIBUTORS
* ruff in pyproject.toml to match pre-commit
* updates for David's comments
* fix Dataset docc ReST formatting
* remove ConjunctiveGraph example; add Dataset example; add JSON-LS serialization example
* Add RDFLib Path to SHACL path utility and corresponding tests (RDFLib#2990)
* shacl path parser: Add additional test case
* shacl utilities: Add new SHACL path building utility with corresponding tests --------- Co-authored-by: Nicholas Car <[email protected]> # Conflicts: # rdflib/extras/shacl.py
* fix: typing and import issues
* fix: line length as int
* fix: ruff version conflict
* fix: berkeleydb pin to 18.1.10 for python 3.8 compatibility
* 3a not 2a

---------

Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: Alex Nelson <[email protected]>
Co-authored-by: Nicholas Car <[email protected]>
Co-authored-by: Ashley Sommer <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alex Nelson <[email protected]>
Co-authored-by: joecrowleygaia <[email protected]>
Co-authored-by: Val Lorentz <[email protected]>
Co-authored-by: jcbiddle <[email protected]>
Co-authored-by: Sander Van Dooren <[email protected]>
Co-authored-by: Nicholas Car <[email protected]>
Co-authored-by: Matt Goldberg <[email protected]>
Bumps [orjson](https://github.com/ijl/orjson) from 3.10.13 to 3.10.15. - [Release notes](https://github.com/ijl/orjson/releases) - [Changelog](https://github.com/ijl/orjson/blob/master/CHANGELOG.md) - [Commits](ijl/orjson@3.10.13...3.10.15) --- updated-dependencies: - dependency-name: orjson dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.8.6 to 0.9.2. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md) - [Commits](astral-sh/ruff@0.8.6...0.9.2) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Minimal updates to get function signatures conformant with proposal
Update examples/datasets.py
Thank you so much for pushing this!
I allowed myself a few API nits, but I have not read the code carefully; please ignore them if they are not relevant.
Re:
Does it make sense to apply removal of … I'm asking without having reviewed all of the proposed typing here yet.
# Iterate on triples in the Default Graph only
# ============================================
for triple in d.triples(graph="default"):
I question the usefulness of this. Why not simply `d.default_graph.triples()`?
Providing `default_graph` as a convenience necessarily means there will be more than one way to iterate over the triples. There's no functional change from the current classes here, just name changes: you can already call `Dataset.triples(..., context=...)`, and you can also call `Dataset.default_context.triples(...)`.
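To make the "more than one way" point concrete, here is a minimal pure-Python sketch that models a dataset as a list of `(s, p, o, g)` tuples with `g=None` marking the default graph. `triples()` and `default_graph()` are hypothetical helpers standing in for the proposed `d.triples(graph="default")` and `d.default_graph`; they are not RDFLib APIs, and the string terms are placeholders for real RDF terms.

```python
from typing import Iterator, List, Optional, Set, Tuple

# Hypothetical model, not RDFLib: a dataset is a list of (s, p, o, g) quads,
# and g=None marks the default graph, as proposed in this thread.
Quad = Tuple[str, str, str, Optional[str]]
Triple = Tuple[str, str, str]

QUADS: List[Quad] = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "ex:graph-A"),
    ("ex:e", "ex:p", "ex:f", "ex:graph-B"),
]


def triples(quads: List[Quad], graph: str) -> Iterator[Triple]:
    """Filter style, standing in for the proposed d.triples(graph="default")."""
    if graph != "default":
        raise ValueError("this sketch only models graph='default'")
    for s, p, o, g in quads:
        if g is None:
            yield (s, p, o)


def default_graph(quads: List[Quad]) -> Set[Triple]:
    """Accessor style, standing in for d.default_graph: its set of triples."""
    return {(s, p, o) for s, p, o, g in quads if g is None}


# Both routes select the same triples; only the shape of the API differs.
assert set(triples(QUADS, graph="default")) == default_graph(QUADS)
```

The filter form composes naturally with other selectors, while the accessor form hands back a graph-like object that can be passed around; which of the two RDFLib should expose is the open question in this thread.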
# Access quads in Named Graphs only
# ============================================
for quad in d.quads(graph="named"):
Shouldn't this be equivalent to simply `d.quads()`, since the default graph does not produce quads?
Or is the graph element of the default graph `None`?
Yes, the proposal is to have the "graph" component of triples in the default graph set to `None`.
My general comment is that I don't like the new arguments of … If filtering needs to be done, it can be achieved with a simple …
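If the default graph's graph component is `None`, then `graph="named"` reduces to a one-line filter on the fourth element of each quad. A minimal sketch with plain tuples; `quads()` here is a hypothetical stand-in, not RDFLib's `Dataset.quads`, and treating `graph=None` as "no filter" is an assumption.

```python
from typing import Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]

# Hypothetical data: g=None marks a quad in the default graph.
QUADS: List[Quad] = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "ex:graph-A"),
    ("ex:e", "ex:p", "ex:f", "ex:graph-B"),
]


def quads(all_quads: List[Quad], graph: Optional[str] = None) -> Iterator[Quad]:
    """Hypothetical quads() supporting only the graph="named" selector."""
    if graph not in (None, "named"):
        raise ValueError(f"unsupported selector in this sketch: {graph!r}")
    for q in all_quads:
        # No selector returns every quad; "named" skips the default graph.
        if graph is None or q[3] is not None:
            yield q


# The keyword form and a plain filter over the unfiltered quads agree.
named = list(quads(QUADS, graph="named"))
assert named == [q for q in quads(QUADS) if q[3] is not None]
```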
# Access quads in the Default Graph and specified Named Graphs.
# ============================================
for quad in d.quads(graph=["default", URIRef("http://example.com/graph-B")]):
`for quad in (q for q in d.quads() if q[3] in (None, URIRef("http://example.com/graph-B"))):`
Not much longer, really.
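For comparison, a sketch of what a list-valued `graph=` selector would do internally: translate the `"default"` keyword into the `None` marker and then apply the same membership test as the one-liner above. `select()` is a hypothetical helper, and graph names are plain strings here rather than `URIRef`s.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]

# Hypothetical data: g=None marks a quad in the default graph.
QUADS: List[Quad] = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "http://example.com/graph-A"),
    ("ex:e", "ex:p", "ex:f", "http://example.com/graph-B"),
]


def select(all_quads: Iterable[Quad], graph: Iterable[str]) -> Iterator[Quad]:
    """Hypothetical graph=[...] selector: "default" stands for the None marker."""
    wanted = {None if g == "default" else g for g in graph}
    return (q for q in all_quads if q[3] in wanted)


keyword_form = list(select(QUADS, ["default", "http://example.com/graph-B"]))
filter_form = [q for q in QUADS if q[3] in (None, "http://example.com/graph-B")]
assert keyword_form == filter_form
```

One caveat the sketch exposes: with plain strings, a named graph literally called `"default"` would collide with the keyword. As far as I know, RDFLib's `Identifier` equality compares type as well as value, so a real `URIRef("default")` should not be mistaken for the keyword.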
Yes, I think this is the point to get a broader consensus on. The way I see it, if we include the `graph` parameter:
Pros:
- can restrict the "named" and "default" enums to the `graph=` argument only, and not use them in the `quads` methods.
- can separate concerns a bit better, similar to how dataset clauses are used in SPARQL: e.g. set up an instance with `graph=` to restrict the scope to certain named graphs, then pass graphs in at runtime using `quads`.
- provides a convenient, clean interface for what is a common pattern (for me at least!)
Cons:
- two ways to do the same thing, as you've pointed out.
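The second "pro" (fix the scope once, then select graphs at runtime) can be sketched as a small view class. `ScopedDataset` is hypothetical and only loosely analogous to SPARQL's `FROM` / `FROM NAMED` dataset clauses; the rule that a runtime selection is intersected with the instance-level scope is an assumption about how the two would combine.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple

Quad = Tuple[str, str, str, Optional[str]]


def _markers(graph: Iterable[str]) -> Set[Optional[str]]:
    # Translate the "default" keyword into the None marker used in quads.
    return {None if g == "default" else g for g in graph}


class ScopedDataset:
    """Hypothetical read-only view restricted to a fixed set of graphs."""

    def __init__(self, quads: Iterable[Quad], graph: Iterable[str]) -> None:
        self._quads: List[Quad] = list(quads)
        self._scope = _markers(graph)

    def quads(self, graph: Optional[Iterable[str]] = None) -> Iterator[Quad]:
        # A runtime selection can only narrow the instance-level scope.
        allowed = self._scope if graph is None else self._scope & _markers(graph)
        return (q for q in self._quads if q[3] in allowed)


QUADS: List[Quad] = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "ex:graph-A"),
    ("ex:e", "ex:p", "ex:f", "ex:graph-B"),
]

view = ScopedDataset(QUADS, graph=["default", "ex:graph-B"])
assert len(list(view.quads())) == 2
assert list(view.quads(graph=["ex:graph-A"])) == []  # outside the scope
```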
Maybe I'm too used to Jena, where `Dataset` is used via `getDefaultModel` and `getNamedModel`, but I don't really see myself needing the new parameters 🤷‍♂️
# Access triples in the Default Graph and specified Named Graphs.
# ============================================
for triple in d.triples(graph=["default", URIRef("http://example.com/graph-B")]):
`d.triples()` doesn't really make sense? There should be `Graph.triples()` and `Dataset.quads()` only?
I'm comfortable with it: in triplestores where named graphs are used, SPARQL queries frequently omit the graph and contain only basic graph patterns, and we understand these to match across all graphs?
The union graph is an extension feature, though, not a feature of an RDF dataset.
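The disagreement is essentially about whether `d.triples()` should mean the union of all graphs. A minimal sketch of that union reading, in which a triple asserted in several graphs is reported once; `union_triples()` is a hypothetical helper, and the sketch ignores blank-node scoping, so treat it as an illustration of the semantics rather than a description of any RDFLib store.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Quad = Tuple[str, str, str, Optional[str]]
Triple = Tuple[str, str, str]


def union_triples(quads: Iterable[Quad]) -> Iterator[Triple]:
    """Yield each distinct (s, p, o) once, ignoring which graph it came from."""
    seen: Set[Triple] = set()
    for s, p, o, _graph in quads:
        if (s, p, o) not in seen:
            seen.add((s, p, o))
            yield (s, p, o)


# Hypothetical data: the first triple is asserted in two different graphs.
QUADS = [
    ("ex:a", "ex:p", "ex:b", None),          # default graph
    ("ex:a", "ex:p", "ex:b", "ex:graph-A"),  # same triple, named graph
    ("ex:c", "ex:p", "ex:d", "ex:graph-B"),
]

union = list(union_triples(QUADS))
assert union == [("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")]
```

Under this reading `d.triples()` loses the graph of origin, which is one argument for keeping `Dataset.quads()` as the primary accessor.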
One more question arose while I was commenting: what is the graph component for the default graph in …
I wouldn't think so: `ConjunctiveGraph` is used in a few different places, and removing it would break an inheritance hierarchy, so it's not a simple change. The JSON-LD serializer could potentially switch to `Dataset` while leaving `ConjunctiveGraph` in place, though I haven't looked at it.
Good question. I would suggest that `None` is an expected result/output from methods and can mean that results are from the default graph, but that we are not in fact making the two equivalent. When providing input to functions, `None` != default graph, and a user should not use it to mean this. The following methods can be used to unambiguously refer to the default graph: `ds.triples(graph="default")`
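A minimal sketch of the input/output asymmetry described above: on output, `None` in the graph position marks the default graph; on input, only the explicit `"default"` keyword selects it. Treating `graph=None` as "no restriction" is an assumption, and `quads()` is a hypothetical helper rather than the proposed RDFLib signature.

```python
from typing import Iterator, List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]

# Output form (hypothetical data): g=None marks a default-graph quad.
QUADS: List[Quad] = [
    ("ex:a", "ex:p", "ex:b", None),
    ("ex:c", "ex:p", "ex:d", "ex:graph-A"),
]


def quads(all_quads: List[Quad], graph: Optional[str] = None) -> Iterator[Quad]:
    """Hypothetical selector: on input, None means "no restriction"."""
    if graph is None:
        # Passing None does NOT select the default graph; it selects everything.
        yield from all_quads
        return
    # Only the explicit keyword maps onto the None marker used in the output.
    marker = None if graph == "default" else graph
    for q in all_quads:
        if q[3] == marker:
            yield q


assert len(list(quads(QUADS))) == 2
assert list(quads(QUADS, graph="default")) == [QUADS[0]]
```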
This draft PR is intended to promote discussion and reach a consensus on the proposed changes to the Dataset API by concretely describing what the interfaces would look like. As such, at this point it is not intended that all of the required changes to dependencies and tests be implemented.
The following discussion gives context:
#2591
Additionally:
Summary of changes: