Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from PyLD to RDFLib #2944

Open
ff137 opened this issue May 10, 2024 · 8 comments
Open

Migrate from PyLD to RDFLib #2944

ff137 opened this issue May 10, 2024 · 8 comments

Comments

@ff137
Copy link
Contributor

ff137 commented May 10, 2024

I'd like to propose that our usage of the PyLD dependency be swapped out for the more versatile and better maintained RDFLib.

Let me bullet point some reasons for this suggestion:

  • the current state of the PyLD project is not clear to me. The most recent comment on their most recent PR basically says "let's merge this now", and that was 3 months ago.
  • v2.0.3 was released in 2020. And then v2.0.4 was released in Feb this year.
  • the repo has no automated testing. When I tested it on my local, I get ~20 test failures, so as far as I can tell it's in a broken state.
  • their jsonld.py file is 6700 lines, which was "ported from the JavaScript implementation of JSON-LD"
  • their requirements.txt has unpinned dependencies

Long story short, the impression I get is that PyLD is not maintained or maintainable.

RDFLib, on the other hand, seems to be the gold standard for Resource Description Framework (RDF) processing in python. They offer:

  • parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, Trig and JSON-LD

This repo has much more adoption and activity, and so I think it would be better to migrate to this instead. This is just an idea and open for discussion.

PyLD is currently used in quite a few places in ACA-Py, and probably represent quite a refactoring job to replace it. But I think this may be a worthwhile operation, to reduce technical debt and improve the functionality of a core feature.

@ff137
Copy link
Contributor Author

ff137 commented May 10, 2024

Relates to #2941

@PatStLouis
Copy link
Contributor

PyLd is a library from the authors of the json-ld specification, Digital Bazaar. It aims to be compliant with the specification by conformity with the json-ld processor test-suite. We updated v2.0.4 since we uncovered a bug in the library and submitted a PR. Going forward there are other bugs in the library which creates interoperability issues with aca-py at the moment for some specific cases.

I'm not opposed to changing the library but I think more tests are needed to see if its a suitable replacement.

Pyld can use an asynchronous document loader as well

@ff137
Copy link
Contributor Author

ff137 commented Jun 4, 2024

@PatStLouis thanks! The synchronous requests document loader was the main reason I thought a replacement is necessary. Because, in ACA-Py, there's a JsonLdDocumentDownloader, which uses the requests framework for synchronous downloads, which is then sort of hacked into an async method in the DocumentLoader class. This leads to nest_asyncio been necessary, to support nested event loops, which all feels like a very smelly way of doing things. That's what got me looking for alternatives.

If an asynchronous document loader can be used instead, that may be a sufficient improvement. So I'll keep this in mind. But yes, it may be that RDFLib is ultimately the way to go, given it has more activity and wider support

@PatStLouis
Copy link
Contributor

@ff137 is the async pyld document loader sufficient for your use case?

@ff137
Copy link
Contributor Author

ff137 commented Jun 19, 2024

Hi @PatStLouis, I'll need some time to investigate. Will have to test and confirm if the existing synchronous logic can be replaced. I still think migrating to RDFLib is worthwhile overall, although it's probably out of scope for the 1.0.0 release. Either way, over the next couple weeks I'll see if I can put something together to test the async doc loader 👌

@PatStLouis
Copy link
Contributor

pyld is core to a lot of the issuance/verification processes, something to keep in mind. Can you point me into the code where pyld is causing issues?

@PatStLouis
Copy link
Contributor

I think trying to change from request to aiohttp for the document loader is a good place to start and aligns with the rest of the code base

@ff137
Copy link
Contributor Author

ff137 commented Nov 6, 2024

Just to follow up, now that it came up from elsewhere: I tried replacing request with aiohttp, in that JsonLD document loader, but I encountered one big knot of mixing async and sync functions, and didn't have patience to persist through with it at the time. I'll see if I can pick up the stashed work and try again sometime. Soon ™️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants