List recommended RDF serializations #465
Comments
My personal view on this is that the list should at least include Turtle and JSON-LD, since these are already often used within the Solid ecosystem. I would even go a step further, and suggest that all W3C-recommended RDF serializations should be included in this list. Concretely:
This broader list is important, for example for static file servers. Static file servers may not be able to do content negotiation, so they should be able to provide content in a single format (e.g. RDFa or JSON-LD snippets in HTML, for file servers that only serve HTML, but also want to include RDF). |
I agree. As a side note and just for your information, in #463, the issue is actually the opposite. Having a requirement like this would impose breaking support for traditional OIDC because traditional OIDC does not require the use of an |
As long as this list is imposed on clients to support parsing, while servers may expose their data in one of the recommended serializations, then I think this would still work. So concretely, servers MUST support at least one recommended RDF serialization, and clients MUST support parsing of all recommended RDF serializations. |
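To make the rule above concrete, here is a minimal sketch of how it could play out over plain HTTP content negotiation. It is an illustration under stated assumptions, not part of any Solid spec: the `RECOMMENDED` tuple is a placeholder (the thread does not settle which formats belong on the list), the function names are hypothetical, and the matching is deliberately simplified (exact media types and `*/*` only, no `type/*` ranges).

```python
from typing import Dict, List, Optional

# Assumption for illustration only: the thread does not fix this list.
RECOMMENDED = ("text/turtle", "application/ld+json", "text/html")


def client_accept_header(media_types=RECOMMENDED) -> str:
    """Client side: advertise every serialization the client can parse."""
    return ", ".join(media_types)


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an Accept header into {media_type: q}, ignoring other parameters."""
    prefs: Dict[str, float] = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def negotiate(accept_header: str, offered: List[str]) -> Optional[str]:
    """Server side: return the offered type the client prefers, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in offered:
        q = prefs.get(media_type, prefs.get("*/*", 0.0))
        if q > best_q:
            best, best_q = media_type, q
    return best
```

With this split, a server that only offers `text/html` still satisfies a client that sends `client_accept_header()`, whereas a client that accepts only `text/turtle` gets `None` from a JSON-LD-only server, which is the interoperability gap the rule is meant to close.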
I think we can't push the burden onto clients, especially PWA/SPA clients that need to keep their bundles small. I would really need to see someone demonstrate an ES library that can use dynamic imports to load parsers as required.
Broader Linked Data already allows all of this. I see a solid focus on read-write LD with authn/authz, if the server can support all the needed features they most likely can support content negotiation as well. |
In the short-term, this may indeed increase bundle size (even though parsers can be very small), or require dynamic imports. But in the long-term, I see (Solid-specific?) browsers shipping with these parsers by default. So that would then be similar to browsers supporting many different image formats.
Not all actors in the Solid ecosystem should necessarily be able to perform full read/write IMO. |
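The "load parsers as required" idea from this exchange can be sketched as a registry that maps media types to loader functions and only runs a loader the first time its format is actually encountered. This is a pattern sketch in Python, not a demonstration of bundle size: in a browser app each loader would presumably wrap a dynamic `import()`, and the class name, the loader, and the stand-in "parser" below are all hypothetical.

```python
from typing import Callable, Dict, List

Parser = Callable[[str], List[str]]


def load_line_parser() -> Parser:
    """Hypothetical stand-in for an expensive import of a real RDF parser."""
    def parse(text: str) -> List[str]:
        # Not a real RDF parser: it only returns the non-empty lines.
        return [line for line in text.splitlines() if line.strip()]
    return parse


class LazyParserRegistry:
    """Maps media types to loaders and runs each loader at most once."""

    def __init__(self) -> None:
        self._loaders: Dict[str, Callable[[], Parser]] = {}
        self._loaded: Dict[str, Parser] = {}

    def register(self, media_type: str, loader: Callable[[], Parser]) -> None:
        self._loaders[media_type.lower()] = loader

    def parser_for(self, content_type: str) -> Parser:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        media_type = content_type.split(";")[0].strip().lower()
        if media_type not in self._loaded:
            if media_type not in self._loaders:
                raise LookupError(f"no parser registered for {media_type!r}")
            self._loaded[media_type] = self._loaders[media_type]()
        return self._loaded[media_type]

    def loaded(self) -> List[str]:
        """Media types whose parser has actually been loaded so far."""
        return sorted(self._loaded)
```

A client would register one loader per supported serialization at startup and call `parser_for(response_content_type)` per response, so only the parsers for formats that servers actually send are ever loaded.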
@rubensworks, as you already mentioned in, the long discussion in w3c/WebID#3 can already give us a lot of insight in people's preferences/goals having to do with serialisations. I try to summarise some of the key points.
Finally, let me add a single recommendation of my own, concerning the I linked to the original posts everywhere I could, but should someone feel I misrepresented their opinion, I’ll gladly edit this post. |
Summarizing the gathered recommendations, specs should:
If we then try to put this in a spec for Solid specs, we could for example start with recommendation [7], and translate it to the following, which also adheres to recommendations [2], [3] and [5]:
This alone obviously places a heavy burden on the clients and/or makes interoperability a complex issue. To alleviate this, we could add the following, which goes slightly against [2], but additionally adheres to [1], [4] and [6].
Additionally, we could add arbitrary recommendations of the following form. Based on [2], that should either be for ALL or for NONE of the W3C-recommended RDF serializations.
Note that we could transform these to MUSTs, but only by letting recommendation [5] fall. Taking them together, I think these three rules could be a good foundation for Solid specs to build on, taking into account all recommendations that I found in the referenced issues. |
I see in the first linked comment:
So it talks about RDF embedded in HTML using
I think issues with |
@elf-pavlik, for myself HTML is not a major concern. I just mentioned it since multiple people seem to find it important (including either RDFa or structured data islands). As I already mentioned, without that concern all other recommendations can be captured by:
PS: Could you link to the issues you refer to? Thanks! |
I see bits and pieces are all over:
It could be nice to create one document from all of them that could serve as a reference. |
Pavlik, for the umpteenth time, stop mischaracterising things. RDFa is a concrete RDF syntax as per W3C. You don't have to accept that reality or even like it. We, the people, do use RDFa in Solid. RDFa checks many considerations that other formats do not and cannot. You don't want to use or like RDFa? No problem. Stop turning every discussion that touches HTML or RDFa into why no one else should use it in Solid. Consider understanding what diversity or an ecosystem entails. People have invested a lot of time and given you a plethora of explanations and links for you to study over the years. Consider developing or authoring something to build up some experience.
Is specifically about WebID Profile Documents, and not a general point about RDFa, and Tim is referring to using HTTP PATCH to update using a specific application and its choices. PATCH is literally not the only way to update a document. Have a look at PUT.
I've responded to you there (and elsewhere) but it is disappointing that you are not even processing what's being said.
You are sharing this but I don't think you understand the discussion.
It was already clarified in these issues that this conversion (or any other format in fact) is not particularly important for the spec in that servers can accept any concrete RDF syntax (or even any equivalent representation in fact) but that they have to provide Turtle or JSON-LD when requested.
Again, you are reaching/cherry picking. See the context of the discussion. |
No need. Each technical report will require the formats it needs. Solid fundamentally recommends Linked Data. RDF as the language. Specific formats are asked for in each technical report with focus on interoperability. Different classes of products are welcome to use anything else in addition to that because technical reports alone do not address every use case in the ecosystem. The system needs to be evolvable and so it is best not to draw hard lines with lists as such which may not even matter in the end, and all things considered. |
This may be a bit too strict. I guess a MAY is sufficient here.
It introduces problems indeed.
This approach is currently causing issues, as raised by @tomhgmns in #463.
Fully agree that this should be evolvable. However, to ensure interoperability between server and client, developers must have some guarantees on what they must or should implement, hence the suggestion of a list. |
@rubensworks, then there is no guaranteed serialisation, however, and clients would have to bundle all parsers (or import the right one) to be interoperable. |
The core of #463 is not just specific formats. It is specific formats being required as default, i.e. to be served on requests without ConNeg. Having all requirements specify an |
Indeed, which is acceptable IMO. To quote my earlier comment:
|
Hard disagree there as some formats are tremendously more difficult to work with than others (JSON-LD and RDFa vs. Turtle and RDF islands). But, all the time and energy spent on debating serialization formats also demonstrates that mandating any one given format lands us nowhere. IMHO, quoting @woutermont's quoting of my own writing:
To which I might add, perhaps another of the suggested formats should be RDF data islands in EDIT: to clarify, this comment mostly reflects my experience discussing the WebID spec. However, I do think such experience also applies here and that it'd be better to have the two specs as aligned as possible, within reason. |
Indeed, parsing can sometimes be tricky. But I think we're losing track of the fact that parsers will be (and already are) available as reusable libraries. This means that Solid apps won't have to re-implement parsing support for every RDF serialization. Instead, they can just import an existing lib. AFAIK, all major programming languages have implementations of all recommended RDF serializations. This is similar to handling different image formats on Web pages. Image format parsers already exist, and can easily be reused. If the same reasoning were to be applied to image formats, then there would be no room for innovation in this regard, and highly compressed image formats such as WebP could never be adopted. |
They are, but my concern is more about general performance rather than with the availability of parsing libraries. Both WebID and Solid should make as few assumptions as possible when it comes to the environment they are used in. Bundle size and CPU / memory utilization in desktop applications served over high-bandwidth networks can often tolerate 10x performance penalties with very little consequence for the end user, just as they can be trivially updated by publishing a new bundle. The same cannot be said of many other environments (low power boards, low-bandwidth networks, ...). Maintenance is another issue, with the combination of all parsers leading to an explosion in the number of dependencies that poses a serious issue from a security standpoint (esp. when dealing with clearing processes in corporate environments). In any case, as stated I am in favor of suggesting (as in SHOULD) a couple of preferred formats. That list could also be updated over time, as per your suggestion @rubensworks. But, those formats should be selected based on how easy they are to use across the spectrum of environments that WebID and Solid might reasonably be used in. Hence my choice of Turtle and Turtle data islands. I'll stop beating on this particular drum as I've made this point here and elsewhere cited above; I don't want to pollute the conversation. |
Thanks for your elaboration, @jacoscaz! @ruben, could you maybe explain in more detail WHY you think it too strict to mandate servers to serve a concrete serialisation (e.g. (X)HTML+RDFa)?
|
I think it's important that Solid enables (parts of) pods to be hosted on static file servers, which usually don't have the ability to perform content negotiation and can therefore only serve a single format. For instance, if servers are allowed to only serve a single RDF serialization that they can choose, it's possible to host (read-only) Solid pods on platforms such as GitHub Pages, which is (IMO) a use case we would want to enable. So if a server stores RDFa in HTML, and can only serve that, this should be fine. |
Solid is a project to extend the web with access control, content negotiation, etc. in order to create very much needed decentralised social networks. So looking at what servers that have not embarked on that project are doing to guide us, when that means making it more difficult to get solid going because we then create huge technical requirements, is putting the cart before the horse. Turtle is a key format there: it is simple to understand and parse. |
I agree with @bblfish insofar as we should not let existing practices be the ultimate guide for our decisions. However, I definitely see value in trying our best to let them remain compatible for as long as possible. @rubensworks, I get what your point is, as it has been raised/echoed by a number of other people as well. I do think, however, that my proposal (i.e. a MUST for serving (X)HTML+RDFa) provides sufficient room for static file servers without content negotiation, while still guaranteeing a sturdy foundation of interoperability. A primary group of statically served RDF resources will be (X)HTML+RDFa, served as Am I missing something that makes you still prefer a MAY, with the consequences for interoperability? EDIT: Simply serving existing files would indeed not be possible anymore, which could be a good compromise, as I argue further. |
Are you familiar with the WWW architecture principle of orthogonal specifications? RDF syntaxes, among other things, are clearly orthogonal to the Solid's core specification. So why are you trying so hard to include all of it under Solid? |
Because a MUST on HTML conflicts with the possibility to statically serve other formats, such as Turtle.
I follow this. |
Yes, I am, and I have taken that into consideration. Thanks for linking to it for those who aren't. As I suggested, amongst others based on your own comments in w3c/WebID#3, we should not prefer one serialisation over another just for preferences' sake: the third proposed rule should probably be included for ALL or NONE of the serialisations, since it is indeed not up to Solid to decide what syntax is to be preferred. However, it IS Solid's concern that servers and clients in its ecosystem should have a minimum of guaranteed interoperability, i.e. at LEAST one serialisation should be mandated. It is ALSO a concern of Solid that a number of practices (in casu statically served files) remain compatible, and thus at MOST one serialisation should be mandated. It is because of these two reasons that I think a MUST regarding (X)HTML+RDFa could be considered (EVEN if I would personally rather mandate ALL serialisations). |
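The "at LEAST one serialisation" argument above can be illustrated with a small set-based check. This is a simplified model, not anything taken from a spec: every server and client is reduced to the set of media types it can serve or parse, a pair interoperates when those sets overlap, and all party names and format sets are hypothetical.

```python
from typing import Dict, Set


def shared_formats(server_formats: Set[str], client_formats: Set[str]) -> Set[str]:
    """Formats that a given server can serve and a given client can parse."""
    return set(server_formats) & set(client_formats)


def all_pairs_interoperable(servers: Dict[str, Set[str]],
                            clients: Dict[str, Set[str]]) -> bool:
    """True if every server/client pair shares at least one format."""
    return all(
        shared_formats(server, client)
        for server in servers.values()
        for client in clients.values()
    )


def with_mandate(parties: Dict[str, Set[str]], mandated: str) -> Dict[str, Set[str]]:
    """Model a MUST by adding the mandated format to every party."""
    return {name: set(formats) | {mandated} for name, formats in parties.items()}
```

Without a mandate, a Turtle-only static host and a JSON-LD-only app share nothing, so the check fails. Once one format is added to every party through `with_mandate`, every pair overlaps by construction. What the model leaves out is the cost side of the argument: the static host now has to actually store and serve the mandated format.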
That is true. I think that it is a fair compromise between interoperability and compatibility. RDFa is quite readable (even if a bit verbose), and transforming Turtle to it is trivial, so using RDFa as a syntax when you want to store something statically does not seem like too heavy a burden. |
@woutermont even though I still favor having zero mandated (as in MUST) serialization formats, what you’re proposing would be my second favorite option for WebID, too, and even more so if the format were to include both RDFa and Turtle data islands. I still think data islands alone would be better but I do realize that there is a vast amount of RDFa material already out there. |
@jacoscaz thanks for the support. But then it seems that you, as well as @rubensworks and probably @namedgraph, would all prefer to mandate nothing to the server and put the burden completely on the client? If that is the case, we might want to evolve in that direction, unless someone fervently wants to defend the client side here (@elf-pavlik @bblfish @jonassmedegaard @csarven @timbl ?). If we take that other route, reconsidering the recommendations in my original comment, we could adhere to [2], [3], [5] and [7], maybe to [1] and [6], but not to [4], with the following.
|
This is true for some languages and contexts: JavaScript, Java and Python are good examples. This is not universally true, though, and the devil is in the details. For example, say you are writing an iOS application in Swift. There is no RDF library in Swift or Objective-C; there is no viable RDF library in C/C++. One could proxy out to a python library, but this gets quite complicated. For Android, which doesn't support newer Java11 features, Jena doesn't work at all; RDF4J partially works, but that timeframe is likely limited, as the main development of RDF4J has already moved to Java11. This gets even more complicated with constrained, embedded devices, which have, in some cases, very limited CPU/Memory resources. Supporting all possible RDF serializations in such a client is, in many cases, simply not possible. Secondly, Linked Data applications will likely interact with other, non-Solid services. Those services (e.g. Linked Data Fragments, Verifiable Credentials, Web Of Things) may have their own requirements on serializations -- and those requirements exist for a reason. Requiring that all clients can interact with all Linked Data services using all possible serializations is an even higher bar to set, especially if a particular client may only need to interact with a known subset of these services. IOW, defining a single canonical list of serialized forms is both a very high bar to set for clients and will also be incomplete once you look at the wider Linked Data ecosystem. I would encourage keeping specifications orthogonal, which to me means not mandating a specific list of serialized forms for all of Solid. |
@acoburn, thanks for pointing out the contextual nature of parser support. However, if we cannot place the burden on the clients, we must place it on the servers, or else abandon the idea of full interoperability. While for a client a canonical list is hard, it shouldn't be for servers, so having those deliver RDF in all W3C-recommended serialisations seems perfectly acceptable to me. @rubensworks, in that light I would like to reconsider the importance of servers without content negotiation. Could you elaborate more on why a SOLID recommendation of syntaxes should affect it? I presume it is not your aim to have static file servers be fully compliant Solid servers? If not, then where's the connection with this issue? Without being constrained by Solid specs, those servers can still serve Turtle, JSON-LD and/or (X)HTML+RDFa. The only relevant constraint I see is then in what format you should host your WebID Profile Document. Or again, am I missing something? |
I don't see clearly where you draw the distinction between broader Linked Data and Solid. I also see Solid adding access control and read-write over HTTP. GitHub Pages seems to me only fit for general Linked Data. The question for me might go more towards how Solid fits into the broader Linked Data ecosystem, for example, how Solid applications can work with data which is not published on Solid storage. When it comes to WebID, I believe there is no assumption that it is hosted on Solid storage.
|
Whereas I agree with the sentiment, I would argue that a small but active ecosystem is a much easier beast to tame than a big and established one. Unification efforts at the current scale are still within the capabilities of a relatively small group of devs. IMHO, and the H there is a very pronounced one, if something needs to be mandated, then mandating JSON-LD instead of a human-friendly format to complement Turtle is not ideal. |
I think a dedicated issue to discuss Again, unless we want to dedicate this whole issue to continuing |
@RubenVerborgh, I'm not against starting small and re-evaluating based on use cases, but I had the impression that we were here to gather intentions and goals to formulate a long-term strategy, as kickstarted by yourself in #454 🤔 In particular, I ended up in that issue because of the concrete and urgent issue for which use cases are raised in #463, and which was also raised by @acoburn in w3c/WebID#3, before (just as now) it got stranded in "the bigger picture". |
Here's my suggestion: All Solid protocol related formats MUST use Turtle (which can be thought of as a subset of TriG or N3, which may be needed at some point later). Of course a Solid server can accept and publish every other format in existence, including binary RDF, RDFa, binary formats like Parquet, Avro, HTML, CSV, XML, JSON, any of the last marked up with GRDDL-like tech to view any format as RDF, ... and also of course visual and audio media types: JPGs, ogg, video streams, etc. |
When it comes to serializations, I think we could look more into using dataset formats, at least on GET. In #291 (comment) I suggested that this could help to keep auxiliary resources nicely separated (e.g. client- and server-managed statements) while allowing combining them in a single response which would include multiple named graphs. I realize that doing writes with dataset formats is more tricky, but taking advantage of them for reading shouldn't be too hard. Since JSON-LD already supports datasets, I would imagine Turtle/TriG and JSON-LD on GET as a nice step in that direction. |
I agree with that pragmatic approach. I think we should favor Turtle on the whole. JSON-LD makes sense for the OpenID Connect use case as there is an interoperability requirement with another ecosystem. JSON-LD really is designed for that interoperability scenario, and it does it very well. But I am a bit concerned that we probably don't yet have many scalable JSON-LD parsers. In a recent PR for IO support to banana-rdf I had to hack around with the Titanium parser that Jena uses and I found the following problems:
If I had time and money I would love to fix that. But it's probably a 2 month project. Actually if I had money and I could find someone who wanted to get going in this space I'd tutor them :-) |
☝️ I created a dedicated issue for RDF in |
This issue builds upon the serialization format goals and strategy discussion #454, and aims to determine a list of RDF serializations that are considered "recommended" across all Solid specs, to avoid serialization conflicts across specs (as seen in #463).
The goal of this issue is to discuss whether or not such a list makes sense, and which RDF serializations should be contained in this list.
In contrast to #454, this is the place for discussing preferences on which RDF serializations we want (and don't want) to recommend.