Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typescript #373

Merged
merged 29 commits into from
Dec 17, 2019
Merged

Typescript #373

merged 29 commits into from
Dec 17, 2019

Conversation

joepio
Copy link
Collaborator

@joepio joepio commented Dec 3, 2019

This PR converts most of RDFlib.js to Typescript (#355). It fixes various minor bugs and it adds a lot of documentation, using TypeDoc instead of JSDoc. Since it's such a big PR with most of the code changed, I've added a changelog with some of my thoughts. For the most important ones, I've already opened some issues, but for the less important ones I've just added some bullets below.

Changes included in PR #363 (already approved)

  • Converted some of the most fundamental classes to typescript, including Node, Literal, BlankNode, NamedNode, Collection, Statement.
  • Introduced a .types file for shared types.
  • JSDoc is replaced with Typedoc. Combined with types and comments from @types/rdflib, this makes the documentation far more complete.
  • I used many of the types and comments from @types/rdflib by Cénotélie. Added credits in package.json, discussed this with Cénotélie.

New changes

  • Migrated formula, variable, store, update-manager, data-factory, default-graph, namespace, parse,serialize, parse, uri and utils tot ts.
  • Added, and later removed fork-ts-checker-webpack-plugin, which enables errors in the log when ts errors occur. Since this gave errors, we use tsc in the prepare script. This means that only builds without type errors will pass.
  • Added typeguards, e.g. isTFNamedNode and isTFPredicate in Utils, and used these at various locations.
  • Use enums for termType and contentType, without breaking compatibility with regular strings.
  • Formula Constructor arguments are optional - since some functions initialize empty Formulas.
  • In Formula.fromNT() return this.literal(str, lang || dt) seemed wrong, converted it to
  • The various fromValue methods conflict with the base Node class, type wise. Since they don't use this, I feel like they should be converted to functions.

Compatibility with RDFJS taskforce and external datafactories

We've worked quite a bit on compliance with the TF (TaskForce) spec. The internal types (including the factory) extend Taskforce types, which live in a separate tf-types.ts file. This makes it easier to use RDFLib in combination with external, custom data factories.

  • Switched internal calls from sameTerm to equals in order to comply to TF spec, so that these functions also work with external datafactories. Alias still exists, so nothing changes externally.
  • Switched internal calls from .why to .graph. The .why alias still exists, so nothing changes for external users.
  • Calls to kb.sym have been replaced with kb.rdfFactory.namedNode, which makes all these functions more compatible with external datafactories. Added a deprecation warning to .sym.

Minor fixes

  • Removed the last conditional of Formula.holds(), since it did not make sense
  • Removed some unreachable code, unused variables and functions that didn't do anything such as Node.substitute().
  • Removed the justOne argument from formula.statementsMatching, since it was unused.
  • The uri.document function called .uri on a string, I removed that.
  • Transformed inline comments to JSDoc, moved them to type declarations instead of constructor.
  • Some types are set to any, because I didn't fully understand them. I've added TODO comments for these.
  • Removed the fourth argument from the parser.parse function in fetcher.parse, since the function only takes three.
  • Removed the response argument from fetcher.parse, XHTMLHandler.parse, RDFXMLHander.parse, XMLHandler, since it was not used.
  • Fetcher.failfetch added strings as objects to the store. Changed that to literals.
  • Internal calls to NamedNode.uri are changed to .value to comply with TF spec. This enables these functions to work with external datafactories.
  • Removed unused second argument from Fetcher.cleanupFetchRequest
  • Created one huge Options type for Fetcher. Not sure if this is the way to go.
  • In Node.toJS, the boolean only returned true if the xsd:boolean value is '1', now it it should also work for 'true'.
  • Converted kb.add(someStrin2g) to kb.add(new Namednode(somestring)) to enhance compatibility with other datafactories. This happens in Fetcher and
  • Fetcher.refreshIfExpired passed an array of headers, but it needs only one string.
  • Fethcer uses Headers a lot. I've changed empty objects to empty new Headers instances, which enhances compatibility with default Fetch behavior.
  • Serializer.tripleCallback had an unused third argument.
  • UpdateManager.update checked an undefined secondTry variable. Since later in the same function, .update is called with a 4th argument, I assume this is secondTry. I've added it as an optional argument. Perhaps this is
  • Formula.add() now uses this.rdfFactory.defaultGraph instead of the non-existent this.defaultGraph
  • IndexedFormula.replaceWith now passes a Node instead of a string to .add in the if (big.value) block
  • Added support for true values in xsd:boolean literals, according to xsd spec.

Possible bugs discovered, which are not fixed by this PR

  • Formula.substitute uses this.add(Statments[]), which will crash. I think it should be removed, since IndexedFormula.substitute is used all the time anyway.
  • The Formula.serialize function calls serialize.ts with only one argument, so without a store. I think this will crash every time. AlsoFormula.serialize uses this.namespaces, but this is only defined in IndexedFormula. Is it rotten code and should it be removed?
  • IndexedFormula.add() accepts many types of inputs, but this will lead to invalid statements (e.g. a Literal as a Subject). I suggest we make this more strict and throw more errors on wrong inputs. Relates to Node.fromValue() type assumptions - null / undefined #362. We could still make the allowed inputs bigger by allowing other types with some explicit behavior, e.g. in subject arguments, create NamedNodes from URL objects and strings that look like URLs . In any case, I thinkg the Node.fromValue behavior is too unpredictable for store.add. For now, I've updated the docs to match its behavior.
  • The types for Node.fromValue and Literal.fromValue show how unpredictable these methods are. I suggest we make them more strict (also relates to Node.fromValue() type assumptions - null / undefined #362), so they either return a TFTerm (node) or throw an error - they should not return undefined or null. Also, I think they should be converted to functions in Utils: this would fix the circular dependency issue (why we need node_internal) and it would fix the type issues in Literal.fromValue (which tends to give issues since it's signature does not correctly extend from Node.fromValue)
  • In Fetcher.addtype, the final logic will allways return true, since redirection is a NamedNode. Should it call .value?
  • Various Hanlder.parse() functions in Fetcher return either a Response or a Promise<Error>. This seems like weird behavior - should it not always return an array?
  • The defaultGraph iri is set to chrome:theSession, but this errors in Firefox. I suggest we change it to something else. See Change defaultGraph iri #370.
  • The Parse.executeErrorCallback conditional logic is always true.
  • I've added many // @ts-ignore comments. Ideally, these should be resolved by fixing the underlying type issues.
  • UpdateManager.update_statement seems to refer to the wrong this. It calls this.anonimize, but it is not available in that scope.
  • UpdateManager.updateLocalFile uses Component, but this is not defined anywhere. Is this deprecated?
  • Data-factory-internal.id() returns string | undefined, I feel like undefined should not be possible - it should throw an error. This would resolve the type incompatibility on line 146.
  • IndexedFormula.copy runs .copy on a Collection, but that method is not available there.
  • IndexedFormula.predicateCallback is checked, but never used in this codebase.

Other things I noticed

  • Literals can apparently be null or undefined, when nodes are created using the .fromValue method. This causes faulty behavior. This happens in the new Statement() constructor as well. See Node.fromValue() type assumptions - null / undefined #362.
  • The IndexedFormula.add() method has logic for Statement array inputs and store inputs, but this behavior is not documented. It also refers to this.fetcher and this.defaultGraph, which both should not be available. I've added types that accept these arrays.
  • The filenames of major classes differ from their default exports, e.g. store.ts is called IndexedFormula.
  • Aliases (e.g. IndexedFormula.match for IndexefFormula.statementsMatching) introduce complexity, documentation and type duplication. I suggest adding deprecation warnings.
  • The various calling methods of Fetcher.nowOrWhenFetched are quite dynamic. A simpler, stricter input type might be preferable.
  • The Variable type (or TFVariable) really messes with some assumptions. I feel like they should not be valid in regular quads, since it's impossible to serialize them decently. If I'd add it to the accepted types, we'd require a lot of typeguards in functions.
  • Fetcher StatusValues can be many types in RDFlib: string, number, true, undefined... This breaks compatibility with extending Response types, since these only use numbers. I feel we should only use the 499 browser error and add the text message to the requestbody. I've created a type for the internal InternalResponse; it shows how it differs from a regular Response. The .responseText property, for example, is used frequently in Fetcher, but
  • The IndexedFormula and Formula methods have incompatible types, such as in compareTerm, variable and add. I've added //@ts-ignore lines with comments.
  • The fourth reponse argument in .parse() methods in Handler classes was unused (except in N3Handler), so I removed it everywhere it was unused.
  • Serializer's fourth options argument is undocumented, and I couldn't find out how it worked.
  • Fetcher saveResponseMetadata creates literals
  • Many functions in Fetcher assume that specific Opts are defined. I've included all these in a single Options type and added documentation for the props I understood. I've also created an AutoInitOptions type, which sets auto-initialized. I extended Options in each function where specific opts seemed to be required. I'm not entirely confident about the types I've set. I feel like the truly required items should never be Opts, since they aren't optional. Refactoring this requires a seperate PR.
  • Fetcher.load allows arrays as inputs. This causes the output types to be more unpredictable. Promise<Result> | Result[]. I suggest splitting this in two functions, e.g. add loadMany
  • Utils.callbackify seems to be used only in Fetcher.
  • UpdateManager.editable has a confusing return type (string | boolean | undefined). I suggest we refactor it to always return one of multiple pre-defined strings,.
  • The optional argument in formula.js does not seem to be documented, used or tested - should it be removed?

Need review

  • Some of the Formula and IndexedFormula functions (e.g. anyStatementMatching) might have too strict types - perhaps Collections are allowed in some of them.
  • IndexedFormula.declareExistential & newExistential have different assumptions on types - should they be blanknodes or namednodes?
  • IndexedFormula.check passes a single statement to checkStatementList, which expects an array
  • I've added many type assertions (e.g. as TFObject), but Ideally, these do not exist. Ultimately, these should be replaced by TypeGuards that work on runtime

Some thoughts on simplifying language

Getting started with Linked Data or RDF can be difficult, since it introduces quite a few new concepts.
Since this library is powerful and generic, it might be one of the first and most important RDF tools that a developer will use.
Therefore, we should try to use consistent langauge and keep synonyms to a minimum.

  • The name Node and Term seem to refer to the same concept. Both are used in this repo. I think Term is slightly more suited, partially because it complies to the TF spec, but also because it seems more sementically correct. A Literal, for example, is not really a node in the mathematical sense, it's more of an edge, since it cannot connect to other nodes.
  • Statement, Triple and Quad refer to the same concept, at least technically. Maybe we could pick one. I suggest Statement, because it covers both triples and quads.
  • The concept graph is referred to as why, doc and graph in the code and API. I think this might be confusing - should we just call it graph everywhere?
  • the IndexedFormula default export name is different from the store filename. It might be easier to just call it store everywhere, including where it's called kb.

Notes for PR reviewer

  • All my commits have been squashed into one commit, since rebasing onto @Fletcher91's changes proved very difficult. The individual commits can be seen here.
  • Github does not show line-by-line changes in the renamed files, which makes reviewing this PR in Github unnecessarily hard. I recommend using an IDE with decent typescript support and a Git blame plugin.

Fletcher91 and others added 11 commits November 29, 2019 14:58
Sans formula, which technically is a term
Some utils to TS linkeddata#355

Add datafactory type to formula linkeddata#355

Taskforce compatibility linkeddata#355

Add fork-ts-checker-webpack-plugin linkeddata#355

URI to typescript linkeddata#355

moved typeguards to utils

Datafactory & serializer types,  enums linkeddata#355

WIP Fetcher & UpdateManager linkeddata#355

Fetcher WIP linkeddata#355

WIP Fetcher linkeddata#355

Continue working on typescript migration linkeddata#355

Data factory types fix linkeddata#355

Various minor type improvements linkeddata#355

Add documentation to typeguards linkeddata#355

Various fetcher type improvements, use enums linkeddata#355

Use doc.value instead of doc.uri in update-manager

Update typescript linkeddata#355

Don't use const enums util supported linkeddata#355

Fixed tests, improved typeguards, fetcher linkeddata#355

Rename uriCreator to nodeValue, various ts linkeddata#355

Fether typings linkeddata#355

Fetcher & Formula typing improvements linkeddata#355

Update manager typing linkeddata#355

Data factory, store, fether types linkeddata#355

Various type changes, almost no type errors left linkeddata#355

Replace .sym with factory.namednode

Cleanup

Move typing/type-testing utils to separate file
The inclusion of dirty-chai requires all `.to.be.false/true` assertions
to be called explicitly.
Rename and move factory-types
@timbl timbl requested a review from megoth December 8, 2019 21:44
@timbl timbl self-assigned this Dec 8, 2019
@timbl timbl requested review from RubenVerborgh and timbl December 8, 2019 21:44
@timbl timbl removed their assignment Dec 8, 2019
Copy link
Member

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns about the whole TF thing:

  1. TF is not a good prefix for RDF/JS-compatible things.
  2. Typings for RDF/JS already exist; we should just reuse them: https://github.com/rdfjs-base/data-model/blob/master/index.d.ts
  3. rdflib.js is RDF/JS-compatible, so the distinction is not necessary. Only where code relies on rdflib.js-specific extensions, those should be used. (i.e., Term is the generic thing, and RdfJsTerm the extension, not the other way round).

src/utils/terms.ts Outdated Show resolved Hide resolved
src/utils/terms.ts Outdated Show resolved Hide resolved
src/utils/terms.ts Outdated Show resolved Hide resolved
src/utils/terms.ts Show resolved Hide resolved
src/utils/terms.ts Outdated Show resolved Hide resolved
tests/unit/fetcher-test.js Outdated Show resolved Hide resolved
tests/unit/fetcher-test.js Outdated Show resolved Hide resolved
tests/unit/util-test.js Show resolved Hide resolved
src/tf-types.ts Outdated Show resolved Hide resolved
src/tf-types.ts Show resolved Hide resolved
Copy link
Collaborator

@Vinnl Vinnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a major improvement! The main changes I'd like to see:

  • Re-using @types/rdf-js
  • Make rdflib's own types extend those and have that be the special case, e.g. NamedNode and RdflibNamedNode instead of TFNamedNode and NamedNode, respectively.

Other than that a number of a small inline comments/questions, but given the size of the PR and my relative unfamiliarity with rdflib internals, most of those are probably the result of my misunderstanding :)

I have yet to review update-manager.ts and store.ts, but since GitHub hides their diffs by default since they're so large, I'm submitting this now to avoid me losing my work due to accidentally refreshing or something. I've also not reviewed changes that came in while I was reviewing yet.

Edit: Alright, reviewed everything that was included 16:45 CET, December 9th!

src/types.ts Outdated Show resolved Hide resolved
src/named-node.ts Outdated Show resolved Hide resolved
src/blank-node.ts Outdated Show resolved Hide resolved
src/blank-node.ts Outdated Show resolved Hide resolved
src/blank-node.ts Outdated Show resolved Hide resolved
src/utils.ts Show resolved Hide resolved
tests/unit/factories/canonical-data-factory-test.ts Outdated Show resolved Hide resolved
tests/unit/factories/canonical-data-factory-test.ts Outdated Show resolved Hide resolved
tests/unit/factories/canonical-data-factory-test.ts Outdated Show resolved Hide resolved
tests/unit/factories/extended-term-factory-test.ts Outdated Show resolved Hide resolved
src/update-manager.ts Outdated Show resolved Hide resolved
Copy link
Collaborator

@megoth megoth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lot of good work, and a very good step in the right direction for rdflib.

I would only request that tf-types are removed in favor of https://www.npmjs.com/package/@types/rdf-js. Further, I'm ok with aliasing the interfaces (e.g. import { NamedNode as TFNamedNode } from 'rdf-js'), but do not let it bleed into the public API (e.g. isTFLiteral should be renamed isLiteral, and the methods that are specific to rdflib should be named something different instead).

.npmignore Show resolved Hide resolved
src/fetcher.ts Outdated Show resolved Hide resolved
src/fetcher.ts Outdated Show resolved Hide resolved
src/fetcher.ts Outdated Show resolved Hide resolved
src/formula.ts Show resolved Hide resolved
src/namespace.ts Show resolved Hide resolved
src/node-internal.ts Show resolved Hide resolved
src/store.ts Outdated Show resolved Hide resolved
src/tf-types.ts Show resolved Hide resolved
"es2019"
]
"es2019",
"dom"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the introduction of dom ok considering the use of rdflib in Node?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Response type used in rdflib is the one that Fetch uses. I tried the Express response, but it was not compatible.

Is there a way to only use the Response type, without requiring the dom lib?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure =\ Have you been able to test the transpiled files with Node- (e.g. NSS) and web-based (e.g. mashlib) systems? If yes, I'll approve this PR, if no, I'll try to test it out myself later and report back.

@joepio
Copy link
Collaborator Author

joepio commented Dec 10, 2019

I tried (see commit) replacing my TF types with those from @types/rdf-js, but it seems these are incompatible - both with RDFlib and with the original spec.

One of the most fundamental problems is that @types/rdf-js defines Term as a Union of its other Terms:

export type Term = NamedNode | BlankNode | Literal | Variable | DefaultGraph;

The RDF/JS spec defines Term as an abstract interface.

Since RDFlib uses custom terms (like Collection), it is compatible with the RDF/JS spec, and it works with my types, but it does not work with the types from @types/rdf-js. This problem with Term, unfortunately, trickles down quite a bit and makes it impossible to switch, at the moment.

There are a couple of things we can do here:

  1. Issue a PR for DefinitelyTyped/rdf-js, change Term to fit the spec. This does require that these changes do not break anything in rdf-js.
  2. Add the types to the RDF/JS repo.
  3. Do not import the @types/rdf-js types, but use my previous implementation.

@RubenVerborgh What are your thoughts on this?

@megoth
Copy link
Collaborator

megoth commented Dec 10, 2019

To me it sounds like we want to start a process to make sure that the interfaces expressed by RDFJS are compatible with the work in rdflib. That is a lengthier discussion though, so I would suggest removing those parts and keep this PR simple.

We could of course mitigate that for now by having src/tf-types, but I think removing them altogether is better as a first step?

@joepio
Copy link
Collaborator Author

joepio commented Dec 10, 2019

To me it sounds like we want to start a process to make sure that the interfaces expressed by RDFJS are compatible with the work in rdflib. That is a lengthier discussion though, so I would suggest removing those parts and keep this PR simple.

I'll make a separate issue for migrating to external types. (Edit: #374)

We could of course mitigate that for now by having src/tf-types, but I think removing them altogether is better as a first step?

Removing them has quite a big impact, I'd suggest we use the tf-types for now and do the migration to an external type definition later. As for now, the types in this PR seem to resemble the spec more closely.

Copy link
Collaborator

@megoth megoth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A think this is getting close to something that should be merged. I'm vary of the longer we postpone this, the larger it might become, and at some point we need to get back to smaller PRs that are easier and quicker to review. (That being said, as I said before, this is a lot of good work, and I really appreciate your efforts.)

@timbl @joepio @RubenVerborgh @Vinnl We might want to release these changes as a pre-release, to allow people to test them out and weed out any bugs that might have sneaked inside?

@joepio @Fletcher91 Is there anything more you need to do before we can merge this PR?

@joepio
Copy link
Collaborator Author

joepio commented Dec 17, 2019

@megoth

I believe testing in other applications (e.g. in mashlib, some node env) is smart before doing a definitive release and bumping the version. U agree that a pre-release is the way to go!

Copy link
Collaborator

@Vinnl Vinnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My remaining comments are nice-to-haves, but certainly do not warrant holding off merging this. Great job!

src/types.ts Outdated Show resolved Hide resolved
src/node-internal.ts Show resolved Hide resolved
@RubenVerborgh
Copy link
Member

RubenVerborgh commented Dec 17, 2019

About src/tf-types.ts, why do we copy instead of use? I only see disadvantages. There are more stakeholders to the RDF/JS domain than the maintainer of that one package. If there are incompatibilities, the package should be adjusted.

@RubenVerborgh
Copy link
Member

Also, we should use proper naming. The word "taskforce" shouldn't occur in code, comments, or anywhere else. The proper terminology is the RDF/JS Data Model specification: https://rdf.js.org/data-model-spec/

@joepio
Copy link
Collaborator Author

joepio commented Dec 17, 2019

@RubenVerborgh We copied from the @types/rdflib since these were the types specific to this repo. Since the repo is migrated to typescript in this PR, the @types/rdflib types are no longer useful - the ones in this PR are more accurate anyway, since they have to be fully internally consistent to even compile. For why I'm not using @types/rdfjs, see #374 (in short: I'm working on a PR to change Term, hopefully we can use these types after that is merged).

I've removed the 'taskforce' mentions.

@RubenVerborgh
Copy link
Member

@joepio Thanks, lost track of all the issues. Only remnants now are TF prefixes as in TFDataFactory, which should also be replaced. Perhaps RdfJsDataFactory or RSDataFactory.

Happy to withdraw my objections after the last commits, and if we are using #374 to ensure that the interfaces are in fact copies and thus straight imports. If we can't something is wrong on either our side or their side.

@RubenVerborgh RubenVerborgh self-requested a review December 17, 2019 16:16
Copy link
Member

@RubenVerborgh RubenVerborgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further objections after TF prefix removal.

@megoth megoth merged commit d5000f5 into linkeddata:master Dec 17, 2019
@megoth megoth added the release-patch Add to PR to indicate that merging it denotes a patch (semver) release label Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-patch Add to PR to indicate that merging it denotes a patch (semver) release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants