Features
Keven Ates edited this page Nov 8, 2022 · 7 revisions
- Cleaned and refactored UI elements (but it still looks mostly like the original RDF extension)
- Resizable dialogs
- Uses a template system that is exportable / importable (like OntoRefine) for reuse across different projects with similar data structures
- The template is stored as a JSON formatted structure
- The structure, while similar to OntoRefine, uniformly normalizes key names and substructures
- The same structure is used within the native OpenRefine data store for project and change management
- Transform Tab:
- RDF Node Editor: Added Prefix selection to the editor with Prefix and LocalPart management throughout the code
- Preview Expression Editor:
- Two new GREL functions:
- toIRIString() - transforms and properly validates a string as an IRI component (replaces urlify())
- toStrippedLiteral() - minimally cleans a literal string by converting each known Unicode whitespace character to a normal space, then end-trims the result
- The preview tab displays results from all the rows in the current OpenRefine data display.
- RDF Property Editor: Initializes with the existing property in the textbox instead of blank text
- Added universal Expand (>) and Collapse (v) control on all nodes and properties
- Added universal delete (x) control on all nodes, properties, and types
- Preview Tab:
- All changes made in the Transform tab are reserved until the user switches to the Preview tab
- Editable sample record / row count for preview
- Editable preview checkbox to display output based on Jena Stream or Pretty format
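The behavior described for toStrippedLiteral() can be illustrated with a minimal Python sketch. This is not the extension's code: the function name `to_stripped_literal` is hypothetical, "end trim" is assumed to mean trimming both ends, and Python's own definition of Unicode whitespace stands in for the extension's list of "known" whitespace characters, which may differ.

```python
def to_stripped_literal(text: str) -> str:
    """Replace each Unicode whitespace character with a plain space, then trim the ends."""
    # ASSUMPTION: str.isspace() approximates the "known Unicode whitespace"
    # set; the extension's actual character list may not match it exactly.
    normalized = "".join(" " if ch.isspace() else ch for ch in text)
    # ASSUMPTION: "end trim" is read here as trimming both ends.
    return normalized.strip(" ")


# EM SPACE, NO-BREAK SPACE, TAB, and IDEOGRAPHIC SPACE all become plain spaces.
print(repr(to_stripped_literal("\u2003Hello\u00a0wide\tworld\u3000")))
```

Note that each whitespace character is converted individually, so runs of whitespace are not collapsed: two line breaks inside a string become two spaces.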
There are several user editable files and settings available for customization.
The following files in the .../extensions/rdf-transform/module/MOD-INF/classes/files/ directory can be customized:
- Namespaces - contains an extensive list of known prefixes with their namespaces to help when adding a namespace
- PredefinedVocabs - contains a short list of automatically loaded namespaces
- 1st Entry: Prefix
- 2nd Entry: Namespace
- 3rd Entry: Optional web address to load associated namespace elements
NOTE: Installing a new RDF Transform version will naturally overwrite the files, so back up any customized files to reapply them.
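To make the entry layout concrete, here is a small Python sketch of reading one line of such a file. It assumes one record per line with entries split on any run of whitespace (this page notes elsewhere that the support files use general whitespace separation). The names `VocabEntry` and `parse_vocab_line` are hypothetical, and the extension's real parser is Java code that may treat edge cases differently.

```python
from typing import NamedTuple, Optional


class VocabEntry(NamedTuple):
    prefix: str               # 1st entry
    namespace: str            # 2nd entry
    location: Optional[str]   # 3rd entry: optional web address, None when absent


def parse_vocab_line(line: str) -> Optional[VocabEntry]:
    """Parse one whitespace-separated line; return None if it has fewer than two entries."""
    fields = line.split()  # any run of spaces or tabs separates entries
    if len(fields) < 2:
        return None  # ASSUMPTION: blank or incomplete lines are skipped
    location = fields[2] if len(fields) > 2 else None
    return VocabEntry(fields[0], fields[1], location)


entry = parse_vocab_line("rdfs\thttp://www.w3.org/2000/01/rdf-schema#")
print(entry)
```

A line with only a prefix and namespace yields `location=None`, which mirrors the "optional" third entry.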
The files in the .../extensions/rdf-transform/module/langs/ directory are used to manage language related display strings. Copy and translate to your own favorite language. Submit a language file to the project for inclusion.
Five preferences manage server output (see OpenRefine preferences)
- "RDFTransform.verbose" preference aids with process feedback and debugging
- A general "verbose" preference is recognized as a default (HINT: OpenRefine might use it as a base preference)
- 0 == no verbosity and unknown, uncaught errors (stack traces, of course)
- 1 == basic functional information and all unknown, caught errors
- 2 == additional info and warnings on well-known issues: functional exits, permissibly missing data, etc
- 3 == detailed info on functional minutiae and warnings on missing, but desired, data
- 4 == controlled error catching stack traces, RDF preview statements, and other highly anal minutiae
- A missing verbose preference defaults to 0
- "RDFTransform.exportLimit" preference limits the statement buffer and optimizes output
- The statement buffer (i.e., an internal memory RDF repository) stores statements created from the data
- The old system created one statement in the buffer, then flushed the buffer to disk--very inefficient
- The new system holds many statements before flushing to disk.
- This buffer can become large if the data is large and produces many statements, so it is somewhat optimized:
- Given a default statement size of 100 bytes, the default buffer is limited to 1024 * 1024 * 1024 / 100 = 1GiB / 100 = 10737418 statements
- The 100-byte statement size is likely an overestimate, as the average statement is probably smaller
- Regardless, this keeps memory usage to about 1GiB or less and a user can set the preference to optimize for a given memory footprint and data size
- Then, the buffered statements optimize the creation and flush processes to speed the disk write
- (FUTURE) An enhancement may examine the project data size and system memory to determine an optimized buffer size and allocations
- "RDFTransform.previewStream" preference selects a default for the preview output:
- "true" == a Stream preview
- "false" == a Pretty preview
- A missing preview stream preference defaults to a Pretty preview (false)
- See the User Interface: Preview tab above to change "on the fly"
- "RDFTransform.debug" preference aids debugging
- Controls the output of specifically marked "DEBUG" messages
- Includes many verbose output messages as well
- "RDFTransform.debugJSON" preference aids debugging JSON formatted elements, like the transform template
- Separates the general debug messages from the debug dump of JSON formatted output
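The documented defaults and fallbacks for these preferences can be summarized in a short Python sketch. A plain dictionary stands in for OpenRefine's preference store, and the helper names are hypothetical. How the extension itself parses preference values is not stated here, so the string handling below is an assumption.

```python
from typing import Mapping

ASSUMED_STATEMENT_BYTES = 100  # default statement size used in the estimate above
# 1 GiB / 100 bytes, rounded down to whole statements
DEFAULT_EXPORT_LIMIT = (1024 * 1024 * 1024) // ASSUMED_STATEMENT_BYTES


def resolve_verbosity(prefs: Mapping[str, str]) -> int:
    """Prefer "RDFTransform.verbose", fall back to "verbose", then to 0."""
    for key in ("RDFTransform.verbose", "verbose"):
        if key in prefs:
            return int(prefs[key])
    return 0  # a missing verbose preference defaults to 0


def resolve_preview_stream(prefs: Mapping[str, str]) -> bool:
    """True selects a Stream preview; missing or "false" selects Pretty."""
    # ASSUMPTION: values are compared case-insensitively after trimming.
    return prefs.get("RDFTransform.previewStream", "false").strip().lower() == "true"


def resolve_export_limit(prefs: Mapping[str, str]) -> int:
    """Statement buffer limit, defaulting to roughly 1 GiB of 100-byte statements."""
    return int(prefs.get("RDFTransform.exportLimit", DEFAULT_EXPORT_LIMIT))


print(DEFAULT_EXPORT_LIMIT)
print(resolve_verbosity({"verbose": "2"}))
```

Lowering "RDFTransform.exportLimit" trades fewer buffered statements (less memory) for more frequent flushes to disk.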
- JavaScript code has been updated to use "classified" coding
- Loops use iterators whenever possible
- Dependent library management has resulted in space savings
- RDF4J has been removed as it's completely replaced by Jena
- OpenRefine now includes the Jena library, so it has been removed from RDF Transform
- The compressed extension is reduced from 71.5 MiB to 8.9 MiB--an over 87% reduction!
- RDF Exports
- Exports for all pretty compliant Jena formats
- RDF/XML
- Turtle
- N3
- TriG
- JSON-LD
- RDF JSON
- Exports for all stream compliant Jena formats
- Blocks
- Turtle
- N3
- TriG
- Line
- NTriples
- NQuads
- TriX
- RDFNull (Test)
- Binary
- ProtoBuf
- RDFThrift
- Currently, only triples are produced
- (FUTURE) Quad syntax is not currently supported--add context "s p o c" extension for Quads
- (FUTURE) Star syntax is not currently supported--add star "s p << s p o >> p o" syntax
- Properly recognize the Row versus Record parameters and processing (row and record visitors)
- (FUTURE) Process inner record groupings as sub-records
- Properly parse IRIs for valid structure, prefix and local part, absolute and relative, using the base IRI as needed
- Properly process an IRI's Condensed IRI Expression (CIRIE, a.k.a., Prefix + LocalPart) for output / export
- Reserve flushing of scaled statement buffers to speed exports (user definable--see the "RDFTransform.exportLimit" preference)
- The "Namespaces" and "PredefinedVocabs" support files are processed using general whitespace separation (not strictly tab delimited)
- The code differentiates between "namespace" (a prefix and its IRI namespace) versus "prefix" (just the prefix value of the namespace)
- General cleanup and verbose commenting throughout the code
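The distinction between a namespace (prefix plus IRI namespace) and a CIRIE (Prefix + LocalPart) can be illustrated with a rough Python sketch. This is not the extension's algorithm, which is not described on this page: the function names are hypothetical, choosing the longest matching namespace is an assumption, and no validation of the local part is attempted.

```python
from typing import Dict, Optional, Tuple


def condense_iri(iri: str, namespaces: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (prefix, local_part) for the longest matching namespace, or None."""
    best: Optional[str] = None
    for prefix, namespace in namespaces.items():
        # ASSUMPTION: when several namespaces match, the longest one wins.
        if iri.startswith(namespace):
            if best is None or len(namespace) > len(namespaces[best]):
                best = prefix
    if best is None:
        return None
    return best, iri[len(namespaces[best]):]


def to_cirie(iri: str, namespaces: Dict[str, str]) -> str:
    """Render "prefix:localPart" when a namespace matches, else the full IRI in angle brackets."""
    parts = condense_iri(iri, namespaces)
    return f"{parts[0]}:{parts[1]}" if parts else f"<{iri}>"


# Each key is a prefix; each key/value pair together is a namespace.
namespaces = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(to_cirie("http://xmlns.com/foaf/0.1/name", namespaces))
```

An IRI with no matching namespace is left in full form, which is one common fallback for Turtle-style output.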
To streamline RDF Transform, the RDF reconcile functionality has been removed from this project and will likely not be recreated. OpenRefine's general reconciliation may be considered an adequate substitute.