-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate with loaders.gl 2.2 #1156
Conversation
jsonpaths: [ | ||
'$', // JSON Row array | ||
'$.features', // GeoJSON | ||
'$.datasets' // KeplerGL JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datasets is an array of datasets, which could be less than 10, we can make it be $.datasets.0.data
so it always tries to stream the first dataset. And while we save in kepler, we can save the dataset with the longest rows first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. The reason I left it as $.datasets
for now is because it looks like the array indexing in jsonpaths isn't handled properly in loaders.gl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think you can just add a test case in jsonpaths.spec and streaming-json-parser.spec in loaders.gl/json and it should be easy to pin down this little bug (at a guess, some string/number conversion issue perhaps, or openobject, openarray treated differently).
src/processors/file-handler.js
Outdated
@@ -200,9 +178,9 @@ export function processFileData({content, fileCache}) { | |||
} | |||
|
|||
if (format && processor) { | |||
console.time('process file content'); | |||
Console.time('process file content'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was me testing, it can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
Apart from some nits, I think the post processing could merit another look, especially the CSV functions, it feels like there is overlap with loaders and it would be nice to push generic post-processing code upstream into loaders.gl so we can keep the logic here in the app as clean and minimal as possible.
Edit: That would be for another PR.
if (!Array.isArray(result) || result.length < 2) { | ||
// looks like an empty file, throw error to be catch | ||
throw new Error('Read File Failed: CSV is empty'); | ||
export function processCsvData(rawData, header) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to duplicate the CSVLoader with header detection etc. Can we unify?
Is this unique to CSV data? not other tables? Rename the functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I think there might be an opportunity to remove this functionality altogether, but it seems to be used for loading sample datasets and other places (if I understand this correctly). @heshan0131 what do you think?
src/processors/file-handler.js
Outdated
@@ -157,6 +69,10 @@ export function isRowObject(json) { | |||
return Array.isArray(json) && isPlainObject(json[0]); | |||
} | |||
|
|||
export function isCsvRows(content) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case does not happen any more with the changes in @loaders.gl/csv
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! this isn't needed anymore because a CSV gets transformed to a JSON row array by Loaders.gl. removed.
src/processors/file-handler.js
Outdated
|
||
let format; | ||
let processor; | ||
if (isCsvRows(data)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to know if this was originally CSV then we can use the metadata batch instead of sniffing like this. The idea with loaders.gl is that we can add additional streaming parsers do we need to keep track of what type things were originally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check isn't actually needed since, as you said, the CSV gets parsed as an array of records (will get captured by the isRowObject(data)
instead.
Parse JSON using loaders.gl Parse CSV using loaders.gl Let loaders.gl handle file detection, not working though Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
… and use jsonpaths Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Parse JSON using loaders.gl Parse CSV using loaders.gl Let loaders.gl handle file detection, not working though Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Shan He <[email protected]> Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
Signed-off-by: Wesam Manassra <[email protected]>
src/processors/file-handler.js
Outdated
|
||
let format; | ||
let processor; | ||
if (isCsvRows(data)) { | ||
format = DATASET_FORMATS.csv; | ||
processor = processCsvData; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not delete this. kepler processCsvData does 2 things:
- type detection, specifically to find
time
geo
as well as the exact time formatyyyy-hh-dd mm:ss
- basic data cleaning. lots of times people have string values like
N/A
Null
N/
mixed up with numbers. kepler finds them and fill it in withnull
so type detect won't fail.
For case1, there is a fall back in reducer, if format
if time is note detect, keler will call type detection again. Not sure loaders.gl will handle case 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now loaders doesn't do this, but it will have to when we switch the loaders to return columnar data batches (typed arrays) and arrow batches.
I think we should look at moving the data cleaning out as a second step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add to Ib's point, once the dataset is processed via loaders.gl, the output format for CSV files and JSON Row objects becomes the same (an array of records), so the isCsvRows
function doesn't get evaluated. Any ideas on how to tackle this?
Signed-off-by: Shan He <[email protected]>
Signed-off-by: Shan He <[email protected]>
add probe.gl browser drive test test row object and kepler map Signed-off-by: Shan He <[email protected]>
Most changes in this PR come from #1135 by @heshan0131. The changes this PR brings in addition to those are mostly in
src/processors/file-handler.js
, where we went through a few iterations of integration with Loaders.gl.Key New Changes:
metadata
flag to assist with parsing objectsjsonpaths
property to guide theparseInBatches
function to what to streamcolumn1
,column2
, ....)Next Steps: