Roadmap comments from DAI perspective #2700
Replies: 4 comments 3 replies
-
Any news on time estimates?
-
Regarding:
My understanding: right now we load data in DTBL format and do the munging, but before pushing the data into LGBM we need to convert it into the format LGBM expects (i.e., make a copy of the data). @arnocandel proposed integrating DTBL directly into LGBM so that LGBM can accept data in DTBL format.
The ability to use a MOJO in a "map" call on a data frame, similar to how H2O-3/Spark score with a MOJO: split the data into partitions and score the partitions in parallel.
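To make the "split into partitions and score in parallel" idea concrete, here is a minimal sketch using only the Python standard library. It is not datatable's or the MOJO runtime's API: `split_into_partitions`, `score_in_partitions`, and `toy_scorer` are hypothetical names, and `toy_scorer` just stands in for a real MOJO scoring call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Row = Sequence[float]


def split_into_partitions(rows: List[Row], n_partitions: int) -> List[List[Row]]:
    """Split rows into at most n_partitions contiguous chunks, preserving order."""
    if n_partitions < 1:
        raise ValueError("n_partitions must be >= 1")
    chunk = max(1, -(-len(rows) // n_partitions))  # ceiling division
    return [rows[i:i + chunk] for i in range(0, len(rows), chunk)]


def score_in_partitions(
    rows: List[Row],
    score_partition: Callable[[List[Row]], List[float]],
    n_partitions: int = 4,
) -> List[float]:
    """Score each partition in its own worker and return predictions in row order."""
    partitions = split_into_partitions(rows, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        # Executor.map yields results in the order the partitions were submitted,
        # so the concatenated predictions line up with the input rows.
        scored = pool.map(score_partition, partitions)
        return [pred for part in scored for pred in part]


def toy_scorer(partition: List[Row]) -> List[float]:
    """Hypothetical stand-in for a MOJO scorer: one prediction per input row."""
    return [sum(row) for row in partition]


if __name__ == "__main__":
    data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
    print(score_in_partitions(data, toy_scorer, n_partitions=2))
```

Threads only help here if the real scorer releases the GIL (for example, by running in native code); otherwise a process pool or native threading inside the frame library would be the more likely design.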
-
Actions
-
Current priority from H2O:
From community perspective:
-
This is the list of issues, annotated with their priority from the DAI perspective:
🚨 Top Priority
Goal: optimize computation speed, minimize data conversions
to_arrow
docs)
High Priority
Eliminate "Rereading" pass in fread? #1843, streaming dt.fread(path).to_jay() #1750) (DAI: HIGH, time needed: ?)
Other
Full support of datasets with >2B rows (Implement sorting of columns with >2B rows #2336) (DAI: MEDIUM, time needed: ?)
Inner/outer joins (Implement inner/outer joins for non-keyed frames #1080) (DAI: LOW)
Ability to score a dataframe using a MOJO in an efficient way (f.score(mojo)) (DAI: MEDIUM, time needed: ?) [clarify]
Fread improvements:
Eliminate "Rereading" pass in fread? #1843, streaming dt.fread(path).to_jay() #1750) (DAI: LOW, time needed: ?)
fread may sometimes detect incorrect newline character #1343, Unable to parse attached dataset. #1045, File containing a single unescaped " out-of-sample is read incorrectly #1036, Improve headers detection logic when all columns are of "string" type #946, If last field has unclosed quote, then it will not be parsed properly #934, fread should not detect sep within quoted fields #922, fread erroneously guesses sep=' ' #518) (DAI: LOW, time needed: ?)
Functions for string manipulations (DAI: MEDIUM, time needed: ?)
*)
Functions for date/time manipulations (DAI: LOW, time needed: ?)
Rolling windows support (Rolling aggregate support based on windows within a DT #1500) (DAI: LOW, time needed: ?)
New proposals