Introducing higher-order functions on columns #53
Comment from astigsen: Very cool. A lot of programmers from a functional background will love this :-) A few questions/thoughts:
When it can be applied to entire rows at a time, it obviously opens up wide possibilities. But have you thought of any special use cases for the single-column version? I assume we will end up implementing most statistical operations on columns as intrinsics, so that they don't have to unpack values and can use SSE optimizations. What kinds of column operations, ones we are unlikely to implement ourselves, do you imagine users would be interested in?
Conflicts: src/tightdb/array.cpp src/tightdb/array.hpp src/tightdb/array_basic.hpp src/tightdb/table.hpp
Conflicts: src/tightdb/array_string_long.hpp src/tightdb/column_string.hpp src/tightdb/column_string_enum.cpp
Conflicts: src/tightdb/column.cpp
Conflicts: src/tightdb/array_basic_tpl.hpp src/tightdb/column.hpp src/tightdb/column_basic_tpl.hpp src/tightdb/table.hpp
Conflicts: src/tightdb/column.hpp
Conflicts: src/tightdb/array.hpp src/tightdb/array_string.hpp src/tightdb/array_string_long.cpp src/tightdb/array_string_long.hpp
Conflicts: src/tightdb/array_basic.hpp src/tightdb/array_string.cpp src/tightdb/array_string.hpp src/tightdb/array_string_long.cpp src/tightdb/array_string_long.hpp src/tightdb/column.cpp src/tightdb/column.hpp src/tightdb/column_string.cpp src/tightdb/column_string.hpp src/tightdb/column_string_enum.cpp
Conflicts: src/tightdb/array_basic.hpp src/tightdb/array_string.hpp src/tightdb/column.hpp src/tightdb/column_basic.hpp src/tightdb/column_string.hpp src/tightdb/table.hpp
Conflicts: src/tightdb/array.cpp src/tightdb/array.hpp
Conflicts: src/tightdb/array.hpp src/tightdb/array_basic.hpp src/tightdb/column.hpp src/tightdb/column_basic.hpp src/tightdb/column_basic_tpl.hpp src/tightdb/column_string.hpp
Test FAILed.
Conflicts: src/tightdb/array.hpp src/tightdb/array_basic.hpp src/tightdb/array_basic_tpl.hpp src/tightdb/column.cpp src/tightdb/column.hpp src/tightdb/column_basic.hpp src/tightdb/column_basic_tpl.hpp src/tightdb/column_string.cpp src/tightdb/column_string.hpp
Test FAILed.
Test FAILed.
Test PASSed.
Test PASSed.
@kspangsege Is this still WIP? Did we abandon the idea?
It is still WIP. As far as I know, we have not abandoned it [scary thought]. (In reply to Tim Anglade, Tue, Feb 4, 2014 at 6:55 AM.)
Conflicts: src/tightdb/array_string_long.cpp src/tightdb/array_string_long.hpp src/tightdb/column.cpp src/tightdb/column.hpp src/tightdb/column_basic.hpp src/tightdb/column_basic_tpl.hpp src/tightdb/column_string.hpp
Test PASSed.
Test FAILed.
Test FAILed.
I'm excited to see this merged into core! I like the name "reduce" instead of "foldl" if we are not going to have "foldr" as well. BTW, more formally these are called "second-order functions".
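The naming point can be made concrete: with a non-associative operation such as subtraction, a left fold and a right fold give different results, so "foldl" only carries extra information when a "foldr" exists alongside it. The sketch below uses `std::vector` as a stand-in for a column; `foldl`, `foldr`, and `reduce` here are hypothetical helpers for illustration, not tightdb functions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// foldl: f(f(f(init, x0), x1), x2) ...  (accumulator on the left)
template <class T, class A, class F>
A foldl(const std::vector<T>& v, A init, F f)
{
    for (const T& x : v)
        init = f(init, x);
    return init;
}

// foldr: f(x0, f(x1, f(x2, ... f(xn-1, init))))  (accumulator on the right)
template <class T, class A, class F>
A foldr(const std::vector<T>& v, A init, F f)
{
    for (auto it = v.rbegin(); it != v.rend(); ++it)
        init = f(*it, init);
    return init;
}

// reduce: a left fold seeded with the first element, so no initial value is needed.
template <class T, class F>
T reduce(const std::vector<T>& v, F f)
{
    if (v.empty())
        throw std::invalid_argument("reduce of empty sequence");
    T acc = v.front();
    for (std::size_t i = 1; i < v.size(); ++i)
        acc = f(acc, v[i]);
    return acc;
}
```

For `{1, 2, 3}` with subtraction and an initial value of 0, `foldl` yields `((0-1)-2)-3 = -6` while `foldr` yields `1-(2-(3-0)) = 2`; for addition both give 6.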
Uh! Filter would be nice as well.
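One possible shape for such a filter, as a minimal sketch: it takes a predicate and returns the indices of the matching rows rather than the values, on the assumption that in a table the caller usually wants the rows themselves. `Column` (a `std::vector` stand-in) and `filter` are hypothetical names, not part of tightdb.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column of values.
template <class T>
using Column = std::vector<T>;

// Hypothetical filter: returns the row indices whose value satisfies `pred`.
// Pred is a template parameter, so the predicate call can be inlined into the loop.
template <class T, class Pred>
std::vector<std::size_t> filter(const Column<T>& column, Pred pred)
{
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < column.size(); ++i)
        if (pred(column[i]))
            rows.push_back(i);
    return rows;
}
```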
Abandoned.
Note: This is a work in progress.
The ambition is to provide a set of higher-order functions that combine extensibility with efficiency.
For example, here is how you could compute the variance in an integer column using Knuth's online algorithm (if that is your favourite version):
Note: This already works. It would work equally well for a floating-point column.
Note: This is actually a good algorithm, since it is both online (one pass) and numerically stable. It also produces the mean (or average) for free.
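For reference, the recurrence behind Knuth's online method (due to Welford) updates a running mean $\mu_k$ and a running sum of squared deviations $M_k$ after each value $x_k$, starting from $\mu_0 = 0$ and $M_0 = 0$. Either the population or the sample variance can be read off the same accumulator at the end:

```latex
\begin{aligned}
\delta_k &= x_k - \mu_{k-1} \\
\mu_k    &= \mu_{k-1} + \frac{\delta_k}{k} \\
M_k      &= M_{k-1} + \delta_k \, (x_k - \mu_k) \\
\sigma^2 &= \frac{M_n}{n}, \qquad s^2 = \frac{M_n}{n-1}
\end{aligned}
```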
Note also that because the function is passed as an argument, it does not suffer the per-row overhead of regular iteration. Also, the specified function can generally be inlined inside a low-level loop.
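A minimal sketch of the variance computation described above, assuming a `std::vector` stand-in for a column. `Column`, `for_each_value`, `MeanVariance`, and `mean_variance` are hypothetical names, not tightdb's API. The callback is a template parameter rather than a `std::function`, which is what allows the compiler to inline it into the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column of values.
template <class T>
using Column = std::vector<T>;

// Hypothetical higher-order helper: applies `f` to every value in the column.
template <class T, class F>
void for_each_value(const Column<T>& column, F f)
{
    for (const T& value : column)
        f(value);
}

// Accumulator for Knuth's (Welford's) online mean and variance.
struct MeanVariance {
    std::size_t n = 0;
    double mean = 0.0;
    double m2 = 0.0; // running sum of squared deviations from the mean

    void add(double x)
    {
        ++n;
        double delta = x - mean;  // deviation from the previous mean
        mean += delta / n;        // update the running mean
        m2 += delta * (x - mean); // uses both the old and the new mean
    }

    // Population variance; use m2 / (n - 1) for the sample variance.
    double variance() const { return n ? m2 / n : 0.0; }
};

// One pass over an integer or floating-point column.
template <class T>
MeanVariance mean_variance(const Column<T>& column)
{
    MeanVariance acc;
    for_each_value(column, [&acc](const T& value) { acc.add(static_cast<double>(value)); });
    return acc;
}
```

The mean comes for free as `mean_variance(col).mean`, matching the note above.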
Another much-loved higher-order function is 'fold'. Here is how you could compute the sum of squares of a double column using 'fold left':
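A minimal sketch of a left fold used for the sum of squares, assuming a `std::vector` stand-in for a column; `Column`, `foldl`, and `square_sum` are hypothetical names and the real signature in the PR may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a column of values.
template <class T>
using Column = std::vector<T>;

// Hypothetical 'fold left': acc = f(... f(f(init, x0), x1) ..., xn-1)
template <class T, class A, class F>
A foldl(const Column<T>& column, A init, F f)
{
    A acc = init;
    for (const T& value : column)
        acc = f(acc, value);
    return acc;
}

// Sum of squares of a double column.
double square_sum(const Column<double>& column)
{
    return foldl(column, 0.0, [](double acc, double x) { return acc + x * x; });
}
```

In the C++ standard library, `std::accumulate` from `<numeric>` has exactly these left-fold semantics.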
Note: `foldl_string()` could be used, for example, to compute the size of the largest string in a string column. I already use these functions in the tightdb_tools repository for the SQL-like prompt.
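A minimal sketch of that use of `foldl_string()`. Its real signature is not shown here, so this version assumes a `std::vector<std::string>` stand-in for a string column and an `(accumulator, string)` callback; `StringColumn` and `max_string_size` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a string column.
using StringColumn = std::vector<std::string>;

// Assumed shape of foldl_string: a left fold over the strings of a column.
template <class A, class F>
A foldl_string(const StringColumn& column, A init, F f)
{
    A acc = init;
    for (const std::string& s : column)
        acc = f(acc, s);
    return acc;
}

// Size of the largest string in the column (0 for an empty column).
std::size_t max_string_size(const StringColumn& column)
{
    return foldl_string(column, std::size_t(0),
                        [](std::size_t acc, const std::string& s) { return std::max(acc, s.size()); });
}
```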
I plan to also provide:
I also plan to provide multi-column versions - somehow.
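As one possible shape for a multi-column version (an assumption, not the planned design), a fold could visit the same row of two columns together and pass both values to the callback. `Column` and `foldl2` are hypothetical names; the example computes a dot product of an integer and a double column.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a column of values.
template <class T>
using Column = std::vector<T>;

// Hypothetical two-column left fold: f receives the accumulator and row i of both columns.
template <class T, class U, class A, class F>
A foldl2(const Column<T>& a, const Column<U>& b, A init, F f)
{
    if (a.size() != b.size())
        throw std::invalid_argument("columns must have the same number of rows");
    A acc = init;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc = f(acc, a[i], b[i]);
    return acc;
}
```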
I realize that these functions overlap substantially with already provided functionality; however, these higher-order functions will give the customer much more flexibility.
/cc @astigsen @kneth @rrrlasse @bjchrist