Introducing higher-order functions on columns #53

Closed
wants to merge 30 commits into from

kspangsege
Contributor

Note: This is a work in progress.

The ambition is to provide a set of higher-order functions that combines extensibility with efficiency.

For example, here is how you could compute the variance in an integer column using Knuth's online algorithm (if that is your favourite version):

size_t n = 0;
double mean = 0;
double m2 = 0;
my_table.foreach_int(column_ndx, [&](int v) {
    ++n;
    double delta = v - mean;
    mean += delta / n;
    m2 += delta * (v - mean);
});
double variance = m2 / (n - 1); // sample variance; requires n > 1

Note: This already works. It would also have worked for a floating point column.

Note: This is actually a good algorithm, since it is both online (one pass) and numerically stable. It also produces the mean (or average) for free.

Note also that because the function is passed as an argument, it avoids the penalty of regular iteration. In general, the specified function can also be inlined into the low-level loop.
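To make the inlining point concrete, here is a minimal self-contained sketch. It is not the TightDB implementation: `for_each_value`, `Stats`, and `online_stats` are hypothetical names, and a `std::vector` stands in for a column. The update rule is the same one-pass scheme as in the snippet above (often attributed to Welford).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for foreach_int(): NOT the TightDB API.
// A std::vector plays the role of a column.
template <class T, class F>
void for_each_value(const std::vector<T>& column, F&& f)
{
    // F is a template parameter, so the lambda body is visible here
    // and the compiler is free to inline it into this loop.
    for (const T& v : column)
        f(v);
}

// Hypothetical result type for the sketch.
struct Stats {
    std::size_t n;
    double mean;
    double variance; // sample variance (divides by n - 1)
};

// One-pass mean and sample variance using the online update from above.
template <class T>
Stats online_stats(const std::vector<T>& column)
{
    std::size_t n = 0;
    double mean = 0;
    double m2 = 0;
    for_each_value(column, [&](T v) {
        ++n;
        double delta = v - mean;
        mean += delta / n;
        m2 += delta * (v - mean);
    });
    // Assumption: report 0 when fewer than two values are present,
    // since the sample variance is undefined in that case.
    double variance = n > 1 ? m2 / (n - 1) : 0.0;
    return Stats{n, mean, variance};
}
```

Because the callable type is a template parameter rather than a function pointer or `std::function`, no indirect call is forced per element; whether the compiler actually inlines it depends on the optimizer.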

Another popular higher-order function is 'fold'. Here is how you could compute the sum of squares of a double column using 'fold left':

double sq_sum = my_table.foldl_double(column_ndx, [](double a, double v) {
    return a + v*v;
}, 0.0);

Note: foldl_string() could be used, for example, to compute the size of the largest string in a string column.
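As a rough illustration of both folds mentioned above, the following sketch uses a hypothetical `fold_left` over a `std::vector` in place of `foldl_double()` and `foldl_string()`. The argument order (column, function, initial value) follows the snippet above; everything else, including the helper names, is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for foldl_double()/foldl_string(): NOT the TightDB API.
// Combines the values from left to right, starting from init.
template <class T, class A, class F>
A fold_left(const std::vector<T>& column, F f, A init)
{
    A acc = init;
    for (const T& v : column)
        acc = f(acc, v);
    return acc;
}

// Sum of squares of a double column, as in the example above.
double square_sum(const std::vector<double>& column)
{
    return fold_left(column, [](double a, double v) { return a + v * v; }, 0.0);
}

// Size of the largest string in a string column (0 for an empty column).
std::size_t longest_string_size(const std::vector<std::string>& column)
{
    return fold_left(column, [](std::size_t a, const std::string& v) {
        return v.size() > a ? v.size() : a;
    }, std::size_t(0));
}
```

Note that the accumulator type is taken from the initial value, so passing `0` instead of `0.0` to the double fold would silently truncate to `int`.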

I already use these functions in the tightdb_tools repository for the SQL-like prompt.

I plan to also provide:

my_table.map_int(column_ndx, [](int v) { return roof-v; });
my_table.find_double(column_ndx, [](double v) { return sin(v) < -0.5; });
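The semantics of the two planned functions are not spelled out here, so the following is only one plausible reading: `map` rewrites each value in place, and `find` returns the index of the first match. `map_in_place` and `find_first` are hypothetical names operating on a `std::vector` stand-in for a column.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reading of map_int(): replace every value v with f(v).
template <class T, class F>
void map_in_place(std::vector<T>& column, F f)
{
    for (T& v : column)
        v = f(v);
}

// Hypothetical reading of find_double(): index of the first value that
// satisfies pred, or column.size() if there is none.
template <class T, class P>
std::size_t find_first(const std::vector<T>& column, P pred)
{
    for (std::size_t i = 0; i < column.size(); ++i)
        if (pred(column[i]))
            return i;
    return column.size();
}
```

With these, the first example would read `map_in_place(column, [roof](int v) { return roof - v; });`. If `roof` is a local variable it has to be captured (`[roof]` or `[=]`); an empty capture list `[]` only compiles when `roof` is a global or a constant that needs no capture.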

I also plan to provide multi-column versions - somehow.

I realize that these functions overlap substantially with functionality we already provide. However, these higher-order functions will give the customer much more flexibility.

/cc @astigsen @kneth @rrrlasse @bjchrist

@kspangsege
Contributor Author

Comment from astigsen:

Very cool. A lot of programmers from a functional background will love this :-)

A few questions/thoughts:

  • I assume you will make it available through the typed interface somewhat like this: my_table.column().age.map([](int v) { return roof-v; });
  • Do they work with old style predicates as well?
  • Have you thought about how to expose them to other language bindings?
  • How about doing this lazily on top of a query?
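On the question of old-style predicates: if the functions are templates over the callable type (an assumption, not something confirmed in this thread), then hand-written function objects and plain functions would work alongside lambdas. A small sketch with hypothetical names:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical higher-order function over a std::vector stand-in for a column.
// Any callable taking a T and returning something convertible to bool is accepted.
template <class T, class P>
std::size_t count_matching(const std::vector<T>& column, P pred)
{
    std::size_t count = 0;
    for (const T& v : column)
        if (pred(v))
            ++count;
    return count;
}

// "Old style" predicate: a hand-written function object.
struct LessThan {
    int limit;
    explicit LessThan(int l) : limit(l) {}
    bool operator()(int v) const { return v < limit; }
};

// A plain function also works; it is passed as a function pointer.
inline bool is_even(int v) { return v % 2 == 0; }
```

All three styles then go through the same entry point: `count_matching(col, LessThan(3))`, `count_matching(col, is_even)`, and `count_matching(col, [](int v) { return v > 4; })`.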

@astigsen
Contributor

When it can be done on entire rows at a time, it obviously opens up wide possibilities. But have you thought of any special use cases for this when it is limited to a single column?

I will assume that we end up implementing most statistical operations on columns as intrinsics, so that they don't have to unpack values and can do SSE optimizations. What kind of operations (on columns), that we would not be likely to implement, do you imagine users could be interested in?

Conflicts:
	src/tightdb/array.cpp
	src/tightdb/array.hpp
	src/tightdb/array_basic.hpp
	src/tightdb/table.hpp
Conflicts:
	src/tightdb/array_string_long.hpp
	src/tightdb/column_string.hpp
	src/tightdb/column_string_enum.cpp
Conflicts:
	src/tightdb/column.cpp
Conflicts:
	src/tightdb/array_basic_tpl.hpp
	src/tightdb/column.hpp
	src/tightdb/column_basic_tpl.hpp
	src/tightdb/table.hpp
Conflicts:
	src/tightdb/column.hpp
Conflicts:
	src/tightdb/array.hpp
	src/tightdb/array_string.hpp
	src/tightdb/array_string_long.cpp
	src/tightdb/array_string_long.hpp
Conflicts:
	src/tightdb/array_basic.hpp
	src/tightdb/array_string.cpp
	src/tightdb/array_string.hpp
	src/tightdb/array_string_long.cpp
	src/tightdb/array_string_long.hpp
	src/tightdb/column.cpp
	src/tightdb/column.hpp
	src/tightdb/column_string.cpp
	src/tightdb/column_string.hpp
	src/tightdb/column_string_enum.cpp
Conflicts:
	src/tightdb/array_basic.hpp
	src/tightdb/array_string.hpp
	src/tightdb/column.hpp
	src/tightdb/column_basic.hpp
	src/tightdb/column_string.hpp
	src/tightdb/table.hpp
Conflicts:
	src/tightdb/array.cpp
	src/tightdb/array.hpp
Conflicts:
	src/tightdb/array.hpp
	src/tightdb/array_basic.hpp
	src/tightdb/column.hpp
	src/tightdb/column_basic.hpp
	src/tightdb/column_basic_tpl.hpp
	src/tightdb/column_string.hpp
@jenkins-tightdb

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pullreqs/17/

Conflicts:
	src/tightdb/array.hpp
	src/tightdb/array_basic.hpp
	src/tightdb/array_basic_tpl.hpp
	src/tightdb/column.cpp
	src/tightdb/column.hpp
	src/tightdb/column_basic.hpp
	src/tightdb/column_basic_tpl.hpp
	src/tightdb/column_string.cpp
	src/tightdb/column_string.hpp
@jenkins-tightdb

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pullreqs/93/

@jenkins-tightdb

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pullreqs/94/

@jenkins-tightdb

Test PASSed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pullreqs/100/

@jenkins-tightdb

Test PASSed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pullreqs/103/

@timanglade

@kspangsege Is this still WIP? Did we abandon the idea?

@kspangsege
Contributor Author

It is still WIP. As far as I know, we have not abandoned it [scary thought :-)].

On Tue, Feb 4, 2014 at 6:55 AM, Tim Anglade wrote:

@kspangsege Is this still WIP? Did we abandon the idea?

Conflicts:
	src/tightdb/array_string_long.cpp
	src/tightdb/array_string_long.hpp
	src/tightdb/column.cpp
	src/tightdb/column.hpp
	src/tightdb/column_basic.hpp
	src/tightdb/column_basic_tpl.hpp
	src/tightdb/column_string.hpp
@jenkins-tightdb

Test PASSed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pullreqs/193/

@jenkins-tightdb

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pr/13/

@jenkins-tightdb

Test FAILed.
Refer to this link for build results: https://ci.tightdb.com/job/core_pr/24/

@oleks
Contributor

oleks commented Jun 18, 2014

I'm excited to see this merged into core!

I like the name "reduce" instead of "foldl" if we are not going to have "foldr" as well.

BTW, this is more formally called "second-order functions".

@oleks
Contributor

oleks commented Jun 18, 2014

Uh! Filter would be nice as well.
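For illustration only, a filter in the same style could look like the sketch below. The name `filter`, the `std::vector` stand-in for a column, and the choice to return the matching values (rather than, say, row indexes or a view) are all assumptions; nothing in this pull request defines them.

```cpp
#include <vector>

// Hypothetical filter: returns the values that satisfy pred, in their
// original order. NOT part of the proposed TightDB API.
template <class T, class P>
std::vector<T> filter(const std::vector<T>& column, P pred)
{
    std::vector<T> result;
    for (const T& v : column)
        if (pred(v))
            result.push_back(v);
    return result;
}
```

For example, `filter(col, [](int v) { return v % 2 == 1; })` keeps only the odd values.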

bmunkholm changed the title from "[WIP] Introducing higher-order functions on columns" to "[PARKED] Introducing higher-order functions on columns" on Apr 13, 2015
danielpovlsen changed the title from "[PARKED] Introducing higher-order functions on columns" to "Introducing higher-order functions on columns" on Jul 15, 2015
@jedelbo
Contributor

jedelbo commented May 18, 2018

Abandoned

7 participants