Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an efficient Count method to find the number of items in each index. #83

Open
invincible-akshay opened this issue Nov 26, 2020 · 3 comments

Comments

@invincible-akshay
Copy link

Is there a reason why functions like COUNT(), MIN(), etc are not provided?
Or is there a way to achieve these using existing functionalities?

@radeksimko
Copy link
Member

I can see how a dedicated Count() could be useful. I'm not sure if there's any any more elegant way to count records by introspecting the underlying radix tree somehow, but here's a way:

txn := db.Txn(false)

it, err := txn.Get("person", "id")
if err != nil {
	panic(err)
}
count := 0
for obj := it.Next(); obj != nil; obj = it.Next() {
	count += 1
}

As for MIN() and MAX() - Unlike in case of COUNT() that would only be relevant for numeric indexes, so I'm not sure how well that would fit into the API.

However given the right index (e.g. IntFieldIndex or UintFieldIndex), you can essentially just use txn.First() and txn.Last() as the index will maintain ordered records.

@banks
Copy link
Member

banks commented Jul 14, 2023

There are a few subtleties here.

First up count seems common enough that the underlying radix tree root node could probably be updated to maintain a separate counter so it's cheap to fetch without walking the whole tree (or index in MemDB terms). That would need a change to our radix tree though but a welcome PR! We could then have something like Count(table, index string) int in the API. It would be possible to implement that right now but under the hood would be no more efficient than Radek's example of just iterating the whole index until the radix tree supports maintaining a counter.

It's possible to get the min/max of the keys of any index efficiently already: I'm not sure any further API can make that much simpler.

min, _ := db.First(table, index)
max, _ := db.Last(table, index)

But if you were thinking the min/max of some numeric field within each object in a table... that's way more involved and not something we support now - only indexes are even in the schema so as Radek said it would only work over numeric fields that were indexed and in that case the methods above should work. Any non-indexed field would require a whole lot more schema handling etc. I think at that point it's best to just iterate the index or maintain summary statistics about the data in a separate table that you updated transactionally with each update.

Id say the same for any more complex aggregation - either calculate it yourself by iterating (which is the best we could do internally anyway) or if that's too expensive then de-normalize and keep the stats up to date in another table!

I'll change the title of this because I think the count part is reasonable and would be a nice addition but does depend on changes to iradix to do better than Radek's example above of just iterating every time.

@banks banks changed the title Common Aggregation Functions like COUNT(), MIN(), MAX() can be added Add an efficient Count method to find the number of items in each index. Jul 14, 2023
@invincible-akshay
Copy link
Author

Thanks for the explanation @banks and @radeksimko
I can take a look at the node and probably work on adding the Count using your suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants