Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide on the future of SketchMap, CMS, SpaceSaver #360

Closed
miguno opened this issue Nov 19, 2014 · 1 comment
Closed

Decide on the future of SketchMap, CMS, SpaceSaver #360

miguno opened this issue Nov 19, 2014 · 1 comment

Comments

@miguno
Copy link

miguno commented Nov 19, 2014

This ticket is a follow-up on the discussion we had in the pull request 354, i.e. the CMS[K] pull request.

Algebird currently has the three data structures with overlapping functionality, and the question is whether we could/should consolidate them in some way -- primarily to reduce maintenance cost.

Overlapping functionality

  • SketchMap[K, V] and CMS[K] are almost identical except that SketchMap also parameterizes the value.
    • However, the current implementation of SketchMap is significantly slower than CMS (in this experiment, SketchMap[BigInt] achieved 0.3k writes/s compared to 65k writes/s for CMS[BigInt], i.e. it was slower by two orders of magnitude).
  • SpaceSaver may be an alternative to a top-N CMS.
    • Note that a top-N CMS is not commutative, which limits is applicability in practice. (A top-% CMS is commutative though.)

Past discussions

The snippets below are taken from the conversations in GH-345 and the pull request 354, where we discussed the work on turning the old Long-based CMS into a parameterized CMS[K].

@johnynek wrote (source):
I am concerned that we have now SketchMap and CMS that are almost identical except that SketchMap parameterizes the value. Given that making the key generic didn't hurt performance, I tend to think we must just have some small bottleneck in SketchMap that could be fixed. [...] It would be great to give this a going over with yourkit and see if we can identify a few hotspots [...].

@avibryant wrote (source):
I don't think SpaceSaver replaces SketchMap, but I do think it might be a good alternative to TopNLogic. (It would be interesting to do some benchmarking of error rates between these two). But SpaceSaver doesn't even parameterize the value type currently, which is something else we should fix.

@koertkuipers
Copy link
Contributor

@avibryant
Hey Avi, what do you mean when you say: "But SpaceSaver doesn't even parameterize the value type currently, which is something else we should fix."?
I am happy to work on SpaceSaver to fix it as needed once i understand what the issue is.
Best, Koert

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants