Method for deduping reports #26

Closed
tomwilkie opened this issue May 12, 2015 · 6 comments

@tomwilkie
Contributor

With #13, the possibility of an app connecting to a probe multiple times has been introduced. This will allow an app to receive the same report more than once, and potentially double count. We need to disallow this.

Options:

  1. Make report merge fully idempotent, i.e. r1.merge(r1) == r1, and so on
  2. Make probes only accept one connection per app
  3. Include a random unique id in each report, and dedupe on the app
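
To make the difference between the first and third options concrete, here is a minimal sketch in Python. It is not the project's code: the report shape (a dict of counts), the increment ids, and the names `merge_additive`, `merge_idempotent` and `DedupingApp` are all assumptions made up for illustration.

```python
import uuid


def merge_additive(a, b):
    """Naive merge: sums per-key counts, so merge(r, r) double counts."""
    out = dict(a)
    for key, value in b.items():
        out[key] = out.get(key, 0) + value
    return out


def merge_idempotent(a, b):
    """Idempotent merge: every count is keyed by a unique increment id
    (hypothetical), and merging is a dict union, so merge(r, r) == r."""
    return {**a, **b}


class DedupingApp:
    """Report-id dedupe: the app remembers report ids and ignores repeats."""

    def __init__(self):
        self.seen_ids = set()
        self.totals = {}

    def receive(self, report_id, counts):
        if report_id in self.seen_ids:
            return False  # duplicate delivery, e.g. over a second connection
        self.seen_ids.add(report_id)
        self.totals = merge_additive(self.totals, counts)
        return True


naive = {"a->b": 3}
print(merge_additive(naive, naive))  # {'a->b': 6}, the double count

keyed = {("a->b", "inc-1"): 3}
print(merge_idempotent(keyed, keyed) == keyed)  # True

app = DedupingApp()
report_id = str(uuid.uuid4())  # random unique id attached by the probe
app.receive(report_id, naive)
app.receive(report_id, naive)  # second copy is dropped
print(app.totals)  # {'a->b': 3}
```

The idempotent variant stays correct however many times and in whatever order reports are merged, at the price of carrying every increment id. The report-id variant is cheaper but only protects the point where the app first receives a report.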

I like (1), as it has nice properties for the future; but it is also the hardest.

@peterbourgon
Contributor

I like (1), too, but I can't think of a way to do it that isn't infeasibly inefficient or that doesn't explode the complexity of managing (merging) reports. The other idea is to have apps do discovery + leader election + request forwarding.

@tomwilkie
Contributor Author

I think one might be able to make it a relatively efficient operation, but I'm worried about an explosion in space. It basically boils down to idempotent counters.

Simple idea: keep the counter as a list of increments (an increment being an id and a value), along with a running summary and a Bloom filter of ids. If an incoming id misses the Bloom filter, it is definitely new and you can add its value to the summary; if it hits, it may be a duplicate, so you must re-count. I think that's close to O(1) probabilistic update cost, but O(N) size.
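
A rough sketch of that counter in Python, under stated assumptions: the `BloomFilter` here is a toy built on `hashlib`, its size and hash count are arbitrary, and the class and method names are invented for illustration rather than taken from the codebase.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class IdempotentCounter:
    """Counter stored as a list of (id, value) increments plus a summary."""

    def __init__(self):
        self.increments = []  # O(N) space: every increment is retained
        self.seen = BloomFilter()
        self.summary = 0

    def add(self, increment_id, value):
        if not self.seen.might_contain(increment_id):
            # Miss: the id is definitely new, so the cheap path applies.
            self.increments.append((increment_id, value))
            self.seen.add(increment_id)
            self.summary += value
            return
        # Hit: either a duplicate or a false positive. Check exactly,
        # then re-count the summary from the increment list.
        if all(known != increment_id for known, _ in self.increments):
            self.increments.append((increment_id, value))
            self.seen.add(increment_id)
        self.summary = sum(v for _, v in self.increments)

    def merge(self, other):
        # Copy first so that merging a counter with itself is safe.
        for increment_id, value in list(other.increments):
            self.add(increment_id, value)


counter = IdempotentCounter()
counter.add("probe1-1", 5)
counter.add("probe1-2", 2)
counter.add("probe1-1", 5)  # duplicate delivery, ignored
counter.merge(counter)      # merging with itself changes nothing
print(counter.summary)      # 7
```

The slow path only runs on a Bloom filter hit, so unique increments are cheap and only duplicates and false positives pay for the scan.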

@tomwilkie
Contributor Author

Note this is effectively the same as (1), except one is done on the entire report, and assumes reports are only merged once.

@tomwilkie
Contributor Author

What's more, make the running summary lazy; only calculate it on a read of the counter.
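
One way the lazy variant could look, sketched in Python. The names are hypothetical, and for brevity this version keys increments in a dict instead of using a Bloom filter; the point is only that writes invalidate the cached summary and reads recompute it.

```python
class LazyCounter:
    """Idempotent counter whose total is computed only when read."""

    def __init__(self):
        self.increments = {}   # increment id -> value
        self._summary = None   # None means stale: recompute on next read

    def add(self, increment_id, value):
        if increment_id not in self.increments:
            self.increments[increment_id] = value
            self._summary = None  # invalidate, but do no summing here

    def merge(self, other):
        # Copy first so that merging a counter with itself is safe.
        for increment_id, value in list(other.increments.items()):
            self.add(increment_id, value)

    @property
    def value(self):
        if self._summary is None:
            self._summary = sum(self.increments.values())
        return self._summary


lazy = LazyCounter()
lazy.add("a", 1)
lazy.add("b", 2)
lazy.add("a", 1)   # duplicate, ignored
lazy.merge(lazy)   # idempotent
print(lazy.value)  # 3
```

A burst of merges then costs one summation at read time instead of one per update.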

@tomwilkie
Contributor Author

Short-term we'll only accept one connection per app, based on source IP.
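
For illustration, the probe-side check might be sketched like this in Python. `ConnectionRegistry` and its methods are hypothetical names, not the probe's actual API, and the connection handles are stand-in strings.

```python
class ConnectionRegistry:
    """Tracks at most one active app connection per source IP."""

    def __init__(self):
        self.active = {}  # source IP -> connection handle

    def accept(self, source_ip, conn):
        if source_ip in self.active:
            return False  # an app at this IP is already connected
        self.active[source_ip] = conn
        return True

    def release(self, source_ip):
        # Called when a connection closes, so the app can reconnect.
        self.active.pop(source_ip, None)


registry = ConnectionRegistry()
print(registry.accept("10.0.0.5", "conn-1"))  # True
print(registry.accept("10.0.0.5", "conn-2"))  # False, duplicate rejected
registry.release("10.0.0.5")
print(registry.accept("10.0.0.5", "conn-3"))  # True
```

Keying on source IP alone treats every app behind the same address (for example behind a NAT) as one app, which fits the short-term framing.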

@peterbourgon
Contributor

Closed by #56

rade added a commit that referenced this issue May 19, 2015
Rewrite of ConnectionMaker

Fixes #23. Fixes #26.
jml added a commit that referenced this issue Jul 15, 2016
e9e7e6b Merge pull request #26 from weaveworks/this-time-for-sure
df494d6 Remove dependencies
c045d16 Properly exclude vendor from lint
2cfcf08 Add blacklist to wcloud client
ca6ebfb Merge pull request #25 from weaveworks/fix-brokenness
bfb1747 Test directories need ./ prefixes, obviously.
5b9b314 Merge pull request #24 from weaveworks/find-files
8786427 Remove spurious debugging code from test
8b7ec6e Speed up test by using git ls-files
cf53dc1 Exclude vendor from shell linting
b2ab380 Fix field name
c86fd3d Add notification config for wcloud
f643920 Merge pull request #23 from weaveworks/only-lint-git-files
47a0152 Only lint git files
50d47f9 Merge pull request #22 from weaveworks/shell-lint

git-subtree-dir: tools
git-subtree-split: e9e7e6b
2opremio pushed a commit that referenced this issue Aug 26, 2016
Properly exclude vendor from lint