Create unique ids even for random conflicts #476

Merged

danielmitterdorfer (Member) commented Apr 20, 2018

With this commit we ensure that generated ids are always unique, even
when generating random conflicts for bulk requests. The general strategy
to simulate conflicts is as follows:

  1. Generate a list of ids upfront.
  2. Iterate through this list and pick a duplicate in 25% of all cases.
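The two steps above can be sketched as follows. This is a minimal illustration, not Rally's actual bulk generator. The helper name `pick_doc_ids` and the `conflict_probability` parameter are hypothetical. The sketch also assumes that a "duplicate" means re-using an id that was already emitted, chosen uniformly at random.

```python
import random


def pick_doc_ids(all_ids, conflict_probability=0.25, rng=random):
    """Yield one document id per bulk entry (hypothetical helper).

    Step 1 has already happened: ``all_ids`` is the list generated upfront.
    Step 2: walk through it and, with probability ``conflict_probability``,
    re-use an id that was already emitted instead of taking the next fresh one.
    """
    next_fresh = 0  # index of the next id in all_ids that has not been used yet
    for _ in range(len(all_ids)):
        if next_fresh > 0 and rng.random() < conflict_probability:
            # Deliberate conflict: pick any id that was already handed out.
            yield all_ids[rng.randrange(next_fresh)]
        else:
            # No conflict: take the next unused id from the upfront list.
            yield all_ids[next_fresh]
            next_fresh += 1


if __name__ == "__main__":
    ids = ["%10d" % i for i in range(10)]
    print(list(pick_doc_ids(ids, rng=random.Random(42))))
```

The sketch only produces the intended share of conflicts if every entry in `all_ids` is distinct. A repeat in the upfront list turns a "fresh" id into an extra, unplanned conflict.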

In order to avoid accidentally creating more conflicts than expected,
the list of ids generated in step one needs to contain unique ids.
However, we previously created ids at random without considering which ids
had already been generated. This commit changes that by first generating
all ids and then shuffling them.
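The old and new approaches can be compared with a small sketch. It assumes `rand(a, b)` behaves like `random.randint(a, b)`, with both bounds inclusive. The function names `random_ids_old` and `unique_ids_new` are hypothetical; only the `"%10d" % ...` expressions come from the diff.

```python
import random


def random_ids_old(offset, docs_to_index, rng=random):
    # Before: each id is drawn independently, so the same value can appear twice.
    return ["%10d" % rng.randint(offset, offset + docs_to_index) for _ in range(docs_to_index)]


def unique_ids_new(offset, docs_to_index, rng=random):
    # After: build the contiguous range first (every id unique), then shuffle it
    # so that the order is still random.
    all_ids = ["%10d" % (offset + i) for i in range(docs_to_index)]
    rng.shuffle(all_ids)
    return all_ids


if __name__ == "__main__":
    rng = random.Random(1)
    old = random_ids_old(0, 1000, rng)
    new = unique_ids_new(0, 1000, rng)
    print("old:", len(set(old)), "distinct of", len(old))
    print("new:", len(set(new)), "distinct of", len(new))
```

With independent draws, only about 63% of the ids are expected to be distinct. The shuffled range keeps a random order and contains no repeats.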

danielmitterdorfer added the labels bug (Something's wrong) and :Load Driver (Changes that affect the core of the load driver such as scheduling, the measurement approach etc.) on Apr 20, 2018
danielmitterdorfer added this to the 0.10.2 milestone on Apr 20, 2018
dliappis (Contributor) left a comment

Good catch, LGTM!

```diff
         all_ids[i] = "%10d" % (offset + i)
     else:  # RandomConflicts
-        all_ids[i] = "%10d" % rand(offset, offset + docs_to_index)
+        all_ids[i] = "%10d" % (offset + i)
```
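A rough estimate shows why the removed line mattered. As a simplification, assume each of the n = docs_to_index draws is independent and uniform over k values. If `rand` includes both bounds, k = n + 1; this is an assumption about `rand`. A given value is never drawn with probability (1 - 1/k)^n, so by linearity of expectation:

```latex
\mathbb{E}[\text{distinct ids}]
  = k\left(1 - \left(1 - \tfrac{1}{k}\right)^{n}\right)
  \approx n\left(1 - e^{-1}\right)
  \approx 0.632\,n
  \qquad (k = n + 1,\ n \text{ large})
```

Under this estimate, roughly 37% of the entries generated in step one were already repeats before any of the intended 25% conflicts were added. Generating `offset + i` for every `i` and then shuffling brings that share to zero.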
Contributor

Sticking to `%` as we are in the hot code path, right? (instead of `"{:>10d}".format(offset + i)`?)

Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes
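For reference, both spellings from this exchange produce the same right-aligned, space-padded 10-character string. The `timeit` loop below is one way to check the performance assumption on a given interpreter. The iteration count is arbitrary, and results vary with machine and Python version, so no particular outcome is implied here.

```python
import timeit

n = 42
percent_style = "%10d" % n
format_style = "{:>10d}".format(n)
# Both pad the integer with leading spaces to a total width of 10 characters.
assert percent_style == format_style == "        42"

if __name__ == "__main__":
    # Time each variant; absolute numbers depend on the machine and Python version.
    for label, stmt in [("%-format", '"%10d" % n'), ("str.format", '"{:>10d}".format(n)')]:
        seconds = timeit.timeit(stmt, globals={"n": n}, number=100_000)
        print(f"{label}: {seconds:.4f}s")
```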
