FIX: rank-size test by inc. sample size #556
Merged
As mentioned in PR #551, the test for the `rank_size` method checks for linearity in the output log-log plot. I did this by performing a linear regression of `log(size_data)` on `log(rank_data)` and comparing the resulting `r_squared` value to 1.

It turns out an error tolerance of `1e-4` for a sample size of `1000` was a little optimistic. I have increased the sample size to `10000` and relaxed the tolerance to `1e-3`, and the comparison works as expected now.

Another mistake was setting the seed for the exponential draw but not for the Pareto draw; I've fixed that as well. I'm not sure whether setting a seed here is good practice in the first place. If not, I can remove it and increase the sample size or relax the tolerance further.
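For reviewers, here is a rough, self-contained sketch of the linearity check described above. It is not the test code from this PR: `rank_size_sketch` and `r_squared_loglog` are hypothetical helpers written only for illustration, and the seed, the Pareto shape parameter and the use of NumPy's `default_rng` are assumptions.

```python
import numpy as np


def rank_size_sketch(data):
    """Hypothetical stand-in for rank_size: sizes sorted largest first, ranks 1..n."""
    size_data = np.sort(np.asarray(data, dtype=float))[::-1]
    rank_data = np.arange(1, len(size_data) + 1)
    return rank_data, size_data


def r_squared_loglog(rank_data, size_data):
    """R^2 of an ordinary least-squares fit of log(size) on log(rank)."""
    x = np.log(rank_data)
    y = np.log(size_data)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# One seeded generator drives both draws, so neither is left unseeded.
rng = np.random.default_rng(15)  # seed value is an assumption
sample_size = 10_000

# Classical Pareto with minimum value 1 (shape 2.0 is an assumption).
pareto_draw = rng.pareto(2.0, size=sample_size) + 1.0
exp_draw = rng.exponential(scale=1.0, size=sample_size)

r2_pareto = r_squared_loglog(*rank_size_sketch(pareto_draw))
r2_exp = r_squared_loglog(*rank_size_sketch(exp_draw))

print(f"Pareto R^2:      {r2_pareto:.6f}")
print(f"Exponential R^2: {r2_exp:.6f}")
```

The Pareto draw should give a nearly straight log-log plot and hence an `r_squared` close to 1, while the exponential draw should not. How close to 1 depends on the sample size, which is why the tolerance and the sample size have to be chosen together.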
Please let me know if this is an acceptable fix. This should close #553.