-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No feedback if iteration does not reach exact result in utils.sampling.sample_rows #178
Comments
Hi, this is not a part of the code I am familiar with. Can you link to the loop? As always just starting a discussion, and willing to help implement any fix. |
the loop is at sampling.py:59 what sampling does is:
however, the permutation of the list is only done once.
==> in this case the loop cannot converge |
So, As used in the code, What is the intended behavior of sampling from the distribution [1, 5, 1, 5, 1, 5] with a target of 10? What shud 20 draws from that distribution look like? What is the intended behavior of sampling from the distribution [1, 5, 1, 5, 1, 5] with a target of 2? What shud 20 draws from that distribution look like? |
Well, sorry for being less than clear. |
I actually wrote this method. Providing notification if the total is not met is a good idea. Something I just never implemented. Per the removal not converging, I think this issue is in lines 83- 84: if not replace:
np.append(sample_idx, curr_rows.index.values) The intent here was to add the removed items back to the end of the list, to avoid this. But it was implemented improperly, this should probably be: if not replace:
sample_idx = np.append(sample_idx, curr_rows.index.values) Sorry, I can test this a little further to see if this fixes the problem. Having said that, I could still see cases where the total cannot be matched exactly, so it makes sense to provide feedback as you suggest. |
hello, I just stumbled upon the problem while updating tests and in some bad luck random cases, the test would pass while in others, it would fail. Most of the time, the model will not care if the exact value is reached or not. However the behavior becomes unpredictable for automatic tests. That's why a notification could be useful when the target is approximated. I'll be glad to further discuss the topic if necessary. |
Per the testing issue, I've been wrapping my calls inside a function that 1st sets up the environment: https://github.com/AZMAG/smartpy_core/blob/master/smartpy_core/sampling.py#L74 So instead of sample_rows(20, my_df, replace=False, accounting_column='my_col') Your call would then be: seed = 123
func = sample_rows
seeded_call(seed, func, my_df, replace=False, accounting_col='my_col') There's perhaps a better way to do this, but I've found that it provides consistent test results without too much overhead. Using pyest fixtures are another way to handle this, this how the tests for the samle_rows method were written: https://github.com/UDST/urbansim/blob/master/urbansim/utils/tests/test_sampling.py#L8 The non-converging issue you describe is still a concern, I'll look into it. |
So in use, a sample that does not exactly add up is correct, but for testing it is annoying? |
For the case when
Untested as there isn't test code for this yet, as per this discussion. |
I think you're right. That's a lot cleaner. But maybe after doing the search sorted, sample a household with size equal to the overage (assuming one exists)? That way the total could be hit exactly. |
Such a fix over samples the small elements. That may be the intended behavior, but I'd find it surprising givan the current dock string. |
Good point. I have a new version that does the cumsum and then iterates over each remaining element until the total is met. I'll put in a PR after I do some more testing. |
See the above pull request. Let me know if you see any issues with these changes or if this doesn't adequately address the issue. |
Closing this issue for now as #180 is merged. |
Function
utils.sampling.sample rows
includes a loop which shall converge to an exact pick of samples.In some cases, this loop cannot converge.
Example:
Even if an exact pick is possible, the random permutation l.51 causes a further problem. If the values needed for an exact pick are at the head of the randomized list
sample_idx
, they are not available in the loop to adjust for an inexact pick, the loop cannot converge.This aspect depends on the random seed, so it can make integration tests fail randomly.
Suggested solution:
there should be some feedback from the
sample_rows
function whenever convergence to an exact pick is not achieved. The caller may use this information if he wishes. In this way a more robust test suite can be obtained.The text was updated successfully, but these errors were encountered: