Add range access methods to BitArray #4397
spec/std/bit_array_spec.cr
a[0, 0].should eq(a)
end

it "gets 0 ... 0 on empty array" do
You meant 0 .. 0?
Fixed.
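The reviewer's point hinges on Crystal's inclusive (`..`) versus exclusive (`...`) range literals. A small sketch, using a plain `Array(Bool)` as a stand-in for an empty `BitArray` (an assumption for illustration), shows that both ranges are accepted on an empty collection but describe different sizes:

```crystal
# Inclusive range: includes its end value, so 0..0 covers one index.
inclusive = 0..0
# Exclusive range: stops before its end value, so 0...0 covers none.
exclusive = 0...0

# An empty Array(Bool) stands in for an empty BitArray here.
empty = [] of Bool

puts inclusive.size # => 1
puts exclusive.size # => 0

# Starting at index 0 == size is allowed; the slice is empty either way.
p empty[inclusive] # => []
p empty[exclusive] # => []
```

So a spec titled "0 ... 0" that actually indexes with `0..0` tests a different range than its name says, even though both return an empty result here.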
src/bit_array.cr
return LibC.memcmp(@bits, other.@bits, malloc_size) == 0
end

def ==(other)
Why this? Shouldn't an equality check against anything else give a compile error?
No, it shouldn't. There have been discussions around this before but this is how the rest of the stdlib is implemented.
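The pattern under discussion is a typed `==` overload plus an untyped fallback that returns `false`, so comparing against an unrelated type compiles and yields `false` at runtime. A minimal sketch on a hypothetical `Bits` struct (not the real `BitArray`):

```crystal
# Hypothetical struct used only to illustrate the overload pattern.
struct Bits
  getter size : Int32

  def initialize(@size : Int32)
  end

  # Typed overload: the more specific restriction wins for Bits arguments.
  def ==(other : Bits)
    size == other.size
  end

  # Untyped fallback: any other type compiles and simply compares unequal.
  def ==(other)
    false
  end
end

puts Bits.new(3) == Bits.new(3) # => true
puts Bits.new(3) == "abc"       # => false
```

Because Crystal picks the most specific matching overload, the fallback only ever sees arguments that are not `Bits`.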
src/bit_array.cr

count = Math.min(count, size - start)

if @size <= 32
You always access the size attribute through its getter; for consistency, you can remove the @ here.
Fixed.
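The `if size <= 32` branch relies on the fact that when the whole array fits in one `UInt32`, a slice is a single shift followed by a mask. A sketch of that idea on a raw word; the function name and the bit order (bit `i` of the array stored at bit `i` of the word, LSB first) are assumptions, not the PR's actual code:

```crystal
# Hypothetical helper: return `count` bits starting at bit `start` of `word`,
# assuming bit i of the array lives at bit i of the word (LSB first).
def slice_word(word : UInt32, start : Int32, count : Int32) : UInt32
  # A full 32-bit mask cannot be built as (1 << 32) - 1, so special-case it.
  mask = count == 32 ? UInt32::MAX : (1_u32 << count) - 1_u32
  (word >> start) & mask
end

puts slice_word(0b1011_0110_u32, 2, 4).to_s(2) # prints 1101
```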
src/bit_array.cr
@@ -125,4 +222,20 @@ struct BitArray
private def malloc_size
(@size / 32.0).ceil.to_i
end

# FIXME: this was copied from Array, we should deduplicate implementations
I would add this note in Array too, so the duplication can be discovered from either side.
We could put this method as a :nodoc: method on Enumerable or Indexable. But if deduplicating it turns out to be ugly, then yes, duplicating the message would be wise.
Let's move it into Indexable as a :nodoc: method (and also deduplicate String#range_to_index_and_size ;-))
Ah! Another one! I'll make sure to do that.
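A rough sketch of what a shared range-to-start/count helper could look like when kept as a module-level method, so that `String` (which isn't `Indexable`) could call it too. The module and method names are hypothetical, only `Range(Int32, Int32)` is handled, and negative indices count from the end as they do for `Array`:

```crystal
# Hypothetical module for illustration; not the stdlib's actual helper.

# :nodoc:
module RangeHelper
  # Converts `range` into {start, count} for a collection of `size` elements.
  # Raises IndexError if the (normalized) start lies outside 0..size.
  def self.to_index_and_count(range : Range(Int32, Int32), size : Int32) : {Int32, Int32}
    start = range.begin
    start += size if start < 0
    raise IndexError.new if start < 0 || start > size

    finish = range.end
    finish += size if finish < 0
    finish -= 1 if range.excludes_end?

    count = Math.max(finish - start + 1, 0)
    # Clamp so the slice never runs past the end (cf. Math.min(count, size - start)).
    count = Math.min(count, size - start)
    {start, count}
  end
end

p RangeHelper.to_index_and_count(1..3, 5)   # => {1, 3}
p RangeHelper.to_index_and_count(-2..-1, 5) # => {3, 2}
p RangeHelper.to_index_and_count(2..10, 5)  # => {2, 3}
```

Returning a tuple rather than taking a block keeps the helper usable from any collection type, which is the point of making it a module method.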
src/bit_array.cr
def ==(other : BitArray)
return false if size != other.size
# NOTE: If BitArray implements resizing, there may be more than 1 binary
# representation for equivalent BitArrays after a downsize.
You mean because the leading unused bits would not be guaranteed to be 0?
Correct.
Would you add that reason as a clarification?
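The clarification being requested: a raw `memcmp` over the backing words is only a valid equality test if the unused bits past `size` in the last word are always zero. A sketch of a comparison that doesn't depend on that invariant, masking off the tail instead; the function name and the `Array(UInt32)` storage are assumptions for illustration:

```crystal
# Hypothetical: compare the first `size` bits of two word buffers,
# ignoring whatever happens to be in the unused high bits of the last word.
def bits_equal?(a : Array(UInt32), b : Array(UInt32), size : Int32) : Bool
  words = (size + 31) // 32
  return false if a.size < words || b.size < words

  # Whole words can be compared directly.
  full = size // 32
  full.times { |i| return false if a[i] != b[i] }

  # Only the low `rem` bits of the partial last word are meaningful.
  rem = size % 32
  return true if rem == 0
  mask = (1_u32 << rem) - 1_u32
  (a[full] & mask) == (b[full] & mask)
end

a = [0b0101_u32]
b = [0b1111_0101_u32] # same low 4 bits, garbage above them
puts a == b               # => false (raw comparison sees the garbage)
puts bits_equal?(a, b, 4) # => true
```

Keeping the unused bits zeroed on every write lets `==` stay a plain `memcmp`; the masked version is what would be needed if that invariant ever stopped holding, for example after a downsize.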
This should be done now. After you've reviewed it, I can rebase this into 3 nice commits :)
I might be picky, but it feels more natural here to use instance methods so you don't need to … BTW, kudos for doing the bitwise operations as efficiently as possible 👍
@bcardiff String doesn't implement Indexable, so it has to be a class-level method. The bit ops aren't nearly optimal:
I think I'll have a go at these extra optimizations tonight.
Oh! I read it wrong. So the optimization only applies when size < 32. I would say that we need to apply it to the general case then: read in chunks of UInt32 and shift/merge the bits of the source so that each word of the new BitArray is written just once. No need to go to 64/128 bits, I think. You were right regarding String and Indexable. Let's keep that helper as a module method for now, then. But I would change the
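One way to read the suggestion for the general case: walk the destination one `UInt32` at a time, build each output word from at most two source words with a shift and an OR, and write it once. A sketch on plain `Array(UInt32)` storage, assuming LSB-first bit order and that `start + count` stays within the source; the function name is hypothetical:

```crystal
# Hypothetical: copy `count` bits starting at bit `start` of `words` into a
# fresh word array, writing each destination word exactly once.
def extract_bits(words : Array(UInt32), start : Int32, count : Int32) : Array(UInt32)
  result = Array(UInt32).new((count + 31) // 32, 0_u32)
  result.size.times do |i|
    bit = start + i * 32
    word_index = bit // 32
    offset = bit % 32
    # Low part comes from the current source word...
    low = words[word_index] >> offset
    # ...high part from the next one, unless the read is word-aligned.
    high = 0_u32
    if offset != 0 && word_index + 1 < words.size
      high = words[word_index + 1] << (32 - offset)
    end
    result[i] = low | high
  end
  # Zero the unused bits of the last word so equal slices compare equal.
  rem = count % 32
  result[-1] &= (1_u32 << rem) - 1_u32 if rem != 0
  result
end

words = [0xFFFF0000_u32, 0x0000FFFF_u32] # bits 16..47 set
p extract_bits(words, 16, 32).map(&.to_s(16)) # => ["ffffffff"]
p extract_bits(words, 8, 16).map(&.to_s(16))  # => ["ff00"]
```

Each iteration touches at most two source words, so the cost is proportional to the number of destination words rather than the number of bits.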
@bcardiff I added the 64-bit optimisation before I read your message (it's a pretty simple optimisation, so it shouldn't matter), and implemented proper multi-bit bitshifts for … I also completely changed the variable naming in …
Oh, I forgot to mention that the CI failure for this build is very strange and probably unrelated to this PR. Unfortunately, the stack trace doesn't give much info on which spec failed, and it's likely to be unreproducible.
@RX14 one more round,
@bcardiff I prefer random tests because they make me more confident that edge cases I haven't thought of have been tested. With 10 000 iterations, the chance that the random test covers less than a single static test case would is low. However, I've expanded the large test to be more comprehensive and to cover ranges that span 3 loop iterations. I hope you'll leave the random test in (many test harnesses have provisions for randomised testing), but it should be fine without it. I've also added the missing test for …
Why 10_000 and not 100 or 1_000_000? I am more in favor of edge and coverage cases. Relying on random specs could become an excuse for not thinking through the cases. If you would like to run it to discover edge cases, OK, that's a kind of test generation. But if someone is working on another, unrelated feature and this spec suddenly fails, I doubt that programmer will switch context to grab the seed, report it, or even fix it. If we choose to do this kind of testing, we should split it into a separate target, since it could be much slower. If so, we should also have criteria for when to do it and when not to. Currently, the implicit answer is never.

Besides that, mixing ASCII and non-ASCII strings, for example, could be a better candidate for this. But again, I see it more as test generation than as a std spec. And the story should be complete: searching for and emitting minimal failing test cases, as in https://en.wikipedia.org/wiki/QuickCheck

So, would you support String specs randomly creating thousands of strings and performing thousands of string manipulations in every single spec run?
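If randomized checks are used at all (for example as a separate test-generation target, as suggested above), a fixed seed addresses the reproducibility concern: a failure can be replayed by re-running with the same seed. A sketch comparing a word-level shift-and-mask slice against a bit-by-bit reference; the function names, the LSB-first bit order, and the seed are hypothetical choices for illustration, and this complements hand-picked edge cases rather than replacing them:

```crystal
# Reference implementation: copy bits one at a time.
def slice_naive(word : UInt32, start : Int32, count : Int32) : UInt32
  result = 0_u32
  count.times do |i|
    result |= 1_u32 << i if word.bit(start + i) == 1
  end
  result
end

# Implementation under test: one shift and one mask.
def slice_fast(word : UInt32, start : Int32, count : Int32) : UInt32
  mask = count == 32 ? UInt32::MAX : (1_u32 << count) - 1_u32
  (word >> start) & mask
end

# Runs `iterations` random cases from a fixed seed; returns the failure count.
def check_slices(seed : UInt64, iterations : Int32) : Int32
  rng = Random.new(seed)
  failures = 0
  iterations.times do
    word = rng.rand(UInt32::MAX)
    start = rng.rand(0..31)
    count = rng.rand(0..(32 - start))
    failures += 1 unless slice_fast(word, start, count) == slice_naive(word, start, count)
  end
  failures
end

puts check_slices(42_u64, 1_000) # => 0
```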
@bcardiff Good points, I agree with you. I've removed the test case.
Thanks @RX14!
Closes #3968.