Add range access methods to BitArray #4397

RX14 · 2017-05-10T17:25:32Z

Closes #3968.

bew · 2017-05-10T19:36:13Z

spec/std/bit_array_spec.cr

+      a[0, 0].should eq(a)
+    end
+
+    it "gets 0 ... 0 on empty array" do


Did you mean 0 .. 0?

bew · 2017-05-10T19:40:24Z

src/bit_array.cr

+    return LibC.memcmp(@bits, other.@bits, malloc_size) == 0
+  end
+
+  def ==(other)


Why this? Shouldn't an equality check against anything else give a compile error?

No, it shouldn't. There have been discussions around this before but this is how the rest of the stdlib is implemented.
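For readers following along, here is a minimal sketch of the pattern being discussed: a typed overload that does the real comparison, plus an untyped fallback that returns false instead of failing to compile. The `Flags` class is hypothetical and only stands in for BitArray.

```crystal
# Hypothetical stand-in type, used only to illustrate the overload pattern.
class Flags
  getter bits : UInt32

  def initialize(@bits : UInt32)
  end

  # Typed overload: the real comparison between two Flags values.
  def ==(other : Flags)
    bits == other.bits
  end

  # Untyped fallback: comparing against any other type is simply false,
  # rather than a compile error.
  def ==(other)
    false
  end
end

same = Flags.new(5_u32) == Flags.new(5_u32)
different_type = Flags.new(5_u32) == "5"
```

With the fallback in place, the second comparison compiles and evaluates to false.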

bew · 2017-05-10T19:45:55Z

src/bit_array.cr

+
+    count = Math.min(count, size - start)
+
+    if @size <= 32


You always access the size attribute through its getter; for consistency you can remove the @ here.

bcardiff · 2017-05-11T15:21:39Z

src/bit_array.cr

@@ -125,4 +222,20 @@ struct BitArray
  private def malloc_size
    (@size / 32.0).ceil.to_i
  end
+
+  # FIXME: this was copied from Array, we should deduplicate implementations


I would add this note in Array too, so the duplication can be discovered from either side.

We could put this method as a nodoc method on Enumerable or Indexable. But if we find it's ugly to deduplicate, then yes, duplicating the message would be wise.

Let's move it as a :nodoc: in Indexable (and also deduplicate String#range_to_index_and_size ;-) )

Ah! Another one! I'll ensure to do that.
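As a rough illustration of what a shared helper of this kind does, the sketch below resolves a range (possibly with negative or exclusive bounds) into a start index and a count. The body is an assumption written for this example, not the code merged in this PR, and it is a free function rather than a :nodoc: method on Indexable.

```crystal
# Sketch only: turns a range into {start, count} for a collection of the
# given size. Negative bounds count from the end of the collection.
def range_to_index_and_count(range : Range(Int32, Int32), collection_size : Int32) : Tuple(Int32, Int32)
  from = range.begin
  from += collection_size if from < 0
  raise IndexError.new if from < 0

  to = range.end
  to += collection_size if to < 0
  to -= 1 if range.excludes_end?

  count = to - from + 1
  count = 0 if count < 0 # an empty or inverted range selects nothing

  {from, count}
end

inclusive = range_to_index_and_count(1..3, 10)
exclusive = range_to_index_and_count(1...3, 10)
from_end = range_to_index_and_count(-3..-1, 10)
```

Callers would still need to clamp the count against the collection size, as the `Math.min(count, size - start)` line in the diff above does.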

bcardiff · 2017-05-11T15:27:49Z

src/bit_array.cr

+  def ==(other : BitArray)
+    return false if size != other.size
+    # NOTE: If BitArray implements resizing, there may be more than 1 binary
+    # representation for equivalent BitArrays after a downsize.


You mean because the leading unused bits would not be guaranteed to be 0?

Would you add that reason as a clarification?
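To make the concern concrete, here is a small sketch, assuming a single UInt32 word with bit i of the array stored at bit position i. Two words can hold the same logical bits yet differ in the unused positions, so a raw word (or memcmp) comparison would report them as different unless the unused bits are masked or kept zeroed. The `same_bits?` helper is hypothetical.

```crystal
# Hypothetical helper: compares only the low `size` bits of two words.
def same_bits?(a : UInt32, b : UInt32, size : Int32) : Bool
  mask = size >= 32 ? UInt32::MAX : (1_u32 << size) - 1
  (a & mask) == (b & mask)
end

# Both words hold the same 5 logical bits (0b10110), but `b` has stale
# data left in the unused upper bits, as could happen after a downsize.
a = 0b0001_0110_u32
b = 0b1111_0110_u32

raw_equal = a == b
masked_equal = same_bits?(a, b, 5)
```

Here the raw comparison is false while the masked comparison is true, which is the situation the NOTE warns about.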

RX14 · 2017-05-11T23:20:10Z

This should be done now, after you've reviewed I can rebase this into 3 nice commits :)

bcardiff · 2017-05-12T00:27:05Z

I might be picky, but it feels more natural here to use instance methods so you don't need to call Indexable. ... , size). And either the method should be named index_and_size or the variable size should be renamed to count.

BTW, kudos for doing the bitwise operations as efficiently as possible 👍

RX14 · 2017-05-12T08:15:36Z

@bcardiff String doesn't implement Indexable so it has to be a class level method.

The bit ops aren't nearly optimal:

I could use 64bit load/stores to optimise for sizes below 64, or even 128bit if we ever get Int128 support. (I just realised this optimization in the shower)
Currently the loop only transfers 1 bit per iteration, but I could do a bit more work to make it transfer 1 whole byte per iteration. (I just realised this was easier than I thought in the shower)

I think I'll have a go at these extra optimizations tonight.

bcardiff · 2017-05-12T17:27:41Z

Oh! I read it wrong. So the optimization is only when size < 32. I would say that we need to apply that to the general case then. Read by chunks of UInt32, and shift / merge the bytes of the source so each word is written just once in the new BitArray. No need for 64/128 bits, I think.
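A minimal sketch of the chunked approach described above, under the assumption that bit i lives in word i / 32 at bit position i % 32, and that start + count does not exceed the number of stored bits. It works on a plain Array(UInt32) rather than BitArray's internal pointer, and `extract_bits` is a hypothetical name.

```crystal
# Sketch only: copies `count` bits starting at bit `start`, producing one
# output word per iteration by merging two neighbouring source words.
def extract_bits(words : Array(UInt32), start : Int32, count : Int32) : Array(UInt32)
  word_index, shift = start.divmod(32)
  out_words = (count + 31) // 32

  result = Array(UInt32).new(out_words) do |i|
    # Low part comes from the current source word.
    low = words[word_index + i] >> shift
    # High part comes from the next source word, if there is one.
    has_next = shift > 0 && word_index + i + 1 < words.size
    high = has_next ? words[word_index + i + 1] << (32 - shift) : 0_u32
    low | high
  end

  # Clear the unused bits of the last word so equal ranges compare equal.
  rem = count % 32
  result[-1] &= (1_u32 << rem) - 1 if rem != 0 && !result.empty?
  result
end

spanning = extract_bits([0xFFFF0000_u32, 0x0000000F_u32], 16, 20)
within = extract_bits([0x00000AB0_u32], 4, 8)
```

Each output word is written once, regardless of how the range is aligned against the source words.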

You were right regarding String and Indexable. Let's keep that helper as a module method then for now. But I would still change _count to _size. It bothers me to read size / count mixed 🙏

RX14 · 2017-05-12T18:33:59Z

@bcardiff I added the 64bit optimisation before I read your message (it's a pretty simple optimization so it shouldn't matter), and implemented proper multi-bit bitshifts for BitArray. I added an extra spec which checks BitArray vs Array(Bool) (which prints the seed when it fails).

I also completely changed the variable naming in range_to_index_and_count.

RX14 · 2017-05-12T18:45:32Z

Oh, forgot to mention the CI failure for this build is very strange and probably unrelated to this PR. Unfortunately the stacktrace doesn't give much info on which spec failed and it's likely to be unreproducible.

bcardiff · 2017-05-15T17:49:50Z

@RX14 one more round,

I don't see specs that cover the case for bitarrays between 32 and 64. (I don't think it is needed, but if it is there, then we need a spec, and I would also suggest checking that llvm generates efficient code, without additional bitwise operations to build that 64-bit integer.)
I don't think random specs should be used here. There is no guarantee which case is being stressed. I would prefer to build a concrete bitarray to shift. I get that printing the seed should be enough to reproduce it, but I find it simpler to not use randomness here.

RX14 · 2017-05-17T18:21:38Z

@bcardiff I prefer random tests because they make me more confident that edge cases I haven't appreciated have been tested. With 10 000 iterations, the chance that it tests no more than a single static test case would is low.

However, I've expanded the large test to be more comprehensive and cover ranges which cover 3 loop iterations. I hope you'll leave the random test in (many test harnesses have provisions for randomised testing), but it should be fine without.

I've also added the missing test for 32 < size <= 64.
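For illustration, a deterministic check along these lines might look like the sketch below. It compares range access on a BitArray against the same ranges on an Array(Bool), using a fixed pattern and ranges chosen to straddle the 32-bit and 64-bit boundaries. The pattern, the ranges, and the `from_bools` helper are assumptions made for this example, not the specs added in this PR.

```crystal
require "bit_array"

# Hypothetical helper: builds a BitArray holding the same values as `bools`.
def from_bools(bools : Array(Bool)) : BitArray
  bits = BitArray.new(bools.size)
  bools.each_with_index { |value, i| bits[i] = value }
  bits
end

# Fixed, reproducible pattern: every third bit is set.
bools = Array(Bool).new(100) { |i| i % 3 == 0 }
bits = from_bools(bools)

# Ranges picked to cross the word boundaries at 32 and 64.
ranges = [0..31, 0..32, 31..33, 30..64, 33..99, 63..65]
mismatches = ranges.reject { |range| bits[range].to_a == bools[range] }
```

Any range left in `mismatches` would point at a specific boundary case without needing a seed to reproduce it.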

bcardiff · 2017-05-17T19:06:56Z

Why 10_000 and not 100 or 1_000_000?
Doing thousands of iterations right now will make these specs slower.

I am more in favor of edge and coverage cases. Relying on random specs could become an excuse for not thinking through cases. If you would like to run that to discover edge cases, OK, but that is more like test generation.

If someone is working on other unrelated feature and suddenly this spec fails, I doubt the programmer will switch context to grab that seed, report, or even fix it.

If we choose to do this kind of testing, we should split these specs into a separate target, since they could be much slower. If so, we should have criteria for when to do this or not. Currently the implicit answer is never. Besides that, strings mixing ASCII and non-ASCII, for example, could be a better candidate for this. But again, I see that more as test generation than as a std spec. And the story should be complete: searching for and emitting a minimal test case, as in https://en.wikipedia.org/wiki/QuickCheck

So, would you support string specs randomly creating thousands of strings and performing thousands of string manipulations in every single spec run?

RX14 · 2017-05-17T19:46:49Z

@bcardiff Good points, I agree with you. I've removed the test case.

bcardiff · 2017-05-18T17:09:29Z

Thanks @RX14 !

RX14 added 2 commits May 10, 2017 18:06

Implement BitArray#==

8df7686

Add BitArray range access methods

7733de1

bew reviewed May 10, 2017

fixup! Add BitArray range access methods

9796c0c

bcardiff reviewed May 11, 2017

RX14 added 3 commits May 12, 2017 00:16

Move range_to_index_and_count to Indexable

faa7d07

fixup! Implement BitArray#==

846b12f

fixup! Add BitArray range access methods

d639ffe

RX14 added 2 commits May 12, 2017 18:39

fixup! Add BitArray range access methods

5091999

fixup! Move range_to_index_and_count to Indexable

37a94e9

fixup! Add BitArray range access methods

6e7ec34

fixup! Add BitArray range access methods

7e100ba

bcardiff added kind:feature topic:stdlib labels May 18, 2017

bcardiff added this to the Next milestone May 18, 2017

bcardiff merged commit 2a71284 into crystal-lang:master May 18, 2017
