Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range access methods to BitArray #4397

Merged
merged 10 commits into from
May 18, 2017

Conversation

RX14
Copy link
Contributor

@RX14 RX14 commented May 10, 2017

Closes #3968.

a[0, 0].should eq(a)
end

it "gets 0 ... 0 on empty array" do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You meant 0 .. 0 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/bit_array.cr Outdated
return LibC.memcmp(@bits, other.@bits, malloc_size) == 0
end

def ==(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this? An equality check against anything else shouldn't gives a compile error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it shouldn't. There have been discussions around this before but this is how the rest of the stdlib is implemented.

src/bit_array.cr Outdated

count = Math.min(count, size - start)

if @size <= 32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You always access the attribute size through his getter, for consistency you can remove the @ here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

src/bit_array.cr Outdated
@@ -125,4 +222,20 @@ struct BitArray
private def malloc_size
(@size / 32.0).ceil.to_i
end

# FIXME: this was copied from Array, we should deduplicate implementations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this not in Array also. So the duplication can be discovered from either side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could put this method as a nodoc method on Enumerable or Indexable. But if we find its ugly to deduplicate then yes, duplicating the message would be wise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move it as a :nodoc: in Indexable (and also deduplicate String#range_to_index_and_size ;-) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Another one! I'll ensure to do that.

src/bit_array.cr Outdated
def ==(other : BitArray)
return false if size != other.size
# NOTE: If BitArray implements resizing, there may be more than 1 binary
# representation for equivalent BitArrays after a downsize.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean because of the leading unused bits would not be guaranteed to be 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you add that reason as a clarification?

@RX14
Copy link
Contributor Author

RX14 commented May 11, 2017

This should be done now, after you've reviewed I can rebase this into 3 nice commits :)

@bcardiff
Copy link
Member

I might be picky, but it feels more natural here to use instance methods so you don't need to Indexable. ... , size). And if either the method is named index_and_size or the variable size should be renamed to count.

BTW, kudos for doing the bitwise operations as efficient as possible 👍

@RX14
Copy link
Contributor Author

RX14 commented May 12, 2017

@bcardiff String doesn't implement Indexable so it has to be a class level method.

The bit ops aren't nearly optimal:

  1. I could use 64bit load/stores to optimise for sizes below 64, or even 128bit if we ever get Int128 support. (I just realised this optimization in the shower)
  2. Currently the loop only transfers 1 bit per iteration, but I could do a bit more work to make it transfer 1 whole byte per iteration. (I just realised this was easier than I thought in the shower)

I think I'll have a go at these extra optimizations tonight.

@bcardiff
Copy link
Member

Oh! I read it wrong. So the optimization is only when size < 32. I would say that we need to apply that for the general case then. Read by chunks of UInt32, shift / merge the bytes of the source to write just once in the new BitArray. No need to 64/128 bits I think.

I were right regarding String and Indexable. Let's keep that helper as a module method then for now. But I would change the _count for _size still. It bothers to read size / count mixed 🙏

@RX14
Copy link
Contributor Author

RX14 commented May 12, 2017

@bcardiff I added the 64bit optimisation before I read your message (it's a pretty simple optimization so it shouldn't matter), and implemented proper multi-bit bitshifts for BitArray. I added an extra spec which checks BitArray vs Array(Bool) (which prints the seed when it fails).

I also completely changed the variable naming in range_to_index_and_count.

@RX14
Copy link
Contributor Author

RX14 commented May 12, 2017

Oh, forgot to mention the CI failure for this build is very strange and probably unrelated to this PR. Unfortunately the stacktrace doesn't give much info on which spec failed and it's likely to be unreproducible.

@bcardiff
Copy link
Member

@RX14 one more round,

  1. I don't see specs that covers the case for bitarrays between 32 and 64. (I don't think it is needed, but if it is there, then we need a spec, but also I would suggest to check that llvm generates an efficient code, without additional bitwise operations to build that 64bits integer).

  2. I don't think random specs should be used here. There is no guarantee which case is been stressed. I would prefer to build a concrete bitarray to shift. I get that printing the seed should be enough to reproduce it but I find it simpler to not use randomness here.

@RX14
Copy link
Contributor Author

RX14 commented May 17, 2017

@bcardiff I prefer random tests because they make me more confident that edge cases I haven't appreciated have been tested. With 10 000 iterations the chances that it doesn't test more than a singular static test case would is low.

However, I've expanded the large test to be more comprehensive and cover ranges which cover 3 loop iterations. I hope you'll leave the random test in (many test harnesses have provisions for randomised testing), but it should be fine without.

I've also added the missing test for 32 < size <= 64.

@bcardiff
Copy link
Member

Why 10_000 and not 100 or 1_000_000?
Doing thousands of iterations right now will make this specs slower.

I am more in favor of edge & coverage cases. Relying on random specs it could lead to an excuse of not thinking cases. If you would like to run that to discover edge cases, ok, is like a test generation.

If someone is working on other unrelated feature and suddenly this spec fails, I doubt the programmer will switch context to grab that seed, report, or even fix it.

If we choose to do this kind of testing we should split them in other target since they could be really slower. If so, we should have a criteria when to do this or not. Currently the implicit is never. But besides that, for example in string mixing ascii and non ascii could be a better candidate for this. But again I see that more like a test generation rather than a std spec. And the story should be complete, searching/emitting a minimals test case as in https://en.wikipedia.org/wiki/QuickCheck

So, would you support that string spec's should randomly create thousands of strings and perform thousand of string manipulations in every single spec run?

@RX14
Copy link
Contributor Author

RX14 commented May 17, 2017

@bcardiff Good points, I agree with you. I've removed the test case.

@bcardiff bcardiff added this to the Next milestone May 18, 2017
@bcardiff bcardiff merged commit 2a71284 into crystal-lang:master May 18, 2017
@bcardiff
Copy link
Member

Thanks @RX14 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants