Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grisu #7291

Merged
merged 17 commits into from
Aug 27, 2014
Merged

Grisu #7291

merged 17 commits into from
Aug 27, 2014

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Jun 17, 2014

Implement grisu algorithm in native Julia. Fixes #5959. In particular, this creates a more Julian API to grisu in allowing any size float to be printed in its "shortest" form (while still being parsed back to the correct value), which is a surprisingly tricky problem (link).

Net code reduction is nice from 6K C++ to about 1K Julia. Initial performance benchmarks show it's just under 2x the native C++.

I would also appreciate someone testing this out on a 32-bit machine; I don't foresee any cross-platform issues, but some of the bit-twiddling may cause problems between 32/64-bit.

@JeffBezanson
Copy link
Member

Wow!!

I hope that test file was automatically generated? :)

@staticfloat
Copy link
Member

Wow indeed. That is a massive test file. :P

If you want to remove double-conversion from the Makefiles, you just need to look at deps/Makefile, grep for double-conversion and grisu and remove them all. (We use the two names interchangeably sometimes)

Chunks like this can just be removed in their entirety, and when you're removing dependencies you'll need to excise them from makefile rules that depend on them. These two are the only ones you'll need to worry about in deps/Makefile.

You should remove any patches or wrappers we have stored in deps/ for double-conversion, update README's, licenses and such. Just searching for "grisu"/"double-conversion" will show up all these little places you need to modify. A lot of makefile-fu is just finding all the little places this stuff is scattered around. :/ If you run into an error, feel free to ping me about it.

@nalimilan
Copy link
Member

So you mean I've packaged double-conversion in Fedora for nothing? I can't keep up! ;-)

@jiahao
Copy link
Member

jiahao commented Jun 17, 2014

quelle surprise!

@ivarne
Copy link
Member

ivarne commented Jun 17, 2014

@nalimilan This is not tagged for 0.3, and I can't see any reason why this should be merged right before a release. That way (I think) you'll have plenty of time to maintain the fedora package until it becomes obsolete.

@nalimilan
Copy link
Member

@ivarne Glad to hear that!

@JeffBezanson
Copy link
Member

Very flashy, posting this the day after the Dates.jl PR :) How does he do it??!

@quinnj
Copy link
Member Author

quinnj commented Jun 17, 2014

Well, I figured if I was going to do a PR that was net +3K code and tests (Dates.jl), I should also do one that made up for it! ;)

Yeah, there's no reason this needs to be 0.3. The great thing about a change like this is its purely internal machinery.

A few more thoughts I forgot to share:

  • This actually simplified the previous Grisu module because we don't need separate methods for every different float type. Using dispatch, you just send in your float and the grisu algorithm figures out how to handle it
  • Thus, full support for Float16 is ready to go. Support for a Float80 type would be trivial (3-4 one liners). A full Float128 type would require a little refactoring of the internal Float type to make sure it can fit a full 112 bit significand, but probably wouldn't be too much work.
  • Another thing to note is the use of BigInts for the fallback bignumdtoa code. The double conversion library actually rolled their own BigInt of sorts for this code, and I opted to skip those 2K lines of code :) That said, it might be nice to have some kind of fallback/workaround if gmp isn't available for some reason (along the lines of default packages? #1906). On the other hand, grisu is claimed to handle 99.4% of all 64-bit floats, so it probably wouldn't even be terrible to show Float64(0x9af9083727) or something in those cases.
  • As large as the test file may be, it's actually about 1/10th of the full test suite! They have some massive precomputed files (~20MB or so) with a couple hundred thousand tests to run. I have the package Grisu.jl to run all of them on Pkg.test("Grisu"), but it takes a good minute or two to run, so I opted for the hand-rolled tests committed above. I can add in the full testsuite if we like, but I thought it might be a little overkill.

…nce with the grisu alogrithm (printing the shortest form that can be read back in accurately).
@ViralBShah
Copy link
Member

Really cool! It would be great to avoid the dependency on double-conversion.

@quinnj
Copy link
Member Author

quinnj commented Jun 18, 2014

One other thing to note (now that I fixed the travis failure for it), is that, following up on #5948, realmax(Float16) now prints float16(65500.0) (again), but this time in congruence with the grisu algorithm. Grisu guarantees that the representation with the fewest digits that can be parsed back in accurately will be generated, which in this case is 65500.0, even though the actual value is 65504.0 (as can be seen by showcompact(realmax(Float16)). Just wanted to make a note on this in case people were wondering why a test case for a previous issue had been changed.

@andrioni
Copy link
Member

The use of GMP here seems to be very specific, so we could probably roll out a small pure-Julia quasi-BigInt implementation if desired/needed.

@quinnj
Copy link
Member Author

quinnj commented Jun 18, 2014

Yeah, the BigInt double-conversion rolls doesn't actually look too complicated, so that could be a good template to go off.

I did have a question for all the license-experts out there: the original library is here, being almost completely ignorant of license stuff, do I need to copy over the license from the original files? Anything else I need to do?

/dsfmt-*
/fftw-*
!fftw-config-nopthreads.patch
/git-*
/gmp-*
/grisu-*
/libgrisu.so
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be good to keep these gitignore patterns so that one can switch branches without risking accidentally checking in these files (or be curious as to why they suddenly appeared. It is somewhat annoying when git complains about files not checked inn.

Maybe we could have a block at the end of the .gitignore file something like:

# obsolete dependencies
/double-conversion-*
grisu-*
/libgrisu.so

Some automatic way of deletion would probably also be good. What about adding a few rm -rf to make clean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I'll revert the .gitignore changes. For the deletion part, you're saying one someone runs make clean to have a rule that gets rid of old deps/double-conversion stuff?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't worry about the automatic deletion. In general, we do not delete old versions when we update to new versions. Over time, people will compile a fresh checkout of julia and it will be clean. @staticfloat Thoughts?

@Keno
Copy link
Member

Keno commented Jun 18, 2014

Depends. If you looked at the code while implementing this (rather than the original paper), this probably counts as a derived work, so you'll have to include the license.

@StefanKarpinski
Copy link
Member

We may be able to ask Florian Loitisch if we can relicense it as MIT since BSD 2-clause is equivalent and link to his library as the original code.

@JeffBezanson
Copy link
Member

In the longer term, we have to consider that many people might not want to depend on GMP, but want to print floats.

@quinnj
Copy link
Member Author

quinnj commented Jun 18, 2014

@JeffBezanson, for sure, and just to reiterate, we're only talking about 0.6% of all Float64s that can't be reliably printed with the native grisu algorithm, which in my mind, makes it not as big a deal if the fallback is simply Float64(0x39da9f90adf).

I definitely spent a good chunk of time digging through the actual code (as well as reading thru the paper). @StefanKarpinski, that seems like a good idea, so I would just stick a link to his library code at the top of each file?

@nalimilan
Copy link
Member

We may be able to ask Florian Loitisch if we can relicense it as MIT since BSD 2-clause is equivalent and link to his library as the original code.

Unfortunately, it appears that Google owns the double-conversion code base. Florian asked me to sign a CLA when I added a few lines to make a package. So any relicensing will probably have to be accepted by the Google management...

@StefanKarpinski
Copy link
Member

In a sense that's good news since there's a single owner. For now, we can just stick the BSD license and on each of those files. BSD isn't viral and is compatible with MIT so it's not problematic. It would just be simpler if we could have these be MIT-licensed. We could also ask for one-off permission to create an MIT-licensed derived work.

@ViralBShah
Copy link
Member

The BSD 2-clause license is fully compatible with MIT. Why bother with relicensing? It's not like the rest of the codebase and dependencies are all MIT licensed anyways.

@Keno
Copy link
Member

Keno commented Jun 19, 2014

The original code is BSD 3-clause

@quinnj
Copy link
Member Author

quinnj commented Jun 23, 2014

Ok, make the .gitignore tweaks and just reposted the double-conversion license into the grisu/ directory. Is that enough? Do we need to have the license visible anywhere else? Do I need to add anything else to the license?

@ivarne
Copy link
Member

ivarne commented Jun 23, 2014

Maybe a comment block on the top of each file that says something like:

# This file implements the GRISU algorigthm, but its implementation was
# inspired by the double-conversion library from Google. The original 
# copyright and 3 clause BSD licence in base/grisu/LICENSE may apply

The licence file might also benefit from a header explaining why it appares in the middle of a Julia repository.

@IainNZ
Copy link
Member

IainNZ commented Aug 27, 2014

lol

@StefanKarpinski
Copy link
Member

Wooooooooooo!!!!

@StefanKarpinski
Copy link
Member

I love the dev period.

@IainNZ
Copy link
Member

IainNZ commented Aug 27, 2014

[PkgEval] Every package may have a testing issue on Julia 0.4 (2014-08-28)

@StefanKarpinski
Copy link
Member

Well, that's exciting.

@quinnj
Copy link
Member Author

quinnj commented Aug 27, 2014

How bad can it be with a travis green light, right? Plus, this crazy-everything's-going-to-break dev period has been far too mild so far IMO. I think I'm the only one who's actually broken anything... 😉

@quinnj quinnj deleted the jq/grisu branch August 27, 2014 23:07
@StefanKarpinski
Copy link
Member

Was that not a real warning message, @IainNZ?

@IainNZ
Copy link
Member

IainNZ commented Aug 27, 2014

No sorry was just a joke! I'm sure it'll be fine. We'll find out in 12 hours or so!

@quinnj
Copy link
Member Author

quinnj commented Aug 27, 2014

I'm skeptical there are many packages relying on/testing how floats are printed anyway. It'll be interesting to see.

@StefanKarpinski
Copy link
Member

Haha. Ok, I was not sure but even if we did break all packages on 0.4-dev, that would be ok.

@vtjnash
Copy link
Member

vtjnash commented Aug 28, 2014

So what library is going to be ported next?

:)

@quinnj
Copy link
Member Author

quinnj commented Aug 28, 2014

I think good candidates for porting are

  • Relatively stable or not actively developed
  • Provide more functionality than Julia actually needs/wraps
  • or the functionality it does provide isn't quite flexible/extensible enough for the richness of Julia

That said, I have no idea what other dependencies we'd want to eventually get rid of; libuv? openlibm? pcre? mpfr/gmp for better performance?

@jiahao
Copy link
Member

jiahao commented Aug 28, 2014

OpenBLAS. You know you want to.

@tkelman
Copy link
Contributor

tkelman commented Aug 28, 2014

Single best candidate for replacing next is arpack IMO. It's been causing problems, is not that well maintained, is in Fortran so difficult to work on, and isn't as enormous a library as say OpenBLAS.

@vtjnash
Copy link
Member

vtjnash commented Aug 28, 2014

librmath? zlib? gmp/mpfr? and openblas, like Jiahao said.

suitesparse would also be a good candidate (due to license).

low level libraries (llvm, libunwind, libuv) would probably be the last to go.

@vtjnash
Copy link
Member

vtjnash commented Aug 28, 2014

eliminating fortran codes (openspecfun?) would make julia easier to compile

@StefanKarpinski
Copy link
Member

Not using LLVM and libuv would be actively detrimental – those give us regular free improvements and but fixes when they get upgraded.

@jiahao
Copy link
Member

jiahao commented Aug 28, 2014

@tkelman we're working on ARPACK. Promise.

@quinnj
Copy link
Member Author

quinnj commented Aug 28, 2014

Are we still depending on librmath? I thought that was pretty much excised.

@johnmyleswhite
Copy link
Member

We still need librmath.

@tkelman
Copy link
Contributor

tkelman commented Aug 28, 2014

@jiahao good, it's obviously not a trivial exercise, that much is clear. Jutho and Viral have been doing good work on that front.

The modularity issues apply here - for the remaining large and useful deps, it'll be good to refactor in order to make them optional. Jeff has mentioned doing this for LLVM, FFTW is underway, GMP and MPFR and SuiteSparse are good targets for modularizing. Hell, even trying to make OpenBLAS/Lapack optional is a good long-term goal.

Merging pure-Julia Grisu just made GMP more necessary, didn't it?

@quinnj
Copy link
Member Author

quinnj commented Aug 28, 2014

@tkelman yes, pure-Julia Grisu uses BigInts. That's another 600 LOC or so to roll a minimal BigInt that's needed. Maybe if the modularizing gets more serious, I'll bump that up on the priority list.

@ViralBShah
Copy link
Member

We will need a lot of compiler improvements and multi threading to get rid of the various linear algebra libraries. It is in sight but not there yet. Of we can get a comparable gemm performance kernel, we can dump OpenBLAS quickly. I feel that LAPACK can just be translated to julia with an f2j script, but you really want to rewrite for multi threading.

@IainNZ
Copy link
Member

IainNZ commented Aug 29, 2014

First blood! Keno/SIUnits.jl#38

@IainNZ
Copy link
Member

IainNZ commented Aug 29, 2014

Second! Keno/TexExtensions.jl#2

@quinnj
Copy link
Member Author

quinnj commented Aug 30, 2014

Leave it to @Keno to use unexported macros from an unexported module in Base....:)

@IainNZ
Copy link
Member

IainNZ commented Aug 30, 2014

And GiovineItalia/Gadfly.jl#409
apparently, guess I missed that yesterday somehow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement proper print_shortest for Float16