
Further speedups in flisp? Possible, or not? #8310

Closed
timholy opened this issue Sep 11, 2014 · 23 comments

@timholy
Member

timholy commented Sep 11, 2014

Since @JeffBezanson improved flisp, loading times have gone down by a factor of 2 (on top of the 2-fold improvement he got at the end of the 0.3 cycle). In case there's room for yet further improvement, here's a link to a profile of `include`ing a file that takes approximately 12 seconds to load (it's autogenerated by an up-and-coming version of #8235 that outputs all the defined methods to a file).

You may want to download the file locally and then view it in your browser. It takes a long time to render (minutes; see timholy/ProfileView.jl#18), but if you wait it out until it finally finishes rendering, interaction is reasonably fast.
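
For anyone who wants to reproduce a profile like this, the recipe is roughly the following (the filename is hypothetical; `@profile` lives in Base on 0.3/0.4):

```julia
# Collect a profile of loading the file, then render it with ProfileView
# (timholy/ProfileView.jl).
@profile include("methods_dump.jl")   # hypothetical name for the ~12 s file
using ProfileView
ProfileView.view()                    # rendering is slow; see ProfileView.jl#18
```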

Bottom line: flisp still accounts for about 75% of the total time. That's a huge improvement from where we started. Is there room for further improvement, or at this point do we need to turn our attention to file caching?

Load times continue to be a favorite topic on julia-users.

@timholy
Member Author

timholy commented Sep 11, 2014

Oh, and the C line numbers need to be taken with a grain of salt; they're close, but they sometimes seem to be off by a bit. See #7986.

@timholy
Member Author

timholy commented Sep 11, 2014

For the record:

```
julia> versioninfo()
Julia Version 0.4.0-dev+542
Commit 17abdc3 (2014-09-11 05:07 UTC)
Platform Info:
  System: Linux (x86_64-redhat-linux)
  CPU: Intel(R) Xeon(R) CPU           E5506  @ 2.13GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY NEHALEM)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
```

@ArchRobison
Contributor

What's required to reproduce the problem? E.g., could the included file be posted as a gist?

@timholy
Member Author

timholy commented Sep 11, 2014

That particular file (or one like it) will come soon as part of #8235, but I'm still working on it. Until then, the auto-generated file created by #7977 (comment) might be helpful (I haven't recently generated a profile specifically for that function, but can do so upon request).

But in my experience, flisp dominates loading time for any file or package I've tried (less so now, but still true).

@timholy
Member Author

timholy commented Sep 11, 2014

Oh, and @ArchRobison, if you don't know the history in #7977, grab a cup of coffee and read from top to bottom---it's a real thriller. Jeff wrote the final chapters in commits that, as far as I know, ended with f522f8f.

@jakebolewski
Member

I tested package load times (timing `using` for each package) for v0.3 vs v0.4-dev; the numbers I'm getting are here: https://gist.github.com/fb2f577249018591e4f2

They show some performance improvements and some regressions. I'm not seeing the numbers you are quoting, however (Images takes slightly longer to load for me in 0.4). @timholy, what are you measuring exactly?

@timholy
Member Author

timholy commented Sep 12, 2014

That's a pretty awesome test, @jakebolewski!

Images has come out with a pretty radical overhaul since I made those measurements; in particular, the new code is riddled with `for CV in subtypes(AbstractRGB) ...` statements, and there was nothing like that in the version I tested.

But in your numbers there's little that shows anything like a 2x improvement. Hmm. A little disappointing, I have to say. It almost makes me wonder if there was something wrong with my test.

Did you always do each package with a cold start of julia?

@timholy
Member Author

timholy commented Sep 12, 2014

I just tested Images again (after `using Color`, to isolate it to Images proper). Results: v0.3: 6.6 s; v0.4: 5.9 s. So it's better, but not by a huge margin.
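
A measurement of this sort can be as simple as the following (cold start assumed; plain `time()` rather than `@time` around a top-level `using`):

```julia
# Time the two `using` statements separately, so Images' own load time is
# isolated from its Color dependency.
t0 = time()
using Color
t1 = time()
using Images
t2 = time()
println("Color: ", t1 - t0, " s; Images: ", t2 - t1, " s")
```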

@timholy
Member Author

timholy commented Sep 12, 2014

In loading the file profiled above, there seem to be a couple of key recursive loops, which show up as:

  • this alternating with this. (I'm not sure which branch of the `try`/`catch` this invokes.) When this finishes, all the real work seems to be in `apply_cl`.
  • A three-step cycle: this, to this, to this, and then back to the first of these. When it exits this cycle, again all the work is in `apply_cl`.

Those two recursive call patterns account for almost all of the time in flisp.c. Does that mean anything to you?

@jakebolewski
Member

The times were measured from a cold start. Might as well profile the "user experience" rather than isolating out the dependencies.

@timholy flisp uses a virtual stack machine interpreter. SP is the stack pointer. What you were measuring is the cost of function application :-)
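
For readers unfamiliar with the design, the shape is roughly the following toy sketch; the opcodes and layout here are invented, and flisp's real bytecode differs:

```julia
# Toy stack machine: a dispatch loop in which nearly all the work is
# pushing and popping an operand stack, the moral equivalent of flisp's
# apply_cl and its SP manipulation.
const OP_PUSH = 0x01   # push consts[operand] onto the stack
const OP_ADD  = 0x02   # pop two values, push their sum
const OP_RET  = 0x03   # return the top of the stack

function run(code::Vector{UInt8}, consts::Vector{Int})
    stack = Int[]      # operand stack; "SP" is implicit in push!/pop!
    ip = 1             # instruction pointer
    while true
        op = code[ip]; ip += 1
        if op == OP_PUSH
            push!(stack, consts[code[ip]]); ip += 1
        elseif op == OP_ADD
            b = pop!(stack); a = pop!(stack)
            push!(stack, a + b)
        elseif op == OP_RET
            return pop!(stack)
        end
    end
end

run(UInt8[OP_PUSH, 0x01, OP_PUSH, 0x02, OP_ADD, OP_RET], [10, 20])  # => 30
```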

@timholy
Member Author

timholy commented Sep 12, 2014

I was pretty sure that had to be it. So unless recursion proper has some overhead, the real time-sinks are scattered throughout lines 1096-1889 of `apply_cl`.

@ViralBShah
Member

Now flisp needs a JIT too!

@timholy
Member Author

timholy commented Sep 12, 2014

I looked at the specific lines in `apply_cl` that are acting as bottlenecks, and basically they're all manipulations of the stack. So the cost seems deeply intertwined with the architecture of flisp.

@timholy
Member Author

timholy commented Sep 12, 2014

@jakebolewski, any idea how to solve #7280 (comment)? Since macro application appears to be done in Scheme, it's out of my territory. If we could develop a list of which other modules the parsing of a given file depends on, the path toward merging #5061 would be easier.
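
A crude approximation of that dependency list could come from scanning a file's parsed top-level expressions; a rough 0.4-era sketch (the helper name is hypothetical, and this ignores macros defined in the file itself):

```julia
# Collect the modules named in top-level using/import/importall statements.
function moddeps(path)
    src = readall(path)        # 0.3/0.4-era API
    deps = Symbol[]
    pos = 1
    while pos <= length(src)
        ex, pos = parse(src, pos)
        if isa(ex, Expr) && ex.head in (:using, :import, :importall)
            push!(deps, ex.args[1])   # top-level module name
        end
    end
    unique(deps)
end
```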

@ArchRobison
Contributor

I had the same thought about JITing flisp. Or rewrite the .scm parts in C++. (Yes, heresy, but it ducks the bootstrap problem that rewriting the .scm as .jl brings on.) I suspect that caching is the right answer, since skipping/reusing work beats speeding up the worker.

@jakebolewski
Member

Macros don't really live in a file; they live in some runtime module, so invalidating a cache based on file changes doesn't seem correct to me. To me this all ties into #265.

@timholy
Member Author

timholy commented Sep 12, 2014

As long as we can say "rebuild package X if package Y changes" I think we're covered. Maybe we'll rebuild more often than we strictly need to, but currently we "rebuild" on every startup, so the bar for progress is very, very low 😄.

I personally don't think we need to worry about #265 to solve this problem---just a datestamp on every package, plus dependency chains. I suppose we could rely on the REQUIRE file, but that works only for registered packages.
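
The datestamp idea is simple enough to sketch; everything here (cache location, helper names) is hypothetical:

```julia
# Rebuild pkg's cache if any of its source files, or any dependency's
# cache, is newer than the cache itself. Coarse, but it errs toward
# rebuilding too often, which still beats rebuilding on every startup.
cachefile(pkg) = joinpath(Pkg.dir(pkg), "cache.ji")   # hypothetical location
srcfiles(pkg) = [joinpath(Pkg.dir(pkg), "src", f)
                 for f in readdir(joinpath(Pkg.dir(pkg), "src"))]

function stale(pkg, deps)
    cache = cachefile(pkg)
    isfile(cache) || return true                  # no cache yet
    t = mtime(cache)
    any(f -> mtime(f) > t, srcfiles(pkg)) && return true
    any(d -> mtime(cachefile(d)) > t, deps)       # a dependency was rebuilt
end
```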

@JeffBezanson
Member

Aside from caching, I think the next improvements will come from

  1. Possibly switching to @jakebolewski's parser and rewriting the lowering passes, or
  2. Using better algorithms in the front-end code. The parser does a large recursive descent (see the sketch below) that could probably be improved on, and the lowering code could probably be simplified, e.g. by folding some passes together.
  3. Somehow generating less native code.
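
Here's a toy illustration of that recursive-descent shape, nothing like the real grammar (uses the 0.3/0.4-era `shift!`):

```julia
# Each precedence level is a function that calls the next level down, so
# every expression costs a chain of nested calls, one frame per level.
parse_atom(ts) = shift!(ts)             # atoms are just symbols/numbers here
function parse_term(ts)
    lhs = parse_atom(ts)
    while !isempty(ts) && ts[1] == :*
        shift!(ts)
        lhs = Expr(:call, :*, lhs, parse_atom(ts))
    end
    lhs
end
function parse_expr(ts)
    lhs = parse_term(ts)
    while !isempty(ts) && ts[1] == :+
        shift!(ts)
        lhs = Expr(:call, :+, lhs, parse_term(ts))
    end
    lhs
end

parse_expr(Any[:a, :+, :b, :*, :c])     # => :(a + b * c), with * bound tighter
```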

@timholy
Member Author

timholy commented Sep 12, 2014

While I sympathize with point 3, for most packages that doesn't yet seem to be the first-order problem. Of course, once you start using the package it becomes the problem, but I think it will make many people's day if just `using XYZ` gets faster. (A lot faster.)

In terms of prioritizing lowering vs. parsing, I found #7977 (comment) interesting, although #7977 (comment) seems to suggest the ratio might not be quite that high. I'm not quite sure what the difference between those tests was.

From my perspective, following Arjan's advice, I'd say we should be looking to achieve an approximate 20x speedup from where we are now---Gadfly in 1 second is my personal mantra, although I recognize we may only arrive at that point in stages. But I'm a little skeptical that tweaking the front end will take us most of the way there.

My suspicion is we should implement caching first, then go back and see how much of a speedup we can get on the front end. That will still be very relevant, for all those times when the cache has to be invalidated.

Time and talent permitting, I'm happy to help as best I can with this.

@JeffBezanson
Member

Ok, that is reasonable. A good first step is to save fully-expanded top-level expressions if they contain no macro calls, and otherwise just save the parsed (not expanded) form. #5061 is nearly there.
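
The "contains no macro calls" test is easy to sketch (a hypothetical helper, not necessarily what #5061 does):

```julia
# Walk an expression tree looking for macro-call heads; expressions that
# come back false can safely be cached in fully-expanded form.
hasmacrocall(x) = false
hasmacrocall(ex::Expr) =
    ex.head == :macrocall || any(hasmacrocall, ex.args)
```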

It would be nice to come up with a way to cache expressions generated by `@eval`, since I suspect a fair amount of time is spent there. You want to save everything except the final actual call to `eval`, but this is quite tricky to achieve.

One case I care a lot about is the productivity of package developers like yourself; you will still have to wait to reload after making changes to a big package. That's why caching is not the be-all and end-all to me, though of course it's still good to have.

@JeffBezanson
Member

To answer the question posed by this issue more directly: no, I do not think flisp can be sped up significantly without major effort, e.g. compiling to native code. As I said above, I think we'll get more from optimizing the code in the .scm files, or replacing flisp. Rewriting in Julia has advantages beyond just speed, making that option attractive.

I will close since this is more of a question than an issue in itself.

@timholy
Member Author

timholy commented Sep 15, 2014

Upon further reflection, @jakebolewski's comment about #265 started making me nervous about how inlining across packages might work. Seems almost like the macro case. Will this cause trouble?

In general, some earlier benchmarking I did indicated that the Julia side of macros takes almost no time, although macros do slow things down. Presumably this is again on the flisp side.
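
For what it's worth, a split of that sort can be measured along these lines (the example expression is hypothetical; `macroexpand` takes just the expression on 0.3/0.4):

```julia
# Separate the flisp side (parsing) from the Julia side (macro expansion).
str = "@inbounds x[i] += 1"
@time ex = parse(str)          # front end: flisp
@time macroexpand(ex)          # Julia side: running the macro
```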

And I'm very glad to hear about your interest in keeping package developers happy! I am prepared to wait after changes to my packages, and happy about getting faster loading of packages that mine depend on. Obviously it would be great to make it all fast.

The news about flisp is not unexpected, and it's fine to close this.

@timholy
Member Author

timholy commented Sep 16, 2014

Also, if it's possible to do the caching on a file-by-file basis rather than a package-by-package basis, then for most developers it will dramatically decrease load times of large packages. Most large packages have many files, and most changes don't touch most files.
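
One natural way to get file-level granularity is to key each cache entry on file contents, so an edit invalidates only that file's entry (the naming and location here are hypothetical):

```julia
# Per-file cache path derived from a content hash of the source file.
# readall and hex are the 0.3/0.4-era APIs.
filecachepath(path) = joinpath(homedir(), ".julia_cache",
                               hex(hash(readall(path))) * ".ji")
```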
