
static compile part 5 (automatic recompilation) #12259

Closed
tkelman opened this issue Jul 22, 2015 · 25 comments · Fixed by #12458
Labels
needs decision (A decision on this change is needed)

Comments

@tkelman
Contributor

tkelman commented Jul 22, 2015

Moved from #8745 (comment):

Given release schedules, we might not want to commit to one approach within base just yet, and rather leave it to be implemented a few different ways in packages until we find what works best. But there may be a few small things we could do soon within base to make that job easier. For example, is there an easily-extractable way to get, out of the .ji file, the paths of all .jl source files that went into it? If so, we could do the hashing and rebuilding via external scripting or tools without needing to change base. @yuyichao had started to do some work in this direction for the purpose of teaching cmake how to build the system image, and I think that might be worth pursuing for packages too.

@tkelman added the "needs decision" label Jul 22, 2015
@yuyichao
Contributor

FWIW, what I need for making cmake happy is very minimal, so basically anything reasonably simple and flexible should work for that.

@tkelman
Contributor Author

tkelman commented Jul 22, 2015

Should we make a CMake.jl package that bootstraps a binary of cmake across different platforms and includes a few common useful-for-Julia-packages cmake modules in a central place?

@yuyichao
Contributor

Should we make a CMake.jl package that bootstraps a binary of cmake across different platforms and includes a few common useful-for-Julia-packages cmake modules in a central place?

Currently I can't come up with anything to put there (well, at least on Linux, where installing cmake is trivial).

It might be useful once this issue and the cmake PR materialize (now that my todo list for 0.4 is almost done, hopefully I can spend a little more time on the cmake PR... :P).

@tkelman
Contributor Author

tkelman commented Jul 22, 2015

cmake isn't especially hard to install anywhere, but it's also not too hard to just automate that, which would be a more user-friendly approach for managing Julia packages. We might also need to bootstrap ninja while we're at it (or maybe just ninja, and skip cmake?) to avoid relying on command-line tools or requiring a Visual Studio / Cygwin / MSYS installation.

I think the cmake build system for base Julia is an almost entirely independent issue from the idea of possibly leveraging cmake for managing recompilation of packages and Julia source dependencies. The latter happens to be somewhat similar to the system image build step of the former though, so they might be able to use the same mechanism for determining ji->jl dependency lists.

@tkelman
Contributor Author

tkelman commented Jul 22, 2015

Other things worth looking at are Redo (esp. https://github.com/apenwarr/redo#why-not-always-use-checksum-based-dependencies-instead-of-timestamps) and Tup (http://gittup.org/tup/).

Interestingly, both of those, as well as ninja, appear to be primarily based on timestamps rather than checksums. It looks like scons uses md5, though getting Python distributed reliably (mostly on Windows) has proven troublesome elsewhere.

And there's always the option of re-implementing any one of these in Julia, or coming up with our own thing, at least for the Julia-only parts of packages; once we start talking about building binary dependencies, that becomes a bad idea.

@ufechner7

Tup looks awesome! Easy to install, simple to write the rules, fast.

@timholy
Member

timholy commented Jul 22, 2015

FWIW, I'm currently "faking" this with the following script:

# Require the module, and rebuild its .ji precompile cache unless
# require reported that the cached version was usable.
function autocompile(sym)
    if Base.require(sym) != true
        Base.compile(sym)
    end
end

autocompile(:Images)
autocompile(:JLD)
autocompile(:MAT)
autocompile(:Gadfly)
autocompile(:Gtk)
...

It would be nice not to have to double-require, so moving the uuid checks into something standalone sounds like a big step forward.

@stevengj
Member

@tkelman, note that Python also uses only timestamps, rather than checksums, for invalidating its .pyc files.

@stevengj
Member

I agree that the first step is for compile to spit out a list of the included files and imported modules.

But I'm skeptical that cmake (or similar) buys us much for managing the package recompilation; it doesn't seem difficult for require of a @cacheable module to simply check all of the timestamps and then recompile it if needed. And I really think that we want this to be automated so that it occurs on require (hence using and import).

@yuyichao
Contributor

But I'm skeptical that cmake (or similar) buys us much for managing the package recompilation

I agree with this. What I think might be useful is using cmake to build pre-compiled packages, and that should become clearer after we have more experience with this issue (because we will know better what we need) and with the cmake port (because they can share cmake scripts, etc.).

@yuyichao
Contributor

I think cmake could be useful as a replacement for the Python build system (I know many people will probably hate me for saying this...) because many Python modules are written in C/C++, either the glue code or even the core logic itself, to gain performance. This (a build system for a foreign language) is what cmake is good for.

However, in Julia we encourage people to write the core logic in Julia itself, and the binary dependencies we have are usually existing third-party libraries that have their own build systems (Cxx.jl is probably an exception). Therefore, we usually only need to call their build system, and that logic can be done just as easily in Julia as in cmake.

Making precompiled packages for developers or distribution packagers is a different issue and that's where I think an existing build system (like cmake) could be useful.

@stevengj
Member

To be concrete about the next steps, the .ji file should store an array of strings giving the dependencies:

  • files that were included (usually .jl files)
  • .ji files of other modules directly imported
  • arbitrary dependencies declared with a dependency(pathname::AbstractString) function call in the module

There should also be a function dependencies(pathname::AbstractString) to retrieve this information as a Vector{UTF8String} from the .ji filename.

Once we have this information, then we can decide how to use it. (e.g. auto-recompile on require if any of these files have a timestamp that is newer than the .ji file being imported, or if any of the imported modules is recursively judged to be out-of-date).
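
For example, a minimal sketch of the flat check this would enable, assuming the dependencies accessor proposed above and a hypothetical recompile helper:

# Sketch: decide whether a .ji cache needs rebuilding based on the recorded
# dependency list. `dependencies` is the accessor proposed above; `recompile`
# is a hypothetical stand-in for whatever actually rebuilds the cache file.
function stale(ji_path::AbstractString)
    ji_time = mtime(ji_path)
    any(dep -> mtime(dep) > ji_time, dependencies(ji_path))
end

maybe_recompile(ji_path::AbstractString) = stale(ji_path) && recompile(ji_path)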

@stevengj
Member

(We could also store a corresponding array of checksums/SHAs in case someone wants to use that info, although I suspect that for most purposes just looking at the timestamp should suffice as it does in Python.)

@yuyichao
Contributor

To be concrete on the next steps, the .ji file should store an array of strings giving the dependencies

It would be nice if this information were stored in a format that is easy to retrieve with command-line tools.

@ScottPJones
Contributor

Is it typical for Python to be moving dynamically compiled code to remote nodes?
That's where I think timestamps end up causing lots of headaches (having experienced that in the past when moving dynamically compiled code across a distributed system).

@stevengj
Member

@yuyichao, since the .ji file is just object code (e.g. a COFF or ELF file or similar), it's not going to be too easily processed with the usual command-line tools. I think that in practice we may have to rely on using julia as the portable command-line tool to extract information from it. (On some systems one may be able to use readelf or similar commands, but this is likely to be platform-dependent.)

But julia -e 'println(join(dependencies(ARGS[1]),"\n"))' filename.ji is not such a bad command-line tool.

@yuyichao
Contributor

But julia -e 'println(join(dependencies(ARGS[1]),"\n"))' filename.ji is not such a bad command-line tool.

Yes, although it will not work for bootstrap. If we have the code to record it, I guess adding another command-line option to print it in text format for bootstrap shouldn't be too bad.

@tkelman
Contributor Author

tkelman commented Jul 22, 2015

I think cmake could be useful as a replacement for the Python build system (I know many people will probably hate me for saying this...) because many Python modules are written in C/C++, either the glue code or even the core logic itself, to gain performance. This (a build system for a foreign language) is what cmake is good for.

I think you're right on the money here. If cmake had been more mature/widely-used when a lot of these package systems in python were being designed, that ecosystem might look a lot different.

The bootstrapping-base concern is mostly orthogonal, but hopefully we should be able to reuse most of whatever we come up with, one way or another.

We will need to record and be able to extract the dependency information, but I don't think we need to reinvent the wheel in checking timestamps and managing rebuilds. Going from a list of dependencies to a build.ninja file would be a really simple transformation.
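
As a rough illustration, a sketch of that transformation in Julia, assuming the dependency paths are already in hand; the compile command in the rule is only a placeholder, not the real invocation:

# Sketch: turn a recorded dependency list into a build.ninja file. The
# compile command in the rule is illustrative only.
function write_ninja(io::IO, ji_path, entry_jl, deps)
    println(io, "rule jicompile")
    println(io, "  command = julia --output-ji \$out \$in")
    println(io)
    # rebuild the .ji whenever the entry file or any implicit dependency changes
    println(io, "build ", ji_path, ": jicompile ", entry_jl, " | ", join(deps, ' '))
end

open(io -> write_ninja(io, "Foo.ji", "src/Foo.jl", ["src/util.jl", "Bar.ji"]),
     "build.ninja", "w")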

@stevengj
Member

@tkelman, a maximum(map(timestamp, dependencies)) > timestamp(image) && recompile(image) check is trivial to implement, and my suspicion is that it will actually be easier to implement ourselves than to interface with an external build system, no matter how simple the latter.

You use fancier build systems like cmake when you need other features, like portable building of shared libraries, managing compiler flags, checking for external libraries, execution of test suites, etcetera, none of which is needed by the using Foo re-compilation machinery.

@tkelman
Contributor Author

tkelman commented Jul 23, 2015

cmake addresses the higher-level autoconf-like part of the problem, which I agree isn't exactly this issue. Though we are going to want to portably build shared libraries out of Julia modules before too long, probably as soon as we can reliably use lld everywhere.

Dependencies are hardly ever a flat list; they become a complicated multi-level graph pretty quickly (simple recursion may be good enough for this, let's see). Do we need to recompile all of Gadfly any time any of its dependencies changes? Avoiding that is perhaps an optimization and might not need to be solved right away (it may require operating in a mode where cross-module inlining is not allowed), but we should try not to back ourselves into a corner design-wise in a way that makes future modularity much more difficult.

@stevengj
Member

@tkelman, because module inclusion cannot be circular, recursion should be good enough (plus memoization to avoid checking timestamps twice), i.e. dynamic programming.
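
A rough sketch of that memoized recursion, assuming hypothetical helpers module_cache_path, included_files, and imported_modules over the recorded dependency information:

# Sketch: a module's cache is stale if any included file is newer than its
# .ji, or if any module it imports is recursively stale. Memoized so each
# module is only checked once; the import graph is acyclic, so this terminates.
function is_stale(mod::Symbol, seen = Dict{Symbol,Bool}())
    haskey(seen, mod) && return seen[mod]
    ji = module_cache_path(mod)          # hypothetical lookup of the .ji path
    ji_time = mtime(ji)
    stale = any(f -> mtime(f) > ji_time, included_files(ji)) ||
            any(m -> is_stale(m, seen), imported_modules(ji))
    seen[mod] = stale
end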

@stevengj
Member

Also, I don't think we should use any build system that cannot be (a) invoked as a library and (b) invoked as a library with callbacks so that it compiles using the running julia process rather than spawning new julia processes in its build rules. It's not clear that either of these is possible with ninja.

@StefanKarpinski
Member

I'm with @stevengj here – I don't see any reason why we'd need an external build system to figure out whether packages need to be recompiled or not. Even if it's a graph, it's far easier to traverse ourselves than to try to deal with an external build system.

@vtjnash
Member

vtjnash commented Jul 27, 2015

I agree that the first step is for compile to spit out a list of the included files and imported modules.

See jl_deserialize_verify_mod_list, which parses a simple null-terminated list out of the .ji file's header, consisting of [len::Int32, name::Symbol{len}, uuid::UInt64] entries.
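
Based on that layout, a rough sketch of reading such a list, assuming the stream is already positioned at the start of the module list and that names are stored as raw bytes (the real header layout may differ):

# Sketch: read a null-terminated list of modules laid out as
# [len::Int32, name bytes, uuid::UInt64] entries, stopping at len == 0.
function read_mod_list(io::IO)
    mods = Tuple{String,UInt64}[]
    while (len = read(io, Int32)) != 0
        name = String(read(io, len))
        uuid = read(io, UInt64)
        push!(mods, (name, uuid))
    end
    mods
end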

invoked as a library with callbacks so that it compiles using the running julia process rather than spawning new julia processes in its build rules. It's not clear that either of these is possible with ninja.

Invoking Base.compile doesn't use the currently running julia process either (every module is compiled via a freshly launched julia process), so I don't see this as much of a showstopper.

@stevengj
Member

stevengj commented Aug 3, 2015

Thanks @vtjnash, I used the jl_deserialize_verify_mod_list info in #12445 to export the dependency information.
