move zig cc, zig translate-c, zig libc, main(), and linking from stage1 to stage2 #6250

Merged
andrewrk merged 152 commits into master from stage2-zig-cc
Sep 30, 2020

Conversation

andrewrk
Member

@andrewrk andrewrk commented Sep 4, 2020

  • build.zig: repair the ability to link against llvm, clang, and lld
  • move the zig cc arg parsing logic to stage2
    • the preprocessor flag is still TODO
    • the clang arg iterator code is improved to use slices instead of
      raw pointers because it no longer has to deal with an extern
      struct.
  • clean up error printing with a fatal function and use log API
    for messages rather than std.debug.print
  • add support for more CLI options to stage2 & update usage text
    • hooking up most of these new options is TODO
  • clean up the way libc and libc++ are detected via command line
    options. target information is used to determine if any of the libc
    candidate names are chosen.
  • add native library directory detection
  • implement the ability to invoke clang from stage2
  • introduce a build_options.have_llvm so we can comptime branch
    on whether LLVM is linked in or not.

Part of the motivation for doing this is so that @alexnask has the option to utilize our mingw .def files and zig cc capabilities to produce .lib files such as kernel32.lib and ntdll.lib for the purposes of testing PE files. This PR is progress towards #4313 and #4314.

Checklist

  • debug the invalid LLVM IR generated when trying to build this branch
  • expose stage2 in stage1 with zig stage2, for example zig stage2 build-exe hello.zig; better yet, move main() to stage2
  • in Module, utilize std.cache_hash for the root source file
  • add support for invoking clang to build c_source_files, integrated with caching system
  • self-host link.cpp and building libcs (self-host building musl, glibc, and mingw-w64 #4313 and self-host linking #4314). Using the zig cc command will set a flag indicating a preference for the LLVM backend, which will include linking with LLD, at least for now. If zig's self-hosted linker ever gets on par with the likes of ld and lld, we can make it always be used, even for zig cc.
  • make sure zig cc works and fix preprocessing
  • self-host main.cpp
  • go through the branch diff and look for TODO and add more checklist items. We need this to be on par with master branch before merging so that it doesn't regress.
  • look over the branch diff and open issues for things so we don't lose track of some of the things left to improve.
  • update print_targets.zig with glibc support
  • use global zig-cache dir for crt files
  • avoid invoking lld when it's just 1 object file (the zig cc -c case); will file separate issue
  • use separate cache hash instances for the zig module and each C object
  • capture lld stdout/stderr better
  • glibc .so files
  • musl
  • mingw-w64
  • retain cache_hash locks on files until we no longer will depend on their presence
  • port the stage1 os.cpp code that raises the open fd limit
  • improve the stage2 tests to support testing with LLVM extensions enabled; will file separate issue
  • zig translate-c
  • zig libc
  • ELF LLD linking
  • MachO LLD linking
  • COFF LLD linking
  • WASM LLD linking
  • skip LLD caching when bin directory is not in the cache (so we don't put id.txt into the cwd)
  • ability to produce archive file

/// outputting it to an object file, and then linking that together with link options and
/// other objects.
/// Otherwise (depending on `use_lld`) this link code directly outputs and updates the final binary.
use_llvm: bool = false,
Contributor

thoughts on making this an enum: code_generation: enum { builtin, llvm } (I'm thinking about how to plumb in GCC support)
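
To make the trade-off in this suggestion concrete, here is a minimal sketch in Python (not Zig, and not code from this branch). `CodeGeneration`, `LinkOptions`, and `describe` are hypothetical names used only for illustration; the point is that a named choice leaves room for a third backend where a boolean does not.

```python
from dataclasses import dataclass
from enum import Enum


class CodeGeneration(Enum):
    """Hypothetical backend selector mirroring the suggested enum."""
    BUILTIN = "builtin"
    LLVM = "llvm"
    # A further backend (for example GCC) would be one more member here.


@dataclass
class LinkOptions:
    # Stands in for a `use_llvm: bool` style field with a named choice.
    code_generation: CodeGeneration = CodeGeneration.BUILTIN


def describe(options: LinkOptions) -> str:
    """Summarize what the link step would do for the chosen backend."""
    if options.code_generation is CodeGeneration.LLVM:
        return "emit an object with LLVM, then link it"
    return "emit and update the final binary directly"
```

With a boolean, adding a backend means adding a second flag and deciding what the combination "both true" means; with the enum, callers that match on the value are forced to consider the new case.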

 * std.cache_hash exposes Hasher type
 * std.cache_hash makes hasher_init a global const
 * std.cache_hash supports cloning so that clones can share the same
   open manifest dir handle as well as fork from shared hasher state
 * start to populate the cache_hash for stage2 builds
 * remove a footgun from std.cache_hash add function
 * get rid of std.Target.ObjectFormat.unknown
 * rework stage2 logic for resolving output artifact names by adding
   object_format as an optional parameter to std.zig.binNameAlloc
 * support -Denable-llvm in stage2 tests
 * Module supports the use case when there are no .zig files
 * introduce c_object_table and failed_c_objects to Module
 * propagate many new kinds of data from CLI into Module and into
   linker.Options
 * introduce -fLLVM, -fLLD, -fClang and their -fno- counterparts.
   closes #6251.
   - add logic for choosing when to use LLD or zig's self-hosted linker
 * stub code for implementing invoking Clang to build C objects
 * add -femit-h, -femit-h=foo, and -fno-emit-h CLI options
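
The paired -fLLVM/-fno-LLVM style flags above can be pictured as tri-state options: on, off, or not specified. The following is a minimal Python sketch of that idea, not the stage2 implementation; `parse_toggles` and `choose_lld` are hypothetical helpers, and the fallback policy in `choose_lld` is an assumption made for illustration.

```python
from typing import Dict, List, Optional

# Feature names come from the commit message; the parsing scheme is assumed.
TOGGLES = ("LLVM", "LLD", "Clang")


def parse_toggles(args: List[str]) -> Dict[str, Optional[bool]]:
    """Return None for "not specified" so a default can be chosen later."""
    state: Dict[str, Optional[bool]] = {name: None for name in TOGGLES}
    for arg in args:
        for name in TOGGLES:
            if arg == f"-f{name}":
                state[name] = True
            elif arg == f"-fno-{name}":
                state[name] = False
    return state


def choose_lld(explicit: Optional[bool], have_llvm: bool) -> bool:
    """Hypothetical policy: an explicit flag wins; otherwise use LLD
    only when LLVM is linked into the compiler."""
    if explicit is not None:
        return explicit
    return have_llvm
```

Keeping "not specified" distinct from "off" is what lets the compiler apply target-dependent defaults without overriding an explicit user choice. When a flag is repeated, the last occurrence wins in this sketch.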
Instead, append a "dirty suffix" to the version string when there are
dirty git changes and use the version string as the compiler id.

This avoids a dependency on the cache hash system, and saves time on
first invocation of the compiler since it does not have to compute its
compiler id. It also saves time by not having to check the cache for a
saved compiler id.
 * add target_util.zig which has ported code from src/target.cpp
 * Module gains an arena that owns memory used during initialization
   that has the same lifetime as the Module. Useful for constructing
   file paths and lists of strings that have mixed lifetimes.
   - The Module memory itself is allocated in this arena. init/deinit
     are modified to be create/destroy.
   - root_name moves to the arena and no longer needs manual free
 * implement the ability to invoke `zig clang` as a subprocess
   - there are lots of TODOs that should be solved before merging
 * Module now requires a Random object and zig_lib_dir
 * Module now requires a path to its own executable or any zig
   executable that can do `zig clang`.
 * Wire up more CLI options.
 * Module creates "zig-cache" directory and "tmp" and "o" subdirectories
   ("h" is created by the cache_hash)
 * stubbed out some of the things linker code needs to do with TODO
   prints
 * delete dead code for computing compiler id. the previous commit
   eliminated the need for it.
 * add `zig translate-c` CLI option but it's not fully hooked up yet.
   It should be possible for this to be fully wired up before merging
   this branch.
 * `zig targets` now uses canonical data for available_libcs
When linking with LLD, we always create an object rather than going
straight to the executable. The next step is putting this object on the LLD
linker line.
@andrewrk
Member Author

andrewrk commented Sep 10, 2020

Some stats on the zig executable (release build, stripped):

  • master branch stage1: 119 MiB
  • master branch stage2 (no LLVM): 3.5 MiB
  • this branch stage1 (includes stage2): 128 MiB
  • this branch stage2 (yes LLVM): 107 MiB
  • this branch stage2 (no LLVM): 3.7 MiB

So that 107 number is what we're looking at once we are done self-hosting, with an llvm-enabled build.

The 128 number is what we'll be shipping for zig 0.7.0. I do expect it to go down a little bit after deleting the C++ code that was ported from stage1 to stage2 (currently it is duplicated).

The 3.7 number is what you get if you choose to leave LLVM out of your life.

@bfredl
Contributor

bfredl commented Sep 10, 2020

@andrewrk how much does stage2 without llvm do? Debug builds or some optimizations as well?

 * add `zig libc` command
 * add `--libc` CLI and integrate it with Module and linker code
 * implement libc detection and paths resolution
 * port LLD ELF linker line construction to stage2
 * integrate dynamic linker option into Module and linker code
 * implement default link_mode detection and error handling if
   user requests static when it cannot be fulfilled
 * integrate more linker options
 * implement detection of .so.X.Y.Z file extension as a shared object
   file. nice try, you can't fool me.
 * correct usage text for -dynamic and -static
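
The versioned shared object detection mentioned in the list above (.so.X.Y.Z) can be sketched as follows. This is a Python illustration of the idea, not the code from this branch; `has_shared_library_ext` is a hypothetical name, and the limit of three numeric components is an assumption based on the X.Y.Z wording.

```python
def has_shared_library_ext(filename: str) -> bool:
    """True for names like libfoo.so, libfoo.so.1, or libfoo.so.1.2.3."""
    if filename.endswith(".so"):
        return True
    # Peel off up to three trailing numeric components, then require ".so".
    stem = filename
    for _ in range(3):
        head, sep, tail = stem.rpartition(".")
        if not sep or not tail.isdigit():
            return False
        stem = head
        if stem.endswith(".so"):
            return True
    return False
```

A check like this matters because a positional argument such as libfoo.so.1.2.3 would otherwise be classified by its final extension ("3") and not be treated as a library to link.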
@andrewrk
Member Author

You can have a look at some of the test cases to get an idea of what it's capable of so far:

https://github.com/ziglang/zig/tree/master/test/stage2

We have a head start on many features however. Consider the following:

  • zig fmt is already self-hosted
  • translate-c / @cImport is already self-hosted
  • parsing .d files is already self-hosted
  • the cache hash system is already self-hosted
  • the CLI progress bar thing is already self-hosted
  • detecting libc installation paths is already self-hosted
  • clang CLI arg iteration is already self-hosted

The amount of stage1 C++ code is getting smaller and smaller. This PR takes a huge chunk out of it and moves it to stage2.

@andrewrk andrewrk changed the title from "start moving zig cc to stage2" to "move zig cc, zig translate-c, zig libc, main(), and linking from stage1 to stage2" Sep 10, 2020
@andrewrk andrewrk mentioned this pull request Sep 10, 2020
 * implement --debug-cc and --debug-link
 * implement C source files having extra flags
   - TODO a way to pass them on the CLI
 * introduce the Directory abstraction which contains both an open file
   descriptor and a file path name. The former is preferred but the
   latter is needed when communicating paths over a command line (e.g.
   to Clang or LLD).
 * use the cache hash to choose an artifact directory
   - TODO: use separate cache hash instances for the zig module and
     each C object
 * Module: introduce the crt_files table for keeping track of built libc
   artifacts for linking.
 * Add the ability to build 4/6 of the glibc static CRT lib files.
 * The zig-cache directory is now passed as a parameter to Module.
 * Implement the CLI logic of -femit-bin and -femit-h
   - TODO: respect -fno-emit-bin
   - TODO: the emit .h feature
 * Add the -fvalgrind, -fstack-check, and --single-threaded CLI options.
 * Implement the logic for auto detecting whether to enable PIC,
   sanitize-C, stack-check, valgrind, and single-threaded.
 * Properly add PIC args (or not) to clang argv.
 * Implement renaming clang-compiled object files into their proper
   place within the cache artifact directory.
   - TODO: std lib needs a proper higher level abstraction for
     std.os.renameat.
 * Package is cleaned up to use the "Unmanaged" StringHashMap and use the
   new Directory abstraction.
 * Clean up zig lib directory detection to make proper use of directory
   handles.
 * Linker code invokes LLD.
   - TODO properly deal with the stdout and stderr that we get from it
     and expose diagnostics from the Module API that match the expected
     error message format.
 * Delete the bitrotted LLVM C ABI bindings. We'll resurrect just the
   functions we need as we introduce dependencies on them. So far it
   only has ZigLLDLink in it.
 * Remove dead timer code.
 * `zig env` now prints the path to the zig executable as well.
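
The Directory abstraction described in the list above pairs an open handle with a path string. Below is a rough Python sketch of that pairing, assuming a POSIX system (it relies on `dir_fd`); the names `Directory`, `open_directory`, `join`, and `write_file` are hypothetical and do not mirror the actual stage2 API.

```python
import os
from dataclasses import dataclass


@dataclass
class Directory:
    fd: int    # open directory handle, preferred for file operations
    path: str  # path string, needed when talking to other programs

    def join(self, *parts: str) -> str:
        """Build a path string for tools that only accept paths on argv."""
        return os.path.join(self.path, *parts)

    def write_file(self, name: str, data: bytes) -> None:
        # Resolved relative to the handle, not by re-walking the path string.
        out = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644,
                      dir_fd=self.fd)
        with os.fdopen(out, "wb") as f:
            f.write(data)


def open_directory(path: str) -> Directory:
    """Open a directory once and remember both the handle and the path."""
    return Directory(fd=os.open(path, os.O_RDONLY), path=path)
```

File operations go through the handle, while `join` exists only for the cases the commit message calls out, such as passing a path to Clang or LLD on a command line.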
Master branch added in the concept of library versioning being optional
to main.cpp. It will need to be re-added into this branch before merging
back into master.
into smaller exposed components and expose all of them. This makes it
more flexible.

`*const Cache` is now passed in with an open manifest dir handle which
the caller is responsible for managing.

Expose some of the base64 stuff.

Extract the hash helper functions into `HashHelper` and add some more
methods such as addOptional and addListOfFiles.

Add `CacheHash.toOwnedLock` so that you can deinitialize everything
except the open file handle which represents the file system lock on the
build artifacts.

Use ArrayListUnmanaged, saving space per allocated CacheHash.

Avoid 1 memory allocation in hit() with a static buffer.

hit() returns a bool; caller code is responsible for calling final() in
either case. This is a simpler and easier to use API.

writeManifest() is no longer called from deinit() with errors ignored.
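
The calling convention described above (hit() returns a bool, the caller calls final() either way, and the manifest is written explicitly) can be sketched like this. It is a toy Python model under heavy simplification, not the std.cache_hash implementation: the class layout, the `add`/`write_manifest` spellings, and the one-file-per-digest manifest format are all assumptions.

```python
import hashlib
import os


class CacheHash:
    """Toy model: inputs are hashed, and a manifest file marks a prior build."""

    def __init__(self, manifest_dir: str):
        self.manifest_dir = manifest_dir
        self.hasher = hashlib.sha256()
        self.manifest_path = None

    def add(self, value: str) -> None:
        data = value.encode()
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        self.hasher.update(len(data).to_bytes(8, "little"))
        self.hasher.update(data)

    def hit(self) -> bool:
        name = self.hasher.hexdigest() + ".txt"
        self.manifest_path = os.path.join(self.manifest_dir, name)
        return os.path.exists(self.manifest_path)

    def final(self) -> str:
        return self.hasher.hexdigest()

    def write_manifest(self) -> None:
        # Called explicitly so that I/O errors surface to the caller.
        with open(self.manifest_path, "w") as f:
            f.write("ok\n")


def build(cache: CacheHash, compile_fn):
    """Return (digest, was_hit); compile only on a miss."""
    if cache.hit():
        return cache.final(), True
    digest = cache.final()
    compile_fn(digest)
    cache.write_manifest()
    return digest, False
```

Because the manifest write is an ordinary call in `build` and not a side effect of cleanup, a failed write shows up as an error at the call site.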
 * update to the new cache hash API
 * std.Target defaultVersionRange moves to std.Target.Os.Tag
 * std.Target.Os gains getVersionRange which returns a tagged union
 * start the process of splitting Module into Compilation and "zig
   module".
   - The parts of Module having to do with only compiling zig code are
     extracted into ZigModule.zig.
   - Next step is to rename Module to Compilation.
   - After that rename ZigModule back to Module.
 * implement proper cache hash usage when compiling C objects, and
   properly manage the file lock of the build artifacts.
 * make versions optional to match recent changes to master branch.
 * proper cache hash integration for compiling zig code
 * proper cache hash integration for linking even when not compiling zig
   code.
 * ELF LLD linking integrates with the caching system. A comment from
   the source code:

   Here we want to determine whether we can save time by not invoking LLD when the
   output is unchanged. None of the linker options or the object files that are being
   linked are in the hash that namespaces the directory we are outputting to. Therefore,
   we must hash those now, and the resulting digest will form the "id" of the linking
   job we are about to perform.
   After a successful link, we store the id in the metadata of a symlink named "id.txt" in
   the artifact directory. So, now, we check if this symlink exists, and if it matches
   our digest. If so, we can skip linking. Otherwise, we proceed with invoking LLD.

 * implement disable_c_depfile option
 * add tracy to a few more functions
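
The id.txt scheme quoted in the list above can be sketched as follows. This is a simplified Python illustration for POSIX systems, not the linker code from this branch: `link_digest` and `link_if_needed` are hypothetical names, and hashing the raw argument list plus object file contents with SHA-256 is an assumption about what "hash those now" covers.

```python
import hashlib
import os


def link_digest(linker_args, object_paths) -> str:
    """Hash the linker options and the contents of each input object."""
    h = hashlib.sha256()
    for arg in linker_args:
        h.update(arg.encode() + b"\0")
    for path in object_paths:
        with open(path, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()


def link_if_needed(artifact_dir, linker_args, object_paths, run_linker) -> bool:
    """Invoke run_linker unless id.txt already records this digest.

    Returns True if the linker was invoked.
    """
    digest = link_digest(linker_args, object_paths)
    id_path = os.path.join(artifact_dir, "id.txt")
    try:
        if os.readlink(id_path) == digest:
            return False  # same inputs as the last successful link
    except OSError:
        pass  # missing or not a symlink: treat as a cache miss
    run_linker()
    # Record the id as the symlink's target only after a successful link.
    try:
        os.remove(id_path)
    except FileNotFoundError:
        pass
    os.symlink(digest, id_path)
    return True
```

Storing the digest as a symlink target means it can be read back with a single readlink call, without opening and reading a file.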
This is convenient for debugging purposes, as well as simplifying the
caching system since executable basenames will not conflict with their
corresponding object files.
comment reproduced here:

This is so that compiler_rt and libc.zig libraries know whether they
will eventually be linked with libc. They make different decisions
about what to export depending on whether another libc will be linked
in. For example, compiler_rt will not export the __chkstk symbol if it
knows libc will provide it, and likewise c.zig will not export memcpy.
This merges in the revert that fixes the broken Windows build of master
branch.
with respect to std.builtin.link_libc.

The commit 27e008e did not solve the
problem because although it got std.builtin.link_libc to be true for
compiler_rt.zig and c.zig, it had other unintentional side effects which
broke the build for -lc -target foo-linux-musl.

This commit introduces a new flag to Compilation to allow setting this
comptime flag to true without introducing other side effects to
compilation and linking.
Thanks Ryan Liptak!
This is the MachO equivalent for the code added to COFF for doing the
file copy when the input and output are both just one object file.
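
The shortcut described here, copying the file when a single object goes in and an object comes out, might look roughly like this. It is a Python sketch under assumptions, not the MachO or COFF code itself; `link_or_copy` and its parameters are hypothetical.

```python
import shutil


def link_or_copy(objects, output_path, output_is_object, run_linker) -> str:
    """Skip the linker when one object file is both the input and the output."""
    if output_is_object and len(objects) == 1:
        # Nothing to combine or resolve, so a plain file copy is enough.
        shutil.copyfile(objects[0], output_path)
        return "copied"
    run_linker(objects, output_path)
    return "linked"
```

This corresponds to the zig cc -c case from the checklist, where invoking LLD would add cost without changing the output.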
@andrewrk andrewrk merged commit fe117d9 into master Sep 30, 2020
@andrewrk andrewrk deleted the stage2-zig-cc branch September 30, 2020 08:28
@andrewrk andrewrk added labels Sep 30, 2020: breaking (Implementing this issue could cause existing code to no longer compile or have different behavior.), stage1 (The process of building from source via WebAssembly and the C backend.), frontend (Tokenization, parsing, AstGen, Sema, and Liveness.)
andrewrk added a commit that referenced this pull request Oct 5, 2020
It was regressed in 2 ways from the merge of #6250:
 * it was not being enabled by default when the target OS is native.
 * we were testing the libfoo.so file path existence with a bogus format
   string ('{}' instead of '{s}'), so it ended up being something
   like "libstd.HashMap(K,V,...).Entry.so" instead of "libfoo.so". Using
   {} rather than {s} is a footgun; be careful!

Previous functionality is now restored.

closes #6523
@andrewrk andrewrk mentioned this pull request Oct 17, 2020