For some, the idea of creating a portable Linux binary is somewhat elusive.
In this article, we will discuss how to create a Linux binary for a specific architecture that runs on a large variety of Linux distros: current releases, somewhat old ones, and hopefully ones far into the future.
A common problem facing those looking to deploy proprietary software on Linux, or those trying to supply binaries to a very large user base which will not compile the software themselves, is how to offer one binary that fits most normal scenarios.
There are generally four naive approaches to solving this problem.
1. The developers set up a bunch of specific distros, compile the software on each of them, and give out distro-specific binaries. This makes sense at first, till you run into some trouble:
   - You have to juggle many live CDs or maintain a bunch of installed distros, which is painful and time consuming.
   - You end up only offering support for the distros you have handy, and quite a few users on a more exotic distro, or on a different and incompatible version of a distro you already support, will nag you for support.
   - The compilers or other build utilities on some distros are too old for your modern software, so you need to build them elsewhere, or figure out how to backport modern software to that old distro.
   - Builds break when a user upgrades their system.
   - Users end up needing to install some non-standard system libraries, increasing everyone's frustration.
2. The developers just statically link the binaries. This isn't always a legal option due to some licenses that may be involved. Binaries which are fully statically linked also exhibit incorrect behavior in many instances (more on this later).
3. The developers just compile the software on one system, and pray that it works for everyone else.
4. The developers compile on a really, really old distro and hope it works everywhere else. However, this succumbs to the last three problems outlined in naive approach #1, and in many cases the binaries produced won't work with modern distros.
Now there are plenty of companies that supply those portable Linux binaries. You find on their website downloads for say Linux i386, AMD64, and PPC. Somehow that i386 binary manages to run on every i386 system you've tested, Red Hat, Debian, SUSE, Ubuntu, Gentoo, and both old and modern versions at that. What is their secret sauce?
Now let us dive into all the important information and techniques to accomplish this worthy goal.
First thing you want to know is what exactly is your binary linked to anyway? For this, the handy ldd command comes in.
The lines which are directed to a file are system libraries that you need to worry about. In this case, there's libstdc++, libm, libgcc, and libc. libc and libm are both part of (E)GLIBC, the C library that most Linux applications will be using. libstdc++ is GCC's C++ library. libgcc is GCC's implementation of some programming constructs that your program may be using, such as exception handling, and things like that.
In general (E)GLIBC is broken up into many sub libraries that your program may be linked against. Other notable examples are libdl for Dynamic Loading, libpthread for threading, librt for various real time functions, and a few others.
Your application will not run on a system unless all these dependencies are found, and are compatible. Therefore, versions of these also come into play. In general a newer minor version of a library will work, but an older one will not.
In order to find version numbers, you want to use objdump. Here's an example with finding out what version of (E)GLIBC is needed:
In this case, 2.3 is the highest version number. Therefore this binary needs (E)GLIBC 2.3 or higher on the system. Note, the version numbers have nothing to do with the version installed on your system, rather (E)GLIBC marks each function with the minimum version that contains it.
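Picking the highest version out of a long list by eye is error prone, since version numbers compare component by component and not as plain text (2.10 is newer than 2.9). Here is a minimal sketch of that comparison in C. The helper names version_cmp() and highest_version() are made up for illustration, and the code assumes version tags shaped like GLIBC_2.2.5, which is how objdump labels them.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Compare two dotted versions numerically, component by component.
   Returns <0, 0 or >0, like strcmp(). A missing component counts as 0. */
int version_cmp(const char *a, const char *b)
{
    while (*a || *b) {
        char *end_a, *end_b;
        long na = strtol(a, &end_a, 10);
        long nb = strtol(b, &end_b, 10);
        if (na != nb)
            return (na > nb) - (na < nb);
        if (end_a == a && end_b == b)
            break; /* no digits left to consume on either side */
        a = (*end_a == '.') ? end_a + 1 : end_a;
        b = (*end_b == '.') ? end_b + 1 : end_b;
    }
    return 0;
}

/* Skip the "GLIBC_" prefix of a version tag, if it is present. */
static const char *strip_prefix(const char *tag)
{
    return strncmp(tag, "GLIBC_", 6) == 0 ? tag + 6 : tag;
}

/* Return the highest version among the given tags, or NULL if count is 0. */
const char *highest_version(const char *const tags[], size_t count)
{
    const char *best = NULL;
    for (size_t i = 0; i < count; i++) {
        const char *v = strip_prefix(tags[i]);
        if (!best || version_cmp(v, best) > 0)
            best = v;
    }
    return best;
}
```

Feeding it the tags collected from objdump gives the minimum (E)GLIBC version the binary needs. A plain strcmp() would rank 2.9 above 2.10, which is why the components are parsed as numbers.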
Of course all this applies to other libraries as well, particularly libgcc and libstdc++.
Now that we know a little bit about what we're doing, I'm going to present the first bit of secret sauce.
If you're using C++, link with -static-libstdc++. This ensures that libstdc++ is linked statically, but won't link every other lib statically like -static would. You want libstdc++ linked statically for a few reasons: it's safe to do so, some systems may have an older version (or none at all, in the case of some servers), and your binary stays compatible if a new major version of libstdc++ comes out which is no longer backwards compatible. Note that even though libstdc++ is GPL'd, it also offers a linking exception that allows you to link against it, even statically, in any application.
If you see that your binary needs libgcc, also use -static-libgcc for the same reasons given above. libgcc is also GPL'd, and has the same linking exception. I once had the unfortunate scenario where I sold a client an application that used exceptions, without libgcc statically linked. On his old server, as long as everything went absolutely perfectly, the application ran fine, but if any issue occurred, instead of gracefully handling the issue, the application terminated immediately. Since his libgcc was too old, the application saw the throws, but none of the catches. Statically linking libgcc fixed this issue.
Now you might be thinking, hey, what about statically linking (E)GLIBC? Let me warn you that doing so is a bad idea. Some features in (E)GLIBC will only work if the statically linked (E)GLIBC is the exact same version as the (E)GLIBC installed on the system, making static linking pointless, if not downright problematic. (E)GLIBC's libdl is quite notable in this regard, as are several networking functions. (E)GLIBC is also licensed under the LGPL, which essentially means that if you give out the source to your application, then in most cases you can distribute statically linked binaries with it, but otherwise not. Also, since 99% of the functions are marked as requiring extremely old versions of (E)GLIBC, static linking is hardly necessary in most cases.
The next bit of the secret sauce is statically linking those non standard libs your application needs but nothing else.
You probably never learned in school how to selectively static link those libraries you want, but it is indeed possible. Before the list of libraries you wish to static link, place -Wl,-Bstatic and afterwards -Wl,-Bdynamic.
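Say in my application I want to statically link libcurl and OpenSSL, but dynamically link zlib and the rest of my libs, such as other parts of (E)GLIBC. I would use the following as my link flags:
gcc -o app *.o -static-libgcc -Wl,-Bstatic -lcurl -lssl -lcrypto -Wl,-Bdynamic -lz -ldl -lpthread -lrt
The next step is to ensure that your libraries pull in as few dependencies as possible. Here's the output from ldd on my libcurl.so: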
This is quite unacceptable. Distros generally compile packages with everything enabled. Your application generally does not need everything a library has to offer. In the case of libcurl, you can compile it yourself, and disable the features you aren't using. In an example application, I only need HTTP and FTP support, so I could compile libcurl with very little, and now have this:
This is much more manageable. Refer to the documentation of your libraries in order to see how to compile them without features you don't need.
If you're going to be compiling your own libraries, you probably want to set up a second system, virtual machine, or chroot for building your customized library versions and applications, to ensure they don't conflict with your main system. This matters especially for the upcoming tip.
Secret sauce part 3, push your (E)GLIBC requirements down.
Let's look at an objdump on OpenSSL.
It seems that OpenSSL would work on (E)GLIBC 2.3, except for one pesky function which needs 2.7+. This is a problem if I want to ship an application with this modern OpenSSL on say Red Hat Enterprise Linux 5 which comes with GLIBC 2.5, or say Debian Stable from ~4 years ago, which only has 2.4.
In this case OpenSSL is using a C99 version of sscanf(), but not actually by choice.
In /usr/include/stdio.h on (E)GLIBC 2.7+, you'll notice two blocks:
#if defined __USE_ISOC99 && !defined __USE_GNU \
    && (!defined __LDBL_COMPAT || !defined __REDIRECT) \
    && (defined __STRICT_ANSI__ || defined __USE_XOPEN2K)
# ifdef __REDIRECT
/* For strict ISO C99 or POSIX compliance disallow %as, %aS and %a[
   GNU extension which conflicts with valid %a followed by letter
   s, S or [.  */
extern int __REDIRECT (fscanf, (FILE *__restrict __stream,
                                __const char *__restrict __format, ...),
                       __isoc99_fscanf) __wur;
extern int __REDIRECT (scanf, (__const char *__restrict __format, ...),
                       __isoc99_scanf) __wur;
extern int __REDIRECT_NTH (sscanf, (__const char *__restrict __s,
                                    __const char *__restrict __format, ...),
                           __isoc99_sscanf);
# else
extern int __isoc99_fscanf (FILE *__restrict __stream,
                            __const char *__restrict __format, ...) __wur;
extern int __isoc99_scanf (__const char *__restrict __format, ...) __wur;
extern int __isoc99_sscanf (__const char *__restrict __s,
                            __const char *__restrict __format, ...) __THROW;
#  define fscanf __isoc99_fscanf
#  define scanf __isoc99_scanf
#  define sscanf __isoc99_sscanf
# endif
#endif
These two blocks of code make fscanf(), scanf(), sscanf(), vfscanf(), vscanf(), and vsscanf() use special C99 versions. Since older applications were already compiled against the C89 versions, (E)GLIBC doesn't want to potentially break them by changing how an existing function works. So instead, a new set of functions was created which only exists in (E)GLIBC 2.7+, and by default (E)GLIBC redirects all calls to these functions to the C99 versions at compile time.
Now there are some defines you can set in your library code and application code to ensure it uses the old more backwards compatible versions, but getting the exact right combination of defines without breaking anything else can be tricky. It may also be tedious to modify a code-base you're not familiar with.
Therefore, I recommend just deleting these two blocks from your <stdio.h> on your build system. You want your build system to be able to build everything for backwards compatibility, right?
If you're recompiling libraries like OpenSSL which are designed for massive portability with all kinds of systems, odds are, they're not looking for C99 support in basic scanf() family functions anyway. If you do happen to need C99 scanf() support in your application, I recommend that you add it manually with a specialized lib, for maximum portability. You can easily find a bunch online.
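For those who would rather not edit system headers, here is a rough sketch of the "defines" route mentioned above. It defines _GNU_SOURCE before any include, which makes the headers define __USE_GNU. Going by the condition quoted from <stdio.h> (it requires !defined __USE_GNU), that should leave sscanf() bound to the old symbol instead of __isoc99_sscanf. This is an assumption based on that one header. Other (E)GLIBC versions may use different conditions, so always confirm the result with objdump. parse_major_minor() is a made-up helper, not something from OpenSSL.

```c
/* Must come before every #include, otherwise the feature test macro
   is ignored by headers that were already processed. */
#define _GNU_SOURCE
#include <stdio.h>

/* Hypothetical helper: parse a "major.minor" string such as "2.7".
   Returns 0 on success and -1 if the text does not match. */
int parse_major_minor(const char *text, int *major, int *minor)
{
    return sscanf(text, "%d.%d", major, minor) == 2 ? 0 : -1;
}
```

The catch, as noted above, is that _GNU_SOURCE switches on many other extensions too, so it may change how the rest of a code base compiles.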
The last scenario that you may encounter is wanting to use a modern library function. For most libs you can just statically link them, but that won't work for (E)GLIBC. Since some functions depend on system support, and custom versions may not perform as well as the built in system ones, you definitely want to use the built in ones if they're available. The question is how to do that once the binary has already been compiled.
So for our final bit of secret sauce: dynamically load any modern functions that you want to use, and work around their absence, or disable some functionality, if they are not present.
Remember libdl that we mentioned above? It offers dlopen() for opening system libraries, and dlsym() for finding out if certain functions are present or not, and retrieving a pointer to them.
I'm going to post a full example that you can look at and play with. In this example, we have a program which tries to figure out how big system pipes are. In this application, we are going to see how much data we can stuff in a pipe before we're told that the pipe is full, and the write would need to block.
Linux offers a function called pipe2() which has the crucial ability to create a pipe in non-blocking mode. If it doesn't exist, we can create it ourselves, but we prefer the built in one if possible.
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <dlfcn.h>

#ifndef __linux__
#error This program is specifically designed for Linux, even though it works elsewhere
#endif

//Note: _GNU_SOURCE is deliberately not defined, otherwise <unistd.h> would declare pipe2() itself and clash with the pointer below
typedef int (*pipe2_t)(int [2], int); //Signature of pipe2()

//Implement the rather straight forward pipe2(), note: this function is of type: static pipe2_t
static int our_pipe2(int pipefd[2], int flags)
{
  int ret = pipe(pipefd);
  if (!ret) //Success, pipe created
  {
    //The built in pipe2() would not suffer from race conditions that the following code would succumb to in a threaded application
    if (flags & O_NONBLOCK)
    {
      fcntl(pipefd[0], F_SETFL, fcntl(pipefd[0], F_GETFL) | O_NONBLOCK);
      fcntl(pipefd[1], F_SETFL, fcntl(pipefd[1], F_GETFL) | O_NONBLOCK);
    }
  }
  return(ret);
}

static pipe2_t pipe2 = our_pipe2; //pipe2() is initialized to our function

size_t pipe_size() //Manually determine the size of the system's pipe, for automatic, look up Linux specific F_GETPIPE_SZ
{
  //Create a union for using a pipe, so usage is a bit more logical
  union
  {
    int pipefd[2];
    struct
    {
      int read;
      int write;
    } side;
  } u;
  size_t amount = 0; //Bytes written so far, 0 means unknown

  if (!pipe2(u.pipefd, O_NONBLOCK)) //Note, here pipe2() is used
  {
    for (;;) //Write to a pipe in a loop, the final amount should be the size of the pipe
    {
      ssize_t w = write(u.side.write, &amount, sizeof(size_t)); //Write a size_t to the pipe

      if (w > 0) { amount += w; } //Success, add amount written and then loop
      else if (w == 0) //Pipe was closed, and we certainly didn't close it
      {
        perror("Pipe unexpectedly closed");
        amount = 0; //Reset to unknown, because an error occurred
        break;
      }
      else /* Error occurred trying to write */ if (errno != EINTR) //And it wasn't an interruption, so something that needs handling
      {
        if ((errno != EAGAIN) && (errno != EWOULDBLOCK)) //Failed to write to pipe - and it's nothing we'd fix
        {
          perror("Failed to write to pipe");
          amount = 0; //Reset to unknown, because an error occurred
        } //Else, pipe is full, we're done!

        break; //In either case, we're done writing to the pipe
      } //Else If (errno == EINTR), we'd just loop and try again
    }
    close(u.side.read);
    close(u.side.write);
  }
  else { perror("Failed to create pipe"); }

  return(amount);
}

int main(const int argc, const char *const *const argv)
{
  void *so = dlopen("libc.so.6", RTLD_LAZY); //Open the C library
  if (so)
  {
    void *sym = dlsym(so, "pipe2"); //Grab the handle to pipe2() if it exists
    if (sym) //Success!
    {
      pipe2 = (pipe2_t)sym; //Use the built in one instead of ours
      puts("Using system's pipe2().");
    }
    else { puts("Using our pipe2()."); }
  }
  else
  {
    puts("Using our pipe2().");
    fprintf(stderr, "Could not open C library: %s\n", dlerror());
  }

  //Here's the real work
  size_t a = pipe_size();
  if (a) { printf("Pipe size is: %zu\n", a); }
  else { fputs("Could not determine pipe size.\n", stderr); }

  if (so) { dlclose(so); }
  return(0);
}
Here's how to compile and run it:
/tmp> gcc -Wall -o pipe_test pipe_test.c -ldl
/tmp> ./pipe_test
Using system's pipe2().
Pipe size is: 65536
/tmp>
Now pipe2() was added to GLIBC in 2.9, yet according to objdump this binary only needs (E)GLIBC 2.2.5+. Here's the output from an older system with GLIBC 2.7, using the exact same binary created on a newer system:
/tmp> ./pipe_test
Using our pipe2().
Pipe size is: 65536
/tmp>
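If an application needs several optional functions, the lookup step can be factored into a small helper. The following is only a sketch of one way to do it, and it differs from the example above in one respect: instead of opening libc.so.6 by name, it passes NULL to dlopen(), which returns a handle for the main program. dlsym() on that handle also searches the shared objects loaded at startup, which includes the C library. The names find_optional_symbol(), pipe2_fn, and choose_pipe2() are made up for illustration.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: look up a function by name at run time.
   Returns its address, or NULL if it is not available on this system. */
void *find_optional_symbol(const char *name)
{
    static void *self = NULL; /* cached handle, kept open for the process lifetime */
    if (!self)
        self = dlopen(NULL, RTLD_LAZY); /* NULL: handle for the main program */
    return self ? dlsym(self, name) : NULL;
}

/* Signature of pipe2(), declared locally so that no recent header is needed. */
typedef int (*pipe2_fn)(int [2], int);

/* Return the system's pipe2() when present, otherwise the supplied fallback. */
pipe2_fn choose_pipe2(pipe2_fn fallback)
{
    void *sym = find_optional_symbol("pipe2");
    return sym ? (pipe2_fn)sym : fallback;
}
```

As with the example above, older (E)GLIBC versions need -ldl on the link line. Since the function is only named in a string, the binary picks up no new version requirement from it.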
Lastly, let me recap all the techniques we learned.
Use ldd and objdump to see version requirements of binaries and libraries.
Statically link compiler and language libraries, such as libgcc and libstdc++.
Statically link selected libraries, while dynamically linking others.
Compile selected libraries with as little needed functionality as possible.
Push (E)GLIBC requirements down by being wary of functions which have changed over time, since (E)GLIBC redirects calls to them in newly compiled programs by default.
Push (E)GLIBC requirements down by not directly using new functions, and instead loading them dynamically and working around their absence.
Doing all this, you'll still need to make different builds for different operating systems, and different architectures like x86 and ARM, but at least you won't be forced to for all different distros and versions thereof.
One thing of note: it's possible to have Linux with different C libraries, and in those cases, you may as well be using a different operating system. You'll be hard pressed to make complex programs compiled against one C library run on a Linux system which uses a different C library, where the needed one is not present. Thankfully though, the mainstream desktop and server distros all use (E)GLIBC.
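To see which situation a build is in, a program can report the C library it was compiled against and the one it is running on. Below is a minimal sketch, assuming that the __GLIBC__ and __GLIBC_MINOR__ macros and gnu_get_libc_version() are available whenever the headers belong to (E)GLIBC. Some other C libraries are known to define __GLIBC__ for compatibility, so treat the result as a hint only. describe_c_library() is a made-up name.

```c
#include <stdio.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h> /* gnu_get_libc_version() */
#endif

/* Hypothetical helper: write a short description of the C library into buf
   and return buf. The build time version comes from the headers, the run
   time version from the library that is actually loaded. */
const char *describe_c_library(char *buf, size_t size)
{
#ifdef __GLIBC__
    snprintf(buf, size, "glibc: built against %d.%d, running on %s",
             __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
#else
    snprintf(buf, size, "not glibc, or an unrecognized C library");
#endif
    return buf;
}
```

Note that the "built against" number is the version of the build system's headers, not the minimum version the binary requires. That minimum still has to be read from objdump.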
In any case, the techniques you've learned here can also be applied to other setups too. (E)GLIBC was only focused on in this article because of its popularity and its many gotchas, but many other libraries that you may use, particularly video and audio libraries have similar issues as well.
via insanecoding.blogspot.com https://ift.tt/UCfWY3y
June 5, 2024 at 02:08PM
Insane Coding: Creating portable Linux binaries
https://ift.tt/Yk1SsGe