Towards Multi-Platform Application Binaries

This post discusses the current state of native application development, a potential model for single-binary, multi-platform executables and libraries, and the work required to get there.

Operating System Overlords

For a long time, engineers have built native software with the operating system as the target. They benefit from the features the OS provides but must also deal with its particulars: the heterogeneous availability of libraries, functions, formats, and toolchains. Each platform has traditionally had its own executable and library formats, libraries, toolchains, compilers, languages, packaging, and even distribution. And yet, these provide substantially the same functionality: across all major operating systems, native applications create threads and processes, access memory, files, and the network, accept input, and display output.

Despite significant diversity in system software, there is a lack of diversity in actual computer architecture. A desktop or laptop computer today has x86_64 CPU(s), RAM, storage, and networking. A server has essentially the same components, although likely enterprise grade and more of them. A phone or tablet has an ARMvX CPU and additional communications capability.

The Legacy of C

Tradition has led to native development looking like this:

Despite common hardware, applications target multiple C libraries and kernels, each exposing largely similar functionality. While there is a C library standard, the implementation differences are enough that applications may need to specially handle each platform. For example, the glibc implementation of rand() takes a lock for shared state; the MSVCRT version uses thread-local storage. Even fundamentals such as line endings and path separators are different across platforms. Furthermore, the C libraries expose hundreds of legacy functions, ranging from memory manipulation to datetime formatting. In most projects, these functions are replaced by more sophisticated versions due to lack of features or consistency across platforms.
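
To make the path separator point concrete, here is a minimal sketch of the kind of per-platform handling applications end up carrying. The helper join_path and the PATH_SEP macro are hypothetical names for illustration, not part of any C library; only the _WIN32 check is a real predefined compiler macro.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Path separator chosen at compile time from a predefined compiler macro. */
#if defined(_WIN32)
#define PATH_SEP '\\'
#else
#define PATH_SEP '/'
#endif

/*
 * Joins dir and name into out using the host separator.
 * Returns 0 on success, -1 if the result does not fit in out.
 * join_path is a hypothetical helper for illustration only.
 */
int join_path(char *out, size_t out_size, const char *dir, const char *name)
{
    assert(out != NULL && dir != NULL && name != NULL);

    size_t dir_len = strlen(dir);
    /* Skip the separator when dir is empty or already ends with one. */
    int skip_sep = dir_len == 0 || dir[dir_len - 1] == PATH_SEP;

    int n;
    if (skip_sep)
        n = snprintf(out, out_size, "%s%s", dir, name);
    else
        n = snprintf(out, out_size, "%s%c%s", dir, PATH_SEP, name);

    /* snprintf returns the length it wanted to write; detect truncation. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Every project that touches the filesystem ends up with some variant of this, which is the duplication a common platform layer would remove.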

A note from the bottom of Microsoft’s docs:

Note: In all versions of Microsoft C/C++ except Microsoft C/C++ version 7.0, and in all versions of Microsoft Visual C++, the time function returns the current time as the number of seconds elapsed since midnight on January 1, 1970. In Microsoft C/C++ version 7.0, time returned the current time as the number of seconds elapsed since midnight on December 31, 1899.

– From Time Management in MSVCRT

The pain points here stem from the combinatorial explosion of (OS, libc, version, arch) tuples that porting libraries and higher-level frameworks must support. It’s common to see code like this, from Boost.Asio’s eventfd_select_interrupter:

#if __GLIBC__ == 2 && __GLIBC_MINOR__ < 8
  write_descriptor_ = read_descriptor_ = syscall(__NR_eventfd, 0);
  if (read_descriptor_ != -1)
  {
    ::fcntl(read_descriptor_, F_SETFL, O_NONBLOCK);
    ::fcntl(read_descriptor_, F_SETFD, FD_CLOEXEC);
  }
#else // __GLIBC__ == 2 && __GLIBC_MINOR__ < 8
# if defined(EFD_CLOEXEC) && defined(EFD_NONBLOCK)
  write_descriptor_ = read_descriptor_ =
    ::eventfd(0, EFD_CLOEXEC | EFD_NONBLOCK);
...
#endif

A Simpler Way

It is likely that a significant number of useful applications could be built on a common libplatform library rather than OS-specific C library implementations. The platform library provides basic access to critical OS features, such as filesystem, IO, memory, network, threads, time, and locks. On top of this, a “C” library would provide common routines in a modular way, such that an application can require only the parts that it needs. This approach is compatible with a general industry desire to control or avoid insecure functionality such as strcpy. Perhaps insecure functions can be marked or packaged in a way that makes their use possible, yet discouraged so that developers are pointed towards better alternatives. Packaging in this way also allows for additional security including sandboxing, as applications can link to “libc-sandbox” rather than the unrestricted platform or C libraries.
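
As a rough sketch of what that layering could look like, the block below defines a small table of platform operations and an application function that depends only on that table. Everything here is an assumption for illustration: platform_ops, platform_get, and app_log_time are invented names, and the backend is written with standard C calls as a stand-in for a real per-OS implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical libplatform surface: the only OS access the application gets. */
struct platform_ops {
    void *(*mem_alloc)(size_t size);
    void (*mem_free)(void *ptr);
    int64_t (*time_now)(void); /* seconds; epoch depends on the backend */
    int (*console_write)(const char *data, size_t len);
};

/* Stand-in backend built on standard C; a real one would call the OS directly. */
static void *std_mem_alloc(size_t size) { return malloc(size); }
static void std_mem_free(void *ptr) { free(ptr); }
static int64_t std_time_now(void) { return (int64_t)time(NULL); }
static int std_console_write(const char *data, size_t len)
{
    return fwrite(data, 1, len, stdout) == len ? 0 : -1;
}

/* Returns the operations table for the current platform. */
const struct platform_ops *platform_get(void)
{
    static const struct platform_ops ops = {
        std_mem_alloc, std_mem_free, std_time_now, std_console_write
    };
    return &ops;
}

/* Application code: touches the OS only through the platform_ops table. */
int app_log_time(const struct platform_ops *p)
{
    assert(p != NULL);

    char *buf = p->mem_alloc(64);
    if (buf == NULL)
        return -1;

    int n = snprintf(buf, 64, "now: %lld\n", (long long)p->time_now());
    int rc = (n > 0 && n < 64) ? p->console_write(buf, (size_t)n) : -1;

    p->mem_free(buf);
    return rc;
}
```

Calling app_log_time(platform_get()) prints the current time. A sandboxed variant could hand the application a table whose entries refuse or restrict certain operations, without the application changing.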

The benefits of this approach are numerous:

1. Applications are built on top of common fundamentals, enabling them to run on any OS with a libplatform implementation.
2. Using libplatform makes it difficult to accidentally rely on OS-specific behavior.
3. OS-specific handling is reduced or eliminated. No more #if _LINUX!
4. Developers control which C library features they use. Similar to C++, pay for only what you need.
5. A single binary can run on every major platform.

Point #5 is a bold claim and some clarification is necessary. First, the user application executable and libraries are dependent only on CPU architecture, rather than OS; libplatform will be dependent on (OS,architecture). Second, the user binaries will require a loader to execute, as there is not currently a multi-OS executable format. These two points do not, however, change the ability of a developer to ship a single binary that works on a large number of computers.
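
To illustrate the first check such a loader might perform, here is a minimal sketch that reads the machine type from an ELF64 header and compares it with the host architecture. It assumes the common format is little-endian ELF64, which this post only floats as a possibility. The function names are hypothetical; the header offsets and machine constants follow the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* e_machine values from the ELF specification. */
enum { ELF_MACHINE_X86_64 = 62, ELF_MACHINE_AARCH64 = 183 };

/* Size of a complete ELF64 file header in bytes. */
enum { ELF64_HEADER_SIZE = 64 };

/*
 * Returns the e_machine field of a little-endian ELF64 image,
 * or -1 if buf does not start with such a header.
 */
int elf64_machine(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    if (len < ELF64_HEADER_SIZE || memcmp(buf, magic, sizeof magic) != 0)
        return -1;
    if (buf[4] != 2) /* EI_CLASS: 2 means 64-bit objects */
        return -1;
    if (buf[5] != 1) /* EI_DATA: 1 means little-endian */
        return -1;

    /* e_machine is a 16-bit field at byte offset 18. */
    return buf[18] | (buf[19] << 8);
}

/* Hypothetical loader check: does the binary match the host CPU? */
int loader_can_run(const unsigned char *buf, size_t len, int host_machine)
{
    return elf64_machine(buf, len) == host_machine;
}
```

Note that the check involves only the CPU architecture. Nothing in it depends on the host operating system, which is the property the single-binary claim rests on.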

A Look at Multi-platform Packaging and Distribution

Native application distribution is commonly specific to an operating system. Apple and Microsoft provide app stores, Linux distributions have package systems (apt, pacman, etc.), and there are several smaller distribution systems that work on one or more platforms, such as Homebrew for macOS and mingw-get for MinGW (Windows), as well as various language- and framework-specific systems, including pip for Python, npm for Node.js, and gem for Ruby. Unfortunately, none of these packaging systems work universally for native applications. It’s likely that the packaging software itself would have no issues with a multi-platform native binary repository, but no such repository exists.

Such a packaging system could easily address the issue mentioned above: a platform-specific loader is required to actually execute the platform-independent application binaries. For example, package install ssh could result in the following files:

root
|-- bin/                            # Stub launchers for installed packages
|   |-- ssh{,.app,.exe}             # The platform-specific stub launcher
|-- packages/                       # Contains files from installed packages
|   |-- ssh/                        # The ssh package
|   |   |-- libssh.so               # The ssh library (ELF64)
|   |   |-- ssh                     # The ssh commandline executable (ELF64)
|   |-- libc-minimal/               # C library package (minimal interface)
|   |   |-- libc.so
|   |-- libc-posix/                 # C library package (POSIX interface)
|   |   |-- libc_posix.so
|   |-- libc-<variant>/             # C library package (alternative interface)
|   |   |-- libc_<variant>.so
|-- support/                        # OS-specific files
|   |-- launcher_win.exe            # Windows launcher template (PE32+)
|   |-- launcher_mac.app            # macOS launcher template (Mach-O)
|   |-- launcher_linux              # Linux launcher template (ELF64)

The ssh package itself just contains /packages/ssh. The directory structure above is simplified by not showing metadata, multiple versions, vendoring, etc. The above approach also comes with all the advantages of current packaging systems: a community-focused development model allowing developers greater control over their project dependencies. libplatform provides the root functionality; all other libraries build upon it. User applications are likely to use one of the libc variants or another higher-level library, but the choice is up to them.
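
As a sketch of how the package tool might use this layout, the helpers below pick the launcher template for the host OS and build the path a stub launcher would hand to the loader. Both function names are hypothetical. The code assumes the executable is named after its package, as in the ssh example, and uses forward slashes throughout, as the tree above does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the launcher template for the host OS, relative to the install
 * root, or NULL on an OS the layout above does not cover.
 */
const char *launcher_template(void)
{
#if defined(_WIN32)
    return "support/launcher_win.exe";
#elif defined(__APPLE__)
    return "support/launcher_mac.app";
#elif defined(__linux__)
    return "support/launcher_linux";
#else
    return NULL;
#endif
}

/*
 * Builds <root>/packages/<package>/<package> into out.
 * Assumption: the executable shares its name with the package.
 * Returns 0 on success, -1 if the path does not fit in out.
 */
int package_executable_path(char *out, size_t out_size,
                            const char *root, const char *package)
{
    assert(out != NULL && root != NULL && package != NULL);

    int n = snprintf(out, out_size, "%s/packages/%s/%s",
                     root, package, package);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The platform-independent path is the same on every OS. Only the choice of launcher template varies.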

What Needs to be Done?

Significant work needs to be done for this to be real. Here are some of the phases:

1. Multi-platform compiler (likely Clang/LLVM) that can produce a common executable/library format (ELF perhaps?) on any platform. Bonus if the builds are byte-for-byte reproducible across platforms.
2. libplatform for Linux, macOS, and Windows, initially targeting x86_64.
3. Enough libc to build LLVM’s libc++.

With a multi-platform C/C++ compiler, libplatform, libc, and libc++, it should be possible to start building some smaller open-source projects. Some modifications are likely, as applications may expect to find resources (or install resources) in locations that don’t make sense cross-platform.

Longer term, there can be additional phases:

4. Architecture-neutral binaries, for example by distributing LLVM bitcode and doing native compilation at install or first-run time.
5. Support for other languages (C#, Objective-C, and Swift are used to produce native applications, in addition to C/C++).
6. Common GUI toolkit (this is a big one, but I think we can draw from web technologies here; more on this in a future post).

If we ever get to #6, we would have the ability to develop software for a wide range of devices, using a common set of libraries, tools, and languages. Rather than being guided by operating system support, developers could choose their language and dependencies based on their application’s features. This would result in an overall reduction in development time and support cost, especially for smaller organizations that are currently unable to support several platforms.