1_Runtime_Environment_Build_and_Automation

Corbin Simpson edited this page Oct 12, 2023 · 13 revisions

Overview

Our baseline runtime environment is composed of a Debian Bookworm (12.1) image customized using Packer.

Structure

The baseline runtime environment is further defined by the configuration of the following runtime dependencies:

Fundamental runtime dependencies:

| Governance | Product | From Source | Software License | Source instructions | Done |
|---|---|---|---|---|---|
| Docker | buildx | Docker Inc. | Apache 2.0 | None | |
| NVIDIA | CUDA | NVIDIA CUDA | Proprietary | None | |
| Intel | Math Kernel Libraries | Intel Libraries | Intel Simplified | Maybe, or don't build? | |
| Facebook | FAISS | Facebook FAISS | MIT | Built! | |
| Linux Foundation | Pytorch | Pytorch from source | 3-clause BSD | Built! | ✅ / 🔰 |
| Symas | LMDB | An extremely well regarded key/value store | Open LDAP Public 2.8 | unknown | 📛 |
| Linux Foundation | Cloud-Hypervisor | Cloud Hypervisor built with musl | Dual: 3-clause BSD, Apache 2.0 | use cargo deb | 📛 |
| Red Hat, Inc. | virtiofsd | build and configure virtiofsd | Dual: 3-clause BSD, Apache 2.0 | use cargo deb | 📛 |
| Linux Foundation | Istio Service Mesh | - | Apache 2.0 | do we need to? | 📛 |

Future interest?

https://github.com/tunib-ai/parallelformers

N/P indicates impossible because source code is not available.

CUDA 12.2

  • Intel Math Kernel Libraries
  • Facebook FAISS

Kernel Command Line

6.1 Virtualization Kernel Parameters Profile

https://github.com/artificialwisdomai/origin/pull/109#pullrequestreview-1662554300

These kernel options have been extensively tested, are correct on both AMD and Intel, and are focused on virtualization.

| option | value | what it does | why we set it |
|---|---|---|---|
| modprobe.blacklist | nouveau | prevents nouveau from being loaded | interferes with nVidia proprietary module nvidia |
| pci | realloc=on | forces reallocation of PCI resources | not correctly autodetected? |
| pcie_ports | native | force native access to PCIe services | ? |
| usbcore.nousb | - | disables USB | ? |
| vsyscall | none | disables vsyscalls (vDSO still works though) | hardening |
| intel_iommu | on,sm_on | try loading Intel IOMMU driver in scalable mode | ? |
| amd_iommu | on | try loading AMD IOMMU driver | ? |
| amd_iommu_intr | vapic | use virtual APIC routing when possible | accelerates virtualization |
| iommu | pt | ? | ? |
| iommu.strict | 1 | invalidate TLBs synchronously when DMA regions are unmapped, trading performance for isolation | hardening |
| iommu.forcedac | 1 | force dual-address cycle for PCI resources; allocate PCI ranges in 64-bit address space when possible | ? |
| kvm-amd.avic | 1 | force-enable AVIC for AMD KVM | required for amd_iommu_intr=vapic |
| kvm-amd.nested | 0 | disable nested virtualization for AMD KVM | hardening? |
| kvm-amd.npt | 1 | force-enable nested page tables | cargo cult; should already be enabled by default unless hardware doesn't support it |
| transparent_hugepage | never | disable automatic transparent hugepages backing anonymous mmap | ? |
| nmi_watchdog | 0 | disable NMI watchdog for hardlocks | ? |
| default_hugepagesz | 2M | allocate hugepages with 2MiB size | ? |
| hugepagesz | 2M | allocate hugepages of 2MiB size at boot | ? |
| hugepages | 400000 | allocate 400k hugepages at boot | ? |
| video | efifb:off | disable UEFI-compatible framebuffer | cargo cult |
| iommu.passthrough | 1 | don't route DMA through IOMMU | performance |
| rd.driver.pre | vfio-pci | prefer Virtual Function I/O for PCI resource access | ? |
| pcie_port_pm | off | disable PCIe power management | performance? |
| pcie_bus_perf | - | try to configure PCIe busses for best performance | performance |
| pcie_aspm | off | disable PCIe power management | performance? |
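
One way to confirm that a booted host actually carries the options in the table above is to compare them against the kernel command line. The sketch below is a minimal helper, not part of the repository: the function name `missing_params` is hypothetical, and it assumes each table row is written on the command line as `option=value` (or as the bare option name where the value column is `-`).

```shell
#!/bin/sh
# Print every expected kernel parameter that is absent from a command line.
# Usage: missing_params "<cmdline>" param...
# (hypothetical helper; parameters are matched as whole space-separated words)
missing_params() {
    cmdline=" $1 "   # pad with spaces so the first and last words match too
    shift
    for param in "$@"; do
        case "$cmdline" in
            *" $param "*) ;;                # present as a whole word
            *) printf '%s\n' "$param" ;;    # absent: report it
        esac
    done
}
```

On a running system the first argument would typically come from `/proc/cmdline`, for example `missing_params "$(cat /proc/cmdline)" modprobe.blacklist=nouveau iommu=pt transparent_hugepage=never`. No output means every listed parameter is present.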

Licensing

Redistribution of the resulting image is limited due to proprietary licensing terms.

To satisfy Intel's and nVidia's licenses, the resulting image must not only expose their library functionality, but must also include at least one of our workloads.

To satisfy nVidia's license, the following notice shall be included in modifications and derivative works of sample source code distributed: "This software contains source code provided by NVIDIA Corporation."

Git action changes

  • When a PR is submitted, a build is executed.
  • When a PR is merged, all build artifacts are committed to Oracle artifact storage.

Definition of a build

  • A build is defined by a baseline runtime environment image retrieved from the Oracle artifact storage.
  • A build consists of multiple components.
  • Each component is built using docker buildx.
  • The output of each component build is stored in /opt/computelify/component_manufacturer/component_name
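
As a rough sketch of how a single component build could follow these rules, the function below prints (rather than runs) a `docker buildx build` command that exports the build result into the component's output directory. The `--output type=local,dest=...` exporter is a standard buildx option; the function name, the choice to export to the host filesystem, and the build context argument are assumptions, not taken from the repository.

```shell
#!/bin/sh
# Print the buildx command for one component without executing it.
# Arguments: manufacturer, component name, build context directory.
# (hypothetical helper; the export-to-host layout is an assumption)
component_build_cmd() {
    manufacturer=$1
    component=$2
    context=$3
    dest="/opt/computelify/${manufacturer}/${component}"
    printf 'docker buildx build --output type=local,dest=%s %s\n' "$dest" "$context"
}
```

For example, `component_build_cmd facebook faiss platform/packaging/build/faiss` prints a command whose output lands in /opt/computelify/facebook/faiss. Printing first makes it easy to review the command before running it.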

What is included in a component build

  • Python implementation packaged as a Python Wheel with a .whl suffix.
  • Binary implementation archived and compressed in the format manufacturer_component.tar.zstd
  • Text file in the format ld-so-conf-manufacturer_component.conf, to be installed (renamed) as a drop-in file at /etc/ld.so.conf.d/component_manufacturer/component_name.conf.
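
The naming conventions above can be sketched as a small helper that derives every artifact name from a manufacturer and a component name. The function name and the `key=value` output format are hypothetical; only the path and file name patterns come from this page.

```shell
#!/bin/sh
# Derive the artifact names for one component from the conventions on this page.
# Arguments: manufacturer, component name.
# (hypothetical helper; output keys are illustrative)
component_artifacts() {
    manufacturer=$1
    component=$2
    printf 'output_dir=/opt/computelify/%s/%s\n' "$manufacturer" "$component"
    printf 'archive=%s_%s.tar.zstd\n' "$manufacturer" "$component"
    printf 'ld_conf_source=ld-so-conf-%s_%s.conf\n' "$manufacturer" "$component"
    printf 'ld_conf_target=/etc/ld.so.conf.d/%s/%s.conf\n' "$manufacturer" "$component"
}
```

Running `component_artifacts facebook faiss` lists the four names for the FAISS component. One caveat worth checking: Debian's default /etc/ld.so.conf includes only /etc/ld.so.conf.d/*.conf, so a drop-in placed in a subdirectory would likely need an extra include line before ldconfig picks it up.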

WORKING INFORMATION

Install components

Building faiss (abstracted into a container):

Execute the following steps to build from a Dockerfile:

  • git clone https://github.com/artificialwisdomai/origin
  • sudo mv /home/wise/repos/origin/platform/packaging/build/faiss/target/ld-so-conf-intel.conf /etc/ld.so.conf.d/intel-oneapi.conf
  • sudo ldconfig
  • pushd / ; sudo tar -xf /home/wise/repos/origin/platform/packaging/build/faiss/target/intel-mkl-2023.2.0.tar.gz; popd
  • sudo mkdir -p /opt/facebook
  • pushd /opt/facebook ; sudo tar -xf /home/wise/repos/origin/platform/packaging/build/faiss/target/faiss.tar.gz ; popd
  • sudo mv /home/wise/repos/origin/platform/packaging/build/faiss/target/ld-so-conf-faiss.conf /etc/ld.so.conf.d/facebook-faiss.conf
  • sudo ldconfig

Verify dynamic linking cache

Please note: you will not see a link to /opt/intel/... because the current faiss build automation does not replicate the [build from source instructions](https://github.com/artificialwisdomai/origin/wiki/Build-FAISS-from-source).

Example list of dependencies which are INCORRECT:

wise@wise-a40x1-1:/opt/facebook/lib$ ldd libfaiss_avx2.so
	linux-vdso.so.1 (0x00007ffe597f0000)
	libcudart.so.12 => not found
	libcublas.so.12 => not found
	libgomp.so.1 => not found
	libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6298c00000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f629d376000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f629d354000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6298e1f000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f629d45b000)
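
To make unresolved dependencies easier to spot in a listing like the one above, the `ldd` output can be filtered for `not found` entries. This is a minimal sketch; the function name `unresolved_libs` is hypothetical.

```shell
#!/bin/sh
# Read ldd output on stdin and print the name of each unresolved library.
# (hypothetical helper; relies on ldd's "name => not found" line format)
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Typical use is `ldd /opt/facebook/lib/libfaiss_avx2.so | unresolved_libs`. An empty result suggests the dynamic linking cache resolves every dependency of the library.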

Minor bugs

The file ld-so-conf-facebook-faiss.conf needs to be created with the contents:

/opt/facebook/lib
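
Until the build produces this file itself, it could be generated with a small helper such as the one sketched below. The function name is hypothetical, and the destination is passed in explicitly so the file can be written to the build target directory first and installed afterwards.

```shell
#!/bin/sh
# Write an ld.so drop-in file that lists a single library directory.
# Arguments: library directory, destination file.
# (hypothetical helper; overwrites the destination if it already exists)
write_ld_conf() {
    libdir=$1
    dest=$2
    printf '%s\n' "$libdir" > "$dest"
}
```

For example, `write_ld_conf /opt/facebook/lib ld-so-conf-facebook-faiss.conf` creates the file with the contents shown above; it would then be moved into /etc/ld.so.conf.d and followed by `sudo ldconfig`, as in the install steps.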