git subrepo pull (merge) lib/astc-encoder
Warning fixes and performance improvements. Commit includes
reuse issue fix and regen of golden images for toktx ASTC tests.

subrepo:
  subdir:   "lib/astc-encoder"
  merged:   "8a3a329b"
upstream:
  origin:   "https://github.com/ARM-software/astc-encoder.git"
  branch:   "main"
  commit:   "f48cc27b"
git-subrepo:
  version:  "0.4.3"
  origin:   "https://github.com/MarkCallow/git-subrepo.git"
  commit:   "c1f1132"
MarkCallow committed Apr 29, 2022
1 parent 7c24a98 commit 51f4763
Showing 447 changed files with 57,317 additions and 14,585 deletions.
4 changes: 4 additions & 0 deletions .reuse/dep5
@@ -86,6 +86,10 @@ Files: lib/astc-encoder/Source/tinyexr.h
Copyright: 2014-2019 Syoyo Fujita and many contributors
License: BSD-3-Clause

Files: lib/astc-encoder/Source/wuffs-v0.3.c
Copyright: 2022 The Wuffs Authors.
License: Apache-2.0

# We have asked Binomial about REUSE compliance for their repo, see https://github.com/BinomialLLC/basis_universal/issues/165
Files: lib/basisu/*
Copyright: 2019-2020 Binomial LLC
4 changes: 2 additions & 2 deletions lib/astc-encoder/.gitrepo
@@ -6,7 +6,7 @@
[subrepo]
remote = https://github.com/ARM-software/astc-encoder.git
branch = main
commit = a33dbb44739da188e32d8f90e2342a780cf751fe
parent = dbfeb82a731c534f0ad830800a9cd7e68755cc0d
commit = f48cc27b2528286126c116f42f2792ed2fa13755
parent = 7c24a986d1f48e5cb08b62a6fc55ae50522c4efb
method = merge
cmdver = 0.4.3
6 changes: 5 additions & 1 deletion lib/astc-encoder/CMakeLists.txt
@@ -24,7 +24,7 @@ if(MSVC)
add_compile_options("/wd4324") # Disable structure was padded due to alignment specifier
endif()

project(astcencoder VERSION 3.3.0)
project(astcencoder VERSION 3.7.0)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
@@ -42,7 +42,9 @@ option(ISA_NONE "Enable builds for no SIMD")
option(ISA_NATIVE "Enable builds for native SIMD")
option(DECOMPRESSOR "Enable builds for decompression only")
option(DIAGNOSTICS "Enable builds for diagnostic trace")
option(ASAN "Enable builds with address sanitizer")
option(UNITTEST "Enable builds for unit tests")
option(NO_INVARIANCE "Enable builds without invariance")
option(CLI "Enable build of CLI" ON)

set(UNIVERSAL_BUILD OFF)
@@ -202,7 +204,9 @@ if("${MACOS_BUILD}")
printopt("Universal bin " ${UNIVERSAL_BUILD})
endif()
printopt("Decompressor " ${DECOMPRESSOR})
printopt("No invariance " ${NO_INVARIANCE})
printopt("Diagnostics " ${DIAGNOSTICS})
printopt("ASAN " ${ASAN})
printopt("Unit tests " ${UNITTEST})

# Subcomponents
112 changes: 63 additions & 49 deletions lib/astc-encoder/Docs/Building.md
@@ -3,18 +3,19 @@
This page provides instructions for building `astcenc` from the sources in
this repository.

Builds use CMake 3.15 or higher as the build system generator. The examples on
this page only show how to use it to target NMake (Windows) and Make
(Linux and macOS), but CMake supports other build system backends.
Builds must use CMake 3.15 or higher as the build system generator. The
examples on this page show how to use it to generate build systems for NMake
(Windows) and Make (Linux and macOS), but CMake supports other build system
backends.

## Windows

Builds for Windows are tested with CMake 3.17 and Visual Studio 2019.

### Configuring the build

To use CMake you must first configure the build. Create a build directory
in the root of the astenc checkout, and then run `cmake` inside that directory
To use CMake you must first configure the build. Create a build directory in
the root of the `astcenc` checkout, and then run `cmake` inside that directory
to generate the build system.

```shell
@@ -25,20 +26,21 @@ cd build
# Configure your build of choice, for example:

# x86-64 using NMake
cmake -G "NMake Makefiles" -T ClangCL -DCMAKE_BUILD_TYPE=Release ^
-DCMAKE_INSTALL_PREFIX=.\ -DISA_AVX2=ON -DISA_SSE41=ON -DISA_SSE2=ON ..
cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=..\ ^
-DISA_AVX2=ON -DISA_SSE41=ON -DISA_SSE2=ON ..

# x86-64 using Visual Studio solution
cmake -G "Visual Studio 16 2019" -T ClangCL -DCMAKE_INSTALL_PREFIX=.\ ^
cmake -G "Visual Studio 16 2019" -T ClangCL -DCMAKE_INSTALL_PREFIX=..\ ^
-DISA_AVX2=ON -DISA_SSE41=ON -DISA_SSE2=ON ..
```

This example shows all SIMD variants being enabled. It is possible to build a
subset of the supported variants by enabling only the ones you require. At
least one variant must be enabled.
A single CMake configure can build multiple binaries for a single target CPU
architecture, for example building x64 for both SSE2 and AVX2. Each binary name
will include the build variant as a postfix. It is possible to build any set of
the supported SIMD variants by enabling only the ones you require.

Using the Visual Studio Clang-cl LLVM toolchain (`-T ClangCL`) is optional but
produces signficantly faster binaries than the default toolchain. The C++ LLVM
Using the Visual Studio Clang-CL LLVM toolchain (`-T ClangCL`) is optional but
produces significantly faster binaries than the default toolchain. The C++ LLVM
toolchain component must be installed via the Visual Studio installer.

### Building
@@ -61,7 +63,7 @@ Builds for macOS and Linux are tested with CMake 3.17 and clang++ 9.0.
### Configuring the build

To use CMake you must first configure the build. Create a build directory
in the root of the astenc checkout, and then run `cmake` inside that directory
in the root of the astcenc checkout, and then run `cmake` inside that directory
to generate the build system.

```shell
@@ -75,32 +77,30 @@ cd build
# Configure your build of choice, for example:

# Arm arch64
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./ \
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
-DISA_NEON=ON ..

# x86-64
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./ \
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
-DISA_AVX2=ON -DISA_SSE41=ON -DISA_SSE2=ON ..

# macOS universal binary build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./ \
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=../ \
-DISA_AVX2=ON -DISA_NEON=ON ..
```

This example shows all SIMD variants being enabled. It is possible to build a
subset of the supported variants by enabling only the ones you require.

For all platforms a single CMake configure can build multiple binaries for a
single target CPU architecture, for example building x64 for both SSE2 and
AVX2. The binary name will include the build variant as a postfix.

The macOS platform additionally supports the ability to build a universal
binary, combining one x86 and one arm64 variant into a single output binary.
The OS select the correct variant to run for the machine being used to run the
binary. To build a universal binary select a single x64 variant and a single
arm64 variant, and both will be included in a single output binary. It is not
required, but if `CMAKE_OSX_ARCHITECTURES` is set on the command line (e.g.
by XCode-generated build commands) it will be validated against the other
A single CMake configure can build multiple binaries for a single target CPU
architecture, for example building x64 for both SSE2 and AVX2. Each binary name
will include the build variant as a postfix. It is possible to build any set of
the supported SIMD variants by enabling only the ones you require.

For macOS, we additionally support the ability to build a universal binary,
combining one x86 and one arm64 variant into a single output binary. The OS
will select the correct variant to run for the machine being used to run the
built binary. To build a universal binary select a single x86 variant and a
single arm64 variant, and both will be included in a single output binary. It
is not required, but if `CMAKE_OSX_ARCHITECTURES` is set on the command line
(e.g. by XCode-generated build commands) it will be validated against the other
configuration variant settings.

### Building
@@ -116,7 +116,8 @@ make install -j16

## Advanced build options

For codec developers there are a number of useful features in the build system.
For codec developers and power users there are a number of useful features in
the build system.

### Build Types

@@ -131,32 +132,39 @@ We support and test the following `CMAKE_BUILD_TYPE` options.
Note that optimized release builds are compiled with link-time optimization,
which can make profiling more challenging ...

### Constrained block size builds

All normal builds will support all ASTC block sizes, including the worst case
6x6x6 3D block size (216 texels per block). Compressor memory footprint and
performance can be improved by limiting the block sizes supported in the build
by adding `-DBLOCK_MAX_TEXELS=<texel_count>` to the CMake command line when
configuring. Legal block sizes that are unavailable in a restricted build will
return the error `ASTCENC_ERR_NOT_IMPLEMENTED` during context creation.
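
The block-size limit comes down to arithmetic on texel counts. The following is a minimal C++ sketch of that arithmetic, not the library's implementation: `block_texels` and `fits_block_limit` are hypothetical helpers that are not part of the astcenc API, and treating the limit as inclusive is an assumption. The real check happens inside the library during context creation.

```cpp
#include <cstdint>

// Texel count for an ASTC block footprint; 2D blocks use z = 1.
constexpr std::uint32_t block_texels(std::uint32_t x, std::uint32_t y, std::uint32_t z)
{
    return x * y * z;
}

// Hypothetical helper (not part of the astcenc API): true if a block size
// fits a build configured with -DBLOCK_MAX_TEXELS=<max_texels>.
// Assumption: the limit is inclusive.
constexpr bool fits_block_limit(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                                std::uint32_t max_texels)
{
    return block_texels(x, y, z) <= max_texels;
}

// The worst case named in the text: a 6x6x6 block holds 216 texels.
static_assert(block_texels(6, 6, 6) == 216, "worst-case 3D block size");
```

Under these assumptions, a build limited to 64 texels would accept 4x4 and 8x8 blocks but reject 10x10 and 6x6x6.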

### Non-invariant builds

All normal builds are designed to be invariant, so any build from the same git
revision will produce bit-identical results for all compilers and CPU
architectures. To achieve this we sacrifice some performance, so if this is
not required you can specify `-DNO_INVARIANCE=ON` to enable additional
optimizations.
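
The text does not say which optimizations `NO_INVARIANCE` enables. As general background, one reason bit-identical output can cost performance is that floating-point addition is not associative, so code that is free to reorder operations can change low-order bits. The C++ sketch below shows that effect only. The function names are illustrative, and it assumes IEEE 754 single-precision evaluation without fast-math compiler flags.

```cpp
// Sum three floats left-to-right: (a + b) + c.
float sum_left(float a, float b, float c)
{
    float t = a + b;
    return t + c;
}

// Sum the same values right-to-left: a + (b + c).
float sum_right(float a, float b, float c)
{
    float t = b + c;
    return a + t;
}
```

With `a = 1e8f`, `b = -1e8f` and `c = 1.0f`, the first ordering yields `1.0f` and the second yields `0.0f`, because `1.0f` is smaller than the spacing between adjacent floats near `1e8f`.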

### No intrinsics builds

All normal builds will use SIMD accelerated code paths using intrinsics, as all
target architectures (x86-64 and aarch64) guarantee SIMD availability. For
supported target architectures (x86 and arm64) guarantee SIMD availability. For
development purposes it is possible to build an intrinsic-free build which uses
no explicit SIMD acceleration (the compiler may still auto-vectorize).

To enable this binary variant add `-DISA_NONE=ON` to the CMake command line
when configuring. It is NOT recommended to use this for production; it is
significantly slower than the vectorized SIMD builds.

### Constrained block sizebuilds

All normal builds will support all ASTC block sizes, including the worst case
6x6x6 3D block size (216 texels per block). Compressor memory footprint and
performance can be improved by limiting the block sizes supported in the build
by adding `-DBLOCK_MAX_TEXELS=<texel_count>` to to CMake command line when
configuring. Legal block sizes that are unavailable in a restricted build will
return the error `ASTCENC_ERR_NOT_IMPLEMENTED` during context creation.

### Testing
### Test builds

We support building unit tests.

These builds use the `googletest` framework, which is pulled in though a git
submodule. On first use, you must fetch the submodule dependency:
We support building unit tests. These use the `googletest` framework, which is
pulled in through a git submodule. On first use, you must fetch the submodule
dependency:

```shell
git submodule init
@@ -174,7 +182,13 @@ cd build
ctest --verbose
```

### Packaging
### Address sanitizer builds

We support building with ASAN on Linux and macOS when using a compiler that
supports it. To build binaries with ASAN checking enabled add `-DASAN=ON` to
the CMake command line when configuring.

## Packaging a release bundle

We support building a release bundle of all enabled binary configurations in
the current CMake configuration using the `package` build target
@@ -190,7 +204,7 @@ Windows packages will use the `.zip` format, other packages will use the

## Integrating as a library into another project

The core codec of astcenc is built as a library, and so can be easily
The core codec of `astcenc` is built as a library, and so can be easily
integrated into other projects using CMake. An example of the CMake integration
and the codec API usage can be found in the `./Utils/Example` directory in the
repository. See the [Example Readme](../Utils/Example/README.md) for more
117 changes: 115 additions & 2 deletions lib/astc-encoder/Docs/ChangeLog-3x.md
@@ -6,15 +6,120 @@ release of the 3.x series.
All performance data on this page is measured on an Intel Core i5-9600K
clocked at 4.2 GHz, running `astcenc` using AVX2 and 6 threads.

<!-- ---------------------------------------------------------------------- -->
## 3.7

**Status:** April 2022

The 3.7 release contains another round of performance optimizations, including
significant improvements to the command line front-end (faster PNG loader) and
the arm64 build of the codec (faster NEON implementation).

* **General:**
  * **Feature:** The command line tool PNG loader has been switched to use
    the Wuffs library, which is robust and significantly faster than the
    current stb_image implementation.
  * **Feature:** Support for non-invariant builds returns. Opt-in to slightly
    faster, but not bit-exact, builds by setting `-DNO_INVARIANCE=ON` for the
    CMake configuration. This improves performance by around 2%.
  * **Optimization:** Changed SIMD `select()` so that it matches the default
    NEON behavior (bitwise select), rather than the default x86-64 behavior
    (lane select on MSB). Specialization `select_msb()` added for the one case
    we want to select on a sign-bit, where NEON needs a different
    implementation. This provides a significant (>25%) performance uplift on
    NEON implementations.
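
The difference between the two select behaviors named in the optimization entry above can be shown with a scalar emulation. This is a sketch under stated assumptions, not the codec's SIMD code: the `Lanes` type and both function bodies are illustrative, and the argument order (take `b` where the mask is set, else `a`) is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four 32-bit lanes standing in for one SIMD register (illustrative only).
using Lanes = std::array<std::uint32_t, 4>;

// Bitwise select: every result bit comes from b where the mask bit is 1,
// and from a where the mask bit is 0.
Lanes select_bitwise(const Lanes& a, const Lanes& b, const Lanes& mask)
{
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); i++)
    {
        r[i] = (a[i] & ~mask[i]) | (b[i] & mask[i]);
    }
    return r;
}

// MSB lane select: the whole lane comes from b when the top bit of the
// mask lane is set, and from a otherwise.
Lanes select_msb(const Lanes& a, const Lanes& b, const Lanes& mask)
{
    Lanes r{};
    for (std::size_t i = 0; i < r.size(); i++)
    {
        r[i] = (mask[i] & 0x80000000u) ? b[i] : a[i];
    }
    return r;
}
```

The two functions agree whenever each mask lane is all ones or all zeros, which is what comparison operations typically produce. They differ when only the sign bit is set, which is the case a separate `select_msb()` variant would cover.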

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.6 release:**

![Relative scores 3.7 vs 3.6](./ChangeLogImg/relative-3.6-to-3.7.png)

<!-- ---------------------------------------------------------------------- -->
## 3.6

**Status:** April 2022

The 3.6 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Data tables are now optimized for contexts without the
    `SELF_DECOMPRESS_ONLY` flag set. The flag therefore no longer improves
    compression performance, but still reduces context creation time and
    context data table memory footprint.
  * **Feature:** Image quality for 4x4 `-fastest` configuration has been
    improved.
  * **Optimization:** Decimation modes are reliably excluded from processing
    when they are only partially selected in the compressor configuration (e.g.
    if used for single plane, but not dual plane modes). This is a significant
    performance optimization for all quality levels.
  * **Optimization:** Fast-path block load function variant added for 2D LDR
    images with no swizzle. This is a moderate performance optimization for the
    fast and fastest quality levels.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.5 release:**

![Relative scores 3.6 vs 3.5](./ChangeLogImg/relative-3.5-to-3.6.png)

<!-- ---------------------------------------------------------------------- -->
## 3.5

**Status:** March 2022

The 3.5 release contains another round of performance optimizations.

There are no interface changes in this release, but in general the API is not
designed to be binary compatible across versions. We always recommend
rebuilding your client-side code using the updated `astcenc.h` header.

* **General:**
  * **Feature:** Compressor configurations using `SELF_DECOMPRESS_ONLY` mode
    store compacted partition tables, which significantly improves both
    context create time and runtime performance.
  * **Feature:** Bilinear infill for decimated weight grids supports a new
    variant for half-decimated grids which are only decimated in one axis.

### Performance:

Key for charts:

* Color = block size (see legend).
* Letter = image format (N = normal map, G = grayscale, L = LDR, H = HDR).

**Relative performance vs 3.4 release:**

![Relative scores 3.5 vs 3.4](./ChangeLogImg/relative-3.4-to-3.5.png)


<!-- ---------------------------------------------------------------------- -->
## 3.4

**Status:** In development
**Status:** February 2022

The 3.4 release introduces another round of optimizations, removing a number
of power-user configuration options to simplify the core compressor data path.

Reminder for users of the library interface - the API is not designed to be
binary compatible across versions, and this release is not compatible with
earlier releases. Please update and rebuild your client-side code using the
updated `astcenc.h` header.

* **General:**
  * **Feature:** Many memory allocations have been moved off the stack into
    dynamically allocated working memory. This significantly reduces the peak
@@ -35,6 +140,13 @@ of power-user configuration options to simplify the core compressor data path.
  * **Feature:** The `-perceptual` option to set a perceptual error metric is
    still supported, but is currently a no-op in the compressor for mask map
    and normal map textures.
  * **Bug-fix:** Corrected decompression of error blocks in some cases, so now
    returning the expected error color (magenta for LDR, NaN for HDR). Note
    that astcenc determines the error color to use based on the output image
    data type not the decoder profile.
* **Binary releases:**
  * **Improvement:** Windows binaries changed to use ClangCL 12.0, which gives
    up to 10% performance improvement.

### Performance:

@@ -45,7 +157,8 @@ Key for charts:

**Relative performance vs 3.3 release:**

Pending ...
![Relative scores 3.4 vs 3.3](./ChangeLogImg/relative-3.3-to-3.4.png)


<!-- ---------------------------------------------------------------------- -->
## 3.3