- For bitmaps with 32bpp or less, endianness comes into play. Most rendering libraries (including pixman/cairo) use bit significance instead of byte order. This makes sense from a historical perspective, where RGB565 and other packed bit formats were the norm. It makes less sense from a modern, float-centric perspective, where 24/32-bit bitmaps are baseline, and iterating over bytes or floats is more ergonomic. It is suggested that in-memory bitmaps be identified by their bitwise (significance) and bytewise (order) formats independently. For example, on little-endian machines ARGB8888 is the primary bitwise format, but BGRA8888 is the corresponding bytewise format. This distinction is rarely considered by Windows developers, but must be considered for Core CLR.
- In a web context, 32-bit sRGB BGRA is the primary "at-rest" memory format.
- GDI+, WPF, and even Photoshop have gotten a lot of color theory and basic computer science (signal processing) wrong. Don't assume their output is correct, or desirable. Lightroom is a good baseline, but Photoshop is bogged down with legacy user expectations.
- Most image processing needs to occur in a linear light color space, NOT sRGB, which is perceptual and heavily weighted towards darks. 8-bit is barely sufficient precision for storing sRGB, which 'compresses' the color space in a way that coincides with our perceptive abilities. NEVER store linear RGB in less than 14 integer bits/channel. There's no good reason not to use 4-byte floating-point, particularly if you're operating on a small number of scan lines at a time.
- Nearly 30% of CVEs come from imaging libraries and codecs. This is not because stupid people wrote them. Most had decades of experience in the space. It's because imaging, metadata, and codecs combine complexity, optimization, and low-level primitives in ways that can be difficult to reason about; it's the perfect, unfactorable storm. Halide and OpenCL are promising abstractions, but they haven't really arrived yet. Rust, too, promises zero-overhead abstractions that could increase security. But these are ...maturing... technologies, and this space pushes every language or component to its limits. If, in 2015, we still struggle with edge cases in C compilers in this space, we should keep our expectations similarly low when experimenting with Rust.
- Allocations can be 20-50% of processing time. Large allocations are particularly slow on Windows. Virtual memory makes custom allocators hard to reason about, but the amount of time spent acquiring memory indicates that this might be low-hanging fruit.
- C++ templates are handy, but will exclude contributors. Vanilla C libraries have a history of long-term maintenance. C++ imaging libraries do not.
- You cannot avoid pointers. Embrace them, and publish them in interfaces. There are no success stories of manipulating modern-sized bitmaps in C# without them.
- Performance comes from effective use of the CPU cache, not SIMD. Focus on the cache, and you will prosper. Algorithm size can actually play a role here; this is not trivially optimized.
- Valgrind is not optional. Create comprehensive test suites and execute them under Valgrind. Visual Studio cannot substitute for Valgrind. It is an essential development tool in this space.
- Separability by dimension and composability are key to good performance. Very few imaging operations cannot be composed.
- Manual instrumentation with pre-allocated profiling logs may be required. Existing tooling often causes wild goose chases.
- Allocation failure should be assumed a given. Prefer OS-specific allocation calls that prevent paging to disk, and fail early. Handle all results.
- `__FILE__` and `__LINE__` allow you to document errors quite clearly, if you also keep conditionals simple.
- Use lookup tables for going from byte to float. Even casting is slower than consulting a 1 KB table, somehow.
- Memory reads are more expensive than writes. Make reads sequential, and let scatters happen during writes, if possible.
- Never use global or even thread-local state. Pass a context to every function. The context can hold error details, profiling logs, and more.
- Users think about image math in very different ways. They cannot be happily shoehorned into a single API. Most prefer to describe the result (I want an image of width w and height h in format f, etc). Others want an imperative interface where they apply ordered operations (even though they usually get it wrong). Percentages make sense to some, while others think only in pixels. It's important that the underlying framework be able to accommodate a wide range of wrappers, in order to make everyone happy.
- Doing the right thing by default - all the time - isn't terribly easy, but it's terribly important. Users don't want to become imaging specialists to solve their 'tiny' problem. Letting users solve a problem without really understanding it is actually important here, even though I typically frown upon that.
- Never do I/O while holding an uncompressed bitmap in memory. Never. Streams aren't your friend here. You can easily max out RAM and cause disk paging because of an I/O latency spike. That paging will cause more latency, which will... you get the picture. Buffer compressed images fully into memory before decoding. The exception to this rule is (very large) multi-page, multi-frame, or tiled media, in which case you've got a tricky buffering problem to deal with.
- There are incredible performance gains available if the entire imaging task can be represented as a graph of operations. Operations can be composed all the way down into the JPEG decoder, for order-of-magnitude gains. But the user-facing API will need to both hide this graph and allow all those optimizations to be broken. Intermediate imaging developers who want to work with pixels directly will require predictability and an easy-to-reason-about series of steps. Give them an IntPtr, Stride, Width, Height, Length, BitwiseFmt, and BytewiseFmt to work on 1 frame at a time.
- IDisposable is too naive to work in the imaging space. There is not a one-to-one correlation between resources and objects, nor can there be if the API is to be sane. Allocation of space for an in-memory bitmap can take 60ms; it's too costly to discard and re-create unnecessarily. A class may represent a subregion of a frame, whose resources may be allocated by a parent image object, or something else altogether. It makes more sense to track these resources within "scopes", knowing that disposal of the scope/container will fully purge all unmanaged resources in the correct order. This is an unfortunate wart, but it cannot be cleanly removed without something like Rust's lifetime tracking, which is not an appropriate feature for a web app language.
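
The bitwise versus bytewise distinction from the first note is easiest to see by packing a pixel by significance and then looking at its bytes in address order. This is a minimal sketch; `pack_argb8888`, `host_is_little_endian`, and `pixel_bytes` are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Pack channels by bit significance: A in the top 8 bits, B in the bottom 8.
   This is the "bitwise" ARGB8888 layout. */
static uint32_t pack_argb8888(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  |  (uint32_t)b;
}

/* Returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Copy the in-memory bytes of a packed pixel into out[4], in address order.
   On a little-endian host an ARGB8888 pixel comes out as B, G, R, A,
   which is the "bytewise" BGRA8888 layout. */
static void pixel_bytes(uint32_t pixel, uint8_t out[4])
{
    memcpy(out, &pixel, 4);
}
```

Code that iterates over bytes needs the bytewise name; code that shifts and masks a `uint32_t` needs the bitwise name. Recording both avoids guessing which one a given API means.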
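
The byte-to-float lookup table mentioned above might look something like the following. This is a sketch under assumptions: the table only normalizes to the range 0 to 1, the names `byte_lut`, `byte_lut_init`, and `row_bytes_to_floats` are hypothetical, and the table lives in a caller-owned struct rather than a global, in keeping with the no-global-state note.

```c
#include <stddef.h>
#include <stdint.h>

/* 256 entries; with 4-byte floats this is exactly 1 KB. */
typedef struct {
    float values[256];
} byte_lut;

/* Fill the table once. A transfer function (for example sRGB-to-linear)
   could be folded in here, so it costs nothing per pixel later. */
static void byte_lut_init(byte_lut *lut)
{
    for (int i = 0; i < 256; i++) {
        lut->values[i] = (float)i / 255.0f;
    }
}

/* Convert one scan line of 8-bit samples to floats with sequential reads. */
static void row_bytes_to_floats(const byte_lut *lut, const uint8_t *src,
                                float *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = lut->values[src[i]];
    }
}
```

Working one scan line at a time keeps the float buffer small, which is what makes 4-byte floats affordable.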
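
The context-passing and `__FILE__`/`__LINE__` notes combine naturally. The sketch below is one possible shape, not a prescribed design: `context`, `status_code`, `CONTEXT_ERROR`, and `alloc_row` are all hypothetical names, and plain `malloc` stands in for whatever OS-specific allocator is preferred.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef enum {
    STATUS_OK = 0,
    STATUS_OUT_OF_MEMORY,
    STATUS_INVALID_ARGUMENT
} status_code;

/* All per-task state lives here; nothing is global or thread-local. */
typedef struct {
    status_code status;
    const char *file;   /* source file where the first error was raised */
    int line;           /* source line where the first error was raised */
} context;

static void context_init(context *c)
{
    c->status = STATUS_OK;
    c->file = NULL;
    c->line = 0;
}

/* Record only the first error so the root cause is not overwritten. */
static void context_set_error(context *c, status_code s,
                              const char *file, int line)
{
    if (c->status == STATUS_OK) {
        c->status = s;
        c->file = file;
        c->line = line;
    }
}

#define CONTEXT_ERROR(c, code) \
    context_set_error((c), (code), __FILE__, __LINE__)

/* Allocate one scan line of floats; every failure path is reported. */
static float *alloc_row(context *c, size_t width)
{
    if (width == 0 || width > SIZE_MAX / sizeof(float)) {
        CONTEXT_ERROR(c, STATUS_INVALID_ARGUMENT);
        return NULL;
    }
    float *row = malloc(width * sizeof(float));
    if (row == NULL) {
        CONTEXT_ERROR(c, STATUS_OUT_OF_MEMORY);
        return NULL;
    }
    return row;
}
```

Keeping one condition per `if` means the recorded line number points at exactly one possible cause.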
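
For developers who want direct pixel access, the pointer-plus-stride description suggested above could be expressed in C roughly as follows. This is an illustrative sketch only; `frame_view` and `frame_pixel` are hypothetical names, and the format fields are reduced to a single bytes-per-pixel value for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* A non-owning view of one frame. The memory may belong to a parent
   image or a scope, so this struct never frees anything. */
typedef struct {
    uint8_t *pixels;          /* first byte of the first row */
    size_t   stride;          /* bytes between row starts; may exceed
                                 width * bytes_per_pixel due to padding */
    uint32_t width;           /* in pixels */
    uint32_t height;          /* in pixels */
    uint32_t bytes_per_pixel; /* e.g. 4 for a 32-bit format */
} frame_view;

/* Address of pixel (x, y), or NULL when the coordinates are out of range. */
static uint8_t *frame_pixel(const frame_view *f, uint32_t x, uint32_t y)
{
    if (x >= f->width || y >= f->height) {
        return NULL;
    }
    return f->pixels + (size_t)y * f->stride
                     + (size_t)x * f->bytes_per_pixel;
}
```

Because the view does not own its memory, the same struct can describe a whole frame or a subregion of one.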