oblas

These routines implement just enough BLAS-like operations to build a solver over a finite field of octets, GF(2^8). They are gradually being optimized for different architectures (e.g. SSE, AVX, NEON).

References: https://www.ssrc.ucsc.edu/Papers/plank-fast13.pdf
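
To make "BLAS-like routines over a finite field of octets" concrete, here is a minimal scalar sketch of two such operations: scaling a row, and adding a multiple of one row to another. It is not taken from the oblas source. The function names (`gf256_mul`, `gf256_scal`, `gf256_axpy`) are hypothetical, and the reduction polynomial 0x11D is an assumption; the library may use a different polynomial and a table-based or SIMD multiply instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two octets in GF(2^8) by shift-and-add.
 * ASSUMPTION: reduction polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D). */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;                 /* addition in GF(2^8) is XOR */
        uint8_t carry = a & 0x80;   /* will the shift overflow 8 bits? */
        a = (uint8_t)(a << 1);
        if (carry)
            a ^= 0x1D;              /* reduce modulo the polynomial */
        b >>= 1;
    }
    return p;
}

/* x[i] = c * x[i] for i in [0, n): scale a row by a field element. */
static void gf256_scal(uint8_t *x, uint8_t c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x[i] = gf256_mul(c, x[i]);
}

/* y[i] = y[i] + c * x[i] for i in [0, n): the row update used in
 * Gaussian elimination. Hypothetical name, modeled on BLAS axpy. */
static void gf256_axpy(uint8_t *y, const uint8_t *x, uint8_t c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] ^= gf256_mul(c, x[i]);
}
```

Because the field has characteristic 2, applying `gf256_axpy` twice with the same arguments restores `y`, so subtraction needs no separate routine. The byte-at-a-time loops above are only a baseline; the referenced paper describes how this kind of region multiply can be sped up with SIMD instructions.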