batched f16 conversion #191

Merged on Jul 8, 2023 (33 commits).
Commits (all by johannesvollmer):

- baf90df (Jan 7, 2023): prototype unoptimized batched f16 conversion + fix round up division …
- c13b24e (Jan 7, 2023): Merge branch 'master' into f16_batch_conversion
- 7b41d0d (Jan 7, 2023): rename some stuff and improve error message
- 284da36 (Jan 7, 2023): use batch size of 16
- b4b9518 (Jan 8, 2023): Merge branch 'master' into f16_batch_conversion
- 9de05cc (Jan 20, 2023): improve comments, update to new `half` version
- 27aca4d (Jan 20, 2023): Merge remote-tracking branch 'origin/f16_batch_conversion' into f16_b…
- 5d96f18 (Jan 20, 2023): Merge branch 'master' into f16_batch_conversion
- 376ae08 (Jan 20, 2023): revert an incomplete refactoring
- 6475c87 (Jan 20, 2023): refactor batch conversion function to reduce code duplication
- aaf06fe (Feb 28, 2023): Merge branch 'master' into f16_batch_conversion
- 278c517 (Jun 27, 2023): Merge branch 'master' into f16_batch_conversion
- a6c125b (Jun 27, 2023): add simple unit test
- da4690e (Jun 27, 2023): fix two compiler warnings
- 633ab61 (Jun 27, 2023): force use newest version of `half`
- 3d3c6bd (Jun 27, 2023): Update src/block/samples.rs
- ff2a2ed (Jun 27, 2023): Update src/block/samples.rs
- 66123ed (Jul 2, 2023): Merge remote-tracking branch 'origin/f16_batch_conversion' into f16_b…
- 441e813 (Jul 2, 2023): add more benchmarks
- 8cc1e38 (Jul 2, 2023): add more benchmarks
- 3ece29d (Jul 2, 2023): add more benchmarks
- 4236bf3 (Jul 3, 2023): Merge branch 'master' into f16_batch_conversion
- a572e94 (Jul 3, 2023): inline-hint closures
- f559647 (Jul 3, 2023): undo use inline attribute (experimental, not supported yet)
- 73a62b9 (Jul 3, 2023): use inline syntax for `run` commands
- f0052ad (Jul 3, 2023): Merge branch 'master' into f16_batch_conversion
- 1d5fd6b (Jul 6, 2023): attempt ci without cache
- 980369b (Jul 6, 2023): attempt fix ci
- e9bff52 (Jul 6, 2023): refactor
- 79940c5 (Jul 6, 2023): bump rust version to take advantage of f16 intrinsics
- 3ce3c05 (Jul 7, 2023): Merge branch 'master' into f16_batch_conversion
- c3cb590 (Jul 7, 2023): retain backwards compatibility for this pr, only break the release af…
- 85a311d (Jul 8, 2023): Merge branch 'master' into f16_batch_conversion
src/block/samples.rs: 72 changes (60 additions, 12 deletions)
@@ -1,6 +1,7 @@
 //! Extract pixel samples from a block of pixel bytes.

 use crate::prelude::*;
+use half::prelude::HalfFloatSliceExt;


 /// A single red, green, blue, or alpha value.
@@ -112,6 +113,7 @@ impl From<Sample> for u32 { #[inline] fn from(s: Sample) -> Self { s.to_u32() }

 /// Create an arbitrary sample type from one of the defined sample types.
 /// Should be compiled to a no-op where the file contains the predicted sample type.
+/// The slice functions should be optimized into a `memcpy` where there is no conversion needed.
 pub trait FromNativeSample: Sized + Copy + Default + 'static {

     /// Create this sample from a f16, trying to represent the same numerical value
@@ -122,31 +124,77 @@ pub trait FromNativeSample: Sized + Copy + Default + 'static {

     /// Create this sample from a u32, trying to represent the same numerical value
     fn from_u32(value: u32) -> Self;
+
+    /// Convert all values from the slice into this type.
+    /// This function exists to allow the compiler to perform a vectorization optimization.
+    #[inline]
+    fn from_f16s(from: &[f16], to: &mut [Self]) {
+        assert_eq!(from.len(), to.len(), "slices must have the same length");
+        for (from, to) in from.iter().zip(to.iter_mut()) {
+            *to = Self::from_f16(*from);
+        }
+    }
+
+    /// Convert all values from the slice into this type.
+    /// This function exists to allow the compiler to perform a vectorization optimization.
+    #[inline]
+    fn from_f32s(from: &[f32], to: &mut [Self]) {
+        assert_eq!(from.len(), to.len(), "slices must have the same length");
+        for (from, to) in from.iter().zip(to.iter_mut()) {
+            *to = Self::from_f32(*from);
+        }
+    }
+
+    /// Convert all values from the slice into this type.
+    /// This function exists to allow the compiler to perform a vectorization optimization.
+    #[inline]
+    fn from_u32s(from: &[u32], to: &mut [Self]) {
+        assert_eq!(from.len(), to.len(), "slices must have the same length");
+        for (from, to) in from.iter().zip(to.iter_mut()) {
+            *to = Self::from_u32(*from);
+        }
+    }
 }

 // TODO haven't i implemented this exact behaviour already somewhere else in this library...??
 impl FromNativeSample for f32 {
-    fn from_f16(value: f16) -> Self { value.to_f32() }
-    fn from_f32(value: f32) -> Self { value } // this branch means that we never have to match every single sample if the file format matches the expected output
-    fn from_u32(value: u32) -> Self { value as f32 }
+    #[inline] fn from_f16(value: f16) -> Self { value.to_f32() }
+    #[inline] fn from_f32(value: f32) -> Self { value }
+    #[inline] fn from_u32(value: u32) -> Self { value as f32 }
+
+    // f16 is a custom type
+    // so the compiler can not automatically vectorize the conversion
+    // that's why we need to specialize this function
+    #[inline]
+    fn from_f16s(from: &[f16], to: &mut [Self]) {
+        from.convert_to_f32_slice(to);
+    }
 }

 impl FromNativeSample for u32 {
-    fn from_f16(value: f16) -> Self { value.to_f32() as u32 }
-    fn from_f32(value: f32) -> Self { value as u32 }
-    fn from_u32(value: u32) -> Self { value }
+    #[inline] fn from_f16(value: f16) -> Self { value.to_f32() as u32 }
+    #[inline] fn from_f32(value: f32) -> Self { value as u32 }
+    #[inline] fn from_u32(value: u32) -> Self { value }
 }

 impl FromNativeSample for f16 {
-    fn from_f16(value: f16) -> Self { value }
-    fn from_f32(value: f32) -> Self { f16::from_f32(value) }
-    fn from_u32(value: u32) -> Self { f16::from_f32(value as f32) }
+    #[inline] fn from_f16(value: f16) -> Self { value }
+    #[inline] fn from_f32(value: f32) -> Self { f16::from_f32(value) }
+    #[inline] fn from_u32(value: u32) -> Self { f16::from_f32(value as f32) }
+
+    // f16 is a custom type
+    // so the compiler can not automatically vectorize the conversion
+    // that's why we need to specialize this function
+    #[inline]
+    fn from_f32s(from: &[f32], to: &mut [Self]) {
+        to.convert_from_f32_slice(from)
+    }
 }

 impl FromNativeSample for Sample {
-    fn from_f16(value: f16) -> Self { Self::from(value) }
-    fn from_f32(value: f32) -> Self { Self::from(value) }
-    fn from_u32(value: u32) -> Self { Self::from(value) }
+    #[inline] fn from_f16(value: f16) -> Self { Self::from(value) }
+    #[inline] fn from_f32(value: f32) -> Self { Self::from(value) }
+    #[inline] fn from_u32(value: u32) -> Self { Self::from(value) }
 }


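The default slice methods in `FromNativeSample` follow a simple pattern: the trait supplies a per-element loop, and an implementation overrides it where a bulk operation exists. A minimal std-only sketch of that pattern, assuming nothing from `half` or from this crate: the trait name `NativeSample` is hypothetical, only the `f32` and `u32` cases are shown, and `copy_from_slice` stands in for the `memcpy` that the trait's doc comment expects when no conversion is needed.

```rust
/// Hypothetical, reduced version of the `FromNativeSample` pattern (std only).
trait NativeSample: Sized + Copy + Default + 'static {
    fn from_f32(value: f32) -> Self;
    fn from_u32(value: u32) -> Self;

    /// Default batched conversion: a plain loop that the compiler may auto-vectorize.
    #[inline]
    fn from_f32s(from: &[f32], to: &mut [Self]) {
        assert_eq!(from.len(), to.len(), "slices must have the same length");
        for (from, to) in from.iter().zip(to.iter_mut()) {
            *to = Self::from_f32(*from);
        }
    }

    /// Default batched conversion for `u32` input.
    #[inline]
    fn from_u32s(from: &[u32], to: &mut [Self]) {
        assert_eq!(from.len(), to.len(), "slices must have the same length");
        for (from, to) in from.iter().zip(to.iter_mut()) {
            *to = Self::from_u32(*from);
        }
    }
}

impl NativeSample for f32 {
    #[inline] fn from_f32(value: f32) -> Self { value }
    #[inline] fn from_u32(value: u32) -> Self { value as f32 }

    // no conversion is needed, so the batched version is a plain copy
    // (`copy_from_slice` panics if the lengths differ)
    #[inline]
    fn from_f32s(from: &[f32], to: &mut [Self]) {
        to.copy_from_slice(from);
    }
}

impl NativeSample for u32 {
    // `as u32` truncates toward zero and saturates (negative values become 0)
    #[inline] fn from_f32(value: f32) -> Self { value as u32 }
    #[inline] fn from_u32(value: u32) -> Self { value }
}
```

For example, `u32::from_f32s(&[0.5, 1.0, 2.5, -3.0], &mut out)` uses the default loop and leaves `[0, 1, 2, 0]` in `out`, while `f32::from_f32s` takes the overridden copy path.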
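The comments in the diff give the reason for specializing `from_f16s` and `from_f32s`: `f16` is a library type, so converting it is a sequence of bit operations rather than a primitive cast. To make that concrete, here is a std-only sketch that decodes one IEEE 754 binary16 bit pattern into an `f32`, plus a slice version in the same shape as the trait's default methods. Both function names are hypothetical. This is not the `half` crate's implementation, which may use hardware conversion instructions where they are available (the commit list mentions bumping the Rust version to take advantage of f16 intrinsics).

```rust
/// Hypothetical helper: decode an IEEE 754 binary16 bit pattern into an `f32`.
/// Every f16 value is exactly representable as an f32, so no rounding occurs.
fn f16_bits_to_f32(half_bits: u16) -> f32 {
    let sign = ((half_bits >> 15) as u32) << 31;
    let exponent = ((half_bits >> 10) & 0x1f) as u32; // 5 exponent bits, bias 15
    let mantissa = (half_bits & 0x03ff) as u32; // 10 mantissa bits

    let bits = match exponent {
        // zero or subnormal: the value is mantissa * 2^-24
        0 => {
            let magnitude = mantissa as f32 / 16_777_216.0; // divide by 2^24 (exact)
            return if sign != 0 { -magnitude } else { magnitude };
        }
        // infinity (mantissa == 0) or NaN (mantissa != 0)
        0x1f => sign | 0x7f80_0000 | (mantissa << 13),
        // normal: re-bias the exponent from 15 to 127 and widen the mantissa to 23 bits
        _ => sign | ((exponent + 112) << 23) | (mantissa << 13),
    };
    f32::from_bits(bits)
}

/// Hypothetical batched form: one scalar decode per element.
fn f16_bits_to_f32_slice(from: &[u16], to: &mut [f32]) {
    assert_eq!(from.len(), to.len(), "slices must have the same length");
    for (from, to) in from.iter().zip(to.iter_mut()) {
        *to = f16_bits_to_f32(*from);
    }
}
```

The branches on the exponent are what make a naive per-element loop harder to turn into a single vector instruction than a cast between primitive types.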
src/image/read/specific_channels.rs: 86 changes (71 additions, 15 deletions)
@@ -12,6 +12,7 @@ use crate::image::read::layers::{ChannelsReader, ReadChannels};
 use crate::block::chunk::TileCoordinates;

 use std::marker::PhantomData;
+use crate::io::Read;


 /// Can be attached one more channel reader.
@@ -279,30 +280,85 @@ pub struct OptionalSampleReader<DefaultSample> {
 impl<Sample: FromNativeSample> SampleReader<Sample> {
     fn read_own_samples<'s, FullPixel>(
         &self, bytes: &'s[u8], pixels: &mut [FullPixel],
-        get_pixel: impl Fn(&mut FullPixel) -> &mut Sample
+        get_sample: impl Fn(&mut FullPixel) -> &mut Sample
     ){
         let start_index = pixels.len() * self.channel_byte_offset;
         let byte_count = pixels.len() * self.channel.sample_type.bytes_per_sample();
-        let mut own_bytes_reader = &bytes[start_index .. start_index + byte_count]; // TODO check block size somewhere
-
-        let error_msg = "error when reading from in-memory slice";
+        let mut own_bytes_reader = &mut &bytes[start_index .. start_index + byte_count]; // TODO check block size somewhere
+        let output = pixels.iter_mut().map(|pixel| get_sample(pixel));

         // match outside the loop to avoid matching on every single sample
         match self.channel.sample_type {
-            SampleType::F16 => for pixel in pixels.iter_mut() {
-                *get_pixel(pixel) = Sample::from_f16(f16::read(&mut own_bytes_reader).expect(error_msg));
-            },
-
-            SampleType::F32 => for pixel in pixels.iter_mut() {
-                *get_pixel(pixel) = Sample::from_f32(f32::read(&mut own_bytes_reader).expect(error_msg));
-            },
-
-            SampleType::U32 => for pixel in pixels.iter_mut() {
-                *get_pixel(pixel) = Sample::from_u32(u32::read(&mut own_bytes_reader).expect(error_msg));
-            },
+            SampleType::F16 => read_and_convert_samples_batched(
+                &mut own_bytes_reader, output,
+                Sample::from_f16s
+            ),
+
+            SampleType::F32 => read_and_convert_samples_batched(
+                &mut own_bytes_reader, output,
+                Sample::from_f32s
+            ),
+
+            SampleType::U32 => read_and_convert_samples_batched(
+                &mut own_bytes_reader, output,
+                Sample::from_u32s
+            ),
         }

         debug_assert!(own_bytes_reader.is_empty(), "bytes left after reading all samples");
+
+
+        /// performs something similar to
+        /// `for sample in out_samples { *sample = Sample::convert_from(f16/f32/u32::read_from_bytes(bytes)); }`
+        fn read_and_convert_samples_batched<'t, From, To>(
+            mut bytes: impl Read,
+            mut out_samples: impl ExactSizeIterator<Item=&'t mut To>,
+            convert_slice: impl Fn(&[From], &mut [To])
+        ) where From: Data + Default + Copy, To: 't + Default + Copy
+        {
+            // using a batch size of 4
+            // because that's what `half` has vectorization for,
+            // and we want the compiler to
+            // optimize away all the logic in
+            // `HalfFloatSliceExt::convert_from_f32_slice`
+
+            // this is not a global! why is this warning triggered?
+            #[allow(non_upper_case_globals)]
+            const batch_size: usize = 4;
+
+            let mut source_samples_batch: [From; batch_size] = Default::default();
+            let mut desired_samples_batch: [To; batch_size] = Default::default();
+
+            let total_sample_count = out_samples.len();
+            let batch_count = total_sample_count / batch_size;
+            let remaining_samples_count = total_sample_count % batch_size;
+
+            let error_msg = "error when reading from in-memory slice";
+
+            for _ in 0 .. batch_count {
+                Data::read_slice(&mut bytes, &mut source_samples_batch).expect(error_msg);
+                convert_slice(source_samples_batch.as_slice(), desired_samples_batch.as_mut_slice());
+
+                for converted_sample in desired_samples_batch {
+                    *out_samples.next().expect("less elements than calculated") = converted_sample;
+                }
+            }
+
+            if remaining_samples_count != 0 {
+                let source_samples_batch = &mut source_samples_batch[..remaining_samples_count];
+                let desired_samples_batch = &mut desired_samples_batch[..remaining_samples_count];
+
+                // TODO dedup with above
+                Data::read_slice(&mut bytes, source_samples_batch).expect(error_msg);
+                convert_slice(source_samples_batch, desired_samples_batch);
+
+                for converted_sample in desired_samples_batch {
+                    *out_samples.next().expect("less elements than calculated") = *converted_sample;
+                }
+            }
+
+            debug_assert!(out_samples.next().is_none(), "not all samples have been written");
+        }
     }
 }

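`read_own_samples` depends on two properties of `std::io::Read` (imported in the diff as `crate::io::Read`, which this sketch assumes is a re-export of the std trait): `&[u8]` implements `Read` by advancing the slice past the bytes it returns, and `&mut R` implements `Read` whenever `R` does. Together they let the nested function consume bytes from `own_bytes_reader` in place, so `own_bytes_reader.is_empty()` afterwards confirms that the channel's bytes were fully consumed. A std-only sketch, assuming little-endian `f32` samples; `read_f32_le` and `read_samples` are hypothetical helpers, not this crate's `Data` trait.

```rust
use std::io::Read;

/// Hypothetical helper: read one little-endian `f32` from any reader.
fn read_f32_le(mut reader: impl Read) -> std::io::Result<f32> {
    let mut buffer = [0_u8; 4];
    reader.read_exact(&mut buffer)?;
    Ok(f32::from_le_bytes(buffer))
}

/// Hypothetical helper: read `count` samples from the front of `bytes`,
/// returning the samples and the bytes that were not consumed.
fn read_samples(bytes: &[u8], count: usize) -> (Vec<f32>, &[u8]) {
    // `&[u8]` implements `Read`: every read moves `remaining` past the returned bytes
    let mut remaining: &[u8] = bytes;
    let mut samples = Vec::with_capacity(count);
    for _ in 0..count {
        // `&mut remaining` is a reader too (`impl<R: Read + ?Sized> Read for &mut R`),
        // so the helper advances `remaining` itself rather than a copy of it
        samples.push(read_f32_le(&mut remaining).expect("not enough bytes"));
    }
    (samples, remaining)
}
```

Passing `remaining` by value instead of `&mut remaining` would hand the helper a copy of the slice reference, and the caller's slice would never shrink.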
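The batching logic of `read_and_convert_samples_batched` can be tried out without this crate. The sketch below is a hypothetical std-only variant: it decodes little-endian `u32` samples directly instead of calling `Data::read_slice`, and it lets `slice::chunks` produce the full batches followed by the shorter remainder, where the diff computes `batch_count` and `remaining_samples_count` explicitly. Like the original, it converts each batch through an intermediate array and then writes the results through an iterator of `&mut` references, so the outputs may be scattered across pixel structs.

```rust
/// Batch size of this sketch (the diff above also uses 4).
const BATCH_SIZE: usize = 4;

/// Hypothetical std-only variant of `read_and_convert_samples_batched`:
/// decodes little-endian `u32` samples from `bytes` in batches of `BATCH_SIZE`,
/// converts each batch with `convert_slice`, and writes the results to `out_samples`.
fn read_and_convert_u32_batched<'t, To: 't + Copy + Default>(
    bytes: &[u8],
    mut out_samples: impl ExactSizeIterator<Item = &'t mut To>,
    convert_slice: impl Fn(&[u32], &mut [To]),
) {
    assert_eq!(bytes.len(), out_samples.len() * 4, "expected four bytes per sample");

    let mut source_batch = [0_u32; BATCH_SIZE];
    let mut converted_batch = [To::default(); BATCH_SIZE];

    // `chunks` yields every full batch first, then one shorter remainder batch (if any)
    for byte_batch in bytes.chunks(BATCH_SIZE * 4) {
        let batch_len = byte_batch.len() / 4;

        // decode this batch into the fixed-size source buffer
        for (sample, b) in source_batch.iter_mut().zip(byte_batch.chunks_exact(4)) {
            *sample = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
        }

        // convert the whole batch with a single slice call
        convert_slice(&source_batch[..batch_len], &mut converted_batch[..batch_len]);

        // outputs may be scattered (e.g. one field per pixel), so write them one at a time
        for converted in &converted_batch[..batch_len] {
            *out_samples.next().expect("fewer output samples than expected") = *converted;
        }
    }

    debug_assert!(out_samples.next().is_none(), "not all samples have been written");
}
```

With six samples and a batch size of 4, `convert_slice` is called once with four samples and once with the remaining two. Passing `pixels.iter_mut().map(|p| &mut p.0)` as the output writes only the first field of each pixel tuple, which mirrors how `get_sample` selects one channel per pixel in the diff.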
src/math.rs: 8 changes (7 additions, 1 deletion)
@@ -194,9 +194,15 @@ impl RoundingMode {
         }
     }

+    /// Only works for positive numbers.
     pub(crate) fn divide<T>(self, dividend: T, divisor: T) -> T
-        where T: Copy + Add<Output = T> + Sub<Output = T> + Div<Output = T> + From<u8>
+        where T: Copy + Add<Output = T> + Sub<Output = T> + Div<Output = T> + From<u8> + std::cmp::PartialOrd
     {
+        assert!(
+            dividend >= T::from(0) && divisor >= T::from(1),
+            "division with rounding up only works for positive numbers"
+        );
+
         match self {
             RoundingMode::Up => (dividend + divisor - T::from(1_u8)) / divisor, // only works for positive numbers
             RoundingMode::Down => dividend / divisor,
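The `RoundingMode::Up` arm relies on the identity ⌈a / b⌉ = ⌊(a + b - 1) / b⌋, which holds for a ≥ 0 and b ≥ 1; the new assertion enforces exactly that range. A std-only check of the identity against a floating-point reference, using hypothetical `u32` stand-ins (`divide_round_up`, `divide_round_down`) for the generic method:

```rust
/// Hypothetical stand-in for the `RoundingMode::Up` arm:
/// ceil(dividend / divisor) for a non-negative dividend and a positive divisor.
fn divide_round_up(dividend: u32, divisor: u32) -> u32 {
    assert!(divisor >= 1, "divisor must be positive");
    // note: `dividend + divisor - 1` overflows if `dividend` is close to `u32::MAX`
    (dividend + divisor - 1) / divisor
}

/// Hypothetical stand-in for the `RoundingMode::Down` arm:
/// integer division already rounds toward zero, which is down for non-negative values.
fn divide_round_down(dividend: u32, divisor: u32) -> u32 {
    assert!(divisor >= 1, "divisor must be positive");
    dividend / divisor
}
```

With signed integers the identity breaks for negative dividends, because Rust's integer division truncates toward zero: `(-4 + 2 - 1) / 2` evaluates to `-1`, while ⌈-4 / 2⌉ = -2. Rust 1.73 and later also provide `div_ceil` on unsigned integers as an alternative to the manual formula.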