Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change ArrayBase.ptr to NonNull type #683

Merged
merged 8 commits into from
Sep 9, 2019

Conversation

berquist
Copy link
Contributor

Closes #434

Should any of these unwrap calls turn into proper error handling? It looks like most functions already state that they'll panic if given an invalid pointer.

It also looks like there's some wasteful *mut A -> NonNull<A> -> *mut A or NonNull<A> -> *mut A -> NonNull<A> conversions happening (for example, in impl_clone). What do you think about eliminating these?

src/impl_raw_views.rs Outdated Show resolved Hide resolved
src/impl_raw_views.rs Outdated Show resolved Hide resolved
src/impl_clone.rs Outdated Show resolved Hide resolved
@bluss
Copy link
Member

bluss commented Aug 23, 2019

A fun very loose concern (I'm sorry, might just be interesting!)

Rust has in the past had problems with loop optimization of iterators that yield Option<&T> because of the non-nullable pointer optimization that is used for references in Option. Long story short, the reference compiles to just a pointer, there are some attributes that tell the compiler the pointer will be non-null, and simultaneously the code will use the null value to encode Option::None.

Then the non-nullable annotation can get lost or confused in some optimization passes, and the result is that it's not always certain if null means the end of the loop or just the next pointer result of the iteration.

When using NonNull inside ArrayView like this, it's possible that the same misoptimization can show up in Iterators yielding array views because Option<ArrayView<..>> could have the same problem.

Wait - but don't the most important iterators in Rust use Option<&T>, like the slice, so it must be good? They are in fact tuned with assume etc: https://doc.rust-lang.org/1.37.0/src/core/slice/mod.rs.html#525 https://doc.rust-lang.org/1.37.0/src/core/slice/mod.rs.html#3102 which we don't have access to in stable Rust.

We can just look out for it and check benchmarks

@berquist
Copy link
Contributor Author

Alright, I think I got it.

Some benchmarks:

It looks like there might be a regression in taking dot products.

@bluss
Copy link
Member

bluss commented Sep 4, 2019

Thanks a lot, I'd love to take a closer look at this. This is a good tool for benchmark comparisons: https://github.com/BurntSushi/cargo-benchcmp

From master to the latest after unchecked:

 name                               63 ns/iter  62 ns/iter  diff ns/iter   diff % 
 add_1d_regular                     369         373                    4    1.08% 
 add_1d_strided                     1,449       1,481                 32    2.21% 
 add_2d_0_to_2_iadd_scalar          247         252                    5    2.02% 
 add_2d_alloc                       699         711                   12    1.72% 
 add_2d_assign_ops                  374         382                    8    2.14% 
 add_2d_both_transposed             1,343       1,321                -22   -1.64% 
 add_2d_broadcast_0_to_2            248         251                    3    1.21% 
 add_2d_broadcast_1_to_2            902         480                 -422  -46.78% 
 add_2d_cutout                      924         547                 -377  -40.80% 
 add_2d_cutouts_by_16               1,609       1,158               -451  -28.03% 
 add_2d_cutouts_by_32               1,169       753                 -416  -35.59% 
 add_2d_cutouts_by_4                6,105       5,303               -802  -13.14% 
 add_2d_f32_regular                 376         380                    4    1.06% 
 add_2d_one_transposed              1,200       1,266                 66    5.50% 
 add_2d_regular                     376         378                    2    0.53% 
 add_2d_regular_dyn                 428         431                    3    0.70% 
 add_2d_strided                     1,274       1,370                 96    7.54% 
 add_2d_strided_dyn                 1,278       1,378                100    7.82% 
 add_2d_zip                         872         392                 -480  -55.05% 
 add_2d_zip_alloc                   491         496                    5    1.02% 
 add_2d_zip_both_transposed         875         394                 -481  -54.97% 
 add_2d_zip_cutout                  967         628                 -339  -35.06% 
 add_2d_zip_one_transposed          1,188       1,271                 83    6.99% 
 add_2d_zip_strided                 1,272       1,362                 90    7.08% 
 add_3d_strided                     2,520       2,361               -159   -6.31% 
 add_3d_strided_dyn                 2,525       2,346               -179   -7.09% 
 assign_scalar_2d_corder            483         492                    9    1.86% 
 assign_scalar_2d_cutout            995         1,007                 12    1.21% 
 assign_scalar_2d_forder            488         494                    6    1.23% 
 assign_zero_2d_corder              483         499                   16    3.31% 
 assign_zero_2d_cutout              995         1,017                 22    2.21% 
 assign_zero_2d_forder              488         499                   11    2.25% 
 bench_col_iter                     367         369                    2    0.54% 
 bench_iter_diag                    486         255                 -231  -47.53% 
 bench_row_iter                     252         252                    0    0.00% 
 bench_to_owned_n                   90          93                     3    3.33% 
 bench_to_owned_strided             349         363                   14    4.01% 
 bench_to_owned_t                   90          93                     3    3.33% 
 chunk2x2_iter_sum                  307,793     308,089              296    0.10% 
 chunk2x2_sum                       132,500     129,286           -3,214   -2.43% 
 chunk2x2_sum_get1                  139,509     138,343           -1,166   -0.84% 
 chunk2x2_sum_get2                  143,527     142,770             -757   -0.53% 
 chunk2x2_sum_uget1                 104,488     104,178             -310   -0.30% 
 clip                               400         388                  -12   -3.00% 
 create_iter_4d                     22          22                     0    0.00% 
 default_f64                        1,999       2,014                 15    0.75% 
 dot_extended                       1,892       1,776               -116   -6.13% 
 dot_f32_1024                       88          92                     4    4.55% 
 dot_f32_10e6                       90,033      91,441             1,408    1.56% 
 dot_f32_16                         4           4                      0    0.00% 
 dot_f32_20                         5           5                      0    0.00% 
 dot_f32_256                        22          22                     0    0.00% 
 dot_f32_32                         4           5                      1   25.00% 
 equality_f32                       499         498                   -1   -0.20% 
 equality_f32_mixorder              2,737       2,870                133    4.86% 
 equality_i32                       496         505                    9    1.81% 
 fold_axis                          554         538                  -16   -2.89% 
 fold_sum_i32_2d_cutout             343         300                  -43  -12.54% 
 fold_sum_i32_2d_cutout_transpose   343         301                  -42  -12.24% 
 fold_sum_i32_2d_regular            138         139                    1    0.72% 
 fold_sum_i32_2d_stride             978         997                   19    1.94% 
 fold_sum_i32_2d_transpose          139         143                    4    2.88% 
 gemv_64_32                         585         590                    5    0.85% 
 gemv_64_64c                        864         852                  -12   -1.39% 
 gemv_64_64f                        2,176       2,193                 17    0.78% 
 iadd_scalar_2d_regular             248         252                    4    1.61% 
 iadd_scalar_2d_regular_dyn         281         279                   -2   -0.71% 
 iadd_scalar_2d_strided             1,066       1,075                  9    0.84% 
 iadd_scalar_2d_strided_dyn         1,050       1,077                 27    2.57% 
 indexed_iter_1d_ix1                1,424       965                 -459  -32.23% 
 indexed_iter_2d_ix2                3,797       2,605             -1,192  -31.39% 
 indexed_iter_3d_dyn                111,694     111,813              119    0.11% 
 indexed_iter_3d_ix3                5,892       3,025             -2,867  -48.66% 
 indexed_zip_1d_ix1                 973         969                   -4   -0.41% 
 indexed_zip_2d_ix2                 2,653       1,970               -683  -25.74% 
 indexed_zip_3d_ix3                 2,922       2,913                 -9   -0.31% 
 iter_all_2d_cutout                 8,461       8,499                 38    0.45% 
 iter_axis_chunks_1_iter_sum        1,452       1,468                 16    1.10% 
 iter_axis_chunks_5_iter_sum        604         614                   10    1.66% 
 iter_axis_iter_sum                 534         509                  -25   -4.68% 
 iter_filter_sum_2d_f32             310         314                    4    1.29% 
 iter_filter_sum_2d_stride_f32      167         167                    0    0.00% 
 iter_filter_sum_2d_stride_u32      108         108                    0    0.00% 
 iter_filter_sum_2d_u32             38          37                    -1   -2.63% 
 iter_sum_1d_raw                    137         139                    2    1.46% 
 iter_sum_1d_regular                137         139                    2    1.46% 
 iter_sum_1d_strided_fold           1,219       1,205                -14   -1.15% 
 iter_sum_1d_strided_rfold          1,212       1,199                -13   -1.07% 
 iter_sum_2d_by_row                 277         259                  -18   -6.50% 
 iter_sum_2d_cutout                 396         400                    4    1.01% 
 iter_sum_2d_cutout_by_row          358         355                   -3   -0.84% 
 iter_sum_2d_cutout_outer_iter      278         260                  -18   -6.47% 
 iter_sum_2d_raw                    138         138                    0    0.00% 
 iter_sum_2d_regular                143         143                    0    0.00% 
 iter_sum_2d_transpose              1,119       1,117                 -2   -0.18% 
 iter_sum_2d_transpose_by_row       1,979       1,858               -121   -6.11% 
 iter_sum_2d_transpose_regular      3,741       4,392                651   17.40% 
 map_axis_0                         465         465                    0    0.00% 
 map_axis_1                         416         489                   73   17.55% 
 map_regular                        30          30                     0    0.00% 
 map_stride                         85          87                     2    2.35% 
 map_stride_double_f64              334         304                  -30   -8.98% 
 map_stride_f64                     506         510                    4    0.79% 
 map_stride_u32                     524         535                   11    2.10% 
 mat_mul_f32::m004                  203         223                   20    9.85% 
 mat_mul_f32::m007                  258         257                   -1   -0.39% 
 mat_mul_f32::m008                  214         198                  -16   -7.48% 
 mat_mul_f32::m012                  575         577                    2    0.35% 
 mat_mul_f32::m016                  584         569                  -15   -2.57% 
 mat_mul_f32::m032                  3,105       3,137                 32    1.03% 
 mat_mul_f32::m064                  21,492      21,665               173    0.80% 
 mat_mul_f32::m127                  162,131     163,678            1,547    0.95% 
 mat_mul_f32::mix10000              12,692,607  12,809,887       117,280    0.92% 
 mat_mul_f32::mix16x4               691         709                   18    2.60% 
 mat_mul_f32::mix32x2               508         506                   -2   -0.39% 
 mat_mul_f64::m004                  195         218                   23   11.79% 
 mat_mul_f64::m007                  282         342                   60   21.28% 
 mat_mul_f64::m008                  237         274                   37   15.61% 
 mat_mul_f64::m012                  591         646                   55    9.31% 
 mat_mul_f64::m016                  773         827                   54    6.99% 
 mat_mul_f64::m032                  4,432       4,435                  3    0.07% 
 mat_mul_f64::m064                  31,081      31,231               150    0.48% 
 mat_mul_f64::m127                  235,004     235,987              983    0.42% 
 mat_mul_f64::mix10000              18,528,812  18,515,705       -13,107   -0.07% 
 mat_mul_f64::mix16x4               917         976                   59    6.43% 
 mat_mul_f64::mix32x2               719         762                   43    5.98% 
 mat_mul_i32::m004                  130         128                   -2   -1.54% 
 mat_mul_i32::m007                  402         403                    1    0.25% 
 mat_mul_i32::m008                  553         562                    9    1.63% 
 mat_mul_i32::m012                  1,299       1,334                 35    2.69% 
 mat_mul_i32::m016                  2,600       2,579                -21   -0.81% 
 mat_mul_i32::m032                  14,575      14,584                 9    0.06% 
 mat_mul_i32::m064                  92,357      97,458             5,101    5.52% 
 mat_mul_i32::m127                  780,099     785,692            5,593    0.72% 
 mean_axis0                         2,755       2,717                -38   -1.38% 
 mean_axis1                         2,283       2,212                -71   -3.11% 
 scalar_add_1                       547         577                   30    5.48% 
 scalar_add_2                       550         577                   27    4.91% 
 scalar_sub_1                       549         576                   27    4.92% 
 scalar_sub_2                       573         603                   30    5.24% 
 scalar_toowned                     328         360                   32    9.76% 
 scaled_add_2d_f32_regular          379         379                    0    0.00% 
 sum_2d_cutout                      372         373                    1    0.27% 
 sum_2d_float                       362         362                    0    0.00% 
 sum_2d_float_cutout                419         420                    1    0.24% 
 sum_2d_float_t_cutout              1,499       1,496                 -3   -0.20% 
 sum_2d_regular                     212         210                   -2   -0.94% 
 sum_3_azip                         4,738       4,733                 -5   -0.11% 
 sum_3_azip_fold                    1,012       1,006                 -6   -0.59% 
 sum_3_std_zip1                     1,007       1,000                 -7   -0.70% 
 sum_3_std_zip2                     1,008       991                  -17   -1.69% 
 sum_3_std_zip3                     1,013       991                  -22   -2.17% 
 sum_axis0                          2,746       2,713                -33   -1.20% 
 sum_axis1                          2,318       2,196               -122   -5.26% 
 vector_sum_3_azip                  2,438       2,396                -42   -1.72% 
 vector_sum_3_std_zip               2,406       2,408                  2    0.08% 
 vector_sum_3_zip_unchecked         2,418       2,394                -24   -0.99% 
 vector_sum_3_zip_unchecked_manual  2,440       2,405                -35   -1.43% 
 zeros_f64                          2,020       2,029                  9    0.45% 

src/impl_constructors.rs Outdated Show resolved Hide resolved
src/impl_methods.rs Outdated Show resolved Hide resolved
src/impl_views.rs Outdated Show resolved Hide resolved
src/impl_views.rs Outdated Show resolved Hide resolved
@bluss
Copy link
Member

bluss commented Sep 8, 2019

I've used the breaking-change label so that we remember to put a note in the relnotes, that raw pointers are now checked to be non-null and can never be null, even in empty array views (This has been true before too, but it's true now too 🙂).

rawpointer 0.2 implements offset methods for NonNull<T>, which will be
useful for us.

This was already a dev-dependency, and also already a regular transitive
dependency (from matrixmultiply).
Add two constructors that suffice for ndarray:

- Get NonNull<T> from Vec<T>'s buffer pointer
- Get NonNull<T> from raw pointer, and check if it's nonnull with
a debug assertion

In all other cases we can use .offset() on the NonNull (method from
rawpointer 0.2, so we don't have to explicit roundtrip through raw
pointers when we offset NonNull<T>.

For safety, see this documentation in the rawpointer crate, which
says why using .offset() on NonNull is safe.

NonNull<T> supports the same offsetting methods under the same safety
constraints as the other raw pointer implementations.

There is no difference - both when offsetting *mut T and NonNull<T>, the
offset is only well defined if we remain inside the same object or
one-past the end, and we can never land in a null pointer while obeying
those rules.
@bluss
Copy link
Member

bluss commented Sep 8, 2019

I've pushed two commits to the PR

One adds dep on rawpointer 0.2 which has added an .offset() method for NonNull, which makes our code a lot simpler. (See commit log for more info).
I thought adding this method there made sense, and it was already a transitive dep through matrixmultiply — although that's the 0.1 version, but matrixmultiply will update too.

We could also copy the offset method into this crate, to reduce deps.

Using .offset() simplifies a lot, and I factored out the two other way nonnull pointers were created.

I hope this is ok. You've enabled maintainer pushes to the PR, and that's the reason I could update your branch.

This was referenced Sep 8, 2019
Bring in the latest changes and resolve conflicts (impl_views split)
@bluss
Copy link
Member

bluss commented Sep 8, 2019

I'm sorry for the mess. I've merged master and resolved conflicts, since another PR made this one conflict. (We could also rebase - please rebase and squash as needed if you want — I thought it was awkward to do that in a two-author branch).

@bluss
Copy link
Member

bluss commented Sep 8, 2019

I think this is ready for merge, but it could be rebase/squashed if you want @berquist

@berquist
Copy link
Contributor Author

berquist commented Sep 9, 2019

Whatever you would prefer.

@jturner314 jturner314 merged commit b98c844 into rust-ndarray:master Sep 9, 2019
@jturner314
Copy link
Member

Great work on this! There are some places where it would be nice to replace more raw pointers with NonNull (e.g. in the iterators), but this PR is a definite improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change ArrayBase.ptr to NonNull type
3 participants