diff --git a/doc/specs/stdlib_selection.md b/doc/specs/stdlib_selection.md new file mode 100644 index 000000000..b6694047c --- /dev/null +++ b/doc/specs/stdlib_selection.md @@ -0,0 +1,350 @@ +--- +title: Selection Procedures +--- + +# The `stdlib_selection` module + +[TOC] + +## Overview of selection + +Suppose you wish to find the value of the k-th smallest entry in an array of size N, or +the index of that value. While it could be done by sorting the whole array +using `[[stdlib_sorting(module):sort(interface)]]` or +`[[stdlib_sorting(module):sort_index(interface)]]` from +`[[stdlib_sorting(module)]]` and then finding the k-th entry, that would +require O(N x LOG(N)) time. However selection of a single entry can be done in +O(N) time, which is much faster for large arrays. This is useful, for example, +to quickly find the median of an array, or some other percentile. + +The Fortran Standard Library therefore provides a module, `stdlib_selection`, +which implements selection algorithms. + +## Overview of the module + +The module `stdlib_selection` defines two generic subroutines: +* `select` is used to find the k-th smallest entry of an array. The input +array is also modified in-place, and on return will be partially sorted +such that `all(array(1:k) <= array(k)))` and `all(array(k) <= array((k+1):size(array)))` is true. +The user can optionally specify `left` and `right` indices to constrain the search +for the k-th smallest value. This can be useful if you have previously called `select` +to find a smaller or larger rank (that will have led to partial sorting of +`array`, thus implying some constraints on the location). + +* `arg_select` is used to find the index of the k-th smallest entry of an array. +In this case the input array is not modified, but the user must provide an +input index array with the same size as `array`, having indices that are a permutation of +`1:size(array)`, which is modified instead. On return the index array is modified +such that `all(array(index(1:k)) <= array(index(k)))` and `all(array(k) <= array(k+1:size(array)))`. +The user can optionally specify `left` and `right` indices to constrain the search +for the k-th smallest value. This can be useful if you have previously called `arg_select` +to find a smaller or larger rank (that will have led to partial sorting of +`index`, thus implying some constraints on the location). + + +## `select` - find the k-th smallest value in an input array + +### Status + +Experimental + +### Description + +Returns the k-th smallest value of `array(:)`, and also partially sorts `array(:)` +such that `all(array(1:k) <= array(k))` and `all(array(k) <= array((k+1):size(array)))` + +### Syntax + +`call [[stdlib_selection(module):select(interface)]]( array, k, kth_smallest [, left, right ] )` + +### Class + +Generic subroutine. + +### Arguments + +`array` : shall be a rank one array of any of the types: +`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`, +`real(sp)`, `real(dp)`, `real(xdp)`, `real(qp)`. It is an `intent(inout)` argument. + +`k`: shall be a scalar with any of the types: +`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`. It +is an `intent(in)` argument. We search for the `k`-th smallest entry of `array(:)`. + +`kth_smallest`: shall be a scalar with the same type as `array`. It is an +`intent(out)` argument. On return it contains the k-th smallest entry of +`array(:)`. + +`left` (optional): shall be a scalar with the same type as `k`. It is an +`intent(in)` argument. If specified then we assume the k-th smallest value is +definitely contained in `array(left:size(array))`. If `left` is not present, +the default is 1. This is typically useful if multiple calls to `select` are +made, because the partial sorting of `array` implies constraints on where we +need to search. + +`right` (optional): shall be a scalar with the same type as `k`. It is an +`intent(in)` argument. If specified then we assume the k-th smallest value is +definitely contained in `array(1:right)`. If `right` is not present, the +default is `size(array)`. This is typically useful if multiple calls to +`select` are made, because the partial sorting of `array` implies constraints +on where we need to search. + +### Notes + +Selection of a single value should have runtime of O(`size(array)`), so it is +asymptotically faster than sorting `array` entirely. The test program at the +end of this document shows that is the case. + +The code does not support `NaN` elements in `array`; it will run, but there is +no consistent interpretation given to the order of `NaN` entries of `array` +compared to other entries. + +`select` was derived from code in the Coretran library by Leon Foks, +https://github.com/leonfoks/coretran. Leon Foks has given permission for the +code here to be released under stdlib's MIT license. + +### Example + +```fortran +program demo_select + use stdlib_selection, only: select + implicit none + + real, allocatable :: array(:) + real :: kth_smallest + integer :: k, left, right + + array = [3., 2., 7., 4., 5., 1., 4., -1.] + + k = 2 + call select(array, k, kth_smallest) + print*, kth_smallest ! print 1.0 + + k = 7 + ! Due to the previous call to select, we know for sure this is in an + ! index >= 2 + call select(array, k, kth_smallest, left=2) + print*, kth_smallest ! print 5.0 + + k = 6 + ! Due to the previous two calls to select, we know for sure this is in + ! an index >= 2 and <= 7 + call select(array, k, kth_smallest, left=2, right=7) + print*, kth_smallest ! print 4.0 + +end program demo_select +``` + +## `arg_select` - find the index of the k-th smallest value in an input array + +### Status + +Experimental + +### Description + +Returns the index of the k-th smallest value of `array(:)`, and also partially sorts +the index-array `indx(:)` such that `all(array(indx(1:k)) <= array(indx(k)))` and +`all(array(indx(k)) <= array(indx((k+1):size(array))))` + +### Syntax + +`call [[stdlib_selection(module):arg_select(interface)]]( array, indx, k, kth_smallest [, left, right ] )` + +### Class + +Generic subroutine. + +### Arguments + +`array` : shall be a rank one array of any of the types: +`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`, +`real(sp)`, `real(dp)`, `real(xdp), `real(qp)`. It is an `intent(in)` argument. On input it is +the array in which we search for the k-th smallest entry. + +`indx`: shall be a rank one array with the same size as `array`, containing all integers +from `1:size(array)` in any order. It is of any of the types: +`integer(int8)`, `integer(int16)`, `integer(int32)`, `integer(int64)`. It is an +`intent(inout)` argument. On return its elements will define a partial sorting of `array(:)` such that: + `all( array(indx(1:k-1)) <= array(indx(k)) )` and `all(array(indx(k)) <= array(indx(k+1:size(array))))`. + +`k`: shall be a scalar with the same type as `indx`. It is an `intent(in)` +argument. We search for the `k`-th smallest entry of `array(:)`. + +`kth_smallest`: a scalar with the same type as `indx`. It is an `intent(out)` argument, +and on return it contains the index of the k-th smallest entry of `array(:)`. + +`left` (optional): shall be a scalar with the same type as `k`. It is an `intent(in)` +argument. If specified then we assume the k-th smallest value is definitely contained +in `array(indx(left:size(array)))`. If `left` is not present, the default is 1. +This is typically useful if multiple calls to `arg_select` are made, because +the partial sorting of `indx` implies constraints on where we need to search. + +`right` (optional): shall be a scalar with the same type as `k`. It is an `intent(in)` +argument. If specified then we assume the k-th smallest value is definitely contained +in `array(indx(1:right))`. If `right` is not present, the default is +`size(array)`. This is typically useful if multiple calls to `arg_select` are +made, because the reordering of `indx` implies constraints on where we need to +search. + +### Notes + +`arg_select` does not modify `array`, unlike `select`. + +The partial sorting of `indx` is not stable, i.e., indices that map to equal +values of array may be reordered. + +The code does not support `NaN` elements in `array`; it will run, but there is +no consistent interpretation given to the order of `NaN` entries of `array` +compared to other entries. + +While it is essential that that `indx` contains a permutation of the integers `1:size(array)`, +the code does not check for this. For example if `size(array) == 4`, then we could have +`indx = [4, 2, 1, 3]` or `indx = [1, 2, 3, 4]`, but not `indx = [2, 1, 2, 4]`. It is the user's +responsibility to avoid such errors. + +Selection of a single value should have runtime of O(`size(array)`), so it is +asymptotically faster than sorting `array` entirely. The test program at the end of +these documents confirms that is the case. + +`arg_select` was derived using code from the Coretran library by Leon Foks, +https://github.com/leonfoks/coretran. Leon Foks has given permission for the +code here to be released under stdlib's MIT license. + +### Example + + +```fortran +program demo_arg_select + use stdlib_selection, only: arg_select + implicit none + + real, allocatable :: array(:) + integer, allocatable :: indx(:) + integer :: kth_smallest + integer :: k, left, right + + array = [3., 2., 7., 4., 5., 1., 4., -1.] + indx = [( k, k = 1, size(array) )] + + k = 2 + call arg_select(array, indx, k, kth_smallest) + print*, array(kth_smallest) ! print 1.0 + + k = 7 + ! Due to the previous call to arg_select, we know for sure this is in an + ! index >= 2 + call arg_select(array, indx, k, kth_smallest, left=2) + print*, array(kth_smallest) ! print 5.0 + + k = 6 + ! Due to the previous two calls to arg_select, we know for sure this is in + ! an index >= 2 and <= 7 + call arg_select(array, indx, k, kth_smallest, left=2, right=7) + print*, array(kth_smallest) ! print 4.0 + +end program demo_arg_select +``` + +## Comparison with using `sort` + +The following program compares the timings of `select` and `arg_select` for +computing the median of an array, vs using `sort` from `stdlib`. In theory we +should see a speed improvement with the selection routines which grows like +LOG(size(`array`)). + +```fortran +program selection_vs_sort + use stdlib_kinds, only: dp, sp, int64 + use stdlib_selection, only: select, arg_select + use stdlib_sorting, only: sort + implicit none + + call compare_select_sort_for_median(1) + call compare_select_sort_for_median(11) + call compare_select_sort_for_median(101) + call compare_select_sort_for_median(1001) + call compare_select_sort_for_median(10001) + call compare_select_sort_for_median(100001) + + contains + subroutine compare_select_sort_for_median(N) + integer, intent(in) :: N + + integer :: i, k, result_arg_select, indx(N), indx_local(N) + real :: random_vals(N), local_random_vals(N) + integer, parameter :: test_reps = 100 + integer(int64) :: t0, t1 + real :: result_sort, result_select + integer(int64) :: time_sort, time_select, time_arg_select + logical :: select_test_passed, arg_select_test_passed + + ! Ensure N is odd + if(mod(N, 2) /= 1) stop + + time_sort = 0 + time_select = 0 + time_arg_select = 0 + + select_test_passed = .true. + arg_select_test_passed = .true. + + indx = (/( i, i = 1, N) /) + + k = (N+1)/2 ! Deliberate integer division + + do i = 1, test_reps + call random_number(random_vals) + + ! Compute the median with sorting + local_random_vals = random_vals + call system_clock(t0) + call sort(local_random_vals) + result_sort = local_random_vals(k) + call system_clock(t1) + time_sort = time_sort + (t1 - t0) + + ! Compute the median with selection, assuming N is odd + local_random_vals = random_vals + call system_clock(t0) + call select(local_random_vals, k, result_select) + call system_clock(t1) + time_select = time_select + (t1 - t0) + + ! Compute the median with arg_select, assuming N is odd + local_random_vals = random_vals + indx_local = indx + call system_clock(t0) + call arg_select(local_random_vals, indx_local, k, result_arg_select) + call system_clock(t1) + time_arg_select = time_arg_select + (t1 - t0) + + if(result_select /= result_sort) select_test_passed = .FALSE. + if(local_random_vals(result_arg_select) /= result_sort) arg_select_test_passed = .FALSE. + end do + + print*, "select ; N=", N, '; ', merge('PASS', 'FAIL', select_test_passed), & + '; Relative-speedup-vs-sort:', (1.0*time_sort)/(1.0*time_select) + print*, "arg_select; N=", N, '; ', merge('PASS', 'FAIL', arg_select_test_passed), & + '; Relative-speedup-vs-sort:', (1.0*time_sort)/(1.0*time_arg_select) + + end subroutine + +end program +``` + +The results seem consistent with expectations when the `array` is large; the program prints: +``` + select ; N= 1 ; PASS; Relative-speedup-vs-sort: 1.90928173 + arg_select; N= 1 ; PASS; Relative-speedup-vs-sort: 1.76875830 + select ; N= 11 ; PASS; Relative-speedup-vs-sort: 1.14835048 + arg_select; N= 11 ; PASS; Relative-speedup-vs-sort: 1.00794709 + select ; N= 101 ; PASS; Relative-speedup-vs-sort: 2.31012774 + arg_select; N= 101 ; PASS; Relative-speedup-vs-sort: 1.92877376 + select ; N= 1001 ; PASS; Relative-speedup-vs-sort: 4.24190664 + arg_select; N= 1001 ; PASS; Relative-speedup-vs-sort: 3.54580402 + select ; N= 10001 ; PASS; Relative-speedup-vs-sort: 5.61573362 + arg_select; N= 10001 ; PASS; Relative-speedup-vs-sort: 4.79348087 + select ; N= 100001 ; PASS; Relative-speedup-vs-sort: 7.28823519 + arg_select; N= 100001 ; PASS; Relative-speedup-vs-sort: 6.03007460 +``` diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index bcb4931b3..17e81a0a7 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -12,6 +12,7 @@ set(fppFiles stdlib_linalg_diag.fypp stdlib_linalg_outer_product.fypp stdlib_optval.fypp + stdlib_selection.fypp stdlib_sorting.fypp stdlib_sorting_ord_sort.fypp stdlib_sorting_sort.fypp diff --git a/src/Makefile.manual b/src/Makefile.manual index f573fa6cf..d03a9072c 100644 --- a/src/Makefile.manual +++ b/src/Makefile.manual @@ -13,6 +13,7 @@ SRCFYPP = \ stdlib_quadrature.fypp \ stdlib_quadrature_trapz.fypp \ stdlib_quadrature_simps.fypp \ + stdlib_selection.fypp \ stdlib_random.fypp \ stdlib_sorting.fypp \ stdlib_sorting_ord_sort.fypp \ @@ -104,6 +105,8 @@ stdlib_quadrature_trapz.o: \ stdlib_quadrature.o \ stdlib_error.o \ stdlib_kinds.o +stdlib_selection.o: \ + stdlib_kinds.o stdlib_sorting.o: \ stdlib_kinds.o \ stdlib_string_type.o diff --git a/src/stdlib_selection.fypp b/src/stdlib_selection.fypp new file mode 100644 index 000000000..4a963c5b8 --- /dev/null +++ b/src/stdlib_selection.fypp @@ -0,0 +1,276 @@ +#:include "common.fypp" +! Specify kinds/types for the input array in select and arg_select +#:set ARRAY_KINDS_TYPES = INT_KINDS_TYPES + REAL_KINDS_TYPES +! The index arrays are of all INT_KINDS_TYPES + +module stdlib_selection +! +! This code was modified from the "Coretran" implementation "quickSelect" by +! Leon Foks, https://github.com/leonfoks/coretran/tree/master/src/sorting +! +! Leon Foks gave permission to release this code under stdlib's MIT license. +! (https://github.com/fortran-lang/stdlib/pull/500#commitcomment-57418593) +! + +use stdlib_kinds + +implicit none + +private + +public :: select, arg_select + +interface select + !! version: experimental + !! ([Specification](..//page/specs/stdlib_selection.html#select-find-the-k-th-smallest-value-in-an-input-array)) + + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("select", 1, arraytype, arraykind, intkind) + module procedure ${name}$ + #:endfor + #:endfor +end interface + +interface arg_select + !! version: experimental + !! ([Specification](..//page/specs/stdlib_selection.html#arg_select-find-the-index-of-the-k-th-smallest-value-in-an-input-array)) + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("arg_select", 1, arraytype, arraykind, intkind) + module procedure ${name}$ + #:endfor + #:endfor +end interface + +contains + + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("select", 1, arraytype, arraykind, intkind) + subroutine ${name}$(a, k, kth_smallest, left, right) + !! select - select the k-th smallest entry in a(:). + !! + !! Partly derived from the "Coretran" implementation of + !! quickSelect by Leon Foks, https://github.com/leonfoks/coretran + !! + ${arraytype}$, intent(inout) :: a(:) + !! Array in which we seek the k-th smallest entry. + !! On output it will be partially sorted such that + !! `all(a(1:(k-1)) <= a(k)) .and. all(a(k) <= a((k+1):size(a)))` + !! is true. + ${inttype}$, intent(in) :: k + !! We want the k-th smallest entry. E.G. `k=1` leads to + !! `kth_smallest=min(a)`, and `k=size(a)` leads to + !! `kth_smallest=max(a)` + ${arraytype}$, intent(out) :: kth_smallest + !! On output contains the k-th smallest value of `a(:)` + ${inttype}$, intent(in), optional :: left, right + !! If we know that: + !! the k-th smallest entry of `a` is in `a(left:right)` + !! and also that: + !! `maxval(a(1:(left-1))) <= minval(a(left:right))` + !! and: + !! `maxval(a(left:right))) <= minval(a((right+1):size(a)))` + !! then one or both bounds can be specified to narrow the search. + !! The constraints are available if we have previously called the + !! subroutine with different `k` (because of how `a(:)` becomes + !! partially sorted, see documentation for `a(:)`). + + ${inttype}$ :: l, r, mid, iPivot + integer, parameter :: ip = ${intkind}$ + + l = 1_ip + if(present(left)) l = left + r = size(a, kind=ip) + if(present(right)) r = right + + if(k < 1_ip .or. k > size(a, kind=ip) .or. l > r .or. l < 1_ip .or. & + r > size(a, kind=ip)) then + error stop "select must have 1 <= k <= size(a), and 1 <= left <= right <= size(a)"; + end if + + searchk: do + mid = l + ((r-l)/2_ip) ! Avoid (l+r)/2 which can cause overflow + + call medianOf3(a, l, mid, r) + call swap(a(l), a(mid)) + call partition(a, l, r, iPivot) + + if (iPivot < k) then + l = iPivot + 1_ip + elseif (iPivot > k) then + r = iPivot - 1_ip + elseif (iPivot == k) then + kth_smallest = a(k) + return + end if + end do searchk + + contains + pure subroutine swap(a, b) + ${arraytype}$, intent(inout) :: a, b + ${arraytype}$ :: tmp + tmp = a; a = b; b = tmp + end subroutine + + pure subroutine medianOf3(a, left, mid, right) + ${arraytype}$, intent(inout) :: a(:) + ${inttype}$, intent(in) :: left, mid, right + if(a(right) < a(left)) call swap(a(right), a(left)) + if(a(mid) < a(left)) call swap(a(mid) , a(left)) + if(a(right) < a(mid) ) call swap(a(mid) , a(right)) + end subroutine + + pure subroutine partition(array,left,right,iPivot) + ${arraytype}$, intent(inout) :: array(:) + ${inttype}$, intent(in) :: left, right + ${inttype}$, intent(out) :: iPivot + ${inttype}$ :: lo,hi + ${arraytype}$ :: pivot + + pivot = array(left) + lo = left + hi=right + do while (lo <= hi) + do while (array(hi) > pivot) + hi=hi-1_ip + end do + inner_lohi: do while (lo <= hi ) + if(array(lo) > pivot) exit inner_lohi + lo=lo+1_ip + end do inner_lohi + if (lo <= hi) then + call swap(array(lo),array(hi)) + lo=lo+1_ip + hi=hi-1_ip + end if + end do + call swap(array(left),array(hi)) + iPivot=hi + end subroutine + end subroutine + #:endfor + #:endfor + + + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("arg_select", 1, arraytype, arraykind, intkind) + subroutine ${name}$(a, indx, k, kth_smallest, left, right) + !! arg_select - find the index of the k-th smallest entry in `a(:)` + !! + !! Partly derived from the "Coretran" implementation of + !! quickSelect by Leon Foks, https://github.com/leonfoks/coretran + !! + ${arraytype}$, intent(in) :: a(:) + !! Array in which we seek the k-th smallest entry. + ${inttype}$, intent(inout) :: indx(:) + !! Array of indices into `a(:)`. Must contain each integer + !! from `1:size(a)` exactly once. On output it will be partially + !! sorted such that + !! `all( a(indx(1:(k-1)))) <= a(indx(k)) ) .AND. + !! all( a(indx(k)) <= a(indx( (k+1):size(a) )) )`. + ${inttype}$, intent(in) :: k + !! We want index of the k-th smallest entry. E.G. `k=1` leads to + !! `a(kth_smallest) = min(a)`, and `k=size(a)` leads to + !! `a(kth_smallest) = max(a)` + ${inttype}$, intent(out) :: kth_smallest + !! On output contains the index with the k-th smallest value of `a(:)` + ${inttype}$, intent(in), optional :: left, right + !! If we know that: + !! the k-th smallest entry of `a` is in `a(indx(left:right))` + !! and also that: + !! `maxval(a(indx(1:(left-1)))) <= minval(a(indx(left:right)))` + !! and: + !! `maxval(a(indx(left:right))) <= minval(a(indx((right+1):size(a))))` + !! then one or both bounds can be specified to reduce the search + !! time. These constraints are available if we have previously + !! called the subroutine with a different `k` (due to the way that + !! `indx(:)` becomes partially sorted, see documentation for `indx(:)`). + + ${inttype}$ :: l, r, mid, iPivot + integer, parameter :: ip = ${intkind}$ + + l = 1_ip + if(present(left)) l = left + r = size(a, kind=ip) + if(present(right)) r = right + + if(size(a) /= size(indx)) then + error stop "arg_select must have size(a) == size(indx)" + end if + + if(k < 1_ip .or. k > size(a, kind=ip) .or. l > r .or. l < 1_ip .or. & + r > size(a, kind=ip)) then + error stop "arg_select must have 1 <= k <= size(a), and 1 <= left <= right <= size(a)"; + end if + + searchk: do + mid = l + ((r-l)/2_ip) ! Avoid (l+r)/2 which can cause overflow + + call arg_medianOf3(a, indx, l, mid, r) + call swap(indx(l), indx(mid)) + call arg_partition(a, indx, l, r, iPivot) + + if (iPivot < k) then + l = iPivot + 1_ip + elseif (iPivot > k) then + r = iPivot - 1_ip + elseif (iPivot == k) then + kth_smallest = indx(k) + return + end if + end do searchk + + contains + pure subroutine swap(a, b) + ${inttype}$, intent(inout) :: a, b + ${inttype}$ :: tmp + tmp = a; a = b; b = tmp + end subroutine + + pure subroutine arg_medianOf3(a, indx, left, mid, right) + ${arraytype}$, intent(in) :: a(:) + ${inttype}$, intent(inout) :: indx(:) + ${inttype}$, intent(in) :: left, mid, right + if(a(indx(right)) < a(indx(left))) call swap(indx(right), indx(left)) + if(a(indx(mid)) < a(indx(left))) call swap(indx(mid) , indx(left)) + if(a(indx(right)) < a(indx(mid)) ) call swap(indx(mid) , indx(right)) + end subroutine + + pure subroutine arg_partition(array, indx, left,right,iPivot) + ${arraytype}$, intent(in) :: array(:) + ${inttype}$, intent(inout) :: indx(:) + ${inttype}$, intent(in) :: left, right + ${inttype}$, intent(out) :: iPivot + ${inttype}$ :: lo,hi + ${arraytype}$ :: pivot + + pivot = array(indx(left)) + lo = left + hi = right + do while (lo <= hi) + do while (array(indx(hi)) > pivot) + hi=hi-1_ip + end do + inner_lohi: do while (lo <= hi ) + if(array(indx(lo)) > pivot) exit inner_lohi + lo=lo+1_ip + end do inner_lohi + if (lo <= hi) then + call swap(indx(lo),indx(hi)) + lo=lo+1_ip + hi=hi-1_ip + end if + end do + call swap(indx(left),indx(hi)) + iPivot=hi + end subroutine + end subroutine + #:endfor + #:endfor + +end module + + diff --git a/src/tests/CMakeLists.txt b/src/tests/CMakeLists.txt index 290980533..4e40b4f1b 100644 --- a/src/tests/CMakeLists.txt +++ b/src/tests/CMakeLists.txt @@ -21,6 +21,7 @@ add_subdirectory(io) add_subdirectory(linalg) add_subdirectory(logger) add_subdirectory(optval) +add_subdirectory(selection) add_subdirectory(sorting) add_subdirectory(stats) add_subdirectory(string) diff --git a/src/tests/Makefile.manual b/src/tests/Makefile.manual index d3594201e..83d93c992 100644 --- a/src/tests/Makefile.manual +++ b/src/tests/Makefile.manual @@ -17,6 +17,7 @@ all test clean:: $(MAKE) -f Makefile.manual --directory=io $@ $(MAKE) -f Makefile.manual --directory=logger $@ $(MAKE) -f Makefile.manual --directory=optval $@ + $(MAKE) -f Makefile.manual --directory=selection $@ $(MAKE) -f Makefile.manual --directory=sorting $@ $(MAKE) -f Makefile.manual --directory=quadrature $@ $(MAKE) -f Makefile.manual --directory=stats $@ diff --git a/src/tests/selection/CMakeLists.txt b/src/tests/selection/CMakeLists.txt new file mode 100644 index 000000000..f3bb19811 --- /dev/null +++ b/src/tests/selection/CMakeLists.txt @@ -0,0 +1,11 @@ +### Pre-process: .fpp -> .f90 via Fypp + +# Create a list of the files to be preprocessed +set( + fppFiles + test_selection.fypp +) + +fypp_f90("${fyppFlags}" "${fppFiles}" outFiles) + +ADDTEST(selection) diff --git a/src/tests/selection/Makefile.manual b/src/tests/selection/Makefile.manual new file mode 100644 index 000000000..94a94037c --- /dev/null +++ b/src/tests/selection/Makefile.manual @@ -0,0 +1,10 @@ +SRCFYPP = test_selection.fypp + +SRCGEN = $(SRCFYPP:.fypp=.f90) + +$(SRCGEN): %.f90: %.fypp ../../common.fypp + fypp -I../.. $(FYPPFLAGS) $< $@ + +PROGS_SRC = $(SRCGEN) + +include ../Makefile.manual.test.mk diff --git a/src/tests/selection/test_selection.fypp b/src/tests/selection/test_selection.fypp new file mode 100644 index 000000000..5c3932386 --- /dev/null +++ b/src/tests/selection/test_selection.fypp @@ -0,0 +1,469 @@ +#:include "common.fypp" +! Specify kinds/types for the input array in select and arg_select +#:set ARRAY_KINDS_TYPES = INT_KINDS_TYPES + REAL_KINDS_TYPES +! The index arrays are of all INT_KINDS_TYPES + + +module test_selection + + use stdlib_kinds + use stdlib_selection, only: select, arg_select + use testdrive, only: new_unittest, unittest_type, error_type, check + + implicit none + + private + public :: collect_selection + +contains + + !> Collect all exported unit tests + subroutine collect_selection(testsuite) + !> Collection of tests + type(unittest_type), allocatable, intent(out) :: testsuite(:) + + testsuite = [ & + new_unittest("test_select_1_iint8_int8", test_select_1_iint8_int8) & + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("test_select", 1, arraytype, arraykind, intkind) + , new_unittest("${name}$", ${name}$) & + #:endfor + #:endfor + + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("test_arg_select", 1, arraytype, arraykind, intkind) + , new_unittest("${name}$", ${name}$) & + #:endfor + #:endfor + ] + + end subroutine collect_selection + + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("test_select", 1, arraytype, arraykind, intkind) + subroutine ${name}$(error) + type(error_type), allocatable, intent(out) :: error + + integer, parameter :: ip = ${intkind}$ + ${inttype}$, parameter :: N = 10, Nm = 8 + ${inttype}$, parameter :: near_huge = HUGE(N) - 1_ip ! Segfaults without the -1_ip + ${inttype}$, parameter :: Nreps = 2 ! Number of repetitions of random sampling + ${inttype}$, parameter :: Nr = 25_ip ! Size of random array, must be < HUGE(N) + + ${arraytype}$ :: x(N), x_copy(N), mat(Nm), mat_copy(Nm), len1(1), len2(2), & + kth_smallest, random_vals(Nr), one = 1 + ${inttype}$ :: i, p, up_rank, down_rank, mid_rank + real(dp) :: random_doubles(Nr) ! Deliberately double precision for all cases + logical :: test1, test2, test3 + ${arraytype}$, allocatable :: long_array(:) + + ! x contains the numbers 1**2, 2**2, .... 10**2, with mixed-up order + x = (/( i**2, i=1, size(x, kind=ip) )/) + x(5:2:-1) = x(2:5) + x(10:8:-1) = x(8:10) + + ! Check that the ith-ranked entry of x really is i**2 + do i = 1, size(x, kind=ip) + x_copy = x + call select(x_copy, i, kth_smallest) + call check( error, (kth_smallest == i**2), " ${name}$: kth smallest entry should be i**2") + if(allocated(error)) return + end do + + ! Check that it works when we specify "left" and know that the array + ! is partially sorted due to previous calls to quickselect + x_copy = x + do i = 1, size(x, kind=ip), 1 + call select(x_copy, i, kth_smallest, left=i) + call check( error, (kth_smallest == i**2), " ${name}$: kth smallest entry with left specified") + if(allocated(error)) return + end do + + ! Check that it works when we specify "right" and know that the array + ! is partially sorted due to previous calls to quickselect + x_copy = x + do i = size(x, kind=ip), 1, -1 + call select(x_copy, i, kth_smallest, right=i) + call check( error, (kth_smallest == i**2), " ${name}$: kth smallest entry with right specified") + if(allocated(error)) return + end do + + ! The test below can catch overflow in naive calculation of the middle index, like discussed here: + ! https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html + ! But don't do it if near_huge is large, to avoid allocating a big array and slowing the tests + if(near_huge < 200) then + allocate(long_array(near_huge)) + long_array = 0 * one + long_array(1:3) = one + call select(long_array, near_huge - 2_ip, kth_smallest) + call check( error, (kth_smallest == one), " ${name}$: designed to catch overflow in middle index") + if(allocated(error)) return + deallocate(long_array) + end if + + ! Simple tests + mat = one * [3, 2, 7, 4, 5, 1, 4, -1] + mat_copy = mat + call select(mat_copy, 1_ip, kth_smallest) + call check(error, kth_smallest == -1, " ${name}$: mat test 1") + if(allocated(error)) return + mat_copy = mat + call select(mat_copy, 2_ip, kth_smallest) + call check(error, kth_smallest == 1, " ${name}$: mat test 2") + if(allocated(error)) return + mat_copy = mat + call select(mat_copy, size(mat, kind=ip)+1_ip-4_ip, kth_smallest) + call check(error, kth_smallest == 4, " ${name}$: mat test 3") + if(allocated(error)) return + mat_copy = mat + call select(mat_copy, 5_ip, kth_smallest) + call check(error, kth_smallest == 4, " ${name}$: mat test 4") + if(allocated(error)) return + mat_copy = mat + call select(mat_copy, 6_ip, kth_smallest) + call check(error, kth_smallest == 4, " ${name}$: mat test 5") + if(allocated(error)) return + mat_copy = mat + call select(mat_copy, 7_ip, kth_smallest) + call check(error, kth_smallest == 5, " ${name}$: mat test 6") + if(allocated(error)) return + + ! Check it works for size(a) == 1 + len1(1) = -1 * one + call select(len1, 1_ip, kth_smallest) + call check(error, kth_smallest == -1, " ${name}$: array with size 1") + if(allocated(error)) return + + ! Check it works for size(a) == 2 + len2 = [-3, -5]*one + call select(len2, 2_ip, kth_smallest) + call check(error, kth_smallest == -3, " ${name}$: array with size 2, test 1") + if(allocated(error)) return + len2 = [-3, -5]*one + call select(len2, 1_ip, kth_smallest) + call check(error, kth_smallest == -5, " ${name}$: array with size 2, test 2") + if(allocated(error)) return + len2 = [-1, -1]*one + call select(len2, 1_ip, kth_smallest) + call check(error, kth_smallest == -1, " ${name}$: array with size 2, test 3") + if(allocated(error)) return + len2 = [-1, -1]*one + call select(len2, 2_ip, kth_smallest) + call check(error, kth_smallest == -1, " ${name}$: array with size 2, test 4") + if(allocated(error)) return + + ! + ! Test using random data + ! + ! Search for the p-th smallest rank, for all these p + ! (avoid end-points to enable constrained search tests) + do p = 3, Nr-2 + + ! Repeat for different random samples to try to expose any errors + do i = 1, Nreps + + ! Make random numbers of the correct type + call random_number(random_doubles) + random_vals = random_doubles * Nr + + call select(random_vals, p, kth_smallest) + + test1 = kth_smallest == random_vals(p) + test2 = all(random_vals(1:(p-1)) <= random_vals(p)) + test3 = all(random_vals(p) <= & + random_vals((p+1):size(random_vals, kind=ip))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data regular select") + if(allocated(error)) return + + ! Constrained search above 'p', providing 'left' + up_rank = p + (Nr-p)/2_ip ! Deliberate integer division + call select(random_vals, up_rank, kth_smallest, left=p) + + test1 = kth_smallest == random_vals(up_rank) + test2 = all(random_vals(1:(up_rank-1)) <= random_vals(up_rank)) + test3 = all(random_vals(up_rank) <= & + random_vals((up_rank+1):size(random_vals, kind=ip))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data left-constrained select") + if(allocated(error)) return + + ! Constrained search below p, providing 'right' + down_rank = p - (p/2_ip) + call select(random_vals, down_rank, kth_smallest, right=p) + + test1 = kth_smallest == random_vals(down_rank) + test2 = all(random_vals(1:(down_rank-1)) <= & + random_vals(down_rank)) + test3 = all(random_vals(down_rank) <= & + random_vals((down_rank+1):size(random_vals, kind=ip))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data right-constrained select") + if(allocated(error)) return + + ! Constrained search between up-ind and down-ind, proving left + ! and right. Make 'mid_rank' either above or below p + mid_rank = p - p/3_ip*mod(i,2_ip) + (Nr-p)/3_ip*(1_ip-mod(i,2_ip)) + call select(random_vals, mid_rank, kth_smallest, & + left=down_rank, right=up_rank) + + test1 = kth_smallest == random_vals(mid_rank) + test2 = all(random_vals(1:(mid_rank-1)) <= & + random_vals(mid_rank)) + test3 = all(random_vals(mid_rank) <= & + random_vals((mid_rank+1):size(random_vals, kind=ip))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data left-right-constrained select") + if(allocated(error)) return + + end do + end do + + end subroutine + #:endfor + #:endfor + + + #:for arraykind, arraytype in ARRAY_KINDS_TYPES + #:for intkind, inttype in INT_KINDS_TYPES + #:set name = rname("test_arg_select", 1, arraytype, arraykind, intkind) + subroutine ${name}$(error) + type(error_type), allocatable, intent(out) :: error + + integer, parameter :: ip = ${intkind}$ + ${inttype}$, parameter :: N = 10, Nm = 8 + ${inttype}$, parameter :: near_huge = HUGE(N) - 1_ip ! Segfaults without the -1_ip + ${inttype}$, parameter :: Nreps = 2 ! Number of repetitions of random sampling + ${inttype}$, parameter :: Nr = 25_ip ! Size of random array, must be < HUGE(N) + + ${arraytype}$ :: x(N), mat(Nm), len1(1), len2(2), random_vals(Nr), one=1 + + integer(ip) :: indx(N), indx_copy(N), indx_mat(Nm), indx_mat_copy(Nm), & + indx_len1(1), indx_len2(2), indx_r(Nr) + real(dp) :: random_doubles(Nr) ! Deliberately double precision for all cases + integer(ip) :: i, j, p, up_rank, down_rank, mid_rank, kth_smallest + logical :: test1, test2, test3 + ${arraytype}$, allocatable :: long_array(:) + ${inttype}$, allocatable :: long_array_index(:) + + ! Make x contain 1**2, 2**2, .... 10**2, but mix up the order + x = (/( i**2, i=1, size(x, kind=ip) )/) + x(5:2:-1) = x(2:5) + x(10:8:-1) = x(8:10) + + indx = (/(i, i = 1, size(x, kind=ip))/) + + ! Check that the ith ranked entry of x equals i**2 + do i = 1, size(x, kind=ip) + indx_copy = indx + call arg_select(x, indx, i, kth_smallest) + call check(error, x(kth_smallest) == i**2, " ${name}$: kth smallest entry should be i**2") + if(allocated(error)) return + end do + + ! Check that it works when we specify "left" and know that the index + ! array is partially sorted due to previous calls to arg_select + indx_copy = indx + do i = 1, size(x, kind=ip), 1 + call arg_select(x, indx_copy, i, kth_smallest, left=i) + call check(error, (x(kth_smallest) == i**2), " ${name}$: kth smallest entry with left specified") + if(allocated(error)) return + end do + + ! Check that it works when we specify "right" and know that the index + ! array is partially sorted due to previous calls to arg_select + indx_copy = indx + do i = size(x, kind=ip), 1, -1 + call arg_select(x, indx_copy, i, kth_smallest, right=i) + call check(error, (x(kth_smallest) == i**2), " ${name}$: kth smallest entry with right specified") + if(allocated(error)) return + end do + + ! The test below would catch overflow in naive calculation of the middle index, like discussed here: + ! https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html + ! But don't do it if near_huge is large, to avoid allocating a big array and slowing the tests + if(near_huge < 200) then + allocate(long_array(near_huge)) + allocate(long_array_index(near_huge)) + long_array = 0 * one + long_array(1:3) = one + long_array_index = (/( i, i = 1_ip, size(long_array, kind=ip) )/) + call arg_select(long_array, long_array_index, near_huge - 2_ip, kth_smallest) + call check( error, (kth_smallest < 4), " ${name}$: designed to catch overflow in middle index") + if(allocated(error)) return + deallocate(long_array, long_array_index) + end if + + ! Simple tests + mat = one * [3, 2, 7, 4, 5, 1, 4, -1] + indx_mat = (/( i, i = 1, size(mat, kind=ip) )/) + + indx_mat_copy = indx_mat + call arg_select(mat, indx_mat_copy, 1_ip, kth_smallest) + call check(error, mat(kth_smallest) == -1, " ${name}$: mat test 1") + if(allocated(error)) return + + indx_mat_copy = indx_mat + call arg_select(mat, indx_mat_copy, 2_ip, kth_smallest) + call check(error, mat(kth_smallest) == 1, " ${name}$: mat test 2") + if(allocated(error)) return + + indx_mat_copy = indx_mat + call arg_select(mat, indx_mat_copy, size(mat, kind=ip)+1_ip-4_ip, & + kth_smallest) + call check(error, mat(kth_smallest) == 4, " ${name}$: mat test 3") + if(allocated(error)) return + + indx_mat_copy = indx_mat + call arg_select(mat, indx_mat_copy, 5_ip, kth_smallest) + call check(error, mat(kth_smallest) == 4, " ${name}$: mat test 4") + if(allocated(error)) return + + indx_mat_copy = indx_mat + call arg_select(mat, indx_mat_copy, 6_ip, kth_smallest) + call check(error, mat(kth_smallest) == 4, " ${name}$: mat test 5") + if(allocated(error)) return + + indx_mat_copy = indx_mat + call arg_select(mat, indx_mat_copy, 7_ip, kth_smallest) + call check(error, mat(kth_smallest) == 5, " ${name}$: mat test 6") + if(allocated(error)) return + + ! Check it works for size(a) == 1 + len1(1) = -1 * one + indx_len1(1) = 1 + call arg_select(len1, indx_len1, 1_ip, kth_smallest) + call check(error, len1(kth_smallest) == -1, " ${name}$: array with size 1") + if(allocated(error)) return + + ! Check it works for size(a) == 2 + len2 = [-3, -5] * one + indx_len2 = [1_ip, 2_ip] + call arg_select(len2, indx_len2, 2_ip, kth_smallest) + call check(error, len2(kth_smallest) == -3, " ${name}$: array with size 2, test 1") + if(allocated(error)) return + + len2 = [-3, -5] * one + indx_len2 = [1_ip, 2_ip] + call arg_select(len2, indx_len2, 1_ip, kth_smallest) + call check(error, len2(kth_smallest) == -5, " ${name}$: array with size 2, test 2") + if(allocated(error)) return + + len2 = [-1, -1] * one + indx_len2 = [1_ip, 2_ip] + call arg_select(len2, indx_len2, 1_ip, kth_smallest) + call check(error, len2(kth_smallest) == -1, " ${name}$: array with size 2, test 3") + if(allocated(error)) return + + len2 = [-1, -1] * one + indx_len2 = [1_ip, 2_ip] + call arg_select(len2, indx_len2, 2_ip, kth_smallest) + call check(error, len2(kth_smallest) == -1, " ${name}$: array with size 2, test 4") + if(allocated(error)) return + + ! + ! Test using random data + ! + ! Search for the p-th smallest, for all these p (avoid end-points to + ! enable additional tests using "left", "right" arguments) + do p = 3, Nr-2 + + ! Repeat for many random samples to try to expose any errors + do i = 1, Nreps + + ! Make random numbers of the correct type + call random_number(random_doubles) + random_vals = random_doubles * Nr + + indx_r = (/( j, j = 1, size(random_vals, kind=ip) )/) + + ! Standard arg_select + call arg_select(random_vals, indx_r, p, kth_smallest) + + test1 = random_vals(kth_smallest) == random_vals(indx_r(p)) + test2 = all(random_vals(indx_r(1:(p-1))) <= & + random_vals(indx_r(p))) + test3 = all(random_vals(indx_r(p)) <= & + random_vals(indx_r((p+1):size(random_vals, kind=ip)))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data regular arg_select") + if(allocated(error)) return + + ! Constrained search for a rank above 'p', providing 'left' + up_rank = p + (Nr-p)/2_ip ! Deliberate integer division + call arg_select(random_vals, indx_r, up_rank, & + kth_smallest, left=p) + + test1 = random_vals(kth_smallest) == & + random_vals(indx_r(up_rank)) + test2 = all(random_vals(indx_r(1:(up_rank-1))) <= & + random_vals(indx_r(up_rank))) + test3 = all(random_vals(indx_r(up_rank)) <= & + random_vals(indx_r((up_rank+1):size(random_vals, kind=ip)))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data left-constrained arg_select") + if(allocated(error)) return + + + ! Constrained search for a rank below p, providing 'right' + down_rank = p - (p/2_ip) + call arg_select(random_vals, indx_r, down_rank, & + kth_smallest, right=p) + + test1 = random_vals(kth_smallest) == & + random_vals(indx_r(down_rank)) + test2 = all(random_vals(indx_r(1:(down_rank-1))) <= & + random_vals(indx_r(down_rank))) + test3 = all(random_vals(indx_r(down_rank)) <= & + random_vals(indx_r((down_rank+1):size(random_vals, kind=ip)))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data right-constrained arg_select") + if(allocated(error)) return + + ! Constrained search for a rank between up-ind and down-ind, + ! proving left and right. 'mid_rank' is either above or below p + mid_rank = p - p/3_ip*mod(i,2_ip) + (Nr-p)/3_ip*(1_ip-mod(i,2_ip)) + call arg_select(random_vals, indx_r, mid_rank, & + kth_smallest, left=down_rank, right=up_rank) + + test1 = random_vals(kth_smallest) == & + random_vals(indx_r(mid_rank)) + test2 = all(random_vals(indx_r(1:(mid_rank-1))) <= & + random_vals(indx_r(mid_rank))) + test3 = all(random_vals(indx_r(mid_rank)) <= & + random_vals(indx_r((mid_rank+1):size(random_vals, kind=ip)))) + call check(error, (test1 .and. test2 .and. test3), "${name}$: random data left-right-constrained arg_select") + if(allocated(error)) return + + end do + end do + + end subroutine + #:endfor + #:endfor + +end module + +program tester + use, intrinsic :: iso_fortran_env, only: compiler_version, error_unit + use testdrive, only: new_testsuite, run_testsuite, testsuite_type + use test_selection, only: collect_selection + + implicit none + integer :: stat, is + type(testsuite_type), allocatable :: testsuites(:) + character(len=*), parameter :: fmt = '("#", *(1x, a))' + + stat = 0 + + testsuites = [ & + new_testsuite("selection", collect_selection) & + ] + + do is = 1, size(testsuites) + write(error_unit, fmt) "Testing:", testsuites(is)%name + call run_testsuite(testsuites(is)%collect, error_unit, stat) + end do + + if (stat > 0) then + write(error_unit, '(i0, 1x, a)') stat, "test(s) failed!" + error stop + end if + +end program tester