The documentation of sortperm! says:
Like sortperm, but accepts a preallocated index vector ix. If initialized is false (the default), ix is initialized to contain the values 1:length(v).
Maybe the innocent word "index" is already supposed to tell the reader exactly what to expect from this function. But it did not for me. I would even say that an index vector ix is any vector that contains nonnegative integers, since for any such ix we can find another vector x such that x[ix] is a valid indexing operation.
So here's my expectation: v can be put to a certain order, and usually this is what v[ix] should do. However, the fact that we can switch off initialization at all suggests that ix = 1:length(v) is a prerequisite for sortperm! to behave like this. And if ix contains other values, then these other values are put in the same order as 1:length(v) would have been put. Example:
Take v = [4,3,2,1] with ix = [5,6,7,8] and initialized=true. The vector [4,3,2,1] can be sorted by reversing the order. So usually, sortperm! would reverse 1:4, but now it has to reverse 5:8.
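The expectation described here can be written down without passing out-of-range values to sortperm!. This is only a sketch, assuming the example uses v = [4, 3, 2, 1] and the candidate values 5:8; the names p and expected are illustrative.

```julia
v = [4, 3, 2, 1]

# The ordinary sorting permutation: v[p] is sorted.
p = sortperm(v)
@assert issorted(v[p])

# What the reasoning above expects for the values 5:8:
# they are rearranged the same way 1:4 would have been.
expected = collect(5:8)[p]
println(expected)   # [8, 7, 6, 5]
```

The sketch deliberately does not call sortperm!(ix, v, initialized=true) with ix = [5, 6, 7, 8], since those values are not valid indices into v.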
If you check what the example does, you may indeed find that my expectations are fulfilled. However, running it multiple times shows that you can really get anything.
Indeed, there are 24 permutations of four distinct elements, and you can get all of them. Some are more likely than others... Well, quicksort is unstable, but for unique input vectors, I'd still expect unique outputs.
So, first lesson that should either go into the documentation or into a rewrite of the function: ix must be a subset of 1:length(v), else the results are nondeterministic.
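One way to make this precondition explicit is a small guard around the call. The helper below is hypothetical (checked_sortperm! is not part of Base), it assumes 1-based vectors, and it assumes a Julia version in which sortperm! accepts the initialized keyword, such as the 1.8.3 used in this report.

```julia
# Hypothetical helper, not part of Base: only skip initialization when
# ix already holds a permutation of 1:length(v).
function checked_sortperm!(ix::AbstractVector{<:Integer}, v::AbstractVector)
    if length(ix) == length(v) && isperm(ix)
        # Every entry of ix is a valid index into v, so reuse it as is.
        return sortperm!(ix, v, initialized=true)
    else
        # Fall back to the default (initialized=false), which overwrites ix.
        return sortperm!(ix, v)
    end
end

v = [4, 3, 2, 1]
a = checked_sortperm!([5, 6, 7, 8], v)   # invalid indices, so ix is re-initialized
b = checked_sortperm!([2, 4, 1, 3], v)   # valid permutation, used directly
```

Note that isperm walks the whole vector, which costs about as much as the initialization it avoids, so this illustrates the precondition rather than saving time.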
Second question: What happens if ix is such a subset, but it is unordered?
Searching for a counterexample in a loop looks like it never terminates. So actually, the order in which the indices are initially stored is completely irrelevant; this should also go into the documentation, as it tells the user when initialization is actually not necessary.
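The claim that the initial order does not matter can be checked with a bounded loop instead of an endless one. This is a sketch under the assumption that v = [4, 3, 2, 1] is the test vector and that 1000 random starting orders are enough for illustration; it is not a reconstruction of the original snippet.

```julia
using Random

v = [4, 3, 2, 1]
expected = sortperm(v)

# Try many random starting orders of the valid indices 1:length(v).
all_match = all(1:1000) do _
    ix = shuffle(1:length(v))                   # a permutation of 1:4
    sortperm!(ix, v, initialized=true) == expected
end

println(all_match)
```

Because the elements of v are distinct, there is exactly one sorting permutation, so any correct sort of a valid index vector has to arrive at the same result.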
So then the final usage instructions for initialized are: Use false whenever your ix is not a subset of 1:length(v), else the results will be nondeterministic. Use true whenever it is such a subset, in whatever order. In particular, sortperm! with initialized=true can be called multiple times in a row with different vectors (of the same size), and ix does not need to be re-initialized in between.
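The reuse pattern from the last sentence could look like the following sketch. The length n = 8 and the five random vectors are arbitrary choices for illustration, and the code assumes a Julia version in which sortperm! accepts the initialized keyword.

```julia
n = 8
ix = collect(1:n)        # initialized once, then reused

for _ in 1:5
    v = rand(n)          # a different vector of the same size each time
    sortperm!(ix, v, initialized=true)
    @assert isperm(ix)          # ix is still a permutation of 1:n
    @assert issorted(v[ix])     # and it sorts the current vector
end
```

After each call ix is again a permutation of 1:n, which is why the next call can skip initialization.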
julia> versioninfo()
Julia Version 1.8.3
Commit 0434deb161 (2022-11-14 20:14 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, tigerlake)
  Threads: 8 on 16 virtual cores
projekter changed the title from "Documentation: sortperm! with initialized=false" to "Documentation: sortperm! with initialized=true" on Dec 23, 2022
Like sortperm, but accepts a preallocated index vector or array ix with the same axes as A. If initialized is false (the default), ix is initialized to contain the values LinearIndices(A).
Which is a bit better but still does not fully explain initialized and its risks.
Perhaps we should remove the initialized option altogether (ignore it and always initialize). Behavior when initialized=true and ix is not a permutation of LinearIndices(v) is currently undefined (it is not documented anywhere and therefore not defined), and I don't think anyone depends on it. The reason people use initialized=true is for performance, but the performance savings are negligible:
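julia> using BenchmarkTools, Random, Printf

julia> for i in 1:12
           n = 2^i
           v = rand(n)
           ix = collect(eachindex(v))
           # time of initializing ix plus sorting, with fresh random data per sample
           total_time = @belapsed sortperm!(copyto!($ix, eachindex($v)), $v, initialized=true) setup=(rand!($v))
           # time of the initialization alone
           copy_time = @belapsed copyto!($ix, eachindex($v))
           @printf "%4i | %7.2g %7.2g %.1f%%\n" n total_time copy_time copy_time/(total_time-copy_time)*100
       end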
 len |    sort    init  possible savings from setting `initialized=true`
   2 | 5.3e-07 4.9e-09 0.9%
   4 | 5.4e-07 5.9e-09 1.1%
   8 | 5.7e-07 6.7e-09 1.2%
  16 | 8.1e-07 6.3e-09 0.8%
  32 | 1.2e-06 8.4e-09 0.7%
  64 | 2.1e-06 8.8e-09 0.4%
 128 | 3.9e-06 1.4e-08 0.4%
 256 | 7.8e-06 2.2e-08 0.3%
 512 | 1.6e-05 4.2e-08 0.3%
1024 | 3.3e-05 7.8e-08 0.2%
2048 | 7.1e-05 1.5e-07 0.2%
4096 | 0.00015 3.7e-07 0.2%