Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename uninitialized to undef #26316

Merged
merged 4 commits into from
Mar 10, 2018
Merged

Conversation

KristofferC
Copy link
Member

@KristofferC KristofferC commented Mar 4, 2018

uninitialized is just so long...

Edit: I changed my mind to like undefmore... ALL VOTES ARE NOW INVALID MOHAHA ;)

@KristofferC KristofferC added the triage This should be discussed on a triage call label Mar 4, 2018
@KristofferC KristofferC force-pushed the kc/short_but_still_with_great_dignity branch 2 times, most recently from e8e55ba to 487991b Compare March 4, 2018 12:31
@Sacha0
Copy link
Member

Sacha0 commented Mar 4, 2018

🍿

@ViralBShah
Copy link
Member

ViralBShah commented Mar 4, 2018

This is the right direction. uninitialized is definitely a mouthful and too many characters, butuninit does sound a bit weird.

@StefanKarpinski
Copy link
Member

#TeamJunk

@haampie
Copy link
Contributor

haampie commented Mar 4, 2018

What about bare? At least it's just 4 characters.

@innerlee
Copy link
Contributor

innerlee commented Mar 5, 2018

Visually it is too similar to unit. One (me) might guess it is a variant of unit in some strange way...

@GregPlowman
Copy link
Contributor

undef ?

@GlenHertz
Copy link
Contributor

I like undef as it matches what is displayed in the REPL.

@ararslan
Copy link
Member

ararslan commented Mar 5, 2018

undef is actually its own thing, and using it here would be misleading: the array values are defined, but they're junk values.

@thofma
Copy link
Contributor

thofma commented Mar 5, 2018

undef is actually its own thing, and using it here would be misleading: the array values are defined, but they're junk values.

Not true for non-isbits type.

@JeffBezanson
Copy link
Member

"Undefined" unfortunately means a couple different things. In the case of isbits types, the values you get are undefined in a sense similar to "undefined behavior" --- you don't know what you're going to get. For non-isbits types, you know exactly what you're going to get --- null pointers that reliably throw UndefRefError when you access them. That's "undefined" in the sense of "not set yet".

The #undef in printing refers to the second behavior. But there isn't any way to get that behavior for all types, so we picked a different word.

@StefanKarpinski
Copy link
Member

We could use undef to mean either of those things. The unfortunate part is that people will try to use the undef singleton to fill an array and it will appear to work but it will not be doing what they wanted it to at all.

@GregPlowman
Copy link
Contributor

If the terminology is not standard, then perhaps change #undef to #null for use with reliably defined null pointers. This then frees up undef.

@swissr
Copy link
Contributor

swissr commented Mar 5, 2018

Matlab (file exchange) has an uninit. For me it's a bit an uninspired abbreviation (but better than the full word). My two 'ideas':

  • reconsider junk and in addition provide an uninitialized alias. Julia would use the colloquial/technical? term but be 'sensible enough' to provide an alternative for persons and organisations which deem junk a bit too cute (btw. go, rust, python, also do have some junk in their code bases)

  • but my favorite would be new: it's neutral, very short and could be thought as an abbreviation for new_uninitialized_memory; the 'related' new in inner constructors also has (possibly) uninitialized fields and is rather low-level.
    In the three array initialization forms I found, Array{T,N}(::New, dims), Array{T,N}(::Nothing, dims), Array{T,N}(::Missing, dims), the first one, ::New (i.e. ::Uninitialized) is by far the most used and thus I think, if at all possible should have the shortest name.
    On the negative side, new is maybe a bit too broad and not as plain as uninitialized (or junk). - Not a good argument, but in Pascal New allocates a new uninitialized instance.

@JeffBezanson
Copy link
Member

JeffBezanson commented Mar 5, 2018

Changing #undef goes a bit deeper than that, since we use this terminology in various places, like UndefRefError, UndefVarError, UndefKeywordError.

I think new fails to capture the key concept that the memory is uninitialized, or its contents undefined, or the like.

junk also means something a bit different to me; that's sometimes used to mean filling with a "weird" non-zero value like 0xBB, or quasi-random values.

@alyst
Copy link
Contributor

alyst commented Mar 6, 2018

skipinit?

@tk3369
Copy link
Contributor

tk3369 commented Mar 6, 2018

raw?

@alyst
Copy link
Contributor

alyst commented Mar 6, 2018

dirty?

@KristofferC
Copy link
Member Author

KristofferC commented Mar 6, 2018

I think undef isn't so bad. It is what we print for non bitstype when it is uninitialized and for bitstypes it can be argued that the values you get are undefined. It is short and easy to say but also a bit "scary" which is good, because uninitialized stuff are scary.

@GregPlowman
Copy link
Contributor

Or unset ?
We talk about setting a variable or array element.

@ViralBShah
Copy link
Member

noinit

@chethega
Copy link
Contributor

chethega commented Mar 6, 2018

What about alloc instead of new? The idea would be that the first argument tells us how to initialize the array; in this case, we only request the memory.

Bonus point 1: We can use calloc in the future to get zero-byte initialization without paying for zeroing out the memory if we get it fresh from the kernel (use calloc in array.c non-pool arrays instead of malloc). This is nice for giant arrays that never get filled (we never get the memory from the kernel).

Bonus point 2: In the future someone might want more control over where and how to allocate. This way we have streamlined the names beforehand!

@ViralBShah
Copy link
Member

How about unfilled, inspired by one of Stefan's comments on slack.

@JeffBezanson
Copy link
Member

I think undef, uninit, and noinit all express the concept reasonably well.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 6, 2018

My proposal is this:

  1. Use Array{T}(size=(m,n)) for uninitialized array allocation now.
  2. When (if) we can do zeroed out allocation efficiently enough to make it a default, we change the behavior of Array{T}(size=(m,n)) to provide zeroed out memory. Note: not filled with zero(T) but with zero bytes, whatever value of T that happens to represent, possibly an invalid one.
  3. At that point, provide an even less convenient way to ask for truly uninitialized memory.

This approach avoids having a thing (uninitialized) which pretends to iterate a non-value (#undef / junk values) and has the benefit that people who write code with it now for performance will no longer have uninitialized memory in the future if/when we figure out how to make that efficient (possibly by proving that every value is written to by their program so the array doesn't need to be filled with zeros first). After all, people don't actually want uninitialized memory, what they want is an array of a certain size as quickly as possible with the understanding that have to fill it with values.

@chethega
Copy link
Contributor

chethega commented Mar 6, 2018

Array{T}(size=(n,)) is imho much harder to visually parse than Array{T}(uninitialized, n) in order to allocate a vector of length n. Am I understanding your proposal right that you want to make size a keyword-only argument in order to disambiguate from iterators?

@JeffBezanson
Copy link
Member

I think we might as well provide a way to explicitly request uninitialized memory now --- it's very easy, since it's already implemented, and I'm not optimistic about making zero-initializing cost-free.

@tbeason
Copy link

tbeason commented Mar 6, 2018

I think undef, uninit, and noinit all express the concept reasonably well.

I agree with Jeff.

@mbauman
Copy link
Member

mbauman commented Mar 6, 2018

I've not weighed in here yet because I really don't find uninitialized all that horrible. unin⇥ works great and I actually like the way it looks. But if we're going to change things, I'd throw my support behind keyword arguments. As with all spelling changes, my reasons are heavily subjective:

  • Array{Int}(I, size=(3,3)) is more readable to me than the status quo is.
  • It fits nicely with the fallback default Vector() and Matrix() that folks have argued over in the past. It gives them a reason for being. The default size is simply Array{T, N}(;size=ntuple(i->0, N)) when no positional arguments are given. Otherwise, it's the size of the positional argument.
  • It fits nicely with how we've been describing keyword arguments in the 0.7 transition. Positional arguments provide the content, keywords provide the metadata. If you don't provide content, it's undefined.
  • I think we could also support non-tuples in the one-dimensional form Array{Int}(size=42) to make it a little less ASCII-soupy. It's actually quite easy and requires no special cases since integers are iterable.
  • I've not fully worked out the implementation, but I believe it provides space for a simultaneously supported axes= keyword argument that allows for a more general offset array implementation.

@JeffBezanson
Copy link
Member

The default size is simply Array{T, N}(;size=ntuple(i->0, N)) when no positional arguments are given.

I don't think that makes sense --- positional arguments can also have default values. If 0x0 is the right thing to return for Matrix(), then that remains true independent of whether keyword arguments exist IMO. Now, I don't like that behavior myself, but I think it's a separate issue.

I'm in favor of the keyword arguments, but I'm pretty sure we will also need a first-class way to request uninitialized data at some point, in some context. Deliberately adding a construct (Array{T}(size=...)) that will give uninitialized data now but likely not in the future also seems unnecessarily complex.

@mbauman
Copy link
Member

mbauman commented Mar 6, 2018

I don't think that makes sense --- positional arguments can also have default values. If 0x0 is the right thing to return for Matrix(), then that remains true independent of whether keyword arguments exist IMO. Now, I don't like that behavior myself, but I think it's a separate issue.

I think this is different because Array{T}() currently does have a completely different meaning: it constructs a zero-dimensional array. Moving the size arguments into a keyword removes this existing meaning from the constructor, making this a whole lot less objectionable.

@JeffBezanson
Copy link
Member

We're talking about Array{T,N}() though.

@chethega
Copy link
Contributor

chethega commented Mar 7, 2018

Apart from some people (including myself) not liking the keyword syntax: Are we sure that keyword arguments are reliably fast and the type gets inferred reliably? (we need to infer the N of Array{T,N} from the type of the size argument, and who knows how the surrounding code will look like when users call this)

I mean, requesting some uninitialized memory for an array is pretty low-level and should give predictably good machine code. Looking at e.g. the first of the following does not fill me with confidence in the reliability of performant keyword args, today:

@noinline foo(;x)=x
@code_native foo(x=3)
# terrible code, allocates and dynamic dispatch

@noinline foo(x)=x
@code_native foo(3)
#good code

foo(x)=x
@code_native foo(3)
#good code

foo(;x)=x
@code_native foo(x=3)
#good code

versioninfo()
#Julia Version 0.7.0-DEV.4465
#Commit f569ebe9bc* (2018-03-04 15:47 UTC)

Also, I really like the first slot of the constructor as a singleton for giving metadata on initialization. This allows people to extend it by stuff like scrubbing their data on free,

struct Clear_on_free end
function Array{T}(::Clear_on_free, size...) where T
       rv = Array{T}(uninitialized, size...)
       finalizer(rv) do rv ccall(:bzero, Nothing, (Ptr{Nothing}, Cint), pointer(rv), sizeof(rv)) end
       rv
end

or, occasionally useful for branch-free pushing (get a giant chunk of virtual memory that is materialized by the kernel on-demand, needs the finalizer hack in order to not play havoc with julia's memory accounting but then you can just write to the array and TLB/pagefault/kernel does the resizing for us)

struct Unaccounted_calloc end
function Array{T}(::Unaccounted_calloc, size...) where T
       @assert isbits(T)
       sz = prod(size)
       Aptr = ccall(:calloc, Ptr{Nothing}, (UInt64,UInt64), sz, sizeof(T))
       Aptr !== C_NULL || error("alloc failure")
       rv = unsafe_wrap(Array, convert(Ptr{T}, Aptr), size; own = false)
       finalizer(rv) do rv ccall(:free, Nothing, (Ptr{Nothing},), Aptr) end
       rv
end
ar=Array{Int}(Unaccounted_calloc(), 1<<30);
@time ar=Array{Int}(Unaccounted_calloc(), 1<<30);
# 0.000121 seconds (18 allocations: 800 bytes)

@mbauman
Copy link
Member

mbauman commented Mar 7, 2018

Why would we use @noinline here? Even if we did:

julia> struct Foo{T,N}; end;

julia> @noinline Foo{T}(;size=0) where {T} = Array{T,length(size)}(uninitialized, size...)

julia> @btime Foo{Int}(size=$((1,2)))
  25.948 ns (1 allocation: 96 bytes)
1×2 Array{Int64,2}:
 140227429422112  0

julia> @btime Array{Int}(uninitialized, $1, $2)
  25.267 ns (1 allocation: 96 bytes)
1×2 Array{Int64,2}:
 140227341983856  140227337813616

Some optimizations can only be performed if you call it from a compiled context — e.g., use f(y) = foo(x=y) and then introspect f instead of foo. That'll be more representative of real code.

@KristofferC KristofferC force-pushed the kc/short_but_still_with_great_dignity branch from 487991b to 3d1d014 Compare March 7, 2018 09:54
@KristofferC KristofferC changed the title rename uninitialized to uninit rename uninitialized to undef Mar 7, 2018
@KristofferC KristofferC force-pushed the kc/short_but_still_with_great_dignity branch from 3d1d014 to ee6203d Compare March 7, 2018 09:56
@KristofferC
Copy link
Member Author

Changed from uninit to undef based on the democratic concept of emoji-voting (#26316 (comment)).

@KristofferC KristofferC force-pushed the kc/short_but_still_with_great_dignity branch from ee6203d to ef6a313 Compare March 7, 2018 10:58
@innerlee
Copy link
Contributor

innerlee commented Mar 7, 2018

unknown, unsure

@JeffBezanson
Copy link
Member

undef is ok but it might be confusable with "null pointer", e.g. thinking that x = undef will make x an undefined variable. Maybe we can mitigate that with the name of its type, e.g. calling it UndefinedContents or UndefInitializer or the like.

@JeffBezanson
Copy link
Member

Triage thinks we can go with undef, but print it as something like initializer with undefined values.

@KristofferC
Copy link
Member Author

Ok, will update PR with new show method and the type to UndefInitializer

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Mar 8, 2018
@KristofferC KristofferC force-pushed the kc/short_but_still_with_great_dignity branch from 070a9dd to e1efa94 Compare March 9, 2018 08:32
@martinholters
Copy link
Member

I'm still uneasy with the similarity of undef and #undef. Isn't it too easy to expect e.g. new(undef, x) to leave the first field (of whatever is being instantiated here) undefined in the #undef sense?

@KristofferC
Copy link
Member Author

How did a user find undef? Presumably by looking at some docs. It is pretty clear what it is used for. Trying this anyway:

julia> struct F x
       F() = new(undef)
       end

julia> F()
F(array initializer with undefined values)

Also clear it is not #undef.

@martinholters
Copy link
Member

julia> Vector{Array}(undef, 1)
1-element Array{Array,1}:
 #undef

julia> Ref{Array}()
Base.RefValue{Array}(#undef)

julia> Ref{Array}(undef)
ERROR: MethodError: Cannot `convert` an object of type UndefInitializer to an object of type Array

This looks a little less confusing with uninitialized, IMO. (And we might consider deprecating Ref{T}() to Ref{T}(::Uninitialized), solving this particular case, but you get the idea.)

@KristofferC
Copy link
Member Author

This looks a little less confusing with uninitialized, IMO.

Sure but the whole point here is that people hate to write uninitialized. Yeah, this might come at the cost that someone will try something like your example, it doesn't work, they look up the docs for what they are actually using, and get enlightened.

@JeffBezanson JeffBezanson added this to the 1.0 milestone Mar 9, 2018
@KristofferC KristofferC merged commit 92879ea into master Mar 10, 2018
@KristofferC KristofferC deleted the kc/short_but_still_with_great_dignity branch March 10, 2018 08:35
Keno pushed a commit that referenced this pull request Jun 5, 2024
* uninit is dead, long live undef

* UnInitialized -> UndefInitializer + show method

* also deprecate Uninitialized
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.