Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go2: read-only and immutability #29192

Closed
JavierZunzunegui opened this issue Dec 12, 2018 · 35 comments
Closed

proposal: Go2: read-only and immutability #29192

JavierZunzunegui opened this issue Dec 12, 2018 · 35 comments
Labels
FrozenDueToAge Proposal v2 An incompatible library change
Milestone

Comments

@JavierZunzunegui
Copy link

JavierZunzunegui commented Dec 12, 2018

Abstract

This document presents a solution on how to introduce read only and immutability support in go2. It is split into two parts:

  • A read-only language change that provides the expected compile-time guarantees with limited additional complexity or inconvenience to developers.
  • How immutability may be implemented on top of these language change, transparently to developers.

This document and it's ideas build up on the Go 2: read-only types proposal by Jonathan Amsterdam, which is referred to as 'the original' throughout. It focuses on the two points above while deliberately omitting the more general questions 'why is read-only needed?', 'how is read-only different to immutability?', 'why a read-only and not an immutable keyword?', etc. These are discussed in the original proposal and follow up comments and don't need to be repeated here. I encourage anyone reading this proposal to also look at this previous one for additional context, although this one should be understandable on its own.

Other posts from which I may borrow or get inspiration are Evaluation of read-only slices, Go - Immutability or Why Const Sucks.

Language changes

The ro keyword

The fundamental change in this proposal is the ro (read-only) keyword, taken in almost the same form as in the original proposal. Some highlights that still hold:

Permissions are transitive: a component retrieved from a read-only value is treated as read-only.

There is an automatic conversion from T to ro T.

This proposal is concerned exclusively with avoiding modifications to values, not variables.

A value of read-only type may not be immutable, because it may be referenced through another type that is not read-only.

On the other hand, the ideas around 'Permission Genericity' in the original proposal are dropped (as well as the ro? syntax).

The meaning of ro applied to interfaces is also subtly changed. In the original it is described as:

If I is an interface type, then ro I is effectively the sub-interface that contains just the read-only methods of I.
If type T implements I, then type ro T implements ro I.

In this proposal the meaning is slightly different - for an interface I, ro I contains a ro T for T implementing I. If type T implements I, then type ro T implements ro I, and vice versa.
(edited)
Additionally, if T implements I and all methods in I have a ro receiver in T, then ro T also implements I.

Anonymous functions are not mentioned in the original proposal, but this one regards them as a write operation and therefore not callable in ro form.

relaxed typing rules for ro qualifiers

The introduction of the ro keyword is intended to bring all the compile-time read-only guarantees at the minimum cost to developers. To achieve it, this proposal departs from the original one and establishes that the typing rules for ro-qualified types are somewhat relaxed, as in many contexts they are interchangeable with their non-ro form. Conceptually, ro-qualifying introduces compile time restriction on what operations are allowed but are largely indistinguishable at runtime.

It is a given a T may be automatically converted to ro T, but not in reverse. Similarly special allowances are introduced for functions and methods, not otherwise allowed in go.
In more detail:

Any ro-qualified argument in a function may be given a non-ro argument:

func foo(ro X) {...}
x := X{}
foo(x) // legal

Any method requiring a ro-qualified receiver may be called on a non-ro receiver:

type foo struct {}
func (_ ro foo) Bar() {...}
f := foo{}
f.Bar() // legal

Any function may be converted to another similar function with some ro-qualifying argument(s) removed:

func foo(_ ro X) {...}
f := foo          // type func(ro X)
var _ func(X) = f // legal, type func(X)

Any method with a ro receiver or argument may implement a method without it:

type foo struct {}
func (_ ro foo) Bar1() {...}
func (foo) Bar2(_ ro X) {...}

type I interface {
    Bar1()
    Bar2(X)
}
var _ I = foo{} // legal

Similar rules apply to return values - a function or method with a non-ro return value can be converted or used to implement one where the return value is ro-qualified. The same rules apply to maps, arrays, slices, channels, i.e. []X may be used as an argument to []ro X, etc. Overall, the principle is that inputs may be relaxed to permit more general forms (non-ro), and outputs may be tightened to produce more specific forms (ro).

By having ro-qualifying not produce fully independent types the need for duplicate implementations (ro and non-ro forms) is averted and the read-only can be supported with very little additional complexity or work by the developer.

Note this is not overloading as only a single function or method may exist with a given name, and the same implementation will be used regardless of the ro-ness of the arguments.

To support this language changes the linker must be relaxed to allow linking when the function types differ only in ro-ness as described above.

Interface type assertion and ro

Fundamentally, the above ro syntax allowing seamless conversions around ro-ness relies on the runtime form of a type X being identical to that of ro X. This is trivially true for non-interface types, as in the go model and unlike many other languages the type is not embedded in their runtime value. This can even be trivially extended to interfaces with regards to methods being called - calling a method on a ro I can be implemented identically to calling the same method on I. The itab can be reused and the same function will be called regardless, while non-ro receiver methods can't be called in the first place so need no special handling. However, type assertion does not immediately follow. Consider the below:

i := interface{}(X{})  // represented in runtime as {type of X, value}
j := ro interface{}(i) // type ro interface, has same runtime representation as above {type of X, value}. The value within it though is not X but ro X

To satisfy this, asserting of ro-qualified interfaces to non-ro qualified struct is illegal. When asserting a ro I to a ro-qualified struct ro X the the runtime must confirm the runtime type within the interface is X, even though it is returning a ro X. For example:

i := interface{}(X{})  // X in an interface
j := ro interface{}(i) // ro X in a ro interface, represented in memory as {type of X, value}
_, _ = j.(X)           // illegal, ro interface can't be a non-ro struct
_. ok := j.(ro Y)      // legal but false, the runtime type in the interface is X not Y
_. ok := j.(ro X)      // legal and true, the runtime type in the interface is X which is implicitly ro X

The assertion of ro-interfaces to other ro-interfaces is fundamentally the same as that of non-ro interfaces, just in ro space:

x := X{}           // assume X implements interfaces I1 and I2, but not I3
i := ro I1(x)      // type ro I1, represented in memory as {type of X, value}
_, ok := i.(ro I3) // legal but false, X does not implement I3
_, ok := i.(ro I2) // legal and true, X does implement I2

(edited)
There is also the case ro T implementing I(which occurs when T has ro-receivers on all methods required by I), which means I is implemented by both the ro and non-ro form. The fundamental property they must guarantee is that a ro T assigned to one such I may not be asserted back to a T, nor be asserted to any interface a ro T would not satisfy. Since both T and ro T may be contained by such an I and yet the distinction must be made by the runtime, they must be represented differently within the interface's type value at runtime and interpreted correctly while asserting. For example:

// Foo has non-ro receivers
type Foo interface {
    Bar()
}

// foo1 implements Foo and ro Foo
type foo1 struct{...}
func (_ ro foo) Bar() {...}

// foo2 implements Foo but not ro Foo
type foo2 struct{...}
func (_ foo) Bar() {...}

var f Foo

f = foo2{}
_, ok := f.(ro foo2) // illegal, does not implement it
f = ro foo2{}        // illegal, does not implement it

f = foo1{}          // foo1 in a Foo, represented in memory as {type of foo, value}
_, ok := f.(ro foo1) // legal but false, runtime type is foo
_, ok := f.(foo1)    // legal and true, runtime type is foo

f = ro foo1{}       // ro foo in a Foo, interface representation in runtime is {type of ro foo, value}
_, ok := f.(ro foo1) // legal and true, runtime type is ro foo
_, ok := f.(foo1)    // legal but false, runtime type is ro foo

To support this language changes, the runtime must be modified to correctly implement this new type asserting rules.

Remarks

The empty interface

The empty interface interface{} still matches everything and can always be asserted back to the exact type that was assigned to it. No change in that respect, it can still be used as a reversible wildcard type match.

Migration

(edited)
These changes, in it's rawest form, are backwards compatible in that existing go code will work for it.
However many methods and types (and some interfaces, though those may be rarer) should be modified to reflect the new ro functionality. A key aspect though is that they could be migrated somewhat gradually with modest (if any) breaking changes at each step. In particular note that there is no cost in making the arguments of functions ro when not in breach of read-only rules in the first place (including receivers for methods). This way, the methods and functions lowest in the stack can be migrated first, and then those relying on those, etc. gradually progressing up the stack. Interfaces will not normally require an update (as ro types may implement non-ro interfaces), though in some cases they may want to be updated regardless (such as for immutability purposes), although that will likely result in backward incompatible changes.

Code relying on type assertion may also need re-visiting given interfaces formerly using T may be migrated to be ro T instead and assertion on T will no longer work.

The case for mutable keyword

Adding ro can be a very limiting restriction and an exception to the rule may be wanted, such as a mut (mutable) keyword. This is done in some other languages with similar features. I believe adding this is a bad idea but it is worth debating given its implication for establishing best practices around ro.

ro []ro T vs ro []T

For slices I have taken the ro syntax of the original proposal, namely []ro T to mean 'slice of read-only Ts' and ro []T to mean 'read-only slice of read-only Ts'. Alternatively, ro []ro T could take that meaning, and ro []T becomes 'read-only slice of Ts'. While more verbose, this syntax is more expressive and powerful. I believe either this or comparable solutions should be explored, and possibly also for maps and arrays.

Immutability

Transparent feature on top of ro

Consider the following:
IF the compiler can assert all references to a ro variable are also ro themselves, it can treat the variable as immutable

Note here the reference to compiler - it is not the developer who decides if a ro variable is, or even might be, immutable, but the compiler. The immutable property is completely transparent to developers and is irrelevant in all other regards other than performance - the logic of a method is identical regardless. To achieve this, the compiler must effectively do escape analysis on ro's - any non ro-escaping ro variable can be treated as immutable, but one that ro-escapes can't. This is very similar to go's heap escape analysis - whether a variable ends up in heap or stack - which is up to the compiler and irrelevant for all (non-performance) purposes to the developer.

In other languages immutability often means the memory is only written to once (during initialisation) and remains unchanged until reclaimed (const in C, final in java, etc). This is not the case in this proposal - non-immutable memory may become immutable after it has been initialised or even modified, and mutable ro variables may coexist and reference the same memory as immutable ro variables (though no non-ro variables). In this sense immutable ros are not like const (in the current go sense, not referring to C-style const).

Initialisation

Initialising immutable ros (via ro return statement)`:

// an immutable ro
func foo() /* im */ ro []int {
  out := []int{1}      // not immutable, not even ro
  out = append(out, 1) // not immutable or ro (and in fact is being modified!)
  return out           // ro AND immutable, despite having been non-immutable (and non-ro) before. Has NOT been deep-copied
}

// an immutable ro, which may be referenced by another ro (mutable or not, irrelevant here)
func bar() /* im */ ro []int {
  // note ro argument
  auxFunc := func(x ro []int) {...}

  out := []int{1} // not immutable, not even ro
  auxFunc(out)    // out escapes, but in ro form
  return out      // ro AND immutable, despite having been non-immutable (and non-ro) before and being referenced by another ro. Has NOT been deep-copied
}

// ro but non-immutable, may have non-ro references remaining
func nonImmutable() /* mut */ ro []int {
  auxFunc := func(x []int) {...} // note non-ro argument

  out := []int{1} // not immutable, not even ro
  auxFunc(out)    // out escapes in non-ro form
  return out      // ro but NON immutable as it may be referenced by a non-ro. Has NOT been deep-copied
}

Initialising immutable ros (via non-ro return statement and casting afterwards):

// non ro-escaping return value
func foo() []int {
  out := []int{1}
  return out // not ro but non ro-escaping
}

var _ /* im */ ro []int = foo() // immutable as return value not available after assignment

x := foo()
var _ /* mut */ ro []int = x // mutable as x is ro-escaping this assignment
// some non-ro use of x

// ro-escaping return value
func bar() []int {
  auxFunc := func(x []int) {...}

  out := []int{1}
  auxFunc(out) // causes ro-escaping
  return out   // not ro and ro-escaping
}

var _ /* mut */ ro []int = bar() // mutable as return value is already ro-escaping

Implementation

As stated before, ro is primarily a means of restricting what operations are allowed, but is virtually irrelevant at runtime. Immutability is the exact opposite, it is transparent to the developer but modifies how the code is executed. For this reason, while ro-qualified types are separate types, immutability is not. Instead it is an optimisation built into the binary, whereby function calls with inputs known by the compiler to be immutable may be linked to alternative, optimized implementations of the same function. This is similar to overloading in that a single function may have multiple implementations, but differs in that it is not the developer but the compiler that picks which one is used.

I am not certain my understanding about the go runtime, compiler and linker are entirely correct here, please share if you know better.

Some changes to both the compiler and the linker would be needed to support this 'immutability from read-only' feature'. In particular methods with ro inputs (methods that have some ro argument or receiver) would require the compiler creating multiple implementations, potentially one for each ro combination. It would then be up to the linker to decide which is called based on the immutability of the arguments and target (as known at compile/link time). For example:

func foo(ro []int) {...}

would be compiled in 2 separate ways, identical in all ways except the immutable one may include the additional optimisations:

/* compiler pseudo-code, not actual go syntax */
// compiler: foo#0 func(mut ro []int) {...}
// compiler: foo#1 func(im ro []int) {...}

It would be up to the linker to link to the right one:

foo([]int{1}) // uses immutable implementation foo#1

x := []int{1}
foo(x) // uses mutable implementation foo#0
// some other non-ro use of x

This property must extend to interfaces. There will be nothing in the runtime value of an interface that states it's immutability state, hence the compiler/linker must be the one that decides again. I think this can be done by modifying the use of itab (or a similar new type, say itab2) to support not just multiple types but also multiple mutability-status on these types. When a ro input method is called on an interface the type resolution getitab (or a similar new method, say getitab2) will be called with some additional argument that denotes it's immutability state, imState. That way it can resolve the correct method to call, including immutability status. For example:

type foo struct {}

/* compiler pseudo-code, not actual go syntax */
// compiler: foo:Bar#0 func(mut ro []int) {...}
// compiler: foo:Bar#1 func(im ro []int) {...}
func (foo) Bar(ro []int) {...}

type foo2 struct {}

/* compiler pseudo-code, not actual go syntax */
// compiler: foo2:Bar#0 func(mut ro []int) {...}
// compiler: foo2:Bar#1 func(im ro []int) {...}
func (foo2) Bar(ro []int) {...}

type Foo interface {
    // compiler: itab2 [imState][type]
    //           itab2 [0][0] = foo:Bar#0
    //           itab2 [0][1] = foo2:Bar#0
    //           itab2 [1][0] = foo:Bar#1
    //           itab2 [1][1] = foo2:Bar#1
    Bar(ro []int)
}

f := Foo(foo{})

// immutable
x := ro /* im */ []int{1}

// immutable input => imState=1.
// imState is a new argument to getitab2 (otherwise equal to getitab)
// the value of imState known at compile time and is different constant for each separate form of a ro input method (#0, #1 in my syntax).
// the below interface therefore calls
// getitab2(*interfacetype, *_type, bool, imState=1)
// which then resolves to the immutable form of the specific method, i.e. either foo:Bar#1 or foo2:Bar#1
// given the type of f (this is still, of course, evaluated at runtime) it calls foo:Bar#1, the immutable form of the method
f.Bar(x)

Note that the below only has one ro argument, but of course methods may have multiple ones, and the receiver itself may be ro, sogetitab2 must support different values for imState. How best to define those I leave as a future task and should be explored more carefully.

Remarks

How meaningful are the performance gains?

The purpose of immutability is to provide performance gains by avoiding unnecessary memory reads. How significant those gains are is not obvious, though. It should be investigated before commiting to this changes.

Binary bloating

If all the above changes are implemented, these may have the unintended consequence of bloating the binary as functions may have multiple implementations. Immutability is, however, at the compiler/linker's discretion and so it may decide to drop immutable implementations of methods that do not get called often or are otherwise large and provide little gain, and instead use the mutable forms.

Compile time impact

Build time of individual packages may be negatively affected as multiple implementations may need to be built for a single method, as well as an additional step of escape analysis of ro. Longer build time for binaries will inevitably follow.

Benchmarking and immutability

Given immutability is transparent to the developer but improves performance, it may be a double edge sword when running benchmarks. It is possible for a benchmark to be run on immutable inputs but then the binary does not (or vice-versa), present misleading results with little visibility to the developer.
Some risk mitigation best practices should be explored.

Immutable parameters

So far immutability has covered arguments to functions, but it should also cover parameters held within other types. For example, in the below, can parameter t be considered immutable?

type X struct {
    t ro T // immutable or mutable?
    ...
}

It is generally impossible for the compiler to identify where such X was initialised or modified, and therefore must assume the parameter is mutable. An exception can be made when all writes to such parameter are known to set immutable values, in which case the parameter can be considered immutable. This can only realistically be done for private members, and even then while the parameter may have an immutable value, it may still be changed for another immutable value. For example:

type X struct {
    t ro []int // immutable, because is private and all writes in the package set an immutable rp []int
    ...
}

x := X{
    t: []int{1,2} // t set to immutable slice
}
func(_ ro []int) {...} (x.t)  // called in it's immutable form, x.t is immutable
func(_ ro *[]int) {...} (x.t) // called in it's mutable form, &x.t is mutable
x.t = []int{1} // legal, t is re-set to another immutable slice

This makes the immutability of parameters a fairly limited property. One way to make it more powerful would be to allow ro to be applied to parameters in the following way:

type X struct {
    ro t []int // this would mean a read-only []int which can't be overwritten with different ro []int value. Also &x.t can now be considered immutable
    ...
}

For reference, this is similar to 'avoiding modifications to values, not variables' mentioned in the original proposal, but applied to parameters not variables.

Although this would also require restrictions on how dereference is done in go, i.e.

x := X{
    t: []int{1,2} // t set to immutable slice
}
xPtr := &x  // is still type *X, not ro *X
*xPtr = X{} // has to be made illegal, this overwrites x.t

This would be a significant change to the language with many breaking changes, so should not be seen as a requirement of this proposal.

Immutability Check

There may be situations when the runtime wants to validate the immutability state of a variable. A public method may be added to the runtime package for this purpose:

package runtime

/* compiler pseudo-code, not actual go syntax */
// compiler: IsImmutable#0 func(mut ro interface{}) bool {return false}
// compiler: IsImmutable#1 func(im ro interface{}) bool {return true}
func IsImmutable(ro interface{}) bool

The special thing about this method is that, while for all other methods immutability is considered merely a performance feature, in this the output actually changes. The mutable form of the function returns false, the immutable true.

This allows for the syntax below:

func  ToImmutable(x ro []int) /* im */ ro []int {
    if runtime.IsImmutable(x) {
        // input is immutable, no change needed
        return x
    }

    // input not immutable, deep-copy it
    copied := make([]int, len(x))
    copy(copied, x)
    return copied
}

Which in turns gives the option to actually enforce immutability to any given API (at the cost of deep coping in the inputs aren't immutable). Also note that the output of IsImmutable will be known at compile time anyways, so a smart compiler will inline it and simplify methods like ToImmutable.

For the same reason it could be added as a line directive which will fail compilation when it doesn't hold, say:

//go:immutable
var _ /* im */ ro []int = x

This could in turn provide the same (if more verbose) benefits of other proposals introducing immutable types. While I would make the case that this directive shouldn't be abused, it does ensure that there is support for it for the few times immutability is a serious concern for an API.

string vs ro []byte

An immutable ro []byte is almost identical to a string, and opens up for conversion between one and the other without any need for memory allocation and deep copying:

// string to ro []byte
s := "Hello World" // type string
_ = ro []byte(s)     // type ro []byte, is immutable. Does not require deep copying, the SliceHeader can use the same Data as that held by the string

// ro byte to string
imB := ro []byte{'a'} // immutable, ro []byte
_ = string(imB)       // string, does not require deep copying as the  input was immutable
mutB := /* assume some mutable ro []byte*/
_ = string(mutB)     // string, does require deep copying as any other []byte would in current go

This makes the conversion between string and ro []byte effectively free, and that from ro []byte to string potentially free, but depends on context.

Note this exact comparison is discussed extensively in Evaluation of read-only slices

Conclusion

Introducing read-only types to go2 has multiple advantages for go development in the form of clearer APIs and safer code. I believe the performance gains of immutability is also a feature of read-only types, even if it hadn't been recognised as such before. The ideas introduced or expanded in this proposal are not complete, detailed or final, but target much of the concerns from the community and go project maintainers around introducing read-only types. Overall I hope this will lead to new discussion and maybe tilt the balance towards having read-only support added to go2.

@gopherbot gopherbot added this to the Proposal milestone Dec 12, 2018
@rsc rsc added the v2 An incompatible library change label Dec 12, 2018
@deanveloper
Copy link

I'm personally a fan of keeping read-only-ness as a property of function parameters rather than of types as a whole. We lose a little bit of flexibility but it makes sure that the Go type system stays nice and simple.

That being said, while I'm not exactly a huge fan, I'm not opposed to this either. I think some of it gets a bit complicated when we get into how these interact with interfaces. It seems to mostly be caused by polluting the simplicity of the Go type system.

I think it's quite clear what ro []Foo means at first glance, but it's not at all clear or intuitive what interface { ro Foo() } means.

@JavierZunzunegui
Copy link
Author

JavierZunzunegui commented Dec 12, 2018

but it's not at all clear or intuitive what interface { ro Foo() } means.

That would be 'has a method Foo that doesn't change the receiver'.

(updated)
If it is clear or not is debatable, what is worth highlighting is why it is needed - a ro interface must not access the non-ro methods on the struct that implements it, and banning all method calls on a ro interface is way too restrictive (and effectively puts interfaces in conflict with ro feature).

The point being, removing just ro from interfaces is not an option. Either more features of the proposal are removed to the point of it not being needed, or it is kept (or some other solution is found that provided the required read-only guarantees)

@deanveloper
Copy link

deanveloper commented Dec 12, 2018

Oh I'm aware it's needed, I just don't think it's clear. It looks like the ro applies to the whole function rather than just the receiver. ro applying to the whole function of course doesn't make sense if you know how the ro modifier works, but if you don't know, it's definitely not apparent.

For instance this may look confusing to someone:

type Reader interface {
    ro Read([]byte) (int, error)
}

If you don't know what ro means in the context of the interface, it looks a lot like it could mean...

  • The function cannot modify any kind of outside state (it's a "read only function")
  • Syntactic sugar for all arguments are read only (doesn't make sense for a reader, we need to modify the []byte)
  • It's some kind of crazy magic

But either way it's just syntax, so if there is a better way to do it, it should be easily changable. Doesn't hurt the underlying concept behind the proposal.

@deanveloper
Copy link

I think some of it gets a bit complicated when we get into how these interact with interfaces.

Also by this, I didn't only mean with the example I gave. Just the general logic (which is very sound if I may add, just confusing) behind which types can implement what interfaces can get very confusing, and as I said, this is because by adding this feature, we would be adding a layer of complexity on Go's type system.

Again I'm not super against this proposal or anything, just stating my main criticisms.

@zigo101
Copy link

zigo101 commented Dec 22, 2018

@JavierZunzunegui
In your last example, rox.Bar() is invalid for the type ro X doesn't has the method Bar.
The Bar method is declared for type X, not ro X. The method set of X should be super set of ro X, which means for every method declared for ro X, an implicit same prototype method will be declared for X, but not vice versa.

@JavierZunzunegui
Copy link
Author

@go101 thanks for spotting, updated above

@ianlancetaylor
Copy link
Member

I'm not clear on how this works for interfaces. Can I store an ro-qualified value into an ordinary interface type? The comment that anything is permitted in a interface{} suggests that this is permitted. When I type assert on that value, do I have to type assert back to an ro-qualified type?

How does this work with the reflect package?

As you say, this is quite similar to #22876. Is there a reason to discuss it separately?

@JavierZunzunegui
Copy link
Author

JavierZunzunegui commented Jan 9, 2019

Can I store an ro-qualified value into an ordinary interface type?

As the proposal stands, no (assuming ordinary interface = 'has some non-ro receiver method'). It is allowed for interfaces where all methods have ro receiver. The empty interface in this sense is non-ordinary as it has no methods, accepting any T or ro T.

Reading between the lines in your question though, you may have a point that restriction is excessive. Instead the correct rule may be: ro T may implement I (non-ro) if all methods in I are present in T with a ro receiver.

I will double check this statement is correct later today when I have more time and update the proposal accordingly.

(edited)
The condition was indeed too restrictive at first, I updated the document accordingly. The answer now is:
yes, if the type has a ro receiver on all the methods required by the interface.

@JavierZunzunegui
Copy link
Author

When I type assert on that value, do I have to type assert back to an ro-qualified type?

Yes, you must assert to the correct of T or ro T. I guess you could allow a non-ro to be type asserted as a ro (effectively doing assertion, then type conversion to ro), but I don't see much gain in that

@JavierZunzunegui
Copy link
Author

How does this work with the reflect package?

Have you got a more specific question with regards to reflect?

I have tried to explain in the doc, particularly in Interface type assertion and ro, how the ro-ness is represented in an interface value so that the runtime may honor the right ro-ness of a type (i.e. never un-ro it!).

The reflect package, not explicitly mentioned in the doc, would have to be updated to preserve this ro guarantees - conceivably making ro a bool field in Type, Value.CanSet false for ro types, etc.

@JavierZunzunegui
Copy link
Author

As you say, this is quite similar to #22876. Is there a reason to discuss it separately?

I still think there are still significant differences between the two, particularly my proposal blurs in some regards the distinction between ro and non-ro types to avoid code duplication and const-poisoning, and introduces immutability. Since const-poisoning in particular seems to be one of the mayor critiques to the original document, I hope this may qualify for renewed debate.

I think Jonathan himself states his proposal is more than anything a stepping stone towards the right direction. In this one I try to build on it and address the issues the community saw with his.

@ianlancetaylor
Copy link
Member

Can you expand on how this proposal avoids const-poisoning? It seems to me that it is necessary to use the ro qualifier everywhere. Where can it be avoided?

@JavierZunzunegui
Copy link
Author

Can you expand on how this proposal avoids const-poisoning? It seems to me that it is necessary to use the ro qualifier everywhere. Where can it be avoided?

Functions can be written with the most restrictive inputs with regards to ro-ness at no cost. For example, if a function Foo takes in a X but doesn't modify or have it escape the function, it can declare that input as ro X, making the intention clearer and, potentially, benefiting from immutability optimisations. When it comes to using Foo, it may be used interchangeably as either a function with input X or ro X. The same principle applies to methods and interface implementations, and similarly but inversely (the flexibility is in the interface, not the implementation) for ro in return values.

@JavierZunzunegui
Copy link
Author

Following my latest edit to the proposal (non-ro interfaces are implemented by ro types more leniently, with no restriction on the part of the interface and requiring the type to have ro receivers for the interface's methods), it may be worth to explicitly state the purpose of a ro receiver within an interface and a ro interface in general.

A ro receiver in an interface allows the method to continue to be called on the ro interface. This becomes non-trivial when the interface is part of a structure which has in turn been made ro and still needs the interface method be called. For example:

type FooIface interface {
  Bar()
  ro RoBar()
}

type FooType struct {
  i FooIface
}

var x = FooType{i: ...}
x.i.Bar() // legal
x.i.RoBar() // legal

var rx = ro x
rx.i.Bar() // illegal, Bar is not `ro` in `FooIface`
rx.i.RoBar() // legal, RoBar is `ro` in `FooIface`

Additionally, ro interfaces remain significant to make full use of the immutability features.

@deanveloper
Copy link

deanveloper commented Jan 10, 2019

When it comes to using Foo, it may be used interchangeably as either a function with input X or ro X. The same principle applies to methods and interface implementations, and similarly but inversely (the flexibility is in the interface, not the implementation) for ro in return values.

This does not fix the const poisoning problem. The problem comes from if I have a function that takes a ro parameter, then everything that the parameter is passed in to must also be qualified with ro. This means that I now must traverse all of my functions and make sure that they all have the proper ro qualifiers.

This also becomes a problem when using libraries, especially ones that were written in Go 1. I would not be able to use my ro types with old libraries because we cannot guarantee that the old library will treat it as a read-only.

@JavierZunzunegui
Copy link
Author

This does not fix the const poisoning problem

I don't think the issue you are describing is const poisoning. By limiting poisoning I mean that you are free to write your function with or without ro arguments, the caller may still call it without ro (and therefore your migration will not break callers). In that sense you are not poisoning the stack on top of your function.

What you are saying is that to migrate your function to ro in the first place you need dependencies down the stack to be ro, which is correct. By that isn't const poisoning, it just means you can't migrate all your code to ro on day one, it is a process thats starts at the lowest level of the stack and makes its way up. But throughout this process or even after it is complete, existing pre-ro code will still work, even if calling ro enabled functions (assuming we don't make deliberate breaking changes, which I'd imagine will be some).

See the Migration section on the proposal.

@deanveloper
Copy link

From this blog post (which is coincidentally written by Ian):

Although const qualified pointers are just documentation, in practice they are used far more often than const variables. When the C90 standard came out and C compilers started supporting const, programmers spoke of const-poisoning: the feeling that once you use the const qualifier anywhere, it spread throughout your program as it had to be tracked through all assignments and function calls to avoid compiler warnings. Adding const-poisoning to a program does not make it any more correct or reliable. It’s just documentation, albeit documentation that the compiler enforces.

What this is describing is that once you want to use it in a single place, you need to use it everywhere. We are already starting to see the same thing with "context poisoning", where we see context.Context being used everywhere. Again, this is because once you use it in one place, you need to use it everywhere down the stack. Also, for migration, I'd hate to see even more variations of every function (with Foo, FooContext, and FooRoContext).

If you want to mitigate const-poisoning, you need to offer some way to pass the data from a variable that was originally ro to a function which takes a non-ro type. This of course sacrifices immutability, but it prevents the poisoning problem.

@JavierZunzunegui
Copy link
Author

@deanveloper I still think you are seeing poisoning in the wrong way. "context poisoning" is very real because it requires the changes to be made up the stack - the context must be passed to your function, which requires the caller to require it from its caller, etc. all the way up. The context case does not require the propagation down the stack (you can stop passing the context to lower functions when it is no longer needed). ro is opposite, the caller isn't affected, but the called methods are (recursively). This is a much smaller problem: it can be done gradually, does not require methods to be repeated and once building you don't have to worry about what code that depends on your function has been broken.

I'd hate to see even more variations of every function (with Foo, FooContext, and FooRoContext)

Following the point above, Foo could be given ro arguments and still remain named Foo and be used in the same way as it was before ro was introduced. Only when the return value is made ro is RoFoo (or a breaking change) required.

@deanveloper
Copy link

deanveloper commented Jan 10, 2019

My bad, when I referred to context poisoning you are correct that context traverses up the stack. But in order to make full use of context, it must also traverse down the stack as far as it can if your API was not originally designed to support it.

This is a much smaller problem: it can be done gradually, does not require methods to be repeated and once building you don't have to worry about what code that depends on your function has been broken.

It depends on your perspective here. In terms of migration, it cannot be done gradually. If I want to update my library to use ro, I need to update my entire library at once. Not to mention breaking changes if I need to return ro types.

On a separate note - we'd also want to update the standard library interfaces, yes? Something like type Writer interface { Write(ro []byte) (int, error) } would be preferable, but we cannot do this because we'd break all implementations of Writer. This is mentioned in your Migration section but seems to be glossed over in less than a sentence.

Other things we'd want to update but can't:

type Stringer interface { ro String() string }
type Formatter interface { ro Format(...) (...) }
// Other formatting interfaces...

type WriterTo interface { WriteTo(Writer, ro []byte) (int, error) }
// Other IO interfaces...

There really are a lot of interfaces that should be moved over, but can't be for backward compatability reasons. I think it's a mistake to simply gloss over this.

@JavierZunzunegui
Copy link
Author

In terms of migration, it cannot be done gradually.

It can be done somewhat gradually - update implementations before interfaces, update only some methods first, update dependency libraries before your library itself... No challenge on the ro return comment though, that is a breaking change.

we'd also want to update the standard library interfaces, yes?

Yes, and your examples are valid. Short answer is we would, over some time, introduce all these as breaking changes (go2 surely has some tolerance to breaking changes!). Again, it doesn't have to be done in one massive, all-breaking change but can be done gradually - say Stringer first, then Writer, etc. This must also be managed carefully to allow for easy gradual changes, for example move all implementations before the interface is changed and enforce it via linter/vet or by having both forms of the interface at one, separated by a build tag, to encourage users to guarantee they are ready for the move before it happens. But ultimately they are breaking changes and unmaintained go1 code would cease to build in go2.

Also worth mentioning I think another important interface to update would be error:

type error interface {
  ro Error() string // ro receiver
}

@ianlancetaylor
Copy link
Member

I do still feel that this approach is vulnerable to const-poisoning. One particular aspect of const-poisoning is that in practice, as the program changes, you sometimes need to change a value that was marked as ro. Then you need to remove the ro, and remove it from all callers, and all their callers, etc. Experience teaches us that if the type system has qualifiers, there will be times when those qualifiers have to be removed. This is a problem with type qualifiers in general, and is part of the reason why Go has only one type qualifier (the direction of a channel).

@JavierZunzunegui
Copy link
Author

Then you need to remove the ro, and remove it from all callers, and all their callers, etc.

Yes, but then again removing ro from an argument is a breaking change, it's effects (and causes) are not that different from say adding a new argument which may also need callers modified to take the extra argument, etc. The natural way to mitigate this is developing and promoting good ro practices - for example, don't add ro to interfaces or public methods carelessly just because they happen to not write yet, but OK in private methods. While ro can be misused (like const, final, ... in other languages) the go team and community can educate and encourage it's proper use.

Writing bad code or carelessly designed libraries is a bigger problem than read-only alone. My personal opinion is almost any feature can potentially be mis-used, limiting a language by that principle would leave us with a no features at all.

Also worth highlighting again the potential significance of read-only (not this proposal in particular) towards bringing about other higher level features in the future (such as data-race freedom as @jba argues in his document).

@ianlancetaylor
Copy link
Member

To summarize, I'm saying that practical experience shows that type qualifiers are painful in large programs. You are saying that people should adopt better programming practices. I think we are both right. But as a general rule the development of the Go language is guided more by practical experience than by aspirational hopes.

@JavierZunzunegui
Copy link
Author

But as a general rule the development of the Go language is guided more by practical experience than by aspirational hopes.

True, but perhaps I should have clarified what I mean by promoting good practices. If good ro practices are built into the tooling (into vet or some other similar tool), that would reduce the likelihood of the ro feature being misused in a harmful way (such as the poisoning example you presented). I think practical experience of tooling in go is positive (fmt, vet, lint, goimports are almost universal, right?), so in that sense we can be hopeful that, if protected by tooling, ro will not be misused often.

As a hypothetical example, the tool could discourage the ro qualifiers with least impact but most likely to cause conflicts down the line. For example all ro receiver interfaces, ro arguments with non-ro methods, ro arguments which escape the current package, etc. could cause a warning, suppressed by some special comment, i.e. // ro-vet: some explanation of why you are overruling the warning.

@JavierZunzunegui
Copy link
Author

To summarize myself, acknowledging the risks of ro (or any read-only proposal for that matter) being misused is sensible, but should perhaps be taken as a motivation to think ahead and take some risk mitigation measures rather than as cause for proposal dismissal.

@ianlancetaylor
Copy link
Member

Earlier I asked whether this could just be discussed with #22876. You suggested that this proposal was better than that one with regard to const-poisoning. Why is that? To me they same approximately the same in that regard.

@JavierZunzunegui
Copy link
Author

  • Same: removing ro from a function's argument breaks usages where the value is already ro
func foo() {
  x := ro []int{}
  bar(x)
}

func bar(ro []int) {...}  // removing ro here breaks under both proposals
  • Better: adding ro to a function's argument breaks nothing
func foo() {
  x := []int{} // non-ro
  func(f func([]int) {
    f(x)
  }(bar)
}

func bar([]int) {...}  // adding ro here is OK in this proposal, not in the other

// similarly (and probably more relevant) for interface implementation

From a poisoning perspective I think the difference can be summarized as:
#22876 assumes implicit conversion to ro for variables, this one also implicit conversion (to non-ro) of function arguments and to ro for return values.

@ianlancetaylor
Copy link
Member

Would it be correct to say that in this proposal there is an implicit conversion from func(ro T) to func(T), for any type T?

That seems like a modification that could be discussed in the context of #22876. It is an improvement but it doesn't seem very different.

@jba
Copy link
Contributor

jba commented Jan 15, 2019

If you drop permission genericity, how do you handle the problem it addresses? For example, how would you write

func tail(x ro? []int) ro? []int { return x[1:] }

@JavierZunzunegui
Copy link
Author

there is an implicit conversion from func(ro T) to func(T)

Yes, also from func() T to func() ro T and most significantly for interface implementation:

type Foo interface {
  Bar(T)
}

type foo struct {}
func (f foo) Bar(ro T) {...}

var _ Foo = foo{} // foo implements Foo despite not matching in ro

That seems like a modification that could be discussed in the context of #22876

Sure. These changes are the main distinction between the proposals, of course if they are added to the original they can be discussed there. I discuss these in more detail in relaxed typing rules for ro qualifiers. I think the distinction around interfaces and type assertion is small but relevant.
Immutability is unrelated but can be applied to both proposals regardless (and in fact I already posted those ideas in #22876 (comment)).

@JavierZunzunegui
Copy link
Author

If you drop permission genericity, how do you handle the problem it addresses?

With two separate methods (tail(x []int) []int and roTail(x ro []int) ro []int). It is unfortunate duplicity, but I think preferable..

If we had to write two variants of every function like this, the benefits of read-only types would be outweighed by the pain they cause.

I don't think these situations are common. ro? is only really relevant if the variable is returned as a ro?

  • how likely (or sensible) are functions func(ro? T), func(ro? T) T or func(ro? T) ro T? Why would the logic of these functions be different if the input is ro or not? It seems to me to distinguish the two is a code smell, and ro? is enabling it.
  • functions following the ro? return pattern, i.e. func(ro? T) ro? T are mainly getters or small functions like the example one. Writing them twice is not a big deal in exchange for clarity and simplicity.

Functions as variables with ro? are problematic. I discuss it in #22876 (comment), but x := tail means x is type func(ro? []int) ro? []int. That means ro? is a full type (not simply a keyword to help developers create 2+ functions at once), and then there are ro??, ro???, etc. which make that even more complex.

Also as you have already highlighted in Subsumed by Generics, ro? would not be needed in generics existed. Since the hope exists go will one day have generics it seems problematic down the line to introduce ro?.

On the more opinionated side, I see the purpose you want to address with ro? but find the feature too intrusive for little gain. If writing a few methods twice is that inconvenient we can always create a simple code generation tool that effectively does the same thing without changing the whole language (with separate names though, no genericity).

@jba
Copy link
Contributor

jba commented Jan 16, 2019

I think the need for ro? is greater than you think. The following functions from the bytes package would need it:

  • Fields, FieldsFunc
  • Split, SplitAfter, SplitAfterN, SplitN
  • TrimXXX (9 functions)

These are not "mainly getters or small functions".

@JavierZunzunegui
Copy link
Author

The above can do without ro? just fine:

// common private auxiliar method
func trim(s ro []byte, cutset string) (int, int) {...}

// exactly as before
func Trim(s []byte, cutset string) []byte {from, to := trim(s, cutset); return s[from:to]}

// We should agree on a standard for ro-version of such functions - suffixed by 'Ro' for example
func TrimRo(s ro []byte, cutset string) ro []byte {from, to := trim(s, cutset); return s[from:to]}

SplitXXand FieldsXX functions can be done similarly with some 'next' loop.

While of course we have new functions with separate names, I actually find this positive - go is very explicit and not introducing genericity (even in this very limited form) keeps the language easy to understand.

These are not "mainly getters or small functions".

They can be seen as small functions wrapping a larger, private, common function.

Either way, for the few situations when there is no easy implementable common function to call or perhaps performance-wise a different approach is preferred, a simple code generation tool should be developed that can easily generate 2 versions out of one.

Does this not make ro? an very niche and arbitrary feature?

@deanveloper
Copy link

deanveloper commented Jan 16, 2019

It's also noteworthy that ro? is useless if the generics draft is adopted

@ianlancetaylor
Copy link
Member

I'm going to go ahead and close this in favor of #22876. While this one is different, the differences are small, and we can discuss the ideas here on that issue.

@golang golang locked and limited conversation to collaborators Jan 22, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Proposal v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests

7 participants