diff --git a/proposals/p2006.md b/proposals/p2006.md index 5a9c194a8f2ec..63843c232f97c 100644 --- a/proposals/p2006.md +++ b/proposals/p2006.md @@ -13,52 +13,1155 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception ## Table of contents - [Problem](#problem) -- [Background](#background) -- [Proposal](#proposal) -- [Details](#details) -- [Rationale](#rationale) + - [Conceptual integrity between local variables and parameters](#conceptual-integrity-between-local-variables-and-parameters) +- [Proposal summary](#proposal-summary) +- [Background, context, and use cases from C++](#background-context-and-use-cases-from-c) + - [`const` references versus `const` itself](#const-references-versus-const-itself) + - [Pointers](#pointers) + - [References](#references) + - [Special but critical case of `const T&`](#special-but-critical-case-of-const-t) + - [R-value references and forwarding references](#r-value-references-and-forwarding-references) + - [Mutable operands to user-defined operators](#mutable-operands-to-user-defined-operators) + - [User-defined dereference and indexed access syntax](#user-defined-dereference-and-indexed-access-syntax) + - [Member and subobject accessors](#member-and-subobject-accessors) + - [Non-null pointers](#non-null-pointers) + - [Syntax-free dereference](#syntax-free-dereference) +- [Detailed proposal](#detailed-proposal) + - [Immutable views of values](#immutable-views-of-values) + - [Function parameters](#function-parameters) + - [Immutability and addresses](#immutability-and-addresses) + - [Copies](#copies) + - [Comparison to C++ parameters](#comparison-to-c-parameters) + - [Variables introduce L-Values with mutable storage](#variables-introduce-l-values-with-mutable-storage) + - [Pointers for indirect, potentially mutating objects](#pointers-for-indirect-potentially-mutating-objects) + - [Operators overloading through implementing interfaces](#operators-overloading-through-implementing-interfaces) + - [User defined pointer-like types 
and dereference operations](#user-defined-pointer-like-types-and-dereference-operations) + - [Indexed access syntax](#indexed-access-syntax) + - [Ephemeral sequences](#ephemeral-sequences) + - [Return values and accessors](#return-values-and-accessors) + - [`const` and thread safe interface distinction](#const-and-thread-safe-interface-distinction) + - [Lifetime overloading](#lifetime-overloading) + - [Syntax-free dereference](#syntax-free-dereference-1) +- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals) - [Alternatives considered](#alternatives-considered) + - [Immutable value escape hatch](#immutable-value-escape-hatch) + - [References in addition to pointers](#references-in-addition-to-pointers) + - [Automatic dereferencing](#automatic-dereferencing) + - [Exclusively using references](#exclusively-using-references) ## Problem -TODO: What problem are you trying to solve? How important is that problem? Who -is impacted by it? +There are a collection of intertwined design questions around declaring +variables, function parameters, providing functionality similar to `const` in +C++, pointers, and references. Examining these in isolation is difficult because +of their complex interactions. -## Background +While this document includes a specific proposal, it importantly includes laying +out the motivations for that proposal. The motivation includes both the use +cases and tradeoffs between those use cases, as well as challenges of some of +the alternatives. -TODO: Is there any background that readers should consider to fully understand -this problem and your approach to solving it? +### Conceptual integrity between local variables and parameters -## Proposal +Two of the most fundamental refactorings in software engineering are _inlining_ +and _outlining_ of regions of code. These operations introduce or collapse one +of the most basic abstraction boundaries in the language: functions. 
When +performing this refactoring, there is a need to translate between local +variables and parameters in both directions. In order to ensure these +translations are unsurprising and don't face significant expressive gaps or +behavioral differences, it is important to have strong semantic consistency +between local variables and function parameters. While there are some places +that these need to differ, there should be a strong overlap of the core +facilities, design, and behavior. -TODO: Briefly and at a high level, how do you propose to solve the problem? Why -will that in fact solve it? +## Proposal summary -## Details +A high-level overview of this proposal: -TODO: Fully explain the details of the proposed solution. +- Provide a dedicated, efficient, and simple default model for immutable + function input parameters, local variables. + - For example, `fn Print(x: i32);` rather than either + `void Print(const int &x);` or `void Print(const int x);`. +- Make variables that form [L-Values](https://en.wikipedia.org/wiki/L-value) + and mutable storage explicit and available even as function parameters. +- Use pointers for indirect access, including output or in/out function + parameters. + - L-Values exist outside the type system and are formed by local variables + or dereferencing a pointer. +- Define operators and other expression syntaxes in terms of a syntactic + rewrite into interface member function calls. + - For example: `x + y` becomes `x.(Addable(typeof y).Add)(y)`. +- Use something like an + [extending adaptor](/docs/design/generics/details.md#extending-adapter) to + model a `const` type with a subset of the type's interface, for example with + thread-compatible APIs. +- Don't provide overloading on the lifetime of arguments (for now). +- Don't provide syntax-free dereferencing (for now). 
-## Rationale +The goal is to address a broad range of use cases where C++ leverages references +of various forms with a simpler and better set of primitives in Carbon. Some key +motivating points for this approach: -TODO: How does this proposal effectively advance Carbon's goals? Rather than -re-stating the full motivation, this should connect that motivation back to -Carbon's stated goals and principles. This may evolve during review. Use links -to appropriate sections of [`/docs/project/goals.md`](/docs/project/goals.md), -and/or to documents in [`/docs/project/principles`](/docs/project/principles). -For example: +- Having a single, simple, and obvious way to form input parameters to + functions and local immutable values removes a false choice and both + syntactic and performance overhead compared to C++. +- Unifying on pointers rather than including references, and especially + multiple forms of references, significantly simplifies the type system. + - This is especially important as we expect future development of safety + features such as lifetime tracking to require growing complexity in the + same space, potentially _multiplicatively_ rather than _additively_. +- Syntactically rewriting operators and expression syntaxes into interface + member function calls leverages a more principled open extension design + pattern compared to operator overloading. + - It also helps reduce the need for + [ADL](https://en.cppreference.com/w/cpp/language/adl), a notoriously + complex part of C++. +- Using interfaces and facet types to model interface subsetting like `const` + from C++ gives a more general and consistent tool. + - For example, it will avoid entirely custom type conversion rules for + `const` that cannot be composed with library designs. 
-- [Community and culture](/docs/project/goals.md#community-and-culture) -- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem) -- [Performance-critical software](/docs/project/goals.md#performance-critical-software) -- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution) -- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write) -- [Practical safety and testing mechanisms](/docs/project/goals.md#practical-safety-and-testing-mechanisms) -- [Fast and scalable development](/docs/project/goals.md#fast-and-scalable-development) -- [Modern OS platforms, hardware architectures, and environments](/docs/project/goals.md#modern-os-platforms-hardware-architectures-and-environments) -- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code) +## Background, context, and use cases from C++ + +The primary goal of this proposal is to address the use cases and needs of C++ +code as it exists, while reducing the complexity and primitives used to create +evolutionary space for safe primitives to be added. + +This simplification is achieved by suggesting a significantly different set of +core building blocks compared to C++, and this divergence is something that +should be clearly understood. However, the design tries to minimize changes to +the overarching patterns that emerge currently in C++ API design. The different +building blocks being used hopefully simplify and address some of these API +design shortcomings, but don't require shifting to a completely new model. For +example, moving to a model of everything being a pointer to the heap similar to +how Java works, or moving parameters to default to lazy evaluation aren't the +goal of this proposal.
+ +Some secondary goals follow naturally from the overarching goals of Carbon: +performance, ease of understanding, and ease of implementation. + +Let's start by deeply examining C++'s fundamental facilities in this space +involving `const` qualification, references (including R-value references), and +pointers. The first step is to understand the practical use cases these tools +address, as well as any shortcomings or gaps that should be addressed by Carbon. + +### `const` references versus `const` itself + +C++ provides overlapping but importantly separable semantic models which +interact with `const` references. + +1. An _immutable view_ of a value +2. A _thread-safe interface_ of a + [thread-compatible type](https://abseil.io/blog/20180531-regular-types#:~:text=restrictions%20or%20both.-,Thread-compatible,-%3A%20No%20concurrent%20call) + +Some examples of the immutable view use case are provided below. These include +`const` reference parameters and locals, as well as `const` declared local and +static objects. + +``` +void SomeFunction(const int &id) { + // Here `id` is an immutable view of some value provided by the caller. +} + +void OtherFunction(...) { + // ... + + const int &other_id = ; + + // Cannot mutate `other_id` here either, it is just a view of the result of + // `` above. But we can pass it along to another + // function accepting an immutable view: + OtherFunction(other_id); + + // We can also pass ephemeral values: + OtherFunction(other_id + 2); + + // Or values that may be backed by read-only memory: + static const int fixed_id = 42; + OtherFunction(fixed_id); +} +``` + +The _immutable view_ `id` in `SomeFunction` can be thought of as requiring that +the semantics of the program be exactly the same whether it is implemented in +terms of a view of the initializing expression or a copy of that value, perhaps +in a register. 
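The view-versus-copy equivalence described above can be made concrete with a small C++ sketch. The helper names (`DoubleByView`, `DoubleByCopy`, `IsSameObjectByView`, `IsSameObjectByCopy`, `CheckViewVersusCopy`) are hypothetical and only illustrate the idea: code that just reads the value cannot tell a view from a copy, while code that inspects the address can.

```cpp
#include <cassert>

// Reads the caller's value through an immutable view (a `const` reference).
int DoubleByView(const int& id) { return id * 2; }

// Reads a copy of the caller's value, which may live in a register.
int DoubleByCopy(int id) { return id * 2; }

// These helpers inspect object identity, so they can distinguish a view
// from a copy. An immutable view must not depend on this.
bool IsSameObjectByView(const int& id, const int* original) {
  return &id == original;
}
bool IsSameObjectByCopy(int id, const int* original) {
  return &id == original;
}

// Exercises the helpers above; call it from `main` to run the checks.
void CheckViewVersusCopy() {
  int value = 21;
  // Pure reads agree whether a view or a copy is used.
  assert(DoubleByView(value) == DoubleByCopy(value));
  // Address inspection reveals which strategy was used.
  assert(IsSameObjectByView(value, &value));
  assert(!IsSameObjectByCopy(value, &value));
}
```

As long as a function behaves like the first two helpers and never like the last two, an implementation is free to pick either strategy.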
+ +The implications of the semantic equivalence help illustrate the requirements: + +- The input value must not change while the view is visible, or else a copy + would hide those changes. +- The view must not be used to mutate the value, or those mutations would be + lost if made to a copy. +- The identity of the object must not be relevant, or else inspection of its + address would reveal whether a copy was used. + +Put differently, these restrictions make a copy valid under the +[as-if rule](https://en.cppreference.com/w/cpp/language/as_if). + +The _thread-safe interface_ use case is the more prevalent use of `const` in +APIs. It is most commonly seen with code that looks like: + +``` +class MyThreadCompatibleType { + public: + // ... + + int Size() const { return size; } + +private: + int size; + + // ... +}; + +void SomeFunction(const MyThreadCompatibleType *thing) { + // .... + + // Users can expect calls to `Size` here to be correct even if running + // on multiple threads with a shared `thing`. + int thing_size = thing->Size(); + + // ... +} +``` + +The first can seem like a subset of the second, but this isn't really true. +There are cases where `const` works for the first use case but doesn't work well +for thread-safety: + +``` +void SomeFunction(...) { + // ... + + const std::unique_ptr data = ComputeBigData(); + + // We never want to release or re-allocate `data` and `const` + // makes sure that doesn't happen. But the actual data is + // completely mutable! + // ... +} +``` + +These two use cases can also lead to tension between shallow const and deep +const: + +- Immutability use cases will tend towards shallow(-er) const, like pointers. +- Thread safety use cases will tend towards deep(-er) const. + +### Pointers + +The core of C++'s indirect access to an object stored somewhere else comes from +C and its lineage of explicit pointer types.
These create an unambiguous +separate layer between the pointer object and the pointee object, and introduce +dereference syntax (both the unary `*` operator and the `->` operator). + +C++ makes an important extension to this model to represent _smart pointers_ by +allowing the dereference operators to be overloaded. This can be seen across a +wide range of APIs such as `std::unique_ptr`, `std::shared_ptr`, +`std::weak_ptr`, etc. These user-defined types preserve a fundamental property +of C++ pointers: the separation between the pointer object and the pointee +object. + +The distinction between pointer and pointee is made syntactically explicit in +C++ both when _dereferencing_ a pointer, and when _forming_ the pointer or +taking an object's address. These two sides can be best illustrated when +pointers are used for function parameters. The caller code must explicitly take +the address of an object to pass it to the function, and the callee code must +explicitly dereference the pointer to access the caller-provided object. + +### References + +C++ provides for indirection _without_ the syntactic separation of pointers: +references. Because a reference provides no syntactic distinction between the +reference and the referenced object--that is their point!--it is impossible to +refer to the reference itself in C++. This creates a number of restrictions on +their design: + +- They _must_ be initialized when declared +- They cannot be rebound or unbound. +- Their address cannot be taken. + +References were introduced originally to enable operator overloading, but have +been extended repeatedly and as a consequence fill a wide range of use cases. +Separating these and understanding them is essential to forming a cohesive +proposal for Carbon -- that is the focus of the rest of our analysis of +references here. 
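The restrictions on references listed above, and the contrast with pointers, can be observed directly in C++. The following is a minimal sketch; the function names are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Returns true when assigning through a reference writes to the referent
// instead of rebinding the reference.
bool ReferenceAssignmentWritesThrough() {
  int a = 1;
  int b = 2;
  int& ref = a;  // A reference must be initialized where it is declared.
  ref = b;       // Copies the value of `b` into `a`; `ref` still names `a`.
  b = 3;         // Has no effect on `ref`, which never referred to `b`.
  // `&ref` yields the address of `a`, not of the reference itself.
  return a == 2 && ref == 2 && &ref == &a;
}

// Returns true when a pointer, unlike a reference, can be rebound and can
// itself be addressed as a separate object.
bool PointerCanBeRebound() {
  int a = 1;
  int b = 2;
  int* ptr = &a;
  ptr = &b;                 // Rebinds the pointer; `a` is untouched.
  *ptr = 5;                 // Writes to `b` through explicit dereference.
  int** ptr_to_ptr = &ptr;  // The pointer object has its own address.
  return a == 1 && b == 5 && *ptr_to_ptr == &b;
}

// Exercises both helpers; call it from `main` to run the checks.
void CheckReferenceAndPointerBehavior() {
  assert(ReferenceAssignmentWritesThrough());
  assert(PointerCanBeRebound());
}
```

The pointer version makes every step syntactically visible (`&a`, `*ptr`), which is the separation between pointer and pointee that the previous section describes.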
+ +#### Special but critical case of `const T&` + +As mentioned above, one form of reference in C++ has unique properties: +`const T&` for some type `T`, or a _`const` reference_. The primary use for +these is also the one that motivates its unique properties: a zero-copy way to +provide an input function parameter without requiring the syntactic distinction +in the caller and callee needed when using a pointer. The intent is to safely +emulate passing by-value without the cost of copying. Provided the usage is +immutable, this emulation can safely be done with a reference and so a `const` +reference fits the bill here. + +However, to make zero-copy pass-by-value work in practice, it must be +possible to pass a _temporary_ object. That works well with by-value parameters +after all. To make this work, C++ allows a `const` reference to bind to a +temporary. However, the rules for parameters and locals are the same in C++ and +so this would create serious lifetime bugs. This is fixed in C++ by applying +_lifetime extension_ to the temporary. The result is that `const` references are +quite different from other references, but they are also quite useful: they are +the primary tool used to fill the _immutable view_ use case of `const`. + +One significant disadvantage of `const` references is that they are observably +still references. When used in function parameters, they cannot be implemented +with in-register parameters, etc. This complicates the selection of a read-only +input parameter type for functions, as both using a `const` reference and a +by-value parameter force a particular form of overhead. Similarly, range-based +`for` loops in C++ have to choose between a reference or value type when each +would be preferable in different situations. + +#### R-value references and forwarding references + +Another special set of use cases for references is R-value and forwarding +references.
These are used to capture _lifetime_ information in the type system +in addition to binding a reference. By doing so, they can allow overload +resolution to select C++'s _move semantics_ when appropriate for operations. + +The primary use case for move semantics in function boundaries was to model +_consuming input_ parameters. Because move semantics were being added to an +existing language and ecosystem that had evolved exclusively using copies, +modeling consumption by moving into a by-value parameter would have forced an +eager and potentially expensive copy in many cases. Adding R-value reference +parameters and overloading on them allowed code to gracefully degrade in the +absence of move semantics -- their internal implementation could minimally copy +anything non-movable. These overloads also helped reduce the total number of +moves by avoiding moving first into the parameter and then out of the parameter. +This kind of micro-optimization of moves was seen as important because some +interesting data structures, especially in the face of exception safety +guarantees, either implemented moves as copies or in ways that required +non-trivial work like memory allocation. + +Using R-value references and overloading also provided a minor benefit to C++: +the lowest-level mechanics of move semantics such as move construction and +assignment easily fit into the function overloading model that already existed +for these special member functions. + +These special member functions are just a special case of a more general pattern +enabled by R-value references: designing interfaces that use _lifetime +overloading_ to _detect_ whether a move would be possible and change +implementation strategy based on how they are called. Both the move constructor +and the move-assignment operator in C++ work on this principle. However, other +use cases for this design pattern are so far rare. 
For example, Google's C++ +style +[forbids](https://google.github.io/styleguide/cppguide.html#Rvalue_references) +R-value references outside of an enumerated set of use cases, which has been +extended incrementally based on demonstrated need, and has now been stable for +some time. While overloading on lifetime is one of the allowed use cases, that +exemption was added almost four years after the initial exemption of move +constructors and move assignment operators. + +#### Mutable operands to user-defined operators + +C++ user-defined operators have their operands directly passed as parameters. +When these operators require _mutable operands_, references are used to avoid +the syntactic overhead and potential semantic confusion of taking their address +explicitly. This use case stems from the combined design decisions of having +operators that mutate their operands in-place and requiring the operand +expression to be directly passed as a normal function parameter. + +#### User-defined dereference and indexed access syntax + +C++ also allows user-defined operators that model dereference (or indirecting in +the C++ standard) and indexed access (`*` and `[]`). Because these operators +specifically model forming an L-value and because the return of the operator +definition is directly used as the expression, it is necessary to return a +reference to the already-dereferenced object. Returning a pointer would break +genericity with builtin pointers and arrays in addition to adding a very +significant syntactic overhead. + +#### Member and subobject accessors + +Another common use of references is in returns from member functions to provide +access to a member or subobject, whether const or mutable. This particular use +case is worth calling out specially as it has an interesting property: this is +often not a fully indirect access. Instead, it is often simply selecting a +particular member, field, or other subobject of the data structure. 
As a +consequence, making subsequent access transparent seems especially desirable. + +However, it is worth noting that this particular use case is also an especially +common source of lifetime bugs. A classic and pervasive example can be seen when +calling such a method on a temporary object. The returned reference is almost +immediately invalid. + +#### Non-null pointers + +A common reason for using mutable references outside of what has already been +described is to represent _non-null pointers_ with enforcement in the type +system. Because the canonical pointer types in C++ are allowed to be null, +systems that forbid a null in the type system use references to induce any null +checks to be as early as possible. This effects a +"[shift left](https://en.wikipedia.org/wiki/Shift-left_testing)" of handling +null pointers, both moving the error closer to its cause logically and +increasing the chance of moving earlier in the development process by making it +a static property enforced at compile time. + +References are imperfectly suited to modeling non-null pointers because they are +missing many of the fundamental properties of pointers such as being able to +rebind them, being able to take their address, etc. Also, references cannot be +safely made `const` in the same places that pointers can because that might +unintentionally change their semantics by allowing temporaries or extending +lifetimes. + +#### Syntax-free dereference + +Beyond serving as a non-null pointer, the other broad use case for references is +to remove the syntactic overhead of taking an address and dereferencing +pointers. In other words, they provide a way to have _syntax-free dereferences_. +Outside of function parameters, removing this distinction may provide a +genericity benefit, as it allows using the same syntax as would be used with +non-references. In theory code could simply use pointers everywhere, but this +would add syntactic overhead compared to local variables and references. 
For +immutable code, the syntactic overhead seems unnecessary and unhelpful. However, +having distinct syntax for _mutable_ iteration, container access, and so on +often makes code more readable. + +There are several cases that have come up in the design of common data +structures where the use of distinct syntaxes for immutable and mutable +operations provides a clear benefit: copy-on-write containers where the costs are +dramatically different, and associative containers which need to distinguish +between looking up an element and inserting an element. This tension should be +reflected in how we design _indexed access syntax_. + +Using mutable references for parameters to reduce syntactic overhead also +doesn't seem particularly compelling. For passing parameters, the caller syntax +seems to provide significant benefit to readability. When using _non-local_ +objects in expressions, the fact that there is a genuine indirection into memory +seems to also have high value to readability. These syntactic differences do +make inline code and outlined code look different, but that reflects a behavior +difference in this case. + +## Detailed proposal + +Carbon should have a set of primitives, as small as possible, to address the +compelling use cases from C++ outlined above. For each use case that we examine, +we use a label you can search for above to describe it: + +- _immutable views_ +- _consuming input_ +- _non-null pointers_ +- _mutable operands_ +- _smart pointers_ +- _user-defined dereference_ +- _member and subobject accessors_ +- _indexed access syntax_ +- _thread-safe interfaces_ +- _lifetime overloading_ +- _syntax-free dereference_ + +The resulting primitives should directly and effectively address each use case, +not merely replicate the various patterns present in C++. + +The only use case that this proposal specifically doesn't address is +_syntax-free dereference_. Instead, we suggest it shouldn't be a priority for +Carbon, especially in interface boundaries.
Some ideas for steps that could be +taken in the future here are mentioned at the end. + +### Immutable views of values + +The default and most primitive construct for both locals and parameters in +Carbon should be immutable views of values, or just _values_ in Carbon. These +should cover the exact intended use cases of `const` reference parameters (and +locals) in C++, but should improve on them by enabling in-register parameter +passing and other relevant optimizations. This doesn't replace the broader and +more fundamental use of `const` for +[thread-safe interface subsets](#const-and-thread-safe-interface-distinction). +So the C++ code: + +``` +const int count = 42; +``` + +Would become: + +``` +let count: i32 = 42; +``` + +**Syntax alternatives not currently proposed:** + +- It is appealing to consider the `val` introducer instead of `let` given that + these are expected to be called "values". However, there is some concern + over the visual similarity of `val` and `var`, and if pronounced as written + instead of as "value" or "variable", the difference relies on a subtle + consonant distinction not present in some widely used spoken languages. That + said, it is worth investigating whether there is a real user preference + between these introducers. Kotlin even has both `val` and `var` with a + similar distinction and so might serve as the basis for investigating this. + For now, using the less controversial spelling is proposed as this is an + easy thing to change later. +- Another potential introducer would be `const`. However, the semantics would + be subtly different even if the use cases would be similar, which might make + reuse of the keyword more confusing than helpful. Also, it is longer and + expected to be extremely common. + +#### Function parameters + +Most function parameters in C++ are immutable, and passed as either a `const` +reference to avoid copies or by-value to avoid the indirection of an explicit +reference.
When C++ code passes by-value in these cases users often omit `const` +even when they could use it, the same way they do for local variables -- C++'s +requirement to remember an extra qualifier ends up with code failing to document +where mutability is required. Carbon code is expected to have the same desired +default, but we want to provide a single, easier default option, especially for +function parameters. As a consequence, Carbon parameters should be defined the +same way as `let` above. The C++ functions: + +``` +auto LogSize(const Container &large_data) { + Logger.Print(large_data.Size()); +} + +auto Sum(const int x, const int y) -> int { + return x + y; +} +``` + +would become Carbon code such as: + +``` +fn LogSize(large_data: Container) { + Logger.Print(large_data.Size()); +} + +fn Sum(x: i32, y: i32) -> i32 { + return x + y; +} +``` + +#### Immutability and addresses + +It is a programming error to mutate such a value or to take its address -- +indeed, it might not _have_ an address and it might be implemented with readonly +memory. These restrictions are stronger than `const` references in C++ +specifically to allow important implementation strategies such as automatically +passing parameters in registers. However, because the restrictions include not +taking the address of these values, safety can be enforced at compile time. + +**Open question:** It may be necessary to provide some amount of escape hatch +for taking the address of values. It seems likely that at least C++ interop will +notionally take their address to form `this` pointers and call methods. This +proposal suggests adding any such escape hatch as narrowly as possible and only +when needed. Ideally, only interop will ever need it. 
+ +If a further escape hatch is needed, this kind of fundamental weakening of the +semantic model of Carbon would be a good case for some syntactic marker like +Rust's `unsafe`, although rather than a region, it would seem better to tie it +directly to the operation in question. For example: + +``` +struct S { + fn ImmutableMemberFunction[me: Self](); + fn MutableMemberFunction[addr me: Self*](); +} + +fn F(immutable_s: S) { + // This is fine. + immutable_s.ImmutableMemberFunction(); + + // This requires an unsafe marker in the syntax. + immutable_s.unsafe MutableMemberFunction(); +} +``` + +Again, this proposal doesn't suggest adding this unsafe escape hatch until a +specific use case is understood as well as the critical restrictions on how the +pointer is used are well understood and can be specified. + +#### Copies + +These values do not semantically introduce a new copy, and they support +_non-copyable_ and _non-movable_ types such as mutexes. However, they _allow_ +copies when the type is copyable. + +They can also be used with +[polymorphic types](/docs/design/classes.md#inheritance), for example: + +``` +base class MyBase { ... } + +fn UseBase(b: MyBase) { ... } + +class Derived extends MyBase { ... } + +fn PassDerived() { + var d: Derived = ...; + // Allowed to pass `d` here: + UseBase(d); +} +``` + +This is still allowed to create a copy, but it must not _slice_. Even if a copy +is created, it must be a `Derived` object, even though this may limit the +available implementation strategies. + +#### Comparison to C++ parameters + +While these are called _values_ in Carbon, they are not related to "by-value" +parameters as they exist in C++. C++ by-value parameters are semantically +defined to create a new local copy of the argument, although it may move into +this copy. + +Carbon's values are much closer to a `const` reference in C++ with extra +restrictions such as allowing copies under "as-if" in limited cases and +preventing taking the address. 
Combined, these allow implementation strategies +such as in-register parameters. + +### Variables introduce L-Values with mutable storage + +Within any immutable view value pattern such as described above, a sub-pattern +can be marked as a _variable_ that should get mutable storage and be an +[L-Value](). +Objects within that sub-pattern can have their address taken and are mutable. +They must be copied or moved into fully in order to support this, they cannot +merely refer to some external data used to initialize them. For example: + +``` +let (x: i64, var y: i64) = (1, 2); +``` + +This code will introduce a local `x` that merely names an immutable value, and a +variable `y` introduced by `var` which is mutable. The `y` variable also has a +meaningful address like non-const variables in C++. + +When the entire pattern would be a variable sub-pattern, the local declaration +can simply start with `var` for convenience: + +``` +var accumulate: i64 = 0; +``` + +**Syntactic alternatives not currently proposed:** + +- Rust uses the keyword `mut` for a similar distinction, and does not make + `let` optional. Carbon could adopt either aspect of those -- the keyword or + the requirement for `let` in every case. However, requiring two introducers + seems to force unnecessary ceremony into the language. The introducer `mut` + seems significantly less obvious to programmers familiar with any other + language, and isn't as obvious of an abbreviation as `var` is for + "variable". These issues are less pronounced when always following `let`, + but still present, especially nested within a pattern. + +Function parameter patterns have variable sub-patterns just as locals can, and +these parameters are local variables within the function. They work precisely +the same as a _non-const_ by-value parameter in C++, including the support of +_consuming input_ use cases. + +``` +fn RegisterName(var name: String) { + // `name` is a local L-value in this function and can be mutated. 
+} +``` + +This is extremely similar to the behavior of a C++ by-value function: + +``` +void RegisterName(std::string name) { + // Can move out of `name` here as well. +} +``` + +This does introduce a tradeoff compared to using R-value references because it +may result in multiple moves. In some cases this Carbon pattern will both have +to move into the parameter and out of the parameter: + +``` +class NamedEntity { + var name: String; + ... + + fn Create(var name: String) { + // This will move out of the `name` parameter, assuming a hypothetical + // move-operator spelled `~`. + return {.name = ~name}; + } +} + +fn F() { + var my_name: String = "Jane"; + + // This will move into the parameter first, and then into the member. + var my_entity: NamedEntity = NamedEntity.Create(~my_name); + // ... +} +``` + +Carbon should be able to address the challenges of having both of these moves +even better than C++ by: + +- Ensuring that moves are ubiquitously supported and don't regularly degrade + to copies. +- Insisting that moves be exceptionally cheap and easily optimized by avoiding + exception handling complexity and overhead. +- Improving on the ability to remove these extra moves by improving things + like member initialization semantics, etc. + +While there may be some extra moves retained, multiple C++ code bases have been +very successful at using by-value consumption APIs without performance overhead +and so this is not expected to be a practical problem for Carbon. + +### Pointers for indirect, potentially mutating objects + +Pointers provide an explicit and unambiguous model for this that is familiar and +widespread already. Carbon will have pointers with syntax closely matching C++. 
+The syntax for dereferencing a pointer will be a prefix unary operator `*`, +forming a pointer type is a postfix unary operator `*` applied to the pointee +type, and `&` can be used to take the address of an L-value like a local +variable: + +``` +var i: i32 = 42; +var p: i32* = &i; + +*p = 13; +``` + +This syntax is chosen specifically to remain as similar as possible to C++ +pointer types as they are commonly written in code. The different alternatives +and tradeoffs for this syntax issue were discussed extensively in +[#523](https://github.com/carbon-language/carbon-lang/issues/523). + +While variables can be _directly_ mutated, pointers are used to model _indirect_ +mutable objects. Because the pointer points to storage that may be modified +(even if not in another thread), the indirection is fundamental to the +semantics. Reading a copy of the pointed to value would be _incorrect_ in these +cases because the copy might be stale. Similarly writes through the pointer +should be reflected in changes to the storage that pointer points to. + +Pointer traversal sometimes has a noticeable cost. Having explicit pointers will +show developers in the codebase where they are explicitly traversing memory and +allow them to optimize them when necessary. + +However, pointers in Carbon should be restricted to avoid some of the most +pervasive and unsafe coding patterns of C and C++. First and foremost, they are +required to be non-null. Nullable pointers must be represented using an +explicitly separate type such as an optional wrapping a pointer. This allows the +type system to help ensure the points in code that can introduce a null pointer +are precisely modeled and enables significantly more reliable checking for +correct null tests prior to use of the pointer. It also follows the lead of most +major new languages, for example Rust, Swift, and Kotlin. + +Carbon pointers should also not support pointer arithmetic. 
Instead some +separate slicing and/or indexing system should be used to index memory in a way +that makes it easy to associate bounds and dimensionality for both better static +and dynamic bug detection. Such a system is left to a future proposal. + +These restricted pointers address specific use cases for references without the +challenges of being unable to refer to the pointer separate from the pointee, +and with obvious ways to rebind to implement things like copy and move. + +### Operators overloading through implementing [interfaces](/docs/design/generics/overview.md#interfaces) + +The original motivating use case for C++ references is operator overloading, +specifically with open extension to allow user defined types to work with +operator expression syntax. + +Instead of overloading, Carbon's proposed generics should be used to model the +open extension mechanism and implement the relevant semantics. Binary and unary +operators, as well as other similar syntactic constructs in the expression +grammar, should be semantically specified in terms of a set of rewrite rules +into normal (member) function calls through the relevant generic interface. + +These rewrite rules can also, in theory, take the address of objects where that +is valid and necessary, but it is expected instead that they simply rely on +member functions using `addr` to do this. Using `addr` in this way allows for +_mutable operands_ without requiring references by taking the address of L-value +operand expressions implicitly. + +The intended result of this design is that overloaded operators and other +expression syntax all ultimately decompose into member function calls (perhaps +through interface implementations). These member function calls in turn provide +the semantic rules rather than needing complex, custom semantics for operators +themselves. 
+ +Example of how assignment and a difference operator that changes type might +work: + +``` +interface Assignable { + fn Assign[addr me: Self*](arg: Self); +} + +interface Subtractable { + let ResultType:! Type; + + fn Subtract[me: Self](rhs: Self) -> ResultType; +} + +class Point { + var x: f32; + var y: f32; + + impl as Assignable { + // How the member function is spelled is an orthogonal question. + fn Assign[addr me: Self*](arg: Point) { + me->x = arg.x; + me->y = arg.y; + } + } + + // Not sure this actually makes sense for points, but it's just an example... + impl as Subtractable { + let ResultType:! Type = f32; + + // Again, the names of methods and interfaces are just placeholders. + fn Subtract[me: Self](rhs: Self) -> f32 { + var x_delta: f32 = rhs.x - me.x; + var y_delta: f32 = rhs.y - me.y; + return Math.Sqrt(x_delta * x_delta + y_delta * y_delta); + } + } +} + +fn Example(a: Point, b: Point, dest: Point*) -> f32 { + if (...) { + // Rewritten to: (*dest).(Assignable.Assign)(a); + *dest = a; + } else { + // Rewritten to: (*dest).(Assignable.Assign)(b); + *dest = b; + } + + // Rewritten to: return a.(Subtractable.Subtract)(b); + return a - b; +} +``` + +Note that the rewrites use the proposed syntax for _qualified_ generic lookup, +ensuring it uses the implementation of the generic interface. Also, the above +rewrite rules imply specific order of operations and spelling of interfaces. +This is just an illustrative example and not meant to suggest how these specific +operators would necessarily work -- both assignment and subtraction are worth +careful design to understand how to structure their specific interface and +rewrite rules. + +### User defined pointer-like types and dereference operations + +Carbon should support user-defined pointer-like types such as _smart pointers_ +using a similar pattern as operator overloading or other expression syntax. That +is, it should rewrite the expression into a member function call on an +interface. 
Types can then implement this interface to expose pointer-like +_user-defined dereference_ syntax. + +The interface might look like: + +``` +interface Pointer { + let ValueT:! Type; + fn Dereference[me: Self]() -> ValueT*; +} +``` + +Here is an example using a hypothetical `TaggedPtr` that carries some extra +integer tag next to the pointer it emulates: + +``` +class TaggedPtr(T:! Type) { + var tag: i32; + var ptr: T*; +} +external impl [T:! Type] TaggedPtr(T) as Pointer { + let ValueT:! Type = T; + fn Dereference[me: Self]() -> T* { return me.ptr; } +} + +fn Test[T:! Type](arg: TaggedPtr(T), dest: TaggedPtr(TaggedPtr(T))) { + **dest = *arg; + *dest = arg; +} +``` + +There is one tricky aspect of this. The function in the interface which +implements a pointer-like dereference must return a raw pointer which the +language then actually dereferences to form an L-value similar to that formed by +`var` declarations. This interface is implemented for normal pointers as a +no-op: + +``` +impl [T:! Type] T* as Pointer { + let ValueT:! Type = T; + fn Dereference[me: Self]() -> T* { return me; } +} +``` + +Dereference expressions such as `*x` are syntactically rewritten to use this +interface to get a raw pointer and then that raw pointer is dereferenced. If we +imagine this language level dereference to form an L-value as a unary `deref` +operator, then `(*x)` becomes `(deref (x.(Pointer.Dereference)()))`. + +Carbon should also use a simple syntactic rewrite for implementing `x->Method()` +as `(*x).Method()` without separate or different customization. + +### Indexed access syntax + +Using indexing syntax like `my_array[index]` will use a similar pattern to user +defined pointer-like types. Specifically, this syntax will _always_ dereference +and form an L-value in Carbon. Much like with the user-defined pointer-like +types, this starts with an interface: + +``` +interface Indexable { + let IndexT:! Type; + let PointerT:!
Pointer; + fn Index[addr me: Self*](index: IndexT) -> PointerT; +} +``` + +The `Index` function returns some form of pointer. Indexing expressions such as +`my_array[index]` are then rewritten to use this interface to index and then +dereference the resulting pointer to form an L-value: +`*(my_array.(Indexable.Index)(index))`. An example for a trivial imagined array +type: + +``` +class Array { + var storage: i32*; + var size: i32; + + impl as Indexable { + let IndexT:! i32; + let PointerT:! i32*; + + fn Index[addr me: Self*](index: i32) -> i32* { + // Need some way to index pointers, pretty sure we'd rather not + // use raw pointer arithmetic here. + return me->storage + index; + } + } +} +``` + +This doesn't directly support returning by value. Instead, it prioritizes +matching the dereference and L-value behavior of this syntax across both C and +C++ languages and standard libraries. + +#### Ephemeral sequences + +Constructs that might benefit from returning a value rather than a pointer +involve indexing into ephemeral ranges or things like C++'s +[input iterators](https://en.cppreference.com/w/cpp/iterator/input_iterator). +Carbon will require these to return a pointer to an ephemeral (changed on next +index) element of storage inside the indexed sequence: + +``` +class EphemeralIntegers { + var tmp: i32; + + impl as Indexable { + let IndexT:! i32; + let PointerT:! i32*; + fn Index[addr me: Self*](index: i32) -> i32* { + me->tmp = index; + return &me->tmp; + } + } +} +``` + +This kind of ephemeral indexable structure is risky because indexing it mutates +internal storage. While a similar design pattern works in C++, other C++ +approaches for modeling this won't translate to Carbon: + +- An overload that returns by value doesn't work in Carbon because it would + fail to be generic.
C++ code sometimes gets away with such an overload + because of templates, but it breaks genericity even in C++ and standard + library concepts generally preclude this approach already. + +- Returning a proxy object that emulates the behavior of a reference, rather + than a reference itself, would translate to a proxy _pointer_ in Carbon. However, this + wouldn't work in Carbon because the `Pointer` interface above doesn't allow + dereferencing to return the address of a member as it takes `me` as an + immutable view of the value. This makes the calling convention more + efficient (allows passing in registers) at the expense of being able to + express proxy pointers in this manner. + +### Return values and accessors + +While the proposed model of immutable views works well for parameters, it +doesn't work well for return values currently. The lifetime of an argument +expression can be easily assured to cover the lifetime of the parameter within +the function body. There is no such easy assurance for return values. As a +consequence, this proposal suggests return values are copied initially for +safety and predictability. + +Carbon should explore a properties system to allow both read-only and write +_member and subobject accessors_. Such a system could present L-values similar to +local variables that would allow assignment and other mutation to work directly +and without lifetime issues. However, this is left to future work. + +Beyond properties, Carbon is expected to explore some lifetime tracking system, +and when that happens it should be considered for enabling non-copy returns of +immutable values with a tracked lifetime. These might provide for more general +or complex forms of read-only _member and subobject accessors_ than can be +represented through properties. + +More general mutating _member and subobject accessors_ should be represented +with pointers. While this has some ergonomic cost, it seems minimal and isolated +to a relatively rare use case.
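
As a rough C++ analogy for the accessor guidance above, the following sketch contrasts a read-only accessor that returns a copy with a mutating accessor that returns a pointer to a subobject. The names `Point`, `Rect`, `Origin`, `MutableOrigin`, and `Example` are hypothetical and exist only for illustration; this is not Carbon syntax and not part of the proposal.

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct Point {
  int x = 0;
  int y = 0;
};

class Rect {
 public:
  // Read-only access returns a copy, mirroring the suggestion that return
  // values are copied for safety and predictability.
  Point Origin() const { return origin_; }

  // Mutating access hands out a pointer to the subobject. It is never null
  // because it is the address of a member.
  Point* MutableOrigin() { return &origin_; }

 private:
  Point origin_;
};

void Example() {
  Rect r;

  Point copy = r.Origin();
  copy.x = 5;  // Changes only the local copy.
  assert(r.Origin().x == 0);

  // The explicit `->` marks the indirect mutation at the call site.
  r.MutableOrigin()->x = 7;
  assert(r.Origin().x == 7);
}
```

The ergonomic cost mentioned above shows up as the explicit `->` at the call site, which also makes the non-local mutation visible to readers.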
+ +### `const` and thread safe interface distinction + +The more broad and fundamental use of `const` in C++ is to provide a meaningful +API subset that either precludes logical mutations, or more importantly provides +a "read-only" _thread-safe_ interface of a thread-compatible type. These +restrictions are about the _interface_, and importantly distinct from immutable +value parameters that are passed as-if by copy. + +The interface focus of `const` leads to the proposed design direction in Carbon: +interfaces and generics. Carbon generics provide a way for a type to distinguish +a smaller interface that it implements, and for APIs to use this narrow +interface to only interact with the underlying type in particular ways. While +they are presented as a way to make code _generic_ over multiple types, they can +also be used to simply enforce constraints on the interface exposed. + +Carbon should leverage this core facility to build semantically restricted +interfaces like thread-safe ones. If necessary, convenience syntax can be added +to make it easier for types to mark a subset of their API as forming such an +independent interface, but that should be done based on usage experience with +the basic generics facilities to serve this purpose and the real-world pain +points uncovered. + +### Lifetime overloading + +One use case not obviously or fully addressed by the tools proposed here is +overloading function calls by observing the lifetime of arguments. The use case +here would be selecting different implementation strategies for the same +function or operation based on whether an argument lifetime happens to be ending +and viable to move-from. + +This proposal suggests not immediately pursuing this use case. There is a +fundamental scaling problem in this style of overloading: it creates a +combinatorial explosion of possible overloads. Consider a function with N +parameters that would benefit from lifetime overloading. 
If each one benefits +_independently_ from the others, we would need 2<sup>N</sup> overloads to +express all the possibilities. + +Carbon should initially see if code can be designed without this facility. Some +of the tools needed to avoid it are suggested above such as the +[consuming](#variables-introduce-l-values-with-mutable-storage) input pattern. +But it is possible that more will be needed in practice. It would be good to +identify the specific and realistic Carbon code patterns that cannot be +expressed with the tools in this proposal in order to motivate a minimal +extension. Some candidates based on functionality already proposed here or for +[classes](/docs/design/classes.md): + +- Allow overloading between `addr me` and `me` in methods. This is among the + most appealing as it _doesn't_ have the combinatorial explosion. But it is + also very limited as it only applies to the implicit object parameter. +- Allow overloading between `var` and non-`var` parameters. +- Expand the `addr` technique from object parameters to all parameters, and + allow overloading based on it. + +Perhaps more options will emerge as well. Again, the goal isn't to completely +preclude pursuing this direction, but instead to try to ensure it is only +pursued based on a real and concrete need, and the minimal extension is adopted. + +### Syntax-free dereference + +Carbon should not prioritize a way to dereference with zero syntax on function +interface boundaries. The presence of a clear level of indirection is important +to mark there. It helps surface that an object that may appear local to the +caller is in fact escaped and referenced externally to some degree. + +It may prove desirable to provide an ergonomic aid to reduce dereferencing +syntax within function bodies, but this proposal suggests deferring that at +least initially in order to better understand the extent and importance of that +use case.
If and when it is considered, a direction based around a way to bind a +name to an L-value produced by dereferencing in a pattern appears to be a +promising technique. + +A closely related concern to syntax-free dereference is syntax-free address-of. +Here, Carbon supports one very narrow form of this: implicitly taking the +address of the implicit object parameter of member functions. Currently that is +the only place with such an implicit affordance. It is designed to be +syntactically sound to extend to other parameters, but that is currently not +planned, in order to avoid surprise. + +## Rationale based on Carbon's goals + +- Pointers are a fundamental component of all modern computer hardware -- + they are abstractly random-access machines -- and being able to directly + model and manipulate this is necessary for + [performance-critical software](/docs/project/goals.md#performance-critical-software). +- Simplifying the type system by avoiding having both pointers and references is + expected to make + [code easier to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write). +- Creating space in both the syntax and type system to introduce ownership and + lifetime information is important to be able to address long-term + [safety](/docs/project/goals.md#practical-safety-and-testing-mechanisms) + needs. +- Pointers are expected to be deeply familiar to C++ programmers and easily + [interoperate with C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code). ## Alternatives considered -TODO: What alternative solutions have you considered? +### Immutable value escape hatch + +We could immediately provide the escape hatch of a syntax to unsafely take the +address and perform mutations to an immutable value view in Carbon. This would +more easily match patterns like `const_cast` in C++.
However, there seem to be +effective ways of rewriting the code to avoid this need so this proposal +suggests not adding that escape hatch now. We can add it later if experience +proves this is an important pattern to support without the contortions of +manually creating a local copy (or changing to pointers). + +### References in addition to pointers + +The primary and most obvious alternative to the design proposed here is the one +used by C++: have _references_ in addition to pointers in the type system. This +initially allows zero-syntax modeling of L-values, which can in turn address +many use cases here much as they do in C++. Similarly, adding different kinds of +references can allow modeling more complex situations such as different lifetime +semantics. + +However, this approach has two fundamental downsides. First, it would add +overall complexity to the language as references don't form a superset of the +functionality provided by pointers -- there is still no way to distinguish +between the reference and the referenced object. This results in confusion where +references are understood to be syntactic sugar over a pointer, but cannot be +treated as such in several contexts. + +Second, this added complexity would reside exactly in the position of the type +system where additional safety complexity may be needed. We would like to leave +this area (pointers and references to non-local objects) as simple and minimal +as possible to ease the introduction of important safety features going forward +in Carbon. + +### Automatic dereferencing + +One way to make pointers behave very nearly the same as references without +adding complexity to the type system is to automatically dereference them in the +relevant contexts. This can, if done carefully, preserve the ability to +distinguish between the pointer and the pointed-to object while still enabling +pointers to be seamlessly used without syntactic overhead as L-values.
+ +However, this makes code dereferencing a pointer and performing a non-local and +potentially mutating operation visually indistinct. Having visual markers for +this arguably provides some readability improvement for some people, but is +noise and a distraction for others. Reasonable judgement calls about which +direction to prefer may differ, but Carbon's principle of preferring lower +context sensitivity leans (slightly) toward explicit dereferencing instead. + +It is worth noting that there are existing languages that use exactly this or an +extremely similar pattern such as Rust. It is also relatively easy to imagine +moving from this proposal toward automatic dereferencing in the future as it +builds on the same core type system primitives. + +### Exclusively using references + +While framed differently, this is essentially equivalent to automatic +dereferencing of pointers. The key is that it does not add both options to the +type system but addresses the syntactic differences separately and uses +different operations to distinguish between the reference and the referenced +object when necessary. + +``` + +```