Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SUGGESTION] Confusing std::vector ctor calls in Cpp1 and Cpp2 #193

Closed
wolfseifert opened this issue Jan 2, 2023 · 57 comments
Closed

[SUGGESTION] Confusing std::vector ctor calls in Cpp1 and Cpp2 #193

wolfseifert opened this issue Jan 2, 2023 · 57 comments

Comments

@wolfseifert
Copy link

In Cpp1 we have the confusing fact that

#include <vector>
using namespace ::std;
int main() {
  auto a = vector<int>{5, 1}; // vector of five and one
  auto b = vector<int>(5, 1); // vector of five ones
}

Experimenting with (pure) Cpp2 I got a similar case

main: () -> int = {
  a: std::vector<int> = (5, 1); // vector of five and one
  b := std::vector<int>(5, 1); // vector of five ones
}

This becomes clear when looking at the tanspiled code

  std::vector<int> a { 5, 1 };  // vector of five and one
  auto b { std::vector<int>(5, 1) }; // vector of five ones

For me the Cpp2 case looks even more confusing (than the Cpp1 case), because both ctor calls use parenthesis.

@MichaelCook
Copy link

Perhaps another way to state this issue is: Should cpp2 do fine-grained removal/repair of unfortunate cpp1 API choices?

@robertshepherdcpp
Copy link

robertshepherdcpp commented Jan 2, 2023

What about if we had a literal for the size, like _s so then we could just define the vector like:

a: std::vector<int> = (5, 1_s);  // a vector of 5 ones
b: std::vector<int> = (5, 1);     // a vector of 5 and 1

and then we could do it like this:

  b := std::vector<int>(5, 1_s); // vector of five ones

@JohelEGP
Copy link
Contributor

JohelEGP commented Jan 2, 2023

How would you combine that with existing suffixes like ULL?

@filipsajdak
Copy link
Contributor

Soon you will be able to write:

a: std::vector<int> = (5, 1 as std::size_t);

@robertshepherdcpp
Copy link

robertshepherdcpp commented Jan 2, 2023

@filipsajdak Yes, thats good. Would an overload for a std::size_t at the end of the vector be possible?

@fpelliccioni
Copy link
Contributor

@fpelliccioni Yes, thats good. Would an overload for a std::size_t at the end of the vector be possible?

I think you confused me with @filipsajdak

@JohelEGP
Copy link
Contributor

JohelEGP commented Jan 2, 2023

a: std::vector<int> = (5, 1 as std::size_t);

What about

  auto a = vector<std::size_t>{5z, 1z}; // vector of five and one
  auto b = vector<std::size_t>(5z, 1z); // vector of five ones

@filipsajdak
Copy link
Contributor

@JohelEGP cheater. 🙂

I agree that there is an issue. I am just trying to advertise the cpp2 casting feature.

What we want to achieve is unambiguous syntax. The spotted issue sure will need some reasonable solution.

@switch-blade-stuff
Copy link

IMO this would be fairly difficult to solve while keeping full compatibility with Cpp1, the issue is that in C++, the aggregate (i.e. {}) constructor favors initializer lists, while the "classic" parentheses never generate initializer lists.

So, this would either require a change in constructors of std::vector (and other containers too) to outright remove the size & value constructor, or use some kind of tag like with std::in_place_t to disambiguate overloads; or to forego the initializer-list/aggregate constructor form in Cpp2, which would prevent the ambiguity, but also disable initializer lists.

@switch-blade-stuff
Copy link

Honestly, it would be easier if C++ just left {} for initializer lists, and only use () for constructors. That way there would be no ambiguity, if you saw std::vector v = {0, 1}, you'd know it is definitely an initializer list, and the vector contains 0 and 1, instead of 1 zero.

@marioarbras
Copy link

Honestly, it would be easier if C++ just left {} for initializer lists, and only use () for constructors. That way there would be no ambiguity, if you saw std::vector v = {0, 1}, you'd know it is definitely an initializer list, and the vector contains 0 and 1, instead of 1 zero.

This would be really nice. An alternative idea could be to use an explicit cast for initializer lists.

v: std::vector = (5, 1) as std::initializer_list;

@robertshepherdcpp
Copy link

@marioarbras Or we could have it the other way round, depending on which is more common: initializing a vector with a std::initializer_list like:

  a: std::vector<int> = (5, 1); // vector of five and one

or initializing it with a certain amount of elements like:

  b := std::vector<int>(5, 1); // vector of five ones

Which would be the more common usage? Personally I use the initializer list more.

@hsutter
Copy link
Owner

hsutter commented Jan 2, 2023

Yes, fixing that problem is on my list to address as follows (I just haven't implemented it yet):

v1: vector<int> = (5, 1);   // size == 5, same as "= ((1, 1, 1, 1, 1))"
v2: vector<int> = ((5, 1)); // size == 2

The idea is that the parens in v2's initializer make it explicit that we are constructing with a single argument that is a list. (Do you "see" it visually, that the outer parens now clearly contain only a single thing? I think it's visually clear, but I'm curious to learn whether others will see it that way too.)

@switch-blade-stuff
Copy link

v: std::vector = (5, 1) as std::initializer_list;

Honestly, to me that looks just slightly better than manual insertion 😅

I suppose we could just explicitly use braces for initializer lists only, where if {stuff} is encountered in a constructor it would be replaced by ({stuff}), forcing the initializer-list overload.

Then again, this would fall flat for aggregate types such as std::array, as well as the "classic" C-style arrays that do not use std::initializer_list.

@switch-blade-stuff
Copy link

The idea is that the parens in v2's initializer make it explicit that we are constructing with a single argument that is a list.

What about this situation though:

v1: vector<int> = (x + 1);
v2: vector<int> = ((x + 1));

Is v2 a vector of 1 element x+1, or is it a vector of x+1 zeros, where the size was just enclosed in redundant parentheses?

@SebastianTroy
Copy link

SebastianTroy commented Jan 2, 2023 via email

@marioarbras
Copy link

marioarbras commented Jan 2, 2023

Yes, fixing that problem is on my list to address as follows (I just haven't implemented it yet):

v1: vector<int> = (5, 5);   // size == 5, same as "= ((5, 5, 5, 5, 5))"
v2: vector<int> = ((5, 5)); // size == 2

The idea is that the parens in v2's initializer make it explicit that we are constructing with a single argument that is a list. (Do you "see" it visually, that the outer parens now clearly contain only a single thing? I think it's visually clear, but I'm curious to learn whether others will see it that way too.)

This makes a lot of sense since you can already do this:

il:= (1, 1);

which transpiles to

auto il = { 1, 1 }; // same as auto il = std::initializer_list{ 1, 1 }

@switch-blade-stuff
Copy link

auto il = { 1, 1 }; // same as auto il = std::initializer_list{ 1, 1 }

Correct me if I'm wrong, but isn't std::initializer_list different from just {}? Also, to my knowledge initializer lists can't be manually constructed.

@hsutter
Copy link
Owner

hsutter commented Jan 2, 2023

@switch-blade-stuff

What about this situation though:
v1: vector<int> = (x + 1);
v2: vector<int> = ((x + 1));
Is v2 a vector of 1 element x+1, or is it a vector of x+1 zeros, where the size was just enclosed in redundant parentheses?

The extra parens would be meaningful, as they're saying that "this argument list contains a single element, which is a list." So:

v1: vector<int> = (x + 1);   // x+1 elements, all zero
v2: vector<int> = ((x + 1)); // one element, x+1

What do you think, does that sound reasonable?

@switch-blade-stuff
Copy link

switch-blade-stuff commented Jan 2, 2023

That does sound reasonable, although to me it still seems like a potentially error-prone thing to use double parentheses for lists, since the majority of people are used to {} meaning lists and parentheses meaning precedence groups.

Also gives me a bit of a tingle with templates, I cannot come up with a reasonable example off the top of my head right now but it could cause some unexpected pitfalls in complex template pack expansion (i.e. pack expansion wants to use parentheses for grouping, but because the pack has only 1 element it gets treated as a list).

@GPMueller
Copy link

Since functions are declared like mappings between tuples, (...) -> (...), I would find it consistent that assigning such a tuple to a vector, v: vector<int> = (5, 5) would produce a vector of two elements.
Not sure whether there's other parts of the language syntax that could be considered in this consistency argument.

Personally, I'd find the extra parens more confusing than the fact that a constructor-call is not necessarily the same as an assignment, so while I agree that the cpp1 example given in the original post has potential for confusion, I don't agree that the cpp2 example does.

I see more risk for confusion with

v1: vector<int> = (x + 1);
v2: vector<int> = ((x + 1));

since people are probably used to parentheses being only about association/ordering.

@switch-blade-stuff
Copy link

since people are probably used to parentheses being only about association/ordering.

I agree. Personally, at least, I have always thought (and felt, maybe?) of parentheses as math-related syntax, with the meaning of expression ordering/grouping. And I would assume that most people think of parentheses as something you use to disambiguate (a + b) * x from a + b * x first, and as such I would expect that people could find the double parentheses syntax confusing. It wouldnt help that most people would be already used to braces used to define lists.

What also can make it more confusing is the possibility of whitespace in between double parentheses. If ((x)) means the same as ( (x) ) it can create even more confusion than there already is about list initialization. Then on the other hand, if ((x)) and ( (x) ) mean different things, it wouldn't remove any confusion either.

The way I see it is that aggregate/list initialization needs to be dealt with by separating construction from lists, rather than replacing the current status-quo with a different syntax.

@SebastianTroy
Copy link

SebastianTroy commented Jan 2, 2023 via email

@HALL9kv0
Copy link

HALL9kv0 commented Jan 2, 2023

My 3 cents on the issue:

  1. Additional parentheses are somewhat confusing, and are easy to forget, neglect, especially for newcommers to the language.
  2. Parentheses construction syntax, v: vector = (5,1), is similar to a function call, and intuitively we can think of it as a function generating a vector of 5 elements, all equal to 1. Easier to introduce it to new cpp programmers.
  3. C-style arrays and cpp1 containers (std::vector the most used by far) use {} to initialize their elements. Therefore, v: vector = {1,2} is easier thought as a list of things I want to construct my vector with.
    TL;DR: My personal preference, {} for construction using list of elements, () for everything else. Depending on the bracket style intention is clear.

@fpelliccioni
Copy link
Contributor

Yes, fixing that problem is on my list to address as follows (I just haven't implemented it yet):

v1: vector<int> = (5, 1);   // size == 5, same as "= ((1, 1, 1, 1, 1))"
v2: vector<int> = ((5, 1)); // size == 2

The idea is that the parens in v2's initializer make it explicit that we are constructing with a single argument that is a list. (Do you "see" it visually, that the outer parens now clearly contain only a single thing? I think it's visually clear, but I'm curious to learn whether others will see it that way too.)

I think it is still confusing. What about named constructors?
For containers the default have to be initializer list constructing and named constructors should be used for other kind of constructions.
We need a way to map a Named Constructor in Cpp2 with an existing constructor of current C++.

@marioarbras
Copy link

@switch-blade-stuff

auto il = { 1, 1 }; // same as auto il = std::initializer_list{ 1, 1 }

Correct me if I'm wrong, but isn't std::initializer_list different from just {}? Also, to my knowledge initializer lists can't be manually constructed.

I could be wrong, but I think an initializer list is implicitly constructed when you do auto il = {...}. I think you can also manually construct them explicitly. Check these examples on cppreference. It looks like both auto il = { 1, 1 } and auto il = std::initializer_list{ 1, 1 } are allowed.

@HALL9kv0
Copy link

HALL9kv0 commented Jan 2, 2023

using Vec2D=std::vector<std::vector<int>>;
v1: Vec2D = (( (2,1) ,  ((3,4)) )) // v=[1 1; 3 4]
v2: Vec2D= { (2,1)  , {3,4} } // v=[1 1; 3 4]
v3: Vec2D= cpp2::from( (2,1)  , cpp2::from(3,4) ) // v=[1 1; 3 4]
v4: Vec2D= ( cpp2::vector::construct(2,1)  , (3,4) ) // v=[1 1; 3 4]

The above concept code is for illustrative purpose only, to highlight 4 different approaches to the subject.

  1. v1 uses the double-parentheses for construction from elements
  2. v2 uses {} for construction from elements and () for the other constructor
  3. v3 uses a custom cpp2::from constructor wrapper for the initialization list.
  4. v4 uses a custom cpp2::vector::construct constructor wrapper for the size, value constructor, similar to what @fpelliccioni suggests if I understood correctly.

Sorted preference max->min: v2, v3, v4, v1. In nested containers the (()) syntax becomes cumbersome and 4 )))) in a row reminds me of Lisp.
What is your preference?

@switch-blade-stuff
Copy link

What is your preference?

I agree with the v2 > v3 > v4 > v1 preference, although to me cpp2::from doesnt sound descriptive enough, maybe something like from_list or init_list instead?

Then again, {} is much more descriptive, since cpp2::from etc. blends itself with functions and constructors and is harder to notice.

v1 is just way too many parentheses.

@JohelEGP
Copy link
Contributor

JohelEGP commented Jan 2, 2023

std::initializer_list, the initializer of all evil.

IIRC, in Cpp2, () is a parameter list, which is necessary to initialize a std::vector via a multi-parameter constructors. That is in contrast to v : std::string = "", which doesn't require () around "". I suppose ("") would work, too. So maybe just require {} is for std::initializer_list parameters only (perhaps not even for aggregates), so v : std::vector<int> = {5, 1} is $(5, 1)$ and v : std::vector<int> = (5, 1) is $(1, 1, 1, 1, 1)$.

@jcanizales
Copy link

Per @wolfseifert 's experiment, the current status quo would have to be changed no matter what:

main: () -> int = {
  a: std::vector<int> = (5, 1); // vector of five and one
  b := std::vector<int>(5, 1); // vector of five ones
}

currently transpiles to:

std::vector<int> a { 5, 1 };  // vector of five and one
auto b { std::vector<int>(5, 1) }; // vector of five ones

(Confusing!)

For Option 0, the second case would need to be changed to produce:

auto b { std::vector<int>{5, 1} };

@jcanizales
Copy link

Related consideration: Do you already have a plan in mind for the syntax of designated initializers, @hsutter ? That might be what "forces" CPP2 to use { } for lists and aggregates anyway.

@switch-blade-stuff
Copy link

@SebastianTroy hmmm.. that tastes very... python-y

But to me the brackets proposal seems to hit the same issue as the double-parentheses one, in that the brackets are commonly associated with indexing, and using them for list initialization could create confusion.

@switch-blade-stuff
Copy link

@hsutter

But using braces also for lists doesn't seem like it would conflict with scopes -- I can't offhand think of where they'd overlap in the grammar

I dont think there is any overlap, no, since scopes do not appear anywhere a prvalue might. Also, using braces for lists will have a good mental parallel to scopes: a scope is a collection of statements, while a list/aggregate is a collection of expressions.

@msadeqhe
Copy link

msadeqhe commented Jan 3, 2023

I've read this comment from an issue in Carbon about grammar issues of if/for/while statements with {} initialization in Rust. Maybe it can be a concern in Cpp2 too.

@charlieman
Copy link

v1: vector<int> = 5, 1;   // size == 5, same as "= ((1, 1, 1, 1, 1))"
v2: vector<int> = _, (5, 1); // size == 2

@hsutter
Copy link
Owner

hsutter commented Jan 3, 2023

Thanks, everyone, for all the comments.

@SebastianTroy

Could we use square [ ] brackets for lists/aggregates instead?

Let's add that option to the list, and I've now implemented (3) too to try it out:

Option 3: Use [ ] for lists.

This syntax more visually connotes a list-of-things, it's symmetric with [ ] for element access of lists/arrays, and it's easier to see the difference between ( ) and [ ] (at least for me) than between ( ) and { }.

Its biggest drawback is one I'd encountered with [ ] before (*), that doing this creates some extra work to distinguish between [ ] used for subscripting (which have to be preserved as [ ] when lowering to Cpp1) and [ ] used for lists (which in this option would need to be emitted as { } to Cpp1). My learnings from such situations so far has generally been that "when it's harder for cppfront to know whether something is A or B, it's generally also harder for the programmer to see whether it's A or B" plus "and usually hits the 'have to do name lookup to figure out what it means' problem" (a key issue Cpp2 is aiming to solve is that name lookup is never needed for parsing, but it also shouldn't be needed to understand what an expression means). This highlighted to me that this path would make thing[1] at least visually ambiguous... if thing is an object with a subscript operator then it's a subscript operator call, but if thing is a type then it can be a function-style construction notation (see below).

(*) A much worse case of this is one of the reasons I went away from using [ ] for template argument lists, which I had been using in earlier drafts of Cpp2... I didn't want to have to do name lookup to parse an expression like thing[1]. This would be less bad than that, as at least both meanings of thing[1] would still be an expression, but it would still be a visual ambiguity that the human and the compiler would need to resolve each time.

Current resolution: (0) Status quo

I've implemented all three of (1) (2) and (3) and tried them out in the regression tests, and it's been an interesting experiment. After writing a code in each of the other three syntaxes, and even kind of liking each of those syntaxes, I found that when I switched back to status quo it felt even nicer still to just have ( ) and not have to teach or think about (( )), { }, or [ ] (each of which was at least a second thing to teach, and most of which additionally caused some secondary effects or at least visual ambiguities). Maybe it's just me, but after trying the others, "just parens" surprisingly felt like a small breath of fresh air. It felt good.

I'm still going to be open to further experience, but for now I don't think it's worth adding a language feature primarily to make vector<NumericType>'s disparaged two-parameter constructor work as a first-class initializer (you can still get it using a function-style expression, because Cpp1 and current Cpp2 allow using a type name as a callable name in the postfix-expression syntax).

The more potentially compelling use case for { } expression-lists in the future is the one I mentioned above -- it could be useful to have an explicit initializer-list expression syntax, which would allow calls like "f( {1, 2, 3} ). I've kept the code changes that would enable that to make it easier to try that in the future(not now), but I've deliberately disabled accepting the { }` there for the time being.


Re function-style construction notation:

b := std::vector<int>(5, 1); // vector of five ones

Yes, that happens to work (though I didn't initially intend it) because C++ naturally allows a type name to be used as effectively a callable entity, with the semantics of invoking the constructor. I could go out of my way to prevent it by emitting all nonmember function calls as a macro that uses if constexpr to test whether the id-expression is a type, but for now it doesn't seem worthwhile to ban it.

The :-consistent way of writing an expression scope (temporary) object would be as an unnamed object :sometype = (some, args), much like expression scope unnamed functions. For now I've deliberately disabled that so it errors out if you try to write it, but it would be the main alternative.


Do you already have a plan in mind for the syntax of designated initializers, @hsutter ? That might be what "forces" CPP2 to use { } for lists and aggregates anyway.

I've been deprioritizing them because they're not addressing one of the pressing problems with C++ and so can be looked at later. When I do look at them, I plan to consider them together with named arguments as a general feature, and avoid doing something special for constructors that doesn't work the same way for other function calls. So I don't think named arguments will be related to a brace syntax, if I do add them in the future for all functions arguments (not just constructor arguments).

Thanks everyone!

hsutter added a commit that referenced this issue Jan 3, 2023
…thread for #193

This commit paves the way for in the future (not now) possibly experimenting with allowing expression-list to have braces, if it's important to provide explicit initializer-list expression syntax (mostly for function arguments)
@hsutter hsutter closed this as completed Jan 3, 2023
@wolfseifert
Copy link
Author

So the only way to pass a vector of 5 and 1 to a function print is like this:

a: std::vector = (5, 1);
print(a); // prints 5 1

Neither

print(std::vector(5, 1));  // prints 1 1 1 1 1

nor

print(: std::vector = (5, 1));  // does not transpile

do the trick. Or did I miss something?

@jcanizales
Copy link

@wolfseifert does print((5, 1)) work?

@wolfseifert
Copy link
Author

No, it does not. I will open a new issue for this later.

hsutter added a commit that referenced this issue Jan 7, 2023
…comment thread

Sample test case:

```
print: (x:_) = {
    for x do :(elem:_) = {
        std::cout << elem << " ";
    }
}

main: ()->int = {
    print( : std::vector = (5,1); );
}
```
@hsutter
Copy link
Owner

hsutter commented Jan 7, 2023

@wolfseifert:

print(: std::vector = (5, 1));  // does not transpile

I've been meaning to allow unnamed objects eventually, and seeing you actually trying to write the code pushed me over the edge to implement it now. :)

This now works:

print: (x:_) = {
    for x do :(elem:_) = {
        std::cout << elem << " ";
    }
}

main: ()->int = {
    print( : std::vector = (5,1) );   // prints "5 1"
}

@msadeqhe
Copy link

msadeqhe commented Jan 8, 2023

Is it possible to infer type from function declaration?

print: (x: std::vector<int>) = {
    for x do : (elem: _) = {
        std::cout << elem << " ";
    }
}

main: () ->int = {
    print(: = (5, 1));   // Here
    print(: _ = (5, 1)); // Or like this
}

... and from variables declaration:

x :std::vector<int> = (5, 1);
y :std::vector<int> = : std::vector = (5, 1);
z :std::vector<int> = : = (5, 1); // Here
a :std::vector<int> = : _ = (5, 1); // Or like this

@wolfseifert
Copy link
Author

How I understand this so far:

  • x: std::vector = (5, 1);
    is the current "standard" way of initializing a vector. It is transpiled to
    std::vector x {5, 1};
  • y: std::vector = : std::vector = (5, 1);
    is a more cumbersome way of achieving the same thing. It is transpiled to
    std::vector y {std::vector{5, 1}};
    This variant however makes a lot of sense for something like
    y: std::vector = : myvector = (5, 1);
    where myvector is a subclass of vector but y must be of type vector.
  • z: std::vector = := (5, 1);
    is transpiled to
    std::vector z {auto{5, 1}}
    which does not C++ compile.

@filipsajdak
Copy link
Contributor

What do you think about that:

print( (5,1) as std::vector ); // for generic functions; prints "5 1"

taking_vector( forward (5,1) ); // assuming that this function expects a vector and 
                                // that we can deduce the type from the function declaration

For the std::vector "bad API" issue, maybe we should think about correcting it on the cppfront side e.g., like that:

main: () -> int = {
    print( std::vector<int>().create(5, 1) ); // prints 1 1 1 1 1
    // or ...
    print( create<std::vector>(5, 1) ); // prints 1 1 1 1 1
}

template <typename X>
auto create(std::vector<X>&&, std::size_t size, X&& value = X{}) -> decltype(auto)
{
    return std::vector<X>(size, value);
}

template <template <typename...> class V, typename X>
auto create(std::size_t size, X&& value = X{}) -> decltype(auto)
{
    return V<X>(size, value);
}

print: (x:_) = {
    for x do :(elem :_) = {
        std::cout << elem << " ";
    }
    std::cout << std::endl;
}

#include <vector>

Names can be adjusted (resize cannot be used instead of create as vector already has a resize method, and UFCS prefers methods over free functions).

@filipsajdak
Copy link
Contributor

I have just realized that syntax for unnamed objects

: std::vector = (5, 1)

Follows the same syntax as unnamed functions (aka lambdas):

: (i:int) -> int = { return i+1; }

@hsutter
Copy link
Owner

hsutter commented Jan 8, 2023

Re syntax: Yes, in both cases it's the identical syntax as a named object/function, but without the name.

I think we've bottomed out on this for now. Thanks, everyone!

Azmah-Bad pushed a commit to Azmah-Bad/cppfront that referenced this issue Feb 24, 2023
…thread for hsutter#193

This commit paves the way for in the future (not now) possibly experimenting with allowing expression-list to have braces, if it's important to provide explicit initializer-list expression syntax (mostly for function arguments)
Azmah-Bad pushed a commit to Azmah-Bad/cppfront that referenced this issue Feb 24, 2023
…er#193 comment thread

Sample test case:

```
print: (x:_) = {
    for x do :(elem:_) = {
        std::cout << elem << " ";
    }
}

main: ()->int = {
    print( : std::vector = (5,1); );
}
```
@JohelEGP
Copy link
Contributor

#284 (comment) means it's possible to call std::vector<int>'s (int, int) constructor: https://cpp2.godbolt.org/z/Pvj8fj1he.

using std::vector;
main: () = {
  z := 0.vector(0);
}

@AbhinavK00
Copy link

That's a nice hack.
But the fact that it took this long to come up with a solution shows we need some kind of change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests