-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathp2961.tex
1671 lines (1186 loc) · 90.6 KB
/
p2961.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\input{wg21common}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%TABLE OF CONTENTS SETTINGS
\usepackage{titlesec}
\usepackage{tocloft}
% Custom ToC layout because the default sucks
\cftsetindents{section}{0in}{0.24in}
\cftsetindents{subsection}{0.24in}{0.34in}
\cftsetindents{subsubsection}{0.58in}{0.44in}
% Needed later to reduce the ToC depth mid document
\newcommand{\changelocaltocdepth}[1]{%
\addtocontents{toc}{\protect\setcounter{tocdepth}{#1}}%
\setcounter{tocdepth}{#1}%
}
\setcounter{tocdepth}{3}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\title{A natural syntax for Contracts}
\author{ Timur Doumler \small(\href{mailto:[email protected]}{[email protected]})\\
Jens Maurer \small(\href{mailto:[email protected]}{[email protected]})}
\date{}
\maketitle
\begin{tabular}{ll}
Document \#: & P2961R2 \\
Date: &2023-11-08\\
Project: & Programming Language C++ \\
Audience: & SG21
\end{tabular}
\begin{abstract}
We propose a syntax for Contracts that naturally fits into existing C++, does not overlap with the design space of other C++ features such as attributes or lambdas, is intuitive, lightweight and elegant, and designed to aid readability by visually separating the different syntactic parts of a contract check. The proposed syntax removes the problems of attribute-like and closure-based syntax while maintaining full compatibility and extensibility in line with the SG21 requirements for a Contracts syntax.
\end{abstract}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tableofcontents*
%\vspace{5mm}
\section*{Revision history}
\addcontentsline{toc}{section}{\protect\numberline{}Revision history}
\addtocontents{toc}{\protect\numberline{}} % Blank line in ToC
\textbf{Revision 2}, 2023-11-05:
\begin{itemize}
\item Chose keyword \tcode{contract_assert} for assertions
\item Added more rationale why using keyword \tcode{assert} for contract assertions is not viable
\item Added section about implementation experience
\item Updated discussion of viability for the C language with results from SG22 meeting
\end{itemize}
\textbf{Revision 1}, 2023-10-09:
\begin{itemize}
\item Made \emph{pre-or-postcondition} precede \emph{pure-specifier} for consistency with \tcode{= default} and \tcode{= delete}
\item Clarified that naming the return value in a postcondition is optional
\item Added discussion of keyword alternatives to \tcode{assert}; changed proposed keyword alternative from \tcode{assrt} to a shortlist with two candidates: \tcode{contract_assert} and \tcode{assertexpr}
\item Added discussion of viability for the C language
\item Added discussion of a possible post-MVP syntax for class invariants
\item Added discussion of three new syntax requirements from \cite{P2885R3} raised after the SG21 electronic poll on syntax requirements was conducted
\end{itemize}
\textbf{Revision 0}, 2023-09-17:
\begin{itemize}
\item Initial version.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The issues with attribute-like syntax}
\label{sec:motivation}
SG21 is currently working on standardising a first version of a Contracts facility --- the so-called \emph{Contracts MVP} (see \cite{P2900R0}). According to our roadmap \cite{P2695R1}, the last remaining major design decision is the choice of syntax (see \cite{P2885R3}). The longest-standing proposal for a Contracts syntax is the so-called attribute-like syntax (\cite{P2935R3}). While attribute-like syntax has its strengths --- longer implementation experience (see \cite{P1680R0}) and the possibility to lean on existing standard attribute grammar rather than inventing new grammar --- it also has considerable weaknesses.
Attribute-like syntax uses \tcode{[[...]]} delimiter tokens around every contract check. This has been called a ``heavy'' syntax and is perceived as ``ugly'' by some users. The \tcode{[[...]]} syntax was designed for attributes so that they can appertain to many different kinds of entities (functions, classes, variables, types, statements, etc.) in many different contexts. Contracts, on the other hand, have a well-defined syntactic place and completely different design requirements. Further, attributes are intended to be used sparingly, while we expect that Contracts will be used widely, and in particular in many public-facing APIs, which calls for a more lightweight, natural-looking syntax. Finally, preconditions and postconditions are an integral part of a function declaration and therefore should have a syntax consistent with other parts of such a declaration, such as \tcode{requires}, \tcode{noexcept}, etc., all of which are introduced via a proper C++ keyword rather than delimiter tokens.
Technically speaking, the attribute-like syntax for contract checks does not fully conform to the grammar for attributes (due to the presence of the single colon which is not allowed for attributes), but it is syntactically very close, with a significant overlap:
\begin{codeblock}
[[ pre(foo(x)) ]] // attribute
[[ pre: foo(x) ]] // contract
[[ pre:: foo(x) ]] // attribute
\end{codeblock}
As a result, we expect that with this syntax, most users will perceive contract checks as a kind of attribute. However, contracts are not attributes and do not behave as such. Attributes are ignorable (see \cite{P2552R3}) while contracts are not (except in a program with no contract violations or if the contract semantic has been explicitly set to \emph{ignore} by the user). A contract check may even create an entirely new code path out of a function --- for assertions even out of the middle of a function body --- e.g. via a throwing violation handler. This is functionality that standard attributes were never meant to allow. There are many other differences between contracts and standard attributes; see \cite{P2487R1} for an analysis. It is particularly instructive to refer to the original paper \cite{N2761} that introduced attributes to the C++ language, specifically section 7 ``Guidance on when to use/reuse a keyword and when to use an attribute'': Contracts satisfy almost none of the properties listed for a feature that should be an attribute, and almost all of the properties listed for a feature that should be a keyword\footnote{The only listed property for a feature that should be a keyword that Contracts do \emph{not} satisfy is interaction with the type system; however, \cite{P2935R3} suggests a possible future extension where they do, namely that contract checks could be placed on function types, enabling e.g. \tcode{std::function} to only accept functions with a certain contract.}.
Re-using the syntactic position of attributes for contract checks is problematic. Attributes always syntactically appertain to some other entity such as a type, a variable, or a statement. For preconditions and postconditions, attribute-like syntax uses the position of attributes appertaining to a function type, which means they need to go before any trailing return type, before virtual specifiers such as \tcode{override} and \tcode{final}, and before the \tcode{requires} clause. This goes against the natural reading order of a function declaration and requires delayed parsing of postconditions. For assertions, attribute-like syntax uses the position of attributes appertaining to a null statement, which means they cannot be used freely as an expression and cannot serve as a complete drop-in replacement for the existing \tcode{assert} macro.
\cite{P2935R3} also considers an alternative position for attribute-like preconditions and postconditions at the end of a function declaration, and for attribute-like assertions to be expressions rather than statements. But in both cases, we would need to place attribute-like entities at a novel syntactic position that is not currently supported for attributes, and that we do not have implementation experience with, thereby throwing away two major advantages of attribute-like syntax and creating several new problems (listed in \cite{P2935R3} section 4.2).
The internal grammar of attribute-like syntax is also problematic. The basic syntactic structure is:
\phantom{~~~}\tcode{[[\emph{contract-kind} *** : \emph{predicate} ]] }
Everything else that is not the contract-kind (\tcode{pre}, \tcode{post}, \tcode{assert}) or the predicate --- which includes the name for the return value and all possible post-MVP additions (labels, captures, structured bindings, \tcode{requires} clauses appertaining to the contract check, etc.) --- has only one possible syntactic position: the one marked by \tcode{***} above. This means that the different parts of the contract check are not visually separated from each other, making more complex contracts hard to read and understand. It also gives rise to several syntactic ambiguities.
For example, there is an ambiguity between a label and the name of the return value before the colon (as both are identifiers); adding a new standard label can break existing code:
\begin{codeblock}
int f(int x)
[[ post foo: x > foo ]]; // Is \tcode{foo} a label or the return value?
\end{codeblock}
One suggested workaround is to require extra parentheses around the return value:
\begin{codeblock}
int f(int x)
[[ post (foo): x > foo]]; // \tcode{foo} is the return value
\end{codeblock}
But that makes it syntactically ambiguous with an argument of the preceding label:
\begin{codeblock}
int f(int x)
[[ post foo (bar): x > bar ]]; // is \tcode{bar} the return value or an argument of label \tcode{foo}?
\end{codeblock}
As another example, there is an ambiguity between a capture and a structured binding for the return value (both desirable post-MVP extensions), as both use square brackets:
\begin{codeblock}
std::pair<int, int> f(int x, int y)
[[ post [x, y]: x != y ]]; // is \tcode{[x, y]} a capture or a structured binding?
\end{codeblock}
\cite{P2935R3} suggests awkward workarounds: to allow only init-captures but not other types of captures, and to mandate an extra pair of parentheses around the structured binding.
Finally, we cannot have standard attributes appertaining to an attribute-like contract check or specify certain other post-MVP extensions such as procedural interfaces without introducing novel and awkward grammar for attributes nested inside attributes.
In this paper, we propose a new natural, lightweight, and intuitive syntax for Contracts that solves all of the above problems.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Prior work on alternative syntaxes}
The first proposed alternative to attribute-like syntax for the Contracts MVP was closure-based syntax \cite{P2461R1}. Closure-based syntax remedies many of the issues with attribute-like syntax, but creates other issues of its own. It places the predicate between curly braces, which is awkward: normally, in C++ we place statements between braces but expressions between parentheses, and the predicate is an expression. Furthermore, it makes a contract check look very much like a lambda, even though contracts and lambdas are completely orthogonal language features.
The second proposed alternative was condition-centric syntax \cite{P2737R0}. It was not a complete proposal, as it did not consider any post-MVP features such as captures, \tcode{requires} clauses, and labels on contract checks. But it was the first proposal to use the basic syntactic structure that we also use in this paper:
\phantom{~~~}\tcode{\emph{contract-kind} ( \emph{predicate} ) }
Along with this basic syntactic structure, \cite{P2737R0} proposed a series of other design choices orthogonal to the choice of syntax, in particular:
\begin{itemize}
\item to rename ``assertion'' to the newly coined term ``incondition'',
\item to use \tcode{precond}, \tcode{postcond}, and \tcode{incond}, instead of \tcode{pre}, \tcode{post}, and \tcode{assert}, respectively,
\item to make all three of the above \emph{full} keywords rather than contextual keywords,
\item to use a predefined identifier \tcode{result} for the return value of a function instead of letting the user introduce their own name.
\end{itemize}
The above design choices have been poorly received in SG21. Instead of abandoning these additional design choices and instead focusing on the basic syntactic structure, which was received with interest, the author chose to abandon the whole proposal.
In this paper, we improve upon both closure-based and condition-centric syntax. We adopt many of the ideas of closure-based syntax, in particular the lack of delimiter tokens around the contract check and the syntax for captures. However, instead of using the problematic curly braces, we use parentheses around the predicate, following the basic syntactic structure of the condition-centric syntax \cite{P2737R0} but without adopting any of the other design choices from that paper.
Building on these ideas, we developed a complete syntax proposal that is fit for purpose in the Contracts MVP and accommodates all relevant post-MVP extensions such as captures, \tcode{requires} clauses, and labels on contract checks.
The author of the closure-based syntax has reviewed our proposal and decided to discontinue the closure-based syntax in favour of the natural syntax as it subsumes the ideas of the closure-based syntax and improves upon it. We are therefore left with attribute-like syntax and the natural syntax as the only two still active proposals for a Contracts syntax.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Design goals}
\label{sec:design}
We focus on the following design goals, which we believe are not sufficiently met by attribute-like or closure-based syntax:
\begin{itemize}
\item The syntax should fit naturally into existing C++. The intent should be intuitively understandable by users unfamiliar with contract checks without creating any confusion.
\item A contract check should not resemble an attribute, a lambda, or any other pre-existing C++ construct. It should sit in its own, instantly recognisable design space.
\item The syntax should feel elegant and lightweight. It should not use more tokens and characters than necessary.
\item To aid readability, the syntax should visually separate the different syntactic parts of a contract check. It should be possible to distinguish at a glance the contract kind, the predicate, the name for the return value, and (post MVP) the captures and labels. All of these should have their own distinct position in the syntax.
\end{itemize}
At the same time, we maintain all the other desirable properties that attribute-like and closure-based syntax offer, such as compatibility (no parsing ambiguities, no breakage or change in meaning of existing C++ code) and extensibility (a natural path for evolution in the post-MVP directions that SG21 considers relevant).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The proposal}
\subsection{Grammar}
\label{subsec:grammar}
We propose the following additions to the C++ grammar for the Contracts MVP:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{init-declarator:} \\
\phantom{~~~}\emph{declarator} \emph{initializer}$_{opt}$ \\
\phantom{~~~}\emph{declarator} \emph{requires-clause} \\
\phantom{~~~}\added{\emph{declarator} \emph{requires-clause}$_{opt}$ \emph{pre-or-post-condition-seq}}
\emph{member-declarator:} \\
\phantom{~~~}\emph{declarator} \emph{virt-specifier}$_{opt}$ \added{\emph{pre-or-post-condition-seq}$_{opt}$} \emph{pure-specifier}$_{opt}$\\
\phantom{~~~}\emph{declarator} \emph{requires-clause} \\
\phantom{~~~}\added{\emph{declarator} \emph{requires-clause}$_{opt}$ \emph{pre-or-post-condition-seq}} \\
\phantom{~~~}\emph{declarator} \emph{brace-or-equal-initializer}$_{opt}$ \\
\phantom{~~~}\emph{identifier}$_{opt}$ \emph{attribute-specifier-seq}$_{opt}$ \tcode{:} \emph{brace-or-equal-initializer}$_{opt}$
\emph{function-definition:} \\
\phantom{~~~}\emph{attribute-specifier-seq}$_{opt}$ \emph{decl-specifier-seq}$_{opt}$ \emph{declarator} \emph{virt-specifier-seq}$_{opt}$ \\
\phantom{~~~~~~}\added{\emph{pre-or-post-condition-seq}$_{opt}$} \emph{function-body}\\
\phantom{~~~}\emph{attribute-specifier-seq}$_{opt}$ \emph{decl-specifier-seq}$_{opt}$ \emph{declarator} \emph{requires-clause} \\
\phantom{~~~~~~}\added{\emph{pre-or-post-condition-seq}$_{opt}$} \emph{function-body}
\emph{lambda-declarator:} \\
\phantom{~~~}\emph{lambda-specifier-seq} \emph{noexcept-specifier}$_{opt}$ \emph{attribute-specifier-seq}$_{opt}$ \\
\phantom{~~~~~~}\emph{trailing-return-type}$_{opt}$ \added{\emph{pre-or-post-condition-seq}$_{opt}$}\\
\phantom{~~~}\emph{noexcept-specifier} \emph{attribute-specifier-seq}$_{opt}$ \emph{trailing-return-type}$_{opt}$ \added{\emph{pre-or-post-condition-seq}$_{opt}$} \\
\phantom{~~~}\emph{trailing-return-type}$_{opt}$ \added{\emph{pre-or-post-condition-seq}$_{opt}$}\\
\phantom{~~~}\tcode{(} \emph{parameter-declaration-clause} \tcode{)} \emph{lambda-specifier-seq}$_{opt}$ \emph{noexcept-specifier}$_{opt}$ \\ \phantom{~~~~~~}\emph{attribute-specifier-seq}$_{opt}$ \emph{trailing-return-type}$_{opt}$ \emph{requires-clause}$_{opt}$ \added{\emph{pre-or-post-condition-seq}$_{opt}$}
\emph{unary-expression:} \\
\phantom{~~~}\emph{postfix-expression} \\
\phantom{~~~}\emph{unary-operator cast-expression} \\
\phantom{~~~}\tcode{++} \emph{cast-expression} \\
\phantom{~~~}\tcode{--} \emph{cast-expression} \\
\phantom{~~~}\emph{await-expression} \\
\phantom{~~~}\tcode{sizeof} \emph{unary-expression} \\
\phantom{~~~}\tcode{sizeof (} \emph{type-id} \tcode{)} \\
\phantom{~~~}\tcode{sizeof ... (} \emph{identifier} \tcode{)} \\
\phantom{~~~}\tcode{alignof (} \emph{type-id} \tcode{)} \\
\phantom{~~~}\emph{noexcept-expression} \\
\phantom{~~~}\emph{new-expression} \\
\phantom{~~~}\emph{delete-expression} \\
\phantom{~~~}\added{\emph{assert-expression}}
\added{\emph{pre-or-post-condition:}} \\
\phantom{~~~}\added{\tcode{pre} \emph{contract}} \\
\phantom{~~~}\added{\tcode{post} \emph{contract}}
\added{\emph{pre-or-post-condition-seq:}} \\
\phantom{~~~}\added{\emph{pre-or-post-condition} \emph{pre-or-post-condition-seq}$_{opt}$}
\added{\emph{assert-expression:}} \\
\phantom{~~~}\added{\tcode{contract}}\added{_}\added{\tcode{assert} \emph{contract}}
\added{\emph{contract:}} \\
\phantom{~~~}\added{\emph{contract-condition}} \phantom{~~~}\emph{// can be expanded post-MVP, see section 6}
\added{\emph{contract-condition:}} \\
\phantom{~~~}\added{\tcode{(} \emph{return-name}$_{opt}$ \emph{conditional-expression} \tcode{)}}
\added{\emph{return-name:}} \\
\phantom{~~~}\added{\emph{identifier} \tcode{:}}
\end{adjustwidth}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Preconditions and postconditions}
To add a precondition (or postcondition) to a function declaration, we simply write \tcode{pre} (or \tcode{post}), followed by the predicate in parentheses:
%\vspace{2mm}
\begin{codeblock}
float sqrt(float x)
pre (x >= 0);
\end{codeblock}
%\vspace{2mm}
This is a very natural syntax, as it is using parentheses in the same way as other language constructs that have a predicate: \tcode{if (\emph{expr})}, \tcode{while (\emph{expr})}, etc.
There may be any number of preconditions or postconditions, in any order; preconditions and postconditions can be intermingled arbitrarily.
In a postcondition, a user-defined name for the return value of a function can be introduced via an identifier placed before the predicate, with a colon in between:
%\vspace{2mm}
\begin{codeblock}
int f(int x)
pre (x >= 0)
post (r: r > x); // \tcode{r} is the return value
\end{codeblock}
%\vspace{2mm}
Naming the return value in a postcondition is optional:
\begin{codeblock}
void clear()
post (empty()); // OK
\end{codeblock}
\tcode{pre} and \tcode{post} are contextual keywords. As we will see in section \ref{subsec:noambig}, this breaks no existing code and you can still use \tcode{pre} and \tcode{post} as an identifier everywhere this is well-formed today.
Preconditions and postconditions are positioned at the very end of a function declaration, immediately before the semicolon (or, if the declaration is a definition, the function body):
%\vspace{2mm}
\begin{codeblock}
void f(int i) override final
pre(i >= 0);
template <typename T>
auto g(T x) -> bool
requires std::integral<T>
pre (x > 0);
\end{codeblock}
%\vspace{2mm}
This order is consistent with the natural order of reading a function declaration: typically, the reader will first want to see the full function signature, then any compile-time constraints (the \tcode{requires} clause), and finally any runtime constraints (the contract checks).
The only exception to this is the \emph{pure-specifier} \tcode{= 0}, which comes \emph{after} preconditions and postconditions to create consistency with \tcode{= default} and \tcode{= delete}, which are function bodies:
\begin{codeblock}
struct X {
X() pre(a) = default;
X(const X&) pre(b) = delete;
virtual void f() pre(c) = 0;
};
\end{codeblock}
Note that you can never execute a contract check on a deleted function, but you can do so on a defaulted function.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Assertions}
\label{subsec:assertions}
Assertions use the same natural syntax of a keyword followed by a parenthesised predicate:
\begin{codeblock}
void f() {
int i = get_i();
contract_assert(i >= 0);
use_i(i);
};
\end{codeblock}
This syntax will look instantly familiar to C++ developers as it is the same basic syntax that one would use today to write a macro-based assertion (but without any of the limitations of a macro-based solution).
However, unlike \tcode{pre} and \tcode{post}, we need to claim a full keyword for the contract kind because assertions appear at block scope and could otherwise not be disambiguated from a function call\footnote{Or at least, not without post-MVP syntactic additions such as captures that are not allowed on function calls.}. This keyword cannot be \tcode{assert} due to the name clash with the existing \tcode{assert} macro from header \tcode{cassert} (see section \ref{subsec:assrt}).
After having carefully considered a large number of possible alternative keywords (see section \ref{subsec:keyword} for a detailed discussion), we chose the alternative keyword \tcode{contract_assert}.
With this syntax, assertions are expressions, not statements. Consequently, assertions can be used not only as statements inside a function body, but actually anywhere one could use an expression, and in particular, anywhere one could use an \tcode{assert} macro today:
\begin{lstlisting}
class X {
int* _p;
public:
X(int* p)
: _p(contract_assert(p), p)) // works
{}
};
\end{lstlisting}
Therefore, contract assertions as proposed here can act as full drop-in replacements for \tcode{assert} macros, and that replacement is easily toolable (search and replace \tcode{assert} with \tcode{contract_assert}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Discussion}
\subsection{No parsing ambiguities with \tcode{pre} and \tcode{post}}
\label{subsec:noambig}
It has been suggested that the natural syntax for preconditions and postconditions might create parsing ambiguities with the other parts of a function declaration, such as a trailing return type or \tcode{requires} clause, if \tcode{pre} or \tcode{post} are used as identifiers for variables, functions, or types; but this is not actually the case.
The grammar for a trailing return type is \tcode{->} \emph{type-id}, and we can unambiguously tell when that \emph{type-id} ends and a \emph{pre-or-postcondition} begins:
%\vspace{2mm}
\begin{codeblock}
auto f() -> pre pre(a); // OK, \tcode{pre} is the return type, \tcode{pre(a)} the precondition
auto g() -> pre<post> pre(a); // OK, \tcode{pre<post>} is the return type, \tcode{pre(a)} the precondition
\end{codeblock}
%\vspace{2mm}
Further, note that \tcode{requires} clauses use a restricted grammar where the expression following the \tcode{requires} keyword must be a \emph{primary-expression} or a sequence of \emph{primary-expression}s combined with the \tcode{\&\&} or \tcode{||} operators. Any other type of expression, such as a mathematical expression, a cast, or a function call, must be surrounded by parentheses, otherwise the program is ill-formed:
%\vspace{2mm}
\begin{codeblock}
template <typename T>
void g() requires pre(a); // ill-formed today
template <typename T>
void h() requires (pre(a)); // OK
template <typename T>
void j() requires (b)pre(a); // ill-formed today
template <typename T>
void k() requires ((b)pre(a)); // OK
template <typename T>
void l() requires a < b > pre(c); // ill-formed today
template <typename T>
void m() requires (a < b > pre(c)); // OK
\end{codeblock}
%\vspace{2mm}
Therefore, just like with the trailing return type, we can unambiguously tell when the expression ends and a \emph{pre-or-postcondition} begins:
%\vspace{2mm}
\begin{codeblock}
template <typename T>
void f() requires (b) pre(a); // OK, \tcode{pre(a)} is the precondition
template <typename T>
void g() requires a < b > pre(c); // OK, \tcode{a<b>} is a variable template, \tcode{pre(c)} is the precondition
\end{codeblock}
%\vspace{2mm}
There are further no parsing ambiguities when any given precondition (or postcondition) ends and the next one begins, as the predicate must always be surrounded by parentheses. Therefore, it is also fine to use \tcode{pre} and \tcode{post} as identifiers inside the predicate. They are parsed as keywords only in the syntactic place where they act as such. Everywhere else the usual grammar rules apply:
%\vspace{2mm}
\begin{codeblock}
void f(bool pre, bool post)
pre(pre) pre(post); // OK
\end{codeblock}
%\vspace{2mm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{The \tcode{assert} name clash}
\label{subsec:assrt}
With the syntax we propose here, the best possible keyword for assertions would be \tcode{assert}. This is the keyword used in virtually all contracts-related proposals so far, including attribute-like syntax; the keyword that most users would intuitively expect; and the keyword used in the vast majority of programming languages (Python, Rust, Scala, Swift, Kotlin, etc.) for this purpose. In short, the keyword \tcode{assert} is well-established practice and superior to any keyword that is not \tcode{assert}.
Unfortunately, it also creates a name clash with the existing \tcode{assert} macro from header \tcode{cassert}:
%\vspace{2mm}
\begin{codeblock}
#include <cassert>
void f() {
int i = get_i();
assert(i >= 0); // identical syntax for contract assert and macro assert!
use_i(i);
}
\end{codeblock}
%\vspace{2mm}
There are in principle three ways to resolve this name clash while keeping the natural syntax:
\begin{enumerate}
\item Remove support for header \tcode{cassert} from C++ entirely, making it ill-formed to \tcode{\#include} it;
\item Do not make \tcode{\#include <cassert>} ill-formed (perhaps deprecate it), but make \tcode{assert} a keyword rather than a macro, and silently change the behaviour to being a contract assertion instead of an invocation of the macro;
\item Let \tcode{assert} be the macro from header \tcode{cassert} when that header is included, and a contract assertion otherwise;
\item Use a keyword other than \tcode{assert} for contract assertions to avoid the name clash.
\end{enumerate}
Option 1 seems too draconian as it would break huge amounts of existing code, including code shared between C and C++.
Option 2 looks very compelling. The default behaviour of macro \tcode{assert} is actually identical to the default behaviour of a contract assertion: print a diagnostic, then terminate the program. Contract-specific extensions like a user-defined violation handler will not affect pre-existing code. Replacing macro \tcode{assert} with a full keyword would also solve all the current issues with \tcode{assert} being a macro: it cannot be exported from a Standard Library module; it does not work with matched brackets syntax for brackets other than parentheses, such as \tcode{<...>}, \tcode{\{...\}}, and \tcode{[...]}, and as a result, many C++ constructs such as \tcode{assert(X\{1, 2\})} or \tcode{assert(Y<int, int>)} are ill-formed today; etc. (see also \cite{P2264R5} and \cite{P2884R0}).
However, there is a significant problem with this approach: the behaviour of macro \tcode{assert} is is tied to whether \tcode{NDEBUG} is defined. To mimic the existing behaviour, we would have to change the default contract semantic\footnote{The status quo in the Contracts MVP is that the contract semantic of any contract check is implementation-defined; the recommended default contract semantic is \emph{enforce} regardless of whether \tcode{NDEBUG} is defined, and this recommendation is not normative (see \cite{P2877R0}).} to always be \emph{ignore} when \tcode{NDEBUG} is defined, and \emph{enforce} otherwise. But this would not be enough: \emph{ignore} would still parse and odr-use the predicate, even if it is not evaluated, whereas \tcode{assert} just macroes out all the tokens. Therefore, common programming patterns like the following would break if we switch \tcode{assert} from macro to contract:
\begin{codeblock}
#ifndef NDEBUG
DebugThingy dbg;
#endif
void f() {
assert(dbg.checkSomething()); // OK with macro, syntax error with contract if \tcode{NDEBUG} not defined
// ...
}
\end{codeblock}
On the one hand, this would lead to an unacceptably high amount of breakage in existing C++ code. On the other hand, changing contracts to mimic the existing behaviour of macro \tcode{assert} with \tcode{NDEBUG} defined is not possible because the choice of contract semantic (\emph{enforce}, \emph{observe} or \emph{ignore}) is in general not known at compile time. It is therefore impossible to parse code differently, or take a different code path at compile time, depending on whether a contract is checked or ignored (see \cite{P2877R0} and \cite{P2834R1}).
The situation is complicated further by the fact that the meaning of the \tcode{assert} macro depends not only on whether \tcode{NDEBUG} is defined or not, but also on whether --- and where --- the \tcode{cassert} header is included or re-included after \tcode{NDEBUG} is defined, undefined, or redefined --- all of which can happen at any arbitrary point in the code.
Option 3 has been suggested as another way out of this dilemma. Unlike Options 1 or 2, it would not break any existing code, as \tcode{assert} would not change its meaning if the \tcode{cassert} header is included. However, this approach is not viable either, for several reasons. First, it would render contract assertions unusable anywhere the \tcode{cassert} header is included, which would limit their use; second, it would require special rules as header \tcode{cassert} would effectively redefine a reserved keyword, which is undefined behaviour under the current rules; third, and most importantly, the same expression --- \tcode{assert(x)} --- would mean different things depending on whether the \tcode{cassert} header is included somewhere in the program, making it impossible to tell in general which one of the two possible meanings it has (consider, for example, writing code in a library header, not knowing whether someone might \tcode{\#include <cassert>} before including that library header).
Considering all of the above, we consider Option 4 to be the only possible path forward. This direction was confirmed during the SG21 teleconference on 2023-09-21:
\begin{adjustwidth}{0.5cm}{0.5cm}
\textbf{Poll:} If we adopt the syntax in P2961R0 for the Contracts MVP, we should use the keyword \tcode{assert} for contract assertions, replacing macro \tcode{assert}.
%\hspace{6mm}
\begin{tabular}{lllll}
SF & F & N & A & SA \\
0 & 2 & 3 & 6 & 5
\end{tabular}
%\hspace{5mm}
\textbf{Result:} Consensus against
\end{adjustwidth}
Therefore, unless we settle for attribute-like syntax, we must find a viable alternative keyword that is not \tcode{assert}. Note that such a keyword may look weird and unfamiliar initially, but once it is standardised users tend to get used to it very quickly (see \tcode{co_yield}, \tcode{co_await}, etc.).
\subsection{The search for a keyword alternative to \tcode{assert}}
\label{subsec:keyword}
\subsubsection{Design goals}
\label{subsubsec:keyword_goals}
A large number of alternative keywords have been suggested since the initial version of this paper was released. Before we pick one, we need to agree on a set of design goals that will guide our decision. Our design goals, ordered by priority, are:
\begin{enumerate}
\item The keyword should not break existing code.
\item The keyword should not be offensive or inappropriate.
\item The keyword should not be misleading or confusing in its meaning.
\item The keyword should fit in with the other parts of the Contracts MVP (\tcode{pre}, \tcode{post}, identifier for assertions in \mbox{\tcode{std::contracts::contract_kind}}).
\item The keyword should be sufficiently consistent with the rest of C++.
\item The keyword should be easy to remember.
\item The keyword should not be too long or cumbersome to type out.
\item The keyword should not be too difficult to read and pronounce.
\item The keyword should not look too much like a typo of a common English word.
\item The keyword should also work for the C language.
\end{enumerate}
Note that these design goals are not absolute; they can be broken if there is a good engineering reason for it. Examples of existing keywords that go against some of these goals are: \tcode{requires}, claimed as a full keyword in C++20 that broke existing code; \tcode{alignas}, commonly treated as a typo for ``aligns'' by spell checkers; and \tcode{reinterpret_cast}, a long and ugly keyword. The latter is well-motivated: \tcode{reinterpret_cast} is an unsafe feature that should be used sparingly and therefore should be hard to type. Conversely, contracts exist to improve safety and correctness and their use should be encouraged. We therefore expect the new keyword to be typed frequently, so it should be easy to type.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{table}
\centering
\begin{tabular}{| l | p{2.2cm} | p{2.2cm} | p{2.2cm} |}
\hline
Keyword & ACTCD19 & grep.app & Sourcegraph \\ \hline
\tcode{contractassert} & 0 & 0 & 0 \\
\tcode{mustexpr} & 0 & 0 & 0 \\
\tcode{dyn_assert} & 0 & 0 & 0 \\
\tcode{musthold} & 0 & 0 & 0 \\
\tcode{asrtexpr} & 0 & 0 & 0 \\
\tcode{stdassert} & 0 & 0 & 1 \\
\tcode{truexpr} & 0 & 0 & 2 \\
\tcode{co_assert} & 0 & 0 & 7 \\
\tcode{ccassert} & 0 & 0 & 11 \\
\tcode{contract_assert} & 0 & 0 & 11 \\
\tcode{std_assert} & 0 & 0 & 11 \\
\tcode{dyn_check} & 0 & 0 & 18 \\
\tcode{mustbetrue} & 0 & 0 & 48 \\
\tcode{assertexpr} & 0 & 0 & 49 \\
\tcode{assertion_check} & 0 & 0 & 71 \\
\tcode{cppassert} & 0 & 1 & 103 \\
\tcode{dynamic_assert} & 0 & 20 & 631 \\
\tcode{cca_assert} & 0 & 0 & 0 \\
\tcode{assrt} & 2 & 5 & 1191 \\
\tcode{runtime_assert} & 5 & 68 & 1090 \\
\tcode{_Assert} & 8 & 43 & 3014 \\
\tcode{xpct} & 9 & 0 & 328 \\
\tcode{assert_check} & 20 & 13 & 667 \\
\tcode{assert2} & 40 & 40 & 3604 \\
\tcode{cpp_assert} & 59 & 31 & 2037 \\
\tcode{affirm} & 65 & 26 & 842 \\
\tcode{__assert} & 98 & 717 & 16256 \\
\tcode{assess} & 163 & 358 & 3814* \\
\tcode{insist} & 174 & 422 & 18081* \\
\tcode{asrt} & 256 & 28 & 1844 \\
\tcode{cassert} & 380 & 65348*** & 1024935*,*** \\
\tcode{aver} & 427 & 56 & 3310* \\
\tcode{posit} & 617 & 74 & 9837* \\
\tcode{enforce} & 1230 & 8347** & 375985*,** \\
\tcode{audit} & 1619 & 1268 & 161814* \\
\tcode{claim} & 2800 & 21582** & 784227*,**\\
\tcode{ass} & 4145 & 602 & 14193* \\
\tcode{must} & 4263 & 403923** & 15899541*,** \\
\tcode{confirm} & 4341 & 4716** & 183121*,**\\
\tcode{assertion} & 5433 & 16239** & 903896*,**\\
\tcode{ensure} & 8149 & 67176** & 819402*,**\\
\tcode{chk} & 11783 & 1329 & 211023* \\
\tcode{verify} & 20727 & 33283** & 1767686*,** \\
\tcode{expect} & 43725 & 30710** & 769879*,** \\
\tcode{check} & 147315 & 228702** & 5409830*,** \\
\hline
\end{tabular}
\vspace{3mm}
\caption{\label{table:codesearch}Number of usages of candidate keywords as identifiers in existing open-source C++ code according to three different code search engines, sorted by matches in ACTCD19 (877 MLoC). \\ * The given number is a lower bound as the search engine hit a match limit.\\ ** There is a significant number of false positives due to matches in code comments. \\ *** There is a significant number of false positives due to matches in include directives.}
\end{table}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Code search in existing C++ code}
Having formulated our design goals, we can now evaluate how well our candidate keywords satisfy our first design goal: no code breakage. We ran all proposed candidate keywords through three different code search engines; the results are summarised in Table \ref{table:codesearch}.
The first code search engine we consulted is codesearch.isocpp.org. It is based on the ACTCD19 dataset and was created specifically for studying existing practice of C++. The advantage of this engine is that it is token-based, that is, it finds usages of a given keyword as a user-declared identifier (which is exactly what we are after, as this is the usage that would break existing code) while excluding all other usages of the same keyword. The disadvantage of this engine is that the dataset is large, but not huge (877 MLoc --- but at least we know the size of the dataset), predates C++20 (March 2019), and is somewhat skewed as it was taken from the source packages of the third party software package repository of a particular Linux distribution (Debian Sid).
The second code search engine we consulted is grep.app, which searches ``over a half million public repositories on GitHub''; judging by the amount of matches, the total amount of available C++ code seems to be of the same order of magnitude as the ACTCD19 dataset. This engine is text-based, not token-based. We restricted the search to C++ and used the ``Whole word'' and ``Code sensitive'' options but could not figure out how to exclude usages of the keyword as a string literal and as a single word inside a comment. Due to the latter, there is a high number of false positives for words that are commonly used in English sentences, such as ``must'' and ``check''. However, the number of false positives is negligible for keywords that are not English words. In addition, this method also matches usages of the keyword as a header name in an include directive. This leads to a high number of false positives for \tcode{cassert} but is otherwise negligible.
The third code search engine we consulted is sourcegraph.com, which has the largest dataset of the three engines as it searches a much larger set of public GitHub repositories than grep.app (we do not know how large). This engine is also text-based but has a regex option. We restricted the search to C++ and used the regex \tcode{\textbackslash b\emph{keyword}\textbackslash b} which only matches if the keyword is found as a whole word, case sensitive. Unfortunately, we hit a problem where the engine refuses to process more complicated regular expressions that could exclude more cases. Therefore, we have the same kinds of false positives as with grep.app.
We have not conducted any studies to evaluate usage of our candidate keywords in closed-source C++ code.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Choosing the best candidate}
With the results in Table \ref{table:codesearch}, we can narrow down our pool of candidates. While we certainly would not exclude an otherwise great keyword due to a handful of matches, we would like to avoid a keyword that has hundreds or even thousands of matches in existing open-source repositories. The quality of the matches also matters: matches in relatively unknown repositories have less significance than matches in highly popular, foundational C++ libraries such as Boost, Qt, Clang, or the Unreal Engine.
With that in mind, we can go through all candidate keywords from top to bottom and evaluate how well they satisfy our design goals formulated in \ref{subsubsec:keyword_goals}.
The first keyword candidate, \tcode{contractassert}, is arguably harder to read than the spelling variant \mbox{\tcode{contract_assert}}.
The next candidate, \tcode{mustexpr} as well as other similar candidates not derived from the word ``assert'' such as \tcode{musthold}, \tcode{truexpr}, etc. are arguably not consistent with the existing community knowledge about Contracts. The terminology ``assert'' and the contract kind ``assertion'' are established terms of art and the keyword should show at least some connection to this terminology. Having a keyword not derived from ``assert'' also suffers from another problem: how should we name the corresponding enum value in \mbox{\tcode{std::contracts::contract_kind}}? Should we leave it at \tcode{assert}, thereby having a keyword completely inconsistent with the enum, or should we change it to match the keyword, thereby having an enum that is likewise inconsistent with established terminology? Note also that if we choose the latter, we cannot use the exact spelling of the keyword (a full keyword cannot be used as a name for an enum value), but need to come up with some alternative spelling of that keyword. None of these options seem particularly appealing.
Candidates such as \tcode{dyn_assert}, or any other keyword that contains variations of the words ``dynamic'' or ``runtime'', are misleading as a contract is not necessarily dynamic or checked at runtime (e.g. axiom contracts used for static analysis).
\tcode{asrtexpr}, \tcode{asrt}, and \tcode{assrt} are too hard to spell correctly as it is difficult to remember which letters need to be skipped. In addition, \tcode{asrt} and \tcode{assrt} might be interpreted as a typo and auto-corrected to ``assert''. Similar problems exist with \tcode{xpct} and \tcode{chk}.
\tcode{stdassert} and \tcode{std_assert} are somewhat misleading because the prefix \tcode{std} is usually used for features in the C++ Standard Library, whereas contracts are a core language feature.
\tcode{co_assert} is misleading as the prefix \tcode{co_} is used today exclusively for keywords that turn a function into a coroutine: \tcode{co_yield}, \tcode{co_await}, and \tcode{co_return}, but the keyword for contract assertions has nothing to do with coroutines.
\tcode{ccassert} is confusing as it is not clear at all with the \tcode{cc} stands for. We assume that it is intended as a clever portmanteau of ``CCA'' (contract-checking-annotation) and ``assert'', but this is a bit too clever for a standard C++ keyword. Besides, many users will not know what ``CCA'' stands for, which is why we also do not like \tcode{cca_assert}.
\tcode{contract_assert} and \tcode{assertexpr} do not seem to violate any of our design goals. Neither of them are perfect: \tcode{contract_assert} is a bit too long, but only two characters longer than the common keyword \tcode{static_assert} and therefore still seems viable; and \tcode{assertexpr} has the \tcode{expr} suffix which currently is only used for \tcode{constexpr}, a keyword used for declarations, which might be seen as a slight inconsistency. However, none of these properties are serious problems, and both keywords also have important advantages: \mbox{\tcode{contract_assert}} is extremely clear and descriptive, while \tcode{assertexpr} emphasises that a contract assertion is, in fact, an expression (and not a macro, a statement, or an attribute-like thing appertaining to a statement) and can be used everywhere an expression can be used. \tcode{assertexpr} is also nicely consistent with the corresponding grammar term \emph{assert-expression} (see section \ref{subsec:grammar}). All things considered, we believe that \tcode{contract_assert} and \tcode{assertexpr} are the two least bad options and therefore we would like to propose these two as our two preferred candidates.
Further down the list, there is \tcode{assertion_check}, which seems unnecessarily verbose; \tcode{_Assert} and \tcode{__assert}, which are reserved for implementations of the C++ Standard Library (and used in multiple such implementations); and various candidates that are very similar to the ones already rejected and suffer from the same problems.
At this point in the list, the number of existing usages as a user-defined identifier quickly starts to increase to an unacceptable level. We therefore did not include any further keyword candidates to the shortlist besides \tcode{contract_assert} and \tcode{assertexpr}.
The choice between these two candidates was made by poll at the SG21 telecon on 2023-10-26. The results were very clear:
\begin{adjustwidth}{0.5cm}{0.5cm}
\textbf{Poll 1:} In case we choose to adopt the ``natural syntax'' for Contracts as proposed in
P2961, we should use \tcode{assertexpr} as the keyword for assertions.
%\hspace{6mm}
\begin{tabular}{lllll}
SF & F & N & A & SA \\
1 & 3 & 8 & 7 & 0
\end{tabular}
%\hspace{5mm}
\textbf{Result:} No consensus
\end{adjustwidth}
\begin{adjustwidth}{0.5cm}{0.5cm}
\textbf{Poll 2:} In case we choose to adopt the ``natural syntax'' for Contracts as proposed in
P2961, we should use \tcode{contract_assert} as the keyword for assertions.
%\hspace{6mm}
\begin{tabular}{lllll}
SF & F & N & A & SA \\
9 & 11 & 0 & 0 & 1
\end{tabular}
%\hspace{5mm}
\textbf{Result:} Consensus
\end{adjustwidth}
The decisive argument was as follows. It was acknowledged that \tcode{assertexpr} and \tcode{contract_assert} are both equally good candidates if we look at C++ in isolation. However, if we look at the broader programming language community, where practically every other programming language uses \tcode{assert} for this purpose, \tcode{contract_assert} seems a lot more explainable than \tcode{assertexpr} because the \tcode{expr} suffix looks very unusual outside of the C++ context.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Implementation experience}
\label{subsec:implxp}
A fully-functional experimental implementation of the natural syntax proposed here has been developed in GCC by Ville Voutilainen. It is currently publicly available on Compiler Explorer (\href{https://godbolt.org}{https://godbolt.org}) by selecting ``"x86-64 gcc (contracts natural syntax)`` from the compiler selection drop-down menu and passing the flags \tcode{-fcontracts -fcontracts-nonattr} to it. The first enables contracts in general, the second enables the natural syntax. A working test suite can be found at \href{https://godbolt.org/z/qPKharqqv}{https://godbolt.org/z/qPKharqqv}.
In addition, an implementation of the natural syntax proposed here has been developed in the cppfront compiler by Herb Sutter. Herb reported that this took him under an hour to convert his existing Contracts implementation from the attribute-like syntax to this paper’s syntax, with some post-MVP extra features such as labels (aka ``contract_group names''), including to update the uses of Contracts in the tests and the compiler itself to the new syntax. cppfront is also available on Compiler Explorer.
Neither of the two have reported significant challenges with implementing the natural syntax.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Viability for the C language}
\label{subsec:clang}
We consulted with SG22 (C/C++ Liaison Group) and requested their opinion on 1) whether there is interest in standardising Contracts for C in a way compatible with C++ and usable in code shared between C and C++; 2) whether they would prefer the natural syntax presented in this paper or the attribute-like syntax \cite{P2935R3} for this purpose; 3) whether there are C-specific technical concerns with either syntax. We received extensive feedback both on the SG22 reflector discussion and at a formal SG22 meeting that was convened for this purpose. In this section we summarise the feedback we received.
Regarding the first question, SG22 expressed interest in standardising Contracts for C in a way compatible with C++, and noted that it is possible they will end up standardising just a subset of the C++ facility.
Regarding the second question, SG22 expressed a preference for the natural syntax we propose in this paper over attribute-like syntax. The poll results were as follows:
\begin{adjustwidth}{0.5cm}{0.5cm}
\textbf{Poll 1:} Would SG22 find it acceptable if SG21 adopts the attribute-like syntax as proposed in P2935 for Contracts in C++?
%\hspace{6mm}
\begin{tabular}{l lllll}
&SF & F & N & A & SA \\
WG14 members: & 0 & 0 & 0 & 4 & 2 \\
WG21 members: & 0 & 2 & 2 & 6 & 0
\end{tabular}
\end{adjustwidth}
\begin{adjustwidth}{0.5cm}{0.5cm}
\textbf{Poll 2:} Would SG22 find it acceptable if SG21 adopts the natural syntax as proposed in P2961 for Contracts in C++?
%\hspace{6mm}
\begin{tabular}{l lllll}
&SF & F & N & A & SA \\
WG14 members: & 3 & 1 & 2 & 0 & 0 \\
WG21 members: & 3 & 6 & 1 & 0 & 0
\end{tabular}
\end{adjustwidth}
Regarding the third question, the only technical concern raised with the natural syntax is that C does not have conditional keywords and they would like to avoid introducing them. At the same time, \tcode{pre} and \tcode{post} are too frequent in existing C code to be claimed as full keywords. However, there is a straightforward workaround. We could define the keywords in C as \tcode{_Pre} and \tcode{_Post}, respectively, and then provide convenience macros \tcode{pre} and \tcode{post}, respectively, in a new C header \tcode{<stdcontracts.h>} that expand to the C keywords. The same technique has been successfully used for numerous earlier keywords such as \tcode{_Bool}, \tcode{_Static_assert}, \tcode{_Alignas}, \tcode{_Thread_local}, etc.
The SG22 discussion revealed that \emph{backwards-compatibility}, i.e. the ability to add contract checks to existing code and to have this code still compile with an old compiler that does not know anything about Contracts, is a much more important concern in C than in C++\footnote{According to the SG21 electronic poll on syntax requirements in \cite{P2885R3}, such backwards-compatibility was deemed \emph{irrelevant} for Contracts in C++ by a majority of respondents.}. The natural syntax provides such backwards-compatibility because it makes all contract checks syntactically compatible with function-like macros:
\begin{codeblock}
#if COMPILER_UNDERSTANDS_CONTRACTS
#define pre(x) _Pre(x)
#else
#define pre(x)
#endif
// same for \tcode{post} and \tcode{assert}
\end{codeblock}
This property was stated as an important reason for supporting natural syntax over attribute-like syntax. However, the above technique will stop working if we introduce post-MVP extensions that add syntactic elements outside of the parentheses, such as captures and labels,
which might also be relevant for C. Adding such elements inside the top-level parentheses would keep this technique working, especially if such elements themselves look like plain function invocations. Essentially, instead of
\lstset{emph={labels,captures,predicate}, emphstyle=\itshape}
\begin{codeblock}
pre <labels> [captures] (predicate)
\end{codeblock}
we would have to do something along the lines of
\begin{codeblock}
pre (labels, captures, predicate)
\end{codeblock}
\lstset{emph={}, emphstyle=\itshape}
It was further noted that many of the keywords starting with underscore + capital letter were later evolved to all lowercase keywords identical to C++ (\tcode{bool}, \tcode{static_assert}, etc.) which is desirable for Contracts too but is most likely not possible if we stick with the keywords \tcode{pre} and \tcode{post} given how common they are today as identifiers.
Regarding attribute-like syntax, an important difference between C and C++ is that the notion of ignorability of attributes is much stronger in C. In C++, attributes may be semantically optional but still need to be fully parsed, and syntax errors diagnosed (also known as the First Ignorability Rule for attributes, see \cite{P2552R3}). On the other hand, in C, everything inside the \tcode{[[...]]} of an attribute may be treated as balanced token soup and skipped entirely. This is due to specific C compilers (outside of the three major ones) that choose to not implement attributes but wish to remain conforming. Using attribute-like syntax for Contracts in C suggests that contract checks may also be token-ignored in the same way; most people agreed that this is an undesirable property for contract checks.
Note that unlike the function-like macro approach, the allowance in C to token-ignore attributes does not actually provide backwards-compatibility of Contracts with older compilers in practice: all major C compilers treat attribute-like contract checks as syntax errors today because of the colon. Note further that regardless of which syntax we choose for Contracts, backwards-compatibility with older compilers can always be achieved by wrapping the contract checks themselves into macros rather than the keywords used for them; however, it was noted that from the C perspective it might be preferable to avoid this.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Post-MVP extensions}
The natural syntax proposal provides a natural path for evolution into all of the post-MVP directions that have been suggested so far. In this section, we discuss several possible post-MVP extensions that are of interest to SG21 according to the electronic poll results in \cite{P2885R3} or that have been brought up in discussion since the poll results were published.
\subsection{Captures}
\label{subsec:captures}
The grammar of the natural syntax can be extended as follows to allow captures on contract checks:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{pre-or-post-condition:} \\
\phantom{~~~}\tcode{pre} \emph{contract} \\
\phantom{~~~}\tcode{post} \emph{contract}
\emph{assert-expression:} \\
\phantom{~~~}\tcode{contract_assert} \emph{contract}
\emph{contract:} \\
\phantom{~~~}\added{\emph{contract-capture}$_{opt}$} \emph{contract-condition}
\added{\emph{contract-capture:}} \\
\phantom{~~~}\added{\tcode{[} \emph{capture-list} \tcode{]}}
\end{adjustwidth}
Here is a code example:
%\vspace{2mm}
\begin{codeblock}
void vector::push_back(const T& v)
post [old_size = size()] ( size() == old_size + 1 ); // init-capture
\end{codeblock}
%\vspace{2mm}
Note that with natural syntax, a contract check with a capture looks very similar to closure-based syntax, except that the predicate is in parentheses instead of braces. This is the natural choice and avoids making the contract check look like a lambda (an entirely different construct). Instead, the syntax looks exactly like the thing that it is: a capture followed by a predicate using that capture. It is a new syntax for a new type of construct, yet it immediately looks familiar and intuitive.
Note further that with natural syntax, we have the same design freedom as closure-based syntax \cite{P2461R1} to allow the full capture syntax from lambdas, including default-captures:
%\vspace{2mm}
\begin{codeblock}
int min(int x, int y)
post [x, y] (r: r <= x && r <= y ); // possible with natural syntax
\end{codeblock}
%\vspace{2mm}
Of course we could also choose to restrict ourselves to init-captures as \cite{P2935R3} does.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Destructuring the return value}
\label{subsec:struct}
We can easily and naturally extend the natural syntax to add support for destructing the return value of a function with a structured binding when specifying a name for that value:
\emph{contract-condition:} \\
\phantom{~~~}\tcode{(} \emph{return-name}$_{opt}$ \emph{conditional-expression} \tcode{)}
\emph{return-name:}\\
\phantom{~~~}\emph{identifier} \tcode{:} \\
\phantom{~~~}\added{\tcode{[} \emph{identifier-list} \tcode{] :}}
This can be very useful in the postcondition of a function that returns a value of a tuple-like type:
%\vspace{2mm}
\begin{codeblock}
std::tuple<int, int, int> f()
post ([x, y, z] : x != y && y != z);
\end{codeblock}
%\vspace{2mm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{\tcode{requires}-clauses on contracts}
\label{subsec:requires}
The grammar of the natural syntax can be extended to allow a \tcode{requires} clause that appertains to an individual contract check. There are at least two possible syntactic positions for such a \tcode{requires} clause. We could place it at the end, after the predicate:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{contract:} \\
\phantom{~~~}\emph{contract-capture}$_{opt}$ \emph{contract-condition} \added{\emph{requires-clause}$_{opt}$}
\end{adjustwidth}
In code:
%\vspace{2mm}
\begin{codeblock}
template <typename T>
void f(T x)
pre (x > 0) requires std::integral<T>;
\end{codeblock}
%\vspace{2mm}
Alternatively, if we want to make the \tcode{requires} clause more visually prominent, we could place it at the beginning of the contract check, right after the \tcode{pre} or \tcode{post} contextual keyword:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{contract:} \\
\phantom{~~~}\added{\emph{requires-clause}$_{opt}$} \emph{contract-capture}$_{opt}$ \emph{contract-condition}
\end{adjustwidth}
In code:
%\vspace{2mm}
\begin{codeblock}
template <typename T>
void f(T x)
pre requires std::integral<T> (x > 0);
\end{codeblock}
%\vspace{2mm}
Note that neither option creates any parsing ambiguities, for the same reasons as discussed in section \ref{subsec:noambig}. Note further that both options allow for \tcode{requires} clauses appertaining to individual contract checks to coexist with a \tcode{requires} clause appertaining to the function itself:
%\vspace{2mm}
\begin{codeblock}
template <typename T>
void f(T x)
requires std::copyable<T>
pre (x > 0) requires std::integral<T>;
\end{codeblock}
%\vspace{2mm}
or, alternatively,
%\vspace{2mm}
\begin{codeblock}
template <typename T>
void f(T x)
requires std::copyable<T>
pre requires std::integral<T> (x > 0);
\end{codeblock}
%\vspace{2mm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Attributes appertaining to contracts}
\label{subsec:attr}
Although not covered in \cite{P2885R3}, it has been argued that any new syntactic construct in C++, including contract checks, should allow for the possibility of standard attributes appertaining to it. Some meta-annotations that might be added to contracts post MVP could potentially be expressed as attributes appertaining to a contract check.
Support for attributes appertaining to a contract check is easy to accommodate with natural syntax. Since attributes are optional, ignorable information and are thus not part of a contract's primary information, we believe that it makes most sense to place them at the end of the contract check:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{contract:} \\
\phantom{~~~}\emph{contract-capture}$_{opt}$ \emph{contract-condition} \added{\emph{attribute-specifier-seq}$_{opt}$}
\end{adjustwidth}
In code:
%\vspace{2mm}
\begin{codeblock}
void f(int x)
pre (x > 0) [[deprecated]];
\end{codeblock}
%\vspace{2mm}
However, just like with \tcode{requires} clauses, it is also possible to place the \emph{attribute-specifier-seq} at the beginning, right after the \tcode{pre} or \tcode{post} contextual keyword in case a more prominent syntactic position is desired:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{contract:} \\
\phantom{~~~}\added{\emph{attribute-specifier-seq}$_{opt}$} \emph{contract-capture}$_{opt}$ \emph{contract-condition}
\end{adjustwidth}
In code:
%\vspace{2mm}
\begin{codeblock}
void f(int x)
pre [[deprecated]] (x > 0);
\end{codeblock}
%\vspace{2mm}
Note that in either case, there is no grammar ambiguity with the \emph{attribute-specifier-seq} appertaining to any other part of the function declaration, such as the function itself, the function type, or the trailing return type, because all other possible positions for the \emph{attribute-specifier-seq} precede the \emph{pre-or-postcondition}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Labels}
\label{subsec:labels}
It has been suggested that post-MVP, we will need meta-annotations on a contract, so-called \emph{labels}, that should not be spelled as attributes because they are not ignorable. The only currently known use case for this is to specify or constrain the possible contract semantic (observe, ignore, enforce) for a given contract; other use cases might be discovered in the future. There are many ways in which the natural syntax could be extended to accommodate such labels.
If we want to consider such labels secondary information, we can place them at the end of the contract check. In order to keep the grammar unambiguous, we need to surround the sequence of labels with delimiter tokens. We cannot use \tcode{[[ ... ]]} because these are reserved for attributes (see section \ref{subsec:attr}), but we can use pretty much any other set of delimiter tokens, such as \mbox{\tcode{[ ...]}}, \mbox{\tcode{<...>}}, \mbox{\tcode{\{...\}}}, and so forth, or mark the labels by special characters such as \tcode{@}, depending on SG21's preference:
%\vspace{2mm}
\begin{codeblock}
void f(int x)
pre (x > 0) [audit]; // or \tcode{<audit>}, or \tcode{\{audit\}}, or \tcode{[\{audit\}]}, or \tcode{@audit} ...
\end{codeblock}
%\vspace{2mm}
On the other hand, if we want to consider such labels primary information, we can place them at the beginning of the contract check, right after the \tcode{pre} or \tcode{post} contextual keyword. In this case, we cannot use \mbox{\tcode{[...]}} as the delimiters anymore, as it would be ambiguous with the \emph{contract-capture} (see section \ref{subsec:captures}), but we can use any of the other options:
%\vspace{2mm}
\begin{codeblock}
void f(int x)
pre <audit> (x > 0); // or \tcode{\{audit\}}, or \tcode{[\{audit\}]}, or \tcode{@audit} ...
\end{codeblock}
%\vspace{2mm}
We could also allow both the leading and the trailing position. The natural syntax places no restrictions on the internal grammar for these labels. They can be specified to be any kind of token sequence, depending on the design direction we choose post MVP.
One interesting possibility is to specify that the label, or set of labels, shall be a constant expression that evaluates to a compile-time value defining the desired per-contract configuration, perhaps to a value of some new type \tcode{std::contract_traits} similar to \tcode{std::coroutine_traits}. Such a grammar opens up the power of constant expressions (i.e. almost the full language) for abstracting the computation of the per-contract configuration. The syntax with labels in leading position, right after \tcode{pre} or \tcode{post} and delimited by \mbox{\tcode{<...>}}, seems appealing for this design direction, as the contract check will resemble a template that is ``templated'' on its configuration (which acts as a non-type template parameter), and the constant expression acts as a template argument that ``instantiates'' (configures) the contract check. The grammar for this could look as follows:
\begin{adjustwidth}{0.5cm}{0.5cm}
\emph{contract:} \\
\phantom{~~~}\added{\emph{contract-eval-specifier}$_{opt}$} \emph{contract-capture}$_{opt}$ \emph{contract-condition}
\added{\emph{contract-eval-specifier:}} \\
\phantom{~~~}\added{\tcode{<} \emph{constant-expression} \tcode{>}}
\end{adjustwidth}
However, this is only one possible direction. With the natural syntax proposal, we are not cutting off any other directions. The main difference to labels in attribute-like syntax is that in the natural syntax, the label sequence goes between delimiter tokens, whereas in attribute-like syntax it goes between the \tcode{pre} or \tcode{post} keyword and the colon. Arguably, the natural syntax actually leaves more syntactic freedom for labels than attribute-like syntax does. In attribute-like syntax, the label sequence can syntactically clash with anything else that goes between the \tcode{pre} or \tcode{post} keyword and the colon, such as the name for the return value. On the other hand, in natural syntax, labels are guaranteed to not clash with anything else because they are separated from all other parts of the contract check by their own delimiter tokens.
Note also that with natural syntax, we can support both standard attributes and non-attribute labels appertaining to the same contract check simultaneously, for example:
%\vspace{2mm}
\begin{codeblock}
void f(int x)
pre <audit> (x > 0) [[ deprecated ]];
\end{codeblock}
%\vspace{2mm}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Class invariants}
\label{subsec:invariants}
In principle, we can use the same natural syntax at class scope to express class invariants:
\begin{codeblock}
class sorted_vector {
invariant(is_sorted());
// members and member functions…
};
\end{codeblock}
However, note that we cannot use a contextual keyword for the contract kind \tcode{invariant} as the grammar at class scope is already very crowded and there are multiple cases where this syntax is valid C++ code today:
\begin{codeblock}
struct invariant {
invariant(int()); // Case 1: Constructor taking a function pointer
};
bool b = true;
struct X {
invariant(b); // Case 2: Member declarator with extra parentheses
};
\end{codeblock}
We could attempt to fix the first case by introducing a new type of vexing parse for disambiguation, and the second case by banning the superfluous parentheses around a member declarator. A cleaner solution would be to claim a full keyword instead of a contextual one. However, the identifier \tcode{invariant} has 7379 matches in the ACTCD19 dataset which suggests that claiming it as a keyword would break too much existing code. If we want to introduce class invariants with this syntax, we would therefore have to get creative with the choice of keyword, similar to what we have to do for assertions (section \ref{subsec:keyword}). It has been suggested that we could even re-use the same keyword as for assertions, which would take on a different meaning at class scope to designate class invariants.
\subsection{Procedural interfaces}
\label{subsec:interfaces}
With procedural interfaces, we can express a much richer set of contracts than with preconditions and postconditions alone. The idea was first published by Lisa Lippincott in her paper \cite{P0465R0}. More recently, \cite{P2885R3} and \cite{P2935R3} mentioned the idea of integrating such procedural interfaces into a Contracts facility post-MVP.
With the natural syntax, we can support procedural interfaces with an interface block delimited by curly braces. This is the natural syntax in C++ for a block containing a list of statements, and very close to Lisa Lippincott's original notation in \cite{P0465R0}. Here is a code example in this syntax --- a procedural interface expressing the contract that a function should not throw an exception:
%\vspace{2mm}
\begin{codeblock}
void f(int x)
interface {
try {
implementation;
}
catch (...) {
contract_assert(false);
}
};