<!--
This file TOC is generated
Use `bazel run spec_md_gen` to regenerate it in place.
-->
# Starlark Language Specification
Starlark is a dialect of Python intended for use as a configuration
language. A Starlark interpreter is typically embedded within a larger
application, and this application may define additional
domain-specific functions and data types beyond those provided by the
core language. For example, Starlark is embedded within (and was
originally developed for) the [Bazel build tool](https://bazel.build).
This document was derived from the [description of the Go
implementation](https://github.com/google/starlark-go/blob/master/doc/spec.md)
of Starlark.
It was influenced by the Python specification,
Copyright 1990–2017, Python Software Foundation,
and the Go specification, Copyright 2009–2017, The Go Authors. It is
now maintained by the Bazel team.
## Overview
Starlark is an untyped dynamic language with high-level data types,
first-class functions with lexical scope, and automatic memory
management or _garbage collection_.
Starlark is strongly influenced by Python, and is almost a subset of
that language. In particular, its data types and syntax for
statements and expressions will be very familiar to any Python
programmer.
However, Starlark is intended not for writing applications but for
expressing configuration: its programs are short-lived and have no
external side effects and their main result is structured data or side
effects on the host application.
Starlark is intended to be simple. There are no user-defined types, no
inheritance, no reflection, no exceptions, no explicit memory management.
Execution is finite. The language does not allow recursion or unbounded loops.
Starlark is suitable for use in highly parallel applications. An application may
invoke the Starlark interpreter concurrently from many threads, without the
possibility of a data race, because shared data structures become immutable due
to _freezing_.
The language is deterministic and hermetic. Executing the same file with the
same interpreter leads to the same result. By default, user code cannot
interact with the environment.
## Contents
* [Overview](#overview)
* [Contents](#contents)
* [Lexical elements](#lexical-elements)
  * [String literals](#string-literals)
  * [Bytes literals](#bytes-literals)
* [Data types](#data-types)
  * [None](#none)
  * [Booleans](#booleans)
  * [Integers](#integers)
  * [Floating-point numbers](#floating-point-numbers)
  * [Strings](#strings)
  * [Bytes](#bytes)
  * [Lists](#lists)
  * [Tuples](#tuples)
  * [Dictionaries](#dictionaries)
  * [Functions](#functions)
  * [Built-in functions](#built-in-functions)
* [Name binding and variables](#name-binding-and-variables)
* [Value concepts](#value-concepts)
  * [Identity and mutation](#identity-and-mutation)
  * [Freezing a value](#freezing-a-value)
  * [Hashing](#hashing)
  * [Sequence types](#sequence-types)
  * [Indexing](#indexing)
* [Expressions](#expressions)
  * [Identifiers](#identifiers)
  * [Literals](#literals)
  * [Parenthesized expressions](#parenthesized-expressions)
  * [Dictionary expressions](#dictionary-expressions)
  * [List expressions](#list-expressions)
  * [Unary operators](#unary-operators)
  * [Binary operators](#binary-operators)
  * [Conditional expressions](#conditional-expressions)
  * [Comprehensions](#comprehensions)
  * [Function and method calls](#function-and-method-calls)
  * [Dot expressions](#dot-expressions)
  * [Index expressions](#index-expressions)
  * [Slice expressions](#slice-expressions)
  * [Lambda expressions](#lambda-expressions)
* [Statements](#statements)
  * [Pass statements](#pass-statements)
  * [Assignments](#assignments)
  * [Augmented assignments](#augmented-assignments)
  * [Function definitions](#function-definitions)
  * [Return statements](#return-statements)
  * [Expression statements](#expression-statements)
  * [If statements](#if-statements)
  * [For loops](#for-loops)
  * [Break and Continue](#break-and-continue)
  * [Load statements](#load-statements)
* [Module execution](#module-execution)
* [Built-in constants and functions](#built-in-constants-and-functions)
  * [None](#none)
  * [True and False](#true-and-false)
  * [any](#any)
  * [all](#all)
  * [bool](#bool)
  * [bytes](#bytes)
  * [dict](#dict)
  * [dir](#dir)
  * [enumerate](#enumerate)
  * [float](#float)
  * [fail](#fail)
  * [getattr](#getattr)
  * [hasattr](#hasattr)
  * [hash](#hash)
  * [int](#int)
  * [len](#len)
  * [list](#list)
  * [max](#max)
  * [min](#min)
  * [print](#print)
  * [range](#range)
  * [repr](#repr)
  * [reversed](#reversed)
  * [sorted](#sorted)
  * [str](#str)
  * [tuple](#tuple)
  * [type](#type)
  * [zip](#zip)
* [Built-in methods](#built-in-methods)
  * [bytes·elems](#bytes·elems)
  * [dict·get](#dict·get)
  * [dict·items](#dict·items)
  * [dict·keys](#dict·keys)
  * [dict·pop](#dict·pop)
  * [dict·popitem](#dict·popitem)
  * [dict·setdefault](#dict·setdefault)
  * [dict·update](#dict·update)
  * [dict·values](#dict·values)
  * [list·append](#list·append)
  * [list·clear](#list·clear)
  * [list·extend](#list·extend)
  * [list·index](#list·index)
  * [list·insert](#list·insert)
  * [list·pop](#list·pop)
  * [list·remove](#list·remove)
  * [string·capitalize](#string·capitalize)
  * [string·count](#string·count)
  * [string·elems](#string·elems)
  * [string·endswith](#string·endswith)
  * [string·find](#string·find)
  * [string·format](#string·format)
  * [string·index](#string·index)
  * [string·isalnum](#string·isalnum)
  * [string·isalpha](#string·isalpha)
  * [string·isdigit](#string·isdigit)
  * [string·islower](#string·islower)
  * [string·isspace](#string·isspace)
  * [string·istitle](#string·istitle)
  * [string·isupper](#string·isupper)
  * [string·join](#string·join)
  * [string·lower](#string·lower)
  * [string·lstrip](#string·lstrip)
  * [string·partition](#string·partition)
  * [string·removeprefix](#string·removeprefix)
  * [string·removesuffix](#string·removesuffix)
  * [string·replace](#string·replace)
  * [string·rfind](#string·rfind)
  * [string·rindex](#string·rindex)
  * [string·rpartition](#string·rpartition)
  * [string·rsplit](#string·rsplit)
  * [string·rstrip](#string·rstrip)
  * [string·split](#string·split)
  * [string·splitlines](#string·splitlines)
  * [string·startswith](#string·startswith)
  * [string·strip](#string·strip)
  * [string·title](#string·title)
  * [string·upper](#string·upper)
* [Grammar reference](#grammar-reference)
## Lexical elements
A Starlark program consists of one or more modules. Each module is defined by a
single UTF-8-encoded text file.
Starlark grammar is introduced gradually throughout this document as shown below,
and a [complete Starlark grammar reference](#grammar-reference) is provided at the end.
Grammar notation:
```text
- lowercase and 'quoted' items are lexical tokens.
- Capitalized names denote grammar productions.
- (...) implies grouping.
- x | y means either x or y.
- [x] means x is optional.
- {x} means x is repeated zero or more times.
- The end of each declaration is marked with a period.
```
The contents of a Starlark file are broken into a sequence of tokens of
five kinds: white space, punctuation, keywords, identifiers, and literals.
Each token is formed from the longest sequence of characters that
would form a valid token of each kind.
```text
File = {Statement | newline} eof .
```
*White space* consists of spaces (U+0020), tabs (U+0009), carriage
returns (U+000D), and newlines (U+000A). Within a line, white space
has no effect other than to delimit the previous token, but newlines,
and spaces at the start of a line, are significant tokens.
*Comments*: A hash character (`#`) appearing outside of a string or bytes
literal marks the start of a comment; the comment extends to the end
of the line, not including the newline character.
Comments are treated like other white space.
*Punctuation*: The following punctuation characters or sequences of
characters are tokens:
```text
+ - * // % **
~ & | ^ << >>
. , = ; :
( ) [ ] { }
< > >= <= == !=
+= -= *= //= %=
&= |= ^= <<= >>=
```
*Keywords*: The following tokens are keywords and may not be used as
identifiers:
```text
and else load
break for not
continue if or
def in pass
elif lambda return
```
The tokens below also may not be used as identifiers although they do not
appear in the grammar; they are reserved as possible future keywords:
<!-- and to remain a syntactic subset of Python -->
```text
as import
assert is
class nonlocal
del raise
except try
finally while
from with
global yield
```
*Identifiers*: an identifier is a sequence of Unicode letters, decimal
digits, and underscores (`_`), not starting with a digit.
Identifiers are used as names for values.
Examples:
```text
None True len
x index starts_with arg0
```
*Literals*: literals are tokens that denote specific values. Starlark
has integer, floating-point, string, and bytes literals.
```text
0 # int
123 # decimal int
0x7f # hexadecimal int
0o755 # octal int
0.0 0. .0 # float
1e10 1e+10 1e-10
1.1e10 1.1e+10 1.1e-10
"hello" 'hello' # string
'''hello''' """hello""" # triple-quoted string
r'hello' r"hello" # raw string literal
b"hello" b'hello' # bytes
b'''hello''' b"""hello""" # triple-quoted bytes
rb'hello' br"hello" # raw bytes literal
```
Integer and floating-point literal tokens are defined by the following grammar:
```text
int = decimal_lit | octal_lit | hex_lit | '0' .
decimal_lit = ('1' … '9') {decimal_digit} .
octal_lit = '0' ('o' | 'O') octal_digit {octal_digit} .
hex_lit = '0' ('x' | 'X') hex_digit {hex_digit} .
float = decimals '.' [decimals] [exponent]
| decimals exponent
| '.' decimals [exponent]
.
decimals = decimal_digit {decimal_digit} .
exponent = ('e'|'E') ['+'|'-'] decimals .
decimal_digit = '0' … '9' .
octal_digit = '0' … '7' .
hex_digit = '0' … '9' | 'A' … 'F' | 'a' … 'f' .
```
It is a static error if a floating-point literal denotes a value whose
magnitude is too large to be represented as a finite `float` value.
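The literal forms defined by this grammar can be checked against one another. The following is a minimal sketch, written to be valid in both Starlark and Python; the variable names are hypothetical and chosen only for illustration.

```python
# Integer literals written in different bases denote the same value.
hex_matches_decimal = 0x7f == 127        # True
octal_matches_decimal = 0o755 == 493     # True: 7*64 + 5*8 + 5

# A float literal may omit the digits on one side of the decimal point.
float_zero_forms_agree = 0.0 == 0. and 0. == .0   # True

# The sign of an exponent is optional; a missing sign means '+'.
exponent_forms_agree = 1e10 == 1e+10 and 1e10 == 10000000000.0   # True
negative_exponent_agrees = 1e-10 == 0.0000000001                 # True
```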
### String literals
A Starlark string literal denotes a string value.
In its simplest form, it consists of the desired text
surrounded by matching single- or double-quotation marks:
```python
"abc"
'abc'
```
Literal occurrences of the chosen quotation mark character must be
escaped by a preceding backslash. So, if a string contains several
of one kind of quotation mark, it may be convenient to quote the string
using the other kind, as in these examples:
```python
'Have you read "To Kill a Mockingbird?"'
"Yes, it's a classic."
"Have you read \"To Kill a Mockingbird?\""
'Yes, it\'s a classic.'
```
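Whichever quotation mark is chosen, the resulting string value is the same. A small sketch, with hypothetical variable names, that compares the two spellings of each sentence above:

```python
# The same text written with each kind of quotation mark.
question_single = 'Have you read "To Kill a Mockingbird?"'
question_double = "Have you read \"To Kill a Mockingbird?\""
answer_double = "Yes, it's a classic."
answer_single = 'Yes, it\'s a classic.'

# The choice of quotation mark does not affect the value denoted.
same_question = question_single == question_double   # True
same_answer = answer_double == answer_single         # True
```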
#### String escapes
Within a string literal, the backslash character `\` indicates the
start of an _escape sequence_, a notation for expressing things that
are impossible or awkward to write directly.
The following *traditional escape sequences* represent the ASCII control
codes 7-13:
```
\a \x07 alert or bell
\b \x08 backspace
\f \x0C form feed
\n \x0A line feed
\r \x0D carriage return
\t \x09 horizontal tab
\v \x0B vertical tab
```
A *literal backslash* is written using the escape `\\`.
An *escaped newline*---that is, a backslash at the end of a line---is ignored,
allowing a long string to be split across multiple lines of the source file.
```python
"abc\
def" # "abcdef"
```
An *octal escape* encodes a single string element using its octal value.
It consists of a backslash followed by one, two, or three octal digits [0-7].
Similarly, a *hexadecimal escape* encodes a single string element using its hexadecimal value.
It consists of `\x` followed by two hexadecimal digits [0-9a-fA-F].
It is an error if the value of an octal or hexadecimal escape is greater than decimal 127.
```python
'\0' # "\x00" a string containing a single NUL element
'\12' # "\n" octal 12 = decimal 10
'\101-\132' # "A-Z"
'\119' # "\t9" = "\11" + "9"
'\x00' # "\x00" a string containing a single NUL element
'\x0A' # "\n" hexadecimal A = decimal 10
"\x41-\x5A" # "A-Z"
```
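The equivalences noted in the comments above can be stated as comparisons. This sketch uses hypothetical variable names and only escapes with values below 128, so it should behave the same regardless of the element width K:

```python
# Octal and hexadecimal escapes for the same value denote the same element.
nul_forms_agree = '\0' == '\x00'                           # True
newline_forms_agree = '\12' == '\x0A' and '\12' == '\n'    # True
range_forms_agree = '\101-\132' == "\x41-\x5A"             # True, both are "A-Z"

# An octal escape stops at the first non-octal digit, so '\119' is
# the escape '\11' (a tab) followed by the character '9'.
tab_then_nine = '\119' == '\t9'    # True
tab_then_nine_len = len('\119')    # 2
```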
A *Unicode escape* denotes the UTF-K encoding of a single, valid Unicode code point,
where K is the implementation-defined number of bits in each string element
(see [strings](#strings)).
The `\uXXXX` form, with exactly four hexadecimal digits,
denotes a 16-bit code point, and the `\UXXXXXXXX`,
with exactly eight digits, denotes a 32-bit code point.
It is an error if the value lies in the surrogate range (U+D800 to U+DFFF)
or is greater than U+10FFFF.
```python
'\u0041' # "A", an ASCII letter (U+0041)
'\u0414' # "Д", a Cyrillic capital letter (U+0414)
'\u754c' # "界", a Chinese character (U+754C)
'\U0001F600' # "😀", an Emoji (U+1F600)
```
The length of the encoding of a single Unicode code point may vary
based on the implementation's value of K:
```python
len("A") # 1
len("Д") # 2 (UTF-8) or 1 (UTF-16)
len("界") # 3 (UTF-8) or 1 (UTF-16)
len("😀") # 4 (UTF-8) or 2 (UTF-16)
```
Although string values may be capable of representing any sequence of elements,
string _literals_ can denote only sequences of UTF-K code
units that are valid encodings of text.
(Any literal syntax capable of representing arbitrary element sequences
would inherently be non-portable across implementations.)
Consequently, when the `repr` function is applied to a string
containing an invalid encoding, its result is not a valid string literal.
An ordinary string literal may not contain an unescaped newline,
but a *multiline string literal* may spread over multiple source lines.
It is denoted using three quotation marks at start and end.
Within it, unescaped newlines and quotation marks (or even pairs of
quotation marks) have their literal meaning, but three quotation marks
end the literal. This makes it easy to quote large blocks of text with
few escapes.
```python
haiku = '''
Yesterday it worked.
Today it is not working.
That's computers. Sigh.
'''
```
Regardless of the platform's convention for text line endings---for
example, a linefeed (\n) on UNIX, or a carriage return followed by a
linefeed (\r\n) on Microsoft Windows---an unescaped line ending in a
multiline string literal always denotes a line feed (\n).
Starlark also supports *raw string literals*, which look like an
ordinary single- or double-quotation preceded by `r`. Within a raw
string literal, there is no special processing of backslash escapes,
other than an escaped quotation mark (which denotes a literal
quotation mark), or an escaped newline (which denotes a backslash
followed by a newline). This form of quotation is typically used when
writing strings that contain many quotation marks or backslashes (such
as regular expressions or shell commands) to reduce the burden of
escaping:
```python
"a\nb" # "a\nb" = 'a' + '\n' + 'b'
r"a\nb" # "a\\nb" = 'a' + '\\' + 'n' + 'b'
"a\
b" # "ab"
r"a\
b" # "a\\\nb"
```
It is an error for a backslash to appear within a string literal other
than as part of one of the escapes described above.
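One way to see the difference between ordinary and raw literals is to compare lengths. The sketch below uses hypothetical variable names; the regular-expression text is only sample data, since the core language described here has no regular-expression support.

```python
cooked = "a\nb"     # three elements: 'a', line feed, 'b'
raw = r"a\nb"       # four elements: 'a', backslash, 'n', 'b'

cooked_len = len(cooked)               # 3
raw_len = len(raw)                     # 4
raw_matches_escaped = raw == "a\\nb"   # True

# In a raw literal, backslashes need no doubling. The sequence \d is not
# one of the escapes described above, so outside a raw literal each
# backslash here would have to be written as \\.
version_pattern = r"\d+\.\d+"
version_pattern_len = len(version_pattern)                    # 8
pattern_matches_escaped = version_pattern == "\\d+\\.\\d+"    # True
```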
### Bytes literals
A Starlark bytes literal denotes a bytes value,
and looks like a string literal, in any of its various forms
(single-quoted, double-quoted, triple-quoted, raw)
preceded by the letter `b`.
```python
b"abc" b'abc'
b"""abc""" b'''abc'''
br"abc" br'abc'
rb"abc" rb'abc'
```
A raw bytes literal may be indicated by either a `br` or `rb` prefix.
Non-escaped text within a bytes literal denotes the UTF-8 encoding of that text.
Bytes literals support the same escape sequences as text strings,
with the following differences:
- Octal and hexadecimal escapes may specify any byte value from
zero (`\000` or `\x00`) to 255 (`\377` or `\xFF`).
- A Unicode escape `\uXXXX` or `\UXXXXXXXX` denotes the byte
sequence of the UTF-8 encoding of the specified 16- or 32-bit code point.
(As with text strings, the code point value must not lie in the surrogate range.)
Any valid string literal that, with a `b` prefix, is also a
valid bytes literal is equivalent in the sense that
the bytes value is the UTF-8 encoding of the string value.
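A brief sketch of these rules, with hypothetical variable names. It assumes an implementation that supports the `bytes` type, and it avoids Unicode escapes and non-ASCII text so that it also runs unchanged under Python:

```python
# Octal and hexadecimal escapes may denote any byte value up to 255.
max_byte_forms_agree = b"\377" == b"\xFF"    # True
max_byte_len = len(b"\xFF")                  # 1

# The two raw prefixes are interchangeable.
raw_prefixes_agree = br"a\n" == rb"a\n"      # True
raw_len = len(br"a\n")                       # 3: 'a', backslash, 'n'

# All quoting styles denote the same bytes value.
quote_styles_agree = b"abc" == b'abc' and b"abc" == b"""abc"""   # True
```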
TODO: define indent, outdent, semicolon, newline, eof
## Data types
These are the main data types built in to the interpreter:
```text
NoneType # the type of None
bool # True or False
int # a signed integer of arbitrary magnitude
float # an IEEE 754 double-precision floating-point number
string # a text string, with Unicode encoded as UTF-8 or UTF-16
bytes # a byte string
list # a mutable sequence of values
tuple # a fixed-length sequence of values, unmodifiable
dict # a mapping from values to values
function # a function
```
Some functions, such as the `range` function, return instances of
special-purpose types that don't appear in this list.
Additional data types may be defined by the host application into
which the interpreter is embedded, and those data types may
participate in basic operations of the language such as arithmetic,
comparison, indexing, and function calls.
<!-- We needn't mention the stringIterable type here. -->
Some operations can be applied to any Starlark value. For example,
every value has a type string that can be obtained with the expression
`type(x)`, and any value may be converted to a string using the
expression `str(x)`, or to a Boolean truth value using the expression
`bool(x)`. Other operations apply only to certain types. For
example, the indexing operation `a[i]` works only with strings, bytes values, lists,
and tuples, and any application-defined types that are _indexable_.
The [_value concepts_](#value-concepts) section explains the groupings of
types by the operators they support.
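As an illustration of operations that apply to every value, the sketch below defines a hypothetical helper, `describe`, that pairs the string form of a value with its truth value. The `type` function is left out only so that the example also runs under Python, where `type` returns a class rather than a string.

```python
def describe(x):
    """Returns the string form and the truth value of any value x."""
    return (str(x), bool(x))

described = [describe(v) for v in [None, 0, 42, "", "abc", [], [1, 2]]]
# [("None", False), ("0", False), ("42", True), ("", False),
#  ("abc", True), ("[]", False), ("[1, 2]", True)]
```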
### None
`None` is a distinguished value used to indicate the absence of any other value.
For example, the result of a call to a function that contains no return statement is `None`.
`None` is equal only to itself. Its [type](#type) is `"NoneType"`.
The truth value of `None` is `False`.
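A short sketch of these properties; the function and variable names are hypothetical:

```python
def no_return():
    pass

def bare_return():
    return

# A function that does not return a value explicitly yields None.
results_are_none = no_return() == None and bare_return() == None   # True

# None is equal only to itself, and its truth value is False.
none_equals_zero = None == 0     # False
none_equals_empty = None == ""   # False
none_is_falsy = not None         # True
```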
### Booleans
There are two Boolean values, `True` and `False`, representing the
truth or falsehood of a predicate. The [type](#type) of a Boolean is `"bool"`.
Boolean values are typically used as conditions in `if`-statements,
although any Starlark value used as a condition is implicitly
interpreted as a Boolean.
For example, the values `None`, `0`, and the empty sequences
`""`, `()`, `[]`, and `{}` have a truth value of `False`, whereas non-zero
numbers and non-empty sequences have a truth value of `True`.
Application-defined types determine their own truth value.
Any value may be explicitly converted to a Boolean using the built-in `bool`
function.
```python
1 + 1 == 2 # True
2 + 2 == 5 # False
if 1 + 1:
    print("True")
else:
    print("False")
```
True and False may be converted to the values 1 and 0 using the `int` function,
but Booleans are not numbers.
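The truth values listed above can be tabulated with the `bool` function. In this sketch the variable names are hypothetical:

```python
# None, zero, and the empty sequences and dict are all false.
falsy = [bool(v) for v in [None, 0, 0.0, "", (), [], {}]]
# [False, False, False, False, False, False, False]

# Non-zero numbers and non-empty collections are true, even when
# the elements they contain are themselves false.
truthy = [bool(v) for v in [1, -1, 0.5, "a", (0,), [0], {"k": 0}]]
# [True, True, True, True, True, True, True]

# Explicit conversion of Booleans to integers.
as_ints = (int(True), int(False))   # (1, 0)
```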
### Integers
The Starlark integer type represents integers. Its [type](#type) is `"int"`.
Integers may be positive or negative, and arbitrarily large.
Integer arithmetic is exact.
Integers are totally ordered; comparisons follow mathematical
tradition.
The `+` and `-` operators perform addition and subtraction, respectively.
The `*` operator performs multiplication.
The `//` and `%` operations on integers compute floored division and
remainder of floored division, respectively.
If the signs of the operands differ, the sign of the remainder `x % y`
matches that of the divisor, `y`.
For all finite x and y (y ≠ 0), `(x // y) * y + (x % y) == x`.
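The sign rule and the identity can be checked for each combination of operand signs. The helpers `divmod_floor` and `identity_holds` below are hypothetical names introduced for this sketch:

```python
def divmod_floor(x, y):
    """Returns (x // y, x % y) for integers x and y, where y != 0."""
    return (x // y, x % y)

def identity_holds(x, y):
    """Reports whether (x // y) * y + (x % y) == x."""
    q, r = divmod_floor(x, y)
    return q * y + r == x

pairs = [(7, 2), (-7, 2), (7, -2), (-7, -2)]

# The quotient rounds toward negative infinity, and the remainder
# takes the sign of the divisor.
results = [divmod_floor(x, y) for x, y in pairs]
# [(3, 1), (-4, 1), (-4, -1), (3, -1)]

all_hold = all([identity_holds(x, y) for x, y in pairs])   # True
```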
The `/` operator implements floating-point division, and
yields a `float` result even when its operands are both of type `int`.
Integers, including negative values, may be interpreted as bit vectors.
Negative values use two's complement representation.
The `|`, `&`, and `^` operators implement bitwise OR, AND, and XOR,
respectively. The unary `~` operator yields the bitwise inversion of its
integer argument. The `<<` and `>>` operators shift the first argument
to the left or right by the number of bits given by the second argument.
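A few worked values for the bitwise operators, using hypothetical variable names. The results for negative operands follow from the two's complement interpretation described above:

```python
bit_or = 0x0C | 0x0A     # 14: 1100 | 1010 = 1110
bit_and = 0x0C & 0x0A    # 8:  1100 & 1010 = 1000
bit_xor = 0x0C ^ 0x0A    # 6:  1100 ^ 1010 = 0110

inverted = ~5                       # -6, since ~x == -x - 1
low_byte_of_minus_one = -1 & 0xFF   # 255: -1 has every bit set

shifted_left = 1 << 10    # 1024
shifted_right = -8 >> 1   # -4: right shift preserves the sign
```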
Any bool, number, or string may be interpreted as an integer by using
the `int` built-in function.
An integer used in a Boolean context is considered true if it is
non-zero.
```python
100 // 5 * 9 + 32 # 212
3 // 2 # 1
111111111 * 111111111 # 12345678987654321
int("0xffff", 16) # 65535
```
### Floating-point numbers
The Starlark floating-point data type represents an IEEE 754
double-precision floating-point number.
Its [type](#type) is `"float"`.
Arithmetic on floats using the `+`, `-`, `*`, `/`, `//`, and `%`
operators follows the IEEE 754 standard.
However, computing the division or remainder of division by zero is a dynamic error.
An arithmetic operation applied to a mixture of `float` and `int`
operands works as if the `int` operand were first converted to a
`float`. For example, `3.141 + 1` is equivalent to `3.141 +
float(1)`. The implicit conversion fails if the `int` value is too
large to be represented as a `float`.
There are two floating-point division operators:
`x / y` yields the floating-point quotient of `x` and `y`,
whereas `x // y` yields `floor(x / y)`, that is, the largest
representable integer value not greater than `x / y`.
Although the resulting number is integral, it is represented as a
`float` if either operand is a `float`.
The `%` operation computes the remainder of floored division.
As with the corresponding operation on integers,
if the signs of the operands differ, the sign of the remainder `x % y`
matches that of the divisor, `y`.
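The sketch below works through the two division operators and the remainder for floats. The variable names are hypothetical, and the operands are chosen so that every result is exactly representable:

```python
# A mixed operation converts the int operand to float first.
mixed_sum_agrees = 3.141 + 1 == 3.141 + float(1)   # True

true_quotient = 7 / 2          # 3.5, a float even though both operands are ints
floored_float = 7.5 // 2       # 3.0, integral but still a float
floored_negative = -7.5 // 2   # -4.0, rounds toward negative infinity

# The remainder takes the sign of the divisor.
remainder_pos_divisor = -7.5 % 2   # 0.5
remainder_neg_divisor = 7.5 % -2   # -0.5
```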
All float values are ordered, so they may be compared
using operators such as `==` and `<`, and sorted using `sorted`.
IEEE 754 defines two zero values, +0.0 and -0.0.
They compare equal to each other.
IEEE 754 defines two infinite float values `+Inf` and `-Inf`,
which represent numbers greater/less than all finite float values.
IEEE 754 defines many "not a number" (NaN) values.
They are non-finite, and represent the results of dubious operations
such as `Inf / Inf`. All NaN values compare equal to each other,
but greater than any non-NaN `float` value.
(Starlark does not follow the IEEE 754 standard for NaN comparisons,
which requires that all comparisons with NaN are false, except NaN != NaN.)
<!--
This choice greatly simplifies the logic for float arithmetic by
ensuring many standard identities and invariants such as:
- float < float (also <= == >= >) are transitive relations
- float < float is a strict weak order: the relation eq is transitive,
where eq(x, y) = not (x < y) and not (y < x).
- not (float < float) <=> (float >= float)
- sorting a list of values that includes NaN is stable.
Furthermore, implementations may assume that identical objects
are equal, a useful optimization. Python in many cases exploits
this optimization without the necessary invariant, leading to
inconsistencies such as this:
>>> nan = float('nan')
>>> nan == nan
False
>>> nan is nan
True
>>> float('nan') == float('nan')
False
>>> float('nan') is float('nan')
False
>>> (nan,) == (nan,)
True
>>> (float('nan'),) == (float('nan'),)
False
-->
A comparison operation may be applied to a mixture of int and float values.
The result of such comparisons is mathematically exact, even if neither operand
can be exactly represented by the type of the other.
```python
(type(1.0), type(1)) # ("float", "int")
1.0 == 1 # True
big = (1<<53)+1 # first int not exactly representable as float
(big + 0.0) == big # False (addition caused rounding down)
(big + 0.0) - big # 0.0 (both operands subject to rounding down)
```
Any bool, number, or string may be interpreted as a floating-point
number by using the `float` built-in function.
A float used in a Boolean context is considered true if it is
non-zero (not equal to 0.0 or -0.0). A NaN value is thus considered true.
```python
1.23e45 * 1.23e45 # 1.5129e+90
1.111111111111111 * 1.111111111111111 # 1.23457
3.0 / 2 # 1.5
3 / 2.0 # 1.5
float(3) / 2 # 1.5
3.0 // 2.0 # 1.0
```
### Strings
A string is an immutable sequence of elements that encode Unicode text.
The [type](#type) of a string is `"string"`.
For reasons of efficiency and interoperability with the host language,
the number of bits in each string element, which we call K,
is specified to be either 8 or 16, depending on the implementation.
For example, in the Go and Rust implementations,
each string element is an 8-bit value (a byte) and Unicode text is encoded as UTF-8,
whereas in the Java implementation,
string elements are 16-bit values (Java `char`s) and Unicode text is encoded as UTF-16.
An implementation may permit strings to hold arbitrary values of the element type,
including sequences that do not encode valid Unicode text;
or, it may disallow invalid sequences, and operations that would form them.
The built-in `len` function returns the number of elements in a string.
Strings may be concatenated with the `+` operator.
The substring expression `s[i:j]` returns the substring of `s` from
element index `i` up to but not including index `j`.
<!-- TODO: The Rust implementation of s[i:j] may fail if it cuts a
UTF-8 sequence in half. Need to accommodate that here. -->
The index expression `s[i]` returns the
1-element substring `s[i:i+1]`.
Strings are hashable, and thus may be used as keys in a dictionary.
Strings are totally ordered lexicographically, so strings may be
compared using operators such as `==` and `<`.
(Beware that the UTF-16 string encoding is not order-preserving
with respect to code point values.)
Strings are _not_ iterable sequences, so they cannot be used as the operand of
a `for`-loop, list comprehension, or any other operation that requires
an iterable sequence. One must explicitly call a method of a string value
to obtain an iterable view.
Any value may be formatted as a string using the `str` or `repr` built-in
functions, the `str % tuple` operator, or the `str.format` method.
A string used in a Boolean context is considered true if it is
non-empty.
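The basic string operations described above can be sketched as follows. The helper `string_basics` and the other names are hypothetical, and the sample strings are ASCII so that the results do not depend on the element width K:

```python
def string_basics(s, t):
    """Collects the results of several core string operations."""
    return {
        "length": len(s),
        "concat": s + t,
        "slice": s[1:3],
        "element": s[1],
        "element_is_slice": s[1] == s[1:2],
        "less_than": s < t,
    }

basics = string_basics("hello", "world")
# {"length": 5, "concat": "helloworld", "slice": "el", "element": "e",
#  "element_is_slice": True, "less_than": True}

# Strings are hashable, so they may serve as dictionary keys.
counts = {"hello": 1, "world": 2}
lookup = counts["world"]   # 2

# Two of the ways to format values as strings.
with_percent = "%s has %d elements" % ("hello", len("hello"))
with_format = "{} < {}".format("hello", "world")
```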
Strings have several built-in methods:
* [`capitalize`](#string·capitalize)
* [`count`](#string·count)
* [`elems`](#string·elems)
* [`endswith`](#string·endswith)
* [`find`](#string·find)
* [`format`](#string·format)
* [`index`](#string·index)
* [`isalnum`](#string·isalnum)
* [`isalpha`](#string·isalpha)
* [`isdigit`](#string·isdigit)
* [`islower`](#string·islower)
* [`isspace`](#string·isspace)
* [`istitle`](#string·istitle)
* [`isupper`](#string·isupper)
* [`join`](#string·join)
* [`lower`](#string·lower)
* [`lstrip`](#string·lstrip)
* [`partition`](#string·partition)
* [`removeprefix`](#string·removeprefix)
* [`removesuffix`](#string·removesuffix)
* [`replace`](#string·replace)
* [`rfind`](#string·rfind)
* [`rindex`](#string·rindex)
* [`rpartition`](#string·rpartition)
* [`rsplit`](#string·rsplit)
* [`rstrip`](#string·rstrip)
* [`split`](#string·split)
* [`splitlines`](#string·splitlines)
* [`startswith`](#string·startswith)
* [`strip`](#string·strip)
* [`title`](#string·title)
* [`upper`](#string·upper)
### Bytes
A _bytes_ is an immutable sequence of values in the range 0-255.
The [type](#type) of a bytes is `"bytes"`.
Unlike a string, which is intended for text, a bytes may represent binary data,
such as the contents of an arbitrary file, without loss.
The built-in `len` function returns the number of elements (bytes) in a `bytes`.
Two bytes values may be concatenated with the `+` operator.
The slice expression `b[i:j]` returns the subsequence of `b`
from index `i` up to but not including index `j`.
The index expression `b[i]` returns the int value of the ith element.
The `in` operator may be used to test for the presence of one bytes
as a subsequence of another, or for the presence of a single `int` byte value.
Like strings, bytes values are hashable, totally ordered, and not iterable,
and are considered True if they are non-empty.
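A sketch of these operations, assuming an implementation that supports the `bytes` type. The variable names are hypothetical:

```python
data = b"abc"

joined = data + b"def"     # b"abcdef"
middle = data[1:2]         # b"b": slicing yields a bytes value
first = data[0]            # 97: indexing yields an int

has_subsequence = b"bc" in data   # True
has_byte_value = 98 in data       # True: 98 is the byte value of "b"

ordered = b"abc" < b"abd"             # True
empty_is_false = bool(b"") == False   # True
```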
A bytes value has these methods:
* [`elems`](#bytes·elems)
```
TODO(https://github.com/bazelbuild/starlark/issues/112)
- more methods: likely the same as string (minus those concerned with text):
join
{start,end}with
{r,}{find,index,partition,split,strip}
replace
TODO: encode, decode methods?
TODO: ord, chr.
TODO: string.elems(), string.elem_ords(), string.codepoint_ords()
```
### Lists
A list is a mutable sequence of values.
The [type](#type) of a list is `"list"`.
Lists are indexable sequences: the elements of a list may be iterated
over by `for`-loops, list comprehensions, and various built-in
functions.
Lists may be constructed using bracketed list notation:
```python
[] # an empty list
[1] # a 1-element list
[1, 2] # a 2-element list
```
Lists can also be constructed from any iterable sequence by using the
built-in `list` function.
The built-in `len` function applied to a list returns the number of elements.
The index expression `list[i]` returns the element at index `i`,
and the slice expression `list[i:j]` returns a new list consisting of
the elements at indices from `i` up to but not including `j`.
List elements may be added using the `append` or `extend` methods,
removed using the `remove` method, or reordered by assignments such as
`list[i] = list[j]`.
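A short sketch of these list operations, under the assumption that the code is wrapped in a function (the name `list_basics` is hypothetical):

```python
def list_basics():
    # Hypothetical helper, for illustration only.
    xs = list((1, 2, 3))   # construct a list from another iterable sequence
    xs.append(4)           # [1, 2, 3, 4]
    xs.extend([5, 6])      # [1, 2, 3, 4, 5, 6]
    xs.remove(2)           # [1, 3, 4, 5, 6]
    xs[0] = xs[1]          # [3, 3, 4, 5, 6]
    return xs, len(xs), xs[1:3]
```

Here the slice `xs[1:3]` is a new list, so later changes to `xs` would not affect it.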
The concatenation operation `x + y` yields a new list containing all
the elements of the two lists x and y.
For most types, `x += y` is equivalent to `x = x + y`, except that it
evaluates `x` only once: the result of `x + y` is a newly allocated
value, which is then assigned to `x`.
However, if `x` refers to a list, the statement does not allocate a
new list but instead mutates the original list in place, similar to
`x.extend(y)`.
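The difference is observable through a second reference to the same list. The following sketch uses hypothetical names (`augmented_assignment_demo`, `alias_x`, `alias_y`) to show it:

```python
def augmented_assignment_demo():
    # Hypothetical helper, for illustration only.
    x = [1, 2]
    alias_x = x      # second reference to the same list
    x += [3]         # mutates the list in place; alias_x sees the change

    y = [1, 2]
    alias_y = y
    y = y + [3]      # allocates a new list; alias_y still refers to the old one

    return alias_x, alias_y
```

After the call, the first result contains the appended element and the second does not.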
Lists are not hashable, so may not be used in the keys of a dictionary.
A list used in a Boolean context is considered true if it is
non-empty.
A [_list comprehension_](#comprehensions) creates a new list whose elements are the
result of some expression applied to each element of another sequence.
```python
[x*x for x in [1, 2, 3, 4]] # [1, 4, 9, 16]
```
A list value has these methods:
* [`append`](#list·append)
* [`clear`](#list·clear)
* [`extend`](#list·extend)
* [`index`](#list·index)
* [`insert`](#list·insert)
* [`pop`](#list·pop)
* [`remove`](#list·remove)
### Tuples
A tuple is an immutable sequence of values.
The [type](#type) of a tuple is `"tuple"`.
Tuples are constructed using parenthesized list notation:
```python
() # the empty tuple
(1,) # a 1-tuple
(1, 2) # a 2-tuple ("pair")
(1, 2, 3) # a 3-tuple
```
Observe that for the 1-tuple, the trailing comma is necessary to
distinguish it from the parenthesized expression `(1)`.
1-tuples are seldom used.
Starlark, unlike Python, does not permit a trailing comma to appear in
an unparenthesized tuple expression:
```python
for k, v, in dict.items(): pass # syntax error at 'in'
_ = [(v, k) for k, v, in dict.items()] # syntax error at 'in'
max(3, 1, 4, 1,) # ok
[1, 2, 3, ] # ok
{1: 2, 3:4, } # ok
```
Any iterable sequence may be converted to a tuple by using the
built-in `tuple` function.
Like lists, tuples are indexed sequences, so they may be indexed and
sliced. The index expression `tuple[i]` returns the tuple element at
index i, and the slice expression `tuple[i:j]` returns a subsequence
of a tuple.
Tuples are iterable sequences, so they may be used as the operand of a
`for`-loop, a list comprehension, or various built-in functions.
Unlike lists, tuples cannot be modified.
However, the mutable elements of a tuple may be modified.
Tuples are hashable (assuming their elements are hashable),
so they may be used as keys of a dictionary.
Tuples may be concatenated using the `+` operator.
A tuple used in a Boolean context is considered true if it is
non-empty.
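As a rough illustration of the properties above (the function name `tuple_basics` and the sample values are assumptions):

```python
def tuple_basics():
    # Hypothetical helper, for illustration only.
    t = ([1], "a")
    t[0].append(2)             # allowed: mutates the list held by the tuple
    pair = tuple([1, "a"])     # conversion from an iterable sequence
    table = {pair: "value"}    # a tuple of hashable elements can be a dict key
    joined = pair + (2, 3)     # concatenation yields a new tuple
    return t, table[(1, "a")], joined, joined[1:3]
```

The tuple `t` itself is never reassigned or resized; only the list it contains changes. Note that `t`, because it holds a list, would not be usable as a dictionary key.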
### Dictionaries
A dictionary is a mutable mapping from keys to values.
The [type](#type) of a dictionary is `"dict"`.
Dictionaries provide constant-time operations to insert an element, to
look up the value for a key, or to remove an element. Dictionaries
are implemented using hash tables, so keys must be hashable. Hashable
values include `None`, Booleans, numbers, strings, bytes, and tuples
composed from hashable values. Most mutable values, such as lists
and dictionaries, are not hashable, unless they are frozen.
Attempting to use a non-hashable value as a key in a dictionary
results in a dynamic error.
A [dictionary expression](#dictionary-expressions) specifies a
dictionary as a set of key/value pairs enclosed in braces:
```python
coins = {
    "penny": 1,
    "nickel": 5,
    "dime": 10,
    "quarter": 25,
}
```
The expression `d[k]`, where `d` is a dictionary and `k` is a key,
retrieves the value associated with the key. If the dictionary
contains no such item, the operation fails:
```python
coins["penny"] # 1
coins["dime"] # 10
coins["silver dollar"] # error: key not found
```
The number of items in a dictionary `d` is given by `len(d)`.
A key/value item may be added to a dictionary, or updated if the key
is already present, by using `d[k]` on the left side of an assignment:
```python
len(coins) # 4
coins["shilling"] = 20
len(coins) # 5, item was inserted
coins["shilling"] = 5
len(coins) # 5, existing item was updated
```
A dictionary can also be constructed using a [dictionary