python · iritkatriel · Dec 1, 2024
diff --git a/InternalDocs/README.md b/InternalDocs/README.md
@@ -1,4 +1,3 @@
-
 # CPython Internals Documentation
 
 The documentation in this folder is intended for CPython maintainers.

diff --git a/InternalDocs/adaptive.md b/InternalDocs/adaptive.md
@@ -96,6 +96,7 @@ quality of specialization and keeping the overhead of specialization low.
 Specialized instructions must be fast. In order to be fast,
 specialized instructions should be tailored for a particular
 set of values that allows them to:
+
 1. Verify that the incoming value is part of that set with low overhead.
 2. Perform the operation quickly.
 
@@ -107,9 +108,11 @@ For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()`
 dictionaries whose keys have the expected version.
 
 This can be tested quickly:
+
 * `globals->keys->dk_version == expected_version`
 
 and the operation can be performed quickly:
+
 * `value = entries[cache->index].me_value;`.
 
 Because it is impossible to measure the performance of an instruction without
@@ -122,10 +125,11 @@ base instruction.
 ### Implementation of specialized instructions
 
 In general, specialized instructions should be implemented in two parts:
+
 1. A sequence of guards, each of the form
-  `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
+   `DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
 2. The operation, which should ideally have no branches and
-  a minimum number of dependent memory accesses.
+   a minimum number of dependent memory accesses.
 
 In practice, the parts may overlap, as data required for guards
 can be re-used in the operation.
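A toy Python model of the guard-then-operation split, loosely following the `LOAD_GLOBAL_MODULE` example. The `Keys`, `Globals`, `Cache` and `Deopt` names are hypothetical stand-ins for the C structures and for `DEOPT_IF`, not CPython APIs. The guard is one version comparison, the operation is one indexed load, and a failed guard falls back to the generic lookup the base instruction would perform.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical stand-ins for the C structures; not CPython APIs.
_next_version = count(1)


@dataclass
class Keys:
    index_of: dict  # name -> slot in Globals.entries
    dk_version: int = field(default_factory=lambda: next(_next_version))


@dataclass
class Globals:
    keys: Keys
    entries: list  # values by slot, playing the role of me_value

    def lookup(self, name):
        # Generic path, as the base instruction would do it.
        return self.entries[self.keys.index_of[name]]

    def store(self, name, value):
        if name in self.keys.index_of:
            # Rebinding changes a value, not the keys: version unchanged.
            self.entries[self.keys.index_of[name]] = value
        else:
            # Adding a name replaces the keys, which get a fresh version.
            index_of = dict(self.keys.index_of)
            index_of[name] = len(self.entries)
            self.entries.append(value)
            self.keys = Keys(index_of)


@dataclass
class Cache:
    expected_version: int
    index: int


class Deopt(Exception):
    """Signals a failed guard, like DEOPT_IF jumping to the base instruction."""


def load_global_module(g, cache):
    # Guard: one comparison decides whether the fast path is valid.
    if g.keys.dk_version != cache.expected_version:
        raise Deopt
    # Operation: a single indexed load, no further branches.
    return g.entries[cache.index]


def load_global(g, name, cache):
    try:
        return load_global_module(g, cache)
    except Deopt:
        return g.lookup(name)


g = Globals(Keys({"x": 0}), [42])
cache = Cache(g.keys.dk_version, g.keys.index_of["x"])
print(load_global(g, "x", cache))  # 42 (fast path)
g.store("y", 7)                    # new key: version changes
print(load_global(g, "x", cache))  # 42 (guard fails, generic path)
```

In this model only adding a name bumps the version, so rebinding an existing global keeps the fast path valid while inserting a new name forces deoptimization.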

diff --git a/InternalDocs/changing_grammar.md b/InternalDocs/changing_grammar.md
@@ -32,7 +32,7 @@ Below is a checklist of things that may need to change.
   [`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and
   [`Python/Python-ast.c`](../Python/Python-ast.c).
 
-* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code.
+* [`Parser/lexer/`](../Parser/lexer) contains the tokenization code.
   This is where you would add a new type of comment or string literal, for example.
 
 * [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects
@@ -60,4 +60,4 @@ Below is a checklist of things that may need to change.
   to the tokenizer.
 
 * Documentation must be written! Specifically, one or more of the pages in
-  [`Doc/reference/`](../Doc/reference/) will need to be updated.
+  [`Doc/reference/`](../Doc/reference) will need to be updated.
diff --git a/InternalDocs/compiler.md b/InternalDocs/compiler.md
@@ -1,4 +1,3 @@
-
 Compiler design
 ===============
 
@@ -7,8 +6,8 @@ Abstract
 
 In CPython, the compilation from source code to bytecode involves several steps:
 
-1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
-   and [Parser/tokenizer/](../Parser/tokenizer/).
+1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
+   and [Parser/tokenizer/](../Parser/tokenizer).
 2. Parse the stream of tokens into an Abstract Syntax Tree
    [Parser/parser.c](../Parser/parser.c).
 3. Transform AST into an instruction sequence
@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization.  The
 `FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
 initializes the *name*, *args*, *body*, and *attributes* fields.
 
-See also
-[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
- by Thomas Kluyver.
+See also [Green Tree Snakes - The missing Python AST docs](
+https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.
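From Python, the same node kinds can be inspected through the standard `ast` module, whose node classes are generated from the same ASDL description. A small sketch (the parsed source string is arbitrary) showing the *name*, *args* and *body* fields and the location attributes of a `FunctionDef` node:

```python
import ast

tree = ast.parse("def greet(name):\n    return 'hi ' + name\n")
node = tree.body[0]  # first statement of the module

print(type(node).__name__)              # FunctionDef
print(node.name)                        # greet
print([a.arg for a in node.args.args])  # ['name']
print(type(node.body[0]).__name__)      # Return
print(node.lineno, node.col_offset)     # 1 0
```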
 
 Memory management
 =================
@@ -260,33 +258,33 @@ manually -- `generic`, `identifier` and `int`.  These types are found in
 [Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
 Functions and macros for creating `asdl_xx_seq *` types are as follows:
 
-`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
-        Allocate memory for an `asdl_generic_seq` of the specified length
-`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
-        Allocate memory for an `asdl_identifier_seq` of the specified length
-`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
-        Allocate memory for an `asdl_int_seq` of the specified length
+* `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
+  Allocate memory for an `asdl_generic_seq` of the specified length
+* `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
+  Allocate memory for an `asdl_identifier_seq` of the specified length
+* `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
+  Allocate memory for an `asdl_int_seq` of the specified length
 
 In addition to the three types mentioned above, some ASDL sequence types are
 automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
 [Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
 Macros for using both manually defined and automatically generated ASDL
 sequence types are as follows:
 
-`asdl_seq_GET(asdl_xx_seq *, int)`
-        Get item held at a specific position in an `asdl_xx_seq`
-`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
-        Set a specific index in an `asdl_xx_seq` to the specified value
+* `asdl_seq_GET(asdl_xx_seq *, int)`:
+  Get item held at a specific position in an `asdl_xx_seq`
+* `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
+  Set a specific index in an `asdl_xx_seq` to the specified value
 
-Untyped counterparts exist for some of the typed macros.  These are useful
+Untyped counterparts exist for some of the typed macros. These are useful
 when a function needs to manipulate a generic ASDL sequence:
 
-`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
-        Get item held at a specific position in an `asdl_seq`
-`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
-        Set a specific index in an `asdl_seq` to the specified value
-`asdl_seq_LEN(asdl_seq *)`
-        Return the length of an `asdl_seq` or `asdl_xx_seq`
+* `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
+  Get item held at a specific position in an `asdl_seq`
+* `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
+  Set a specific index in an `asdl_seq` to the specified value
+* `asdl_seq_LEN(asdl_seq *)`:
+  Return the length of an `asdl_seq` or `asdl_xx_seq`
 
 Note that typed macros and functions are recommended over their untyped
 counterparts.  Typed macros carry out checks in debug mode and aid
@@ -379,33 +377,33 @@ arguments to a node that used the '*' modifier).
 
 Emission of bytecode is handled by the following macros:
 
-* `ADDOP(struct compiler *, location, int)`
-    add a specified opcode
-* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
-    like `ADDOP`, but also exits current scope; used for adding return value
-    opcodes in lambdas and closures
-* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
-    add an opcode that takes an integer argument
-* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
-    add an opcode with the proper argument based on the position of the
-    specified PyObject in PyObject sequence object, but with no handling of
-    mangled names; used for when you
-    need to do named lookups of objects such as globals, consts, or
-    parameters where name mangling is not possible and the scope of the
-    name is known; *TYPE* is the name of PyObject sequence
-    (`names` or `varnames`)
-* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
-    just like `ADDOP_O`, but steals a reference to PyObject
-* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
-    just like `ADDOP_O`, but name mangling is also handled; used for
-    attribute loading or importing based on name
-* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
-    add the `LOAD_CONST` opcode with the proper argument based on the
-    position of the specified PyObject in the consts table.
-* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
-    just like `ADDOP_LOAD_CONST_NEW`, but steals a reference to PyObject
-* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
-    create a jump to a basic block
+* `ADDOP(struct compiler *, location, int)`:
+  add a specified opcode
+* `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
+  like `ADDOP`, but also exits the current scope; used for adding return value
+  opcodes in lambdas and closures
+* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
+  add an opcode that takes an integer argument
+* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
+  add an opcode with the proper argument based on the position of the
+  specified PyObject in a PyObject sequence object, but with no handling of
+  mangled names; used when you need to do named lookups of objects such as
+  globals, consts, or parameters where name mangling is not possible and the
+  scope of the name is known; *TYPE* is the name of the PyObject sequence
+  (`names` or `varnames`)
+* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
+  just like `ADDOP_O`, but steals a reference to PyObject
+* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
+  just like `ADDOP_O`, but name mangling is also handled; used for
+  attribute loading or importing based on name
+* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
+  add the `LOAD_CONST` opcode with the proper argument based on the
+  position of the specified PyObject in the consts table.
+* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
+  just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
+* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
+  create a jump to a basic block
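The effect of `ADDOP_LOAD_CONST` and `ADDOP_NAME` can be observed from Python with the standard `dis` module: a `LOAD_CONST` oparg is an index into `co_consts`, and private attribute names reach `co_names` already mangled. This is a sketch only; the `const_slot` helper is hypothetical, and exact instruction sequences vary between CPython versions.

```python
import dis


def const_slot(code, value):
    """Return the LOAD_CONST oparg that loads `value`, or None (hypothetical helper)."""
    for instr in dis.get_instructions(code):
        if instr.opname == "LOAD_CONST" and instr.argval == value:
            return instr.arg
    return None


# LOAD_CONST's argument is the position of the constant in co_consts.
module = compile('greeting = "hello"', "<example>", "exec")
slot = const_slot(module, "hello")
print(module.co_consts[slot])  # hello


# Private names are mangled before being recorded in co_names.
class Box:
    def peek(self):
        return self.__secret


print("_Box__secret" in Box.peek.__code__.co_names)  # True
```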
 
 The `location` argument is a struct with the source location to be
 associated with this instruction. It is typically extracted from an
@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual
 bytecode. This includes transforming pseudo instructions into actual instructions,
 converting jump targets from logical labels to relative offsets, and
 construction of the [exception table](exception_handling.md) and
-[locations table](locations.md).
+[locations table](code_objects.md#source-code-locations).
 The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
 metadata, including the `consts` and `names` arrays, information about function
 reference to the source code (filename, etc). All of this is implemented by
@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).
 Important files
 ===============
 
-* [Parser/](../Parser/)
+* [Parser/](../Parser)
 
   * [Parser/Python.asdl](../Parser/Python.asdl):
     ASDL syntax file.
@@ -534,7 +532,7 @@ Important files
   * [Python/instruction_sequence.c](../Python/instruction_sequence.c):
     A data structure representing a sequence of bytecode-like pseudo-instructions.
 
-* [Include/](../Include/)
+* [Include/](../Include)
 
   * [Include/cpython/code.h](../Include/cpython/code.h)
     : Header file for [Objects/codeobject.c](../Objects/codeobject.c);
@@ -556,7 +554,7 @@ Important files
     : Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).
 
   * [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
-    :  Header for [Python/symtable.c](../Python/symtable.c).
+    : Header for [Python/symtable.c](../Python/symtable.c).
     `struct symtable` and `PySTEntryObject` are defined here.
 
   * [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
@@ -570,7 +568,7 @@ Important files
     by
     [Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).
 
-* [Objects/](../Objects/)
+* [Objects/](../Objects)
 
   * [Objects/codeobject.c](../Objects/codeobject.c)
     : Contains PyCodeObject-related code.
@@ -579,7 +577,7 @@ Important files
     : Contains the `frame_setlineno()` function which should determine whether it is allowed
     to make a jump between two points in a bytecode.
 
-* [Lib/](../Lib/)
+* [Lib/](../Lib)
 
   * [Lib/opcode.py](../Lib/opcode.py)
     : opcode utilities exposed to Python.
@@ -591,7 +589,7 @@ Important files
 Objects
 =======
 
-* [Locations](locations.md): Describes the location table
+* [Locations](code_objects.md#source-code-locations): Describes the location table
 * [Frames](frames.md): Describes frames and the frame stack
 * [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
 * [Exception Handling](exception_handling.md): Describes the exception table

diff --git a/InternalDocs/exception_handling.md b/InternalDocs/exception_handling.md
@@ -87,10 +87,10 @@ offset of the raising instruction should be pushed to the stack.
 Handling an exception, once an exception table entry is found, consists
 of the following steps:
 
- 1. pop values from the stack until it matches the stack depth for the handler.
- 2. if `lasti` is true, then push the offset that the exception was raised at.
- 3. push the exception to the stack.
- 4. jump to the target offset and resume execution.
+1. pop values from the stack until its depth matches the handler's stack depth.
+2. if `lasti` is true, then push the offset that the exception was raised at.
+3. push the exception to the stack.
+4. jump to the target offset and resume execution.
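A toy Python model of the four handling steps. The `Entry` class and the function names are hypothetical; entries are kept as already-decoded `start`/`end`/`target`/`depth`/`lasti` records, and the value stack is a plain list.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    start: int    # inclusive, in code units
    end: int      # exclusive, in code units
    target: int   # handler offset
    depth: int    # stack depth expected by the handler
    lasti: bool   # whether to push the raising offset


def find_entry(table, offset):
    # Linear scan for clarity; the encoded table is designed to allow binary search.
    for entry in table:
        if entry.start <= offset < entry.end:
            return entry
    return None


def handle_exception(stack, table, offset, exc):
    """Apply the handling steps; return the offset to resume at, or None."""
    entry = find_entry(table, offset)
    if entry is None:
        return None                # no handler in this model: exception propagates
    del stack[entry.depth:]        # 1. pop down to the handler's depth
    if entry.lasti:
        stack.append(offset)       # 2. push the offset the exception was raised at
    stack.append(exc)              # 3. push the exception
    return entry.target            # 4. resume execution at the target
```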
 
 
 Reraising Exceptions and `lasti`
@@ -107,13 +107,12 @@ Format of the exception table
 -----------------------------
 
 Conceptually, the exception table consists of a sequence of 5-tuples:
-```
-    1. `start-offset` (inclusive)
-    2. `end-offset` (exclusive)
-    3. `target`
-    4. `stack-depth`
-    5. `push-lasti` (boolean)
-```
+
+1. `start-offset` (inclusive)
+2. `end-offset` (exclusive)
+3. `target`
+4. `stack-depth`
+5. `push-lasti` (boolean)
 
 All offsets and lengths are in code units, not bytes.
 
@@ -123,18 +122,19 @@ For it to be searchable quickly, we need to support binary search giving us log(
 Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry.
 
 It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as:
-    `start, size, target, depth, push-lasti`.
+`start, size, target, depth, push-lasti`.
 
 Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes.
 It also happens that depth is generally quite small.
 
 So, we need to encode:
+
 ```
-    `start` (up to 30 bits)
-    `size` (up to 30 bits)
-    `target` (up to 30 bits)
-    `depth` (up to ~8 bits)
-    `lasti` (1 bit)
+start   (up to 30 bits)
+size    (up to 30 bits)
+target  (up to 30 bits)
+depth   (up to ~8 bits)
+lasti   (1 bit)
 ```
 
 We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set.
@@ -145,29 +145,32 @@ The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the
 In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.
 
 For example, the exception entry:
+
 ```
-    `start`:  20
-    `end`:    28
-    `target`: 100
-    `depth`:  3
-    `lasti`:  False
+start:              20
+end:                28
+target:             100
+depth:              3
+lasti:              False
 ```
 
 is encoded by first converting to the more compact four value form:
+
 ```
-    `start`:         20
-    `size`:          8
-    `target`:        100
-  `depth<<1+lasti`:  6
+start:              20
+size:               8
+target:             100
+(depth<<1)+lasti:   6
 ```
 
 which is then encoded as:
+
 ```
-    148 (MSB + 20 for start)
-    8   (size)
-    65  (Extend bit + 1)
-    36  (Remainder of target, 100 == (1<<6)+36)
-    6
+148     (MSB + 20 for start)
+8       (size)
+65      (Extend bit + 1)
+36      (Remainder of target, 100 == (1<<6)+36)
+6
 ```
 
 for a total of five bytes.
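A minimal Python sketch of this encoding, useful for checking the worked example by hand. The function names are hypothetical (CPython's own encoder is written in C); the sketch follows the SXdddddd byte layout and the `start, size, target, depth, push-lasti` order described above.

```python
START_BIT = 0x80   # S: set only on the first byte of an entry
EXTEND_BIT = 0x40  # X: another 6-bit chunk of this value follows
CHUNK_MASK = 0x3F  # d: six payload bits


def encode_value(value, is_start=False):
    """Encode one value as big-endian 6-bit chunks."""
    assert 0 <= value < (1 << 30)
    chunks = []
    while True:
        chunks.append(value & CHUNK_MASK)
        value >>= 6
        if not value:
            break
    chunks.reverse()  # most significant chunk first
    out = [chunk | EXTEND_BIT for chunk in chunks[:-1]] + [chunks[-1]]
    if is_start:
        out[0] |= START_BIT
    return out


def encode_entry(start, end, target, depth, lasti):
    """Encode one (start, end, target, depth, lasti) entry as a list of bytes."""
    return (encode_value(start, is_start=True)
            + encode_value(end - start)                 # size, not end
            + encode_value(target)
            + encode_value((depth << 1) + int(lasti)))  # depth and lasti combined


def decode_table(data):
    """Decode a byte string back into a list of 5-tuples."""
    pos = 0

    def read_value():
        nonlocal pos
        byte = data[pos]
        pos += 1
        value = byte & CHUNK_MASK
        while byte & EXTEND_BIT:
            byte = data[pos]
            pos += 1
            value = (value << 6) | (byte & CHUNK_MASK)
        return value

    entries = []
    while pos < len(data):
        assert data[pos] & START_BIT, "entry must begin with the start bit"
        start = read_value()
        size = read_value()
        target = read_value()
        depth_lasti = read_value()
        entries.append((start, start + size, target,
                        depth_lasti >> 1, bool(depth_lasti & 1)))
    return entries


encoded = encode_entry(20, 28, 100, 3, False)
print(encoded)                       # [148, 8, 65, 36, 6]
print(decode_table(bytes(encoded)))  # [(20, 28, 100, 3, False)]
```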

diff --git a/InternalDocs/frames.md b/InternalDocs/frames.md
@@ -27,6 +27,7 @@ objects, so are not allocated in the per-thread stack. See `PyGenObject` in
 ## Layout
 
 Each activation record is laid out as:
+
 * Specials
 * Locals
 * Stack

diff --git a/InternalDocs/garbage_collector.md b/InternalDocs/garbage_collector.md
@@ -1,4 +1,3 @@
-
 Garbage collector design
 ========================
 
@@ -117,7 +116,7 @@ general, the collection of all objects tracked by GC is partitioned into disjoin
 doubly linked list.  Between collections, objects are partitioned into "generations", reflecting how
 often they've survived collection attempts.  During collections, the generation(s) being collected
 are further partitioned into, for example, sets of reachable and unreachable objects.  Doubly linked lists
-support moving an object from one partition to another, adding a new object,  removing an object
+support moving an object from one partition to another, adding a new object, removing an object
 entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
 isn't running at all!), and merging partitions, all with a small constant number of pointer updates.
 With care, they also support iterating over a partition while objects are being added to - and

diff --git a/InternalDocs/generators.md b/InternalDocs/generators.md
@@ -1,4 +1,3 @@
-
 Generators
 ========== 
 

diff --git a/InternalDocs/interpreter.md b/InternalDocs/interpreter.md
@@ -1,4 +1,3 @@
-
 The bytecode interpreter
 ========================