gh-119786: cleanup internal docs and fix internal links #127485

Merged: 2 commits, Dec 1, 2024
1 change: 0 additions & 1 deletion InternalDocs/README.md
@@ -1,4 +1,3 @@

# CPython Internals Documentation

The documentation in this folder is intended for CPython maintainers.
8 changes: 6 additions & 2 deletions InternalDocs/adaptive.md
@@ -96,6 +96,7 @@ quality of specialization and keeping the overhead of specialization low.
Specialized instructions must be fast. In order to be fast,
specialized instructions should be tailored for a particular
set of values that allows them to:

1. Verify that the incoming value is part of that set with low overhead.
2. Perform the operation quickly.

@@ -107,9 +108,11 @@ For example, `LOAD_GLOBAL_MODULE` is specialized for `globals()`
dictionaries that have keys with the expected version.

This can be tested quickly:

* `globals->keys->dk_version == expected_version`

and the operation can be performed quickly:

* `value = entries[cache->index].me_value;`.

Because it is impossible to measure the performance of an instruction without
@@ -122,10 +125,11 @@ base instruction.
### Implementation of specialized instructions

In general, specialized instructions should be implemented in two parts:

1. A sequence of guards, each of the form
-`DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
+`DEOPT_IF(guard-condition-is-false, BASE_NAME)`.
2. The operation, which should ideally have no branches and
-a minimum number of dependent memory accesses.
+a minimum number of dependent memory accesses.

In practice, the parts may overlap, as data required for guards
can be re-used in the operation.
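
The two-part shape (guards first, then a branch-free operation) can be illustrated with a toy Python model. This is only a sketch under stated assumptions: `Keys`, `Cache`, `Deopt`, `load_global_module` and `load_global` are hypothetical names invented for illustration, and the real instructions are written in C with `DEOPT_IF`.

```python
from dataclasses import dataclass


class Deopt(Exception):
    """Hypothetical signal: a guard failed, so the base instruction must run."""


@dataclass
class Keys:
    version: int   # stands in for dk_version
    names: list    # key names, used only by the generic fallback
    entries: list  # stands in for the entries array (values only)


@dataclass
class Cache:
    expected_version: int  # version recorded when the instruction was specialized
    index: int             # position of the value in the entries array


def load_global_module(keys, cache):
    # Part 1: guards. A failed guard hands control back to the base instruction.
    if keys.version != cache.expected_version:
        raise Deopt("LOAD_GLOBAL")
    # Part 2: the operation. No branches, a single indexed load.
    return keys.entries[cache.index]


def load_global(keys, cache, name):
    # Try the specialized path; fall back to a generic lookup by name.
    try:
        return load_global_module(keys, cache)
    except Deopt:
        return keys.entries[keys.names.index(name)]
```

Note that the version read by the guard is the only check needed before the indexed load, which is the sense in which guard data and operation data overlap.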
4 changes: 2 additions & 2 deletions InternalDocs/changing_grammar.md
@@ -32,7 +32,7 @@ Below is a checklist of things that may need to change.
[`Include/internal/pycore_ast.h`](../Include/internal/pycore_ast.h) and
[`Python/Python-ast.c`](../Python/Python-ast.c).

-* [`Parser/lexer/`](../Parser/lexer/) contains the tokenization code.
+* [`Parser/lexer/`](../Parser/lexer) contains the tokenization code.
This is where you would add a new type of comment or string literal, for example.

* [`Python/ast.c`](../Python/ast.c) will need changes to validate AST objects
@@ -60,4 +60,4 @@ Below is a checklist of things that may need to change.
to the tokenizer.

* Documentation must be written! Specifically, one or more of the pages in
-[`Doc/reference/`](../Doc/reference/) will need to be updated.
+[`Doc/reference/`](../Doc/reference) will need to be updated.
112 changes: 55 additions & 57 deletions InternalDocs/compiler.md
@@ -1,4 +1,3 @@

Compiler design
===============

@@ -7,8 +6,8 @@ Abstract

In CPython, the compilation from source code to bytecode involves several steps:

-1. Tokenize the source code [Parser/lexer/](../Parser/lexer/)
-and [Parser/tokenizer/](../Parser/tokenizer/).
+1. Tokenize the source code [Parser/lexer/](../Parser/lexer)
+and [Parser/tokenizer/](../Parser/tokenizer).
2. Parse the stream of tokens into an Abstract Syntax Tree
[Parser/parser.c](../Parser/parser.c).
3. Transform AST into an instruction sequence
@@ -134,9 +133,8 @@ this case) a `stmt_ty` struct with the appropriate initialization. The
`FunctionDef()` constructor function sets 'kind' to `FunctionDef_kind` and
initializes the *name*, *args*, *body*, and *attributes* fields.

-See also
-[Green Tree Snakes - The missing Python AST docs](https://greentreesnakes.readthedocs.io/en/latest)
-by Thomas Kluyver.
+See also [Green Tree Snakes - The missing Python AST docs](
+https://greentreesnakes.readthedocs.io/en/latest) by Thomas Kluyver.
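
The same node shape can be inspected from Python through the standard `ast` module, which exposes the AST that the C constructors build. The snippet below is an illustrative sketch; the sample function `f` is an assumption made up for the example.

```python
import ast

# Parse a small function definition and take the first top-level statement.
tree = ast.parse("def f(a, b):\n    return a + b\n")
node = tree.body[0]

print(type(node).__name__)                  # FunctionDef
print(node.name)                            # f
print([arg.arg for arg in node.args.args])  # ['a', 'b']
print(type(node.body[0]).__name__)          # Return
print(node.lineno, node.col_offset)         # 1 0
```

The `name`, `args` and `body` fields correspond to the fields the constructor initializes; `lineno` and `col_offset` are among the location attributes.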

Memory management
=================
@@ -260,33 +258,33 @@ manually -- `generic`, `identifier` and `int`. These types are found in
[Include/internal/pycore_asdl.h](../Include/internal/pycore_asdl.h).
Functions and macros for creating `asdl_xx_seq *` types are as follows:

-`_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`
-Allocate memory for an `asdl_generic_seq` of the specified length
-`_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`
-Allocate memory for an `asdl_identifier_seq` of the specified length
-`_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`
-Allocate memory for an `asdl_int_seq` of the specified length
+* `_Py_asdl_generic_seq_new(Py_ssize_t, PyArena *)`:
+Allocate memory for an `asdl_generic_seq` of the specified length
+* `_Py_asdl_identifier_seq_new(Py_ssize_t, PyArena *)`:
+Allocate memory for an `asdl_identifier_seq` of the specified length
+* `_Py_asdl_int_seq_new(Py_ssize_t, PyArena *)`:
+Allocate memory for an `asdl_int_seq` of the specified length

In addition to the three types mentioned above, some ASDL sequence types are
automatically generated by [Parser/asdl_c.py](../Parser/asdl_c.py) and found in
[Include/internal/pycore_ast.h](../Include/internal/pycore_ast.h).
Macros for using both manually defined and automatically generated ASDL
sequence types are as follows:

-`asdl_seq_GET(asdl_xx_seq *, int)`
-Get item held at a specific position in an `asdl_xx_seq`
-`asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`
-Set a specific index in an `asdl_xx_seq` to the specified value
+* `asdl_seq_GET(asdl_xx_seq *, int)`:
+Get item held at a specific position in an `asdl_xx_seq`
+* `asdl_seq_SET(asdl_xx_seq *, int, stmt_ty)`:
+Set a specific index in an `asdl_xx_seq` to the specified value

-Untyped counterparts exist for some of the typed macros. These are useful
+Untyped counterparts exist for some of the typed macros. These are useful
when a function needs to manipulate a generic ASDL sequence:

-`asdl_seq_GET_UNTYPED(asdl_seq *, int)`
-Get item held at a specific position in an `asdl_seq`
-`asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`
-Set a specific index in an `asdl_seq` to the specified value
-`asdl_seq_LEN(asdl_seq *)`
-Return the length of an `asdl_seq` or `asdl_xx_seq`
+* `asdl_seq_GET_UNTYPED(asdl_seq *, int)`:
+Get item held at a specific position in an `asdl_seq`
+* `asdl_seq_SET_UNTYPED(asdl_seq *, int, stmt_ty)`:
+Set a specific index in an `asdl_seq` to the specified value
+* `asdl_seq_LEN(asdl_seq *)`:
+Return the length of an `asdl_seq` or `asdl_xx_seq`

Note that typed macros and functions are recommended over their untyped
counterparts. Typed macros carry out checks in debug mode and aid
@@ -379,33 +377,33 @@ arguments to a node that used the '*' modifier).

Emission of bytecode is handled by the following macros:

-* `ADDOP(struct compiler *, location, int)`
-add a specified opcode
-* `ADDOP_IN_SCOPE(struct compiler *, location, int)`
-like `ADDOP`, but also exits current scope; used for adding return value
-opcodes in lambdas and closures
-* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`
-add an opcode that takes an integer argument
-* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`
-add an opcode with the proper argument based on the position of the
-specified PyObject in PyObject sequence object, but with no handling of
-mangled names; used for when you
-need to do named lookups of objects such as globals, consts, or
-parameters where name mangling is not possible and the scope of the
-name is known; *TYPE* is the name of PyObject sequence
-(`names` or `varnames`)
-* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`
-just like `ADDOP_O`, but steals a reference to PyObject
-* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`
-just like `ADDOP_O`, but name mangling is also handled; used for
-attribute loading or importing based on name
-* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`
-add the `LOAD_CONST` opcode with the proper argument based on the
-position of the specified PyObject in the consts table.
-* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`
-just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
-* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`
-create a jump to a basic block
+* `ADDOP(struct compiler *, location, int)`:
+add a specified opcode
+* `ADDOP_IN_SCOPE(struct compiler *, location, int)`:
+like `ADDOP`, but also exits current scope; used for adding return value
+opcodes in lambdas and closures
+* `ADDOP_I(struct compiler *, location, int, Py_ssize_t)`:
+add an opcode that takes an integer argument
+* `ADDOP_O(struct compiler *, location, int, PyObject *, TYPE)`:
+add an opcode with the proper argument based on the position of the
+specified PyObject in PyObject sequence object, but with no handling of
+mangled names; used for when you
+need to do named lookups of objects such as globals, consts, or
+parameters where name mangling is not possible and the scope of the
+name is known; *TYPE* is the name of PyObject sequence
+(`names` or `varnames`)
+* `ADDOP_N(struct compiler *, location, int, PyObject *, TYPE)`:
+just like `ADDOP_O`, but steals a reference to PyObject
+* `ADDOP_NAME(struct compiler *, location, int, PyObject *, TYPE)`:
+just like `ADDOP_O`, but name mangling is also handled; used for
+attribute loading or importing based on name
+* `ADDOP_LOAD_CONST(struct compiler *, location, PyObject *)`:
+add the `LOAD_CONST` opcode with the proper argument based on the
+position of the specified PyObject in the consts table.
+* `ADDOP_LOAD_CONST_NEW(struct compiler *, location, PyObject *)`:
+just like `ADDOP_LOAD_CONST`, but steals a reference to PyObject
+* `ADDOP_JUMP(struct compiler *, location, int, basicblock *)`:
+create a jump to a basic block
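
To make the "argument based on position in a table" idea concrete, here is a toy Python model of a few of these macros. It is a rough sketch, not the compiler's code: `ToyCompiler` and its method names are hypothetical, locations are plain tuples, and the table is passed as a list rather than named through *TYPE*.

```python
class ToyCompiler:
    """Hypothetical, simplified stand-in for the compiler's emission state."""

    def __init__(self):
        self.instructions = []  # (opcode name, argument, location) tuples
        self.consts = []        # constants table
        self.names = []         # names table

    def addop(self, loc, opcode):
        # Like ADDOP: an opcode with no argument.
        self.instructions.append((opcode, None, loc))

    def addop_i(self, loc, opcode, oparg):
        # Like ADDOP_I: an opcode with an integer argument.
        self.instructions.append((opcode, oparg, loc))

    @staticmethod
    def _index_of(table, obj):
        # Reuse an existing slot when the same object (by type and value)
        # is already in the table; otherwise append it.
        for i, existing in enumerate(table):
            if type(existing) is type(obj) and existing == obj:
                return i
        table.append(obj)
        return len(table) - 1

    def addop_o(self, loc, opcode, obj, table):
        # Like ADDOP_O: the argument is the object's position in `table`.
        self.addop_i(loc, opcode, self._index_of(table, obj))

    def addop_load_const(self, loc, obj):
        # Like ADDOP_LOAD_CONST: LOAD_CONST indexed into the consts table.
        self.addop_o(loc, "LOAD_CONST", obj, self.consts)


c = ToyCompiler()
loc = (1, 0)  # assumed (line, column) pair standing in for `location`
c.addop_load_const(loc, 42)
c.addop_o(loc, "STORE_NAME", "x", c.names)
c.addop_load_const(loc, 42)  # reuses slot 0 of the consts table
c.addop(loc, "POP_TOP")
```

Reference stealing (`ADDOP_N`, `ADDOP_LOAD_CONST_NEW`) has no Python equivalent and is left out of the model.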

The `location` argument is a struct with the source location to be
associated with this instruction. It is typically extracted from an
@@ -433,7 +431,7 @@ Finally, the sequence of pseudo-instructions is converted into actual
bytecode. This includes transforming pseudo instructions into actual instructions,
converting jump targets from logical labels to relative offsets, and
construction of the [exception table](exception_handling.md) and
-[locations table](locations.md).
+[locations table](code_objects.md#source-code-locations).
The bytecode and tables are then wrapped into a `PyCodeObject` along with additional
metadata, including the `consts` and `names` arrays, information about function
reference to the source code (filename, etc). All of this is implemented by
@@ -453,7 +451,7 @@ in [Python/ceval.c](../Python/ceval.c).
Important files
===============

-* [Parser/](../Parser/)
+* [Parser/](../Parser)

* [Parser/Python.asdl](../Parser/Python.asdl):
ASDL syntax file.
@@ -534,7 +532,7 @@ Important files
* [Python/instruction_sequence.c](../Python/instruction_sequence.c):
A data structure representing a sequence of bytecode-like pseudo-instructions.

-* [Include/](../Include/)
+* [Include/](../Include)

* [Include/cpython/code.h](../Include/cpython/code.h)
: Header file for [Objects/codeobject.c](../Objects/codeobject.c);
@@ -556,7 +554,7 @@ Important files
: Declares `_PyAST_Validate()` external (from [Python/ast.c](../Python/ast.c)).

* [Include/internal/pycore_symtable.h](../Include/internal/pycore_symtable.h)
-: Header for [Python/symtable.c](../Python/symtable.c).
+: Header for [Python/symtable.c](../Python/symtable.c).
`struct symtable` and `PySTEntryObject` are defined here.

* [Include/internal/pycore_parser.h](../Include/internal/pycore_parser.h)
@@ -570,7 +568,7 @@ Important files
by
[Tools/cases_generator/opcode_id_generator.py](../Tools/cases_generator/opcode_id_generator.py).

-* [Objects/](../Objects/)
+* [Objects/](../Objects)

* [Objects/codeobject.c](../Objects/codeobject.c)
: Contains PyCodeObject-related code.
@@ -579,7 +577,7 @@ Important files
: Contains the `frame_setlineno()` function which should determine whether it is allowed
to make a jump between two points in a bytecode.

-* [Lib/](../Lib/)
+* [Lib/](../Lib)

* [Lib/opcode.py](../Lib/opcode.py)
: opcode utilities exposed to Python.
@@ -591,7 +589,7 @@ Important files
Objects
=======

-* [Locations](locations.md): Describes the location table
+* [Locations](code_objects.md#source-code-locations): Describes the location table
* [Frames](frames.md): Describes frames and the frame stack
* [Objects/object_layout.md](../Objects/object_layout.md): Describes object layout for 3.11 and later
* [Exception Handling](exception_handling.md): Describes the exception table
65 changes: 34 additions & 31 deletions InternalDocs/exception_handling.md
@@ -87,10 +87,10 @@ offset of the raising instruction should be pushed to the stack.
Handling an exception, once an exception table entry is found, consists
of the following steps:

-1. pop values from the stack until it matches the stack depth for the handler.
-2. if `lasti` is true, then push the offset that the exception was raised at.
-3. push the exception to the stack.
-4. jump to the target offset and resume execution.
+1. pop values from the stack until it matches the stack depth for the handler.
+2. if `lasti` is true, then push the offset that the exception was raised at.
+3. push the exception to the stack.
+4. jump to the target offset and resume execution.
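
The four steps can be sketched in Python against an already-decoded table. This is a minimal illustration under stated assumptions: `Entry`, `find_handler` and `handle_exception` are hypothetical names, the stack is a plain list, and the lookup is a linear scan rather than whatever search the interpreter actually performs.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    start: int    # inclusive
    end: int      # exclusive
    target: int   # offset of the handler
    depth: int    # stack depth expected by the handler
    lasti: bool   # whether to push the raising instruction's offset


def find_handler(table, offset):
    # Return the first entry whose [start, end) range covers the offset.
    for entry in table:
        if entry.start <= offset < entry.end:
            return entry
    return None


def handle_exception(table, stack, offset, exc):
    entry = find_handler(table, offset)
    if entry is None:
        return None              # no handler here; the exception propagates
    del stack[entry.depth:]      # 1. pop down to the handler's stack depth
    if entry.lasti:
        stack.append(offset)     # 2. push the offset the exception was raised at
    stack.append(exc)            # 3. push the exception
    return entry.target          # 4. the offset at which execution resumes
```

For instance, with a single entry `Entry(10, 20, 50, 1, True)` and a stack of three values, an exception raised at offset 12 leaves one original value, the offset 12 and the exception on the stack, and execution resumes at offset 50.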


Reraising Exceptions and `lasti`
@@ -107,13 +107,12 @@ Format of the exception table
-----------------------------

Conceptually, the exception table consists of a sequence of 5-tuples:
-```
-1. `start-offset` (inclusive)
-2. `end-offset` (exclusive)
-3. `target`
-4. `stack-depth`
-5. `push-lasti` (boolean)
-```

+1. `start-offset` (inclusive)
+2. `end-offset` (exclusive)
+3. `target`
+4. `stack-depth`
+5. `push-lasti` (boolean)

All offsets and lengths are in code units, not bytes.

@@ -123,18 +122,19 @@ For it to be searchable quickly, we need to support binary search giving us log(
Binary search typically assumes fixed size entries, but that is not necessary, as long as we can identify the start of an entry.

It is worth noting that the size (end-start) is always smaller than the end, so we encode the entries as:
-`start, size, target, depth, push-lasti`.
+`start, size, target, depth, push-lasti`.

Also, sizes are limited to 2**30 as the code length cannot exceed 2**31 and each code unit takes 2 bytes.
It also happens that depth is generally quite small.

So, we need to encode:

```
-`start` (up to 30 bits)
-`size` (up to 30 bits)
-`target` (up to 30 bits)
-`depth` (up to ~8 bits)
-`lasti` (1 bit)
+start (up to 30 bits)
+size (up to 30 bits)
+target (up to 30 bits)
+depth (up to ~8 bits)
+lasti (1 bit)
```

We need a marker for the start of the entry, so the first byte of entry will have the most significant bit set.
@@ -145,29 +145,32 @@ The 8 bits of a byte are (msb left) SXdddddd where S is the start bit. X is the
In addition, we combine `depth` and `lasti` into a single value, `((depth<<1)+lasti)`, before encoding.

For example, the exception entry:

```
-`start`: 20
-`end`: 28
-`target`: 100
-`depth`: 3
-`lasti`: False
+start: 20
+end: 28
+target: 100
+depth: 3
+lasti: False
```

is encoded by first converting to the more compact four value form:

```
-`start`: 20
-`size`: 8
-`target`: 100
-`depth<<1+lasti`: 6
+start: 20
+size: 8
+target: 100
+depth<<1+lasti: 6
```

which is then encoded as:

```
-148 (MSB + 20 for start)
-8 (size)
-65 (Extend bit + 1)
-36 (Remainder of target, 100 == (1<<6)+36)
-6
+148 (MSB + 20 for start)
+8 (size)
+65 (Extend bit + 1)
+36 (Remainder of target, 100 == (1<<6)+36)
+6
```

for a total of five bytes.
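
The worked example can be reproduced with a short Python sketch of the encoding as described here: 6 data bits per byte, most significant chunk first, an extend bit on every byte but the last, and a start bit on the first byte of the entry. The helper names `encode_varint` and `encode_entry` are hypothetical and chosen only for this illustration.

```python
def encode_varint(value, first=False):
    """Split value into 6-bit chunks, most significant chunk first."""
    chunks = []
    while True:
        chunks.append(value & 0x3F)  # low 6 bits
        value >>= 6
        if not value:
            break
    chunks.reverse()
    # Every byte except the last carries the extend bit (0x40).
    encoded = [chunk | 0x40 for chunk in chunks[:-1]] + [chunks[-1]]
    if first:
        encoded[0] |= 0x80           # start-of-entry marker (most significant bit)
    return encoded


def encode_entry(start, end, target, depth, lasti):
    size = end - start                       # store the size instead of the end
    depth_lasti = (depth << 1) + int(lasti)  # combine depth and lasti
    return bytes(
        encode_varint(start, first=True)
        + encode_varint(size)
        + encode_varint(target)
        + encode_varint(depth_lasti)
    )


print(list(encode_entry(20, 28, 100, 3, False)))  # [148, 8, 65, 36, 6]
```

Because only the first byte of an entry has its most significant bit set, a reader can land anywhere in the table and scan backwards to the start of an entry, which is what makes binary search over variable-length entries possible.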
1 change: 1 addition & 0 deletions InternalDocs/frames.md
@@ -27,6 +27,7 @@ objects, so are not allocated in the per-thread stack. See `PyGenObject` in
## Layout

Each activation record is laid out as:

* Specials
* Locals
* Stack
3 changes: 1 addition & 2 deletions InternalDocs/garbage_collector.md
@@ -1,4 +1,3 @@

Garbage collector design
========================

@@ -117,7 +116,7 @@ general, the collection of all objects tracked by GC is partitioned into disjoin
doubly linked list. Between collections, objects are partitioned into "generations", reflecting how
often they've survived collection attempts. During collections, the generation(s) being collected
are further partitioned into, for example, sets of reachable and unreachable objects. Doubly linked lists
-support moving an object from one partition to another, adding a new object, removing an object
+support moving an object from one partition to another, adding a new object, removing an object
entirely (objects tracked by GC are most often reclaimed by the refcounting system when GC
isn't running at all!), and merging partitions, all with a small constant number of pointer updates.
With care, they also support iterating over a partition while objects are being added to - and
1 change: 0 additions & 1 deletion InternalDocs/generators.md
@@ -1,4 +1,3 @@

Generators
==========

1 change: 0 additions & 1 deletion InternalDocs/interpreter.md
@@ -1,4 +1,3 @@

The bytecode interpreter
========================
