SPEC 12: Formatting mathematical expressions #326

Open pull request: wants to merge 23 commits into `main`.

File: `spec-0012/index.md` (125 additions, 0 deletions)
---
title: "SPEC 12 — Formatting mathematical expressions"
number: 12
date: 2024-06-06
author:
- "Pamphile Roy <[email protected]>"
discussion: https://discuss.scientific-python.org/t/spec-12-formatting-mathematical-expressions
endorsed-by:
---

## Description

PEP 8 and other established style guides give little guidance on formatting
mathematical expressions. As a result, people come up with their own
interpretations and styles. Standardizing how we write mathematical expressions
would bring the same benefits as for "normal" code: consistency across the
ecosystem, which makes collaboration easier.

This SPEC standardizes the formatting of mathematical expressions.

## Implementation

The following rules must be followed.
They respect and complement PEP 8 (relevant sections include
[id20](https://www.python.org/dev/peps/pep-0008/#id20) and
[id28](https://www.python.org/dev/peps/pep-0008/#id28)).

We define a _group_ as a collection of operators having the same priority.
For example, `a + b + c` is a single group, while `a + b * c` is composed of two
groups, `a` and `b * c`. A collection delimited by parentheses is also a group.
> **Review comment (contributor):** `(a + b * c)` is a group. And the whole
> expression by itself is a group.
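To see the grouping concretely, Python's own parser splits `a + b * c` the same way. The root of the parse tree is the lowest-priority operator, and its two operands are the groups. The following is a minimal sketch using the standard-library `ast` module and assumes Python 3.9+ for `ast.unparse`:

```python
import ast

# mode="eval" parses a single expression; .body is its root node.
tree = ast.parse("a + b * c", mode="eval").body

print(type(tree.op).__name__)   # Add: the lowest-priority operator is the root
print(ast.unparse(tree.left))   # a
print(ast.unparse(tree.right))  # b * c
```

The parse tree records precedence only. It does not keep parentheses that restate the default grouping, so `(a*b)*c` and `a*b*c` parse identically. A tool working on the AST alone therefore cannot always see a group delimited by parentheses.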

- There is a space before and after `-` and `+`, except when the operator
defines the sign of a number:

```
a + b
-a
```

- Within a group, if operators with different priorities are used, add whitespace around the operators with the lowest priority(ies).
> **Review comment (mdhaber, Jun 7, 2024):** According to the definition of
> "group" above, it is not possible for operators within a group to have
> different priority.
>
> I believe the definition of "group" was supposed to be something more like a
> sequence of operations that relies on implicit order of operations rules.
> Examples include:
>
> - a logical line
> - operations within parentheses
> - the expression and for/if clauses of a list comprehension

```
a + b*c
```

- There is no space before or after `**`.

```
a**b
```

- There is no space before or after the operators `*` and `/`. The only exception
is when the expression consists of a single operator linking two groups, each
with more than one element.
> **Review comment (contributor):** There seems to be another exception below.
> https://github.com/scientific-python/specs/pull/326/files#r1631763770
>
> According to "Only exception is if the expression consist of a single operator
> linking two groups", there is no exception for:
>
> `(a*b)*(c*d)*(e*f)`
>
> because there are three groups? Or do you mean that you need a space when any
> binary operator is linking two explicit "groups" enclosed by parentheses?
>
> **Reply (member author):** In your example it means no spaces because we have
> 3 groups.
>
> And if we have 2 groups then each group must have at least one operator in it
> (i.e. not just a variable or single number).

```
a*b
(a*b) * (c*d)
```

- Operators within a group are ordered from the lowest to the highest priority.
> **Review comment (contributor):** I'm not sure "group" here is being used in
> the same sense as elsewhere. I remember discussing in person the idea that
> `a/d*b**c` is preferable to `a*b**c/d` unless there are explicit parentheses,
> like `(a*b**c)/d`, but it would not be wrong to do `a/d*b**c + e/f*g**h`, yet
> the plus comes after higher priority operators.

If this is technically an issue (e.g., a restriction on the AST), add
parentheses or spaces.
> **Review comment (mdhaber, Jun 7, 2024):** This rule appears to conflict with
> the previous rule about space around operators `*` and `/`.
>
> I think the reason for an exception, if there needs to be one, should be more
> explicit. I remember discussing in person that linting tools might check the
> AST to ensure that it is not modified by an auto-correction, but this is not
> something that the user will necessarily be thinking about. A user might be
> ordering sequences of operations in a particular way to get floating point
> arithmetic to do what they want. If they are tempted to break the rules to do
> so:
>
> - the linter does not have to be able to make the correction automatically
> - the user is welcome to use parentheses
> - the user is welcome to declare an exception to the rule with `noqa`
>
> I think the rules need to be more complete before we can assess whether there
> is a need for an exception, though.

> **Reply (tupui, member author, Jun 8, 2024):** A general thought: rules could
> conflict if we ask for them to be applied one after the other in a strict
> order.
>
> The principle of these rules is that users should not have to care about them,
> or even learn them. The linters are here for that. If a user does something in
> a certain order for arithmetic reasons and a reordering happens, then we can
> ask tools to either not reorder or provide a skip for a given rule. See the
> last point in the notes.


```
a/d*b**c
a*(b**c)/d
```

> **Review comment (contributor):** This doesn't appear to follow the rule.
>
> **Reply (member author):** How so?

```
a*b**c / d
a * b**c / d
```

- When splitting an expression across several lines, each new line should start
with the operator linking the previous and next logical blocks. A single digit
alone on a line is forbidden.

```
(
    a/b
    + c*d
)
```

> **Review comment (mdhaber, Jun 7, 2024):** I think this should show an example
> where splitting the line makes sense. We would not want to suggest that
> `a / b + c*d` should be split across four lines. Consider referring to PEP 8
> "Should a Line Break Before or After a Binary Operator".
>
> I'm not sure if the use of the term "logical block" is correct. This is a
> single logical line split across multiple physical lines.
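The ordering rule rewrites the parse tree. The other rules in this list change only whitespace, line breaks, or parentheses that restate the default precedence, so they leave the tree unchanged. As the review discussion suggests, a tool could therefore check an automatic fix by comparing abstract syntax trees. The following is a minimal sketch built on the standard-library `ast` module. The helper names `same_ast` and `minus_kind` are hypothetical and do not come from any existing linter.

```python
import ast


def same_ast(src_a: str, src_b: str) -> bool:
    """Return True if two snippets parse to the same AST.

    ast.dump() omits line and column attributes by default, so differences in
    whitespace or line breaks do not affect the comparison.
    """
    return ast.dump(ast.parse(src_a)) == ast.dump(ast.parse(src_b))


def minus_kind(expr: str) -> str:
    """Tell whether the outermost `-` in `expr` is a sign or a subtraction."""
    node = ast.parse(expr, mode="eval").body
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "sign"         # written without a space: -a
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Sub):
        return "subtraction"  # written with spaces: a - b
    return "other"


# The spacing rule for `+`/`-` relies on the parser telling the two uses apart.
print(minus_kind("-a"))     # sign
print(minus_kind("a - b"))  # subtraction

# Respacing and redundant parentheses leave the AST unchanged ...
print(same_ast("y = a*b**c / d", "y = a * b ** c / d"))  # True
print(same_ast("y = a*(b**c)/d", "y = a*b**c/d"))        # True
# ... and so does splitting the expression inside parentheses ...
print(same_ast("y = (\n    a/b\n    + c*d\n)", "y = a/b + c*d"))  # True
# ... whereas reordering the operands, as the ordering rule asks, does not.
print(same_ast("y = a*b**c/d", "y = a/d*b**c"))          # False
```

A reordering fix cannot pass such a check. A tool would need to treat it separately, for example by skipping the rule on request, as discussed in the comments above.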

### Examples

```python
# good
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a + b) * (a - b)
dfdx = sign*(-2*x + 2*y + 2)
result = 2*x**2 + 3*x**(2/3)
y = 4*x**2 + 2*x + 1
c_i1j = (
    1./n**2.
    *np.prod(
        0.5*(2. + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1
    )
)
```

```python
# bad
i = i + 1
submitted += 1
```

> **Review comment (contributor):** Why are these bad?
>
> **Reply (member author):** They are good. For now I just copy-pasted the whole
> block into Black. But yes, we should remove the correct ones from there.

```python
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
dfdx = sign * (-2 * x + 2 * y + 2)
result = 2 * x ** 2 + 3 * x ** (2 / 3)
y = 4 * x ** 2 + 2 * x + 1
c_i1j = (
    1.0
    / n ** 2.0
    * np.prod(
        0.5 * (2.0 + abs(z_ij[i1, :]) + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1
    )
)
```
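The spacing rules can be applied mechanically to the simple examples above. The following is a minimal sketch of a formatter for a single expression, not a reference implementation. It assumes Python 3.9+ for `ast.unparse` and handles only binary `+`, `-`, `*`, `/`, `**` and unary `-`. Everything else, such as calls and other operators, is passed to `ast.unparse` unchanged. It does not implement the ordering or line-splitting rules. All names (`format_expr`, `PRIORITY`, and so on) are hypothetical.

```python
import ast

# Priorities of the operators handled here (higher binds tighter).
PRIORITY = {ast.Add: 1, ast.Sub: 1, ast.Mult: 2, ast.Div: 2, ast.Pow: 4}
SYMBOL = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "**"}
SIGN_PRIORITY = 3  # unary minus binds tighter than * and /, looser than **
ATOMS = (ast.Name, ast.Constant, ast.Call, ast.Attribute, ast.Subscript)


def priority(node: ast.AST) -> int:
    if isinstance(node, ATOMS):
        return 5
    if isinstance(node, ast.BinOp):
        return PRIORITY.get(type(node.op), 0)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return SIGN_PRIORITY
    return 0  # anything else is conservatively parenthesized


def wrap(node: ast.AST, parens: bool) -> str:
    text = fmt(node)
    return f"({text})" if parens else text


def fmt(node: ast.AST) -> str:
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        # A sign: no space between `-` and its operand.
        return "-" + wrap(node.operand, priority(node.operand) < SIGN_PRIORITY)
    if not (isinstance(node, ast.BinOp) and type(node.op) in PRIORITY):
        return ast.unparse(node)  # names, numbers, calls, other operators
    p = PRIORITY[type(node.op)]
    right_assoc = isinstance(node.op, ast.Pow)
    # Parenthesize an operand that binds more loosely than the operator, or
    # equally loosely on the side where associativity would regroup it.
    left_parens = priority(node.left) < p or (
        right_assoc and priority(node.left) == p)
    right_parens = priority(node.right) < p or (
        not right_assoc and priority(node.right) == p)
    left, right = wrap(node.left, left_parens), wrap(node.right, right_parens)
    op = SYMBOL[type(node.op)]
    if p == 1:
        return f"{left} {op} {right}"  # `+` and `-`: always spaced
    if p == 2 and left_parens and right_parens:
        return f"{left} {op} {right}"  # one `*` or `/` linking two groups
    return f"{left}{op}{right}"        # other `*`, `/` and `**`: no spaces


def format_expr(src: str) -> str:
    """Reformat a single expression such as "x * 2 - 1"."""
    return fmt(ast.parse(src, mode="eval").body)


print(format_expr("2 * x ** 2 + 3 * x ** (2 / 3)"))  # 2*x**2 + 3*x**(2/3)
print(format_expr("sign * (-2 * x + 2 * y + 2)"))    # sign*(-2*x + 2*y + 2)
print(format_expr("(a + b) * (a - b)"))              # (a + b) * (a - b)
```

The sketch rebuilds the text from the AST, so it cannot see parentheses that only restate the default grouping. As a result, `(a*b) * (c*d)` comes out as `a*b*(c*d)`. This is one of the AST restrictions mentioned in the rules.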

## Notes

These formatting rules take neither performance nor numerical precision into
account. Their scope is limited to styling.
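Precision is out of scope, but reordering operators is not always neutral in floating-point arithmetic. This is one reason a user may deliberately keep a particular order, as raised in the discussion of the ordering rule. In the minimal sketch below, the values are chosen only to trigger overflow in IEEE 754 double precision:

```python
import math

a, c = 1e308, 10.0

# Mathematically both expressions equal `a`, but a*c exceeds the largest
# finite double (about 1.8e308) and overflows to infinity first.
multiply_first = a * c / c
divide_first = a / c * c

print(math.isinf(multiply_first))  # True
print(math.isinf(divide_first))    # False
```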