Introduce integer division operator //
#2426
Ref #1712 |
In lieu of the full framework at #1712 (comment), I think this could be a method on the
I don't think it would be a huge PR — it would require:
|
Wait, before we continue, what should be the result of this query:
from_text format:json '[{ "int": 13, "float": 13.0 }]'
select [
int / 5, int / 5.0, float / 5, float / 5.0,
int // 5, int // 5.0, float // 5, float // 5.0,
]
I think that:
|
from_text format:json '[{ "i": 13, "f": 13.0 }]'
select [
i/5, i/5.0, f/5, f/5.0,
]
returns
on playground now. This behavior can also be seen in the DuckDB web shell.
duckdb> select 13 / 5;
┌──────────┐
│ (13 / 5) │
╞══════════╡
│ 2 │
└──────────┘
Elapsed: 3 ms
duckdb> select 13.0 / 5;
┌────────────┐
│ (13.0 / 5) │
╞════════════╡
│ 2.6 │
└────────────┘
Elapsed: 7 ms
duckdb> select 13 / 5.0;
┌────────────┐
│ (13 / 5.0) │
╞════════════╡
│ 2.6 │
└────────────┘
Elapsed: 7 ms
This is the same type as Python and others, but differs in that Python's
In [1]: 13 // 5
Out[1]: 2
In [2]: 13.0 // 5
Out[2]: 2.0
In [3]: 13 // 5.0
Out[3]: 2.0 |
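For comparison, here is a small Python sketch that evaluates the same eight expressions as the query above under Python's own rules. It only shows what Python does, not what PRQL or any SQL dialect returns, and `division_table` is a name made up for this illustration.

```python
# Operand values taken from the example query: one int and one float column.
OPERANDS = {"int": 13, "float": 13.0}
DIVISORS = {"int": 5, "float": 5.0}


def division_table():
    """Return {(operator, left_type, right_type): result} for / and //."""
    table = {}
    for left_name, left in OPERANDS.items():
        for right_name, right in DIVISORS.items():
            # True division: Python always produces a float here.
            table[("/", left_name, right_name)] = left / right
            # Floor division: int only when both operands are ints.
            table[("//", left_name, right_name)] = left // right
    return table


if __name__ == "__main__":
    for (op, left_name, right_name), value in division_table().items():
        print(f"{left_name} {op} {right_name} = {value!r} ({type(value).__name__})")
```

In Python the result type of `//` depends on the operand types, which is the kind of overloading the thread goes on to debate.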
Do you mean for PRQL or for the DB? Are there DBs that don't coerce there? |
I have looked to see if this could be implemented, perhaps apache/datafusion-sqlparser-rs#868 needs to be merged? |
Yes, probably:
|
I noticed that MySQL's DIV operator always returns an integer. Could this be an important difference on PRQL? |
@eitsupi Yes, that is an important difference. We should inject a cast into the appropriate type after the operation - if that dialect does not handle the types correctly. So this is the table we are going with?
This would mean that we compile like this:
|
@max-sixty Yes, I wanted to say that we may want to produce an error for division of non-matching types:
I don't have a clear argument here, but I've heard a lot of stories about implicit type casting causing unexpected bugs. If we want to double down on "throwing errors early", this is the way to go. If we decide to go with this, I would also consider
A bit inconvenient, but we can provide really good errors around this. |
Thanks for the summary. The Maybe?
I'm just not sure that returning a Float is the best thing to do here. Could it be that an integer return behavior like MySQL's would be more reasonable? |
Yes, my compilation examples were about currently agreed-on behavior. We can implement any behavior we decide on, so the real question is "what is the behavior we want?".
I'd say it is. This is aligned with my proposal in the last comment: |
Since the integer division is easily computed by using something like
|
PEP 238 – Changing the Division Operator
How about this? |
Oh, I see. At first I didn't understand what you meant by "maybe err". Now I assume you meant "error during compilation saying 'unsupported operation in this dialect'". With Python's implementation notes, you have now found how we could implement it in all dialects. If I understand correctly, this is the table you are now proposing:
I like this due to one key fact: the result type of an operator is always the same. This is nice because it means we don't need generic functions or operator overloading in the language. This makes the semantics of the language simpler. But the main benefit is that operators in PRQL are simpler. You want result to be an int? Use
Sidenote: PRQL should test what the output of |
Another sidenote: I just want to complain about how SQLite works:
with a as (
select 5 as b union all select 5.0 as b
)
select 13 / b from a;
Edit: Postgres and DuckDB both return |
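The SQLite behavior being complained about can be reproduced with Python's standard `sqlite3` module. This is a minimal sketch: the query is the one above with a `typeof()` column added for visibility, and `sqlite_division_rows` is a hypothetical helper name. As far as I know, SQLite decides between integer and real division per row from the runtime storage class of each value, so the two rows of one column come back with different types.

```python
import sqlite3

# Same CTE as in the comment above, plus typeof() to show each row's result type.
QUERY = """
WITH a AS (
    SELECT 5 AS b UNION ALL SELECT 5.0 AS b
)
SELECT 13 / b, typeof(13 / b) FROM a
"""


def sqlite_division_rows():
    """Run the query against an in-memory SQLite database and return all rows."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for value, type_name in sqlite_division_rows():
        print(value, type_name)
```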
Another thing I learned while looking into this is that integer division in Python, R, etc. is "floor division", which produces different results than MySQL's DIV operator. For example:
>>> 13 // -5
-3
>>> 13 // 5
2
If we were to mimic this, we would use the floor function, which would look something like these:
|
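To make the difference concrete, here is a Python sketch of the two rounding conventions. `floor_div` and `trunc_div` are illustrative names, not PRQL functions. My understanding is that MySQL's DIV discards the fractional part, which corresponds to `trunc_div` here. Both helpers go through float division for brevity, so they are only reliable for operands small enough to be represented exactly as floats.

```python
import math


def floor_div(a, b):
    """Round the quotient toward negative infinity (what Python's // does)."""
    return math.floor(a / b)


def trunc_div(a, b):
    """Round the quotient toward zero (discard the fractional part)."""
    return math.trunc(a / b)


if __name__ == "__main__":
    for a, b in [(13, 5), (13, -5), (-13, 5), (-13, -5)]:
        print(a, b, floor_div(a, b), trunc_div(a, b))
```

The two functions agree whenever the operands have the same sign and differ by one when the signs differ and the division is inexact.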
//
//
@max-sixty @snth Are we sure we want this behavior?
Python agrees, but Rust says it's |
Closes #2426
Also makes sure that float division always produces floats. This is done by casting to float before division. This can be removed when match is implemented and we can check types and use only the appropriate operation.
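The idea of casting before dividing can be tried out against SQLite from Python. This is only a sketch of the technique, not the code from the pull request: `float_div_sql` and `run_scalar` are hypothetical helpers, and the SQL is built by plain string formatting, which is acceptable for fixed literals like these but not for untrusted input.

```python
import sqlite3


def float_div_sql(left, right):
    """Build a SQL expression that casts the left operand to REAL before dividing."""
    return f"CAST({left} AS REAL) / {right}"


def run_scalar(expr):
    """Evaluate a single SQL expression in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(f"SELECT {expr}").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_scalar("13 / 5"))                  # plain division of two integers
    print(run_scalar(float_div_sql("13", "5")))  # same operands, cast first
```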
Hmm, just looking at the last question and not having read the rest of the thread, I don't know whether rounding towards zero or taking the FLOOR is preferred. Here's what ChatGPT has to say about it:
The behavior of integer division, also known as truncating division or floored division, can vary depending on the programming language or mathematical context. In some programming languages, integer division rounds towards zero, while in others it takes the floor function of the result.
Rounding towards zero means that the quotient of the division is rounded to the nearest integer towards zero. For example:
To determine the behavior of integer division in a specific programming language, it is recommended to consult the language's documentation or specifications.
In relational databases that use SQL, the behavior of integer division is typically defined to take the floor function of the result. This means that the quotient is rounded down to the nearest integer. The SQL standard specifies the behavior of integer division using the floor function. For example, in the SQL language, the division operator ("/") performs integer division and returns the floor of the division result. Here are some examples:
It's worth noting that different database systems or SQL dialects may have slight variations in their behavior, so it's always a good practice to consult the specific documentation or reference material for the database system you are using to confirm the behavior of integer division in that context.
I don't know if we should take its word about the SQL Standard definition. I just tried to google for that and didn't come up with anything. I would try to think of some examples. The truncating / rounding to zero definition seems more intuitive because then apart from the sign difference, you get the same result whether you're dealing with positive or negative numbers. |
Floor division seems to be related to Floor modulo.
The Wikipedia page has a list of behaviors for each programming language. |
Looking at the modulo Wikipedia article, it would seem we want this property to hold:
|
It turns out that this holds iff |
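The property itself did not survive in the text above. Assuming it is the usual division identity, a == b * q + r with q the quotient and r the remainder, the Python sketch below checks it for the floored and truncated conventions, and for a mix of the two. All function names are made up for this illustration.

```python
import math


def floored_pair(a, b):
    """Quotient rounded toward negative infinity, remainder with the divisor's sign."""
    return a // b, a % b


def truncated_pair(a, b):
    """Quotient rounded toward zero, remainder with the dividend's sign."""
    return math.trunc(a / b), int(math.fmod(a, b))


def identity_holds(a, b, q, r):
    """Check the assumed division identity a == b * q + r."""
    return a == b * q + r


if __name__ == "__main__":
    for a, b in [(13, 5), (13, -5), (-13, 5), (-13, -5)]:
        fq, fr = floored_pair(a, b)
        tq, tr = truncated_pair(a, b)
        print(a, b, identity_holds(a, b, fq, fr), identity_holds(a, b, tq, tr),
              identity_holds(a, b, fq, tr))
```

Under this assumption, each convention satisfies the identity on its own, while pairing a floored quotient with a truncated remainder breaks it when the operands have opposite signs.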
What's up?
DuckDB seems to introduce a new division operator / and the current division operator / will be moved to // (duckdb/duckdb#7082).
I'm wondering if the same thing could be done with PRQL.
In my opinion, PRQL should not do the division itself as it currently does, but should convert // to / for targets that do not have two division operators.
Note that MySQL seems to require the DIV operator for integer division.