
Backport fixes to 2.0.x branch #3208

Closed

Conversation

weiznich
Member

This essentially prepares the final 2.0.0 release by:

  • Backporting all relevant fixes
  • Bumping the version numbers
  • Adding a Cargo.lock file, as this hopefully allows us to check out this branch at any later point in time and have it continue to build. This should not affect the release to crates.io in any way.

Heliozoa and others added 30 commits June 16, 2022 15:31
This commit fixes potential name collisions in code generated by the
`table!` macro by prefixing function argument names with
`__diesel_internal`. The underlying problem here is that Rust accepts
patterns as part of a function signature. Combined with the fact that
each zero-sized struct (so all column/table structs) is a type and a
value at the same time, this results in problems as soon as the user
tries to create a column/table with the same name as one of the used
arguments. So for example, if the user has a table that contains an
`alias` column, that would have caused a name collision inside the
following function signature:

```rust
fn map(column: C, alias: &$crate::query_source::Alias<S>) -> Self::Out
```

Rustc would have interpreted `alias` here as a zero-sized struct
pattern, which of course has a different type than the declared
parameter type. That results in a compiler error about mismatched
types.

This commit fixes the problem by prefixing the corresponding variables
with `__diesel_internal`, which is hopefully uncommon enough that no
one uses it as a column name prefix.
Co-authored-by: Christopher Fleetwood <[email protected]>
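A minimal, self-contained sketch of the collision (the struct and function below are made up for illustration; this is not Diesel's generated code):

```rust
// Illustrative only -- this is not Diesel's generated code. The lowercase
// unit struct mimics a user-defined `alias` column struct.
#[allow(non_camel_case_types, dead_code)]
struct alias;

// With `alias` in scope as a unit struct, a parameter named `alias` would be
// parsed as the unit-struct *pattern* `alias`, whose type is `alias` itself.
// A signature such as
//     fn map(column: &str, alias: &str) -> String
// therefore fails with "error[E0308]: mismatched types". Prefixing the
// parameter name side-steps the collision:
fn map(column: &str, __diesel_internal_alias: &str) -> String {
    format!("{column} AS {__diesel_internal_alias}")
}

fn main() {
    println!("{}", map("name", "user_name"));
}
```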
Allow `ORDER BY` clauses to be used in any combination subquery by
either putting parentheses around the subqueries (PostgreSQL/MySQL) or
adding a wrapping subquery (SQLite).
Our `R2D2Connection::is_broken` implementations were broken in such a
way that they marked any valid connection as broken as soon as it was
checked into the pool again. This resulted in the pool opening new
connections every time a connection was checked out of the pool, which
obviously removes any possibility of reusing the same connection. This
commit fixes that issue and adds some tests to ensure that we do not
break this again in the future.
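For illustration, a minimal sketch of the checkout/check-in cycle this affects, assuming the `r2d2` and `sqlite` features are enabled; with the broken `is_broken` implementations each loop iteration below would have ended up establishing a fresh connection, while the fixed version reuses the single pooled connection:

```rust
use diesel::r2d2::{ConnectionManager, Pool};
use diesel::sqlite::SqliteConnection;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let manager = ConnectionManager::<SqliteConnection>::new(":memory:");
    // A pool with a single connection makes reuse (or the lack of it) easy
    // to reason about.
    let pool = Pool::builder().max_size(1).build(manager)?;

    for _ in 0..3 {
        // Checked out here, checked back in when `_conn` is dropped at the
        // end of the loop body.
        let _conn = pool.get()?;
    }
    Ok(())
}
```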
This was triggered by the fact that the pool tries to open a new
connection after the old one was closed. As this happens in a
background thread, it runs concurrently with our test code. That means
that if establishing a new connection is fast, it will increase the
acquire_count before we check it. We observed this happening for
SQLite. I've opted to just not check the acquire count there, as it is
not really relevant at that point anyway.
This commit replaces some recursive macros with simpler implementations
by generating non-recursive where clauses that just forward to the impl
for the next smaller tuple.
According to my measurements this reduces the build time for a `cargo
check --no-default-features --features "sqlite 64-column-table"` build
from ~1min 20s to ~1min 05s.
Technically this seems to be a breaking change, but as we haven't
released 2.0 yet this is fine.
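A simplified sketch of the pattern described above (the trait and impls are illustrative only, not Diesel's actual code): the impl for a larger tuple no longer expands recursively in the macro; its where clause simply requires the impl for the next smaller tuple and forwards to it:

```rust
// A hypothetical trait standing in for the tuple impls generated by the
// macro; not Diesel's actual trait.
trait Columns {
    fn count() -> usize;
}

impl<A> Columns for (A,) {
    fn count() -> usize {
        1
    }
}

// Each larger tuple impl just requires the impl for the next smaller tuple
// in its where clause and forwards to it, instead of recursing inside the
// macro itself.
impl<A, B> Columns for (A, B)
where
    (A,): Columns,
{
    fn count() -> usize {
        <(A,) as Columns>::count() + 1
    }
}

impl<A, B, C> Columns for (A, B, C)
where
    (A, B): Columns,
{
    fn count() -> usize {
        <(A, B) as Columns>::count() + 1
    }
}

fn main() {
    // The 3-tuple impl delegates to the 2-tuple impl, which delegates to
    // the 1-tuple impl.
    assert_eq!(<(i32, String, bool) as Columns>::count(), 3);
}
```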
This is kind of required as otherwise custom connection implementations
cannot provide their own transaction manager implementation.
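A hedged, Diesel-independent sketch of the shape of such a design; the trait definitions below are illustrative only and not Diesel's actual ones:

```rust
// Illustrative only: a connection trait that exposes its transaction
// manager as an associated type, so a custom connection can plug in its
// own implementation.
trait TransactionManager<Conn> {
    fn begin_transaction(conn: &mut Conn);
}

trait Connection: Sized {
    type TransactionManager: TransactionManager<Self>;
}

// A custom connection choosing its own manager.
struct MyConnection;
struct MyTransactionManager;

impl TransactionManager<MyConnection> for MyTransactionManager {
    fn begin_transaction(_conn: &mut MyConnection) {
        // A real implementation would issue `BEGIN` on the underlying handle.
    }
}

impl Connection for MyConnection {
    type TransactionManager = MyTransactionManager;
}
```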
This is a tricky one. It seems like the behaviour described in that
issue should work out of the box, but it doesn't. I've spent some time
investigating various solutions to make this work, but I came to the
conclusion that the current behaviour is the correct one.

The underlying issue here is that such a query promotes the inner
`Nullable<_>` of the field onto the outer `Queryable` wrapper. Without
`Selectable` that would require a select clause like
`.select((table::column.assume_not_null(),).nullable())`. This is
technically a safe pattern, but it requires using the "advanced"
`assume_not_null()` method to forcibly convert types to their not-null
representation.
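For illustration, a hedged sketch of that select clause against a hypothetical `posts` table with a nullable `title` column (assumes the `postgres` feature; not taken from the issue itself):

```rust
use diesel::prelude::*;

diesel::table! {
    posts (id) {
        id -> Integer,
        title -> Nullable<Text>,
    }
}

fn load_titles(conn: &mut PgConnection) -> QueryResult<Vec<Option<(String,)>>> {
    // `assume_not_null()` strips the inner `Nullable<Text>` down to `Text`,
    // and `.nullable()` then marks the whole single-element tuple as
    // nullable, so the result maps to `Option<(String,)>`.
    posts::table
        .select((posts::title.assume_not_null(),).nullable())
        .load(conn)
}
```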

Possible solutions I tried in order to enable the constructs shown in
that issue:

* I've tried to make the `impl Selectable for Option<T>` return the
following `SelectExpression`:
`dsl::Nullable<dsl::AssumeNotNull<T::SelectExpression>>`
where `AssumeNotNull` converts each tuple element to the corresponding
not nullable expression, while `Nullable` wraps the tuple itself into a
nullable type wrapper.
* I've tried to apply a similar approach to the one above, but only for
derived impls, by manipulating the generated code for an optional
field with `#[diesel(embed)]`.

Both solutions require changes to our SQL type system, for example
allowing a non-nullable value to be loaded into an `Option<T>`, so that
these constructs can be used in a more general scope than the presented
example case (see the added test cases for this). That by itself would
be fine in my opinion, as it is always a safe operation. Unfortunately
the `AssumeNotNull` transformation would be applied recursively to all
sub-tuples, which in turn would cause trouble with nested joins (again,
see the examples). We would be able to work around this issue by
allowing the `FromSql<ST, DB> for Option<T>` impl for non-nullable
types to catch null values, but that really feels like a bad hack. (You
would like to get an error there instead, but the nested nullability
information is gone.)
All in all this led me to the conclusion that the current behaviour is
correct.

This PR adds a few additional tests (an adjusted version of the test
from the bug report plus more tests around nested joins) and moves
around some code bits that I noticed while working on this.

I think the official answer to the bug report would be: either also
wrap the inner type in an `Option`, or provide a manual `Selectable`
impl that does the "right" thing there.
This is to prevent things like diesel-rs#3182 from slipping through again.
Unfortunately we cannot add a test for JSON, as the behaviour there
diverges between MariaDB and MySQL.
Step-by-step:
1. `cargo +nightly doc --manifest-path diesel/Cargo.toml`

Expected result:
No warnings

Actual result:
Three warnings
@weiznich weiznich requested a review from a team June 16, 2022 13:50
@weiznich
Member Author

@diesel-rs/core I propose to do a final stable diesel 2.0 release next week. Please leave a comment here if you have any objections. I will file a PR against the web repository next week so that we also update the webpage to diesel 2.0 and include the migration guide, the changelog, and a small release post there.

@Ten0
Member

Ten0 commented Jun 20, 2022

I propose to do a final stable diesel 2.0 release next week. Please leave a comment here if you have any objections.

The two things I have in mind that slightly scare me, because solving them later could affect backwards compatibility, are the following two comments:

  1. Add DatabaseErrorKind variants for isolation failures #1612 (comment)

  2. Return a iterator from Connection::load #2799 (review)
    I'm not super confident with the current approach of Return a iterator from Connection::load #2799 (comment) because that introduces:

    a. Unnecessary lifetime bounds on the deserialization part
    b. A somewhat-global config at the connection level, whereas whether you want to use one or the other would more typically depend on the amount of data you're querying and what you're doing with it afterwards (unless row-by-row is always ~faster) - which appears to be fundamentally more fine-grained than connection-level.

    If however row-by-row is always ~faster, having just that version is always ideal and consistent across all backends so we could probably keep just that.

Unfortunately I have been very busy with other topics and couldn't work on these before.

  • I've opened draft More straightforward transaction error handling #3211 to experiment on 1, but I'll need a tiny bit of help there.
  • Would you agree that it would be useful to have the benchmark suggested in 2's comment, and that the decision should depend on it?
  • If so, what do you think would be a reasonable timing to attempt implementation?

@weiznich
Member Author

Return a iterator from Connection::load #2799 (review)
I'm not super confident with the current approach of

Return a iterator from Connection::load #2799 (comment) because that introduces:

a. Unnecessary lifetime bounds on the deserialization part
b. A somewhat-global config at the connection level, whereas whether you want to use one or the other would more typically depend on the amount of data you're querying and what you're doing with it afterwards (unless row-by-row is always ~faster) - which appears to be fundamentally more fine-grained than connection-level.

If however row-by-row is always ~faster, having just that version is always ideal and consistent across all backends so we could probably keep just that.

I remember that I did some experimentation back then, but I cannot find the corresponding code anymore. I will try to have another look in the next few days.
What I remember is that the row-by-row mode was significantly slower for all of our benchmarks. I think the main issue there was that it allocates things per row again and again, whereas the "normal" mode reuses buffers and other things.

To address the concrete points:

a. Unnecessary lifetime bounds on the deserialization part

Yes, the lifetime for Cursor is not optimal right now, but Cursor is not part of the public API. That means we can remove the lifetime later if it turns out that we do not need it; that wouldn't work the other way around.

b. A somewhat-global config at the connection level, whereas whether you want to use one or the other would more typically depend on the amount of data you're querying and what you're doing with it afterwards (unless row-by-row is always ~faster) - which appears to be fundamentally more fine-grained than connection-level.

The point is that there is not really any other place to configure this than the connection, because as soon as you call load_iter() or whatever, the query is already executed. What we could do is provide the connection-level configuration with some documentation that basically says: do not use this if you don't call Connection::load directly. Then we could modify RunQueryDsl to do "runtime type checking" to see if the connection is a PgConnection and just enable that mode there transparently. That would be hacky, but it would result in the behaviour most people would expect.
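A hedged sketch of the kind of "runtime type checking" meant here (the helper is hypothetical and not part of Diesel; assumes the `postgres` feature):

```rust
use std::any::TypeId;

use diesel::pg::PgConnection;

// Hypothetical helper (not part of Diesel): decide whether to transparently
// enable the row-by-row mode based on the concrete connection type.
fn use_row_by_row<Conn: 'static>(_conn: &Conn) -> bool {
    TypeId::of::<Conn>() == TypeId::of::<PgConnection>()
}
```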

@Ten0
Member

Ten0 commented Jun 21, 2022

The point is that there is not really any other place to configure this than the connection, because as soon as you call load_iter() or whatever, the query is already executed

I was thinking about the alternate design described in that comment: #2799 (review), which looks like it would solve both of these issues.

@weiznich
Member Author

I've put up a WIP PR for row-by-row mode in #3213 so that we can quickly check the performance implications of enabling it generally.
That written: I would really prefer not to have backend-specific methods there, but instead a way to enable it only for specific queries, maybe even by wrapping the corresponding query into an "option" struct that signals it should be loaded via row-by-row mode (see the sketch below). The WIP PR contains some parts that enable the row-by-row mode only for load_iter().
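A hedged sketch of that "option" struct idea; the names and the extension trait below are hypothetical, not Diesel's API:

```rust
// Hypothetical marker wrapper (not Diesel's API): wrapping a query in
// `RowByRow` signals that it should be loaded via the row-by-row mode.
pub struct RowByRow<Q>(pub Q);

pub trait RowByRowDsl: Sized {
    fn row_by_row(self) -> RowByRow<Self> {
        RowByRow(self)
    }
}

impl<Q> RowByRowDsl for Q {}

// Usage would then look roughly like:
// users::table.filter(users::id.gt(42)).row_by_row().load_iter(conn)
```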

@weiznich
Member Author

Closed in favor of #3237

@weiznich weiznich closed this Jul 19, 2022