Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

[Proposal?] UpdateOrCreate -> Performance #1090

Open
sr2ds opened this issue Mar 29, 2018 · 9 comments
Open

[Proposal?] UpdateOrCreate -> Performance #1090

sr2ds opened this issue Mar 29, 2018 · 9 comments

Comments

@sr2ds
Copy link

sr2ds commented Mar 29, 2018

I have thought about the performance of UpdateOrCreate and from what I could understand, this work seems to be more efficient if performed by the database itself.

public function updateOrInsert(array $attributes, array $values = [])
    {
        if (! $this->where($attributes)->exists()) {
            return $this->insert(array_merge($attributes, $values));
        }
        return (bool) $this->take(1)->update($values);
    }

From what I've seen, our Database\Query\Builder class performs this work by itself, that is, treating the process with two queries and validation internally.

I do not know exactly how much this can be worth considering the flexibility we need to have in the framework, but I'd like to propose to you a REPLACE and DUPLICATE KEY UPDATE reading in MYSQL queries.

https://dev.mysql.com/doc/refman/5.7/en/replace.html

https://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html

My analysis is based on the premise of working with a data volume with UpdateOrCreate routines for about 50,000 lines, a task repeated many times a day. In current form, I believe I'm running twice as many operations at the application layer because of the need for verification.

We have as reference a native functionality of MongoDB, to improve the compression, see:

https://stackoverflow.com/questions/21342747/how-to-update-if-exists-otherwise-insert-new-document

@sisve
Copy link

sisve commented Mar 29, 2018

Mysql's "INSERT ... ON DUPLICATE KEY UPDATE" (or UPSERT for short) requires a unique index or a primary key, and there's some quirks in what happens when you have multiple unique indexes. The existing Laravel solution has no dependency on any indexes, and you can check for any column values you want. This means that we cannot replicate the existing Laravel functionality with an upsert without making some large assumptions about the database structure.

@sr2ds
Copy link
Author

sr2ds commented Mar 29, 2018

I understood perfectly and what we have really is sensational.
On a large scale, do you consider it more performative if the confessions are made by the bank itself?
Sample: https://stackoverflow.com/a/1488385/6394559

Edit: https://stackoverflow.com/a/21889246/6394559

@mfn
Copy link

mfn commented Mar 3, 2019

The issue title talks about performance - but it's not just that, it's about integrity; especially with concurrency.

Not sure what I've in mind matches what the idea here is, but proper upsert support would be very handy sometimes.

I can't talk about MySQL, but Postgres has ON CONFLICT support but it also has the same requirement as @sisve pointed out: it needs a unique index and in general, Laravels model have no clue about that. Also every technical detail I refer to is Postgres-centric and I would value feedback what does/does not work for other databases.

Given the cases I encountered, I would imagine an interface like this:

  • $model->upsert(array $columnsInConflict, array $columnsToUpdate = [])
  • $columnsInConflict is required and correlates to the column names in the unique index
  • $columnsToUpdate optional list of explicit names of attributes to update on a conflict; will otherwise use all existing attributes
  • part of the functionality is to exploit the concept of Postgres … RETURNING …, i.e. in a single statement the fields are returned in its current form from the database, so the model accurately reflected its state

Problems I see:

  • I don't like the syntax in this form, but it simply matches the technical requirement for a most simple Postgres ON CONFLICT
  • I'm not sure the proposal of $columnsToUpdate is good, but I know from experience that you may not always want to write to all columns. I decided against a key=>value array here, because I believe the values should come from the model: otherwise you could propose to update columns to values not reflected in the model at all at first => feels strange? But may be useful? OTOH the idea what be, due to RETURNING, that the model would receive all fields
  • it does not support ON CONFLICT DO NOTHING: otherwise you can't use … RETURNING … but then approach would be integral to update the model correctly. This isn't very good for Postgres and it's current MMVC implementation. There are workarounds for it, but they would require additional SQL finesse == added complexity
  • it does not support ON CONFLICT UPDATE SET … WHERE… (but why would/should it?)

@mfn
Copy link

mfn commented Mar 17, 2019

Further:

  • SQLite supports UPSERT inspired by Postgres
  • I've never used MSSQL and Google gives me mixed result, the best seems to be to use the MERGE strategy but I've idea if that is a good approach
  • ->upsert() would only work on a model if it has not yet been persisted => i.e. it has no "key"

Then, how do get the last "inserted or updated (!)" id back?

  • postgres explicitly supports it via RETURNING id
    (but as mentioned this requires that the UPDATE performs something; if it doesn't, it can't return anything)
  • No idea about sqlite, mysql, mssql: is it just like calling "get last inserted id" and would it work for the UPDATE part of the upserts?

@staudenmeir
Copy link

@mfn Yes, MERGE is the equivalent on SQL Server.

I've worked on an implementation for all four databases last year. I'll dig it up so we can discuss and test it.

Related discussion: #1325

@staudenmeir
Copy link

This is my draft for the query builder implementation: staudenmeir/framework@8219335

Possible values for $update:

  • null: Update all columns from $values with $values.
  • Numeric array: Update the given columns with $values.
  • Associative array: Update the given columns with the given values or expressions.

$target contains the column(s) that should be checked for duplicates (and have a UNIQUE index). Since this parameter "only" applies to non-MySQL databases, I put it last.

What should the method return? Just a Boolean value or the number of affected rows? Is that useful?

Example:

Schema::create('stats', function ($table) {
    $table->unsignedInteger('post_id');
    $table->date('date');
    $table->unsignedInteger('views');
    $table->timestamps();
    $table->unique(['post_id', 'date']);
});

DB::table('stats')->upsert(
    [
        ['post_id' => 1, 'date' => now(), 'views' => 1, 'created_at' => now(), 'updated_at' => now()],
        ['post_id' => 2, 'date' => now(), 'views' => 1, 'created_at' => now(), 'updated_at' => now()],
    ],
    ['views' => DB::raw('stats.views + 1'), 'updated_at'],
    ['post_id', 'date']
);

sleep(1);

DB::table('stats')->upsert(
    [
        ['post_id' => 2, 'date' => now(), 'views' => 1, 'created_at' => now(), 'updated_at' => now()],
    ],
    ['views' => DB::raw('stats.views + 1'), 'updated_at'],
    ['post_id', 'date']
);

@staudenmeir
Copy link

staudenmeir commented Mar 24, 2019

What would the Eloquent implementation look like? Unlike updateOrCreate(), upsert() can process multiple datasets at once. But as @mfn noted, then we can't properly get the inserted IDs.

Are there actually real-life cases where you would use upsert() on a table with auto-incrementing IDs?

@mfn
Copy link

mfn commented Mar 27, 2019

Unfortunately I won't be able to take a closer look for the next weeks due to work+vacation, sorry!

@staudenmeir
Copy link

After laravel/framework#28420, I've decided to release the code as a package: https://github.com/staudenmeir/laravel-upsert

There is no support for $model->upsert() yet (as suggested by @mfn), as this is quite complicated to implement across all databases. But there is Model::upsert(), which handles timestamps automatically.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants