Support for INSERTing multiple records? #294
Comments
This seems to be a popular concern right now! I asked this question on Discord, and the answer for now is that this is not possible unless you generate the query yourself (and lose a lot of the compile-time-safety features as a result). There does seem to be some activity in the area, though:
|
@connec I missed those related topics! Thanks for pointing them out. Yes, for me, this is the biggest need for interacting with SQL -- I can write SQL myself and make sure it is safe, and binding a few arguments isn't hard, but it's a pain to bind an arbitrary number of them. |
I ran into a similar issue. I think I have a solution (for now), but it is somewhat hideous:
The "trick" is that you can pass a `Vec` of items in as a Postgres ARRAY type, and UNNEST that array to turn it into a set of rows. So, above, I basically split the thing I want to insert into a vec for each column of values (fortunately only 3 for me), each of which gets turned into a temporary table with row numbers. Once I have these, I can join them all on the row numbers to produce a final table of values, which I then pass to the INSERT. It's pretty hideous, and I'd love to find a more concise or better approach, but it's the best I have so far! I need to actually test it properly next, to make sure it does what I am hoping it does. |
As an aside, I also tried doing a simpler approach of passing a single array of all of the values I wanted to update at once, but I hit a wall where SQLx did not understand the type of the array I was trying to provide, so that's what led me to splitting the array up and recombining it in SQL instead. |
Just a quick FYI that, at least from a postgres POV, this works as I'd like. Here's a brief example:
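A sketch of the shape described in the previous comment, at the SQL level (illustrative table/column names, not the original snippet; `WITH ORDINALITY` supplies the row numbers to join on):

```sql
-- Sketch: turn each bound array into rows with row numbers, join the pieces
-- back together on the row number, then INSERT the result.
INSERT INTO foo (description, done)
SELECT a.description, b.done
FROM UNNEST($1::text[]) WITH ORDINALITY AS a(description, rn)
JOIN UNNEST($2::boolean[]) WITH ORDINALITY AS b(done, rn) USING (rn);
```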
|
Hi, is there a way to do this? I need to make 25 inserts at once. |
For Postgres, the below works, and is easy. Maybe this is good enough and nothing else is needed to support INSERTing multiple records, at least for Postgres.
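A minimal sketch of what such a query can look like, assuming a table `foo(description text, done boolean)` (the names are illustrative):

```rust
// Minimal sketch: multi-argument UNNEST zips the bound arrays into rows,
// one row per array element.
let descriptions: Vec<String> = vec!["abc".into(), "xyz".into()];
let done_flags: Vec<bool> = vec![true, false];

sqlx::query(
    r#"INSERT INTO foo (description, done)
       SELECT * FROM UNNEST($1::text[], $2::boolean[])"#,
)
.bind(&descriptions)
.bind(&done_flags)
.execute(&pool)
.await?;
```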
|
@greglearns can i use this with |
@shiftrtech I haven't tried it, so I don't know. Let us know what you find out! |
Slightly easier (IMHO cleaner) SQL:
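One plausible reading of "cleaner" (a sketch, not necessarily the original query): name the `UNNEST` columns explicitly instead of relying on `SELECT *`:

```sql
INSERT INTO foo (description, done)
SELECT t.description, t.done
FROM UNNEST($1::text[], $2::boolean[]) AS t(description, done);
```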
|
Does handling this in the application have any disadvantages? |
@itsfarseen Yes, doing it in a loop means a lot more back-and-forth between application and DB. |
Until this is fully supported, you can send an entire JSON object/list of records in one INSERT command, and then in the INSERT query break the JSON into rows (in Postgres, it is something like "jsonb_to_rows"). This is less efficient than being able to do it directly (using binary communication instead of JSON), but much, much better than doing single inserts.
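The Postgres function is presumably `jsonb_to_recordset`; a sketch of this approach with sqlx (requires the `json` feature; table and column names are illustrative):

```rust
use serde_json::json;

// Sketch: bind one JSONB array and expand it into rows server-side.
let records = json!([
    {"description": "abc", "done": true},
    {"description": "xyz", "done": false},
]);

sqlx::query(
    r#"INSERT INTO foo (description, done)
       SELECT t.description, t.done
       FROM jsonb_to_recordset($1::jsonb) AS t(description text, done boolean)"#,
)
.bind(&records)
.execute(&pool)
.await?;
```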
|
Is doing multiple single inserts really that much less efficient? |
If they're on different servers that are "far apart" (in different datacenters), then the JSON solution is most likely faster, maybe substantially faster. If they're on the same machine, then you're probably right and the JSON solution is slower, but not slow enough that you would notice outside of microbenchmarks. If it really matters, benchmark it for your specific situation. |
I plan on improving the […]:

```rust
let lala = vec![("abc", true), ("xyz", false)];

let todo = sqlx::query(
    r#"INSERT INTO foo (description, done)
SELECT * FROM UNNEST($1, $2)
RETURNING id, description, done"#,
)
.bind(&PgArray(lala.iter().map(|it| it.0)))
.bind(&PgArray(lala.iter().map(|it| it.1)));
```

And yes, @shiftrtech you can rewrite this with `query_as`:

```rust
#[derive(sqlx::FromRow)]
struct Todo {
    id: i32,
    description: String,
    done: bool,
}

let lala = vec![("abc", true), ("xyz", false)];

let mut v1: Vec<String> = Vec::new();
let mut v2: Vec<bool> = Vec::new();
lala.into_iter().for_each(|todo| {
    v1.push(todo.0.into());
    v2.push(todo.1);
});

let todo: Todo = sqlx::query_as(
    r#"INSERT INTO foo (description, done)
SELECT * FROM UNNEST($1, $2)
RETURNING id, description, done"#,
)
// .bind(&PgArray(lala.iter().map(|it| it.0)))
// .bind(&PgArray(lala.iter().map(|it| it.1)))
.bind(&v1)
.bind(&v2)
.fetch_one(&mut tx)
.await
.map_err(|e| dbg!(e))?;
```

For convenience, we might just add a `bind_array` method:

```rust
let todo: Todo = sqlx::query_as(
    r#"INSERT INTO foo (description, done)
SELECT * FROM UNNEST($1, $2)
RETURNING id, description, done"#,
)
.bind_array(lala.iter().map(|it| it.0))
.bind_array(lala.iter().map(|it| it.1))
.fetch_one(&mut tx)
.await
.map_err(|e| dbg!(e))?;
```
|
Just realized though that […] |
I also need to write multiple records to the database. If I understand it correctly, there is no support for multi-inserts in sqlx yet. I have tried the following and the performance is very poor. The execution takes over 2s for 100 records. Is there a prepared statement created here and will it be reused the next time the function is called? Is the statement parsed by sqlx or is it sent directly to the database? There is an index over all 7 columns in exactly this order. Could this cause performance problems? Does anyone have experience with massive inserts into a postgres database?
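Presumably "the following" was something along these lines (a hypothetical reconstruction, trimmed to three of the seven columns; see the reply below about cloned `Vec`s):

```rust
// Hypothetical reconstruction of the slow version: every column value is
// cloned into an owned Vec before binding, adding allocation overhead on
// top of the query itself.
let mut paths: Vec<String> = Vec::new();
let mut viewnames: Vec<String> = Vec::new();
let mut fieldnames: Vec<String> = Vec::new();
for row in rows.iter() {
    paths.push(row.path.clone());
    viewnames.push(row.viewname.clone());
    fieldnames.push(row.fieldname.clone());
}
sqlx::query(
    "INSERT INTO resources_index (path, viewname, fieldname)
     SELECT * FROM UNNEST($1, $2, $3)",
)
.bind(&paths)
.bind(&viewnames)
.bind(&fieldnames)
.execute(&mut *tx)
.await?;
```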
|
@markose this could be because of creating 7 different vecs and lots of cloning. It might get optimized away in release builds. Not sure tho. Could you check in a release build? |
@itsfarseen Thanks for the advice. The "problem" occurred in the release build. It is also not the Rust logic, but the statement itself, that takes so much time. I know because sqlx issues a [WARN] message. I have now rebuilt it so that the rows are inserted as a multi-insert (INSERT INTO ... VALUES ...). This has improved things a bit. My main question is whether this statement, as it stands, is cached in the database as a prepared statement, or is it re-parsed on each call? |
I'm talking about this warning, which appears very often. It inserts 100 rows with small values (overall ~200 bytes per row). The test is run on Google GCE (16 cores / 64 GB RAM) with a local scratch disk. [screenshots of the warning and disk IOPS graph] |
@markose yes, the query is prepared and cached if you use the […] functions. I have some suggestions regarding your `insert_index_rows` function:

```rust
pub async fn insert_index_rows(
    tx: &mut Transaction<'_, Postgres>,
    rows: &Vec<&IndexRecord>,
) -> Result<(), sqlx::Error> {
    log::debug!("insert_index_rows: {}", rows.len());
    let mut v1: Vec<&str> = Vec::with_capacity(rows.len());
    let mut v2: Vec<&str> = Vec::with_capacity(rows.len());
    let mut v3: Vec<&str> = Vec::with_capacity(rows.len());
    let mut v4: Vec<&[u8]> = Vec::with_capacity(rows.len());
    let mut v5: Vec<&str> = Vec::with_capacity(rows.len());
    let mut v6: Vec<String> = Vec::with_capacity(rows.len());
    let mut v7: Vec<&str> = Vec::with_capacity(rows.len());
    rows.into_iter().for_each(|row| {
        v1.push(&row.path);
        v2.push(&row.viewname);
        v3.push(&row.fieldname);
        v4.push(&row.fieldvalue);
        v5.push(&row.name);
        v6.push(row.viewentry.read().unwrap().id.clone());
        v7.push(row.get_hash());
    });
    sqlx::query("INSERT INTO resources_index (path, viewname, fieldname, fieldvalue, name, viewentry_id, id) SELECT * FROM UNNEST ($1,$2,$3,$4,$5,$6,$7)")
        .bind(v1)
        .bind(v2)
        .bind(v3)
        .bind(v4)
        .bind(v5)
        .bind(v6)
        .bind(v7)
        .execute(&mut *tx)
        .await?;
    Ok(())
}
```

For […]. In an async application I would recommend instead using […]. You might also consider modeling your data differently, maybe just storing that […]. |
@abonander Thanks a lot for your help! I will test this. Also, I will change the implementation to […]. |
I've removed the hash (MD5 over all fields), which was used as the ID for the row, to speed up inserts.
Unfortunately, this does not seem to be possible in the context of a transaction.
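Based on the reply below, presumably the attempt was to run several inserts concurrently on the same transaction, roughly like this (a hypothetical sketch; it does not compile):

```rust
// Hypothetical sketch of the failing pattern: each query needs `&mut tx`,
// and Rust forbids more than one live mutable borrow, so this won't compile.
let mut futures = Vec::new();
for chunk in row_chunks {
    futures.push(
        sqlx::query("INSERT INTO resources_index (path) SELECT * FROM UNNEST($1)")
            .bind(chunk)
            .execute(&mut *tx), // error[E0499]: cannot borrow `*tx` as mutable more than once
    );
}
futures::future::join_all(futures).await;
```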
Do you have an idea for this? |
@markose you are breaking one of the fundamental rules of Rust: a single mutable reference within a scope. To get around this (though I'm not sure if you'd really want to, or of the implications in terms of a transaction), you would need to use a shared interior mutability pattern with something like a `Mutex`:

```rust
use std::rc::Rc;
use std::sync::Mutex;
use sqlx::{query, Postgres, Transaction};

async fn run_query(tx: Rc<Mutex<Transaction<'_, Postgres>>>) {
    let mut tx = tx.lock().unwrap();
    query("SELECT * FROM my_table").execute(&mut *tx).await.unwrap();
}

async fn looper(tx: &Rc<Mutex<Transaction<'_, Postgres>>>) {
    let mut futures = vec![];
    for _ in 0..100 {
        futures.push(run_query(Rc::clone(tx)))
    }
    futures::future::join_all(futures).await;
}
```
|
@pauldorehill I have also thought about an exclusive lock. Thank you. I will try this. |
You don't need a `Mutex` for this. Normally, if your table has a unique column, you can batch the deletes with `= ANY($1)`:

```rust
sqlx::query("DELETE FROM resources_index WHERE path = ANY($1)")
    .bind(&paths_to_delete)
    .execute(&mut *tx)
    .await?;
```

Or even if none of the columns are technically unique, one of them usually has a common value when you're deleting a chunk of rows. |
I keep running into this; could I take a stab at implementing it? I think the […] |
My ideal would be a type that implements […]. The current […] |
I think streaming rows already captures any use cases I can think of for […]. |
I had trouble inserting array fields here, because […]. Here is the slightly simplified example:

```sql
INSERT INTO entries(
    feed,
    source_url,
    categories,
    source_body_html)
SELECT
    ($1::INT8[])[i],
    NULLIF(($2::TEXT[])[i], ''),
    COALESCE(
        (SELECT ARRAY_AGG(e) FROM jsonb_array_elements_text(($6::JSONB[])[i]) AS e),
        ARRAY[]::TEXT[]),
    ($8::TEXT[])[i]
FROM generate_subscripts($1::INT8[], 1) AS i
RETURNING entries.id, entries.source_url
```

I've only really tested with <50 rows, but there is no obvious slowdown occurring here; it seems to be fairly fast. That being said, this is an incredibly bad API: […]

I would love to see a good interface here. Ideally I could just insert an […]. |
Just noting that you can now use `QueryBuilder` for this. |
@arturs706 please ask on StackOverflow, Discord (see README) or https://reddit.com/r/learnrust instead of hijacking random issues to ask unrelated questions. |
@Iron-E's answer helped, and I made it work with a query builder:

```rust
use sqlx::{Pool, Postgres, QueryBuilder};

struct Category(String, Option<i32>);

async fn insert_categories(categories: Vec<Category>, pool: &Pool<Postgres>) {
    let mut query_builder = QueryBuilder::new("INSERT INTO product_categories (name, parent_id) ");
    query_builder.push_values(categories, |mut b, new_category| {
        b.push_bind(new_category.0).push_bind(new_category.1);
    });
    let query = query_builder.build();
    // The future must be awaited for the insert to actually run.
    query.execute(pool).await.unwrap();
}
```
|
In case anyone finds it useful, the query builder can also be used with upserts (`ON DUPLICATE KEY UPDATE`):

```rust
use std::collections::BTreeMap;

use futures::TryStreamExt;
use sqlx::{MySql, QueryBuilder, Row};

let mut qb: QueryBuilder<MySql> = QueryBuilder::new(
    "INSERT INTO event
    (key, success, description) ",
);
qb.push_values(EXPECTED_RECORDS.iter(), |mut b, rec| {
    b.push_bind(rec.key)
        .push_bind(rec.success)
        .push_bind(rec.description);
});
qb.push(
    "ON DUPLICATE KEY UPDATE
    success=VALUES(success),
    description=VALUES(description)
    RETURNING id",
);

let mut rows = qb.build().fetch(pool);
let mut ret = BTreeMap::new();
let mut count: usize = 0;
while let Some(row) = rows.try_next().await.unwrap() {
    // FIXME: it would be best to use `try_get("id")`, but this is currently broken
    // <https://github.com/launchbadge/sqlx/issues/1530>
    let id: u32 = row.try_get(0).unwrap();
    ret.insert(EXPECTED_RECORDS[count].ty, id);
    count += 1;
}

// sanity check that all rows were covered
assert_eq!(count, EXPECTED_RECORDS.len());
```

I think this is pretty clean, but maybe the checked macros could get some syntax to handle this too. Perhaps:

```rust
let rows = sqlx::query!(
    "INSERT INTO event
    (key, success, description)
    VALUES
    -- This gets expanded to `(?, ?, ?)` for each item in iter
    repeat! { (key, success, description) }
    ON DUPLICATE KEY UPDATE
    success=VALUES(success),
    description=VALUES(description)
    RETURNING id",
    EXPECTED_RECORDS.iter()
)
.fetch_all(pool)
.await
.unwrap();

for row in rows {
    println!("id: {}", row.id);
}
```
|
For folks looking for a solution for SQLite (which doesn't have `UNNEST`): […]
Here's the result:
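A sketch of one way to do this in SQLite, using `json_each` to expand a bound JSON array (illustrative names; not necessarily what was originally posted):

```rust
use serde_json::json;

// Sketch: SQLite can expand a JSON text parameter into rows with json_each,
// playing a similar role to Postgres's UNNEST.
let items = json!([
    {"description": "abc", "done": true},
    {"description": "xyz", "done": false},
]);

sqlx::query(
    r#"INSERT INTO todos (description, done)
       SELECT json_extract(value, '$.description'),
              json_extract(value, '$.done')
       FROM json_each(?)"#,
)
.bind(items.to_string()) // bind the JSON as text; json_each parses it
.execute(&pool)
.await?;
```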
|
How do you do the equivalent of […]?
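Presumably the question refers to the classic multi-row `VALUES` form (illustrative table/column names):

```sql
INSERT INTO todos (description, done)
VALUES ('abc', true),
       ('xyz', false),
       ('pqr', true);
```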
By the way, this project looks great!