InsertQueryBuilder very slow when inserting bulk data on a 'raw' table #3931

Closed
sarfata opened this issue Apr 2, 2019 · 2 comments · Fixed by #3935

sarfata (Contributor) commented Apr 2, 2019

Issue type:

[ ] question
[X] bug report
[ ] feature request
[ ] documentation issue

Database system/driver:

[ ] cordova
[ ] mongodb
[ ] mssql
[ ] mysql / mariadb
[ ] oracle
[X] postgres
[ ] cockroachdb
[ ] sqlite
[ ] sqljs
[ ] react-native
[ ] expo

TypeORM version:

[X] latest
[X] @next
[ ] 0.x.x (or put your version here)

Context (aka: why I am using raw queries)

One of the tables in my project is a long list of timestamped events queried with timescaledb. There is no ID, just a timestamp and some values. I have to manage it with raw queries (typeorm does not support entities without IDs), so I create the table, insert, and select manually.

Steps to reproduce or a small repository showing the problem:

I noticed that raw inserts using QueryBuilder are much slower than regular inserts.

This can be exhibited by adding the following test:

test/benchmark/bulk-save-querybuilder/bulk-save-querybuilder.ts

import "reflect-metadata";
import {Connection} from "../../../src/connection/Connection";
import {closeTestingConnections, createTestingConnections, reloadTestingDatabases} from "../../utils/test-utils";
import {Document} from "../bulk-save-case2/entity/Document";

describe("benchmark > bulk-save > case-querybuilder", () => {

    let connections: Connection[];
    before(async () => connections = await createTestingConnections({ __dirname, enabledDrivers: ["postgres"] }));
    beforeEach(() => reloadTestingDatabases(connections));
    after(() => closeTestingConnections(connections));

    it("testing bulk save of 10000 objects", () => Promise.all(connections.map(async connection => {

        const documents: Document[] = [];
        for (let i = 0; i < 10000; i++) {
            const document = new Document();

            document.id = i.toString();
            document.docId = "label/" + i;
            document.context = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent vel faucibus nunc. Etiam volutpat vel urna in scelerisque. Cras a erat ipsum. ";
            document.label = "label/" + i;
            document.date = new Date();

            documents.push(document);
        }


        await connection.createQueryRunner().query(`CREATE TABLE "document" ("id" text NOT NULL, "docId" text NOT NULL, "label" text NOT NULL, "context" text NOT NULL, "date" TIMESTAMP WITH TIME ZONE NOT NULL, CONSTRAINT "PK_e57d3357f83f3cdc0acffc3d777" PRIMARY KEY ("id"))`);
        await connection.manager.createQueryBuilder()
            .insert()
            .into("document", [
              "id",
              "docId",
              "label",
              "context",
              "date",
            ])
            .values(documents)
            .execute();
    })));

});

The two other benchmarks run in under a second against a local Postgres database, but this one takes about three minutes on my MacBook Pro (I had to increase the test timeout):

$ rimraf ./build && tsc && mocha --file ./build/compiled/test/utils/test-setup.js --bail --recursive --timeout 600000 ./build/compiled/test/benchmark


  benchmark > bulk-save > case1
    ✓ testing bulk save of 10.000 objects (628ms)

  benchmark > bulk-save > case2
    ✓ testing bulk save of 10000 objects (1096ms)

  benchmark > bulk-save > case-querybuilder
    ✓ testing bulk save of 10000 objects (186658ms)


  3 passing (3m)

✨  Done in 222.89s.

I have done some investigating, and most of the time seems to be spent in the else branch of createValuesExpression(). I do not see why it should be slower when column metadata is not available.
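For what it's worth, one pattern that can produce this kind of slowdown (an assumption on my part, not a confirmed diagnosis of the typeorm code) is building a huge VALUES expression by repeated string concatenation per cell instead of collecting parts and joining once. A minimal sketch of the two approaches, with invented helper names, producing identical output:

```typescript
// Hypothetical illustration only: two ways to build a VALUES placeholder list.
// buildByConcat mirrors the suspected per-cell `expression += ...` pattern;
// buildByJoin accumulates row fragments and joins once at the end.
function buildByConcat(rows: number, cols: number): string {
    let expression = "";
    for (let r = 0; r < rows; r++) {
        expression += "(";
        for (let c = 0; c < cols; c++) {
            expression += `$${r * cols + c + 1}`;
            if (c < cols - 1) expression += ", ";
        }
        expression += ")";
        if (r < rows - 1) expression += ", ";
    }
    return expression;
}

function buildByJoin(rows: number, cols: number): string {
    const parts: string[] = [];
    for (let r = 0; r < rows; r++) {
        const placeholders: string[] = [];
        for (let c = 0; c < cols; c++) {
            placeholders.push(`$${r * cols + c + 1}`);
        }
        parts.push(`(${placeholders.join(", ")})`);
    }
    return parts.join(", ");
}
```

With 10000 rows of 5 columns the expression has 50000 placeholders, so anything done per cell that is not O(1) adds up quickly.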

I have tried the next branch too and found similar behaviour.

Opening an issue to keep track of this. I will continue to investigate, but would appreciate any suggestions. Maybe I am doing something very, very wrong?
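In the meantime, the stopgap I am using (my own workaround, not an official recommendation) is to split the rows into smaller batches so each generated statement stays small:

```typescript
// Hypothetical workaround: insert in batches instead of one 10000-row statement.
// chunk() is a plain helper; the query-builder usage is sketched in comments
// because it needs a live connection.
function chunk<T>(items: T[], size: number): T[][] {
    const batches: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// for (const batch of chunk(documents, 500)) {
//     await connection.createQueryBuilder()
//         .insert()
//         .into("document", ["id", "docId", "label", "context", "date"])
//         .values(batch)
//         .execute();
// }
```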

@jtushman

@sarfata -- (sorry if this is the wrong place to put this -- almost certainly is) But I am also using typeorm and timescaledb in a project, and noticed that you are too. I wonder what patterns you put in place to manage it. It would be amazing to see an example if you have something sharable.

Have you been happy with that combination of tools?

sarfata (Contributor, Author) commented Jul 15, 2019

@jtushman why don't you open a new typeorm issue for timescale support? I cannot share my full project, but ping me in that issue and I will copy/paste some code to show how I am doing it now.

In a nutshell, I have one table with time series data that is "manually" managed with the tools provided by typeorm:

  • a custom EntityRepository that uses a manager to make custom SQL inserts or selects;
  • a few manually written migrations to create the timescaledb table;
  • the rest is standard typeorm.
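The first bullet can be sketched roughly like this (table layout, column names, and the buildInsert helper are all invented for illustration; manager.query is typeorm's standard raw-SQL escape hatch):

```typescript
// Hypothetical sketch of the raw-insert path in a custom repository.
// Only the placeholder-building logic is runnable on its own; running the
// SQL requires a live connection.
interface Sample { time: Date; value: number; }

function buildInsert(samples: Sample[]): { sql: string; params: any[] } {
    const params: any[] = [];
    const rows = samples.map(s => {
        params.push(s.time, s.value);
        // params.length grew by two, so the last two indexes are this row's.
        return `($${params.length - 1}, $${params.length})`;
    });
    return {
        sql: `INSERT INTO "sample" ("time", "value") VALUES ${rows.join(", ")}`,
        params,
    };
}

// Inside a custom EntityRepository one might then run:
// const { sql, params } = buildInsert(samples);
// await this.manager.query(sql, params);
```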

I am pretty happy with the combination because it lets me use typeorm, which I like a lot, and access my time series through a single "database" library. An alternative I considered was using a second, lower-level library for the time series, but that would mean two ways of managing connections, two connection pools, etc.
