Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

queries not executing concurrently if more than 4 in parallel #189

Open
andrejs-sisojevs opened this issue Sep 17, 2018 · 2 comments
Open

Comments

@andrejs-sisojevs
Copy link

andrejs-sisojevs commented Sep 17, 2018

Hello I'm trying to troubleshoot problem of queries not executing concurrently.

I have a stored procedure in DB, which simply waits for specified number of seconds and then returns:

CREATE OR REPLACE FUNCTION test_sleep (x float) RETURNS float STABLE as $$
    from time import sleep
    sleep(x)
    return x
$$ LANGUAGE plpythonu;

select test_sleep(15);

Running test, which creates 1 connection pool with maxpoolsize: 10, executes select test_sleep(10); 4 times in parallel: test executes in 14 seconds, which is fine.

Running test, which creates 1 connection pool with maxpoolsize: 10, executes select test_sleep(10); 5 times in parallel: test executes in 22 seconds, which is not acceptable. It executes first 4 on one go, and then executes last command in a separate go.

Repeated above test multiple times - timing results are the same.

Running test, which creates 1 connection pool with maxpoolsize: 10, executes select test_sleep(10); 10 times in parallel. Here's how it executes:
[00:00] Start querying
[00:17] Query #1 wait 10 seconds returned ResulSet
[00:17] Query #2 wait 10 seconds returned ResulSet
[00:17] Query #3 wait 10 seconds returned ResulSet
[00:17] Query #4 wait 10 seconds returned ResulSet
[00:28] Query #5 wait 10 seconds returned ResulSet
[00:28] Query #6 wait 10 seconds returned ResulSet
[00:28] Query #7 wait 10 seconds returned ResulSet
[00:28] Query #8 wait 10 seconds returned ResulSet
[00:28] Results read from ResultSet #1: 1 row
[00:28] Results read from ResultSet #2: 1 row
[00:28] Results read from ResultSet #3: 1 row
[00:28] Results read from ResultSet #4: 1 row
[00:28] Results read from ResultSet #5: 1 row
[00:28] Results read from ResultSet #6: 1 row
[00:28] Results read from ResultSet #7: 1 row
[00:28] Results read from ResultSet #8: 1 row
[00:39] Query #9 wait 10 seconds returned ResulSet
[00:39] Results read from ResultSet #9: 1 row
[00:39] Query #10 wait 10 seconds returned ResulSet
[00:39] Results read from ResultSet #10: 1 row
[00:39] Finished querying

So it's already 2 problems:
(A) concurrency cap is only 4 queries
(B) resultSet.toObjectIter(...) lag caused by other concurrent queries

Running same with one connection pool per query yields same result.
Launching same JS test script from command line 10 times in parallel, 1 query each: executes fine, 14 seconds in overall.

Here is the code:

async function querySleep(db: DBClient, timeout: number): Promise<string> {
    const deferred = new Deferred();
    const resultSet = db.connectionPool.reserve((err, connObj) => {
        connObj.conn.createStatement((err, stmt) => {
            stmt.executeQuery(`select test_sleep(${timeout});`, (err, rs) => {
                global.logger.info(`Waited for ${timeout} seconds.`);

                rs.toObjectIter((err, resultSetIterator) => {
                    let row = resultSetIterator.rows.next();  // executed synchronously
                    let accum = '';
                    while (!row.done) {
                        const csvRow = ResultSetStream.csvRow_fromResultSetRow(resultSetIterator.labels, row.value);
                        accum += `${csvRow}\n`;
                        row = resultSetIterator.rows.next();  // executed synchronously
                    }

                    global.logger.info(`Results grabbed for timeout of ${timeout} seconds.`);
                    deferred.resolve(accum);
                });
            });
        });
    });

    return deferred.promise;
}

describe('DBClient', () => {
    it('should properly query cuncurrently', async () => {
        const db = await prepareDBClient();
        global.logger.info('Start concurrent waits.');

        const timeout = 10;
        const times = 10;
        const queries = [];
        for (let i = 0; i < times; i++) {
            queries.push(
                querySleep(db, timeout)
            );
        }
        const results = await Promise.all(queries);
        global.logger.info('All queries complete.');
    }).timeout(60 * 1000);
});

Connection pool initialisation options:

assert(!jinst.isJvmCreated());
jinst.addOption('-Xrs');
jinst.setupClasspath([
    env('REDSHIFT_JDBC_DRIVER')
]);
const java = jinst.getInstance();
java.findClassSync('com.amazon.redshift.jdbc.Driver');  // analogy for Java expession "Class.forName("com.mysql.jdbc.Driver");"

export function connectionPoolConfig(username: string, password: string): Object {
    const REDSHIFT_ENDPOINT = env('REDSHIFT_JDBC_ENDPOINT');
    return {
        url: REDSHIFT_ENDPOINT,
        drivername: 'com.amazon.redshift.jdbc.Driver',
        minpoolsize: 1,
        maxpoolsize: 10,
        properties: {
            user: username,
            password
        }
    };
}

Java version:

$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)

DB maximum concurrent connections per user: 15.

Any ideas what might have went wrong?

@andrejs-sisojevs
Copy link
Author

Further tests showed, that after some work-time connection pool fails to asynchronize randomly even for 3 concurrent queries.
So, we had to workaround this by doing node process fork, and thus handling all DB querying in subprocesses, each having own connection pool with size 1. This works, although memory/CPU consumption becomes awful. But now we don't have time to rewrite our system on Java...

@dbit-xia
Copy link

dbit-xia commented May 9, 2021

export UV_THREADPOOL_SIZE=1024
Try this variable, it should be solved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants