Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add session setting support for specs #381

Merged
merged 7 commits into from
Sep 9, 2024

Conversation

mkleen
Copy link
Contributor

@mkleen mkleen commented Aug 27, 2024

This adds support for session settings for specs. The session settings will be applied to each connection before executing the spec.

@mkleen mkleen marked this pull request as ready for review August 27, 2024 12:54
@mkleen mkleen force-pushed the mkleen/session_settings branch from 7c8cf23 to abd54a2 Compare August 27, 2024 13:45
@mkleen mkleen marked this pull request as draft August 27, 2024 13:53
@mkleen mkleen force-pushed the mkleen/session_settings branch from 5c413c3 to 333108a Compare August 27, 2024 16:09
@mkleen mkleen marked this pull request as ready for review August 27, 2024 16:11
@mkleen mkleen force-pushed the mkleen/session_settings branch 2 times, most recently from fb33c98 to ab00a48 Compare August 27, 2024 20:24
This adds support for session settings for the various specs formats.
These session settings will be applied by the http/pg client before
the spec is executed.
@mkleen mkleen force-pushed the mkleen/session_settings branch from ab00a48 to 09110ba Compare August 27, 2024 20:32
@@ -63,7 +63,7 @@ def parse(self, string, name='<string>'):
class SourceBuildTest(TestCase):

def test_build_from_branch(self):
self.assertIsNotNone(get_crate('4.1'))
self.assertIsNotNone(get_crate('5.8'))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradle based builds are failing on the ci machines.

@mkleen
Copy link
Contributor Author

mkleen commented Aug 28, 2024

@mfussenegger Ready for review!

cr8/clients.py Show resolved Hide resolved
cr8/clients.py Outdated
Comment on lines 333 to 341
conn = aiohttp.TCPConnector(**self._connector_params)
self.__session = session = aiohttp.ClientSession(connector=conn)
for setting, value in self.session_settings.items():
payload = {'stmt': f'set {setting}={value}'}
await _exec(
session,
next(self.urls),
dumps(payload, cls=CrateJsonEncoder)
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TCPConnector afaik internally does some pooling, so this will have the opposite problem of the asyncpg variant in that it only ever gets called once per session initialization instead of once per connection.
It's probably necessary to subclass the TCPConnector to customize _create_connection.

Running the spec mentioned above shows:

cr> select stmt, count(*) from sys.jobs_log group by stmt order by 2 desc ;
+----------------------------------+----------+
| stmt                             | count(*) |
+----------------------------------+----------+
| select count(*) from sys.summits |     1000 |
| set timezone=UTC                 |        2 |
| set application_name=my_app      |        2 |
| ANALYZE                          |        1 |
+----------------------------------+----------+

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you mark this as resolved, there are no changes for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's resolved, see the results.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The settings were set for multiple clients.

Copy link
Contributor Author

@mkleen mkleen Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do i miss something here ? What should be the expected output for the spec with concurrency 10 ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should run set .. for each connection it establishes.

With concurrency = 10 I'd expect there to be more than 1 call in the sys.jobs_log output.

You can also verify how many connections it made via:

cr> select connections from sys.nodes limit 10;
+-----------------------------------------------------------------------------------------------------------+
| connections                                                                                               |
+-----------------------------------------------------------------------------------------------------------+
| {"http": {"open": 1, "total": 12}, "psql": {"open": 0, "total": 0}, "transport": {"open": 0, "total": 0}} |
+-----------------------------------------------------------------------------------------------------------+

The numbers should add up

@mkleen
Copy link
Contributor Author

mkleen commented Aug 29, 2024

@mfussenegger Thanks for the feedback. Here are the results after the last changes.

test.toml:

[session_settings]
application_name = 'my_app'
timezone = 'UTC'

[[queries]]
name = "test"
statement = "select count(*) from sys.summits"
concurrency = 1
iterations = 1000

cr8 run-spec test.toml localhost:4200

cr> select stmt, count(*) from sys.jobs_log group by stmt order by 2 desc;
+----------------------------------+----------+
| stmt                             | count(*) |
+----------------------------------+----------+
| select count(*) from sys.summits |     1000 |
| select version from sys.nodes    |        1 |
| set timezone=UTC                 |        1 |
| set application_name=my_app      |        1 |
| ANALYZE                          |        1 |
+----------------------------------+----------+

cr8 run-spec test.toml asyncpg://localhost:5432

cr> select stmt, count(*) from sys.jobs_log group by stmt order by 2 desc;
+----------------------------------+----------+
| stmt                             | count(*) |
+----------------------------------+----------+
| select count(*) from sys.summits |     1000 |
| select version from sys.nodes    |        1 |
| set timezone=UTC                 |        1 |
| set application_name=my_app      |        1 |
| ANALYZE                          |        1 |
+----------------------------------+----------+

test.toml:

[session_settings]
application_name = 'my_app'
timezone = 'UTC'

[[queries]]
name = "test"
statement = "select count(*) from sys.summits"
concurrency = 10
iterations = 1000

cr8 run-spec test.toml localhost:4200

cr> select stmt, count(*) from sys.jobs_log group by stmt order by 2 desc;
+----------------------------------+----------+
| stmt                             | count(*) |
+----------------------------------+----------+
| select count(*) from sys.summits |     1000 |
| select version from sys.nodes    |        1 |
| set timezone=UTC                 |        1 |
| set application_name=my_app      |        1 |
| ANALYZE                          |        1 |
+----------------------------------+----------+

cr8 run-spec test.toml asyncpg://localhost:5432

cr> select stmt, count(*) from sys.jobs_log group by stmt order by 2 desc;
+----------------------------------+----------+
| stmt                             | count(*) |
+----------------------------------+----------+
| select count(*) from sys.summits |     1000 |
| set timezone=UTC                 |      120 |
| set application_name=my_app      |      120 |
| select version from sys.nodes    |        1 |
| ANALYZE                          |        1 |
+----------------------------------+----------+

Note: The asyncpg client sets with concurrency 10, 12 pools with 10 connections, therefore each session setting is only applied once.

@mkleen mkleen requested a review from mfussenegger August 29, 2024 07:34
cr8/engine.py Outdated
for setting, value in self.session_settings.items():
stmt = f'set {setting}={value}'
statements = itertools.repeat((stmt, ()), self.concurrency)
aio.run_many(f, statements, self.concurrency, self.concurrency)
Copy link
Contributor Author

@mkleen mkleen Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works. it's not so elegant. It runs in parallel with the set statement n times for n connections. I tried various approaches but this is straight forward.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfussenegger Do you have any opinion on this approach? I think this needs some cleanup, but would you be ok with this solution?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides breaking encapsulation I think this might become unrealiable with a higher concurrency. The first set statements can finish before laters are triggered - which means a connection could get re-used instead of creating a new one for each set

I think I'd rather have a minimal extra pool implementation in the client on top of the session, and setting limit=1 on the underlying connector to ensure there is no double pooling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, makes sense. Thanks for the feedback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd rather have a minimal extra pool implementation in the client on top of the session, and setting limit=1 on the underlying connector to ensure there is no double pooling.

Do you mean a pool with multiple instances of TCPConnector?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean I would need to create also multiple ClientSessions, one for each TCPConnector and pool them too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A custom Connector where I maintain the connections is not an option, because I need to have a ClientSession to run HTTP operations on top of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try the option of using multiple ClientSessions each having it's own TCPConnector with limit 1 and see how the performance is.

@mkleen mkleen force-pushed the mkleen/session_settings branch from 50b02a3 to 2d008ae Compare September 3, 2024 16:44
@mkleen
Copy link
Contributor Author

mkleen commented Sep 4, 2024

@mfussenegger Here is a naive pool implementation for the http client. Surprisingly this improves throughput on the spec:

cr8 run-spec test.toml localhost:4200

master:

## Running Query:
   Name: test
   Statement:
     select count(*) from sys.summits
   Concurrency: 10
   Iterations: 1000
100%|██████████| 1000/1000 [00:00<00:00, 3174.79 requests/s]
Runtime (in ms):
    mean:    0.240 ± 0.031
    min/max: 0.123 → 6.382

with pool:

## Running Query:
   Name: test
   Statement:
     select count(*) from sys.summits
   Concurrency: 10
   Iterations: 1000
100%|██████████| 1000/1000 [00:00<00:00, 4150.77 requests/s]
Runtime (in ms):
    mean:    0.193 ± 0.021
    min/max: 0.103 → 6.738

@mkleen
Copy link
Contributor Author

mkleen commented Sep 4, 2024

However, on higher concurrency (100) this is very inefficient.

@mkleen
Copy link
Contributor Author

mkleen commented Sep 4, 2024

I don't know how to proceed here. I think it's easier not to deal with this on the client level and instead create a user and set a session default.

@mkleen
Copy link
Contributor Author

mkleen commented Sep 5, 2024

What do you think about dropping the support for session settings for the HTTP client and using the asyncpg client for the benchmarks?

@mkleen mkleen force-pushed the mkleen/session_settings branch from 54d5fbc to e341947 Compare September 5, 2024 10:07
@mfussenegger mfussenegger force-pushed the mkleen/session_settings branch from b742916 to 08ce745 Compare September 9, 2024 08:40
@mfussenegger mfussenegger force-pushed the mkleen/session_settings branch from 08ce745 to 1dcdcd3 Compare September 9, 2024 08:45
@mfussenegger mfussenegger merged commit d910d77 into mfussenegger:master Sep 9, 2024
7 checks passed
@mfussenegger
Copy link
Owner

Added a fixup for the pooling - as far as I can tell it should be working as intended now

@mkleen
Copy link
Contributor Author

mkleen commented Sep 9, 2024

Added a fixup for the pooling - as far as I can tell it should be working as intended now

Thanks for the help. Appreciated!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants