feat(sql): Optimize executions #4804
Conversation
Force-pushed from fb4fd18 to 40fd05d.
```kotlin
  @Nonnull pipelineConfigIds: List<String>,
  @Nonnull criteria: ExecutionCriteria
): List<String> {
  return primary.retrieveAndFilterPipelineExecutionIdsForApplication(application, pipelineConfigIds, criteria) +
```
MINOR: BUT... this ordering DOES matter on the selection criteria, and I don't think we have a test for this (aka primary MUST come first or it'd potentially select previous data). Just an observation that this is missing a test... but not sure how "critical" that test really is.
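To make the concern concrete, here's a minimal sketch of what that missing test could look like. Everything here is illustrative, not the PR's actual code: the `IdSource` interface stands in for the real repository contract, and `combinedIds` mirrors the primary-then-previous concatenation under review:

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test

// Illustrative stand-in for the real repository contract.
fun interface IdSource {
  fun retrieveIds(application: String, configIds: List<String>, criteria: Any): List<String>
}

// Mirrors the combination under review: primary results concatenated before previous.
fun combinedIds(
  primary: IdSource,
  previous: IdSource,
  application: String,
  configIds: List<String>,
  criteria: Any
): List<String> =
  primary.retrieveIds(application, configIds, criteria) +
    previous.retrieveIds(application, configIds, criteria)

class DualRepositoryOrderingTest {
  @Test
  fun `primary execution ids come before previous ones`() {
    val primary = IdSource { _, _, _ -> listOf("new-1") }
    val previous = IdSource { _, _, _ -> listOf("old-1") }

    // primary must come first, or stale data from the previous store could be selected
    assertEquals(
      listOf("new-1", "old-1"),
      combinedIds(primary, previous, "myapp", listOf("cfg-1"), Any())
    )
  }
}
```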
```kotlin
var baseQueryPredicate = field("application").eq(application)
  .and(field("config_id").`in`(*pipelineConfigIds.toTypedArray()))

var table = if (jooq.dialect() == SQLDialect.MYSQL) PIPELINE.tableName.forceIndex("pipeline_application_idx")
```
Question: DO we really want to force an index, or is another index on the combination needed? OR perhaps merging a couple of these indexes. NOT sure of the best solution, but... we have a number of indexes with some of this data, and I wonder if we couldn't do this a touch cleaner. I'd EXPECT the optimizer to handle MOST of this without the need to force an index.
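For what it's worth, a minimal sketch of the composite-index alternative hinted at here, written against jOOQ's DDL API. The index and column names are assumptions, and in practice Orca manages schema through Liquibase changelogs rather than application code:

```kotlin
import org.jooq.DSLContext
import org.jooq.impl.DSL.field
import org.jooq.impl.DSL.table

// Hypothetical composite index covering both predicate columns, so the MySQL
// optimizer could pick it on its own instead of needing a forceIndex hint.
fun addCompositeIndex(jooq: DSLContext) {
  jooq.createIndexIfNotExists("pipeline_application_config_idx")
    .on(table("pipelines"), field("application"), field("config_id"))
    .execute()
}
```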
```groovy
// need to define a new executor service since we want a dedicated set of threads to be made available for every
// new request for performance reasons
ExecutorService executorService = Executors.newFixedThreadPool(
```
I gotta admit that having this in the request path, generating a new thread pool PER request, seems... dangerous, though a bigger question might be what's the downstream impact on the DB, and the impact on running concurrent executions on the DB? Since these are all hitting the RW pod as a rule right now, I could see this adding a tremendous amount of load on the database as a result...
It's not quite a thread-per-request since the number of threads is set by a config prop (default 10).
#4803 comes next, with more to follow as we move requests to a read-only database replica.
Yeah, saw the properties, but generating a thread pool on the request even WITH limits is still backed by a queue... It's still shut down afterwards, but this could absolutely hammer a database badly because of a potential exponential call graph.
E.g. a browser can repeat calls, which then make new calls on timeouts, which would overload this badly. The RO PR should help A LOT with that, however, and IN THEORY this should improve performance. It's just a concern based on how I've seen these APIs get called.
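One hedged sketch of the mitigation being discussed: build a single bounded pool once and reuse it for every request, so concurrent API calls contend on a fixed set of workers instead of each spawning (and shutting down) its own pool. The class name and wiring are illustrative, not the PR's code:

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

class ExecutionRetrievalSupport(maxThreads: Int) {
  // One shared, bounded pool for all retrieval requests, sized by config.
  private val sharedExecutor: ExecutorService = Executors.newFixedThreadPool(maxThreads)

  // Run the per-batch queries on the shared pool and collect the results.
  fun <T> runBatches(batches: List<() -> T>): List<T> {
    val futures = batches.map { batch -> sharedExecutor.submit(Callable { batch() }) }
    return futures.map { it.get() }
  }
}
```

The trade-off is that a shared pool caps total DB concurrency across requests, which is exactly the property the comment worries about losing with a pool per request.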
Yup. We also ended up making changes in deck to improve things. Those are gonna be harder for us to upstream, but we'll try.
And yes, massive pipelines cause lots of load.
```java
 * <p>It can be further tuned, depending on your setup, since 150 executions work well for some
 * applications but a higher number may be appropriate for others.
 */
int maxNumberOfPipelineExecutionsToProcess = 150;
```
A concern is this is probably TOO high depending upon capacity for "smaller" installations - as is probably the max execution threads. IF you allow for 3 concurrent API calls at a time, that's 30 threads that get spawned to meet this need, all potentially hitting the database at once. That's a LOT of concurrency against the database & database connections & could impact pipeline operations as a result. And you don't want to make your connection pools TOO big (beyond CPU capacity) or it actually gets WORSE.
This is a configurable value and matters only when optimizeExecutionRetrieval = true, which by default is false. The idea is that this feature is enabled only when the pipeline and/or execution counts are high.
It would still be nice if the defaults are useful for more folks. What do you suggest here @jasonmcintosh ?
*ponder* So NORMAL recommendations on pool/concurrency size are tied to CPUs. I think most people run on smaller instances with limited cores, and with other background processes... I'd just go with smaller defaults? Probably maxExecutionRetrievalThreads set to 4, maybe. *shrug* It's a hard one to judge - an m5.2xlarge has only 8 CPUs (and I'll ignore hyperthreading), so you could hit some concurrency issues. It's not IMPOSSIBLE... I just tend to vote "smaller, more of them" kinda stuff on this.
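A one-liner sketch of the CPU-derived default being suggested here (purely illustrative; the variable name and the cap of 4 are assumptions taken from the comment above):

```kotlin
// Illustrative: default the retrieval pool to the suggested 4 threads, but
// never more than the cores actually available on small instances.
val defaultRetrievalThreads = minOf(4, Runtime.getRuntime().availableProcessors())
```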
The whole idea seems like a good optimisation to me 👍 @dbyron-sf @kirangodishala how long have you been running this in your forks? Also, could we fit the same concept of optimisations in
```kotlin
  .queryTimeout(queryTimeoutSeconds) // add an explicit timeout so that the query doesn't run forever
  .fetch()

log.info("getting stage information for all the executions found so far")
```
This doesn't seem valuable enough to leave at info.
log.info("getting stage information for all the executions found so far") | |
log.debug("getting stage information for all the executions found so far") |
```kotlin
if (!expand) {
  unexpandPipelineExecutions(allPipelines)
  log.info("unexpanding pipeline executions")
```
log.info("unexpanding pipeline executions") | |
log.debug("unexpanding pipeline executions") |
```kotlin
}

log.info("filtering pipelines by history")
return filterPipelinesByHistoryCutoff(allPipelines, limit)
```
log.info("filtering pipelines by history") | |
log.debug("filtering pipelines by history") |
…perties optimize getPipelinesForApplication() by reducing the number of relevant pipeline config ids to query

This is a breaking change: TaskController previously looked at these properties:
`tasks.days-of-execution-history`
`tasks.number-of-old-pipeline-executions-to-include`
and now they live in `tasks.controller`.

fix(web): give java code access to groovy to fix below error:

```
> Task :orca-web:compileJava FAILED
../orca/orca-web/src/main/java/com/netflix/spinnaker/config/TaskControllerConfigurationProperties.java:19: error: cannot find symbol
import com.netflix.spinnaker.orca.controllers.TaskController;
                                              ^
  symbol:   class TaskController
  location: package com.netflix.spinnaker.orca.controllers
1 error
```
…ion() further by processing multiple pipeline config ids at a time and multi-threading each config id batch
refactor function and variable names to make them more descriptive
Force-pushed from 48975d9 to ddf724e.
This PR improves performance when using SQL as the backend by optimizing the way results are obtained for the API call
`/applications/{application}/pipelines?expand=false&limit=2`
frequently initiated by Deck. Deck makes that call to Gate, which forwards it to Orca, specifically to the
`/v2/applications/{application}/pipelines`
endpoint. Orca calls Front50 to get the pipeline and strategy config ids, then merges them. Then, for each id in the merged list, it queries its own db (SQL) to handle the query parameters, such as limit=2 and filtering by statuses. These db operations were observed to take a long time, and this PR fixes the slowness by optimizing the queries to the db.
- Refactored TaskController to support externalized config properties
- Optimize getPipelinesForApplication() by reducing the number of relevant pipeline config ids to query
- Further optimize getPipelinesForApplication() by processing multiple pipeline config ids at a time and multi-threading each config id batch (see the sketch below)
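To illustrate the batching described in the last item, here's a rough sketch. All names are hypothetical, not the PR's actual code; `queryBatch` stands in for whatever jOOQ query filters by a `config_id IN (...)` batch:

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService

// Hypothetical outline: one query per batch of config ids, run concurrently,
// instead of one query per individual config id.
fun retrieveExecutionIdsInBatches(
  configIds: List<String>,
  batchSize: Int,
  executor: ExecutorService,
  queryBatch: (List<String>) -> List<String> // e.g. a jOOQ query with config_id IN (...)
): List<String> {
  val futures = configIds.chunked(batchSize).map { batch ->
    executor.submit(Callable { queryBatch(batch) })
  }
  return futures.flatMap { it.get() }
}
```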
Breaking changes:
These configuration properties:
`tasks.days-of-execution-history`
`tasks.number-of-old-pipeline-executions-to-include`
are replaced by the configuration below, along with a few more newly added ones:
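The page extraction dropped the code block that listed the new configuration. As a hedged reconstruction from the discussion above: only the `tasks.controller` prefix, the opt-in flag defaulting to false, the 150-execution default, and the 10-thread default are stated in the thread; every other name and default here is an assumption:

```kotlin
import org.springframework.boot.context.properties.ConfigurationProperties

// Hypothetical sketch of the new properties; fields and defaults beyond those
// quoted in the discussion are assumptions, not the PR's actual code.
@ConfigurationProperties("tasks.controller")
class TaskControllerConfigurationProperties {
  var daysOfExecutionHistory: Int = 14                  // was tasks.days-of-execution-history
  var numberOfOldPipelineExecutionsToInclude: Int = 2   // was tasks.number-of-old-pipeline-executions-to-include
  var optimizeExecutionRetrieval: Boolean = false       // opt-in, default false per the discussion
  var maxExecutionRetrievalThreads: Int = 10            // default 10 per the discussion
  var maxNumberOfPipelineExecutionsToProcess: Int = 150 // default 150 per the discussion
  var queryTimeoutSeconds: Int = 60                     // assumed default
}
```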