-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement live reconfiguration in the compute_ctl, rebased #3964
Conversation
…rate. This is in preparation of using compute_ctl to launch postgres nodes in the neon_local control plane. And seems like a good idea to separate the public interfaces anyway. One non-mechanical change here is that we now use a RwLock rather than atomics to protect the ComputeNode::metrics field. We were not using atomics for performance but for convenience here, and an RwLock is now more convenient.
We use the term "endpoint" in for compute Postgres nodes in the web UI and user-facing documentation now. Adjust the nomenclature in the code. This changes the name of the "neon_local pg" command to "neon_local endpoint". Also adjust names of classes, variables etc. in the python tests accordingly. This also changes the directory structure so that endpoints are now stored in: .neon/endpoints/<endpoint id> instead of: .neon/pgdatadirs/tenants/<tenant_id>/<endpoint (node) name> The tenant ID is no longer part of the path. That means that you cannot have two endpoints with the same name/ID in two different tenants anymore. That's consistent with how we treat endpoints in the real control plane and proxy: the endpoint ID must be globally unique.
Accept spec in JSON format and request compute reconfiguration from the configurator thread. If anything goes wrong after we set the compute state to `ConfigurationPending` and / or sent spec to the configurator thread, we basically leave compute in the potentially wrong state. That said, it's control-plane's responsibility to watch compute state after reconfiguration request and to clean restart it in case of errors. It still lacks ability of starting up without spec and some validations, i.e. that live reconfiguration should be only available with `--compute-id` and `--control-plane-uri` options. Otherwise, it works fine and could be tested by running `compute_ctl` locally, then sending it a new spec: ```shell curl -d "$(cat ./compute-spec-new.json)" http://localhost:3080/spec ``` We have one configurator thread and async http server, so generally we have single consumer - multiple producers pattern here. That's why we use `mpsc` channel, not `tokio::sync::watch`. Actually, concurrency of producers is limited to one due to code logic, but we still need an ability to potentially pass `Sender` to several threads. Next, we use async `hyper` + `tokio` http server, but all the other code is completely synchronous. So we need to send data from async to sync, that's why we use `mpsc::unbounded_channel` here, not `mpsc::channel`. It doesn't make much sense to rewrite all code to async now, but we can consider doing this in the future. I think that a combination of `Mutex` and `CondVar` would work just fine too, but as we already have `tokio`, I decided to try something from it.
With this commit one can start compute with something like ```shell cargo run --bin compute_ctl -- -i no-compute \ -p http://localhost:9095 \ -D compute_pgdata \ -C "postgresql://[email protected]:5434/postgres" \ -b ./pg_install/v15/bin/postgres ``` and it will hang waiting for spec. Then send one spec ```shell curl -d "$(cat ./compute-spec.json)" http://localhost:3080/spec ``` Postgres will be started and configured. Then reconfigure it with ```shell curl -d "$(cat ./compute-spec-new.json)" http://localhost:3080/spec ``` Most of safeguards and comments are added. Some polishing especially around HTTP API is still needed.
Test results for 89cc2c5:debug build: 212 tests run: 202 passed, 0 failed, 10 (full report)release build: 212 tests run: 202 passed, 0 failed, 10 (full report) |
compute_tools/src/compute.rs
Outdated
pub state_changed: Condvar, | ||
} | ||
|
||
pub struct ComputeNodeInner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separate state_changed
looks good to me. I'm not sure about additional ComputeNodeInner
, it's still a volatile compute state, but this
ComputeState was quite confusing, as it was used both to define the format of the JSON that the /get_status API returned, and to hold other mutable state of a ComputeNode.
is indeed a good point. Yet, I'd maybe solve it in a bit different way -- define GET /status
response as a separate structure, which can be built from ComputeState
. It should eliminate confusion and will be generally cleaner than currently in the main. I've already defined struct for error response
About metrics, I'm 100% +1 for moving them under the same lock, but as you properly pointed out in the original PR, let's keep subset of changes smaller to simplify review
If no objections, I'd incorporate mentioned above changes into #3963
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet, I'd maybe solve it in a bit different way -- define GET /status response as a separate structure, which can be built from ComputeState.
Works for me
@@ -12,6 +15,44 @@ use serde_json; | |||
use tracing::{error, info}; | |||
use tracing_utils::http::OtelName; | |||
|
|||
async fn handle_spec_request(req: Request<Body>, compute: &Arc<ComputeNode>) -> Result<(), (String, StatusCode)> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored this in a similar way too, thanks. Also tweaked waiting part to spawn a separate blocking thread with tokio::task::spawn_blocking
to do not block main pool of worker threads and serve status requests. I didn't expect it to be a problem, but it seems to be a safer approach
1f2946a
to
1326ccc
Compare
Separated the last part of the original PR into #3980, so I think we can close this one |
1326ccc
to
d823c15
Compare
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.7.0 to 3.7.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/aio-libs/aiohttp/releases">aiohttp's releases</a>.</em></p> <blockquote> <h2>aiohttp 3.7.3 release</h2> <h2>Features</h2> <ul> <li>Use Brotli instead of brotlipy <code>[neondatabase#3803](aio-libs/aiohttp#3803) <https://github.com/aio-libs/aiohttp/issues/3803></code>_</li> <li>Made exceptions pickleable. Also changed the repr of some exceptions. <code>[neondatabase#4077](aio-libs/aiohttp#4077) <https://github.com/aio-libs/aiohttp/issues/4077></code>_</li> </ul> <h2>Bugfixes</h2> <ul> <li>Raise a ClientResponseError instead of an AssertionError for a blank HTTP Reason Phrase. <code>[neondatabase#3532](aio-libs/aiohttp#3532) <https://github.com/aio-libs/aiohttp/issues/3532></code>_</li> <li>Fix <code>web_middlewares.normalize_path_middleware</code> behavior for patch without slash. <code>[neondatabase#3669](aio-libs/aiohttp#3669) <https://github.com/aio-libs/aiohttp/issues/3669></code>_</li> <li>Fix overshadowing of overlapped sub-applications prefixes. <code>[neondatabase#3701](aio-libs/aiohttp#3701) <https://github.com/aio-libs/aiohttp/issues/3701></code>_</li> <li>Make <code>BaseConnector.close()</code> a coroutine and wait until the client closes all connections. Drop deprecated "with Connector():" syntax. <code>[neondatabase#3736](aio-libs/aiohttp#3736) <https://github.com/aio-libs/aiohttp/issues/3736></code>_</li> <li>Reset the <code>sock_read</code> timeout each time data is received for a <code>aiohttp.client</code> response. <code>[neondatabase#3808](aio-libs/aiohttp#3808) <https://github.com/aio-libs/aiohttp/issues/3808></code>_</li> <li>Fixed type annotation for add_view method of UrlDispatcher to accept any subclass of View <code>[neondatabase#3880](aio-libs/aiohttp#3880) <https://github.com/aio-libs/aiohttp/issues/3880></code>_</li> <li>Fixed querying the address families from DNS that the current host supports. <code>[neondatabase#5156](aio-libs/aiohttp#5156) <https://github.com/aio-libs/aiohttp/issues/5156></code>_</li> <li>Change return type of MultipartReader.<strong>aiter</strong>() and BodyPartReader.<strong>aiter</strong>() to AsyncIterator. <code>[neondatabase#5163](aio-libs/aiohttp#5163) <https://github.com/aio-libs/aiohttp/issues/5163></code>_</li> <li>Provide x86 Windows wheels. <code>[neondatabase#5230](aio-libs/aiohttp#5230) <https://github.com/aio-libs/aiohttp/issues/5230></code>_</li> </ul> <h2>Improved Documentation</h2> <ul> <li>Add documentation for <code>aiohttp.web.FileResponse</code>. <code>[neondatabase#3958](aio-libs/aiohttp#3958) <https://github.com/aio-libs/aiohttp/issues/3958></code>_</li> <li>Removed deprecation warning in tracing example docs <code>[neondatabase#3964](aio-libs/aiohttp#3964) <https://github.com/aio-libs/aiohttp/issues/3964></code>_</li> <li>Fixed wrong "Usage" docstring of <code>aiohttp.client.request</code>. <code>[neondatabase#4603](aio-libs/aiohttp#4603) <https://github.com/aio-libs/aiohttp/issues/4603></code>_</li> <li>Add aiohttp-pydantic to third party libraries <code>[neondatabase#5228](aio-libs/aiohttp#5228) <https://github.com/aio-libs/aiohttp/issues/5228></code>_</li> </ul> <h2>Misc</h2> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst">aiohttp's changelog</a>.</em></p> <blockquote> <h1>3.7.4 (2021-02-25)</h1> <h2>Bugfixes</h2> <ul> <li> <p><strong>(SECURITY BUG)</strong> Started preventing open redirects in the <code>aiohttp.web.normalize_path_middleware</code> middleware. For more details, see <a href="https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg">https://github.com/aio-libs/aiohttp/security/advisories/GHSA-v6wp-4m6f-gcjg</a>.</p> <p>Thanks to <code>Beast Glatisant <https://github.com/g147></code>__ for finding the first instance of this issue and <code>Jelmer Vernooij <https://jelmer.uk/></code>__ for reporting and tracking it down in aiohttp. <code>[neondatabase#5497](aio-libs/aiohttp#5497) <https://github.com/aio-libs/aiohttp/issues/5497></code>_</p> </li> <li> <p>Fix interpretation difference of the pure-Python and the Cython-based HTTP parsers construct a <code>yarl.URL</code> object for HTTP request-target.</p> <p>Before this fix, the Python parser would turn the URI's absolute-path for <code>//some-path</code> into <code>/</code> while the Cython code preserved it as <code>//some-path</code>. Now, both do the latter. <code>[neondatabase#5498](aio-libs/aiohttp#5498) <https://github.com/aio-libs/aiohttp/issues/5498></code>_</p> </li> </ul> <hr /> <h1>3.7.3 (2020-11-18)</h1> <h2>Features</h2> <ul> <li>Use Brotli instead of brotlipy <code>[neondatabase#3803](aio-libs/aiohttp#3803) <https://github.com/aio-libs/aiohttp/issues/3803></code>_</li> <li>Made exceptions pickleable. Also changed the repr of some exceptions. <code>[neondatabase#4077](aio-libs/aiohttp#4077) <https://github.com/aio-libs/aiohttp/issues/4077></code>_</li> </ul> <h2>Bugfixes</h2> <ul> <li>Raise a ClientResponseError instead of an AssertionError for a blank HTTP Reason Phrase. <code>[neondatabase#3532](aio-libs/aiohttp#3532) <https://github.com/aio-libs/aiohttp/issues/3532></code>_</li> <li>Fix <code>web_middlewares.normalize_path_middleware</code> behavior for patch without slash. <code>[neondatabase#3669](aio-libs/aiohttp#3669) <https://github.com/aio-libs/aiohttp/issues/3669></code>_</li> <li>Fix overshadowing of overlapped sub-applications prefixes. <code>[neondatabase#3701](aio-libs/aiohttp#3701) <https://github.com/aio-libs/aiohttp/issues/3701></code>_</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/aio-libs/aiohttp/commit/0a26acc1de9e1b0244456b7881ec16ba8bb64fc3"><code>0a26acc</code></a> Bump aiohttp to v3.7.4 for a security release</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/021c416c18392a111225bc7326063dc4a99a5138"><code>021c416</code></a> Merge branch 'GHSA-v6wp-4m6f-gcjg' into master</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/4ed7c25b537f71c6245bb74d6b20e5867db243ab"><code>4ed7c25</code></a> Bump chardet from 3.0.4 to 4.0.0 (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5333">#5333</a>)</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/b61f0fdffc887df24244ba7bdfe8567c580240ff"><code>b61f0fd</code></a> Fix how pure-Python HTTP parser interprets <code>//</code></li> <li><a href="https://github.com/aio-libs/aiohttp/commit/5c1efbc32c46820250bd25440bb7ea96cb05abe9"><code>5c1efbc</code></a> Bump pre-commit from 2.9.2 to 2.9.3 (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5322">#5322</a>)</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/007507580137efcc0a20391a0792f39b337d9c1a"><code>0075075</code></a> Bump pygments from 2.7.2 to 2.7.3 (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5318">#5318</a>)</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/5085173d947e6cc01b6daf1aa48fe7698834c569"><code>5085173</code></a> Bump multidict from 5.0.2 to 5.1.0 (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5308">#5308</a>)</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/5d1a75e68d278c641c90021409f4eb5de1810e5e"><code>5d1a75e</code></a> Bump pre-commit from 2.9.0 to 2.9.2 (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5290">#5290</a>)</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/6724d0e7a944fd7e3a710dc292d785fa8fe424fd"><code>6724d0e</code></a> Bump pre-commit from 2.8.2 to 2.9.0 (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5273">#5273</a>)</li> <li><a href="https://github.com/aio-libs/aiohttp/commit/c688451ce31b914c71b11d2ac6c326b0c87e6d1f"><code>c688451</code></a> Removed duplicate timeout parameter in ClientSession reference docs. (<a href="https://github-redirect.dependabot.com/aio-libs/aiohttp/issues/5262">#5262</a>) ...</li> <li>Additional commits viewable in <a href="https://github.com/aio-libs/aiohttp/compare/v3.7.0...v3.7.4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=aiohttp&package-manager=pip&previous-version=3.7.0&new-version=3.7.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/neondatabase/neon/network/alerts). </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Vadim Kharitonov <[email protected]>
d823c15
to
50e38f5
Compare
These changes have been done in other PRs, closing. |
This is PR #3923, rebased over #3938.
I tried to refrain from making any non-trivial changes, but here are a few notable ones:
/get_status
API returned, and to hold other mutable state of a ComputeNode. I refactored it so that ComputeState defines the format of/get_status
, like on 'main', and there's another struct,ComputeNodeInner
, which holds all the mutable state.ParsedSpec
, which encapsulatesComputeSpec
and the fields derived from it.