Implement new RPC endpoint: health check #3729

Closed
wants to merge 12 commits into from
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,11 @@ and this project adheres to the versioning scheme outlined in the [README.md](RE
from the source code are now attached to the `SymbolicExpression` nodes. This
will be useful for tools that use the Clarity library to analyze and
manipulate Clarity source code, e.g. a formatter.
- Added a new RPC endpoint to query the node's health. A node is considered healthy if it
is caught up with its bootstrap peers. The endpoint returns
200 if the node is synced to its peers' chain tips, 512 if the node is not caught up to its
peers, or 513 if the query was unsuccessful (for example, because no viable peer data is
available). This endpoint can be accessed at `/v2/health`.

### Fixed

11 changes: 11 additions & 0 deletions docs/rpc-endpoints.md
@@ -456,3 +456,14 @@ object of the following form:
Determine whether a given trait is implemented within the specified contract (either explicitly or implicitly).

See OpenAPI [spec](./rpc/openapi.yaml) for details.

### GET /v2/health

Determine whether a node is healthy. A node is considered healthy if its block height
is greater than or equal to the maximum block height among its initial (bootstrap) peers.
If the node has no valid initial peers, or lacks the data needed to make this determination,
the endpoint returns an error (HTTP 513). The endpoint also returns an error (HTTP 512) if
the node's height is less than the maximum block height among its initial peers; this error
includes the percentage of blocks the node has relative to its most advanced peer.

See OpenAPI [spec](./rpc/openapi.yaml) for details.
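The health rule described above can be sketched in Rust as follows. This is a hypothetical illustration, not the node's handler: `HealthSummary` and `evaluate_health` are invented names, the peer heights are assumed to have already been gathered from the valid initial peers, and the percentage is assumed to use integer division, reported as 100 once the node has caught up.

```rust
/// Hypothetical mirror of the `/v2/health` JSON body.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HealthSummary {
    pub matches_peers: bool,
    pub percent_of_blocks_synced: u64,
}

/// Sketch: pick the status code (200, 512, or 513) and body for a health query,
/// given this node's block height and the block heights of its valid initial peers.
pub fn evaluate_health(node_height: u64, peer_heights: &[u64]) -> (u16, Option<HealthSummary>) {
    // No valid initial peers: nothing to compare against, so report "no data".
    let max_peer_height = match peer_heights.iter().max() {
        Some(&h) => h,
        None => return (513, None),
    };

    if node_height >= max_peer_height {
        // Caught up with (or ahead of) the most advanced peer.
        let summary = HealthSummary { matches_peers: true, percent_of_blocks_synced: 100 };
        return (200, Some(summary));
    }

    // Lagging: here max_peer_height > node_height >= 0, so the divisor is non-zero.
    // Widen to u128 so the multiplication cannot overflow; integer division rounds down.
    let percent = (node_height as u128 * 100 / max_peer_height as u128) as u64;
    let summary = HealthSummary { matches_peers: false, percent_of_blocks_synced: percent };
    (512, Some(summary))
}
```

Under these assumptions, a node at height 50 whose most advanced initial peer is at height 200 gets status 512 with `percent_of_blocks_synced` equal to 25, and an empty peer list maps to 513.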
4 changes: 4 additions & 0 deletions docs/rpc/api/core-node/get-health.example.json
@@ -0,0 +1,4 @@
{
"matches_peers": true,
"percent_of_blocks_synced": 100
}
16 changes: 16 additions & 0 deletions docs/rpc/api/core-node/get-health.schema.json
@@ -0,0 +1,16 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "GET request to query node health",
"title": "GetHealthResponse",
"type": "object",
"additionalProperties": false,
"required": ["matches_peers", "percent_of_blocks_synced"],
"properties": {
"matches_peers": {
"type": "boolean"
},
"percent_of_blocks_synced": {
"type": "integer"
}
}
}
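To make the schema concrete, a body that conforms to it can be rendered with the standard library alone. The `GetHealthResponse` struct and `to_json` method below are hypothetical (the node itself writes this body through its own JSON helper, `send_json`); the sketch only shows that both required fields are always emitted and nothing else is, consistent with `additionalProperties: false`.

```rust
/// Hypothetical Rust mirror of the `GetHealthResponse` schema.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GetHealthResponse {
    pub matches_peers: bool,
    /// The schema types this as an integer.
    pub percent_of_blocks_synced: u64,
}

impl GetHealthResponse {
    /// Render the body with exactly the two required properties.
    /// `bool` and `u64` both format as valid JSON literals, so no escaping is needed.
    pub fn to_json(&self) -> String {
        format!(
            "{{\"matches_peers\":{},\"percent_of_blocks_synced\":{}}}",
            self.matches_peers, self.percent_of_blocks_synced
        )
    }
}
```

Rendering `GetHealthResponse { matches_peers: true, percent_of_blocks_synced: 100 }` yields the same object as `get-health.example.json`, minus the whitespace.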
30 changes: 30 additions & 0 deletions docs/rpc/openapi.yaml
@@ -483,3 +483,33 @@ paths:
The Stacks chain tip to query from.
If tip == "latest", the query will be run from the latest known tip (includes unconfirmed state).
If the tip is left unspecified, the stacks chain tip will be selected (only includes confirmed state).

/v2/health:
get:
summary: Query the health of the node.
description: Get a boolean indicating whether the node is synced with its bootstrap nodes, and the percentage of
blocks the node has relative to its most advanced peer.
tags:
- Info
operationId: get_health
responses:
200:
description: Success
content:
application/json:
schema:
$ref: ./api/core-node/get-health.schema.json
example:
$ref: ./api/core-node/get-health.example.json
512:
description: The node is unhealthy, meaning it is lagging behind its most advanced peer. The
`percent_of_blocks_synced` field gives this node's block height as a percentage of the block height of its
most advanced peer.
content:
application/json:
schema:
$ref: ./api/core-node/get-health.schema.json
example:
$ref: ./api/core-node/get-health.example.json
513:
description: Failed to query for health (no data or no valid peers to query from).
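Because the three outcomes are distinguished by status code, a client can branch on the code before looking at the body. The enum and function below are a hypothetical client-side sketch with invented names; they map 200, 512, and 513 as documented in this spec and treat any other code as unexpected.

```rust
/// Hypothetical client-side view of a `/v2/health` response.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthOutcome {
    /// 200: the node is caught up with its most advanced initial peer.
    Healthy,
    /// 512: the node lags; the JSON body reports `percent_of_blocks_synced`.
    Lagging,
    /// 513: no data or no valid peers to query from.
    NoData,
    /// Any status code this spec does not list.
    Unexpected(u16),
}

/// Map an HTTP status code to the documented health outcome.
pub fn classify_health_status(status: u16) -> HealthOutcome {
    match status {
        200 => HealthOutcome::Healthy,
        512 => HealthOutcome::Lagging,
        513 => HealthOutcome::NoData,
        other => HealthOutcome::Unexpected(other),
    }
}
```

Keeping 512 and 513 distinct lets a probe tell a lagging node apart from one that cannot assess its own health.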
3 changes: 2 additions & 1 deletion src/net/chat.rs
@@ -306,7 +306,8 @@ pub struct ConversationP2P {
pub handshake_addrbytes: PeerAddress, // from handshake
pub handshake_port: u16, // from handshake
pub peer_heartbeat: u32, // how often do we need to ping the remote peer?
pub peer_expire_block_height: u64, // when does the peer's key expire?
// the burn block height at which the peer's key expires
pub peer_expire_block_height: u64,

pub data_url: UrlString, // where does this peer's data live? Set to a 0-length string if not known.

44 changes: 38 additions & 6 deletions src/net/db.rs
@@ -1275,7 +1275,7 @@ impl PeerDB {

if always_include_allowed {
// always include allowed neighbors, freshness be damned
let allow_qry = "SELECT * FROM frontier WHERE network_id = ?1 AND denied < ?2 AND (allowed < 0 OR ?3 < allowed) AND (peer_version & 0x000000ff) >= ?4".to_string();
let allow_qry = "SELECT * FROM frontier WHERE network_id = ?1 AND denied < ?2 AND (allowed < 0 OR ?3 < allowed) AND (peer_version & 0x000000ff) <= ?4".to_string();
let allow_args: &[&dyn ToSql] = &[
&network_id,
&u64_to_sql(now_secs)?,
@@ -1325,7 +1325,7 @@ impl PeerDB {
/// -- always include all allowed neighbors
/// -- never include denied neighbors
/// -- for neighbors that are neither allowed nor denied, sample them randomly as long as they're fresh.
pub fn get_initial_neighbors(
pub fn get_random_initial_neighbors(
conn: &DBConn,
network_id: u32,
network_epoch: u8,
@@ -1335,6 +1335,34 @@ impl PeerDB {
PeerDB::get_random_neighbors(conn, network_id, network_epoch, count, block_height, true)
}

pub fn get_valid_initial_neighbors(
conn: &DBConn,
network_id: u32,
peer_version: u32,
burn_block_height: u64,
) -> Result<Vec<Neighbor>, db_error> {
// UTC time
let now_secs = util::get_epoch_time_secs();

// we only select for peers with peer versions that are less than (or equal to)
// this node's peer version
let query = "SELECT * FROM frontier WHERE initial = 1 AND (allowed < 0 OR ?1 < allowed) \
AND network_id = ?2 AND denied < ?3 AND ?4 < expire_block_height \
AND (peer_version & 0x000000ff) <= ?5"
.to_string();

let args: &[&dyn ToSql] = &[
&u64_to_sql(now_secs)?,
&network_id,
&u64_to_sql(now_secs)?,
&u64_to_sql(burn_block_height)?,
&peer_version,
];

let initial_peers = query_rows::<Neighbor, _>(conn, &query, args)?;
Ok(initial_peers)
}

/// Get a randomized set of peers for walking the peer graph.
/// -- selects peers at random even if not allowed
pub fn get_random_walk_neighbors(
@@ -1667,17 +1695,21 @@ mod test {
)
.unwrap();

let n5 = PeerDB::get_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 5, 23455).unwrap();
let n5 =
PeerDB::get_random_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 5, 23455).unwrap();
assert!(are_present(&n5, &initial_neighbors));

let n10 = PeerDB::get_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 10, 23455).unwrap();
let n10 =
PeerDB::get_random_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 10, 23455).unwrap();
assert!(are_present(&n10, &initial_neighbors));

let n20 = PeerDB::get_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 20, 23455).unwrap();
let n20 =
PeerDB::get_random_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 20, 23455).unwrap();
assert!(are_present(&initial_neighbors, &n20));

let n15_fresh =
PeerDB::get_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 15, 23456 + 14).unwrap();
PeerDB::get_random_initial_neighbors(db.conn(), 0x9abcdef0, 0x78, 15, 23456 + 14)
.unwrap();
assert!(are_present(
&n15_fresh[10..15].to_vec(),
&initial_neighbors[10..20].to_vec()
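The `WHERE` clause in `get_valid_initial_neighbors` can be read as a predicate over a single `frontier` row. The in-memory sketch below is hypothetical: `FrontierRow` is an invented struct holding only the columns the query touches, the reading of `allowed` and `denied` as timestamps in seconds is inferred from the query's comparisons, and `now_secs` is passed in rather than read from the clock. It mirrors the SQL literally, including masking only the stored peer's version to its low byte before comparing it with the caller's `peer_version`.

```rust
/// Hypothetical subset of a `frontier` row, limited to the columns the query reads.
#[derive(Debug, Clone)]
pub struct FrontierRow {
    pub initial: bool,
    pub network_id: u32,
    /// Inferred: negative = allowed indefinitely, otherwise allowed until this time.
    pub allowed: i64,
    /// Inferred: denied until this time.
    pub denied: i64,
    pub expire_block_height: u64,
    pub peer_version: u32,
}

/// Mirrors the WHERE clause of `get_valid_initial_neighbors`, term by term.
pub fn is_valid_initial_neighbor(
    row: &FrontierRow,
    network_id: u32,
    peer_version: u32,
    burn_block_height: u64,
    now_secs: i64,
) -> bool {
    row.initial                                               // initial = 1
        && (row.allowed < 0 || now_secs < row.allowed)        // allowed < 0 OR ?1 < allowed
        && row.network_id == network_id                       // network_id = ?2
        && row.denied < now_secs                              // denied < ?3
        && burn_block_height < row.expire_block_height        // ?4 < expire_block_height
        && (row.peer_version & 0x0000_00ff) <= peer_version   // (peer_version & 0xff) <= ?5
}

/// Filter rows the way the query filters the table.
pub fn valid_initial_neighbors(
    rows: &[FrontierRow],
    network_id: u32,
    peer_version: u32,
    burn_block_height: u64,
    now_secs: i64,
) -> Vec<FrontierRow> {
    rows.iter()
        .filter(|r| is_valid_initial_neighbor(r, network_id, peer_version, burn_block_height, now_secs))
        .cloned()
        .collect()
}
```

Note that a row whose `expire_block_height` equals `burn_block_height` is excluded, because the query uses a strict `<`.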
81 changes: 80 additions & 1 deletion src/net/http.rs
@@ -159,6 +159,7 @@ lazy_static! {
static ref PATH_POST_MEMPOOL_QUERY: Regex =
Regex::new(r#"^/v2/mempool/query$"#).unwrap();
static ref PATH_OPTIONS_WILDCARD: Regex = Regex::new("^/v2/.{0,4096}$").unwrap();
static ref PATH_GET_HEALTH: Regex = Regex::new(r#"^/v2/health$"#).unwrap();
}

/// HTTP headers that we really care about
@@ -1602,6 +1603,7 @@ impl HttpRequestType {
&PATH_POST_MEMPOOL_QUERY,
&HttpRequestType::parse_post_mempool_query,
),
("GET", &PATH_GET_HEALTH, &HttpRequestType::parse_get_health),
];

// use url::Url to parse path and query string
@@ -2677,6 +2679,23 @@ impl HttpRequestType {
))
}

fn parse_get_health<R: Read>(
_protocol: &mut StacksHttp,
preamble: &HttpRequestPreamble,
_regex: &Captures,
_query: Option<&str>,
_fd: &mut R,
) -> Result<HttpRequestType, net_error> {
if preamble.get_content_length() != 0 {
return Err(net_error::DeserializeError(
"Invalid Http request: expected 0-length body for GetHealth".to_string(),
));
}
Ok(HttpRequestType::GetHealth(
HttpRequestMetadata::from_preamble(preamble),
))
}

pub fn metadata(&self) -> &HttpRequestMetadata {
match *self {
HttpRequestType::GetInfo(ref md) => md,
@@ -2705,6 +2724,7 @@ impl HttpRequestType {
HttpRequestType::MemPoolQuery(ref md, ..) => md,
HttpRequestType::FeeRateEstimate(ref md, _, _) => md,
HttpRequestType::ClientError(ref md, ..) => md,
HttpRequestType::GetHealth(ref md) => md,
}
}

@@ -2736,6 +2756,7 @@ impl HttpRequestType {
HttpRequestType::MemPoolQuery(ref mut md, ..) => md,
HttpRequestType::FeeRateEstimate(ref mut md, _, _) => md,
HttpRequestType::ClientError(ref mut md, ..) => md,
HttpRequestType::GetHealth(ref mut md) => md,
}
}

@@ -2909,6 +2930,7 @@ impl HttpRequestType {
ClientError::NotFound(path) => path.to_string(),
_ => "error path unknown".into(),
},
HttpRequestType::GetHealth(_md) => "/v2/health".to_string(),
}
}

@@ -2945,6 +2967,7 @@ impl HttpRequestType {
HttpRequestType::MemPoolQuery(..) => "/v2/mempool/query",
HttpRequestType::FeeRateEstimate(_, _, _) => "/v2/fees/transaction",
HttpRequestType::OptionsPreflight(..) | HttpRequestType::ClientError(..) => "/",
HttpRequestType::GetHealth(..) => "/v2/health",
}
}

@@ -3173,7 +3196,7 @@ impl HttpResponseType {
) -> Result<HttpResponseType, net_error> {
if preamble.status_code < 400 || preamble.status_code > 599 {
return Err(net_error::DeserializeError(
"Inavlid response: not an error".to_string(),
"Invalid response: not an error".to_string(),
));
}

@@ -3187,8 +3210,20 @@ impl HttpResponseType {
}

let mut error_text = String::new();
let mut json_val: serde_json::Value = Default::default();
fd.read_to_string(&mut error_text)
.map_err(net_error::ReadError)?;
if preamble.content_type == HttpContentType::JSON {
// error responses with a JSON body (e.g. 512 from /v2/health) carry structured data
json_val = serde_json::from_str(&error_text).map_err(|_| {
net_error::DeserializeError(
"Invalid response: unable to deserialize json response".to_string(),
)
})?;
}

let md = HttpResponseMetadata::from_preamble(request_version, preamble);
let resp = match preamble.status_code {
@@ -3199,6 +3234,8 @@ impl HttpResponseType {
404 => HttpResponseType::NotFound(md, error_text),
500 => HttpResponseType::ServerError(md, error_text),
503 => HttpResponseType::ServiceUnavailable(md, error_text),
512 => HttpResponseType::GetHealthError(md, json_val),
513 => HttpResponseType::GetHealthNoDataError(md, error_text),
_ => HttpResponseType::Error(md, preamble.status_code, error_text),
};
Ok(resp)
@@ -3423,6 +3460,7 @@ impl HttpResponseType {
&PATH_POST_MEMPOOL_QUERY,
&HttpResponseType::parse_post_mempool_query,
),
(&PATH_GET_HEALTH, &HttpResponseType::parse_get_health),
];

// use url::Url to parse path and query string
@@ -3958,6 +3996,21 @@ impl HttpResponseType {
))
}

fn parse_get_health<R: Read>(
_protocol: &mut StacksHttp,
request_version: HttpVersion,
preamble: &HttpResponsePreamble,
fd: &mut R,
len_hint: Option<usize>,
) -> Result<HttpResponseType, net_error> {
let health_info =
HttpResponseType::parse_json(preamble, fd, len_hint, MAX_MESSAGE_LEN as u64)?;
Ok(HttpResponseType::GetHealth(
HttpResponseMetadata::from_preamble(request_version, preamble),
health_info,
))
}

fn error_reason(code: u16) -> &'static str {
match code {
400 => "Bad Request",
@@ -4021,6 +4074,7 @@ impl HttpResponseType {
HttpResponseType::MemPoolTxs(ref md, ..) => md,
HttpResponseType::OptionsPreflight(ref md) => md,
HttpResponseType::TransactionFeeEstimation(ref md, _) => md,
HttpResponseType::GetHealth(ref md, _) => md,
// errors
HttpResponseType::BadRequestJSON(ref md, _) => md,
HttpResponseType::BadRequest(ref md, _) => md,
@@ -4031,6 +4085,8 @@ impl HttpResponseType {
HttpResponseType::ServerError(ref md, _) => md,
HttpResponseType::ServiceUnavailable(ref md, _) => md,
HttpResponseType::Error(ref md, _, _) => md,
HttpResponseType::GetHealthError(ref md, _) => md,
HttpResponseType::GetHealthNoDataError(ref md, _) => md,
}
}

@@ -4330,6 +4386,10 @@ impl HttpResponseType {
)?;
HttpResponseType::send_text(protocol, md, fd, "".as_bytes())?;
}
HttpResponseType::GetHealth(ref md, ref data) => {
HttpResponsePreamble::ok_JSON_from_md(fd, md)?;
HttpResponseType::send_json(protocol, md, fd, data)?;
}
HttpResponseType::BadRequestJSON(ref md, ref data) => {
HttpResponsePreamble::new_serialized(
fd,
@@ -4354,6 +4414,21 @@ impl HttpResponseType {
HttpResponseType::Error(_, ref error_code, ref msg) => {
self.error_response(fd, *error_code, msg)?
}
HttpResponseType::GetHealthError(ref md, ref data) => {
HttpResponsePreamble::new_serialized(
fd,
512,
HttpResponseType::error_reason(512),
md.content_length.clone(),
&HttpContentType::JSON,
md.request_id,
|ref mut fd| keep_alive_headers(fd, md),
)?;
HttpResponseType::send_json(protocol, md, fd, data)?;
}
HttpResponseType::GetHealthNoDataError(_, ref msg) => {
self.error_response(fd, 513, msg)?
}
};
Ok(())
}
@@ -4436,6 +4511,7 @@ impl MessageSequence for StacksHttpMessage {
HttpRequestType::GetAttachmentsInv(..) => "HTTP(GetAttachmentsInv)",
HttpRequestType::MemPoolQuery(..) => "HTTP(MemPoolQuery)",
HttpRequestType::OptionsPreflight(..) => "HTTP(OptionsPreflight)",
HttpRequestType::GetHealth(_) => "HTTP(GetHealth)",
HttpRequestType::ClientError(..) => "HTTP(ClientError)",
HttpRequestType::FeeRateEstimate(_, _, _) => "HTTP(FeeRateEstimate)",
},
@@ -4466,6 +4542,7 @@ impl MessageSequence for StacksHttpMessage {
HttpResponseType::MemPoolTxStream(..) => "HTTP(MemPoolTxStream)",
HttpResponseType::MemPoolTxs(..) => "HTTP(MemPoolTxs)",
HttpResponseType::OptionsPreflight(_) => "HTTP(OptionsPreflight)",
HttpResponseType::GetHealth(..) => "HTTP(GetHealth)",
HttpResponseType::BadRequestJSON(..) | HttpResponseType::BadRequest(..) => {
"HTTP(400)"
}
@@ -4479,6 +4556,8 @@ impl MessageSequence for StacksHttpMessage {
HttpResponseType::TransactionFeeEstimation(_, _) => {
"HTTP(TransactionFeeEstimation)"
}
HttpResponseType::GetHealthError(..) => "HTTP(512)",
HttpResponseType::GetHealthNoDataError(..) => "HTTP(513)",
},
}
}
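Because `parse_get_health` rejects any request with a non-zero content length, a client sends a bare `GET /v2/health`. The std-only sketch below is hypothetical: the function names, the `Connection: close` header, and reading only the status line are illustrative choices, and `host` is assumed to be the node's RPC address in `host:port` form. It builds the request, sends it over a `TcpStream`, and returns the numeric status code.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

/// Build a body-less GET request for `/v2/health` (hypothetical helper).
pub fn build_health_request(host: &str) -> String {
    format!(
        "GET /v2/health HTTP/1.1\r\nHost: {}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
        host
    )
}

/// Extract the numeric status code from a status line such as "HTTP/1.1 200 OK".
pub fn parse_status_code(status_line: &str) -> Option<u16> {
    let mut parts = status_line.split_whitespace();
    if !parts.next()?.starts_with("HTTP/") {
        return None;
    }
    parts.next()?.parse().ok()
}

/// Send the request and return the status code from the first response line.
pub fn fetch_health_status(host: &str) -> std::io::Result<Option<u16>> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(build_health_request(host).as_bytes())?;
    let mut status_line = String::new();
    BufReader::new(stream).read_line(&mut status_line)?;
    Ok(parse_status_code(&status_line))
}
```

The returned code can then be interpreted as 200 (healthy), 512 (lagging, with a JSON body), or 513 (no data).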