
Rewrite Task of listening Stellar Messages #545

Merged: 66 commits merged into main from rewrite-agent-vault on Oct 21, 2024

Conversation

@b-yap (Contributor) commented on Aug 1, 2024

Summary

We need to stop the vaults from getting stuck unnoticed, which is potentially caused by running unhandled zombie tasks.

One idea is to include the polling of the Stellar messages:

match overlay_conn.listen() {
	Ok(Some(msg)) => {
		let msg_as_str = to_base64_xdr_string(&msg);
		if let Err(e) =
			handle_message(msg, collector_clone.clone(), &sender_clone).await
		{ /* ... */ }
	},
	// ...
}

in the monitoring:
let tasks = self.create_tasks(
	startup_height,
	account_id,
	is_public_network,
	vault_public_key,
	oracle_agent,
	issue_map,
	ledger_env_map,
	memos_to_issue_ids,
)?;
run_and_monitor_tasks(self.shutdown.clone(), tasks).await

This means updating the OracleAgent's message_sender is delayed, and the OracleAgent passed to the tasks must be mutable; hence using Arc<RwLock<OracleAgent>> instead of Arc<OracleAgent> alone.
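For illustration, a minimal sketch of the Arc<RwLock<>> idea (the trimmed-down OracleAgent and the StellarMessageSender alias here are placeholders, not the real definitions):

use std::sync::Arc;
use tokio::sync::RwLock;

// hypothetical, trimmed-down stand-ins for the real types
type StellarMessageSender = tokio::sync::mpsc::Sender<Vec<u8>>;
struct OracleAgent {
	message_sender: Option<StellarMessageSender>,
}

async fn wire_up(agent: Arc<RwLock<OracleAgent>>, sender: StellarMessageSender) {
	// the sender only becomes available once the overlay is running, so the
	// agent has to be mutated *after* it was already shared with the tasks
	agent.write().await.message_sender = Some(sender);
}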

But we cannot have the current tasks STARTING TOGETHER WITH the polling task: the Stellar overlay has to be running already, and all open requests MUST finish first.

let oracle_agent =
	self.create_oracle_agent(is_public_network, self.shutdown.clone()).await?;
self.agent = Some(oracle_agent.clone());
self.execute_open_requests(oracle_agent.clone());

An idea is to introduce another variant of ServiceTask that waits for something to finish before the task starts. Prechecking will be required.

enum ServiceTask {
	...
	// Runs a task after a prerequisite check has passed.
	PrecheckRequired(Task),
}
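As a rough sketch of how such a precheck could be wired (assuming the broadcast-channel approach that appears later in this PR; all names here are illustrative):

use tokio::sync::broadcast;

#[tokio::main]
async fn main() {
	// one slot is enough: the precheck sends a single `()` when it is done
	let (precheck_signal, mut receiver) = broadcast::channel::<()>(1);

	// a PrecheckRequired-style task waits for the signal before its real work
	let gated_task = tokio::spawn(async move {
		if receiver.recv().await.is_ok() {
			println!("precheck passed, starting the listener task");
		}
	});

	// ... execute_open_requests(...) would run here ...
	let _ = precheck_signal.send(());
	let _ = gated_task.await;
}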

And to make sure the stellar-overlay and the client are communicating well, the stellar-overlay will also send the client:

  • a hello message - useful to signal the client to prepare itself
  • a lib-related error - this way the client will not wait forever if the stellar-overlay has issues.

How to start reviewing:

stellar-relay-lib

  • Reduce clones in any way possible
    • use pass-by-value instead, to avoid cloning references
      • from (&param) to (param)
    • reuse variables that were passed by reference
      • made obvious in message_reader.rs file
    • clone only the needed field, instead of the whole structure
  • Return a lib-specific error and hello message
    • To signal the user (the agent) whether the relay has started, and whether the lib itself encountered a problem
  • Reposition timeouts and yields wisely
  • Let the user do the polling/looping

vault

  • Removed err-derive dependency, and just use the existing thiserror dependency.
  • Removed unused errors, like the RpcError
  • Added tokio_spawn for naming tasks. This is only viable with tokio_unstable.
  • Increase the event buffer capacity of the console-subscriber
  • OracleAgent adjustment
    • Instead of Arc<OracleAgent>, it will be Arc<RwLock<OracleAgent>>
      • initializing the field message_sender is moved a bit later in the code; hence we need OracleAgent to be mutable.
  • Listening for stellar messages will be monitored, alongside other tasks.
    • Every minute, a health check will be logged, to ensure messages are coming from the overlay
    • When receiving the hello message, flag the proof building.
      • Continue to log messages until the first slot is saved. Afterwards, this flag is not needed anymore.
  • proof_builder.rs
    • Build proof ONLY if it's ready (if a slot was saved, meaning the overlay is running)
  • system.rs
    • Added another variant of ServiceTask: PrecheckRequired
      • Executing open requests is required before running some tasks; once finished, these tasks will start.
        The following tasks need this precheck:
        * issue::listen_for_issue_requests
        * issue::listen_for_issue_cancels
        * issue::listen_for_executed_issues
        * CancellationScheduler for Issue
        * CancellationScheduler for Replace
        * listen_for_replace_requests
        * listen_for_accept_replace
        * active_block_listener

I will add relevant comments when necessary.

where
	F: Future<Output = Result<(), E>>,
{
	if let Some(mut precheck_signal) = precheck_signal {
		if let Err(e) = precheck_signal.recv().await {
Contributor Author:

start the thread ONLY if precheck is finished.

Member:

Should we add a form of timeout here so that it's easier to find tasks that never start because the precheck doesn't resolve?

Contributor Author:

I forgot to reply to this.
I did not use a timeout, but try_recv(); if it's not ready, the loop continues.
The loop only ends on success OR when an error happens.
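Roughly, the polling described here looks like this (a sketch assuming a tokio broadcast receiver and a yield to keep the loop cooperative, which comes up again further down in the review; not the exact PR code):

use tokio::sync::broadcast::{error::TryRecvError, Receiver};

async fn wait_for_precheck(mut precheck_signal: Receiver<()>) -> Result<(), TryRecvError> {
	loop {
		match precheck_signal.try_recv() {
			// precheck finished: the gated task may start
			Ok(()) => return Ok(()),
			// not ready yet: keep looping, but let other tasks run first
			Err(TryRecvError::Empty) => tokio::task::yield_now().await,
			// sender dropped or receiver lagged: give up with the error
			Err(e) => return Err(e),
		}
	}
}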

@@ -35,13 +35,13 @@ impl StellarOverlayConfig {

#[allow(dead_code)]
pub(crate) fn node_info(&self) -> NodeInfo {
self.node_info.clone().into()
NodeInfo::new(&self.node_info)
Contributor Author:

reduce clones

@@ -340,7 +340,7 @@ mod test {
cert: new_auth_cert,
nonce: [0; 32],
};
connector.set_remote(RemoteInfo::new(&hello));
connector.set_remote(RemoteInfo::new(hello));
Contributor Author:

Consume hello

Ok(None) => continue,
Ok(Some(xdr)) => return Ok(xdr),
Ok(false) => continue,
Ok(true) => return Ok(readbuf),
Contributor Author:

instead of returning a "cloned" readbuf, reuse it.

@@ -180,7 +183,7 @@ async fn read_message(

// only when the message has the exact expected size bytes, should we send to user.
if actual_msg_len == xpect_msg_len {
return Ok(Some(readbuf.clone()))
Contributor Author:

instead of sending back a cloned value, reuse the readbuf.

@@ -19,6 +19,19 @@ pub struct NodeInfo {
pub network_id: NetworkId,
}

impl NodeInfo {
pub(crate) fn new(cfg:&NodeInfoCfg) -> Self {
Contributor Author:

Eliminate an all-out clone of NodeInfoCfg. We only need to clone one field: the version_str.
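A minimal sketch of the idea; the field names other than version_str are illustrative assumptions, not the crate's actual layout:

struct NodeInfoCfg {
	version_str: String,
	ledger_version: u32,
}
struct NodeInfo {
	version_str: String,
	ledger_version: u32,
}

impl NodeInfo {
	pub(crate) fn new(cfg: &NodeInfoCfg) -> Self {
		NodeInfo {
			// the only heap-allocated field still needs a clone...
			version_str: cfg.version_str.clone(),
			// ...while Copy fields are read straight from the borrowed config
			ledger_version: cfg.ledger_version,
		}
	}
}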

@@ -36,11 +36,11 @@ impl Debug for RemoteInfo {
}

impl RemoteInfo {
pub fn new(hello: &Hello) -> Self {
pub fn new(hello: Hello) -> Self {
Contributor Author:

reduce clone by consuming Hello

@@ -65,8 +73,7 @@ impl StellarOverlayConnection {
return Err(Error::Disconnected)
}

timeout(Duration::from_secs(1), self.receiver.recv()).await
.map_err(|_| Error::Timeout)
Ok(self.receiver.recv().await)
Contributor Author:

Remove extra timeout. Timeouts can be triggered from inside the polling.

F::Output: Send + 'static,
{
cfg_if::cfg_if!{
if #[cfg(all(tokio_unstable, feature = "allow-debugger"))] {
Contributor Author:

Allow naming tasks if the allow-debugger feature is on.
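A hedged sketch of what the described tokio_spawn helper might look like (this signature is an assumption; the PR's actual helper may differ):

use std::future::Future;
use tokio::task::JoinHandle;

pub fn tokio_spawn<F>(name: &str, future: F) -> JoinHandle<F::Output>
where
	F: Future + Send + 'static,
	F::Output: Send + 'static,
{
	cfg_if::cfg_if! {
		if #[cfg(all(tokio_unstable, feature = "allow-debugger"))] {
			// named tasks show up in tokio-console; this needs the tokio_unstable cfg
			return tokio::task::Builder::new()
				.name(name)
				.spawn(future)
				.expect("spawning a named task should not fail");
		} else {
			// without the debugger feature the name is simply ignored
			let _ = name;
			return tokio::spawn(future);
		}
	}
}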

Member:

Do we need to add this check? Or can we just always use the builder pattern and in case the allow-debugger feature is not enabled, the name just wouldn't show?


pub is_public_network: bool,
/// sends message directly to Stellar Node
message_sender: Option<StellarMessageSender>,
/// sends an entire Vault shutdown
shutdown_sender: ShutdownSender,
/// sends a 'stop' signal to `StellarOverlayConnection` poll
overlay_conn_end_signal: mpsc::Sender<()>,
Contributor Author:

We do not need this signal anymore; OverlayConnection already implements the Drop trait.
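For illustration, the Drop-based cleanup could look roughly like this (the field name and signalling mechanism are assumptions, not the actual StellarOverlayConnection internals):

struct StellarOverlayConnection {
	// tells the background poller to stop; assumed field name
	disconnect_signal: tokio::sync::mpsc::Sender<()>,
}

impl Drop for StellarOverlayConnection {
	fn drop(&mut self) {
		// best effort: if the poller is already gone there is nothing left to stop
		let _ = self.disconnect_signal.try_send(());
	}
}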

/// Start the connection to the Stellar Node.
/// Returns an `OracleAgent` that will handle incoming messages from Stellar Node,
/// and to send messages to Stellar Node
pub async fn start_oracle_agent(
Contributor Author:

Renamed to listen_for_stellar_messages(...), which returns nothing. This is an added task to be monitored in tokio_metrics.

},
}},
// log a new message received, every 1 minute.
let interval = Duration::from_secs(60);
Contributor Author:

This is intentional, to keep logging every minute to show whether the vault is stuck or not.

Member:

Let's move the 60 to an extra constant variable instead of using a magic number. And let's increase it to something like 10 minutes. Otherwise the logs will get bloated quickly just by this health check.
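Something along these lines would address the nit (the constant name, the helper and the exact 10-minute value are assumptions):

use std::time::{Duration, Instant};

/// How often the "still receiving messages" health check is logged.
const HEALTH_CHECK_LOG_INTERVAL: Duration = Duration::from_secs(10 * 60);

/// Call this on every received message; it logs at most once per interval.
fn maybe_log_health_check(last_logged: &mut Instant) {
	if last_logged.elapsed() >= HEALTH_CHECK_LOG_INTERVAL {
		tracing::info!("listen_for_stellar_messages(): messages still arriving from the overlay");
		*last_logged = Instant::now();
	}
}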

let sender_clone = overlay_conn.sender();
tokio::spawn(async move {
loop {
tokio::select! {
Contributor Author:

Removed tokio::select!, since this task will close based on run_and_monitor_tasks().

}

/// Stops listening for new SCP messages.
pub async fn shutdown(&self) {
Contributor Author:

This is already unnecessary; the Drop trait will handle dropping connections.

Ok(())
}
#[cfg(any(test, feature = "integration"))]
pub async fn start_oracle_agent(
Contributor Author:

This is only for testing purposes

"handle_message(): First slot saved: {slot}. Ready to build proofs "
);
}
*is_proof_building_ready = None;
Contributor Author:

proof building is ready; we do not need this flag/signal anymore.

@@ -16,16 +16,6 @@ use crate::oracle::{
ScpArchiveStorage, ScpMessageCollector, TransactionsArchiveStorage,
};

/// Returns true if the SCP messages for a given slot are still recoverable from the overlay
/// because the slot is not too far back.
fn check_slot_still_recoverable_from_overlay(last_slot_index: Slot, slot: Slot) -> bool {
Contributor Author:

Moved this to the bottom of the file.

///
/// * `slot` - the slot where the txset is to get.
/// * `sender` - used to send messages to Stellar Node
pub async fn build_proof(&self, slot: Slot, sender: &StellarMessageSender) -> Option<Proof> {
Contributor Author:

Moved this to the top of the file.


// check if both the config file and the wallet are the same.
if is_public_network != stellar_overlay_cfg.is_public_network() {
return Err(ServiceError::IncompatibleNetwork);
}

let oracle_agent = crate::oracle::start_oracle_agent(
Contributor Author:

We are now starting the StellarRelay inside the monitored tasks, NOT here.

@b-yap b-yap requested a review from a team October 8, 2024 15:24
@b-yap b-yap marked this pull request as ready for review October 8, 2024 15:24
@@ -18,6 +18,8 @@ runs:
token: ${{ inputs.token }}

- name: Install Rust Nightly
uses: ./.github/actions/install-rust-nightly
Contributor Author:

This GitHub action was using the deprecated set-output:

The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files.

Replaced it with the dtolnay action.

Member @ebma left a comment:

This is a great improvement, thanks a lot @b-yap! We need to adjust some of the log levels because some messages are a bit spammy for 'regular' vault operators.

Some experiments

I ran the amplitude USDC #1 vault locally to check the resource consumption with tokio-console.

tokio-console after ~1.5 h (run #1)
[screenshot]

CPU usage after ~1.5 h (84% is quite a lot; almost a full single core of my system consumed by the vault):
[screenshot]


After a while it seemed like tokio-console was not updating properly anymore, so I was curious whether tokio-console had stopped working and I tried restarting it, but it is not able to connect again. I tested whether the vault is still operational by creating fresh issue and redeem requests against it and, indeed, it was able to pick them up and complete them as expected. So maybe it's just some bug in tokio-console that makes it unable to connect after a while.

Unable to connect:
[screenshot]

Proof that it's still able to process issue and redeem requests even though tokio-console shows blank:
[screenshot]


To check the CPU usage, I then ran the vault a second time.

CPU usage after 5 min:
[screenshot]

We can see that in the beginning (before having received/processed any issue/redeem request) the % CPU is at ~2%-3%.

I then ran the spacewalk-testing-service, creating first one issue request and then one redeem request.

This increased the % CPU to ~5%:
[screenshot]

I kept monitoring this for about 10 minutes, and from then on the % CPU fluctuated between 4% and 10%. It's hard to say for sure, but it might suggest that the resource consumption increases after an issue or redeem request is processed, and stays more or less constant before that.

After 20 minutes it stayed almost always above 10%.
tokio-console after 20 minutes:
[screenshot]

I'll try to experiment with this again tomorrow. I still don't understand how the CPU consumption constantly increases over time. But at least we don't have any deadlocked tasks anymore, and every task apparently yields properly, which is a great improvement over what we had before.

send_to_user_sender,
send_to_node_receiver,
))
.unwrap();
Member:

Nit: let's expect() here

/// Insert envelopes fetched from the archive to the map
///
/// # Arguments
///
/// * `envelopes_map_lock` - the map to insert the envelopes to.
/// * `slot` - the slot where the envelopes belong to
fn get_envelopes_from_horizon_archive(&self, slot: Slot) -> impl Future<Output = ()> {
tracing::debug!("get_envelopes_from_horizon_archive(): Fetching SCP envelopes from horizon archive for slot {slot}...");
tracing::info!("get_envelopes_from_horizon_archive(): Fetching SCP envelopes from horizon archive for slot {slot}...");
Member:

Suggested change
tracing::info!("get_envelopes_from_horizon_archive(): Fetching SCP envelopes from horizon archive for slot {slot}...");
tracing::debug!("get_envelopes_from_horizon_archive(): Fetching SCP envelopes from horizon archive for slot {slot}...");


@@ -158,11 +171,12 @@ impl ScpMessageCollector {
match tx_set {
Some(res) => Some(res),
None => {
tracing::info!("get_txset(): FOR SLOT {slot} check_slot_still_recoverable_from_overlay: LAST SLOT INDEX: {}",self.last_slot_index());
Member:

Suggested change
tracing::info!("get_txset(): FOR SLOT {slot} check_slot_still_recoverable_from_overlay: LAST SLOT INDEX: {}",self.last_slot_index());
tracing::debug!("get_txset(): FOR SLOT {slot} check_slot_still_recoverable_from_overlay: LAST SLOT INDEX: {}",self.last_slot_index());


return None;
}

/// fetch envelopes not found in the collector
async fn _get_envelopes(&self, slot: Slot, sender: &StellarMessageSender) {
tracing::info!("_get_envelopes(): FOR SLOT {slot} check_slot_still_recoverable_from_overlay: LAST SLOT INDEX: {}",self.last_slot_index());
Member:

Suggested change
tracing::info!("_get_envelopes(): FOR SLOT {slot} check_slot_still_recoverable_from_overlay: LAST SLOT INDEX: {}",self.last_slot_index());
tracing::debug!("_get_envelopes(): FOR SLOT {slot} check_slot_still_recoverable_from_overlay: LAST SLOT INDEX: {}",self.last_slot_index());

@@ -156,19 +206,20 @@ impl OracleAgent {

timeout(Duration::from_secs(timeout_seconds), async move {
loop {
tracing::info!("get_proof(): attempt to build proof for slot {slot}");
Member:

Suggested change
tracing::info!("get_proof(): attempt to build proof for slot {slot}");
tracing::debug!("get_proof(): attempt to build proof for slot {slot}");

} else {
tokio::spawn(self.get_txset_from_horizon_archive(slot));
self.get_txset_from_horizon_archive(slot).await;
Contributor:

So we are now awaiting this instead of spawning it as a new task as before. If I understand correctly, this means build_proof will now wait here, and the loop won't continue and call build_proof again. Why don't we need that behavior anymore?

Contributor Author @b-yap (Oct 11, 2024):

Before using tokio-console: when the vaults hang, the last logs are in the build_proof section, getting data from the archive.

I wanted to eliminate as many unnecessary tasks as possible, especially since these are not monitored at all. Should the original tokio::spawn(self.get_txset_from_horizon_archive(slot)); become a zombie, there is nothing to stop that task.

The time spent looping build_proof continuously while the map is empty
VS
The time spent waiting for the archive to fill the map

is just the same.

Contributor:

I see! It makes sense; in fact it makes much more sense, since get_txset_from_horizon_archive will return once the tx_set is found. But before, we were potentially spawning many instances of get_txset_from_horizon_archive for the same slot; I thought maybe we needed several of these tasks, or that we needed to continue and return a proof with None set.

Member @ebma left a comment:

Sorry for leaving more comments yet again. I pointed out more logs that I think we should turn to the debug log level instead.

Besides that, I encountered an error when running the vault locally. It happened on start:

thread 'tokio-runtime-worker' panicked at clients/runtime/src/rpc.rs:296:49:
called `Result::unwrap()` on an `Err` value: Rpc(ClientError(MaxSlotsExceeded))

It's unrelated to the changes of this PR but @b-yap if it's not too complex, maybe you could try to replace all the unwrap() calls in clients/runtime/src/rpc.rs as part of this as well? 😬


I also want to share good news though. Further experiments I did locally showed that the CPU usage only piles up when running the vault with --features allow-debugger, i.e. support for debugging with tokio-console. Without that flag, the client's resource consumption remains quite low at ~1-4% when idling. It seems like the refactoring indeed fixed our bugs with tokio, and we can be optimistic that these changes greatly improve the performance once live in production (without the debugging enabled).

tracing::info!(
"listen_for_stellar_messages(): received hello message from Stellar"
);
is_proof_building_ready = Some(true);
Member:

I'm not sure I understand why we need this variable. Also, with Option it can now be three things: None, Some(true) and Some(false). As far as I can tell, we only check for Some(true) and never for None, so it should be fine if we don't make it an Option type but a simple boolean, as we apparently only need the two cases.

payment_margin: Duration,
precheck_signal: tokio::sync::broadcast::Sender<()>,
Member:

Nit: this is missing in the doc comment above.

@@ -26,6 +26,7 @@ pub(crate) async fn initialize_issue_set(
issue_set: &ArcRwLock<IssueRequestsMap>,
memos_to_issue_ids: &ArcRwLock<IssueIdLookup>,
) -> Result<(), Error> {
tracing::info!("initialize_issue_set(): started");
Member:

Nit: make this and the other similar logs debug

@@ -96,6 +96,8 @@ pub async fn listen_for_replace_requests(
event_channel: Sender<Event>,
accept_replace_requests: bool,
) -> Result<(), ServiceError<Error>> {
tracing::info!("listen_for_replace_requests(): started");
Member:

Suggested change
tracing::info!("listen_for_replace_requests(): started");
tracing::debug!("listen_for_replace_requests(): started");

@@ -1163,17 +1158,23 @@ async fn test_execute_open_requests_succeeds() {
// add it to the set
sleep(Duration::from_secs(5)).await;

let (precheck_signal, mut rceiver) = tokio::sync::broadcast::channel(1);
Member:

Suggested change
let (precheck_signal, mut rceiver) = tokio::sync::broadcast::channel(1);
let (precheck_signal, mut receiver) = tokio::sync::broadcast::channel(1);

Comment on lines +190 to +191
Err(TryRecvError::Empty) =>
tracing::trace!("wait_or_shutdown precheck signal: waiting..."),
Member:

I think this is problematic because we are again creating tasks that never yield, potentially causing issues for the scheduling of other tasks.

Suggested change
Err(TryRecvError::Empty) =>
tracing::trace!("wait_or_shutdown precheck signal: waiting..."),
Err(TryRecvError::Empty) => {
	tracing::trace!("wait_or_shutdown precheck signal: waiting...");
	sleep(Duration::from_millis(100)).await; // Yield to the runtime
}

Contributor:

Or we can use precheck_signal.recv().await? This one is async.

Contributor Author:

await will not yield; it will potentially wait forever. I've yielded instead of the sleep.

Contributor:

I think if the receiver precheck_signal has already consumed the message, then the next time recv().await is called it will yield until a new one arrives. Or do you think there will always be a message in the broadcast channel?

This implementation is equivalent though. But with try_recv we do need to yield, that's true.

Contributor Author:

do you think there will always be a message in the broadcast channel?

It will only send 1 () message. See lines 335-337 in clients/vault/src/requests/execution.rs.

The first try was to use recv().await but the possibility of it waiting forever is there.

However, we also cannot move forward with the task anyway if the precheck (or execute_open_requests) is not finished.
I also would not like to wrap the recv() in a timeout(), because open requests might really take longer than the timeout we set. Also, a timeout is another thread; that's 8 tasks requiring prechecks, meaning 8 additional threads. (See Marcel's comment for the image of these tasks: Issue Cancel Listener, Issue Request Listener, etc.)

Contributor Author:

For readability, reverting to recv().await would be better. I would add that back, but we have to risk it awaiting.
I think execute_open_requests() is reliable enough to be able to send the message fast.

Contributor @gianfra-t left a comment:

Super nice (and involved😅) refactor 💪! And I'm glad it solved the hanging and apparently also the high CPU usage issue. Thanks @b-yap for going back to my questions and comments.

Member @ebma left a comment:

Great that you found a way to get rid of the ArcRwLock on the oracle agent; it improves the readability a lot 🙏

async fn get_latest_storage(
&self,
) -> Storage<SpacewalkRuntime, OnlineClient<SpacewalkRuntime>> {
self.api.storage().at_latest().await.expect("failed to get latest storage")
Member:

Can we map this to an error instead of expecting or would that require a lot of refactoring?

Member:

I think we unfortunately cannot expect this to never fail, so it would be very good if we can handle this error gracefully.

Contributor Author:

Wrapped it inside Error::SubxtRuntimeError().
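A sketch of that change, assuming Error::SubxtRuntimeError wraps the subxt error (shown only as a fragment of the method from the diff above):

async fn get_latest_storage(
	&self,
) -> Result<Storage<SpacewalkRuntime, OnlineClient<SpacewalkRuntime>>, Error> {
	// propagate the failure instead of panicking with expect()
	self.api.storage().at_latest().await.map_err(Error::SubxtRuntimeError)
}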

@b-yap b-yap merged commit ea7ef1d into main Oct 21, 2024
4 checks passed
@b-yap b-yap deleted the rewrite-agent-vault branch October 21, 2024 13:32
@b-yap (Contributor Author) commented Oct 21, 2024:

@ebma @gianfra-t CI passed. Merging this.
