[Transaction Stream Service] [CACHE WORKER] Failed to update the latest version in the cache. Version is not right. #648

Open
lukasz-layerzerolabs opened this issue Dec 13, 2024 · 4 comments
Labels
transaction-stream-service Issues relating to the Transaction Stream Service

Comments

@lukasz-layerzerolabs

Description

The cache worker fails to sync from the PFN: it runs for a few hours and then fails on seemingly random version(s). I am not sure this is a bug, because the version check looks like intentional health-check logic, but is there a way to work around the version check in Redis or the cache worker to keep the cache worker going?
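
For context, the error text ("The version is beyond the next expected version") suggests a contiguity check on the latest cached version. Below is a minimal Python sketch of my reading of that message. It is not the actual cache_operator.rs logic; the class name, method name, status strings, and version numbers are all hypothetical and only illustrate the idea.

```python
class LatestVersionTracker:
    """Toy stand-in for the 'latest version' value kept in the cache (hypothetical)."""

    def __init__(self, latest_version: int = 0) -> None:
        # The next version the cache expects to receive.
        self.latest_version = latest_version

    def update(self, start_version: int, num_transactions: int) -> str:
        """Accept a batch only if it starts exactly at the expected version."""
        if start_version > self.latest_version:
            # Gap: some versions before this batch were never recorded.
            return "beyond_next_expected"
        if start_version < self.latest_version:
            # Overlap: this batch starts inside data that was already recorded.
            return "already_cached"
        # Contiguous batch: advance the expected version past it.
        self.latest_version = start_version + num_transactions
        return "ok"


tracker = LatestVersionTracker(latest_version=1000)
print(tracker.update(1000, 32))  # contiguous batch, accepted
print(tracker.update(2000, 32))  # skips ahead, rejected as a gap
```

If the real check works roughly like this, a single rejected update would be enough to stop the worker, which would match the panic in the logs below.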

Repro

  1. I synced the Aptos ledger using aptos node bootstrap-db --ledger-history-start-version 0 --command-adapter-config gcs.yaml --target-db-dir data/db --metadata-cache-dir meta (Aptos CLI 4.6.0).
  2. I am successfully running a public archival fullnode (aptoslabs/validator:aptos-node-v1.24.2). The node is healthy and progressing fine with public peers.
% cat fullnode.yaml
...
storage:
  enable_indexer: true
  storage_pruner_config:
    ledger_pruner_config:
      enable: false

state_sync:
  state_sync_driver:
    bootstrapping_mode: ExecuteOrApplyFromGenesis
    continuous_syncing_mode: ExecuteTransactionsOrApplyOutputs

indexer_grpc:
  enabled: true
  address: 0.0.0.0:50051
  processor_task_count: 20
  processor_batch_size: 1000
  output_batch_size: 1000

indexer_table_info:
  parser_task_count: 20
  parser_batch_size: 1000
  table_info_service_mode:
    IndexingOnly
....
% curl https://.../v1/ 
{
  "chain_id": 1,
  "epoch": "9565",
  "ledger_version": "2042386383",
  "oldest_ledger_version": "0",
  "ledger_timestamp": "1734089606798667",
  "node_role": "full_node",
  "oldest_block_height": "0",
  "block_height": "264269483",
  "git_hash": "5279eaf03dd910ad753a68c134b6d0e6cbce3f11"
}
  3. I am running TSS (redis, file-store, cache-worker, data-service) with default configs. TSS is pulling the data from the PFN.
% cat cache-worker-config.yaml
health_check_port: 8082
server_config:
  fullnode_grpc_address: http://aptos.aptos.svc.cluster.local:50051  # points to the PFN gRPC endpoint from above
  file_store_config:
    file_store_type: LocalFileStore
    local_file_store_path: /opt/aptos/data/file-store
  redis_main_instance_address: redis://localhost:6379
...
        - name: redis
          image: redis:7.2
          command: ["redis-server", "--appendonly", "yes"]
          volumeMounts:
            - name: aptos-indexer-tss-pvc
              mountPath: /opt/aptos
          resources:
            requests:
              cpu: "1"
              memory: "32Gi"
            limits:
              cpu: "8"
              memory: "64Gi"
          ports:
            - containerPort: 6379
              name: redis

        - name: cache-worker
          image: aptoslabs/indexer-grpc:aptos-indexer-grpc-v1.8.0
          command: ["/usr/local/bin/aptos-indexer-grpc-cache-worker", "--config-path", "/opt/aptos/etc/cache-worker-config.yaml"]
          volumeMounts:
            - name: aptos-indexer-tss-pvc
              mountPath: /opt/aptos
          resources:
            requests:
              cpu: "2"
              memory: "16Gi"
            limits:
              cpu: "8"
              memory: "64Gi"
  4. TSS works fine for a few hours, but after that the cache-worker simply fails. Restarting does not help. Bootstrapping TSS from version 0 (removing the LocalFileStore) is the only way to make the cache-worker start again, but it never manages to fully sync.
image: aptoslabs/indexer-grpc:aptos-indexer-grpc-v1.8.0
...
{"timestamp":"2024-12-13T10:29:42.182322Z","level":"INFO","message":"[Indexer Cache] Processed transactions in a batch.","start_version":295447968,"end_version":295447999,"start_txn_timestamp_iso":"2023-10-14T03:13:07.953586000Z","end_txn_timestamp_iso":"2023-10-14T03:13:10.995598000Z","num_transactions":32,"duration_in_secs":0.00305522,"size_in_bytes":292211,"service_type":"cache_worker","step":"3","filename":"ecosystem/indexer-grpc/indexer-grpc-utils/src/counters.rs","line_number":263,"threadName":"tokio-runtime-worker","threadId":"ThreadId(7)"}
{"timestamp":"2024-12-13T10:29:42.219983Z","level":"ERROR","message":"Redis latest version update failed. The version is beyond the next expected version.","version":295448000,"filename":"ecosystem/indexer-grpc/indexer-grpc-utils/src/cache_operator.rs","line_number":361,"threadName":"tokio-runtime-worker","threadId":"ThreadId(8)"}
details = """
panicked at ecosystem/indexer-grpc/indexer-grpc-cache-worker/src/lib.rs:59:14:
Cache worker failed: Failed to run cache worker

Caused by:
    0: Failed to update the latest version in the cache
    1: Version is not right."""
backtrace = """
   0:     0x5bbcd91f7693 - backtrace::backtrace::libunwind::trace::h080984fce8cc0d2e
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/backtrace/libunwind.rs:93:5
                           backtrace::backtrace::trace_unsynchronized::h5ad767334a9176ec
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/backtrace/mod.rs:66:5
                           backtrace::backtrace::trace::h80b35d796876c071
{"timestamp":"2024-12-13T10:29:42.590877Z","level":"ERROR","message":"details = \"\"\"\npanicked at ecosystem/indexer-grpc/indexer-grpc-cache-worker/src/lib.rs:59:14:\nCache worker failed: Failed to run cache worker\n\nCaused by:\n    0: Failed to update the latest version in the cache\n    1: Version is not right.\"\"\"\nbacktrace = \"\"\"\n   0:     0x5bbcd91f7693 - backtrace::backtrace::libunwind::trace::h080984fce8cc0d2e\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/backtrace/libunwind.rs:93:5\n                           backtrace::backtrace::trace_unsynchronized::h5ad767334a9176ec\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/backtrace/mod.rs:66:5\n                           backtrace::backtrace::trace::h80b35d796876c071\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/backtrace/mod.rs:53:14\n                           backtrace::capture::Backtrace::create::h432de5c9b6d787e4\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/capture.rs:176:9\n                           backtrace::capture::Backtrace::new::h145127d7ab18bed0\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/capture.rs:140:22\n   1:     0x5bbcd905f31f - aptos_indexer_grpc_server_framework::handle_panic::h7ee4abe32678cbcf\n                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:147:38\n                           aptos_indexer_grpc_server_framework::setup_panic_handler::{{closure}}::hc52788d2d6fa2b71\n                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:139:9\n   2:     0x5bbcd989d1e0 - <alloc::boxed::Box<F,A> as 
core::ops::function::Fn<Args>>::call::h022ca2c0d8c21c9e\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2034:9\n                           std::panicking::rust_panic_with_hook::h0ad14d90dcf5224f\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:783:13\n   3:     0x5bbcd989cf22 - std::panicking::begin_panic_handler::{{closure}}::h4a1838a06f542647\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:657:13\n   4:     0x5bbcd989bbb6 - std::sys_common::backtrace::__rust_end_short_backtrace::h77cc4dc3567ca904\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18\n   5:     0x5bbcd989cc54 - rust_begin_unwind\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5\n   6:     0x5bbcd98c4375 - core::panicking::panic_fmt::h940d4fd01a4b4fd1\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14\n   7:     0x5bbcd98c48d3 - core::result::unwrap_failed::h5119205a73b72b0d\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:1654:5\n   8:     0x5bbcd8f85a2a - core::result::Result<T,E>::expect::h8f3dcbe5b9a7fe0c\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:1034:23\n                           <aptos_indexer_grpc_cache_worker::IndexerGrpcCacheWorkerConfig as aptos_indexer_grpc_server_framework::RunnableConfig>::run::{{closure}}::h2a723299215dd164\n                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-cache-worker/src/lib.rs:55:9\n   9:     0x5bbcd8e9e9ec - <core::pin::Pin<P> as core::future::future::Future>::poll::heb20ac6bd53f7719\n                               
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9\n                           <aptos_indexer_grpc_server_framework::GenericConfig<T> as aptos_indexer_grpc_server_framework::RunnableConfig>::run::{{closure}}::hc503538106878c6f\n                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:93:34\n  10:     0x5bbcd8e22bdd - <core::pin::Pin<P> as core::future::future::Future>::poll::heb20ac6bd53f7719\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9\n                           aptos_indexer_grpc_server_framework::run_server_with_config::{{closure}}::{{closure}}::h93206b5da4dce421\n                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:53:48\n  11:     0x5bbcd8e30ab3 - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::hd6f1182b016c639a\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:328:17\n                           tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h6f475a1a078de812\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16:9\n                           tokio::runtime::task::core::Core<T,S>::poll::h45ffe6a80449524c\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:317:30\n  12:     0x5bbcd8e5af69 - tokio::runtime::task::harness::poll_future::{{closure}}::h6abb721a3381fd7e\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:485:19\n                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h69db49d14914124c\n                              
 at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9\n                           std::panicking::try::do_call::h50df100727ed1f92\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40\n                           std::panicking::try::h845b3877ead1dc5a\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19\n                           std::panic::catch_unwind::h778a0a10320547e4\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14\n                           tokio::runtime::task::harness::poll_future::hf8bfe606da5a32d6\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473:18\n                           tokio::runtime::task::harness::Harness<T,S>::poll_inner::h761f12616944196d\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208:27\n                           tokio::runtime::task::harness::Harness<T,S>::poll::h03590ec51310230a\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153:15\n  13:     0x5bbcd9825d1b - tokio::runtime::task::raw::RawTask::poll::h61b671be96e13913\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201:18\n                           tokio::runtime::task::LocalNotified<S>::run::h6c76a5caa54aacd6\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:416:9\n                           tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}::hcba3fb3752c458ad\n                  
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:639:22\n                           tokio::runtime::coop::with_budget::h752987771c6b1853\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5\n                           tokio::runtime::coop::budget::h0f69edb4cf651a67\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5\n                           tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h02d504edf66d84fb\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:575:9\n  14:     0x5bbcd9824a88 - tokio::runtime::scheduler::multi_thread::worker::Context::run::h4eadb18db878e49d\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:526:24\n  15:     0x5bbcd983fb20 - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}::hce853457818b4267\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:491:21\n                           tokio::runtime::context::scoped::Scoped<T>::set::h328c0ae31975e5c5\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/scoped.rs:40:9\n                           tokio::runtime::context::set_scheduler::{{closure}}::h26e4d1e6e66bcf76\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:26\n                           std::thread::local::LocalKey<T>::try_with::h99608f6172581fad\n                  
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:284:16\n                           std::thread::local::LocalKey<T>::with::h23bd83e75ddbfbc5\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:260:9\n                           tokio::runtime::context::set_scheduler::h185332b7187d8a41\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:17\n  16:     0x5bbcd9823c7c - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::h6f4b78a276814a86\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:486:9\n                           tokio::runtime::context::runtime::enter_runtime::hb09e071c0f3d953d\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16\n                           tokio::runtime::scheduler::multi_thread::worker::run::hd47af3be6616c354\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:478:5\n  17:     0x5bbcd9849f5e - tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}::h019e4a3149525f1d\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:447:45\n                           <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::ha3a7560cc3097e07\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/task.rs:42:21\n                           tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::he999b87d441a6e1d\n           
                    at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:328:17\n                           tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h7ab610d91dd6487b\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16:9\n                           tokio::runtime::task::core::Core<T,S>::poll::h2dbbe55f7bea582b\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:317:30\n  18:     0x5bbcd9833f7a - tokio::runtime::task::harness::poll_future::{{closure}}::he9686abc03b96a22\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:485:19\n                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h280e02f69e5a25f4\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9\n                           std::panicking::try::do_call::h3b93d4794ef7c2fe\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40\n                           std::panicking::try::hca3dec4bc3fe586c\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19\n                           std::panic::catch_unwind::hf175a98cf782f808\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14\n                           tokio::runtime::task::harness::poll_future::h521386cd6ec7e84d\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473:18\n                           
tokio::runtime::task::harness::Harness<T,S>::poll_inner::hb4185d6e7a64cddf\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208:27\n                           tokio::runtime::task::harness::Harness<T,S>::poll::h4e5279681e98c5e6\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153:15\n  19:     0x5bbcd981cc74 - tokio::runtime::task::raw::RawTask::poll::h61b671be96e13913\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201:18\n                           tokio::runtime::task::UnownedTask<S>::run::h6b992ab0e86c457a\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:453:9\n                           tokio::runtime::blocking::pool::Task::run::h125bc561c2fc2d42\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:159:9\n                           tokio::runtime::blocking::pool::Inner::run::hf05559cfffc92945\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:513:17\n  20:     0x5bbcd9830441 - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::haab53917730dc78c\n                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:471:13\n                           std::sys_common::backtrace::__rust_begin_short_backtrace::hf9eb5b19d7c272bf\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18\n  21:     0x5bbcd9848d3b - 
std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h2862ba12ac86c12f\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:528:17\n                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h5b5b8af3dacdea45\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9\n                           std::panicking::try::do_call::he53ec96826dfc193\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40\n                           std::panicking::try::h6bac339f9b4301bd\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19\n                           std::panic::catch_unwind::hc949eabd8c762b67\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14\n                           std::thread::Builder::spawn_unchecked_::{{closure}}::haa37dcebb97593ea\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:527:30\n                           core::ops::function::FnOnce::call_once{{vtable.shim}}::hbbc2c0cafbe4bfd7\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5\n  22:     0x5bbcd989f8b5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h19b9e642d37e7272\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9\n                           <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h97265befc434d3ae\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9\n                           
std::sys::pal::unix::thread::Thread::new::thread_start::h420dad5cf01a9f35\n                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys/pal/unix/thread.rs:108:17\n  23:     0x7aa080371ea7 - start_thread\n  24:     0x7aa080145a6f - clone\n  25:                0x0 - <unknown>\n\"\"\"\n","filename":"ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs","line_number":150,"threadName":"tokio-runtime-worker","threadId":"ThreadId(8)"}
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/backtrace/mod.rs:53:14
                           backtrace::capture::Backtrace::create::h432de5c9b6d787e4
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/capture.rs:176:9
                           backtrace::capture::Backtrace::new::h145127d7ab18bed0
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/backtrace-0.3.69/src/capture.rs:140:22
   1:     0x5bbcd905f31f - aptos_indexer_grpc_server_framework::handle_panic::h7ee4abe32678cbcf
                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:147:38
                           aptos_indexer_grpc_server_framework::setup_panic_handler::{{closure}}::hc52788d2d6fa2b71
                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:139:9
   2:     0x5bbcd989d1e0 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h022ca2c0d8c21c9e
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2034:9
                           std::panicking::rust_panic_with_hook::h0ad14d90dcf5224f
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:783:13
   3:     0x5bbcd989cf22 - std::panicking::begin_panic_handler::{{closure}}::h4a1838a06f542647
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:657:13
   4:     0x5bbcd989bbb6 - std::sys_common::backtrace::__rust_end_short_backtrace::h77cc4dc3567ca904
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18
   5:     0x5bbcd989cc54 - rust_begin_unwind
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
   6:     0x5bbcd98c4375 - core::panicking::panic_fmt::h940d4fd01a4b4fd1
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
   7:     0x5bbcd98c48d3 - core::result::unwrap_failed::h5119205a73b72b0d
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:1654:5
   8:     0x5bbcd8f85a2a - core::result::Result<T,E>::expect::h8f3dcbe5b9a7fe0c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/result.rs:1034:23
                           <aptos_indexer_grpc_cache_worker::IndexerGrpcCacheWorkerConfig as aptos_indexer_grpc_server_framework::RunnableConfig>::run::{{closure}}::h2a723299215dd164
                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-cache-worker/src/lib.rs:55:9
   9:     0x5bbcd8e9e9ec - <core::pin::Pin<P> as core::future::future::Future>::poll::heb20ac6bd53f7719
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
                           <aptos_indexer_grpc_server_framework::GenericConfig<T> as aptos_indexer_grpc_server_framework::RunnableConfig>::run::{{closure}}::hc503538106878c6f
                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:93:34
  10:     0x5bbcd8e22bdd - <core::pin::Pin<P> as core::future::future::Future>::poll::heb20ac6bd53f7719
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
                           aptos_indexer_grpc_server_framework::run_server_with_config::{{closure}}::{{closure}}::h93206b5da4dce421
                               at /aptos/ecosystem/indexer-grpc/indexer-grpc-server-framework/src/lib.rs:53:48
  11:     0x5bbcd8e30ab3 - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::hd6f1182b016c639a
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:328:17
                           tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h6f475a1a078de812
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16:9
                           tokio::runtime::task::core::Core<T,S>::poll::h45ffe6a80449524c
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:317:30
  12:     0x5bbcd8e5af69 - tokio::runtime::task::harness::poll_future::{{closure}}::h6abb721a3381fd7e
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:485:19
                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h69db49d14914124c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
                           std::panicking::try::do_call::h50df100727ed1f92
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
                           std::panicking::try::h845b3877ead1dc5a
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
                           std::panic::catch_unwind::h778a0a10320547e4
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
                           tokio::runtime::task::harness::poll_future::hf8bfe606da5a32d6
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473:18
                           tokio::runtime::task::harness::Harness<T,S>::poll_inner::h761f12616944196d
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208:27
                           tokio::runtime::task::harness::Harness<T,S>::poll::h03590ec51310230a
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153:15
  13:     0x5bbcd9825d1b - tokio::runtime::task::raw::RawTask::poll::h61b671be96e13913
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201:18
                           tokio::runtime::task::LocalNotified<S>::run::h6c76a5caa54aacd6
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:416:9
                           tokio::runtime::scheduler::multi_thread::worker::Context::run_task::{{closure}}::hcba3fb3752c458ad
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:639:22
                           tokio::runtime::coop::with_budget::h752987771c6b1853
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
                           tokio::runtime::coop::budget::h0f69edb4cf651a67
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
                           tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h02d504edf66d84fb
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:575:9
  14:     0x5bbcd9824a88 - tokio::runtime::scheduler::multi_thread::worker::Context::run::h4eadb18db878e49d
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:526:24
  15:     0x5bbcd983fb20 - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::{{closure}}::hce853457818b4267
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:491:21
                           tokio::runtime::context::scoped::Scoped<T>::set::h328c0ae31975e5c5
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/scoped.rs:40:9
                           tokio::runtime::context::set_scheduler::{{closure}}::h26e4d1e6e66bcf76
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:26
                           std::thread::local::LocalKey<T>::try_with::h99608f6172581fad
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:284:16
                           std::thread::local::LocalKey<T>::with::h23bd83e75ddbfbc5
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/local.rs:260:9
                           tokio::runtime::context::set_scheduler::h185332b7187d8a41
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context.rs:176:17
  16:     0x5bbcd9823c7c - tokio::runtime::scheduler::multi_thread::worker::run::{{closure}}::h6f4b78a276814a86
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:486:9
                           tokio::runtime::context::runtime::enter_runtime::hb09e071c0f3d953d
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
                           tokio::runtime::scheduler::multi_thread::worker::run::hd47af3be6616c354
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:478:5
  17:     0x5bbcd9849f5e - tokio::runtime::scheduler::multi_thread::worker::Launch::launch::{{closure}}::h019e4a3149525f1d
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/worker.rs:447:45
                           <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::ha3a7560cc3097e07
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/task.rs:42:21
                           tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::he999b87d441a6e1d
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:328:17
                           tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h7ab610d91dd6487b
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/loom/std/unsafe_cell.rs:16:9
                           tokio::runtime::task::core::Core<T,S>::poll::h2dbbe55f7bea582b
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/core.rs:317:30
  18:     0x5bbcd9833f7a - tokio::runtime::task::harness::poll_future::{{closure}}::he9686abc03b96a22
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:485:19
                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h280e02f69e5a25f4
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
                           std::panicking::try::do_call::h3b93d4794ef7c2fe
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
                           std::panicking::try::hca3dec4bc3fe586c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
                           std::panic::catch_unwind::hf175a98cf782f808
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
                           tokio::runtime::task::harness::poll_future::h521386cd6ec7e84d
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:473:18
                           tokio::runtime::task::harness::Harness<T,S>::poll_inner::hb4185d6e7a64cddf
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:208:27
                           tokio::runtime::task::harness::Harness<T,S>::poll::h4e5279681e98c5e6
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/harness.rs:153:15
  19:     0x5bbcd981cc74 - tokio::runtime::task::raw::RawTask::poll::h61b671be96e13913
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/raw.rs:201:18
                           tokio::runtime::task::UnownedTask<S>::run::h6b992ab0e86c457a
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/task/mod.rs:453:9
                           tokio::runtime::blocking::pool::Task::run::h125bc561c2fc2d42
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:159:9
                           tokio::runtime::blocking::pool::Inner::run::hf05559cfffc92945
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:513:17
  20:     0x5bbcd9830441 - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::haab53917730dc78c
                               at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/blocking/pool.rs:471:13
                           std::sys_common::backtrace::__rust_begin_short_backtrace::hf9eb5b19d7c272bf
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  21:     0x5bbcd9848d3b - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h2862ba12ac86c12f
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:528:17
                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h5b5b8af3dacdea45
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panic/unwind_safe.rs:272:9
                           std::panicking::try::do_call::he53ec96826dfc193
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
                           std::panicking::try::h6bac339f9b4301bd
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
                           std::panic::catch_unwind::hc949eabd8c762b67
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
                           std::thread::Builder::spawn_unchecked_::{{closure}}::haa37dcebb97593ea
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/thread/mod.rs:527:30
                           core::ops::function::FnOnce::call_once{{vtable.shim}}::hbbc2c0cafbe4bfd7
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  22:     0x5bbcd989f8b5 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h19b9e642d37e7272
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
                           <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h97265befc434d3ae
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/boxed.rs:2020:9
                           std::sys::pal::unix::thread::Thread::new::thread_start::h420dad5cf01a9f35
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys/pal/unix/thread.rs:108:17
  23:     0x7aa080371ea7 - start_thread
  24:     0x7aa080145a6f - clone
  25:                0x0 - <unknown>
"""
@lukasz-layerzerolabs lukasz-layerzerolabs added the transaction-stream-service Issues relating to the Transaction Stream Service label Dec 13, 2024
@grao1991
Contributor

When you see this error, can you check if the "latest_version" is still in redis?

@lukasz-layerzerolabs
Author

This is what I see

"Redis latest version update failed. The version is beyond the next expected version.","version":295448000

root@aptos-indexer-tss-statefulset-0:/data# redis-cli keys "*"
    1) "295436505"
    2) "295436030"
    3) "295440616"
    4) "295431680"
 ...
 6072) "295444080"
 6073) "latest_version"
 6074) "295445006"
 ...
18781) "295439814"
18782) "file_store_latest_version"
18783) "295442174"
 ...
20001) "295430489"
20002) "295433056"
20003) "295433772"

root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "latest_version"
"0"
root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "file_store_latest_version"
"295428000"
root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "295448000"
(nil)
root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "295433772" ## checking some random key if the value is there
"Cgw...IIQ=="

@grao1991
Contributor

This aligns with my guess. For some reason, the "latest_version" key got evicted from Redis, which causes this error. You can manually set "latest_version" back to recover.
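
To illustrate why a reset "latest_version" would trip the check, here is a simplified model in Python. It is not the actual logic in cache_operator.rs: the rule (a batch is accepted only when it starts exactly at the stored latest version), the function name `check_update`, and the `BEHIND` case are assumptions made for illustration. The logged version is treated as the start of the batch purely for the example.

```python
from enum import Enum


class UpdateResult(Enum):
    OK = "ok"
    # Wording taken from the error log in this thread.
    BEYOND = "The version is beyond the next expected version."
    # Hypothetical case, not shown in the thread.
    BEHIND = "The version is behind the stored latest version."


def check_update(stored_latest: int, batch_start: int) -> UpdateResult:
    """Hypothetical model of the cache version check.

    Assumes a batch is accepted only when it starts exactly at the
    version currently stored under "latest_version".
    """
    if batch_start == stored_latest:
        return UpdateResult.OK
    if batch_start > stored_latest:
        return UpdateResult.BEYOND
    return UpdateResult.BEHIND


# "latest_version" reads "0" while the worker is at 295448000.
print(check_update(stored_latest=0, batch_start=295448000).name)
# A worker that resumes exactly at the stored value passes the check.
print(check_update(stored_latest=295428000, batch_start=295428000).name)
```

Under this model, any reset of "latest_version" to a value far below the worker's position leaves a gap, which matches the "beyond the next expected version" message.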

@lukasz-layerzerolabs
Author

Yes, setting latest_version back helps, but only for a while, until Redis evicts it again.

root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "file_store_latest_version"
"295428000"
root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "latest_version"
"0"
root@aptos-indexer-tss-statefulset-0:/data# redis-cli set "latest_version" "295428000"
OK
root@aptos-indexer-tss-statefulset-0:/data# redis-cli get "latest_version"
"295428000"
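
The manual steps above could be scripted as a stopgap. The following is a minimal sketch, assuming it is acceptable to reset "latest_version" to the value of "file_store_latest_version" whenever it has fallen behind, as was done by hand here. The function name `restore_latest_version` and the `KeyValueStore` interface are hypothetical; only the two key names come from the thread.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """Minimal interface assumed for the store (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> object: ...


def restore_latest_version(store: KeyValueStore) -> Optional[int]:
    """Reset "latest_version" from "file_store_latest_version" if it fell behind.

    Returns the restored version, or None when nothing was changed.
    """
    file_store_raw = store.get("file_store_latest_version")
    if file_store_raw is None:
        return None  # no reference value to recover from

    file_store_version = int(file_store_raw)
    latest_raw = store.get("latest_version")
    # Treat a missing key like the "0" observed in the thread.
    latest = int(latest_raw) if latest_raw is not None else 0

    if latest >= file_store_version:
        return None  # cache is not behind the file store; leave it alone

    store.set("latest_version", str(file_store_version))
    return file_store_version
```

A redis-py client created with `redis.Redis(decode_responses=True)` provides compatible `get` and `set` methods. This only automates the workaround and does not address why the key is reset in the first place.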

...

{"timestamp":"2024-12-18T10:47:27.385942Z","level":"ERROR","message":"Redis latest version update failed. The version is beyond the next expected version.","version":301309000,"filename":"ecosystem/indexer-grpc/indexer-grpc-utils/src/cache_operator.rs","line_number":361,"threadName":"tokio-runtime-worker","threadId":"ThreadId(2)"}
details = """
panicked at ecosystem/indexer-grpc/indexer-grpc-cache-worker/src/lib.rs:59:14:
Cache worker failed: Failed to run cache worker

Caused by:
    0: Failed to update the latest version in the cache
    1: Version is not right."""
backtrace = """

I have doubled the memory (32 GB -> 64 GB) and am testing whether changing the eviction policy keeps the latest_version key stable.

root@aptos-indexer-tss-statefulset-0:/data# redis-cli CONFIG GET maxmemory-policy
1) "maxmemory-policy"
2) "noeviction"
root@aptos-indexer-tss-statefulset-0:/data# redis-cli CONFIG SET maxmemory-policy allkeys-lfu
OK
root@aptos-indexer-tss-statefulset-0:/data# redis-cli CONFIG REWRITE
...
root@aptos-indexer-tss-statefulset-0:/data# redis-cli CONFIG GET maxmemory-policy
1) "maxmemory-policy"
2) "allkeys-lfu"
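
For reference, Redis's maxmemory policies differ in which keys they may evict under memory pressure: `noeviction` evicts nothing and rejects writes at the limit, the `allkeys-*` policies may evict any key, and the `volatile-*` policies only evict keys that have an expiry set. The sketch below encodes that rule; the function name `may_evict_under_memory_pressure` is hypothetical, and whether "latest_version" carries a TTL is not shown in this thread.

```python
def may_evict_under_memory_pressure(policy: str, key_has_ttl: bool) -> bool:
    """Return True if the given maxmemory-policy could evict such a key.

    Covers only eviction triggered by the maxmemory limit; keys with a
    TTL still expire on their own regardless of the policy.
    """
    if policy == "noeviction":
        return False  # writes fail at the limit; no keys are dropped
    if policy.startswith("allkeys-"):
        return True  # any key is a candidate, with or without a TTL
    if policy.startswith("volatile-"):
        return key_has_ttl  # only keys with an expiry are candidates
    raise ValueError(f"unknown maxmemory-policy: {policy}")


# Assuming "latest_version" has no TTL (not confirmed in the thread):
for policy in ("noeviction", "allkeys-lfu", "volatile-lfu"):
    print(policy, may_evict_under_memory_pressure(policy, key_has_ttl=False))
```

Since the previous policy was `noeviction`, it may be worth confirming which mechanism actually reset "latest_version" before relying on a policy change alone.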

Additionally, I see that there is some active cache eviction logic here. The threshold is set to 300k keys, but I was able to hold about 20k keys with 32 GB of RAM assigned to Redis. Sadly, the threshold is a constant (CACHE_SIZE_EVICTION_LOWER_BOUND); if it were configurable, it would be possible to tune resource allocation against active eviction.
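
A rough back-of-the-envelope estimate of what the 300k-key threshold implies, using the figures quoted above. It assumes the 32 GB was fully consumed by roughly 20k keys and that memory use scales linearly with key count; neither is confirmed in the thread, and the function name `estimate_memory_gb` is hypothetical.

```python
def estimate_memory_gb(observed_keys: int, observed_gb: float, target_keys: int) -> float:
    """Linearly scale observed memory use to a target number of keys."""
    return observed_gb * target_keys / observed_keys


# ~20k keys in 32 GB observed; the active eviction threshold is 300k keys.
needed = estimate_memory_gb(observed_keys=20_000, observed_gb=32, target_keys=300_000)
print(f"{needed:.0f} GB")  # 480 GB under the linear-scaling assumption
```

Under these assumptions, Redis would need about 15 times the observed memory before active eviction starts, which is why a configurable threshold would help.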
