Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repeated failures with Reolink cameras #144

Closed
jlpoolen opened this issue Aug 14, 2021 · 13 comments
Closed

repeated failures with Reolink cameras #144

jlpoolen opened this issue Aug 14, 2021 · 13 comments

Comments

@jlpoolen
Copy link
Contributor

jlpoolen commented Aug 14, 2021

Describe the bug
6 Reolink cameras, 3 models, system on Docker image, within a 2 minute session: 2824 "panics" in the log. [edit by scottlamb: the logs refer to panic handler stack frames but the system didn't actually panic.] Recordings are subject to gaps in time, some for 9 minutes of more. The system looks to be working well, but I started to notice some major gaps, e.g. several minutes in the recording sequences.

Running moonfire-nvr in docker. Can start and stop and through web interface all seems well, works for the most part. However there are gaps in recordings and the log shows panics.

To Reproduce
Steps to reproduce the behavior:

See below "Consoles"
See log attached and end of this entry

Note: nvr2 differs by having additional line for LVM mount

Expected behavior
no "panics" and no gaps of several minutes in recordings

Bug policy https://github.com/scottlamb/moonfire-nvr/blob/master/guide/troubleshooting.md#panic-errors states:
Errors like the one below indicate a serious bug in Moonfire NVR. Please file a bug if you see one.

(Should the word "panic" be the trigger to file a bug, or when "panic" occurs from writer.rs?)

Screenshots
Here is a screenshot showing a raspberry pi console streaming "date" results overlaying a live stream
via the Moonfire-nvr Live mode displaying the camera's white date time stamp.
2021-08-14_09-47-35

Server (please complete the following information):
Raspberry Pi 4 8GB Complete Starter Kit - 32GB Storage (purchased May/June 2020)
Western Digital 12TB WD Purple Surveillance Internal Hard Drive HDD - 7200 RPM, SATA 6 Gb/s, 256 MB Cache, 3.5" - WD121PURZ
[Independent Control Switch] SISUN Aluminum USB 3.0 5 Bay 3.5 inch SATA Hard Drive Enclosure Support 5 x 12TB Drive + 2X USB3.0 HUB (USB 3.0/5 Bay Hard Drive Enclosure-Black)

Rebuilt (on same SD HC) August, 2021; crashed Winter of 2021 - defective SD HC??

 jlpoole@raspberrypi:~ $ uname -a
 Linux raspberrypi 5.10.52-v7l+ #1441 SMP Tue Aug 3 18:11:56 BST 2021 armv7l GNU/Linux
 jlpoole@raspberrypi:~ $
  • If using Docker: docker ps + docker images
    jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ sudo docker ps
    [sudo] password for jlpoole:
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    a76d874a0eae scottlamb/moonfire-nvr:latest "/usr/local/bin/moon…" 38 hours ago Up 37 minutes moonfire-nvr

    jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ sudo docker images
    REPOSITORY TAG IMAGE ID CREATED SIZE
    hello-world latest 1a30b4397839 5 weeks ago 4.85kB
    scottlamb/moonfire-nvr latest 9732cf869a48 6 weeks ago 582MB
    jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $

  • If building from git: git describe --dirty + moonfire-nvr --version
    jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ git describe --dirty
    v0.6.4-11-gdad9bdc
    jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $

  • Attach a log file. Run with the RUST_BACKTRACE=1 environment variable set if possible.

See Log at end of this entry.

I'm not sure I needed to set RUST_BACKTRACE when launching from Docker as it looks like the Docker
log has backtraces. Nevertheless, I did as requested:

CONSOLE 1:
jlpoole@raspberrypi:~ $ echo $RUST_BACKTRACE
1
jlpoole@raspberrypi:~ $ date; sudo nvr2 start
Sat 14 Aug 2021 11:00:56 AM PDT
moonfire-nvr
jlpoole@raspberrypi:~ $ date; sudo nvr2 stop
Sat 14 Aug 2021 11:01:39 AM PDT
moonfire-nvr
jlpoole@raspberrypi:~ $

CONSOLE 2:
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $ date;sudo docker logs --follow --timestamps --since=10s moonfire-nvr >/tmp/session.log 2>&1; date;
Sat 14 Aug 2021 11:00:58 AM PDT
Sat 14 Aug 2021 11:01:41 AM PDT
jlpoole@raspberrypi:/usr/local/src/moonfire-nvr $

Camera (please complete the following information):
I have six cameras, 2 of three different models:

Device Name: GarageWest
Model: Reolink RLC-420
Build No.: build 17112700
Hardware No.: IPC_3816M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.889_17112700
Details: IPC_3816M110000000000000
Client Version: v1.0.227
2)
Device Name: GarageEast
Model: RLC-420
Build No.: build 17112700
Hardware No.: IPC_3816M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.889_17112700
Details: IPC_3816M110000000000000
Client Version: v1.0.227

Device Name: Peck Alley West
Model: Reolink RLC-420-5MP
Build No.: build 19013001
Hardware No.: IPC_51516M5M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.354_19013001
Details: IPC_51516M5M110000000100000
Client Version: v1.0.239

Device Name: Peck Alley East
Model: Reolink RLC-420-5MP
Build No.: build 19013001
Hardware No.: IPC_51516M5M
Configuration Version: v2.0.0.0
Firmware Version: v2.0.0.354_19013001
Details: IPC_51516M5M110000000100000
Client Version: v1.0.239

Camera Name: PeckPear
Model: Reolink RLC-410-5MP
UID: 95270002JS5EWXMZ
Build No.build: 20121100
Hardware No.: IPC_515B16M5M
Config Version: v3.0.0.0
Firmware Version: v3.0.0.136_20121100
Details: IPC_515B16M5MS10E1W01100000001

Camera Name: Maple1
Model: Reolink RLC-410-5MP
UID: 95270002JS5IEVPW
Build No.: build 20121100
Hardware No.: IPC_515B16M5M
Config Version: v3.0.0.0
Firmware Version: v3.0.0.136_20121100
Details: IPC_515B16M5MS10E1W01100000001

Desktop (please complete the following information):
NA

Smartphone (please complete the following information):
NA

Additional context
I've been seeing cameras go offline for several minutes, e.g. 9 minutes or more

Below may be helpful if time synchronization is involved:

I have all the reolink cameras using the time sync feature pointing to time-b.nist.gov.
2021-08-14_10-04-35

Log File (numbered columns added)
session_numbered.zip

For others using Docker: I accessed the log using this command:

 sudo docker logs --follow --timestamps --since=1s moonfire-nvr

You want to use the "--since=[# of seconds back]s", otherwise you may get a dump of a very long log. Consider the "--since" flag as a "tail" type truncator.

@scottlamb
Copy link
Owner

Errors like this one?

W20210814 11:00:57.434 s-GarageWest-main moonfire_nvr::streamer] GarageWest-main: sleeping for Duration { secs: 1, nanos: 0 } after error: Expected ssrc=Some(21cd90d7) seq=Some(8f4a) got ssrc=6d87df1a seq=88b9 ts=1866533263 (mod-2^32: 1866533263), npt 0.000 at [192.168.1.88:33514(me)->192.168.1.48:554@2021-08-14T11:00:57 pos=1510@2021-08-14T11:00:57]
Backtrace:
   0: backtrace::backtrace::libunwind::trace
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/backtrace-0.3.59/src/backtrace/libunwind.rs:90:5
      backtrace::backtrace::trace_unsynchronized
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/backtrace-0.3.59/src/backtrace/mod.rs:66:5
      backtrace::backtrace::trace
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/backtrace-0.3.59/src/backtrace/mod.rs:53:14
   1: backtrace::capture::Backtrace::create
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/backtrace-0.3.59/src/capture.rs:176:9
      backtrace::capture::Backtrace::new_unresolved
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/backtrace-0.3.59/src/capture.rs:170:9
   2: failure::backtrace::internal::InternalBacktrace::new
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/failure-0.1.8/src/backtrace/internal.rs:46:44
   3: failure::backtrace::Backtrace::new
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:121:35
      <failure::backtrace::Backtrace as core::default::Default>::default
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/failure-0.1.8/src/backtrace/mod.rs:140:13
   4: <failure::error::error_impl::ErrorImpl as core::convert::From<F>>::from
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/failure-0.1.8/src/error/error_impl.rs:19:17
      <failure::error::Error as core::convert::From<F>>::from
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/failure-0.1.8/src/error/mod.rs:36:18
      failure::error_message::err_msg
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/failure-0.1.8/src/error_message.rs:12:5
      retina::client::rtp::StrictSequenceChecker::rtp
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/retina-0.0.4/src/client/rtp.rs:128:13
   5: retina::client::Session<retina::client::Playing>::handle_data
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/retina-0.0.4/src/client/mod.rs:728:41
      <retina::client::Session<retina::client::Playing> as futures_core::stream::Stream>::poll_next
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/retina-0.0.4/src/client/mod.rs:760:31
   6: <retina::client::Demuxed as futures_core::stream::Stream>::poll_next
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/retina-0.0.4/src/client/mod.rs:824:55
   7: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-core-0.3.15/src/stream.rs:120:9
      futures_util::stream::stream::StreamExt::poll_next_unpin
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.15/src/stream/stream/mod.rs:1416:9
      <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/futures-util-0.3.15/src/stream/stream/next.rs:32:9
      moonfire_nvr::stream::RetinaOpener::play::{{closure}}
             at src/server/src/stream.rs:394:51
   8: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/future/mod.rs:80:19
      <tokio::time::timeout::Timeout<T> as core::future::future::Future>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/time/timeout.rs:149:33
      <moonfire_nvr::stream::RetinaOpener as moonfire_nvr::stream::Opener>::open::{{closure}}
             at src/server/src/stream.rs:304:21
      <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/future/mod.rs:80:19
   9: tokio::runtime::task::core::CoreStage<T>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/core.rs:225:13
      tokio::runtime::task::harness::poll_future::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:422:23
      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
      std::panicking::try::do_call
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
      std::panicking::try
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
      std::panic::catch_unwind
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
      tokio::runtime::task::harness::poll_future
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:409:19
  10: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:89:9
      tokio::runtime::task::harness::Harness<T,S>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:59:15
  11: tokio::runtime::task::raw::RawTask::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/raw.rs:66:18
      tokio::runtime::task::Notified<S>::run
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/mod.rs:171:9
      tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:349:13
      tokio::coop::with_budget::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:106:9
      std::thread::local::LocalKey<T>::try_with
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/local.rs:376:16
      std::thread::local::LocalKey<T>::with
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/local.rs:352:9
  12: tokio::coop::with_budget
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:99:5
      tokio::coop::budget
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/coop.rs:76:5
      tokio::runtime::thread_pool::worker::Context::run_task
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:348:9
  13: tokio::runtime::thread_pool::worker::Context::run
  14: tokio::runtime::thread_pool::worker::run::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:303:17
      tokio::macros::scoped_tls::ScopedKey<T>::set
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/macros/scoped_tls.rs:61:9
  15: tokio::runtime::thread_pool::worker::run
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:300:5
  16: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/thread_pool/worker.rs:279:45
      <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/blocking/task.rs:42:21
      tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/core.rs:235:17
      tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/loom/std/unsafe_cell.rs:14:9
  17: tokio::runtime::task::core::CoreStage<T>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/core.rs:225:13
      tokio::runtime::task::harness::poll_future::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:422:23
      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
      std::panicking::try::do_call
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
      std::panicking::try
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
  18: std::panic::catch_unwind
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
      tokio::runtime::task::harness::poll_future
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:409:19
      tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:89:9
      tokio::runtime::task::harness::Harness<T,S>::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/harness.rs:59:15
  19: tokio::runtime::task::raw::RawTask::poll
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/raw.rs:66:18
      tokio::runtime::task::Notified<S>::run
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/task/mod.rs:171:9
      tokio::runtime::blocking::pool::Inner::run
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/blocking/pool.rs:278:17
  20: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at .cargo/registry/src/github.aaakk.us.kg-1ecc6299db9ec823/tokio-1.6.0/src/runtime/blocking/pool.rs:258:17
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys_common/backtrace.rs:125:18
  21: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/mod.rs:481:17
      <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:344:9
      std::panicking::try::do_call
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:379:40
      std::panicking::try
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panicking.rs:343:19
      std::panic::catch_unwind
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/panic.rs:431:14
      std::thread::Builder::spawn_unchecked::{{closure}}
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/thread/mod.rs:480:30
      core::ops::function::FnOnce::call_once{{vtable.shim}}
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/core/src/ops/function.rs:227:5
  22: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/alloc/src/boxed.rs:1546:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/alloc/src/boxed.rs:1546:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/53cb7b09b00cbea8754ffb78e7e3cb521cb8af4b/library/std/src/sys/unix/thread.rs:71:17

That's not actually a panic, despite it using the word in the middle. tokio has some code for catching panics and handling them relatively gracefully (keeping the thread running) and you're seeing stack frames relating to that.

I think this log message is complaining about the thing I mentioned in this Apr 29th comment. I think at the Retina layer I can add a feature that just ignores these (except maybe logging a packet/byte count every minute or something).

@jlpoolen
Copy link
Contributor Author

Yes. I apologize, here's what I faced. I have web UI rows which are showing intermittent files for a camera, sometime several hours in succession are sequentially recorded, then there is a gap. I looked at your troubleshooting readme and when I watched the logs go scrolling by, I immediately focused on creating a log file for the first 2 minutes. After doing that, I searched in it for the word "panic" and when the count returned with over 2,000 found, I just assumed something is horribly wrong and I'll let you look. I had forgotten at the problem you described that Reolink cameras have.

What I'll do now is generate a new log file and try to whittle down the entries that has documented gaps in recordings. I'll try to craft a Perl regular expression to cull the entries that have to do with sleep so we can see if there is also something else. The backtrace on each error makes filtering the log a challenge. (I'm hoping I can halt the back tracing for an initial run by setting to "0" or removing it, but I suspect in this Docker instance, it's wired to be active.)

I'll see if I can assess what a log run has statistically: are they all sleeps?, and then report back here.

Thank you.

@scottlamb
Copy link
Owner

If you feel like playing with a new tool, this sort of analysis is what lnav is good for.

@jlpoolen
Copy link
Contributor Author

1232 "Opening output" entries for 3 cameras out of six in 24 minutes.

I ran moonfire-nvr for 24 minutes (no RUST_BACKTRACE). I filtered the log to show only those lines with "Opening output".

ran for 24 minutes 1232 "Opening Input"
start: 2021-08-15T02:20:20.390944000Z I20210814 19:20:20.390
end:  2021-08-15T02:44:26.078480000Z I20210814 19:44:26.077

jlpoole@raspberrypi:~ $ cat /tmp/session_stderr.log |grep "Opening input" |wc
   1232   11088  215423
jlpoole@raspberrypi:~ $

over 3 cameras: GarageEast, GarageWest & PeckAlleyEast.
The "Garage" cameras are the oldest.  The PeckAlley are about 2 years old.

session_OpeningInput_only.zip

I can provide the log with the additional entry after an "Opening input" that has what was received; zipped up that log is 4.8 MBs.

Question: what can I provide you to give you the ability to assess the problem more thoroughly. I can redo a run, use RUST_BACKTRACE, or whatever modifications to the run time environment in docker you want.

The "Garage" may exhibit slower network response time given that they are connected via a Ubiquiti aerial connection. But I do not recall in earlier version that this issue manifested itself to such a degree as it does now. The PeckAlley are on a direct connection.

I'm going to follow-up on your suggestion for replacing the SD HC with a USB driven drive, I want to read your posted recommendation more closely. I'm also seeing my 5-disk array is no longer available, the company seems to have disappeared, and there are some bad review, so the USB circuitry for that array may be a contributing factor to problems. For people reading this, the lesson learned is that video is a very high demand on your hardware so cutting corners and pushing the envelope may end up costing more in terms of everyone's time than dollars saved.

Note: I did fine the PeckAlleyEast running at 100% CPU Loading using the Reolink admin interface, so I rebooted it.

@scottlamb
Copy link
Owner

Question: what can I provide you to give you the ability to assess the problem more thoroughly.

I think your earlier log has everything I need for now. All of the errors in there look consistent with something that scottlamb/retina#17 should improve significantly. You'll likely still have occasional disconnects—that seems inevitable with a flaky wifi connection—but I think it's going into a spiral after a disconnect now where after that change it will recover properly.

I also (finally) ordered my own Reolink camera so I can keep experimenting with it. It should arrive tomorrow.

The "Garage" may exhibit slower network response time given that they are connected via a Ubiquiti aerial connection. But I do not recall in earlier version that this issue manifested itself to such a degree as it does now.

The difference is probably the RTSP library change then. You can go back to the old behavior by adding --rtsp-library=ffmpeg to the docker run command in /usr/local/bin/nvr, then running nvr stop; nvr rm; nvr run to make it take effect.

I'm also seeing my 5-disk array is no longer available, the company seems to have disappeared, and there are some bad review, so the USB circuitry for that array may be a contributing factor to problems.

That's too bad. :-( 2021 is a tough year for ordering hardware too. I think the ODROID-H2+ is an excellent compromise between reliability and performance vs price, power, and space but it's just not available now. When the chip situation gets back to normal, there will likely be some other nice SBCs coming out too, but not much new is being released now and stocks are low / prices are up for stuff that already was released.

Some of these problems can be worked around. Eg, I put in the docs several places a recommendation to disable UAS with USB SATA adapters because of problems with my own hardware. Maybe that would help with yours too.

scottlamb added a commit that referenced this issue Aug 19, 2021
Attempt at #114 and #144. Let's see how much it helps.
@scottlamb
Copy link
Owner

Okay, I believe if you build from git HEAD (and undo the --rtsp-library=ffmpeg if you did that), it will work much better. The Unexpected RTSP message Data(...) and Expected ssrc=Some(...) seq=Some(...) got ssrc=... seq=... causes of connection failure should be eliminated. Can you give it a try and let me know how it goes? If it's a significant improvement, I'll prepare a release soon after.

@jlpoolen
Copy link
Contributor Author

I'll try this evening. I'm on a new image on Raspberry. I successfully built moonfire-nvr, but ran into an error in the npm ui, but it may be a harmless error. Haven't fired up moonfire-nvr in the new environment, yet, I still have to configure and mount my purple drive. I did not try reverting to the ffmpeg in the former environment (on the SD HC card).

@jlpoolen
Copy link
Contributor Author

While I flopped around trying to get the web UI working, I was starting and stopping nvr many times. I noticed that when I first started up, everything in the log looked clean. I then Control-C'd to stop. And then I started up within a minute or two. On the 2nd run, it looked like every camera was out of sync. I halted nvr. Waited several minutes, and then when I started up for a third time, things looked good. I did this kind of start/stop with interspersed waits and concluded that one has to wait several minutes before restarting to assure a clean start. I suspect this is because the cameras are still sending packets from the prior closed sessions. I'll let it run through the night and update tomorrow.

@scottlamb
Copy link
Owner

Hmm, I'd hoped that cooling-off period would be no longer necessary.

The brand-new Reolink camera I just bought—unlike the one I borrowed from a neighbor a while ago—doesn't display exactly these symptoms. So I haven't directly tested it on a problematic camera myself.

@jlpoolen
Copy link
Contributor Author

Aug_19_clean_start_then_failures.zip
Ouch. No sooner was I getting ready to retire when all of a sudden yellow went blazing across the screen. Here's the log file. I stopped the server and will resume tomorrow morning.

@scottlamb
Copy link
Owner

Shoot. Thanks for trying it out, and sorry it's still not working well.

W20210819 21:57:47.115 s-PeckPear-main moonfire_nvr::streamer] PeckPear-main: sleeping for Duration { secs: 1, nanos: 0 } after error: [192.168.1.88:51560(me)->192.168.1.54:554@2021-08-19T21:52:15, 115975788@2021-08-19T21:57:47] RTSP framing error: Invalid RTSP message; next 128 of 651 buffered bytes are:
0000:   b9 e1 69 a7  3a d0 b9 d4  e6 e5 f3 d5  33 10 54 6c   ..i.:.......3.Tl
0010:   ae fd 33 83  3d 7f 99 74  c8 2b eb ec  5d 80 c3 d2   ..3.=..t.+..]...
0020:   7d 7b 0a 7d  1f 18 a5 30  9f 89 5e 95  0f e7 13 37   }{.}...0..^....7
0030:   42 e1 be c3  63 6b 3d 45  c4 bf 73 85  33 ce 10 d1   B...ck=E..s.3...
0040:   f0 1b 5f 5d  73 72 f1 b3  d7 8d 20 f8  ea e1 d0 d4   .._]sr.... .....
0050:   24 6d 99 00  8e df e9 36  b2 ca 6f 8d  92 af 19 ce   $m.....6..o.....
0060:   2c 13 be 84  4d 0c bf 7c  4d 50 56 53  a7 c4 8c c7   ,...M..|MPVS....
0070:   02 d7 b0 d2  c7 a1 2a 69  66 1b 90 5a  b3 81 ca 1b   ......*if..Z....

Not sure yet if this is a bug in the client (Moonfire/Retina) or server (Reolink). I'll take a look at it. I might need a packet capture or similar to figure this out. The log shows the following bytes (which indeed aren't the start of a correct RTSP message) but I might need the preceding ones also. Maybe Retina erroneously dropped half a message or something.

@scottlamb
Copy link
Owner

Looks like if Reolink is sending malformed RTSP messages, they wouldn't be the first. There's a gstreamer issue at gst-plugins-base#653 to work around a problem like that. They didn't say what brand; my best guess is Uniview from looking at the URLs in the bug reporter's packet capture.

@scottlamb scottlamb changed the title 2 Minute Run w/6 cameras and 2,824 panics in log repeated failures with Reolink cameras Aug 31, 2021
@scottlamb
Copy link
Owner

As this is now understood as an RTSP-level interop problem with these buggy cameras, let's continue discussion at scottlamb/retina#17 as necessary. Current best solution is to use the new --rtsp-transport=udp argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants