roachtest: disk-stalled/log=false,data=true failed #54332

Closed
cockroach-teamcity opened this issue Sep 14, 2020 · 37 comments · Fixed by #56064
Labels: branch-master, C-test-failure, O-roachtest, O-robot

@cockroach-teamcity
Member

(roachtest).disk-stalled/log=false,data=true failed on master@dc5544839735faaa04075e0d9e021ddba721f3bb:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels on Sep 14, 2020
@cockroach-teamcity added this to the 20.2 milestone on Sep 14, 2020
@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@e6a1f596188bf885c84e25f4a4ac3e5f6ff598da:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@petermattis
Collaborator

@itsbilal We should skip this test if we can't fix it soon.

@petermattis removed the release-blocker label on Sep 21, 2020
@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on release-20.2@5f494c81c87743ec4eba12f5a1a5dc180b496c0c:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on release-20.2@445481640c3b9b1b34e537e6da9c853e9a83cc75:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on release-20.2@bdb8cd0e7b2f25a08569a56464838486b6d16421:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@5a7f32fd41cb84d35562db09761912ff9c903700:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@0cbfe9be4311af7facc0307ba4e4246680760635:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@d816ce972c4d28fcde15925ebc04e7c3522dbf20:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@e824da5a2b33168fa4ef93c83295c8205acfadb7:

		W201011 07:19:11.533851 9356 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 8.426493974s; err=heartbeat failed on epoch increment
		E201011 07:19:11.533940 9356 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] heartbeat failed on epoch increment
		I201011 07:19:12.117862 186 server/status/runtime.go:522 ⋮ [n1] runtime stats: 158 MiB RSS, 185 goroutines, 28 MiB/24 MiB/58 MiB GO alloc/idle/total, 17 MiB/19 MiB CGO alloc/total, 0.1 CGO/sec, 0.8/0.7 %(u/s)time, 0.0 %gc (1x), 271 KiB/11 KiB (r/w)net
		W201011 07:19:12.436752 9289 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r28/1:‹/Table/3{2-3}›] slow heartbeat took 9.203978718s; err=heartbeat failed on epoch increment
		E201011 07:19:12.436919 9289 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r28/1:‹/Table/3{2-3}›] heartbeat failed on epoch increment
		W201011 07:19:12.436975 154 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:13.089145 154 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:13.089093 9397 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 6.119556139s; err=heartbeat failed on epoch increment
		E201011 07:19:13.089597 9397 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] heartbeat failed on epoch increment
		W201011 07:19:13.389933 9380 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 6.419771865s; err=heartbeat failed on epoch increment
		E201011 07:19:13.390201 9380 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] heartbeat failed on epoch increment
		W201011 07:19:13.691029 9400 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 5.980401888s; err=heartbeat failed on epoch increment
		E201011 07:19:13.691167 9400 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] heartbeat failed on epoch increment
		I201011 07:19:14.491722 179 kv/kvserver/replica_rangefeed.go:610 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] RangeFeed closed timestamp 1602400737.687037550,0 is behind by 16.804681822s
		W201011 07:19:14.995107 9447 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 3.460936257s; err=<nil>
		W201011 07:19:14.995027 188 kv/kvserver/node_liveness.go:787 ⋮ [n1,liveness-hb] slow heartbeat took 3.524246575s; err=<nil>
		W201011 07:19:14.995195 9450 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] slow heartbeat took 2.558039247s; err=<nil>
		W201011 07:19:14.995251 9466 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 1.905355488s; err=<nil>
		W201011 07:19:14.995291 9451 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 1.604829546s; err=<nil>
		W201011 07:19:15.496926 154 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:15.497038 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:15.496853 9377 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 1.805442678s; err=<nil>
		W201011 07:19:15.496961 159 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:15.498110 167 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.349859 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.349945 159 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.349967 167 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.349859 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.349859 154 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.350345 162 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:16.701055 165 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:19.661523 162 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 3.3s [applied=2, batches=1, state_assertions=0]
		W201011 07:19:19.661627 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 3.3s [applied=1, batches=1, state_assertions=0]
		I201011 07:19:19.661919 294 sql/sqlliveness/slstorage/slstorage.go:342 ⋮ [n1] inserted sqlliveness session ‹f313cf8196c5447ab34901c788c47a55›
		I201011 07:19:19.662395 294 sql/sqlliveness/slinstance/slinstance.go:143 ⋮ [n1] created new SQL liveness session ‹f313cf8196c5447ab34901c788c47a55›
		W201011 07:19:20.113482 188 kv/kvserver/node_liveness.go:787 ⋮ [n1,liveness-hb] slow heartbeat took 5.117863941s; err=<nil>
		W201011 07:19:20.113613 165 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 3.4s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:20.113613 168 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 1.8s [applied=1, batches=1, state_assertions=0]
		W201011 07:19:20.666456 159 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 4.3s [applied=2, batches=2, state_assertions=0]
		W201011 07:19:20.815087 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 4.5s [applied=2, batches=2, state_assertions=0]
		W201011 07:19:21.618415 162 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 2.0s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  4384 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@e824da5a2b33168fa4ef93c83295c8205acfadb7:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@petermattis
Collaborator

@itsbilal The new failures are not happening during test setup. It is worth taking a fresh look at them.
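
For anyone comparing the two failure modes in this thread: the setup-script reports end in `exit status 2` and `exit status 20`, while the later report ends in `exit status 137` after a `Killed` message. By common shell convention, a status above 128 usually means the process was terminated by signal number (status minus 128), so 137 corresponds to signal 9 (SIGKILL). That is the signal named in the `timeout --signal 9` wrapper, though the log alone does not show what sent it. A minimal sketch of that decoding, where `describe_exit_status` is a hypothetical helper name and not part of roachtest:

```shell
#!/bin/bash
# Hypothetical helper (not part of roachtest): classify a shell exit status.
# Assumes the common convention that statuses above 128 encode 128 + signal number.
describe_exit_status() {
  local status="$1"
  if [ "${status}" -gt 128 ]; then
    echo "terminated by signal $((status - 128))"
  else
    echo "exited with status ${status}"
  fi
}

describe_exit_status 137   # status from the later failure in this thread
describe_exit_status 2     # status from the setup-script failures
```

With these inputs it prints `terminated by signal 9` and then `exited with status 2`, which is one quick way to separate a killed node process from a setup command that simply returned an error.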

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@9b97bda17396aabc4519b9fa272c39dfe5e851da:

		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201012 07:27:15.372429 135 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 3.2s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:15.672992 9202 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 9.906848182s; err=<nil>
		W201012 07:27:15.673131 9234 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r18/1:‹/Table/2{2-3}›] slow heartbeat took 9.791483938s; err=<nil>
		W201012 07:27:15.673496 9204 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 9.701352598s; err=<nil>
		W201012 07:27:16.375088 9253 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 7.821408342s; err=<nil>
		W201012 07:27:16.375182 159 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:16.375576 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r18/1:‹/Table/2{2-3}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:16.375578 132 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		I201012 07:27:16.375626 173 kv/kvserver/replica_rangefeed.go:610 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] RangeFeed closed timestamp 1602487619.720213055,0 is behind by 16.655411699s
		W201012 07:27:16.375648 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:16.976540 9241 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 8.172681187s; err=<nil>
		W201012 07:27:16.976632 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:16.976572 132 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:16.976750 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:16.976659 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r18/1:‹/Table/2{2-3}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:17.477901 9299 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 5.312667859s; err=<nil>
		W201012 07:27:17.477996 161 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:17.478043 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
		W201012 07:27:17.678555 198 kv/kvserver/node_liveness.go:787 ⋮ [n1,liveness-hb] slow heartbeat took 4.624477028s; err=heartbeat failed on epoch increment
		I201012 07:27:17.678642 198 kv/kvserver/node_liveness.go:679 ⋮ [n1,liveness-hb] heartbeat failed on epoch increment; retrying
		W201012 07:27:18.887264 198 kv/kvserver/node_liveness.go:787 ⋮ [n1,liveness-hb] slow heartbeat took 1.208564403s; err=<nil>
		W201012 07:27:19.138906 9371 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 1.309122407s; err=<nil>
		W201012 07:27:19.290352 9349 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 1.460561409s; err=<nil>
		W201012 07:27:19.592543 9373 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 1.708165997s; err=<nil>
		W201012 07:27:19.894872 9374 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 1.712662057s; err=<nil>
		I201012 07:27:20.770308 196 server/status/runtime.go:522 ⋮ [n1] runtime stats: 161 MiB RSS, 178 goroutines, 34 MiB/22 MiB/58 MiB GO alloc/idle/total, 17 MiB/19 MiB CGO alloc/total, 0.1 CGO/sec, 0.9/0.5 %(u/s)time, 0.0 %gc (0x), 4.1 KiB/11 KiB (r/w)net
		W201012 07:27:22.053453 135 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 2.2s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:22.053521 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 2.2s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:22.604718 155 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 2.6s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:22.604794 153 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.3s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:22.655944 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 2.8s [applied=2, batches=2, state_assertions=0]
		W201012 07:27:23.808068 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 1.8s [applied=2, batches=1, state_assertions=0]
		W201012 07:27:28.069191 153 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 5.5s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:28.069234 155 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 5.5s [applied=2, batches=1, state_assertions=0]
		W201012 07:27:28.069263 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 5.5s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:28.871046 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 6.2s [applied=1, batches=1, state_assertions=0]
		W201012 07:27:30.125108 148 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 3.8s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  3785 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
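
The reports in this thread show two different exit statuses: the setup script wraps `exit status 2`, while the `cockroach start-single-node` command reports `exit status 137`. By shell convention, a status above 128 means the process was terminated by signal (status - 128), so 137 corresponds to signal 9 (SIGKILL), which matches the `Killed` line in the log. The helper below is a hypothetical sketch for reading these statuses, not part of roachtest, and it says nothing about which process sent the signal.

```shell
#!/bin/bash
# Hypothetical helper (not part of roachtest): describe a shell exit status.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
describe_exit_status() {
  local status="$1"
  if [ "$status" -eq 0 ]; then
    echo "success"
  elif [ "$status" -gt 128 ]; then
    echo "killed by signal $((status - 128))"
  else
    echo "exited with error code ${status}"
  fi
}

describe_exit_status 137   # status reported for the cockroach command
describe_exit_status 2     # status reported for the setup script
```

Under this reading, the 137 failures happen while the node is running rather than during setup.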

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@9b97bda17396aabc4519b9fa272c39dfe5e851da:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@ea42b57cbd348864b7d389cb1cf3cd70fff61bfa:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@ea42b57cbd348864b7d389cb1cf3cd70fff61bfa:

		W201013 07:56:31.881285 131 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.7s [applied=1, batches=1, state_assertions=0]
		W201013 07:56:32.182224 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201013 07:56:32.182231 158 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201013 07:56:32.583856 131 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201013 07:56:32.985007 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201013 07:56:32.985113 158 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201013 07:56:32.985293 9382 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] slow heartbeat took 6.89939764s; err=heartbeat failed on epoch increment
		E201013 07:56:32.985376 9382 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] heartbeat failed on epoch increment
		W201013 07:56:33.536693 9385 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 6.770513383s; err=heartbeat failed on epoch increment
		E201013 07:56:33.536831 9385 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] heartbeat failed on epoch increment
		W201013 07:56:33.937941 9367 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 6.939590421s; err=heartbeat failed on epoch increment
		E201013 07:56:33.938104 9367 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] heartbeat failed on epoch increment
		W201013 07:56:34.138036 158 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 1.2s [applied=2, batches=2, state_assertions=0]
		W201013 07:56:34.590087 9388 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 7.374103737s; err=heartbeat failed on epoch increment
		E201013 07:56:34.590243 9388 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] heartbeat failed on epoch increment
		W201013 07:56:34.941234 9424 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 4.313822505s; err=heartbeat failed on epoch increment
		E201013 07:56:34.941419 9424 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] heartbeat failed on epoch increment
		W201013 07:56:35.242200 9440 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 3.995987164s; err=heartbeat failed on epoch increment
		E201013 07:56:35.242416 9440 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] heartbeat failed on epoch increment
		W201013 07:56:35.543478 32 kv/kvserver/node_liveness.go:787 ⋮ [n1,liveness-hb] slow heartbeat took 4.277836278s; err=heartbeat failed on epoch increment
		I201013 07:56:35.543592 32 kv/kvserver/node_liveness.go:679 ⋮ [n1,liveness-hb] heartbeat failed on epoch increment; retrying
		W201013 07:56:35.765888 32 kv/kvserver/node_liveness.go:689 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		I201013 07:56:36.332596 173 kv/kvserver/replica_rangefeed.go:610 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] RangeFeed closed timestamp 1602575777.727070116,0 is behind by 18.605522589s
		W201013 07:56:36.397876 9491 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r18/1:‹/Table/2{2-3}›] slow heartbeat took 3.412264396s; err=<nil>
		W201013 07:56:36.397995 9501 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 2.860856854s; err=<nil>
		W201013 07:56:36.398100 9509 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 1.807628682s; err=<nil>
		W201013 07:56:36.398033 9467 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 2.459664324s; err=<nil>
		W201013 07:56:36.398229 9510 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 1.456591433s; err=<nil>
		W201013 07:56:36.398379 9450 kv/kvserver/node_liveness.go:787 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 1.155733975s; err=<nil>
		W201013 07:56:37.456328 149 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
		W201013 07:56:37.505676 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.8s [applied=2, batches=2, state_assertions=0]
		bash: line 1:  3827 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@05826e3a748d3bf7634130d49667627e687d5875:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@80e7127197f76ef35c1f6ec3984c4d49d4afde7f:

		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201015 07:25:16.885925 118 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.8s [applied=1, batches=1, state_assertions=0]
		W201015 07:25:17.702840 28 kv/kvserver/closedts/provider/provider.go:155 ⋮ [ct-closer] unable to move closed timestamp forward: not live
		(1) attached stack trace
		  -- stack trace:
		  | github.com/cockroachdb/cockroach/pkg/kv/kvserver.init
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/node_liveness.go:66
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5228
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5223
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5223
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5223
		  | runtime.doInit
		  | 	/usr/local/go/src/runtime/proc.go:5223
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:190
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (2) not live
		Error types: (1) *withstack.withStack (2) *errutil.leafError
		W201015 07:25:21.249589 118 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 4.4s [applied=1, batches=1, state_assertions=0]
		W201015 07:25:21.249596 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.4s [applied=1, batches=1, state_assertions=0]
		W201015 07:25:21.249857 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 4.3s [applied=1, batches=1, state_assertions=0]
		W201015 07:25:21.386242 196 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500409079s; err=‹aborted during DistSender.Send: context deadline exceeded›
		W201015 07:25:21.386358 196 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) ‹aborted during DistSender.Send: context deadline exceeded›
		Error types: (1) *contextutil.TimeoutError (2) *roachpb.internalError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201015 07:25:21.449612 127 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 5.1s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  3813 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@80e7127197f76ef35c1f6ec3984c4d49d4afde7f:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@47044feed11ec0c0390989bf8f44e777ec3eb00d:

		W201016 07:20:10.646512 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 4.4s [applied=2, batches=2, state_assertions=0]
		W201016 07:20:10.646577 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.4s [applied=2, batches=1, state_assertions=0]
		W201016 07:20:10.646482 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 4.4s [applied=1, batches=1, state_assertions=0]
		W201016 07:20:10.735991 186 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500136257s; err=context deadline exceeded
		W201016 07:20:10.736091 186 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201016 07:20:11.247866 153 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201016 07:20:11.598505 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201016 07:20:11.899194 149 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201016 07:20:11.899397 9326 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r23/1:‹/Table/2{7-8}›] slow heartbeat took 6.580922485s; err=heartbeat failed on epoch increment
		E201016 07:20:11.899468 9326 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r23/1:‹/Table/2{7-8}›] heartbeat failed on epoch increment
		W201016 07:20:12.199892 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201016 07:20:12.550752 149 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201016 07:20:12.550656 9410 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 6.315388209s; err=heartbeat failed on epoch increment
		E201016 07:20:12.550828 9410 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] heartbeat failed on epoch increment
		W201016 07:20:13.052156 9426 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 6.767273135s; err=heartbeat failed on epoch increment
		E201016 07:20:13.052317 9426 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] heartbeat failed on epoch increment
		W201016 07:20:13.402871 9413 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 6.391211901s; err=heartbeat failed on epoch increment
		E201016 07:20:13.403087 9413 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] heartbeat failed on epoch increment
		W201016 07:20:13.753602 9465 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 4.410484475s; err=heartbeat failed on epoch increment
		E201016 07:20:13.753722 9465 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] heartbeat failed on epoch increment
		W201016 07:20:14.054290 186 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 3.318139303s; err=heartbeat failed on epoch increment
		I201016 07:20:14.054379 186 kv/kvserver/node_liveness.go:707 ⋮ [n1,liveness-hb] heartbeat failed on epoch increment; retrying
		I201016 07:20:14.961802 177 kv/kvserver/replica_rangefeed.go:610 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] RangeFeed closed timestamp 1602832796.957761076,0 is behind by 18.004038601s
		W201016 07:20:15.157641 9482 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r13/1:‹/Table/1{7-8}›] slow heartbeat took 3.257974877s; err=<nil>
		W201016 07:20:15.157745 9468 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 2.606579998s; err=<nil>
		W201016 07:20:15.157836 9484 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 2.105103193s; err=<nil>
		W201016 07:20:15.158178 9452 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 1.754811934s; err=<nil>
		W201016 07:20:15.158323 9453 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 1.404407322s; err=<nil>
		W201016 07:20:15.409007 186 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 1.308322051s; err=<nil>
		W201016 07:20:16.063571 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.5s [applied=2, batches=1, state_assertions=0]
		W201016 07:20:16.112403 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.7s [applied=2, batches=2, state_assertions=0]
		bash: line 1:  4289 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
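
The `exit status 137` reported above is consistent with the process having been terminated by SIGKILL (128 + 9), which matches the `Killed` line printed by bash. A minimal sketch for decoding such statuses follows; the function name `decode_exit_status` is hypothetical and is not part of roachtest or cockroach.

```shell
#!/usr/bin/env bash
# Decode a shell exit status. By shell convention, a status above 128
# means the process was terminated by signal (status - 128).
# decode_exit_status is an illustrative helper, not a roachtest function.
decode_exit_status() {
  local status="$1"
  if [ "$status" -gt 128 ]; then
    local signum=$((status - 128))
    # `kill -l N` prints the name of signal number N (without the SIG prefix).
    echo "killed by signal ${signum} (SIG$(kill -l "$signum"))"
  else
    echo "exited with status ${status}"
  fi
}

decode_exit_status 137
```

Under this convention, the status 2 and status 20 seen in the `log=true,data=false` failures are ordinary non-signal exits, so they point at a different failure mode than the killed node in this run.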

Artifacts: /disk-stalled/log=false,data=true
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@47044feed11ec0c0390989bf8f44e777ec3eb00d:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@b1abf9c8dfb5880fce69dfc7240e593f077bf77c:

		W201017 07:26:19.855721 162 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.2s [applied=1, batches=1, state_assertions=0]
		I201017 07:26:23.439642 227 server/status/runtime.go:522 ⋮ [n1] runtime stats: 156 MiB RSS, 185 goroutines, 29 MiB/22 MiB/57 MiB GO alloc/idle/total, 17 MiB/19 MiB CGO alloc/total, 0.1 CGO/sec, 1.2/0.5 %(u/s)time, 0.0 %gc (1x), 2.7 KiB/7.3 KiB (r/w)net
		W201017 07:26:24.318785 161 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 6.0s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:24.318808 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 5.5s [applied=1, batches=1, state_assertions=0]
		I201017 07:26:24.319050 229 kv/kvserver/node_liveness.go:1189 ⋮ [n1,liveness-hb] retrying liveness update after ‹kvserver.errRetryLiveness›: ‹result is ambiguous (context deadline exceeded)›
		W201017 07:26:24.319116 229 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 10.982488524s; err=context deadline exceeded
		W201017 07:26:24.319154 229 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201017 07:26:25.170181 156 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 6.5s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:26.423990 120 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 6.6s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:26.424056 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 5.3s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:26.424375 124 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 6.6s [applied=1, batches=1, state_assertions=0]
		I201017 07:26:27.319513 8879 kv/txn.go:750 ⋮ [n1] async rollback failed: ‹aborted during DistSender.Send: context deadline exceeded›
		I201017 07:26:28.402815 112 gossip/gossip.go:1508 ⋮ [n1] node has connected to cluster via gossip
		W201017 07:26:28.819345 229 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500128939s; err=context deadline exceeded
		W201017 07:26:28.819449 229 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201017 07:26:30.884949 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 6.6s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:30.885102 153 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 6.6s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:30.887843 156 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 5.7s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:31.335987 124 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.9s [applied=1, batches=1, state_assertions=0]
		W201017 07:26:31.336053 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 4.9s [applied=2, batches=2, state_assertions=0]
		W201017 07:26:31.336030 120 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 4.9s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  4004 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@b1abf9c8dfb5880fce69dfc7240e593f077bf77c:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@d752fa2bd9afad255e8c655de9c7edc6dad14486:

		W201018 07:21:00.435787 164 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:00.513471 194 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500160717s; err=context deadline exceeded
		W201018 07:21:00.513586 194 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201018 07:21:00.936968 9230 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r10/1:‹/Table/1{4-5}›] slow heartbeat took 11.221552987s; err=<nil>
		W201018 07:21:00.937078 9276 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 10.908232365s; err=<nil>
		W201018 07:21:01.339020 9333 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 9.073754629s; err=heartbeat failed on epoch increment
		E201018 07:21:01.339244 9333 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] heartbeat failed on epoch increment
		W201018 07:21:01.641000 9321 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 7.471171268s; err=heartbeat failed on epoch increment
		E201018 07:21:01.641144 9321 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] heartbeat failed on epoch increment
		W201018 07:21:01.892446 9370 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 6.065474476s; err=heartbeat failed on epoch increment
		E201018 07:21:01.892589 9370 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] heartbeat failed on epoch increment
		W201018 07:21:02.043660 9381 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 6.102129249s; err=heartbeat failed on epoch increment
		E201018 07:21:02.043804 9381 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] heartbeat failed on epoch increment
		I201018 07:21:02.350910 176 kv/kvserver/replica_rangefeed.go:610 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] RangeFeed closed timestamp 1603005643.746320923,0 is behind by 18.604586891s
		W201018 07:21:02.648845 194 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 2.135183424s; err=<nil>
		W201018 07:21:02.648954 9435 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 1.309630162s; err=<nil>
		W201018 07:21:02.649038 9436 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r10/1:‹/Table/1{4-5}›] slow heartbeat took 1.309663267s; err=<nil>
		W201018 07:21:02.649423 9309 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 1.309960009s; err=<nil>
		W201018 07:21:02.649497 9437 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 1.008174384s; err=<nil>
		I201018 07:21:03.611608 183 server/status/runtime.go:522 ⋮ [n1] runtime stats: 160 MiB RSS, 177 goroutines, 35 MiB/21 MiB/56 MiB GO alloc/idle/total, 17 MiB/19 MiB CGO alloc/total, 0.1 CGO/sec, 0.9/0.3 %(u/s)time, 0.0 %gc (0x), 4.5 KiB/11 KiB (r/w)net
		W201018 07:21:07.114630 165 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 3.9s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:07.114698 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 3.9s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:07.114633 158 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 3.9s [applied=2, batches=1, state_assertions=0]
		W201018 07:21:07.114945 156 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 3.9s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:07.615837 151 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.3s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:07.615838 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 1.1s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:07.867786 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 4.6s [applied=2, batches=2, state_assertions=0]
		W201018 07:21:09.821339 194 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 6.719884978s; err=<nil>
		W201018 07:21:09.821595 158 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 2.7s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:11.324706 164 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 4.2s [applied=1, batches=1, state_assertions=0]
		W201018 07:21:11.525157 156 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 4.4s [applied=3, batches=3, state_assertions=0]
		bash: line 1:  4415 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@d752fa2bd9afad255e8c655de9c7edc6dad14486:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@ab503e2fd708541e5e9ebb9a6f2651eda506f2ef:

		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201019 07:14:44.238256 158 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:44.539110 9342 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] slow heartbeat took 4.95093348s; err=<nil>
		W201019 07:14:44.539174 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:45.090760 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:45.090740 9343 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 4.716080185s; err=<nil>
		W201019 07:14:45.090813 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:45.391515 9407 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 4.76523496s; err=<nil>
		W201019 07:14:45.742502 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:45.742567 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:46.043243 9441 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 5.266769761s; err=<nil>
		W201019 07:14:46.043598 143 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:46.394318 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:46.845484 143 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:46.845577 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:46.845612 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:48.099050 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 1.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:48.449908 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 1.6s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:48.450209 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 1.6s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:48.449940 131 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201019 07:14:48.450602 135 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 1.6s [applied=1, batches=1, state_assertions=0]
		I201019 07:14:48.450805 196 kv/kvserver/node_liveness.go:1189 ⋮ [n1,liveness-hb] retrying liveness update after ‹kvserver.errRetryLiveness›: ‹result is ambiguous (context deadline exceeded)›
		W201019 07:14:48.450981 196 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.512928567s; err=context deadline exceeded
		W201019 07:14:48.451032 196 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		bash: line 1:  4511 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@ab503e2fd708541e5e9ebb9a6f2651eda506f2ef:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false
Related:

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@1d46df77dbd8721cccf508fb5ed498f3de78022c:

		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201021 07:29:31.722369 9396 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 9.230484068s; err=heartbeat failed on epoch increment
		E201021 07:29:31.722572 9396 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] heartbeat failed on epoch increment
		W201021 07:29:32.073661 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:32.374690 9374 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 9.080904868s; err=heartbeat failed on epoch increment
		E201021 07:29:32.374895 9374 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] heartbeat failed on epoch increment
		W201021 07:29:32.675711 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:33.027017 9419 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 5.907298156s; err=heartbeat failed on epoch increment
		E201021 07:29:33.027207 9419 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] heartbeat failed on epoch increment
		W201021 07:29:33.528926 9408 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 4.114111933s; err=heartbeat failed on epoch increment
		E201021 07:29:33.529145 9408 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] heartbeat failed on epoch increment
		W201021 07:29:34.833450 9498 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] slow heartbeat took 2.759586144s; err=<nil>
		W201021 07:29:34.833554 9523 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 2.458320162s; err=<nil>
		W201021 07:29:34.833658 9524 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 1.806218891s; err=<nil>
		W201021 07:29:34.833317 200 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 3.340398856s; err=<nil>
		W201021 07:29:34.833381 9522 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 3.110323747s; err=<nil>
		W201021 07:29:35.285033 151 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:35.585875 9525 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 2.056475246s; err=<nil>
		W201021 07:29:35.585968 161 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:35.586043 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:35.586185 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:35.586372 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:35.937214 151 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.238265 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.238389 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.238438 135 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.238258 161 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r25/1:‹/{Table/29-NamespaceTab…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.238264 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.689296 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 1.1s [applied=2, batches=2, state_assertions=0]
		W201021 07:29:36.790363 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.790427 135 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:36.790637 133 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201021 07:29:38.251524 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 1.6s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  3909 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@1d46df77dbd8721cccf508fb5ed498f3de78022c:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@40b7942025de0d8e347d25451611ad2c20267d48:

		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (2) not live
		Error types: (1) *withstack.withStack (2) *errutil.leafError
		W201022 07:38:20.815010 150 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 5.6s [applied=1, batches=1, state_assertions=0]
		W201022 07:38:21.617054 155 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 4.7s [applied=1, batches=1, state_assertions=0]
		W201022 07:38:21.617129 142 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 5.1s [applied=1, batches=1, state_assertions=0]
		W201022 07:38:21.666590 144 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 6.3s [applied=1, batches=1, state_assertions=0]
		W201022 07:38:22.118669 150 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 1.3s [applied=2, batches=2, state_assertions=0]
		I201022 07:38:22.118929 196 kv/kvserver/node_liveness.go:1189 ⋮ [n1,liveness-hb] retrying liveness update after ‹kvserver.errRetryLiveness›: ‹result is ambiguous (context deadline exceeded)›
		W201022 07:38:22.119004 196 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 9.725519462s; err=context deadline exceeded
		W201022 07:38:22.119046 196 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201022 07:38:26.230744 155 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 4.6s [applied=1, batches=1, state_assertions=0]
		W201022 07:38:26.230821 142 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.6s [applied=2, batches=1, state_assertions=0]
		W201022 07:38:26.233426 144 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 4.6s [applied=1, batches=1, state_assertions=0]
		W201022 07:38:26.531434 9433 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 9.010861087s; err=heartbeat failed on epoch increment
		E201022 07:38:26.531579 9433 kv/kvserver/replica_range_lease.go:339 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] heartbeat failed on epoch increment
		W201022 07:38:26.531730 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 4.4s [applied=2, batches=2, state_assertions=0]
		I201022 07:38:26.531920 9439 kv/txn.go:750 ⋮ [n1] async rollback failed: ‹result is ambiguous (context deadline exceeded)›: "unnamed" meta={id=10bd7f40 pri=0.00847168 epo=0 ts=1603352295.198220927,0 min=1603352295.198220927,0 seq=0} lock=true stat=PENDING rts=1603352295.198220927,0 wto=false max=1603352295.698220927,0
		W201022 07:38:26.619331 196 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500204218s; err=context deadline exceeded
		W201022 07:38:26.619443 196 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		bash: line 1:  4393 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@40b7942025de0d8e347d25451611ad2c20267d48:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@4702643dd0755a48365a115c970415fbb5023ad2:

		W201023 07:28:26.971918 124 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r12/1:‹/Table/1{6-7}›] handle raft ready: 4.8s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:27.574811 114 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 5.4s [applied=2, batches=2, state_assertions=0]
		W201023 07:28:27.673741 113 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 5.0s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:27.673775 127 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 2.9s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:27.673850 132 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.1s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:27.673795 121 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 2.9s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:30.132604 124 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r12/1:‹/Table/1{6-7}›] handle raft ready: 3.2s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:30.383995 127 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 2.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:30.384036 132 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:30.384132 121 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 2.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:30.533902 119 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 3.6s [applied=1, batches=1, state_assertions=0]
		I201023 07:28:30.574006 183 server/status/runtime.go:522 ⋮ [n1] runtime stats: 152 MiB RSS, 187 goroutines, 42 MiB/15 MiB/59 MiB GO alloc/idle/total, 17 MiB/18 MiB CGO alloc/total, 0.1 CGO/sec, 0.7/0.6 %(u/s)time, 0.0 %gc (0x), 3.7 KiB/9.1 KiB (r/w)net
		W201023 07:28:30.685193 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:31.236518 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:31.472395 185 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500150062s; err=context deadline exceeded
		W201023 07:28:31.472491 185 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201023 07:28:31.838136 9426 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 5.367807695s; err=<nil>
		W201023 07:28:32.139127 9385 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r12/1:‹/Table/1{6-7}›] slow heartbeat took 5.167107413s; err=<nil>
		W201023 07:28:32.490304 112 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:32.841227 9407 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 5.267990515s; err=<nil>
		W201023 07:28:32.841355 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:32.841446 114 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r12/1:‹/Table/1{6-7}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:33.342946 112 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:33.643731 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:33.643710 9458 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 5.969833161s; err=<nil>
		W201023 07:28:33.643722 114 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r12/1:‹/Table/1{6-7}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:33.643787 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:34.948060 112 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 1.6s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:35.348973 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 1.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:35.349101 124 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 1.7s [applied=1, batches=1, state_assertions=0]
		W201023 07:28:35.348994 9387 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 7.67506424s; err=<nil>
		bash: line 1:  3900 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@8aceac3c99c3addece3a9ef9af04cc74715cdb37:

		W201025 07:08:11.675305 133 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 3.3s [applied=2, batches=1, state_assertions=0]
		W201025 07:08:11.975886 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 3.2s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:11.975886 119 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 3.2s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:11.975959 124 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 3.2s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:11.975952 220 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 5.24346477s; err=context deadline exceeded
		W201025 07:08:11.976345 220 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201025 07:08:12.326847 117 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:13.179190 120 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:14.131706 120 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.7s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:14.582834 121 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:14.883509 8335 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] slow heartbeat took 6.468194579s; err=<nil>
		W201025 07:08:14.883564 8336 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 6.459470894s; err=<nil>
		W201025 07:08:14.883707 8317 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 6.117128972s; err=<nil>
		W201025 07:08:14.883946 8284 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 6.117244092s; err=<nil>
		W201025 07:08:14.884122 8342 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 3.208609373s; err=<nil>
		W201025 07:08:15.384815 220 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 3.408375726s; err=<nil>
		W201025 07:08:15.384923 122 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.384966 130 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.385279 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.385451 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.385558 117 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.5s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.986696 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.986711 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.986693 130 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.986781 117 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.986802 122 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:15.986771 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:18.147947 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r8/1:‹/Table/1{2-3}›] handle raft ready: 2.2s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:18.148051 139 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 2.2s [applied=1, batches=1, state_assertions=0]
		W201025 07:08:18.148111 122 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 2.2s [applied=2, batches=1, state_assertions=0]
		W201025 07:08:18.149037 138 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 2.2s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  3869 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@6184870a438ae34afbcf29dda5452345dc7587d3:

		W201026 07:01:44.718851 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 1.1s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:44.718593 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 1.1s [applied=1, batches=1, state_assertions=0]
		I201026 07:01:47.404677 193 server/status/runtime.go:522 ⋮ [n1] runtime stats: 152 MiB RSS, 191 goroutines, 40 MiB/16 MiB/58 MiB GO alloc/idle/total, 17 MiB/19 MiB CGO alloc/total, 0.1 CGO/sec, 1.0/0.6 %(u/s)time, 0.0 %gc (0x), 271 KiB/15 KiB (r/w)net
		W201026 07:01:49.635421 195 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 7.238848967s; err=<nil>
		W201026 07:01:49.635475 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 4.9s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:49.635478 157 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.9s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:49.635804 134 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 4.9s [applied=6, batches=1, state_assertions=0]
		W201026 07:01:50.438022 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 3.5s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:50.438098 127 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 2.0s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:50.638015 144 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 5.9s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:50.989878 134 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 1.4s [applied=2, batches=1, state_assertions=0]
		W201026 07:01:50.990016 133 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 1.4s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:51.442342 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 6.7s [applied=2, batches=2, state_assertions=0]
		W201026 07:01:51.641803 137 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 1.2s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:51.641816 144 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 1.0s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:51.641812 127 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 1.2s [applied=2, batches=1, state_assertions=0]
		W201026 07:01:54.136586 195 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500448007s; err=‹aborted during DistSender.Send: context deadline exceeded›
		W201026 07:01:54.136706 195 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) ‹aborted during DistSender.Send: context deadline exceeded›
		Error types: (1) *contextutil.TimeoutError (2) *roachpb.internalError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201026 07:01:55.653252 133 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 4.7s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:55.653225 136 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 4.2s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:55.653380 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 4.7s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:55.653301 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 4.7s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.104321 153 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] handle raft ready: 4.5s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.104377 126 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.5s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.104468 156 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 4.5s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.405093 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.405078 133 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.405153 141 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.705932 156 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:56.706003 126 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		W201026 07:01:57.006944 145 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.6s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  4402 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@5e3c201595fc33b0d120057c61413195716f811d:

		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201027 07:28:19.734088 166 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 2.8s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:19.884908 161 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.6s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:20.639150 9304 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r31/1:‹/Table/3{5-6}›] slow heartbeat took 10.570016061s; err=<nil>
		W201027 07:28:20.639239 9306 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r26/1:‹/NamespaceTable/{30-Max}›] slow heartbeat took 10.502297021s; err=<nil>
		W201027 07:28:20.639643 9324 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] slow heartbeat took 8.274475479s; err=<nil>
		W201027 07:28:20.639876 9286 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 7.333387782s; err=<nil>
		W201027 07:28:20.941501 9288 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 7.273427014s; err=<nil>
		W201027 07:28:21.293569 9350 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 7.42403972s; err=<nil>
		I201027 07:28:21.293674 182 kv/kvserver/replica_rangefeed.go:610 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] RangeFeed closed timestamp 1603783685.699849292,0 is behind by 15.593823768s
		I201027 07:28:21.966415 196 server/status/runtime.go:522 ⋮ [n1] runtime stats: 151 MiB RSS, 193 goroutines, 29 MiB/25 MiB/58 MiB GO alloc/idle/total, 17 MiB/19 MiB CGO alloc/total, 0.1 CGO/sec, 1.1/0.2 %(u/s)time, 0.0 %gc (0x), 15 KiB/16 KiB (r/w)net
		W201027 07:28:22.096924 198 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.128829868s; err=<nil>
		W201027 07:28:22.097076 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:22.097076 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:22.097160 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 0.8s [applied=7, batches=1, state_assertions=0]
		W201027 07:28:22.097385 167 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 0.8s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:28.011566 198 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 5.914559011s; err=context deadline exceeded
		W201027 07:28:28.012031 198 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201027 07:28:28.011630 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 5.9s [applied=2, batches=2, state_assertions=0]
		W201027 07:28:28.011654 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 5.9s [applied=2, batches=1, state_assertions=0]
		W201027 07:28:28.612812 160 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 2.7s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:28.612821 165 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] handle raft ready: 6.5s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:28.612915 142 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 6.5s [applied=1, batches=1, state_assertions=0]
		W201027 07:28:28.915571 167 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] handle raft ready: 6.8s [applied=2, batches=2, state_assertions=0]
		W201027 07:28:29.214410 163 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r7/1:‹/Table/1{1-2}›] handle raft ready: 1.2s [applied=4, batches=2, state_assertions=0]
		W201027 07:28:29.214441 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 1.2s [applied=1, batches=1, state_assertions=0]
		bash: line 1:  4248 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@5e3c201595fc33b0d120057c61413195716f811d:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=false,data=true failed on master@70fb9a54d6c4d7b3c1d8ad93bb399eefef8f5d04:

		W201028 07:01:28.402624 9444 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] slow heartbeat took 7.569182081s; err=context canceled
		E201028 07:01:28.402803 9444 kv/kvserver/replica_range_lease.go:340 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] context canceled
		W201028 07:01:28.831253 205 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500127398s; err=context deadline exceeded
		W201028 07:01:28.831350 205 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201028 07:01:28.853018 159 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 1.4s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:29.304353 150 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:30.156511 150 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 0.9s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:31.409942 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 1.3s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:31.810788 150 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 1.7s [applied=1, batches=1, state_assertions=0]
		I201028 07:01:31.811056 9541 kv/txn.go:750 ⋮ [n1] async rollback failed: ‹result is ambiguous (context deadline exceeded)›: "unnamed" meta={id=51984e35 pri=0.01185038 epo=0 ts=1603868486.998310483,0 min=1603868486.998310483,0 seq=0} lock=true stat=PENDING rts=1603868486.998310483,0 wto=false max=1603868487.498310483,0
		W201028 07:01:33.331598 205 kv/kvserver/node_liveness.go:815 ⋮ [n1,liveness-hb] slow heartbeat took 4.500172419s; err=context deadline exceeded
		W201028 07:01:33.331732 205 kv/kvserver/node_liveness.go:717 ⋮ [n1,liveness-hb] failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s›
		(1) ‹operation "node liveness heartbeat" timed out after 4.5s›
		Wraps: (2) context deadline exceeded
		Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
		
		An inability to maintain liveness will prevent a node from participating in a
		cluster. If this problem persists, it may be a sign of resource starvation or
		of network connectivity problems. For help troubleshooting, visit:
		
		    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
		
		W201028 07:01:33.916307 146 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r3/1:‹/System/{NodeLive…-tsd}›] handle raft ready: 2.5s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:34.367457 155 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r35/1:‹/{Table/39-Max}›] handle raft ready: 2.3s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:34.367495 140 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r2/1:‹/System/NodeLiveness{-Max}›] handle raft ready: 2.6s [applied=1, batches=1, state_assertions=0]
		W201028 07:01:34.367593 9463 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r31/1:‹/Table/3{5-6}›] slow heartbeat took 13.532243346s; err=heartbeat failed on epoch increment
		W201028 07:01:34.367451 152 kv/kvserver/store_raft.go:492 ⋮ [n1,s1,r1/1:‹/{Min-System/NodeL…}›] handle raft ready: 1.2s [applied=1, batches=1, state_assertions=0]
		E201028 07:01:34.367667 9463 kv/kvserver/replica_range_lease.go:340 ⋮ [n1,s1,r31/1:‹/Table/3{5-6}›] heartbeat failed on epoch increment
		W201028 07:01:34.367714 9387 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r24/1:‹/Table/2{8-9}›] slow heartbeat took 13.531193842s; err=<nil>
		W201028 07:01:34.368023 9465 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r4/1:‹/System{/tsd-tse}›] slow heartbeat took 13.485603854s; err=<nil>
		W201028 07:01:34.368171 9441 kv/kvserver/node_liveness.go:815 ⋮ [n1,s1,r6/1:‹/Table/{SystemCon…-11}›] slow heartbeat took 13.133858838s; err=<nil>
		bash: line 1:  4528 Killed                  bash -c "timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store /mnt/data1/cockroach/faulty --log-dir /mnt/data1/cockroach/real/logs"
		Error: COMMAND_PROBLEM: exit status 137
		(1) COMMAND_PROBLEM
		Wraps: (2) Node 1. Command with error:
		  | ```
		  | timeout --signal 9 600s env COCKROACH_ENGINE_MAX_SYNC_DURATION_FATAL=true COCKROACH_ENGINE_MAX_SYNC_DURATION=40ms COCKROACH_LOG_MAX_SYNC_DURATION=1h0m0s ./cockroach start-single-node --insecure --logtostderr=INFO --store {store-dir}/faulty --log-dir {store-dir}/real/logs
		  | ```
		Wraps: (3) exit status 137
		Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError

Artifacts: /disk-stalled/log=false,data=true
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).disk-stalled/log=true,data=false failed on master@70fb9a54d6c4d7b3c1d8ad93bb399eefef8f5d04:

		  |   | ```
		  |   | set -exuo pipefail;
		  |   |   thrift_dir="/opt/thrift"
		  |   |
		  |   |   if [ ! -f "/usr/bin/thrift" ]; then
		  |   | 	sudo apt-get update;
		  |   | 	sudo apt-get install -qy automake bison flex g++ git libboost-all-dev libevent-dev libssl-dev libtool make pkg-config python-setuptools libglib2.0-dev
		  |   |
		  |   |     sudo mkdir -p "${thrift_dir}"
		  |   |     sudo chmod 777 "${thrift_dir}"
		  |   |     cd "${thrift_dir}"
		  |   |     curl "https://downloads.apache.org/thrift/0.13.0/thrift-0.13.0.tar.gz" | sudo tar xvz --strip-components 1
		  |   |     sudo ./configure --prefix=/usr
		  |   |     sudo make -j$(nproc)
		  |   |     sudo make install
		  |   |     (cd "${thrift_dir}/lib/py" && sudo python setup.py install)
		  |   |   fi
		  |   |
		  |   |   charybde_dir="/opt/charybdefs"
		  |   |   nemesis_path="${charybde_dir}/charybdefs-nemesis"
		  |   |
		  |   |   if [ ! -f "${nemesis_path}" ]; then
		  |   |     sudo apt-get install -qy build-essential cmake libfuse-dev fuse
		  |   |     sudo rm -rf "${charybde_dir}" "${nemesis_path}" /usr/local/bin/charybdefs{,-nemesis}
		  |   |     sudo mkdir -p "${charybde_dir}"
		  |   |     sudo chmod 777 "${charybde_dir}"
		  |   |     # TODO(bilal): Change URL back to scylladb/charybdefs once https://github.com/scylladb/charybdefs/pull/21 is merged.
		  |   |     git clone --depth 1 "https://github.com/itsbilal/charybdefs.git" "${charybde_dir}"
		  |   |
		  |   |     cd "${charybde_dir}"
		  |   |     thrift -r --gen cpp server.thrift
		  |   |     cmake CMakeLists.txt
		  |   |     make -j$(nproc)
		  |   |
		  |   |     sudo modprobe fuse
		  |   |     sudo ln -s "${charybde_dir}/charybdefs" /usr/local/bin/charybdefs
		  |   |     cat > "${nemesis_path}" <<EOF
		  |   | #!/bin/bash
		  |   | cd /opt/charybdefs/cookbook
		  |   | ./recipes "\$@"
		  |   | EOF
		  |   |     chmod +x "${nemesis_path}"
		  |   | 	sudo ln -s "${nemesis_path}" /usr/local/bin/charybdefs-nemesis
		  |   | fi
		  |   |
		  |   | ```
		  | Wraps: (3) exit status 2
		  | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
		Wraps: (2) exit status 20
		Error types: (1) *main.withCommandDetails (2) *exec.ExitError

Artifacts: /disk-stalled/log=true,data=false
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

itsbilal added a commit to itsbilal/cockroach that referenced this issue Oct 28, 2020
The disk-stalled roachtest relies on being able to control the disk stall
detection and fatal intervals, because charybdefs only injects 50ms of
delay per syscall. This change adds an env variable for the max sync
duration, similar to the one removed in cockroachdb#55186, except that it
now governs the default of the cluster setting. The roachtest now sets
that env variable so that disk stall detection trips on short syscall
delays.

Fixes cockroachdb#54332.

Release note: None.
craig bot pushed a commit that referenced this issue Oct 28, 2020
56064: storage: Bring back MaxSyncDuration env var, fix disk-stalled roachtest r=itsbilal a=itsbilal

The disk-stalled roachtest relies on being able to control the disk stall
detection and fatal intervals, because charybdefs only injects 50ms of
delay per syscall. This change adds an env variable for the max sync
duration, similar to the one removed in #55186, except that it
now governs the default of the cluster setting. The roachtest now sets
that env variable so that disk stall detection trips on short syscall
delays.

Fixes #54332.

Release note: None.

56092: parser: pin pkg/security as a bazel dep for pkg/parser r=irfansharif a=irfansharif

We picked up a dependency on pkg/security within the parser package
after #55398.

We have to pin Go dependencies by hand if they're only present in
auto-generated code, because they are not otherwise visible to
bazel/gazelle when it generates the BUILD files (during the analysis
phase).

Release note: None

56095: partialidx: add benchmarks for two-variable comparisons r=RaduBerinde a=mgartner

Two-variable comparison implication performs similarly to other types of
implications.

    BenchmarkImplicator/single-exact-match-16                         76.5 ns/op
    BenchmarkImplicator/single-inexact-match-16                      342 ns/op
    BenchmarkImplicator/range-inexact-match-16                       782 ns/op
    BenchmarkImplicator/two-var-comparison-16                        302 ns/op
    BenchmarkImplicator/single-exact-match-extra-filters-16          310 ns/op
    BenchmarkImplicator/single-inexact-match-extra-filters-16        609 ns/op
    BenchmarkImplicator/multi-column-and-exact-match-16               82.4 ns/op
    BenchmarkImplicator/multi-column-and-inexact-match-16            722 ns/op
    BenchmarkImplicator/multi-column-and-two-var-comparisons-16      611 ns/op
    BenchmarkImplicator/multi-column-or-exact-match-16                76.1 ns/op
    BenchmarkImplicator/multi-column-or-exact-match-reverse-16       595 ns/op
    BenchmarkImplicator/multi-column-or-inexact-match-16            1081 ns/op
    BenchmarkImplicator/in-implies-or-16                             976 ns/op
    BenchmarkImplicator/and-filters-do-not-imply-pred-16            3710 ns/op
    BenchmarkImplicator/or-filters-do-not-imply-pred-16              917 ns/op
    BenchmarkImplicator/many-columns-exact-match10-16                296 ns/op
    BenchmarkImplicator/many-columns-inexact-match10-16             6853 ns/op
    BenchmarkImplicator/many-columns-exact-match100-16             19817 ns/op
    BenchmarkImplicator/many-columns-inexact-match100-16          447894 ns/op

Release note: None

Co-authored-by: Bilal Akhtar
Co-authored-by: irfan sharif
Co-authored-by: Marcus Gartner
@craig craig bot closed this as completed in 79f1751 Oct 28, 2020