ENA keeps restarting #134

Closed
arainero opened this issue Jul 14, 2020 · 17 comments

arainero commented Jul 14, 2020

Hello,

I have a T3a instance that is dropping RX packets because ENA keeps resetting. I don't know what is causing the constant resets and was hoping you could shed some light on the matter. I have multiple servers built from the same AMI, so I don't know why only this one has these issues. The server carries far more UDP traffic than TCP, if that helps. The instance ID is "i-0c068f87c4161e736".

When ENA resets, the following logs are generated in /var/log/messages and journalctl:

Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 918.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 919.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 920.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 921.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 922.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 923.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 924.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 925.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 926.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 927.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 928.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 929.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 930.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 931.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 932.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 933.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 934.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 935.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 936.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 937.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 938.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 939.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 940.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 941.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 942.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 943.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 944.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 945.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 946.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 947.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 948.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 949.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 950.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 951.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 952.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 953.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 954.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 955.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 956.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 957.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 958.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 959.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 960.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 961.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 962.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 963.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 964.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 965.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 966.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 967.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 968.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 969.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 970.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 971.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 972.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 973.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 974.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 975.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 976.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 977.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Keep alive watchdog timeout.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 2112023
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 533808489
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 4403324
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 4403326
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 2089811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 2371476
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 512952189
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 142811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 1861158
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 1698200576
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 3672075
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 3672121
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 1743809
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 1913582
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 412499776
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 53095
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 2405368
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 547235419
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 4279065
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 4279175
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 2382262
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 1953528
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 430232614
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 72039
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 2696836
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 731741920
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 2682972
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 2556581
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 701259477
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 218516
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 0 idx 0x217
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 1 idx 0x226
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 3 idx 0x284
Jul 14 10:10:22 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 10:10:22 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 10:10:22 drb kernel: ena 0000:00:05.0: Device reset completed successfully
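
The per-queue counters in a dump like the one above are easier to compare once they are grouped by queue. The following is a minimal sketch, not part of the ENA driver or any official tooling: the function names `parse_queue_counters` and `queues_with_missed_tx` are hypothetical, and the regular expression assumes the `queue_<N>_<tx|rx>_<name>: <value>` line format seen in this log. It only points at which queue reported missed Tx completions; it does not explain why.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the log above, e.g.
# "... eth0: queue_2_tx_missed_tx: 60"
COUNTER_RE = re.compile(r"queue_(\d+)_(tx|rx)_(\w+):\s*(\d+)\s*$")


def parse_queue_counters(lines):
    """Group ENA per-queue counters as {queue_id: {"tx_cnt": 123, ...}}."""
    stats = defaultdict(dict)
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            qid, direction, name, value = match.groups()
            stats[int(qid)][f"{direction}_{name}"] = int(value)
    return dict(stats)


def queues_with_missed_tx(stats):
    """Return {queue_id: missed_tx} for queues with a non-zero tx_missed_tx."""
    return {
        qid: counters["tx_missed_tx"]
        for qid, counters in stats.items()
        if counters.get("tx_missed_tx", 0) > 0
    }


# A few lines copied from the dump above.
sample = [
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 2405368",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60",
    "Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0",
]

print(queues_with_missed_tx(parse_queue_counters(sample)))  # {2: 60}
```

On the full dump this singles out queue 2, which matches the `qid 2` in the "Found a Tx that wasn't completed on time" messages.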

modinfo gives the following:

filename:       /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        1.5.0K
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
retpoline:      Y
rhelversion:    7.6
srcversion:     1B9931F07C26733BA8D4F94
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
intree:         Y
vermagic:       3.10.0-957.21.3.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        1E:5F:1D:87:70:4B:F3:38:01:2F:A2:B0:FE:16:94:59:97:B3:31:27
sig_hashalgo:   sha256
parm:           debug:Debug level (0=none,...,16=all) (int)

nafeabshara commented Jul 14, 2020 via email

arainero commented Jul 14, 2020

This happened again, and this time I noticed a little more information. What stands out to me is the message "The number of lost tx completions is above the threshold (248 > 128). Reset the device".

According to https://nxmnpg.lemoda.net/4/ena

"Packet was pushed to the NIC but not sent within given time limit; it may be caused by hang of the IO queue."

I want to investigate the IO queue mentioned here. Do you have any advice on what to look at or look for?
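
As a starting point for narrowing down which IO queue is involved, the kernel messages themselves can be tallied. This is a rough sketch under stated assumptions: the helper name `summarize` is hypothetical, and the two regular expressions are based only on the message wording quoted in this thread, which may differ in other driver versions. It counts the "wasn't completed on time" messages per `qid` and extracts the count and threshold from each reset message.

```python
import re
from collections import Counter

# Patterns assumed from the messages quoted in this thread.
MISSED_RE = re.compile(
    r"Found a Tx that wasn't completed on time, qid (\d+), index (\d+)"
)
THRESHOLD_RE = re.compile(
    r"lost tx completions is above the threshold \((\d+) > (\d+)\)"
)


def summarize(lines):
    """Return (missed-Tx message count per queue, [(lost, threshold), ...])."""
    per_queue = Counter()
    resets = []
    for line in lines:
        missed = MISSED_RE.search(line)
        if missed:
            per_queue[int(missed.group(1))] += 1
            continue
        over = THRESHOLD_RE.search(line)
        if over:
            resets.append((int(over.group(1)), int(over.group(2))))
    return per_queue, resets


# Lines copied from the second log excerpt in this comment.
sample = [
    "kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 327.",
    "kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 328.",
    "kernel: ena 0000:00:05.0 eth0: The number of lost tx completions is above the threshold (248 > 128). Reset the device",
]

per_queue, resets = summarize(sample)
print(dict(per_queue), resets)  # {2: 2} [(248, 128)]
```

If every missed completion lands on the same queue, as it does in both excerpts here, that queue and the CPU servicing it would be the first things to look at.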

Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 327.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 328.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 329.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 330.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 331.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 332.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 333.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 334.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 335.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 336.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 337.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 338.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 339.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 340.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 341.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 342.
Jul 14 14:17:34 drb.arbeitvoice.com kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 343.
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: The number of lost tx completions is above the threshold (248 > 128). Reset the device
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 3
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 2
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 4137606
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 1014469330
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 8802215
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 8802229
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 4113070
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 4783684
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 1023117478
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 182682
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 5194918
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 4147099565
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 8925222
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 8925546
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 4938995
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 3985053
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 852301798
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 118590
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 4693359
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 1134350239
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 8894540
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 8894686
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 4656434
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 248
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 4374717
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 929391616
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 189996
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 4377595
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 1100045461
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 8944166
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 8944180
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 4348263
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 4678466
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 1007127711
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 118426
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 78
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 78
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x60
Jul 14 14:17:34 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 14:17:34 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 25 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 26 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 27 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: irq 28 for MSI/MSI-X
Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0: Device reset completed successfully
Jul 14 14:17:40 myserver.hostname postfix/pickup[7639]: 8EFAA40CB90D: uid=990 from=<netdata>
Jul 14 14:17:40 myserver.hostname postfix/cleanup[28287]: 8EFAA40CB90D: message-id=<[email protected]>
Jul 14 14:17:40 myserver.hostname postfix/qmgr[2091]: 8EFAA40CB90D: from=<[email protected]>, size=11531, nrcpt=1 (queue active)
Jul 14 14:17:40 myserver.hostname postfix/local[28290]: 8EFAA40CB90D: to=<[email protected]>, orig_to=<root>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Jul 14 14:17:40 myserver.hostname postfix/qmgr[2091]: 8EFAA40CB90D: removed
Jul 14 14:17:50 myserver.hostname kernel: ------------[ cut here ]------------
Jul 14 14:17:50 myserver.hostname kernel: WARNING: CPU: 0 PID: 9896 at net/sched/sch_generic.c:356 dev_watchdog+0x248/0x260
Jul 14 14:17:50 myserver.hostname kernel: NETDEV WATCHDOG: eth0 (ena): transmit queue 3 timed out
Jul 14 14:17:50 myserver.hostname kernel: Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_multiport xt_conntrack nf_conntrack ip6table_filter ip6_tables iptable_filter ip_tables binfmt_misc bluetooth rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
Jul 14 14:17:50 myserver.hostname kernel: CPU: 0 PID: 9896 Comm: asterisk Kdump: loaded Tainted: G               ------------ T 3.10.0-957.21.3.el7.x86_64 #1
Jul 14 14:17:50 myserver.hostname kernel: Hardware name: Amazon EC2 t3a.xlarge/, BIOS 1.0 10/16/2017
Jul 14 14:17:50 myserver.hostname kernel: Call Trace:
Jul 14 14:17:50 myserver.hostname kernel:  <IRQ>  [<ffffffff9bf63107>] dump_stack+0x19/0x1b
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b897768>] __warn+0xd8/0x100
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8977ef>] warn_slowpath_fmt+0x5f/0x80
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9be66c38>] dev_watchdog+0x248/0x260
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9be669f0>] ? dev_deactivate_queue.constprop.26+0x60/0x60
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8a80c8>] call_timer_fn+0x38/0x110
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9be669f0>] ? dev_deactivate_queue.constprop.26+0x60/0x60
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8aa52d>] run_timer_softirq+0x24d/0x300
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8a1075>] __do_softirq+0xf5/0x280
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf7932c>] call_softirq+0x1c/0x30
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b82e675>] do_softirq+0x65/0xa0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8a13f5>] irq_exit+0x105/0x110
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf7a6e8>] smp_apic_timer_interrupt+0x48/0x60
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf76df2>] apic_timer_interrupt+0x162/0x170
Jul 14 14:17:50 myserver.hostname kernel:  <EOI>  [<ffffffff9b912142>] ? __pv_queued_spin_lock_slowpath+0xf2/0x2e0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b9122ee>] ? __pv_queued_spin_lock_slowpath+0x29e/0x2e0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf5d28b>] queued_spin_lock_slowpath+0xb/0xf
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf6b760>] _raw_spin_lock+0x20/0x30
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b9e95b0>] handle_pte_fault+0x160/0xd10
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b8c5950>] ? hrtimer_get_res+0x50/0x50
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9b9ec27d>] handle_mm_fault+0x39d/0x9b0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf70603>] __do_page_fault+0x203/0x4f0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf709d6>] trace_do_page_fault+0x56/0x150
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf6ff62>] do_async_page_fault+0x22/0xf0
Jul 14 14:17:50 myserver.hostname kernel:  [<ffffffff9bf6c798>] async_page_fault+0x28/0x30
Jul 14 14:17:50 myserver.hostname kernel: ---[ end trace 33bb31ed0dc8b342 ]---
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: Transmit time out
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 1
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 4
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 3
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 4333
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 1060807
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 6968
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 6971
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 4103
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 3265
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 693233
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 195
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 5849
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 4675602
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 10619
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 10623
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 4774
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 6127
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 1321486
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 148
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 5435
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 1210766
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 14605
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 14615
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 5080
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 11158
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 2387063
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 181
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 5420
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 1335484
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 8833
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 8834
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 5050
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 4056
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 890160
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 142
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 103
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 103
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 0 idx 0xed
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 1 idx 0x0
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x13b
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 3 idx 0x123
Jul 14 14:17:50 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 14:17:50 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 25 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 26 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 27 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: irq 28 for MSI/MSI-X
Jul 14 14:17:50 myserver.hostname kernel: ena 0000:00:05.0: Device reset completed successfully
Jul 14 14:17:50 myserver.hostname postfix/pickup[7639]: 71BE440CB90D: uid=990 from=<netdata>
Jul 14 14:17:50 myserver.hostname postfix/cleanup[28287]: 71BE440CB90D: message-id=<[email protected]>
Jul 14 14:17:50 myserver.hostname postfix/qmgr[2091]: 71BE440CB90D: from=<[email protected]>, size=11765, nrcpt=1 (queue active)
Jul 14 14:17:50 myserver.hostname postfix/local[28290]: 71BE440CB90D: to=<[email protected]>, orig_to=<root>, relay=local, delay=0.01, delays=0/0/0/0, dsn=2.0.0, status=sent (delivered to mailbox)
Jul 14 14:17:50 myserver.hostname postfix/qmgr[2091]: 71BE440CB90D: removed

@zorikm
Contributor

zorikm commented Jul 14, 2020

Thanks @arainero for the additional info.
We suspect your instance is overloaded and the CPUs don't get enough cycles to process network traffic.
Both logs indicate that TX packet completions and other events from the device weren't processed in time.
Do you see any dmesg messages that point to CPU stalls or lockups? What CPU utilization do you observe?
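
To make the per-queue picture easier to read, the statistics the driver dumps before each reset can be filtered for queues that report missed TX completions. The following is only a minimal sketch: it assumes the log line layout shown in this issue, and the function names `parse_queue_stats` and `queues_with_missed_tx` are made up for illustration, not part of the driver or any tool.

```python
import re
from collections import defaultdict

# Matches counter lines in the format seen in this issue, for example:
# "... kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 248"
STAT_RE = re.compile(r"ena \S+ \S+: queue_(\d+)_(\w+): (\d+)\s*$")


def parse_queue_stats(lines):
    """Return {queue_id: {counter_name: value}} from ENA stats-dump lines."""
    stats = defaultdict(dict)
    for line in lines:
        match = STAT_RE.search(line)
        if match:
            qid, name, value = match.groups()
            # If the input holds several dumps, later values overwrite earlier ones.
            stats[int(qid)][name] = int(value)
    return dict(stats)


def queues_with_missed_tx(stats):
    """Return {queue_id: missed_tx} for queues whose tx_missed_tx is non-zero."""
    return {
        qid: counters["tx_missed_tx"]
        for qid, counters in sorted(stats.items())
        if counters.get("tx_missed_tx", 0) > 0
    }


if __name__ == "__main__":
    # Three lines copied from the 14:17:34 dump above.
    sample = [
        "Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0",
        "Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 248",
        "Jul 14 14:17:34 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0",
    ]
    print(queues_with_missed_tx(parse_queue_stats(sample)))  # {2: 248}
```

Feeding it one dump at a time (rather than a whole log file) keeps the counters from different resets apart.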

@zorikm
Contributor

zorikm commented Jul 14, 2020

Also, we strongly recommend updating the driver to the latest version.

@arainero
Author

@zorikm I attached the dmesg output. I don't think it's CPU load related since it doesn't really spike past 50% until ENA gets reset. Then there is a large spike playing catchup.

As for updating the driver, what's the best way to do that? I haven't done that before.

dmesg.txt

@zorikm
Contributor

zorikm commented Jul 14, 2020

@arainero, could you please reach out to me directly at [email protected], and we'll guide you.
Thanks

@druchoo

druchoo commented Jul 19, 2020

Hi @zorikm, I'm seeing similar errors, as well as others, with v2.2.9 of the driver. This is also on a t3a instance, but I have seen it on other instance types as well.

$ modinfo ena
filename:       /lib/modules/3.10.0-1127.13.1.el7.x86_64/extra/ena.ko.xz
version:        2.2.9g
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
retpoline:      Y
rhelversion:    7.8
srcversion:     27F5567B9755BE00C8A08B5
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000051sv*sd*bc*sc*i*
depends:
vermagic:       3.10.0-1127.13.1.el7.x86_64 SMP mod_unload modversions
parm:           debug:Debug level (0=none,...,16=all) (int)
parm:           rx_queue_size:Rx queue size. The size should be a power of 2. Max value is 8K
 (int)
parm:           force_large_llq_header:Increases maximum supported header size in LLQ mode to 224 bytes, while reducing the maximum TX queue size by half.
 (int)
parm:           num_io_queues:Sets number of RX/TX queues to allocate to device. The maximum value depends on the device and number of online CPUs.
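
When comparing hosts, it can help to pull the driver version out of the `modinfo` output programmatically. This is a rough sketch that assumes the `version:` field layout shown above; `driver_version` and `numeric_version` are illustrative names, and the meaning of the trailing letter (`g` here) is not stated in this thread, so the code simply ignores it.

```python
import re


def driver_version(modinfo_text):
    """Return the value of the top-level `version:` field, or None if absent."""
    # Anchoring at line start avoids matching `rhelversion:` or `srcversion:`.
    match = re.search(r"^version:\s*(\S+)", modinfo_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def numeric_version(version):
    """Turn a string like '2.2.9g' into (2, 2, 9), dropping any letter suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    # Lines copied from the modinfo output above.
    sample = (
        "filename:       /lib/modules/3.10.0-1127.13.1.el7.x86_64/extra/ena.ko.xz\n"
        "version:        2.2.9g\n"
        "rhelversion:    7.8\n"
        "srcversion:     27F5567B9755BE00C8A08B5\n"
    )
    version = driver_version(sample)
    print(version, numeric_version(version))  # 2.2.9g (2, 2, 9)
```

The numeric tuple is only a convenience for ordering releases of the same driver flavor; comparing an out-of-tree build against an in-tree kernel build this way may not be meaningful.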

dmesg.txt
ena_errors.txt

Upgrading the driver and setting vm.min_free_kbytes to 128MB (2x default) initially seemed to correct the issues.

$ sysctl vm.min_free_kbytes
vm.min_free_kbytes = 135168
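
As a quick unit check, `vm.min_free_kbytes` is expressed in kilobytes of 1024 bytes, so the value above can be converted to MiB as sketched below. The helper name `kbytes_to_mib` is made up for illustration. The kernel derives the default from the amount of memory, so the "2x default" figure is taken from the comment rather than verified here.

```python
def kbytes_to_mib(kbytes):
    """Convert kilobytes (1 kB = 1024 bytes) to MiB."""
    return kbytes / 1024


if __name__ == "__main__":
    current = 135168  # value reported by sysctl above
    print(kbytes_to_mib(current))       # 132.0
    print(kbytes_to_mib(current // 2))  # 66.0, the implied default if it was doubled
```

So the configured reserve is 132 MiB, slightly above the 128MB quoted in the comment.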

However, after testing a new application that added network load, the errors are back. The all-core CPU average is ~10% and network traffic is ~10/~15 Mbps for rx/tx respectively.

@AWSNB
Contributor

AWSNB commented Jul 19, 2020 via email

@I-gor-C
Contributor

I-gor-C commented Jul 26, 2020

@arainero Triage has finished and a fix has been implemented. Please check whether the issue is resolved.

@tuapuikia

Does this issue only happen on AMD Zen-based instance types?

@I-gor-C
Contributor

I-gor-C commented Jul 31, 2020

@arainero Can you please indicate if the issue was resolved?

@arainero
Author

arainero commented Aug 4, 2020

> @arainero Can you please indicate if the issue was resolved?

Unfortunately, we had to migrate the problematic server off of AWS before a fix was applied because of these issues, so I don't have a reliable way to test this now.

@ubarar

ubarar commented Aug 18, 2020

@I-gor-C we switched back to machines with the new ENA drivers and haven't had this issue over the last week. I think this issue is definitely resolved.

Thanks!

@AWSNB
Contributor

AWSNB commented Aug 18, 2020

@ubarar thanks for confirming, we'll go ahead and close the issue

@I-gor-C I-gor-C closed this as completed Aug 18, 2020
@wcurry

wcurry commented Nov 9, 2020

I'm having a similar issue: coreos/fedora-coreos-tracker#665

The issue exists in all Fedora CoreOS versions between 31.20200323.2.0 and the latest FCOS 32.

31.20200323.2.0

filename:       /lib/modules/5.5.10-200.fc31.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        2.1.0K
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
srcversion:     DAAE6CFC0FC2113B5776480
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
retpoline:      Y
intree:         Y
name:           ena
vermagic:       5.5.10-200.fc31.x86_64 SMP mod_unload
sig_id:         PKCS#7
signer:         Fedora kernel signing key
sig_key:        67:90:9D:B2:92:99:F6:87:CC:07:EF:39:B6:7A:EC:9D:E7:E2:A2:60
sig_hashalgo:   sha256
signature:      7D:97:AB:FB:9C:FD:7B:70:E9:C9:3F:39:3B:9A:3A:B7:42:77:41:15:
                60:7B:1D:BD:B6:08:62:DA:64:B6:5E:F7:46:1A:2F:6D:8B:5E:80:2A:
                8F:88:5B:05:1F:AF:2C:B3:53:52:E0:8D:CB:BB:2C:D3:8E:E1:D1:DC:
                90:3C:27:CD:44:9E:7A:4B:14:1E:A9:D8:CA:72:7D:BB:F3:2B:59:85:
                B2:BB:48:83:75:45:24:28:B1:8F:EC:AA:79:E4:B9:CA:92:2F:09:4E:
                55:2D:28:11:EC:88:80:DC:D3:95:2E:BF:0F:67:59:76:5E:83:05:08:
                2E:CF:B2:FE:3E:C3:7A:3B:15:0F:67:73:14:C1:92:AF:4F:40:F1:51:
                2C:9D:D1:45:2E:F4:BC:59:50:51:B9:BC:AC:02:27:E6:2E:6F:E8:DB:
                48:EF:A8:AA:B8:28:8C:1D:B5:42:A0:73:4F:41:CC:1E:26:6F:21:93:
                50:2A:CF:B6:65:5F:35:29:3D:39:7B:6B:BC:62:0B:6D:2A:7E:7B:65:
                C4:E2:D4:CA:1D:6B:68:B7:B1:CE:94:08:60:37:D2:ED:0B:F2:FC:D1:
                BD:91:CA:30:67:39:1A:E0:64:97:BA:5A:FE:FE:4C:E3:8B:FD:56:52:
                DE:5D:A3:B8:A0:40:D7:46:07:70:4C:B7:8C:CD:CE:5C:F7:52:C2:5F:
                5F:AF:4E:FB:55:17:CF:89:C0:AA:49:38:A7:66:B2:53:74:96:7A:42:
                65:85:7F:18:95:B4:A1:87:31:88:30:57:4C:E8:C9:9D:55:12:87:07:
                35:72:BC:FD:85:C9:F4:85:B6:0A:96:F9:73:BA:F0:22:8A:EA:7B:CF:
                FB:92:B2:BA:82:98:F3:27:83:B3:D4:9F:D2:39:3C:37:90:99:A2:BD:
                43:41:A7:C7:03:76:86:EC:A6:8D:16:F9:25:14:E7:97:34:EC:E5:EE:
                00:E4:19:2A:B8:23:AD:7B:00:54:79:96:BC:00:F5:47:B2:7C:AC:CF:
                6D:26:64:FD:B3:01:15:98:DF:09:B4:F0:09:ED:87:FA:E1:90:0F:98:
                E5:F8:BE:EF:12:32:ED:AC:57:8C:CD:8F:AF:E7:AD:0A:3D:01:8F:EE:
                1D:4C:D1:62:38:59:F4:FF:B1:D3:B7:B7:1F:97:F3:A8:28:0C:A3:3B:
                CC:A5:E7:E6:FD:85:9F:7A:E5:0B:D0:E5:16:4B:D5:72:66:95:8F:7C:
                C1:B4:BA:A7:0C:01:25:39:03:B4:76:18:C6:0B:D1:B8:1B:F5:45:FA:
                5E:B9:78:3F:24:D5:BE:E6:91:59:87:FC:04:4C:3F:BB:57:A3:4B:4C:
                45:89:D2:A2:62:61:5D:A6:D2:95:DF:2A
parm:           debug:Debug level (0=none,...,16=all) (int)

@akiyano
Contributor

akiyano commented Nov 10, 2020

Hi @wcurry,

Thanks for your report.
I'm looking into this issue.
Meanwhile, could you please contact me via [email protected] so that I can get some more details?

Thanks,
Arthur

@akiyano
Contributor

akiyano commented Nov 21, 2020

For the record, the Fedora issue from the last two comments was handled in #147.
