ENA keeps restarting #134
Sorry you are hitting this issue, and thanks for sharing the instance ID and details.
We are looking into it on our side.
… On Jul 14, 2020, at 9:31 AM, arainero ***@***.***> wrote:
Hello,
I have a t3a instance that is experiencing dropped RX packets because ENA keeps resetting. I don't know what's causing ENA to reset so often, and I was hoping you could shed some light on the matter. I have multiple servers built from the same AMI, so I don't know why this one is having these issues. The server is heavy on UDP traffic compared to TCP, if that helps. The instance ID is "i-0c068f87c4161e736".
When ENA resets, the following logs are generated in /var/log/messages and journalctl:
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 918. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 919. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 920. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 921. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 922. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 923. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 924. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 925. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 926. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 927. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 928. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 929. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 930. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 931. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 932. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 933. 
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 934. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 935. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 936. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 937. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 938. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 939. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 940. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 941. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 942. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 943. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 944. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 945. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 946. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 947. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 948. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 949. 
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 950. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 951. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 952. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 953. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 954. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 955. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 956. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 957. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 958. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 959. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 960. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 961. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 962. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 963. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 964. Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 965. 
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 966.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 967.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 968.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 969.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 970.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 971.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 972.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 973.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 974.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 975.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 976.
Jul 14 10:10:21 myserver.hostname kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 977.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Keep alive watchdog timeout.
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: Trigger reset is on
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: tx_timeout: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: suspend: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: resume: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: wd_expired: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_up: 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: interface_down: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: admin_q_pause: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_cnt: 2112023
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bytes: 533808489
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_napi_comp: 4403324
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_tx_poll: 4403326
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_doorbells: 2089811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_cnt: 2371476
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bytes: 512952189
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_rx_copybreak_pkt: 142811
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_0_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_cnt: 1861158
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bytes: 1698200576
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_napi_comp: 3672075
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_tx_poll: 3672121
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_doorbells: 1743809
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_cnt: 1913582
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bytes: 412499776
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_rx_copybreak_pkt: 53095
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_1_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_cnt: 2405368
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bytes: 547235419
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_napi_comp: 4279065
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_tx_poll: 4279175
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_doorbells: 2382262
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_cnt: 1953528
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bytes: 430232614
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_rx_copybreak_pkt: 72039
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_2_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_cnt: 2696836
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bytes: 731741920
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_stop: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_queue_wakeup: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_linearize_failed: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_napi_comp: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_tx_poll: 5190044
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_doorbells: 2682972
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_prepare_ctx_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_tx_missed_tx: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_cnt: 2556581
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bytes: 701259477
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_refil_partial: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_csum: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_page_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_skb_alloc_fail: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_dma_mapping_err: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_desc_num: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_rx_copybreak_pkt: 218516
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_bad_req_id: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: queue_3_rx_empty_rx_ring: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_aborted_cmd: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_submitted_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_completed_cmd: 28
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_out_of_space: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: ena_admin_q_no_completion: 0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 0 idx 0x217
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 1 idx 0x226
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 2 idx 0x0
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0 eth0: free uncompleted tx skb qid 3 idx 0x284
Jul 14 10:10:22 myserver.hostname kernel: ena: ena device version: 0.10
Jul 14 10:10:22 myserver.hostname kernel: ena: ena controller version: 0.0.1 implementation version 1
Jul 14 10:10:22 myserver.hostname kernel: ena 0000:00:05.0: irq 24 for MSI/MSI-X
Jul 14 10:10:22 drb kernel: ena 0000:00:05.0: Device reset completed successfully
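One way to narrow down which IO queue is misbehaving is to tally these messages per queue. The Python sketch below is only an illustration (the function and regex names are our own, and it assumes nothing beyond the two message formats visible in this log): it counts "Found a Tx that wasn't completed on time" lines per qid and collects the `queue_N_tx_missed_tx` counters from the stats dump. In the log above, every timeout is on qid 2, and `queue_2_tx_missed_tx` (60) is the only non-zero missed counter.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Per-packet timeout message, e.g.
# "... eth0: Found a Tx that wasn't completed on time, qid 2, index 918."
TX_TIMEOUT_RE = re.compile(r"Found a Tx that wasn't completed on time, qid (\d+), index (\d+)")

# Per-queue counter from the stats dump printed before the reset, e.g.
# "... eth0: queue_2_tx_missed_tx: 60"
MISSED_TX_RE = re.compile(r"queue_(\d+)_tx_missed_tx: (\d+)")


def summarize_ena_log(lines: Iterable[str]) -> Tuple[Counter, Dict[int, int]]:
    """Count timeout messages per qid and collect each queue's missed_tx counter."""
    timeouts: Counter = Counter()
    missed_tx: Dict[int, int] = {}
    for line in lines:
        m = TX_TIMEOUT_RE.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
            continue
        m = MISSED_TX_RE.search(line)
        if m:
            # If several dumps are present, the last one seen wins.
            missed_tx[int(m.group(1))] = int(m.group(2))
    return timeouts, missed_tx


if __name__ == "__main__":
    sample = [
        "kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 918.",
        "kernel: ena 0000:00:05.0 eth0: Found a Tx that wasn't completed on time, qid 2, index 919.",
        "kernel: ena 0000:00:05.0 eth0: queue_0_tx_missed_tx: 0",
        "kernel: ena 0000:00:05.0 eth0: queue_2_tx_missed_tx: 60",
    ]
    timeouts, missed_tx = summarize_ena_log(sample)
    print(dict(timeouts))                              # {2: 2}
    print([q for q, n in missed_tx.items() if n > 0])  # [2]
```

On a real host the input could be the kernel log from `journalctl -k`, split into lines.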
modinfo gives the following:
filename: /lib/modules/3.10.0-957.21.3.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version: 1.5.0K
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
retpoline: Y
rhelversion: 7.6
srcversion: 1B9931F07C26733BA8D4F94
alias: pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias: pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-957.21.3.el7.x86_64 SMP mod_unload modversions
signer: CentOS Linux kernel signing key
sig_key: 1E:5F:1D:87:70:4B:F3:38:01:2F:A2:B0:FE:16:94:59:97:B3:31:27
sig_hashalgo: sha256
parm: debug:Debug level (0=none,...,16=all) (int)
This happened again, and this time I noticed a little more information. Something that stands out to me is "The number of lost tx completions is above the threshold (248 > 128). Reset the device". According to https://nxmnpg.lemoda.net/4/ena, this means "Packet was pushed to the NIC but not sent within given time limit; it may be caused by hang of the IO queue." I want to investigate the IO queue mentioned there. Do you have any advice on how to do that, or on what to look at / look for?
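As we understand the message, it comes from a periodic check: Tx entries that were handed to the device but not completed within a time limit are counted as missed, and once that count exceeds a threshold the driver requests a device reset. The Python below is a toy model of that logic only, not the ENA driver's code; the class, its fields, and the 5-second timeout are assumptions, and only the 248 and 128 figures come from the log message.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TxQueueModel:
    """Toy model of one Tx ring; all names here are hypothetical, not the driver's."""
    qid: int
    completion_timeout_s: float  # assumed time limit for a Tx completion
    missed_threshold: int        # 128 in the reporter's log message
    in_flight: Dict[int, float] = field(default_factory=dict)  # ring index -> submit time

    def submit(self, index: int, now: float) -> None:
        self.in_flight[index] = now

    def complete(self, index: int) -> None:
        self.in_flight.pop(index, None)

    def check_missing_completions(self, now: float) -> Tuple[int, bool]:
        """Return (number of overdue Tx entries, whether a reset would be requested)."""
        missed = sum(1 for t in self.in_flight.values()
                     if now - t > self.completion_timeout_s)
        return missed, missed > self.missed_threshold


if __name__ == "__main__":
    q = TxQueueModel(qid=2, completion_timeout_s=5.0, missed_threshold=128)
    for i in range(248):
        q.submit(i, now=0.0)  # 248 packets handed to the "NIC", none completed
    missed, reset = q.check_missing_completions(now=10.0)
    print(f"({missed} > {q.missed_threshold}) reset={reset}")  # (248 > 128) reset=True
```

In this model a reset is the symptom, not the cause: the question to answer on the instance is why completions on one queue stop arriving.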
Thanks @arainero for the additional info.
We also strongly recommend updating the driver to the latest version.
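To see whether a host is behind, one can compare the `version:` field that `modinfo` prints (for example via `modinfo -F version ena`). In this thread the in-tree driver reports `1.5.0K` and a later report shows `2.2.9g`. The Python sketch below (helper names are hypothetical) strips the trailing letter, which we assume only marks how the module was built, and compares the numeric parts. The thread does not name a specific minimum version, so the threshold is left to the caller.

```python
import re
from typing import Tuple


def parse_ena_version(version: str) -> Tuple[int, ...]:
    """Turn a modinfo version such as '1.5.0K' or '2.2.9g' into (1, 5, 0) / (2, 2, 9)."""
    m = re.match(r"\d+(?:\.\d+)*", version.strip())
    if m is None:
        raise ValueError(f"unrecognized ENA driver version: {version!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def is_older_than(installed: str, minimum: str) -> bool:
    """Tuple comparison orders versions numerically, so (1, 5, 0) < (2, 2, 9)."""
    return parse_ena_version(installed) < parse_ena_version(minimum)


if __name__ == "__main__":
    print(is_older_than("1.5.0K", "2.2.9g"))  # True
```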
arainero@, could you please reach out to me directly at [email protected]? We'll guide you.
Andrew,
We're triaging some failures on our side (server/ENA FW) that match this sighting; the ENA and EC2 teams are working on it at high priority and will update you on progress.
On Jul 19, 2020, at 9:21 AM, Andrew Choo ***@***.***> wrote:
Hi @zorikm, we are seeing similar errors, as well as others, with v2.2.9 of the driver. This is also on a t3a instance, but we have seen it on other instance types as well.
$ modinfo ena
filename: /lib/modules/3.10.0-1127.13.1.el7.x86_64/extra/ena.ko.xz
version: 2.2.9g
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
retpoline: Y
rhelversion: 7.8
srcversion: 27F5567B9755BE00C8A08B5
alias: pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias: pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
alias: pci:v00001D0Fd00000051sv*sd*bc*sc*i*
depends:
vermagic: 3.10.0-1127.13.1.el7.x86_64 SMP mod_unload modversions
parm: debug:Debug level (0=none,...,16=all) (int)
parm: rx_queue_size:Rx queue size. The size should be a power of 2. Max value is 8K
(int)
parm: force_large_llq_header:Increases maximum supported header size in LLQ mode to 224 bytes, while reducing the maximum TX queue size by half.
(int)
parm: num_io_queues:Sets number of RX/TX queues to allocate to device. The maximum value depends on the device and number of online CPUs.
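The `rx_queue_size` description above states two constraints: the size must be a power of 2, and the maximum is 8K. A pre-check before passing a value (for example `modprobe ena rx_queue_size=4096`) could look like the Python below. It encodes only those two stated rules; we have not confirmed how the driver itself treats a value that breaks them (it might reject or adjust it), or whether a lower bound exists.

```python
MAX_RX_QUEUE_SIZE = 8 * 1024  # "Max value is 8K", taken from the parm description


def is_valid_rx_queue_size(size: int) -> bool:
    """Check the two rules stated for rx_queue_size: power of 2 and at most 8K."""
    # A positive power of two has exactly one bit set, so clearing the lowest
    # set bit (size & (size - 1)) leaves zero.
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and size <= MAX_RX_QUEUE_SIZE


if __name__ == "__main__":
    for candidate in (1024, 3000, 8192, 16384):
        print(candidate, is_valid_rx_queue_size(candidate))
```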
dmesg.txt
ena_errors.txt
Upgrading the driver and setting vm.min_free_kbytes to roughly 128 MB (2x the default) initially seemed to correct the issues.
$ sysctl vm.min_free_kbytes
vm.min_free_kbytes = 135168
However, after testing a new application that added more network load, the errors are back. Average CPU usage across all cores is ~10%, and network traffic is ~10 Mbps RX and ~15 Mbps TX.
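The sysctl is expressed in kilobytes (KiB), so the 135168 shown above works out to 132 MiB. If that is exactly 2x the default, the default on that host was 67584 KB. The small Python helper below (the function name is ours) parses the `sysctl` output line and does that conversion. To apply a value at runtime one would use `sysctl -w vm.min_free_kbytes=135168`, and to persist it, a line in `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`.

```python
def parse_min_free_kbytes(line: str) -> int:
    """Parse `sysctl vm.min_free_kbytes` output, e.g. 'vm.min_free_kbytes = 135168'."""
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "vm.min_free_kbytes":
        raise ValueError(f"unexpected sysctl output: {line!r}")
    return int(value.strip())  # the value is in kibibytes


if __name__ == "__main__":
    kb = parse_min_free_kbytes("vm.min_free_kbytes = 135168")
    print(kb / 1024)  # 132.0 (MiB)
    print(kb // 2)    # 67584: the implied default, if 135168 is exactly 2x
```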
@arainero The triage has finished and a fix has been implemented; please check whether the issues are resolved.
Does this issue only happen on AMD Zen-based instance types?
@arainero Can you please confirm whether the issue was resolved?
Unfortunately, because of these issues, we had to migrate the problematic server off AWS before a fix was applied, so I don't have a reliable way to test this now.
@I-gor-C We switched back to machines with the new ENA drivers and haven't had this issue over the last week. I think this issue is definitely resolved. Thanks!
@ubarar Thanks for confirming; we'll go ahead and close the issue.
I'm having a similar issue: coreos/fedora-coreos-tracker#665. The issue exists in all Fedora CoreOS versions between 31.20200323.2.0 and the latest FCOS 32.
Hi @wcurry, thanks for your report.
For the record, the Fedora issue from the last two comments was handled in #147.