Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker: Improve the video recording container graceful shutdown #2527

Merged
merged 3 commits into from
Dec 25, 2024

Conversation

VietND96
Copy link
Member

@VietND96 VietND96 commented Dec 24, 2024

User description

Thanks for contributing to the Docker-Selenium project!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines, applied for this repository.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Description

  • Add a function to check and wait a while to ensure the file is integrity and openable.
  • Update trap signal for approach recording with static video file name. Whenever FFmpeg PID exited, it should not terminate the PID of supervisor, then another recording PID can be spawned and standby for the next screen open from Node container.

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Bug fix


Description

  • Added new function wait_for_file_integrity() to verify video file completeness and readability using FFmpeg
  • Introduced configurable environment variables for wait attempts:
    • SE_VIDEO_FILE_READY_WAIT_ATTEMPTS for file integrity checks
    • SE_VIDEO_WAIT_UPLOADER_SHUTDOWN_ATTEMPTS for uploader shutdown
  • Split graceful shutdown logic into two functions:
    • graceful_exit() for basic cleanup
    • graceful_exit_force() for full shutdown including supervisor
  • Improved recording process by checking file integrity after stopping recordings
  • Updated polling intervals to use consistent timing across different wait operations

Changes walkthrough 📝

Relevant files
Enhancement
video.sh
Enhance video recording reliability and graceful shutdown process

Video/video.sh

  • Added file integrity check function to ensure video files are complete
    and readable
  • Improved graceful shutdown handling with separate functions for
    different scenarios
  • Added configurable wait attempts for file readiness and uploader
    shutdown
  • Updated trap signal handling to prevent premature supervisor
    termination
  • +32/-7   

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Potential Race Condition

    The new file integrity check function may not handle edge cases where the file is still being written by FFmpeg. Consider adding additional checks or synchronization mechanisms.

    function wait_for_file_integrity() {
      retry=0
      if [[ ! -f "${video_file}" ]]; then
        echo "$(date -u +"${ts_format}") [${process_name}] - Video file is not found, might be the recording is not started."
        return 0
      fi
      until ffmpeg -v error -i "${video_file}" -f null - ; do
        echo "$(date -u +"${ts_format}") [${process_name}] - Waiting for video file ${video_file} to be ready."
        sleep ${poll_interval}
        retry=$((retry + 1))
        if [[ $retry -ge ${file_ready_max_attempts} ]]; then
          echo "$(date -u +"${ts_format}") [${process_name}] - Video file is not ready after ${file_ready_max_attempts} attempts, skipping..."
          break
        fi
      done
    }
    Error Handling

    The FFmpeg file integrity check does not properly handle or log FFmpeg error output, which could mask underlying issues. Consider capturing and logging the error output.

    until ffmpeg -v error -i "${video_file}" -f null - ; do

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Prevent potential deadlock by adding timeout handling for upload process termination

    Add a timeout check in the wait_util_uploader_shutdown function for the RCLONE
    process to prevent infinite loops if the process becomes stuck.

    Video/video.sh [104-110]

     if [[ "${VIDEO_UPLOAD_ENABLED}" = "true" ]] && [[ -n "${UPLOAD_DESTINATION_PREFIX}" ]] && [[ "${VIDEO_INTERNAL_UPLOAD}" = "true" ]]; then
    -  while [[ $(pgrep rclone | wc -l) -gt 0 ]]; do
    +  wait=0
    +  while [[ $(pgrep rclone | wc -l) -gt 0 ]] && [[ ${wait} -lt ${wait_uploader_shutdown_max_attempts} ]]; do
         echo "exit" >>${UPLOAD_PIPE_FILE} &
         echo "$(date -u +"${ts_format}") [${process_name}] - Recorder is waiting for RCLONE to finish"
         sleep ${poll_interval}
    +    wait=$((wait + 1))
       done
    +  if [[ ${wait} -ge ${wait_uploader_shutdown_max_attempts} ]]; then
    +    echo "$(date -u +"${ts_format}") [${process_name}] - Force killing RCLONE after timeout"
    +    pkill -9 rclone
    +  fi
     fi
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: This is a critical improvement that prevents potential system hangs by implementing a timeout mechanism for stuck RCLONE processes. The addition of force-kill after timeout ensures the system can gracefully recover from upload process failures.

    9
    Add proper error handling and return codes for file integrity check failures

    Add error handling in the wait_for_file_integrity function to detect and handle
    cases where ffmpeg fails with a non-zero exit code. Currently, the function only
    breaks the loop but doesn't indicate failure.

    Video/video.sh [162-177]

     function wait_for_file_integrity() {
       retry=0
       if [[ ! -f "${video_file}" ]]; then
         echo "$(date -u +"${ts_format}") [${process_name}] - Video file is not found, might be the recording is not started."
    -    return 0
    +    return 1
       }
       until ffmpeg -v error -i "${video_file}" -f null - ; do
         echo "$(date -u +"${ts_format}") [${process_name}] - Waiting for video file ${video_file} to be ready."
         sleep ${poll_interval}
         retry=$((retry + 1))
         if [[ $retry -ge ${file_ready_max_attempts} ]]; then
           echo "$(date -u +"${ts_format}") [${process_name}] - Video file is not ready after ${file_ready_max_attempts} attempts, skipping..."
    -      break
    +      return 1
         fi
       done
    +  return 0
     }
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: The suggestion significantly improves error handling by adding proper return codes (1 for failure, 0 for success), which is crucial for detecting and responding to video file integrity issues. This enhancement helps prevent silent failures and enables better error recovery.

    8

    Copy link

    qodo-merge-pro bot commented Dec 24, 2024

    CI Failure Feedback 🧐

    (Checks updated until commit 2b5c6b3)

    Action: Rerun workflow when failure

    Failed stage: Authenticate GitHub CLI for PR [❌]

    Failure summary:

    The action failed because the GitHub authentication token lacks the required 'read:org' permission
    scope. The error occurred during the gh auth login command, which requires proper authorization
    scopes to interact with GitHub's API.

    Relevant error logs:
    1:  ##[group]Operating System
    2:  Ubuntu
    ...
    
    28:  SecurityEvents: write
    29:  Statuses: write
    30:  ##[endgroup]
    31:  Secret source: Actions
    32:  Prepare workflow directory
    33:  Prepare all required actions
    34:  Getting action download info
    35:  Download action repository 'actions/checkout@main' (SHA:cbb722410c2e876e24abbe8de2cc27693e501dcb)
    36:  Complete job name: Rerun workflow when failure
    ...
    
    48:  show-progress: true
    49:  lfs: false
    50:  submodules: false
    51:  set-safe-directory: true
    52:  env:
    53:  GH_CLI_TOKEN: ***
    54:  GH_CLI_TOKEN_PR: ***
    55:  RUN_ID: 12486809612
    56:  RERUN_FAILED_ONLY: true
    ...
    
    119:  ##[group]Run sudo apt update
    120:  �[36;1msudo apt update�[0m
    121:  �[36;1msudo apt install gh�[0m
    122:  shell: /usr/bin/bash -e {0}
    123:  env:
    124:  GH_CLI_TOKEN: ***
    125:  GH_CLI_TOKEN_PR: ***
    126:  RUN_ID: 12486809612
    127:  RERUN_FAILED_ONLY: true
    ...
    
    164:  0 upgraded, 0 newly installed, 0 to remove and 48 not upgraded.
    165:  ##[group]Run echo "$GH_CLI_TOKEN_PR" | gh auth login --with-token
    166:  �[36;1mecho "$GH_CLI_TOKEN_PR" | gh auth login --with-token�[0m
    167:  shell: /usr/bin/bash -e {0}
    168:  env:
    169:  GH_CLI_TOKEN: ***
    170:  GH_CLI_TOKEN_PR: ***
    171:  RUN_ID: 12486809612
    172:  RERUN_FAILED_ONLY: true
    173:  RUN_ATTEMPT: 1
    174:  ##[endgroup]
    175:  error validating token: missing required scope 'read:org'
    176:  ##[error]Process completed with exit code 1.
    

    ✨ CI feedback usage guide:

    The CI feedback tool (/checks) automatically triggers when a PR has a failed check.
    The tool analyzes the failed checks and provides several feedbacks:

    • Failed stage
    • Failed test name
    • Failure summary
    • Relevant error logs

    In addition to being automatically triggered, the tool can also be invoked manually by commenting on a PR:

    /checks "https://github.com/{repo_name}/actions/runs/{run_number}/job/{job_number}"
    

    where {repo_name} is the name of the repository, {run_number} is the run number of the failed check, and {job_number} is the job number of the failed check.

    Configuration options

    • enable_auto_checks_feedback - if set to true, the tool will automatically provide feedback when a check is failed. Default is true.
    • excluded_checks_list - a list of checks to exclude from the feedback, for example: ["check1", "check2"]. Default is an empty list.
    • enable_help_text - if set to true, the tool will provide a help message with the feedback. Default is true.
    • persistent_comment - if set to true, the tool will overwrite a previous checks comment with the new feedback. Default is true.
    • final_update_message - if persistent_comment is true and updating a previous checks message, the tool will also create a new message: "Persistent checks updated to latest commit". Default is true.

    See more information about the checks tool in the docs.

    @VietND96 VietND96 merged commit f5679f5 into trunk Dec 25, 2024
    5 of 35 checks passed
    @VietND96 VietND96 deleted the recorder-shutdown branch December 25, 2024 02:30
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    [🐛 Bug]: Stopping node container now stops attached video container
    1 participant