
K8s: Add test results for Grid autoscaling with KEDA #2490

Merged
merged 1 commit into trunk from test-autoscaling on Dec 3, 2024

Conversation

VietND96
Member

@VietND96 VietND96 commented Dec 3, 2024

User description

Thanks for contributing to the Docker-Selenium project!
A well-described PR helps maintainers review and merge it quickly.

Before submitting your PR, please check the contributing guidelines that apply to this repository.
Avoid large PRs; help reviewers by keeping them as simple and short as possible.

Description

In the Helm chart configs, the following were updated (see the sketch after this list):

  • maxReplicaCount: 24 - assumes a cluster with at least 8 vCPUs.
  • pollingInterval: 20 - avoids hitting the GraphQL endpoint too frequently.
  • strategy: default (for ScaledJob) - expected to scale up enough Pods to pick up the queued requests.
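
A minimal sketch of how these settings might appear in a Helm values override; the key layout below is an assumption based on the chart's autoscaling options and should be checked against the chart's own values.yaml:

    # Illustrative values override; verify key names against the chart before use.
    autoscaling:
      scaledOptions:
        maxReplicaCount: 24     # assumes a cluster with at least 8 vCPUs
        pollingInterval: 20     # seconds; keeps KEDA's GraphQL polling moderate
      scaledJobOptions:
        scalingStrategy:
          strategy: default     # KEDA's default ScaledJob scaling strategy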

Added test results for autoscaling Deployment and Job against the patched scaler to the .keda/ directory. Further results can be added by committing directly or via a pull request.
See more: https://github.com/SeleniumHQ/docker-selenium/tree/trunk/.keda

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Tests, Enhancement


Description

  • Enhanced the test infrastructure to support Kubernetes autoscaling tests, including scale-up and chaos scenarios.
  • Refactored the selenium grid scaler to improve metric handling and session management.
  • Updated GitHub Actions workflow to automate test execution and result publishing.
  • Added comprehensive documentation and test result files for autoscaling tests.

Changes walkthrough 📝

Relevant files

Enhancement (5 files)

bootstrap.sh (tests/bootstrap.sh): Enhance test bootstrap script for autoscaling tests
  • Added tracing option and default matrix tests variable.
  • Updated package installation to use requirements.txt.
  • Added conditional logic for running specific autoscaling tests.
  +15/-3

chart_test.sh (tests/charts/make/chart_test.sh): Update chart test script for autoscaling configurations
  • Added new environment variables for autoscaling tests.
  • Adjusted session handling for different browsers.
  • Added logic for autoscaling test iterations.
  +21/-3

selenium_grid_scaler.go (.keda/scalers/selenium_grid_scaler.go): Refactor selenium grid scaler for improved metrics handling
  • Changed data types for session and slot counts to int64.
  • Refactored session queue length calculation.
  • Updated metric generation logic.
  +49/-53

common.py (tests/AutoscalingTests/common.py): Add common utilities for autoscaling tests
  • Added utility functions for session and pod management.
  • Implemented CSV export functionality for test results (a sketch of such helpers follows the walkthrough).
  +95/-0

k8s-scaling-test.yml (.github/workflows/k8s-scaling-test.yml): Enhance GitHub Actions workflow for autoscaling tests
  • Added new inputs for test iterations and result publishing.
  • Updated job strategies for autoscaling tests.
  • Implemented result upload and PR creation steps.
  +155/-60

Tests (3 files)

selenium_grid_scaler_test.go (.keda/scalers/selenium_grid_scaler_test.go): Update selenium grid scaler tests for new logic
  • Updated test cases to reflect changes in scaler logic.
  • Added new test scenarios for session handling.
  +104/-88

test_scale_chaos.py (tests/AutoscalingTests/test_scale_chaos.py): Implement chaos test for autoscaling scenarios
  • Implemented chaos testing for autoscaling.
  • Added logic to handle session creation and termination.
  +62/-0

test_scale_up.py (tests/AutoscalingTests/test_scale_up.py): Implement scale-up test for autoscaling scenarios
  • Implemented scale-up testing for autoscaling.
  • Added session management and result logging.
  +65/-0

Documentation (2 files)

README.md (.keda/README.md): Update README with autoscaling test results and instructions
  • Added links to test result markdown files.
  • Updated documentation for running autoscaling tests.
  +14/-3

README.md (tests/README.md): Add README for running autoscaling tests
  • Added instructions for running autoscaling tests.
  • Provided command examples for test execution.
  +18/-0

Dependencies (1 file)

requirements.txt (tests/requirements.txt): Add requirements file for test dependencies
  • Added dependencies for running autoscaling tests.
  +5/-0
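
To make the role of common.py concrete, here is a minimal Python sketch of helpers of that kind; the function names, namespace, label selector, and CSV layout are illustrative assumptions, not the repository's actual code:

    # Illustrative sketch only; see tests/AutoscalingTests/common.py for the real implementation.
    import csv
    from kubernetes import client, config

    def count_running_node_pods(namespace="default", label_selector="app=selenium-node-chrome"):
        # Count running browser-node pods so tests can assert on scale-up/scale-down behavior.
        config.load_kube_config()
        pods = client.CoreV1Api().list_namespaced_pod(namespace, label_selector=label_selector)
        return sum(1 for pod in pods.items if pod.status.phase == "Running")

    def export_results_to_csv(path, fieldnames, rows):
        # Persist per-iteration measurements so the workflow can publish them later.
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)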


    qodo-merge-pro bot commented Dec 3, 2024

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
    🧪 PR contains tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Code Refactoring
    Major refactoring of getCountFromSeleniumResponse() function to handle both new request nodes and ongoing sessions. Validate the logic handles all edge cases correctly.

    Error Handling
    The create_sessions_in_parallel() function should have more robust error handling and logging for failed session creation attempts.
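
    As an illustration of the kind of hardening meant here, a minimal sketch assuming Selenium 4's Remote driver and a grid URL taken from an environment variable (the real helper's signature and options may differ):

        import os
        from concurrent.futures import ThreadPoolExecutor, as_completed

        from selenium import webdriver
        from selenium.webdriver.chrome.options import Options

        GRID_URL = os.getenv("SELENIUM_GRID_URL", "http://localhost:4444")

        def create_sessions_in_parallel(count):
            # Request `count` sessions concurrently; keep the ones that start, log the failures.
            sessions, failures = [], 0
            with ThreadPoolExecutor(max_workers=count) as executor:
                futures = [
                    executor.submit(webdriver.Remote, command_executor=GRID_URL, options=Options())
                    for _ in range(count)
                ]
                for future in as_completed(futures):
                    try:
                        sessions.append(future.result())
                    except Exception as exc:
                        failures += 1
                        print(f"Session creation failed: {exc}")
            print(f"Created {len(sessions)} sessions, {failures} failed")
            return sessions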

    Resource Cleanup
    Ensure proper cleanup of sessions in error cases and when signals are received. Current signal handlers may not handle all edge cases.
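
    One possible shape for signal-aware cleanup, sketched under the assumption that live sessions are tracked in a module-level list (the actual handlers in the tests may differ):

        import signal
        import sys

        sessions = []  # populated by the test as sessions are created

        def clear_sessions():
            # Quit every tracked session, tolerating sessions that are already gone.
            for session in sessions:
                try:
                    session.quit()
                except Exception as exc:
                    print(f"Failed to quit session: {exc}")
            sessions.clear()

        def handle_signal(signum, frame):
            # Release grid slots before the test process exits on SIGINT/SIGTERM.
            clear_sessions()
            sys.exit(1)

        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, handle_signal)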

    qodo-merge-pro bot commented Dec 3, 2024

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    Possible issue
    Add missing error return statement to prevent undefined behavior

    Add error handling for the missing return statement in the error block. The function
    could return undefined values if an error occurs during JSON marshaling.

    .keda/scalers/selenium_grid_scaler.go [193-194]

     if err != nil {
    +    return -1, -1, err
     }
    Suggestion importance[1-10]: 9

    Why: The missing error return statement could lead to undefined behavior and potential panics in production. Adding proper error handling is critical for reliability.

    Add timeout handling for parallel operations to prevent test hangs

    Add timeout handling for session creation to prevent test hangs if the grid is
    unresponsive.

    tests/AutoscalingTests/test_scale_up.py [28-29]

    -new_sessions = create_sessions_in_parallel(new_request_sessions)
    -failed_sessions = new_request_sessions - len(new_sessions)
    +try:
    +    new_sessions = create_sessions_in_parallel(new_request_sessions)
    +    failed_sessions = new_request_sessions - len(new_sessions)
    +except concurrent.futures.TimeoutError:
    +    print("Session creation timed out")
    +    failed_sessions = new_request_sessions
    +    new_sessions = []
    Suggestion importance[1-10]: 8

    Why: Adding timeout handling for parallel session creation is crucial to prevent test hangs and ensure proper test completion when the grid is unresponsive.

    Add error handling around session cleanup to prevent crashes

    Add error handling around session.quit() calls to prevent crashes if a session is
    already closed or invalid.

    tests/AutoscalingTests/common.py [54-56]

     for session in sessions:
    -    session.quit()
    +    try:
    +        session.quit()
    +    except Exception as e:
    +        print(f"Failed to quit session: {e}")
     sessions.clear()
    Suggestion importance[1-10]: 7

    Why: Adding try-except blocks around session.quit() calls prevents test failures due to already closed or invalid sessions, improving test reliability.

    Add default value for environment variable to prevent deployment failures

    The environment variable AUTOSCALING_COOLDOWN_PERIOD is used without a default
    value, which could cause deployment failures if the variable is not set. Add a
    default value or validation.

    tests/charts/ci/DeploymentAutoscaling-values.yaml [8]

    -cooldownPeriod: ${AUTOSCALING_COOLDOWN_PERIOD}
    +cooldownPeriod: ${AUTOSCALING_COOLDOWN_PERIOD:-30}
    Suggestion importance[1-10]: 7

    Why: Missing default value for AUTOSCALING_COOLDOWN_PERIOD could cause deployment failures. Adding a fallback value improves reliability and prevents configuration errors.

    Provide default value for configuration variable to ensure consistent behavior

    The environment variable ENABLE_VIDEO_RECORDER is used without a default value,
    which could cause configuration issues if not set. Add a default value for
    reliability.

    tests/charts/ci/base-recorder-values.yaml [14]

    -enabled: ${ENABLE_VIDEO_RECORDER}
    +enabled: ${ENABLE_VIDEO_RECORDER:-true}
    Suggestion importance[1-10]: 7

    Why: Missing default value for ENABLE_VIDEO_RECORDER could lead to inconsistent behavior. Adding a default value ensures predictable configuration even when the variable is not set.

    General
    Reduce the default cooldown period to enable faster test iterations

    The cooldown period of 1800 seconds (30 minutes) seems excessive for an autoscaling
    test environment. Consider reducing AUTOSCALING_COOLDOWN_PERIOD to a more reasonable
    value like 300 seconds (5 minutes) to allow for faster test iterations.

    tests/charts/make/chart_test.sh [26]

    -AUTOSCALING_COOLDOWN_PERIOD=${AUTOSCALING_COOLDOWN_PERIOD:-"1800"}
    +AUTOSCALING_COOLDOWN_PERIOD=${AUTOSCALING_COOLDOWN_PERIOD:-"300"}
    Suggestion importance[1-10]: 7

    Why: A 30-minute cooldown period is excessive for testing and could significantly slow down test execution. Reducing it to 5 minutes would improve test efficiency while still maintaining valid test scenarios.

    Optimize test execution time by reducing the number of autoscaling iterations

    The test autoscaling iterations value of 20 may be too high for CI testing,
    potentially leading to long test execution times. Consider reducing
    TEST_AUTOSCALING_ITERATIONS to a lower value like 5 or 10 while still maintaining
    test coverage.

    tests/charts/make/chart_test.sh [443]

    -export TEST_AUTOSCALING_ITERATIONS=${TEST_AUTOSCALING_ITERATIONS:-"20"}
    +export TEST_AUTOSCALING_ITERATIONS=${TEST_AUTOSCALING_ITERATIONS:-"10"}
    Suggestion importance[1-10]: 6

    Why: Running 20 iterations in tests could lead to unnecessarily long test execution times. Reducing to 10 iterations would provide sufficient test coverage while improving CI pipeline efficiency.



@VietND96 merged commit 7e6b9b3 into trunk on Dec 3, 2024
35 of 52 checks passed
@VietND96 deleted the test-autoscaling branch on December 3, 2024 at 07:53