
feat:finish backup and restore #2968

Merged
merged 4 commits into OpenAtomFoundation:unstable
Dec 9, 2024

Conversation

@buzhimingyonghu (Contributor) commented Dec 6, 2024

#2970

Provide BackupRepo

Follow the KubeBlocks docs: https://www.kubeblocks.io/docs/preview/user_docs/maintenance/backup-and-restore/backup/backup-repo
Follow [tools/kubeblocks_helm/BackupRepo_config]
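For reference, the command in tools/kubeblocks_helm/BackupRepo_config (quoted in full in the review below; the stray trailing backslash after --default is dropped here) looks like this, with the <...> placeholders filled in for your S3-compatible storage:

```bash
kbcli backuprepo create my-repo \
  --provider s3 \
  --region <...> \
  --bucket <...> \
  --endpoint <...> \
  --access-key-id <...> \
  --secret-access-key <...> \
  --access-method <...> \
  --default
```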

Create a Backup

kbcli cluster backup pika-master-slave-cluster --method datafile


View Backups

kbcli cluster list-backups


Restore a Cluster from a Backup
Select a backup and create a cluster:

kbcli cluster restore test-pika --backup backup-default-pika-master-slave-cluster-20241203170414



View the Backup Policy of a Cluster

kbcli cluster list-backup-policy mycluster


Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a command for creating a backup repository using kbcli.
    • Added new scripts for backing up and restoring Redis databases.
    • New YAML configurations for managing backup actions and policies within the Kubeblocks framework.
    • Added commands for installing Helm charts for pika-master-slave and pika-master-slave-cluster.
  • Documentation

    • Updated README to include instructions for backing up and restoring a cluster.
  • Bug Fixes

    • Improved error handling in backup and restore scripts.
  • Chores

    • Added uninstall scripts for removing Helm releases.


coderabbitai bot commented Dec 6, 2024

Walkthrough

A new command for the kbcli tool has been introduced to create a backup repository, allowing users to specify various configuration parameters. Additionally, the documentation has been updated to include a section on backing up and restoring clusters. Several installation scripts have been modified or added, including the removal of uninstallation commands and the introduction of new Helm installation commands. New YAML configuration files for backup actions and policies have been created, and enhancements have been made to existing scripts for better error handling and control flow.

Changes

File Change Summary
tools/kubeblocks_helm/BackupRepo_config Added command kbcli backuprepo create my-repo with parameters for backup repository configuration.
tools/kubeblocks_helm/README.md Added section on backing up/restoring cluster; included commands for backup and restore operations.
tools/kubeblocks_helm/install.sh Removed uninstallation commands and sleep command; streamlined installation process for Helm charts.
tools/kubeblocks_helm/install_ms.sh Added commands to install pika-master-slave and pika-master-slave-cluster Helm charts.
tools/kubeblocks_helm/installdebug.sh File deleted; contained commented-out Helm commands and helm template command.
tools/kubeblocks_helm/pika-master-slave-cluster/templates/role.yaml Updated Kubernetes Role with new rules for clusters and pods resources.
tools/kubeblocks_helm/pika-master-slave/dataprotection/backup.sh New script for backing up Redis database with error handling and backup process management.
tools/kubeblocks_helm/pika-master-slave/dataprotection/restore.sh Enhanced restore script with error handling and control flow improvements.
tools/kubeblocks_helm/pika-master-slave/templates/backupactionset.yaml New YAML file for managing backup actions within Kubeblocks framework.
tools/kubeblocks_helm/pika-master-slave/templates/backuppolicytemplate.yaml New YAML file for defining a backup policy for the Pika cluster.
tools/kubeblocks_helm/pika-master-slave/templates/componentdefinition-pika.yaml Modified ComponentDefinition to include new roles and lifecycle actions, and added a new container.
tools/kubeblocks_helm/uninstall.sh New script to uninstall pika-cluster and pika Helm releases.
tools/kubeblocks_helm/uninstall_ms.sh New script to uninstall pika-master-slave-cluster and pika-master-slave.

Possibly related PRs

  • feat:Update kubeblocks version to 0.9 #2860: The changes in this PR involve updates to the README.md file, which includes commands related to backup and restore operations using the kbcli tool, directly connecting to the new backup repository command introduced in the main PR.

Suggested labels

✏️ Feature, 🧹 Updates

Suggested reviewers

  • chejinge
  • machinly

🐰 In the world of code, a new command takes flight,
To back up our treasures, making everything right.
With scripts and YAML, we dance and we play,
Protecting our data in a seamless ballet.
So hop along, friends, let’s cheer and rejoice,
For backups are here, let’s all raise our voice! 🎉


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a1cdd24 and daf5bfc.

📒 Files selected for processing (1)
  • tools/kubeblocks_helm/BackupRepo_config (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/kubeblocks_helm/BackupRepo_config


@github-actions bot added the Invalid PR Title and 📒 Documentation (Improvements or additions to documentation) labels Dec 6, 2024
@coderabbitai bot left a comment

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (6)
tools/kubeblocks_helm/BackupRepo_config (1)

3-8: Improve command documentation with parameter descriptions

The placeholder values (<...>) should be documented with:

  1. Expected format and constraints for each parameter
  2. Whether parameters are optional or required
  3. Example values for common scenarios (a documented sketch follows below)
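For instance, the config could carry the descriptions as leading comments. This is an illustrative sketch, not the file from the PR; the example values are placeholders, the env-var style echoes the secure-usage example later in this review, and the Tool/Mount access methods are an assumption based on the kbcli docs:

```bash
# Parameters for `kbcli backuprepo create`:
#   --provider           required; storage provider, e.g. s3, minio, oss
#   --region             required for s3, e.g. us-east-1
#   --bucket             required; name of an existing bucket
#   --endpoint           required for S3-compatible stores
#   --access-key-id      credential; prefer an environment variable over plain text
#   --secret-access-key  credential; never commit real values
#   --access-method      Tool or Mount (assumption; see the kbcli docs)
#   --default            mark this repo as the default BackupRepo
kbcli backuprepo create my-repo \
  --provider s3 \
  --region us-east-1 \
  --bucket my-backup-bucket \
  --endpoint https://s3.us-east-1.amazonaws.com \
  --access-key-id "${S3_ACCESS_KEY}" \
  --secret-access-key "${S3_SECRET_KEY}" \
  --access-method Tool \
  --default
```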
tools/kubeblocks_helm/pika-master-slave/templates/backuppolicytemplate.yaml (2)

22-26: Review backup schedule and retention policy

Consider the following for the backup schedule:

  1. Weekly backups might not be sufficient for production workloads
  2. 3-month retention period (3m) - ensure this meets compliance requirements
  3. Document the rationale for these choices in comments (see the sketch after this list)
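As a sketch of point 3, the rationale could sit next to the schedule fields. The field names here mirror common KubeBlocks BackupPolicyTemplate snippets and are assumptions, as are the cron expression and the compliance placeholders:

```yaml
schedules:
  - backupMethod: datafile
    enabled: true
    # Weekly backup: acceptable for <workload>; production clusters with
    # frequent writes may want a daily cron such as "0 2 * * *".
    cronExpression: "0 2 * * 0"
    # 3-month retention (3m): required by <compliance policy>; revisit as needed.
    retentionPeriod: 3m
```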

10-11: Fix YAML formatting issues

Remove trailing spaces in line 11 as flagged by yamllint.

-  - componentDefs: 
-      - pika 
+  - componentDefs:
+      - pika

tools/kubeblocks_helm/pika-master-slave/dataprotection/restore.sh (1)

6-11: Enhance directory validation error message

The error message could be more descriptive to help users understand and resolve the issue.

Consider updating the error message:

-  echo "${DATA_DIR} is not empty! Please make sure that the directory is empty before restoring the backup."
+  echo "Error: ${DATA_DIR} contains existing data and no .kb-data-protection file found."
+  echo "This could indicate:"
+  echo "1. An incomplete previous restore"
+  echo "2. Existing data that needs to be backed up first"
+  echo "Please backup or clear the directory before proceeding with restore."
tools/kubeblocks_helm/pika-master-slave/templates/backupactionset.yaml (2)

11-14: Consider adding validation for environment variables

The environment variables are critical for backup/restore operations but lack validation.

Consider adding a validation mechanism in the scripts to ensure these environment variables are set and contain valid values. Additionally, document the expected format and constraints for each variable in comments.
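A minimal sketch of such a guard, assuming DP_BACKUP_NAME and DATA_DIR (the variables used by the backup/restore scripts in this PR) are the critical ones:

```bash
#!/bin/bash
set -euo pipefail

# Fail fast if a required environment variable is unset or empty.
for var in DP_BACKUP_NAME DATA_DIR; do
  if [ -z "${!var:-}" ]; then
    echo "Error: required environment variable ${var} is not set" >&2
    exit 1
  fi
done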


21-23: Review progress sync interval

The 5-second progress sync interval might be too frequent for large backups.

Consider making the interval configurable through a template value:

       syncProgress:
         enabled: true
-        intervalSeconds: 5
+        intervalSeconds: {{ .Values.backup.syncIntervalSeconds | default 5 }}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 937ebde and a1cdd24.

📒 Files selected for processing (13)
  • tools/kubeblocks_helm/BackupRepo_config (1 hunks)
  • tools/kubeblocks_helm/README.md (1 hunks)
  • tools/kubeblocks_helm/install.sh (0 hunks)
  • tools/kubeblocks_helm/install_ms.sh (1 hunks)
  • tools/kubeblocks_helm/installdebug.sh (0 hunks)
  • tools/kubeblocks_helm/pika-master-slave-cluster/templates/role.yaml (1 hunks)
  • tools/kubeblocks_helm/pika-master-slave/dataprotection/backup.sh (1 hunks)
  • tools/kubeblocks_helm/pika-master-slave/dataprotection/restore.sh (1 hunks)
  • tools/kubeblocks_helm/pika-master-slave/templates/backupactionset.yaml (1 hunks)
  • tools/kubeblocks_helm/pika-master-slave/templates/backuppolicytemplate.yaml (1 hunks)
  • tools/kubeblocks_helm/pika-master-slave/templates/componentdefinition-pika.yaml (2 hunks)
  • tools/kubeblocks_helm/uninstall.sh (1 hunks)
  • tools/kubeblocks_helm/uninstall_ms.sh (1 hunks)
💤 Files with no reviewable changes (2)
  • tools/kubeblocks_helm/installdebug.sh
  • tools/kubeblocks_helm/install.sh
🧰 Additional context used
🪛 Shellcheck (0.10.0)
tools/kubeblocks_helm/install_ms.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

tools/kubeblocks_helm/pika-master-slave/dataprotection/backup.sh

[warning] 29-29: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

tools/kubeblocks_helm/pika-master-slave/dataprotection/restore.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

tools/kubeblocks_helm/uninstall.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

tools/kubeblocks_helm/uninstall_ms.sh

[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.

(SC2148)

🪛 yamllint (1.35.1)
tools/kubeblocks_helm/pika-master-slave/templates/backuppolicytemplate.yaml

[error] 8-8: syntax error: could not find expected ':'

(syntax)


[error] 11-11: trailing spaces

(trailing-spaces)

tools/kubeblocks_helm/pika-master-slave/templates/backupactionset.yaml

[error] 8-8: syntax error: could not find expected ':'

(syntax)

tools/kubeblocks_helm/pika-master-slave/templates/componentdefinition-pika.yaml

[error] 51-51: trailing spaces

(trailing-spaces)


[error] 113-113: trailing spaces

(trailing-spaces)

🔇 Additional comments (8)
tools/kubeblocks_helm/install_ms.sh (1)

1-1: Verify chart existence and dependencies

Let's verify if the referenced Helm charts exist in the repository.

✅ Verification successful

Both Helm charts exist and have valid structure

The verification confirms that both Helm charts referenced in the install script exist and have the required structure:

  • ./tools/kubeblocks_helm/pika-master-slave/ contains Chart.yaml, values.yaml, and templates directory
  • ./tools/kubeblocks_helm/pika-master-slave-cluster/ contains Chart.yaml, values.yaml, and templates directory

Both charts appear to be properly structured Helm charts with all the necessary components, making the installation command valid.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if the Helm charts exist and have the required structure

# Check for chart directories
echo "Checking chart directories..."
fd -t d "pika-master-slave$" -x echo "Found chart directory: {}"

# Check for Chart.yaml files
echo -e "\nChecking Chart.yaml files..."
fd "Chart.yaml" -p "pika-master-slave" -x echo "Found Chart.yaml: {}"

# Check for values.yaml files
echo -e "\nChecking values.yaml files..."
fd "values.yaml" -p "pika-master-slave" -x echo "Found values.yaml: {}"

Length of output: 696


Script:

#!/bin/bash
# Let's check the content of the found directory and its parent
echo "Checking contents of the helm directory..."
ls -la ./tools/kubeblocks_helm/

echo -e "\nChecking if charts exist..."
ls -la ./tools/kubeblocks_helm/pika-master-slave/ 2>/dev/null || echo "pika-master-slave directory not accessible"
ls -la ./tools/kubeblocks_helm/pika-master-slave-cluster/ 2>/dev/null || echo "pika-master-slave-cluster directory not accessible"

# Let's also check if these might be helm chart archives (.tgz)
echo -e "\nChecking for helm chart archives..."
fd -e tgz . ./tools/kubeblocks_helm/ -x echo "Found chart archive: {}"

Length of output: 2044


tools/kubeblocks_helm/uninstall.sh (1)

1-1: 🛠️ Refactor suggestion

Essential improvements needed for robust uninstallation

The current implementation needs several improvements for reliability and maintainability:

Apply these changes to make the script more robust:

+#!/bin/bash
+set -eo pipefail
+
+# This script uninstalls the Pika cluster and its components
+
+echo "Uninstalling Pika cluster components..."
+
+# Function to safely uninstall a helm release
+uninstall_release() {
+    local release=$1
+    if helm status "$release" >/dev/null 2>&1; then
+        echo "Uninstalling $release..."
+        helm uninstall "$release" || echo "Failed to uninstall $release"
+    else
+        echo "Release $release not found, skipping..."
+    fi
+}
+
+# Uninstall releases
+uninstall_release "pika-cluster"
+uninstall_release "pika"
+
+echo "Uninstallation completed"

The improvements include:

  • Added shebang line for proper shell specification
  • Added error handling with set -eo pipefail
  • Added release existence validation
  • Added informative output messages
  • Modularized uninstallation logic into a function
  • Separated operations for better error handling

Let's verify if there are any additional cleanup steps needed:

✅ Verification successful

Let me gather more information about potential cleanup requirements by checking the Helm chart and related files.


Let me check the cluster definition and component templates to understand what resources need cleanup.


Enhanced uninstallation script with comprehensive resource cleanup

+#!/bin/bash
+set -eo pipefail
+
+# This script uninstalls Pika cluster and its components, including cleanup of associated resources
+
+echo "Starting Pika cluster uninstallation..."
+
+# Function to safely uninstall a helm release
+uninstall_release() {
+    local release=$1
+    if helm status "$release" >/dev/null 2>&1; then
+        echo "Uninstalling $release..."
+        helm uninstall "$release" || echo "Failed to uninstall $release"
+    else
+        echo "Release $release not found, skipping..."
+    fi
+}
+
+# Function to cleanup resources
+cleanup_resources() {
+    local namespace=$1
+    local component=$2
+    
+    echo "Cleaning up resources for $component..."
+    
+    # Clean up PVCs
+    kubectl delete pvc -l app.kubernetes.io/instance="$component" -n "$namespace" --ignore-not-found=true
+    
+    # Clean up ConfigMaps
+    kubectl delete cm -l app.kubernetes.io/instance="$component" -n "$namespace" --ignore-not-found=true
+    
+    # Clean up Services
+    kubectl delete svc -l app.kubernetes.io/instance="$component" -n "$namespace" --ignore-not-found=true
+}
+
+NAMESPACE=${NAMESPACE:-default}
+
+# Uninstall Helm releases
+uninstall_release "pika-cluster"
+uninstall_release "pika"
+
+# Cleanup resources
+cleanup_resources "$NAMESPACE" "pika-cluster"
+cleanup_resources "$NAMESPACE" "pika"
+
+echo "Uninstallation and cleanup completed"

The improved script includes:

  • Proper shell environment setup
  • Comprehensive cleanup of PVCs, ConfigMaps, and Services
  • Namespace-aware resource cleanup
  • Error handling and status messages
  • Resource cleanup for both pika and pika-cluster components
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any potential leftover resources
echo "Checking for potential leftover resources..."

# Check for PVCs
kubectl get pvc -l app.kubernetes.io/instance=pika-cluster 2>/dev/null || true
kubectl get pvc -l app.kubernetes.io/instance=pika 2>/dev/null || true

# Check for ConfigMaps
kubectl get cm -l app.kubernetes.io/instance=pika-cluster 2>/dev/null || true
kubectl get cm -l app.kubernetes.io/instance=pika 2>/dev/null || true

Length of output: 352


Script:

#!/bin/bash
# Check for Helm chart files and resources
echo "Checking Helm chart resources..."

# Look for helm chart files
fd -e yaml -e tpl . tools/kubeblocks_helm

# Check for any cleanup or post-delete hooks in values files
rg -A 5 "helm.sh/hook.*delete" tools/kubeblocks_helm

# Check for any CRDs or other resources that might need cleanup
rg -l "kind: CustomResourceDefinition" tools/kubeblocks_helm

Length of output: 3215


Script:

#!/bin/bash
# Check cluster and component definitions for resource types
echo "Analyzing cluster resources..."

# Check cluster template for resource types
cat tools/kubeblocks_helm/pika-cluster/templates/cluster.yaml

# Check component definitions for resource types
cat tools/kubeblocks_helm/pika/templates/componentdefinition-pika-group.yaml
cat tools/kubeblocks_helm/pika/templates/componentdefinition-codis-proxy.yaml
cat tools/kubeblocks_helm/pika/templates/componentdefinition-pika-etcd.yaml

# Look for any service or statefulset definitions
rg -l "kind: (Service|StatefulSet)" tools/kubeblocks_helm

Length of output: 15075


tools/kubeblocks_helm/uninstall_ms.sh (1)

1-1: 🛠️ Refactor suggestion

Reduce code duplication with uninstall.sh

This script has identical issues to uninstall.sh and performs similar operations. Consider consolidating both scripts to reduce maintenance overhead.

Create a single unified script:

+#!/bin/bash
+set -eo pipefail
+
+# This script uninstalls Pika cluster components
+# Usage: ./uninstall.sh [--all|--master-slave]
+
+# Function to safely uninstall a helm release
+uninstall_release() {
+    local release=$1
+    if helm status "$release" >/dev/null 2>&1; then
+        echo "Uninstalling $release..."
+        helm uninstall "$release" || echo "Failed to uninstall $release"
+    else
+        echo "Release $release not found, skipping..."
+    fi
+}
+
+case "${1:-}" in
+    --all)
+        echo "Uninstalling all Pika components..."
+        uninstall_release "pika-cluster"
+        uninstall_release "pika"
+        uninstall_release "pika-master-slave-cluster"
+        uninstall_release "pika-master-slave"
+        ;;
+    --master-slave)
+        echo "Uninstalling Pika master-slave components..."
+        uninstall_release "pika-master-slave-cluster"
+        uninstall_release "pika-master-slave"
+        ;;
+    *)
+        echo "Usage: $0 [--all|--master-slave]"
+        exit 1
+        ;;
+esac
+
+echo "Uninstallation completed"

Consider these architectural improvements:

  1. Move all Helm-related scripts to a common directory
  2. Create a shared library for common functions (see the sketch after this list)
  3. Add documentation about the relationship between these components and the backup/restore feature
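A sketch of the shared-library idea from point 2; the file name common.sh is hypothetical, and uninstall_release() is the function already proposed above:

```bash
# common.sh -- shared helpers sourced by the install/uninstall scripts
uninstall_release() {
    local release=$1
    if helm status "$release" >/dev/null 2>&1; then
        echo "Uninstalling $release..."
        helm uninstall "$release" || echo "Failed to uninstall $release"
    else
        echo "Release $release not found, skipping..."
    fi
}
```

Each uninstall script would then reduce to `source "$(dirname "$0")/common.sh"` followed by the relevant uninstall_release calls.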

Let's verify the integration with backup functionality:

✅ Verification successful

Let me gather more information about the codebase structure and related scripts.


Based on the gathered information, I can now provide the final response:

Consolidate uninstall scripts and add backup cleanup

The review suggestion is valid but needs enhancement. The master-slave variant has additional backup-related components that need proper cleanup as shown in pika-master-slave/templates/backupactionset.yaml and backuppolicytemplate.yaml. Here's a refined suggestion:

+#!/bin/bash
+set -eo pipefail
+
+# This script uninstalls Pika cluster components
+# Usage: ./uninstall.sh [--all|--master-slave]
+
+# Function to safely uninstall a helm release
+uninstall_release() {
+    local release=$1
+    if helm status "$release" >/dev/null 2>&1; then
+        echo "Uninstalling $release..."
+        helm uninstall "$release" || echo "Failed to uninstall $release"
+    else
+        echo "Release $release not found, skipping..."
+    fi
+}
+
+# Function to cleanup backup resources
+cleanup_backups() {
+    echo "Cleaning up backup resources..."
+    kubectl delete backups.dataprotection.kubeblocks.io --all 2>/dev/null || true
+    kubectl delete backuppolicies --all 2>/dev/null || true
+}
+
+case "${1:-}" in
+    --all)
+        echo "Uninstalling all Pika components..."
+        uninstall_release "pika-cluster"
+        uninstall_release "pika"
+        uninstall_release "pika-master-slave-cluster"
+        uninstall_release "pika-master-slave"
+        cleanup_backups
+        ;;
+    --master-slave)
+        echo "Uninstalling Pika master-slave components..."
+        uninstall_release "pika-master-slave-cluster"
+        uninstall_release "pika-master-slave"
+        cleanup_backups
+        ;;
+    *)
+        echo "Usage: $0 [--all|--master-slave]"
+        exit 1
+        ;;
+esac
+
+echo "Uninstallation completed"

Key improvements:

  • Added backup resource cleanup for master-slave variant
  • Maintained alignment with documented uninstall procedures in README.md
  • Preserved the order of uninstallation (cluster first, then definition)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any backup resources that need cleanup
echo "Checking for backup resources..."

# Check for backup CRDs and resources
kubectl get backups.dataprotection.kubeblocks.io 2>/dev/null || true
kubectl get backuprepositories.dataprotection.kubeblocks.io 2>/dev/null || true

Length of output: 201


Script:

#!/bin/bash
# Check for related helm scripts and their structure
fd -t f . tools/kubeblocks_helm/

# Look for any backup related code
rg -l "backup" tools/kubeblocks_helm/

# Check content of uninstall.sh for comparison
cat tools/kubeblocks_helm/uninstall.sh

# Look for any documentation about these scripts
cat tools/kubeblocks_helm/README.md 2>/dev/null || echo "No README found"

Length of output: 6398


tools/kubeblocks_helm/pika-master-slave-cluster/templates/role.yaml (2)

21-27: Validate pod access requirements

The role grants broad read access to all pods. Consider:

  1. Limiting pod access to specific labels/names
  2. Documenting why pod-level access is needed for backup operations
✅ Verification successful

Pod access permissions are appropriately scoped

The pod access permissions in the role are justified and appropriately scoped:

  1. The values.yaml explicitly documents that RBAC permissions are "used by cluster component pod" including event creation
  2. The role is tightly bound to a specific service account via RoleBinding in the same namespace
  3. The permissions are read-only (get, list, watch) without any modification rights
  4. The role is namespace-scoped rather than cluster-wide
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for pod interactions in backup/restore logic
rg -l "pods" | grep -i "backup\|restore"

Length of output: 40


Script:

#!/bin/bash
# Let's first check the broader context of the role.yaml file
cat tools/kubeblocks_helm/pika-master-slave-cluster/templates/role.yaml

# Then search for any pod-related operations in the codebase
rg -A 5 -B 5 "pods.*get|pods.*list|pods.*watch" 

# Also check if there are any comments or documentation about pod access
rg -l "pod.*access|access.*pod" --type md

Length of output: 1398


Script:

#!/bin/bash
# Let's check what this role is used for by examining the service account bindings
cat tools/kubeblocks_helm/pika-master-slave-cluster/templates/rolebinding.yaml

# Let's also check if there are any other related RBAC files
fd "role.*\.yaml|rolebinding.*\.yaml" tools/kubeblocks_helm/pika-master-slave-cluster/templates/

# Check for any pod-related operations in the chart's values or other template files
rg -A 3 "pod" tools/kubeblocks_helm/pika-master-slave-cluster/

Length of output: 1442


14-20: Review cluster-level permissions scope

The role grants broad read access to all clusters. Consider:

  1. Limiting scope to specific cluster names if possible
  2. Adding resource names restriction for better granularity
✅ Verification successful

Let me gather more context about the cluster-related permissions in the codebase.


Let me check the rolebinding to understand how this role is being used and its scope.


Let me check the helpers template to understand how the cluster name is scoped.


The current RBAC permissions are appropriately scoped

The role and rolebinding are already well-scoped for the following reasons:

  • The role is namespace-scoped (not cluster-wide)
  • The role name is templated with specific cluster name (kb-{{ include "clustername" . }})
  • The permissions are minimal and read-only for clusters (get/list/watch)
  • The role is bound to a specific service account that's unique to the cluster instance
  • The implementation follows the principle of least privilege by only granting necessary permissions
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for other roles/bindings that might grant similar permissions
rg -l "apps.kubeblocks.io" | grep -i "role\|binding"

Length of output: 120


Script:

#!/bin/bash
# Check the complete role definition and other RBAC-related configurations
cat tools/kubeblocks_helm/pika-master-slave-cluster/templates/role.yaml

# Look for any resource name restrictions or selectors
rg "resourceNames:" tools/kubeblocks_helm/pika-master-slave-cluster/templates/role.yaml

# Check if there are any rolebindings to understand the scope of access
fd rolebinding -t f tools/kubeblocks_helm/pika-master-slave-cluster/templates/

Length of output: 823


Script:

#!/bin/bash
# Check the rolebinding configuration
cat tools/kubeblocks_helm/pika-master-slave-cluster/templates/rolebinding.yaml

# Check if there are any cluster-specific templates or values that might restrict scope
fd "values.yaml|_helpers.tpl" -t f tools/kubeblocks_helm/pika-master-slave-cluster/

Length of output: 745


Script:

#!/bin/bash
# Check the helpers template for cluster name scoping
cat tools/kubeblocks_helm/pika-master-slave-cluster/templates/_helpers.tpl

# Check values.yaml for any default configurations
cat tools/kubeblocks_helm/pika-master-slave-cluster/values.yaml

Length of output: 3077

tools/kubeblocks_helm/pika-master-slave/templates/backuppolicytemplate.yaml (1)

14-21: Review backup method configuration

The backup configuration has potential concerns:

  1. snapshotVolumes: false means no volume snapshots - ensure this aligns with data consistency requirements
  2. Verify that the /data mountPath contains all necessary data for a complete backup
✅ Verification successful

Let's search for volume configurations more broadly to understand the data persistence and backup setup in the Pika templates.


Backup configuration is correctly aligned with data persistence setup

The backup configuration is properly configured based on the following findings:

  • The /data mountPath in backup policy matches the primary data volume mount in the Pika component definition
  • The volume mount is consistently used across containers (init, main pika container) for data persistence
  • The backup action set (pika-backup) is configured to use the same /data directory and includes proper backup/restore scripts
  • snapshotVolumes: false is appropriate since Pika's data files are being backed up directly from the mounted volume
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for volume mount configurations
rg -l "mountPath.*data" | grep -i "volume\|mount"

Length of output: 51


Script:

#!/bin/bash
# Search for volume mount configurations with broader pattern
rg -l "/data" --type yaml

# Search for volume configurations in pika templates
fd . "tools/kubeblocks_helm/pika-master-slave/templates" -e yaml -x cat {} 

Length of output: 6294

tools/kubeblocks_helm/pika-master-slave/templates/backupactionset.yaml (1)

19-20: Verify Redis image security implications

Running backup operations with the Redis image on target pod nodes requires careful security consideration.

Let's verify the image configuration and permissions:

✅ Verification successful

Redis backup operations appear safe with minimal privileges

The backup process performs standard Redis operations (BGSAVE) and file operations within a confined data directory. The script uses basic shell commands and doesn't require elevated privileges. Key observations:

  • Uses redis-cli for database operations
  • File operations limited to specific directories (log, db)
  • No privileged operations or system-level modifications detected
  • Runs with standard container permissions

While runOnTargetPodNode is set to true, the operations performed are safe and contained within the expected scope of a backup process.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check Redis image references in the codebase
rg -l 'image:.*redis' --type yaml

# Look for any security context configurations
rg -l 'securityContext' --type yaml

Length of output: 336


Script:

#!/bin/bash
# Check the content of files with Redis image references to understand the security context
cat tools/kubeblocks_helm/pika-master-slave/templates/backupactionset.yaml tools/kubeblocks_helm/pika-master-slave/templates/componentdefinition-pika.yaml

# Check for privilege-related configurations in the backup action
rg -A 5 'runOnTargetPodNode' --type yaml

Length of output: 4952


Script:

#!/bin/bash
# Check the backup script content to understand what operations are performed
cat tools/kubeblocks_helm/pika-master-slave/dataprotection/backup.sh

# Check if there are any specific security configurations for the backup process
rg -A 10 'securityContext|capabilities|privileged' tools/kubeblocks_helm/pika-master-slave/templates/

Length of output: 1153

tools/kubeblocks_helm/pika-master-slave/templates/componentdefinition-pika.yaml (1)

28-34: LGTM! Well-defined role configuration.

The roles are properly defined with clear attributes for master and slave nodes, following Kubernetes best practices.

@@ -0,0 +1 @@
helm install pika-master-slave ./pika-master-slave && helm install pika-master-slave-cluster ./pika-master-slave-cluster

🛠️ Refactor suggestion

Consider adding pre-installation checks and cleanup options

The script should verify the environment and provide cleanup options:

  1. Check if Helm is installed
  2. Check if charts are already installed
  3. Provide options to clean up existing installations

Would you like me to provide an enhanced implementation with these features?
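A minimal sketch of those checks; the release names follow the install command above, and the cleanup hint is illustrative:

```bash
#!/bin/bash
set -eo pipefail

# 1. Verify Helm is installed
command -v helm >/dev/null 2>&1 || { echo "Error: helm is not installed" >&2; exit 1; }

# 2. Refuse to proceed if the releases already exist
for release in pika-master-slave pika-master-slave-cluster; do
  if helm status "$release" >/dev/null 2>&1; then
    echo "Error: release '$release' already installed; run 'helm uninstall $release' first" >&2
    exit 1
  fi
done

# 3. Install the charts
helm install pika-master-slave ./pika-master-slave
helm install pika-master-slave-cluster ./pika-master-slave-cluster
```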



⚠️ Potential issue

Add shebang and proper error handling

The script needs several improvements for reliability and portability:

  1. Missing shebang line
  2. No error handling
  3. Relative paths could fail depending on execution context

Consider this improved implementation:

+#!/bin/bash
+set -eo pipefail
+
+# Script directory
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+# Function to check if chart exists
+check_chart() {
+    local chart_path="$1"
+    if [ ! -d "$chart_path" ]; then
+        echo "Error: Chart not found at $chart_path"
+        exit 1
+    fi
+}
+
+# Check charts exist
+check_chart "${SCRIPT_DIR}/pika-master-slave"
+check_chart "${SCRIPT_DIR}/pika-master-slave-cluster"
+
+# Install charts with error handling
+echo "Installing pika-master-slave chart..."
+if ! helm install pika-master-slave "${SCRIPT_DIR}/pika-master-slave"; then
+    echo "Failed to install pika-master-slave chart"
+    exit 1
+fi
+
+echo "Installing pika-master-slave-cluster chart..."
+if ! helm install pika-master-slave-cluster "${SCRIPT_DIR}/pika-master-slave-cluster"; then
+    echo "Failed to install pika-master-slave-cluster chart"
+    exit 1
+fi
+
+echo "Installation completed successfully"
-helm install pika-master-slave ./pika-master-slave && helm install pika-master-slave-cluster ./pika-master-slave-cluster

Comment on lines 1 to 9
kbcli backuprepo create my-repo \
--provider s3 \
--region <...>\
--bucket <...> \
--endpoint <...>\
--access-key-id <...> \
--secret-access-key <...> \
--access-method <...> \
--default \

⚠️ Potential issue

Consider security implications of exposing credentials in command

The command contains sensitive credentials that should not be exposed in plain text. Consider:

  1. Using environment variables or credential files
  2. Supporting AWS IAM roles for authentication
  3. Adding documentation about secure credential management

Example secure usage:

export S3_ACCESS_KEY=<access_key>
export S3_SECRET_KEY=<secret_key>

kbcli backuprepo create my-repo \
  --provider s3 \
  --region ${S3_REGION} \
  --bucket ${S3_BUCKET} \
  --endpoint ${S3_ENDPOINT} \
  --access-key-id ${S3_ACCESS_KEY} \
  --secret-access-key ${S3_SECRET_KEY} \
  --access-method ${S3_ACCESS_METHOD} \
  --default

Comment on lines +15 to +20
backupFile="${DP_BACKUP_NAME}.tar.zst"
if [ "$(datasafed list ${backupFile})" == "${backupFile}" ]; then
datasafed pull -d zstd-fastest "${backupFile}" - | tar -xvf - -C ${DATA_DIR}
else
datasafed pull "${DP_BACKUP_NAME}.tar.gz" - | tar -xzvf - -C ${DATA_DIR}
fi

⚠️ Potential issue

Verify backup file existence before restoration

The script should explicitly verify the backup file's existence before attempting restoration.

Add explicit validation:

 backupFile="${DP_BACKUP_NAME}.tar.zst"
+if ! datasafed list | grep -q "${backupFile}\|${DP_BACKUP_NAME}.tar.gz"; then
+  echo "Error: Backup file not found in repository"
+  exit 1
+fi
+
 if [ "$(datasafed list ${backupFile})" == "${backupFile}" ]; then

Comment on lines +1 to +2
set -e
set -o pipefail

⚠️ Potential issue

Add shebang and improve error handling setup

The script is missing a shebang line and could benefit from stricter error handling.

Apply this diff to improve the script initialization:

+#!/bin/bash
+set -euo pipefail
-set -e
-set -o pipefail

sleep 1
done

cd ${DATA_DIR}

⚠️ Potential issue

Add error handling for directory change

The cd command should include error handling as suggested by shellcheck.

Add error handling for directory change:

-cd ${DATA_DIR}
+cd "${DATA_DIR}" || {
+    echo "Error: Failed to change to directory ${DATA_DIR}"
+    exit 1
+}

Comment on lines +31 to +36
if [ -d "log" ] || [ -d "db" ]; then
tar -cvf - ./log ./db | datasafed push -z zstd-fastest - "${DP_BACKUP_NAME}.tar.zst"
else
echo "no log db"
exit 1
fi

🛠️ Refactor suggestion

Improve error message and add size validation

The error message for missing directories is too brief, and there's no validation of directory sizes.

Enhance the validation and error handling:

+# Check directory existence and size
+MIN_BACKUP_SIZE=1024  # 1KB minimum
+
 if [ -d "log" ] || [ -d "db" ]; then
+    total_size=$(du -sb ./log ./db 2>/dev/null | awk '{sum += $1} END {print sum}')
+    if [ "$total_size" -lt "$MIN_BACKUP_SIZE" ]; then
+        echo "Warning: Backup size ($total_size bytes) is unusually small"
+    fi
     tar -cvf - ./log ./db | datasafed push -z zstd-fastest - "${DP_BACKUP_NAME}.tar.zst"
 else
-    echo "no log db"
+    echo "Error: Required directories 'log' and/or 'db' not found in ${DATA_DIR}"
+    echo "Please verify that the Redis instance is properly configured and running"
     exit 1
 fi

Comment on lines +21 to +27
while true; do
end_save=$(${connect_url} LASTSAVE)
if [ $end_save -ne $last_save ];then
break
fi
sleep 1
done

⚠️ Potential issue

Add timeout to BGSAVE wait loop

The while loop waiting for BGSAVE completion lacks a timeout mechanism.

Add a timeout to prevent infinite waiting:

+MAX_WAIT_SECONDS=300
+start_time=$(date +%s)
 while true; do
   end_save=$(${connect_url} LASTSAVE)
   if [ $end_save -ne $last_save ];then
      break
   fi
+  current_time=$(date +%s)
+  if [ $((current_time - start_time)) -gt $MAX_WAIT_SECONDS ]; then
+    echo "Error: BGSAVE operation timed out after ${MAX_WAIT_SECONDS} seconds"
+    exit 1
+  fi
   sleep 1
 done

Comment on lines +81 to +93
### Back up and restore a cluster
Ensure that the default BackupRepo is defined
Fellow the kubeblock docs [kubeblocks](https://www.kubeblocks.io/docs/preview/user_docs/maintenance/backup-and-restore/backup/backup-repo)

create backup
```bash
kbcli cluster backup pika-master-slave-cluster --method datafile
```

Select a backup and create a cluster.

```bash
kbcli cluster restore <clusterName> --backup <backup-name>

🛠️ Refactor suggestion

Documentation needs enhancement for better user guidance.

The backup and restore section requires several improvements:

  1. Add prerequisites and setup steps for BackupRepo
  2. Include the command to view existing backups: kbcli cluster list-backups
  3. Provide a complete example of the restore command with actual values
  4. Fix typo: "Fellow" should be "Follow"
  5. Add proper closure for the last code block

Here's the suggested improvement:

 ### Back up and restore a cluster
-Ensure that the default BackupRepo is defined
-Fellow the kubeblock docs [kubeblocks](https://www.kubeblocks.io/docs/preview/user_docs/maintenance/backup-and-restore/backup/backup-repo)
+#### Prerequisites
+1. Ensure that the default BackupRepo is defined:
+   ```bash
+   kbcli backuprepo create my-repo
+   ```
+2. Follow the Kubeblocks docs for detailed setup: [Backup Repository](https://www.kubeblocks.io/docs/preview/user_docs/maintenance/backup-and-restore/backup/backup-repo)
 
-create backup
+#### Create a backup
 ```bash
 kbcli cluster backup pika-master-slave-cluster --method datafile
 ```
 
-Select a backup and create a cluster.
+#### List available backups
+```bash
+kbcli cluster list-backups
+```
+
+#### Restore from a backup
 ```bash
-kbcli cluster restore <clusterName> --backup <backup-name>
+kbcli cluster restore test-pika --backup backup-default-pika-master-slave-cluster-20241203170414
+```


Comment on lines +36 to +53
roleProbe:
builtinHandler: custom
customHandler:
image: {{ include "redis.image" .}}
exec:
command:
- /bin/bash
- -c
- |
IFS=',' read -r -a pod_name_array <<< ${KB_CLUSTER_COMPONENT_POD_NAME_LIST}
# Ergodic name array
headless=".pika-master-slave-cluster-pika-headless"
for i in "${!pod_name_array[@]}"; do
pod_name=${pod_name_array[${i}]}
# Extracting the digital portion of the pod
pod_number=$(echo "${pod_name}" | grep -o '[0-9]*')
# If the pod number is greater than 0
if (( pod_number > 0 )); then
pod0_svc=${pod_name_array[0]}${headless}
echo "exec:redis-cli -h ${pod_name}${headless} -p 9221 slaveof ${pod0_svc}:9221"
redis-cli -h "${pod_name}${headless}" -p 9221 slaveof "${pod0_svc}" 9221
fi
done
preCondition: ComponentReady
- /bin/bash
- -c
- |
replication_info=$(redis-cli -h 127.0.0.1 -p 9221 info replication)
role=$(echo "${replication_info}" | awk -F':' '/^role:/ {print $2}' | tr -d '[:space:]')
if [ "$role"x = "master"x ]; then
echo -n "master"
else
echo -n "slave"
fi
periodSeconds: 1
timeoutSeconds: 1

🛠️ Refactor suggestion

Enhance roleProbe reliability with proper error handling.

The current implementation has several potential reliability issues:

  1. No error handling for redis-cli command failures
  2. No timeout handling for redis-cli command
  3. Hardcoded port number should be parameterized

Suggested improvement:

       customHandler:
         image: {{ include "redis.image" .}}
         exec:
           command:
             - /bin/bash
             - -c
             - |
-              replication_info=$(redis-cli -h 127.0.0.1 -p 9221 info replication)
+              REDIS_PORT=${REDIS_PORT:-9221}
+              timeout 5s redis-cli -h 127.0.0.1 -p ${REDIS_PORT} info replication > /tmp/replication_info || {
+                echo "Failed to get replication info" >&2
+                exit 1
+              }
+              replication_info=$(cat /tmp/replication_info)
               role=$(echo "${replication_info}" | awk -F':' '/^role:/ {print $2}' | tr -d '[:space:]')
+              if [ -z "$role" ]; then
+                echo "Failed to determine role" >&2
+                exit 1
+              fi
               if [ "$role"x = "master"x ]; then
                 echo -n "master"
               else
                 echo -n "slave"
-              fi  
+              fi

Comment on lines +92 to +112
- name: redis-cli-runner
image: {{ include "redis.image" . }}
imagePullPolicy: {{ include "pika.imagePullPolicy" . }}
command:
- /bin/bash
- -c
- |
while true; do
hostname=$KB_POD_NAME
if [[ "$hostname" =~ -([0-9]+)$ ]]; then
ordinal=${BASH_REMATCH[1]}
if [[ $ordinal -ge 1 && $ordinal -le 9 ]]; then
replication_info=$(redis-cli -h 127.0.0.1 -p 9221 info replication)
role=$(echo "${replication_info}" | awk -F':' '/^role:/ {print $2}' | tr -d '[:space:]')
if [ "$role"x = "master"x ]; then
redis-cli -h 127.0.0.1 -p 9221 slaveof pika-master-slave-cluster-pika-0.pika-master-slave-cluster-pika-headless 9221
fi
fi
fi
sleep 10
done

🛠️ Refactor suggestion

Improve redis-cli-runner container reliability and maintainability.

The current implementation has several areas for improvement:

  1. Infinite loop without proper termination
  2. Missing error handling for redis-cli commands
  3. Hardcoded sleep interval
  4. Magic numbers in ordinal check
  5. Limited logging for debugging

Suggested improvement:

       - name: redis-cli-runner
         image: {{ include "redis.image" . }}
         imagePullPolicy: {{ include "pika.imagePullPolicy" . }}
+        env:
+          - name: REDIS_PORT
+            value: "9221"
+          - name: SLEEP_INTERVAL
+            value: "10"
+          - name: MAX_REPLICAS
+            value: "9"
         command:
           - /bin/bash
           - -c
           - |
+            set -eo pipefail
+            
+            log() {
+              echo "$(date -u +"%Y-%m-%d %H:%M:%S") $1" >&2
+            }
+            
             while true; do
               hostname=$KB_POD_NAME
               if [[ "$hostname" =~ -([0-9]+)$ ]]; then
                 ordinal=${BASH_REMATCH[1]}
-                if [[ $ordinal -ge 1 && $ordinal -le 9 ]]; then
+                if [[ $ordinal -ge 1 && $ordinal -le ${MAX_REPLICAS} ]]; then
+                  log "Checking replication status for ordinal $ordinal"
-                  replication_info=$(redis-cli -h 127.0.0.1 -p 9221 info replication)
+                  replication_info=$(timeout 5s redis-cli -h 127.0.0.1 -p ${REDIS_PORT} info replication) || {
+                    log "Failed to get replication info"
+                    sleep ${SLEEP_INTERVAL}
+                    continue
+                  }
                   role=$(echo "${replication_info}" | awk -F':' '/^role:/ {print $2}' | tr -d '[:space:]')
                   if [ "$role"x = "master"x ]; then
-                    redis-cli -h 127.0.0.1 -p 9221 slaveof pika-master-slave-cluster-pika-0.pika-master-slave-cluster-pika-headless 9221
+                    log "Setting up replication for ordinal $ordinal"
+                    timeout 5s redis-cli -h 127.0.0.1 -p ${REDIS_PORT} slaveof pika-master-slave-cluster-pika-0.pika-master-slave-cluster-pika-headless ${REDIS_PORT} || {
+                      log "Failed to set up replication"
+                    }
                   fi
                 fi
               fi
-              sleep 10
+              sleep ${SLEEP_INTERVAL}
             done

@chejinge changed the title from "finish backup and restore" to "feat:finish backup and restore" Dec 6, 2024
@chejinge added the 3.5.6 and 4.0.2 labels and removed the 📒 Documentation label Dec 6, 2024
@baerwang merged commit 713e6fb into OpenAtomFoundation:unstable Dec 9, 2024
18 of 20 checks passed