DR diagnose and troubleshoot tool #1702

Open
nirs opened this issue Dec 5, 2024 · 0 comments
nirs commented Dec 5, 2024

We want to make DR debugging easier for developers, testers and support teams.

Debugging DR issues is currently quite complex because of the number of components involved. It takes too much time, and it is hard for new joiners to get into. If more issues are reported, developer resources will not scale to handle many issues at once while also keeping up work on new features.

We want a command line tool that helps debug DR issues in multi-cluster environments. The tool should be easy to install and work with Kubernetes and OpenShift clusters.

The tool can be a standalone tool, or a kubectl or oc plugin (a standalone command designed to be called as a kubectl or oc sub command).

Common DR problems that a tool can help with

XXX Need input from @BenamarMk and @netzzer

Operation ideas

  • Validate existing environment
    • check the creation of the VRC ManifestWork
    • validate S3 secrets
    • test the connection to the S3 store using the Ramen ConfigMap configuration
    • verify if mirroring is enabled
    • ensure that an application can be deployed, failed over, and relocated back to the primary cluster

The tool can be expanded in the future to include more sophisticated diagnostics and provide actionable recommendations for resolving issues.
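
As a rough illustration of how the validation operations above could be structured, here is a minimal sketch of a check runner that runs every check, keeps going after a failure, and attaches an optional recommendation to each failed check. All names here (`CheckFailed`, `CheckResult`, `run_checks`, and the stub checks) are hypothetical and not part of ramen or e2e; the stubs stand in for real cluster queries.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class CheckFailed(Exception):
    """Raised by a check to report a problem and an optional hint."""

    def __init__(self, detail: str, recommendation: Optional[str] = None):
        super().__init__(detail)
        self.detail = detail
        self.recommendation = recommendation


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""
    recommendation: Optional[str] = None


def run_checks(checks: list[tuple[str, Callable[[], None]]]) -> list[CheckResult]:
    """Run every check, even if earlier ones fail, and collect the results."""
    results = []
    for name, check in checks:
        try:
            check()
        except CheckFailed as e:
            results.append(CheckResult(name, False, e.detail, e.recommendation))
        except Exception as e:
            # A bug or unexpected error in the check itself.
            results.append(CheckResult(name, False, f"check error: {e}"))
        else:
            results.append(CheckResult(name, True))
    return results


def format_report(results: list[CheckResult]) -> str:
    """Format results as one line per check plus an optional hint line."""
    lines = []
    for r in results:
        line = f"{'PASS' if r.ok else 'FAIL'} {r.name}"
        if r.detail:
            line += f": {r.detail}"
        lines.append(line)
        if r.recommendation:
            lines.append(f"     hint: {r.recommendation}")
    return "\n".join(lines)


# Hypothetical stub checks; real checks would query the clusters.
def check_s3_secrets() -> None:
    pass  # pretend the secrets are valid


def check_mirroring() -> None:
    raise CheckFailed(
        "mirroring is not enabled",
        "enable mirroring on both managed clusters",
    )


if __name__ == "__main__":
    results = run_checks([
        ("s3-secrets", check_s3_secrets),
        ("mirroring", check_mirroring),
    ])
    print(format_report(results))
```

Keeping the recommendation next to the failure is one possible way to grow toward actionable diagnostics later without changing the runner.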

Usage ideas

  • Command line tool like subctl with sub commands like validate, diagnose and more
  • A kubectl/oc plugin: installed via krew. This is basically the same as the command line tool, but optimized to run as a sub command (e.g. kubectl dr validate ...).
  • Using e2e as a tool: a command line tool built from the e2e tests, running test suites like TestValidate for validation
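
To make the first two usage ideas more concrete, here is a minimal sketch of a command line entry point with `validate` and `diagnose` sub commands. The program name `drctl`, the `--config` option, and the help texts are assumptions for illustration only; nothing here is an existing interface.

```python
import argparse
from typing import Optional, Sequence


def build_parser(prog: str = "drctl") -> argparse.ArgumentParser:
    """Build a parser with one sub command per operation (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog=prog,
        description="DR diagnose and troubleshoot tool (sketch)",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    validate = sub.add_parser("validate", help="validate an existing environment")
    validate.add_argument("--config", help="file describing the clusters (assumed option)")

    diagnose = sub.add_parser("diagnose", help="diagnose a DR issue")
    diagnose.add_argument("--config", help="file describing the clusters (assumed option)")

    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # A real tool would dispatch to the command implementation here.
    print(f"would run {args.command!r} using config {args.config!r}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

The same program can serve both usage styles: kubectl discovers plugins as executables named `kubectl-<name>` on the PATH, so installing this entry point as `kubectl-dr` would make it callable as `kubectl dr validate ...`.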

Tool name ideas

  • drctl - like subctl
  • kubectl dr - as a plugin for kubectl
  • oc dr - as a plugin for oc
  • ramenctl - we already have a tool for deploying and configuring in a drenv environment. We can teach it to do more.

Using "dr" in the name is better than "ramen". We want to grab this name, for example in the krew plugin repository, and it is shorter and makes more sense to users than ramen.

Relation to e2e

Some of the code needed by the tool (getting dr and related resources, performing dr actions) is already implemented in the e2e framework. It would be useful to share the same code between e2e and the tool.

Supported environments

We want a tool that works with all environments. Developers use drenv based environments daily for debugging and testing (e.g. using e2e). Users use OpenShift clusters, and in the future we hope that ramen will become the DR solution for Kubernetes, so we want the tool to be compatible with any Kubernetes cluster.

  • drenv testing environment - in this case we have an environment yaml describing the environment, and the local kubeconfig file contains configuration for all the clusters (e.g. hub, dr1, dr2)
  • OpenShift environment - in this case we may have 3 kubeconfigs downloaded from the OpenShift cluster console. They may have the same names so we cannot add them as is to the local kubeconfig, or we may not want to add them to the config since they are temporary clusters needed only during debugging. drenv also generates kubeconfig files in ~/.config/drenv/envname/
    • The oc-clusterset tool can be used to import kubeconfigs from an existing OpenShift environment into a local kubeconfig. This makes it easy to work with the clusters when you need to log in again every day to the same clusters.
  • Kubernetes environment: this should work just like OpenShift environment.

Managing kubeconfigs

The common use case is having 3 kubeconfig files, one per cluster. People use them today by exporting KUBECONFIG=/path/to/cluster/config in multiple shells, or by using the kubectl/oc --kubeconfig=/path/to/cluster/config option.

We want to support the multiple kubeconfigs case, because users are already there. But this is not a convenient way to work with the tool. The tool needs to access all clusters, and passing 3 kubeconfig files to every command is a pretty bad user experience.

Ideally after we import the kubeconfigs, we can also use kubectl or oc with the same kubeconfigs to access multiple clusters from the same shell. This can be done today with oc-clusterset. We can integrate this code in the tool so users do not have to use multiple tools, or assume that users will install it.
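
To illustrate the import step, here is a minimal sketch that merges several kubeconfigs with clashing entry names into a single config by renaming each cluster, user and context to a unique name. This is not how oc-clusterset or kubectl-ramen implement it; the function names and sample data are hypothetical, and the kubeconfigs are plain dicts (as loaded from YAML) to keep the example self-contained.

```python
import copy


def find(items: list[dict], name: str) -> dict:
    """Return the entry with the given name from a kubeconfig list."""
    return next(item for item in items if item["name"] == name)


def import_kubeconfig(merged: dict, config: dict, name: str) -> None:
    """Add the current context of config to merged, renamed to name."""
    ctx = find(config["contexts"], config["current-context"])["context"]
    cluster = find(config["clusters"], ctx["cluster"])["cluster"]
    user = find(config["users"], ctx["user"])["user"]

    merged["clusters"].append({"name": name, "cluster": copy.deepcopy(cluster)})
    merged["users"].append({"name": name, "user": copy.deepcopy(user)})
    # Keep other context fields (e.g. namespace), but point to the renamed entries.
    merged["contexts"].append({"name": name, "context": dict(ctx, cluster=name, user=name)})


def merge_kubeconfigs(configs: dict[str, dict]) -> dict:
    """Merge kubeconfigs keyed by the unique name to use for each cluster."""
    merged = {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [],
        "users": [],
        "contexts": [],
        "current-context": "",
    }
    for name, config in configs.items():
        import_kubeconfig(merged, config, name)
    if configs:
        merged["current-context"] = next(iter(configs))
    return merged


def sample_config(server: str) -> dict:
    """Build a fake kubeconfig; all three samples use the same entry names."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [{"name": "cluster", "cluster": {"server": server}}],
        "users": [{"name": "admin", "user": {"token": "fake-token"}}],
        "contexts": [{"name": "admin", "context": {"cluster": "cluster", "user": "admin"}}],
        "current-context": "admin",
    }


if __name__ == "__main__":
    merged = merge_kubeconfigs({
        "hub": sample_config("https://hub.example:6443"),
        "dr1": sample_config("https://dr1.example:6443"),
        "dr2": sample_config("https://dr2.example:6443"),
    })
    print([c["name"] for c in merged["contexts"]])
```

With a merged file like this written to disk, the same shell can reach every cluster using the standard `kubectl --context hub`, `kubectl --context dr1` form.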

The e2e tests use a config file to point to the cluster kubeconfigs. We can use the same config file. Users will need to prepare the config file to use the tool. Another option, used by kubectl-ramen, is to import the kubeconfig files into an internal config file, so users only need to pass the kubeconfigs once, and then they can use a name to refer to the cluster set. This is more useful when working with the same clusters for a longer time, for example a testing environment used for a few days or weeks, or real clusters used for several years.
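
The "pass the kubeconfigs once, then refer to the cluster set by name" idea could look roughly like the sketch below. The JSON store layout, the function names, and the paths are all assumptions made for illustration; this is not the format used by kubectl-ramen or by the e2e config file.

```python
import json
import tempfile
from pathlib import Path


def load_store(store: Path) -> dict:
    """Load all cluster sets, or an empty store if the file does not exist."""
    return json.loads(store.read_text()) if store.exists() else {}


def save_clusterset(store: Path, name: str, kubeconfigs: dict[str, str]) -> None:
    """Remember the kubeconfig path of every cluster under a cluster set name."""
    data = load_store(store)
    data[name] = kubeconfigs
    store.parent.mkdir(parents=True, exist_ok=True)
    store.write_text(json.dumps(data, indent=2))


def load_clusterset(store: Path, name: str) -> dict[str, str]:
    """Return the kubeconfig paths of a named cluster set."""
    data = load_store(store)
    if name not in data:
        raise KeyError(f"unknown cluster set {name!r}, known: {sorted(data)}")
    return data[name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "dr" / "clustersets.json"
        # Import once, with hypothetical paths...
        save_clusterset(store, "test-env", {
            "hub": "/path/to/hub/config",
            "dr1": "/path/to/dr1/config",
            "dr2": "/path/to/dr2/config",
        })
        # ...then later commands only need the name.
        print(load_clusterset(store, "test-env")["hub"])
```

A store like this trades one extra import step for shorter commands afterwards, which fits the long-lived environments mentioned above better than throwaway clusters.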

Relation to kubectl-ramen plugin

We started the kubectl-ramen plugin, which has related dr commands. It makes sense to have the debugging and diagnosis features in the same tool, similar to subctl.

We have not made a lot of progress with the plugin yet. The only thing implemented so far is importing a set of clusters to a local config. We can reuse this code if we want to go with this approach.

See also #673

Similar tools

  • subctl - the verify and diagnose commands are good examples of debugging commands
  • virtctl - I don't think it has debugging and troubleshooting features, but it is an example of a tool helping to manage a complex system like kubevirt, and an example of naming the tool in a more generic way.
  • rook-ceph kubectl/oc krew plugin - has a "dr health" command verifying rbd mirroring health.
  • odf-cli - may be a good place to host the dr related commands. It already has dr related sub commands: