v2.0.0

@pchandra19 pchandra19 released this 07 Feb 21:35
· 668 commits to develop since this release

Release v2.0.0

Release Date: 8th February 2023

Summary

The 2.0.0 release is a major release, primarily focusing on enhancing the availability, stability and observability of Mayastor, the lightning fast NVMe-based block storage solution for Kubernetes stateful workloads.

Features

1. Availability features

On-demand nexus switch-over on failure detection

A nexus is a data structure created for every Mayastor volume, within the Mayastor instance, which acts as an NVMe controller and performs IO operations for that volume. Each Mayastor volume comprises a single nexus and one or more replicas. Prior to version 2.0.0, the nexus was a single point of failure. To mitigate this, switch-over (and fail-over) logic has been added in 2.0.0. With this logic in place, if a nexus becomes unavailable, whether through failure scenarios associated with software errors, faulty nodes, network paths or underlying storage, or through planned events such as upgrades, the nexus is recreated on the most suitable node and applications are automatically reconnected to the newly created nexus instance, to ensure I/O continuity.

It should be noted that the switch-over will not happen if the node housing the nexus also contains the last (or only) healthy replica of the volume and that replica is currently inaccessible.
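The exception above can be expressed as a small predicate. This is only an illustrative sketch of the rule as stated in these notes, not Mayastor's control-plane code; the `Replica` type and the `can_switch_over` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """Hypothetical, simplified view of a volume replica."""
    node: str         # node hosting the replica
    healthy: bool     # replica holds up-to-date data
    accessible: bool  # replica can currently be reached


def can_switch_over(nexus_node: str, replicas: List[Replica]) -> bool:
    """Apply the documented exception to nexus switch-over.

    Switch-over is refused when the last (or only) healthy replica sits on
    the same node as the nexus and is currently inaccessible.
    """
    healthy = [r for r in replicas if r.healthy]
    if len(healthy) == 1:
        last = healthy[0]
        if last.node == nexus_node and not last.accessible:
            return False
    return True
```

For instance, a volume whose only healthy replica is co-located with the failed nexus and cannot be reached would not be switched over, whereas a volume with a second healthy replica elsewhere would.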

Cordon a node

In 2.0.0, Mayastor introduces the ability to cordon a node, effectively preventing any resources from being provisioned on the said node. Cordoning a node is an indication to the control plane that the node is going to have something done to it which is outside of its usual mode of operation, therefore the control plane should temporarily omit it from its scheduling logic.

Node cordoning is desirable when performing operations where the creation of new resources on the node would be problematic, such as during:

  • Maintenance
  • Upgrades
  • Debugging
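Conceptually, cordoning removes a node from the set of scheduling candidates without touching what already runs there. The sketch below illustrates that idea only; `schedulable_nodes` is a hypothetical helper and does not reflect the control plane's actual scheduler.

```python
from typing import Iterable, List, Set


def schedulable_nodes(nodes: Iterable[str], cordoned: Set[str]) -> List[str]:
    """Return the nodes on which new resources may still be placed.

    Cordoned nodes are skipped; resources already on them are unaffected,
    since this filter applies to new placements only.
    """
    return [node for node in nodes if node not in cordoned]
```

Uncordoning would then amount to removing the node from the `cordoned` set, after which it is considered for scheduling again.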

Drain a node

In 2.0.0, Mayastor introduces the ability to drain a node, removing resources from that node so that it ends up in an “empty” state. Only nexus draining is supported in this release. Node draining helps lay the foundation for non-disruptive upgrades in the future.

2. Observability features

Pool metrics exporter

A metrics exporter runs as a container in every IO-engine pod and exposes pool capacity metrics in the Prometheus format.
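To give a feel for what capacity metrics in the Prometheus text exposition format look like, here is a minimal sketch that renders gauges by hand. The metric names (`example_pool_total_bytes`, `example_pool_used_bytes`) and label names are assumptions made for illustration; they are not the names emitted by the actual exporter.

```python
from typing import Dict, List


def render_pool_metrics(pools: List[Dict]) -> str:
    """Render pool capacity as Prometheus text-format gauges.

    Metric and label names here are hypothetical placeholders.
    """
    metrics = [
        ("example_pool_total_bytes", "Total pool capacity in bytes.", "total"),
        ("example_pool_used_bytes", "Used pool capacity in bytes.", "used"),
    ]
    lines = []
    for metric, help_text, key in metrics:
        # Each metric family starts with HELP and TYPE comment lines.
        lines.append(f"# HELP {metric} {help_text}")
        lines.append(f"# TYPE {metric} gauge")
        for pool in pools:
            labels = f'node="{pool["node"]}",name="{pool["name"]}"'
            lines.append(f"{metric}{{{labels}}} {pool[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [{"node": "node-1", "name": "pool-1", "total": 100, "used": 40}]
    print(render_pool_metrics(sample), end="")
```

A Prometheus server scraping an endpoint that serves text of this shape would record one time series per pool and metric.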

Volume stats exporter

The NodeGetVolumeStats RPC is now implemented in the Mayastor CSI node-plugin. The metrics it returns are file-system statistics, which the kubelet exposes on each node.

3. Stability features

  • Fixed a possible deadlock scenario with NVMe controller events in IO-engine.
  • Fixed a possible data corruption issue when a rebuilding replica encounters a fault, under heavy load.
  • Fixed a panic in the nexus_destroy codepath.
  • Fixed an issue in CSI node_unstage codepath.
  • Added a Kubernetes watcher to process deletion events of Persistent Volumes left behind by the deletion of PVCs with the reclaim policy set to “retain”.
  • Fixed an issue with de-registration of the gRPC service on receiving a SIGTERM signal.
  • Fixed a memory corruption issue during parallel shutdown of multiple nexuses.
  • Added a fix to prevent file-system errors during shutdown.
  • Fixed NVMF sub-system issues with error handling during nexus unshare.
  • Added allowed-hosts control to NVMe targets to prevent access from outdated nodes, as part of switchover.
  • Added persistent NVMe reservations on replicas to prevent split-brain scenarios.

4. Supportability features

The 2.0.0 release provides an option in the Mayastor kubectl plug-in to collect supportability information from the cluster to aid debugging. The information is collected as a bundle and includes, but is not limited to, Mayastor component logs, command-line and system outputs, Kubernetes resource specs of Mayastor resources, and a dump of Mayastor etcd data. For storing and retrieving historical logs, the 2.0.0 installer configures and installs a Loki stack alongside the Mayastor components.

5. Ease-of-use features

Single Helm chart-based installation

The Mayastor installation process has been simplified in 2.0.0 to a single Helm chart. Prior releases required separate Helm charts for the control plane and the data plane.

NodePort dependency removal

Prior to 2.0.0, etcd and api-rest had to be configured as services of type NodePort, and the corresponding ports opened on the firewall, so that the Mayastor kubectl plug-in could reach them from outside the cluster. In 2.0.0 this is no longer necessary: the plug-in now uses HTTP and TCP port-forwarding via the kube-proxy crate.

6. Other notable changes

Component nomenclature changes

With this release, Mayastor has moved towards a more consistent, taxonomical naming scheme for components. Every component name follows the pattern {release name}-{class}-{object}. The release name is taken from the Helm chart (the default is mayastor). For example, the REST API component is now named mayastor-api-rest, the engine formerly named mayastor has become mayastor-io-engine, and so on.
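The naming pattern can be illustrated with a one-line helper. This is a sketch only: `component_name` is a hypothetical function, and splitting mayastor-io-engine into class "io" and object "engine" is an inference from the pattern rather than something these notes state.

```python
def component_name(cls: str, obj: str, release: str = "mayastor") -> str:
    """Build a name following {release name}-{class}-{object}.

    `release` defaults to "mayastor", the default Helm release name.
    """
    return f"{release}-{cls}-{obj}"


if __name__ == "__main__":
    print(component_name("api", "rest"))
    print(component_name("io", "engine"))
    # A non-default Helm release name changes only the first segment.
    print(component_name("api", "rest", release="storage"))
```

Because the release name is the first segment, installing the chart under a different release name changes the prefix of every component name.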

Schema changes in etcd

Mayastor uses etcd as a persistent store for the configuration and state information of its resources, such as pools, volumes, nexuses and replicas. The etcd cluster is created as part of the Mayastor installation. Prior to 2.0.0, a Mayastor installation was designed to have exclusive use of its etcd cluster, expecting no interference from other users of etcd. This could lead to data inconsistency if a user-managed standalone etcd cluster was shared by more than one active Mayastor installation.

In 2.0.0, a separate namespace is allotted for every Mayastor cluster.
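One common way to give each cluster its own namespace in a shared key-value store is to prefix every key with a cluster identifier. The sketch below demonstrates that general technique with a plain dictionary standing in for etcd; the prefix layout and the `namespaced_key` helper are assumptions for illustration and are not the actual 2.0.0 schema.

```python
from typing import Dict


def namespaced_key(cluster_id: str, key: str) -> str:
    """Prefix a key with a per-cluster namespace (hypothetical layout)."""
    return f"/example/clusters/{cluster_id}/{key.lstrip('/')}"


if __name__ == "__main__":
    store: Dict[str, str] = {}  # stand-in for a shared etcd cluster

    # Two installations write the "same" logical key without colliding.
    store[namespaced_key("cluster-a", "/volumes/vol-1")] = "state-a"
    store[namespaced_key("cluster-b", "/volumes/vol-1")] = "state-b"

    for k in sorted(store):
        print(k, "=", store[k])
```

With such a scheme, each installation reads and writes only under its own prefix, so two installations sharing one etcd cluster no longer overwrite each other's entries.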

IO-engine gRPC versioning

Release 2.0.0 introduces an enhanced, versioned API scheme (v1) for all IO-engine RPC operations, with backward-compatible support for the older v0 APIs.

Only NVMe, no more iSCSI

With 2.0.0, Mayastor completely removes support for the iSCSI transport.

NATS deprecated

NATS is no longer needed for communication between the agent-core and IO-engine components. This communication now takes place over gRPC.

Known behavioural limitations

  • As with previous versions, the Mayastor IO engine fully utilises its allocated CPU cores even when there is little or no I/O load. This is the poller operating at full speed, waiting for I/O.
  • As with the previous versions, a Mayastor disk pool is limited to a single block device and cannot span across more than one block device.

Known issues

  • Mayastor does not support creation of thin-provisioned volumes as of v2.0.0. This is a work-in-progress feature.
  • Mayastor does not support creation of volume snapshots and clones as of v2.0.0. This is a work-in-progress feature.
  • Mayastor does not support capacity expansion for volumes as of v2.0.0.
  • Mayastor does not support capacity expansion of disk pools as of v2.0.0.
  • Under heavy IO and constant scaling up-down of volume replicas, the io-engine pod has been observed to restart occasionally.

Testing

Mayastor is subject to extensive unit, component and system-level testing throughout the development and release cycle. Resources for system-level (E2E) testing are currently provided by DataCore Software.

At this time, personnel and hardware resource limitations constrain testing by the maintainers to Linux builds on x86. This reflects the primary use-case which the maintainers are currently targeting with the OpenEBS Mayastor project. Therefore, the use of Mayastor with other operating systems and/or architectures, if even possible, should be considered serendipitous and wholly experimental.

This release has been subject to end-to-end testing under Ubuntu 20.04.5_LTS (kernel: ubuntu-5.15.0-50-generic).

  • Tested k8s versions
    • 1.23.7
    • 1.22.10
    • 1.21.13

Getting Started

Mayastor user documentation, including a quick deployment guide, can be found here

Upgrade

Upgrading from 1.0.x is not supported at this time. To use Mayastor version 2.0.0, a fresh install is required.

Support

If you are having issues during installation, configuration or upgrade, you can contact us via:

"Unsupported" Architectures and Operating Systems (inc. ARM, Raspberry Pi, MacOS)

As described in the section on software testing above, the maintainers build and test Mayastor only on Linux, on x86-64. The use of Mayastor in other environments is therefore not necessarily possible, at least without modification. Where it is possible, this is currently largely coincidental: it is not "fully" tested and should therefore be considered an entirely experimental use-case.

The maintainers will be pleased to receive contributions in this area, with the following understanding:

  • Such PRs will be reviewed for correctness, good practice, licensing compliance and general quality
  • PRs will be accepted on the basis that testing by the maintainers is restricted to demonstrating no negative effect on the stability of x86-64 builds
  • The maintainers will not perform acceptance testing or "positive release" of such functionality on any other OS or architecture, which is in accordance with their designation of these environments as experimental use cases at this time.
  • The maintainers will not provide build artifacts or container images for these environments