V1.0.1

@GlennBullingham GlennBullingham released this 22 Feb 20:23

Release v1.0.1

Release Date: 22nd February 2022

Summary

This patch release contains important fixes for issues identified after the release of V1.0.0.
Users of V1.0.0 are advised to upgrade to this version. New users are advised to install only V1.0.1.

Fixes

  • Critical fix(nexus): do not persist faulted state of the last replica
    • Replica retirement logic in the nexus allowed the last remaining replica of a volume to be marked as faulted in both the nexus and the persistent store (etcd). If this occurs, the volume remains inaccessible and in a faulted state even after the last-retired device is returned to normal service. This is a terminal condition which the user cannot rectify.
  • Major fix(helm chart): remove local pool config store
    • The V1.0.0 release incorrectly included a deprecated argument for the Mayastor container in both the prepared daemonset definition file and the Helm chart template from which it is generated. When used, this causes an unexpected duplication of stored configuration between local (file) and remote (etcd) representations.
  • fix(nexus): recreate faulted nexuses when appropriate
    • If healthy replicas of a faulted nexus are Online, the control plane will now attempt to destroy the faulty instance and create a new nexus in order to restore access to the affected volume.
  • fix(nexus): serialised nexus i/o suspend/resume
    • Suspend/resume operations on NVMe subsystems are now serialised for nexuses, so that concurrent I/O suspend/resume requests are handled correctly when multiple replicas are retired at the same time.
  • chore: add env variable to control max number of qpairs
    • Added an environment variable to Mayastor, NVMF_TCP_MAX_QPAIRS_PER_CTRL, to allow control over the maximum number of qpairs per controller. This allows a user to tune the configuration for better stability in scenarios where there is a significant imbalance between the CPU core count used by the Mayastor process and that of the host on which it is scheduled. EXPERIMENTAL USE ONLY
  • Sending a node online event should not block
    • On startup, if the cluster had more than 5 nodes, sending the per-node event notification would block, because the queue holds only 5 elements. With this fix, if the queue is full, additional events are dropped, which has only a marginal impact on startup performance.
  • fix(jaeger): reduce the OTEL_BSP_MAX_EXPORT_BATCH_SIZE
    • Some traces can be too long for export, producing the error: "OpenTelemetry trace error occurred. Exporter jaeger encountered the following error(s): thrift agent failed with message too long". The batch size has been reduced from 512 to 64.
  • fix: use dns name to reach rest
    • Using the host name created a race as the REST server's node port must start ahead of its consumers. With this fix the DNS name is used, which will be updated whenever REST is ready. Resolves: openebs/mayastor#1076
  • fix(node): node status overridden with stale status
    • Instead of using a copy of the node to reload, use the node itself with a gRPC locked client to update the nodes when the registry is polling Mayastor.
  • fix(node): don't stall during (re-)registration of flaky nodes
    • Ensure that a node is alive and loaded before adding it to the configuration.
  • chore: don't trace reconciliation unless TRACE is set or there is work to do
    • For clusters with a high number of volumes, the volume of traces can overwhelm the OpenTelemetry exporter. In reconciliation loops, spans are no longer created unless the TRACE level is set or the reconciler actually did work.
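
The non-blocking send described under "Sending a node online event should not block" can be illustrated with a minimal sketch. This is not the Mayastor control plane code; it uses Python's standard bounded queue, and the function name send_node_online and the node names are hypothetical. Only the queue capacity of 5 and the drop-when-full behaviour are taken from the note above.

```python
import queue

QUEUE_CAPACITY = 5  # capacity stated in the release note


def send_node_online(events: queue.Queue, node: str) -> bool:
    """Try to enqueue an event without blocking.

    Returns True if the event was queued and False if it was dropped
    because the queue was already full. (Hypothetical helper.)
    """
    try:
        events.put_nowait(node)
        return True
    except queue.Full:
        # Dropping the event keeps the sender from stalling at startup.
        return False


# Simulate startup on a cluster with more nodes than queue slots.
events = queue.Queue(maxsize=QUEUE_CAPACITY)
results = [send_node_online(events, f"node-{i}") for i in range(8)]
delivered = results.count(True)
dropped = results.count(False)
```

With nothing consuming the queue, the first 5 events are queued and the remaining 3 are dropped, and the sender never waits. A blocking put() in the same situation would stall on the sixth node.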

Testing

Mayastor is subject to extensive unit, component and system-level testing throughout the development and release cycle. Resources for system-level (E2E) testing are currently provided by DataCore Software.

At this time, personnel and hardware resource limitations constrain testing by the maintainers to Linux builds on x86. This reflects the primary use-case which the maintainers are currently targeting with the OpenEBS Mayastor project. Therefore, the use of Mayastor with other operating systems and/or architectures, if even possible, should be considered serendipitous and wholly experimental.

This release has been subject to End-to-End testing under Ubuntu 20.04.3_LTS (kernel: ubuntu-5.13.0.27-generic)

  • Tested k8s versions
    • 1.21.7

Known Issues

  • The Pool Operator is unable to provision pools directly using a file as the backing device. The operator attempts to validate any device path supplied in the pool specification as an accessible block device attached to the corresponding Mayastor node. In the case of a file store, there is no block device and the validation fails, causing provisioning of the pool to be aborted. This will be addressed in a future release.

    • workaround: Mount the file/image as a loopback device (losetup) and use the device path of the loopback device in the pool spec
  • Deploying an application pod on a worker node which hosts both Mayastor and the Prometheus exporter causes that node to restart.

    • workaround: Use kernel version extra-5.31.0 or later
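
The first known issue above comes down to a block-device check: a plain file fails it, while a loopback device passes. The sketch below shows one way such a check could look, along with the losetup invocation the workaround refers to. It is an assumption for illustration, not the Pool Operator's actual validation logic; is_block_device and losetup_command are hypothetical helpers, while --find and --show are standard util-linux losetup options.

```python
import os
import stat
import tempfile


def is_block_device(path: str) -> bool:
    """Return True only if path exists and is a block device.

    Hypothetical helper; the real operator's check may differ.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False
    return stat.S_ISBLK(mode)


def losetup_command(image_path: str) -> list:
    """Build the command that attaches a file to the first free loop device.

    With --show, losetup prints the device path (for example /dev/loop0),
    which is the path to put in the pool spec. Running it needs root.
    """
    return ["losetup", "--find", "--show", image_path]


# A regular file is not a block device, so a check like this rejects it.
with tempfile.NamedTemporaryFile() as image:
    file_is_block = is_block_device(image.name)
    command = losetup_command(image.name)
```

The command is only constructed here, not executed, because attaching a loop device requires root privileges. After running it on the Mayastor node, the printed loop device path would pass a block-device check of this kind.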

Getting Started

Mayastor user documentation, including a quick deployment guide, can be found here

Upgrade

Upgrades from versions of Mayastor prior to V1.0.0 are not supported. Any earlier release should be removed prior to installing this version.

Support

If you are having issues during installation, configuration or upgrade, you can contact us via:

"Unsupported" Architectures and Operating Systems (inc. ARM, Raspberry Pi, MacOS)

As described in the section on software testing above, the maintainers build and test Mayastor only on Linux on x86-64. The use of Mayastor in other environments is therefore not necessarily possible, at least without modification. Where it is possible, this is largely coincidental; such use is not fully tested and should be considered entirely experimental.

The maintainers will be pleased to receive contributions in this area, with the following understanding:

  • Such PRs will be reviewed for correctness, good practice, licensing compliance and general quality
  • PRs will be accepted on the basis that testing by the maintainers is restricted to demonstrating no negative effect on the stability of x86 builds
  • The maintainers will not perform acceptance testing or "positive release" of such functionality on any other OS or architecture, which is in accordance with their designation of these environments as experimental use cases at this time.
  • The maintainers will not provide build artifacts or container images for these environments