canonical · tonyandrewmeyer · Jan 22, 2025 · Jan 12, 2025 · Jan 20, 2025 · Jan 20, 2025
diff --git a/docs/explanation/storedstate-uses-limitations.md b/docs/explanation/storedstate-uses-limitations.md
@@ -5,41 +5,52 @@
 
 ## Purpose of this doc
 
-This is an explanatory doc covering how charm authors might track local state in a Juju unit. We'll cover the Operator Framework's concept of [`StoredState`](https://juju.is/docs/sdk/constructs#heading--stored-state), along with some differences in how it works between machine charms and Kubernetes charms. We'll talk about [Peer Relations](https://juju.is/docs/sdk/relations#heading--peer-relations) as an alternative for storing some kinds of information, and also talk about how charm authors probably should avoid recording state when they can avoid doing so. Relying on the SDK's built in caching facilities is generally the preferred direction for a charm.
+This is an explanatory doc covering how charm authors might track local state in a Juju unit. We'll cover the `ops` concept of [](ops.StoredState), along with some differences in how it works between machine charms and Kubernetes charms. We'll talk about Peer Relations as an alternative for storing some kinds of information, and also talk about how charm authors probably should avoid recording state when they can avoid doing so.
+
+<!-- UPDATE LINKS
+"Peer Relations", above
+-->
 
 ## A trivial example
 
 We'll begin by setting up a simple scenario. A charm author would like to charm up a (made up) service called `ExampleBlog`. The ideal cloud service is stateless and immutable, but `ExampleBlog` has some state: it can run in either a `production` mode or a `test` mode. 
 
-The standard way to set ExampleBlog's mode is to write either the string `test` or `production` to `/etc/example_blog/mode`, then restart the service. Leaving aside whether this is *advisable* behavior, this is how `ExampleBlog` works, and an `ExampleBlog` veteran user would expect a `ExampleBlog` charm to allow them to toggle modes by writing to that config file. (I sense a patch to upstream brewing, but let's assume, for our example, that we can't dynamically load the config.)
+The standard way to set ExampleBlog's mode is to write either the string `test` or `production` to `/etc/example_blog/mode`, then restart the service. Leaving aside whether this is *advisable* behavior, this is how `ExampleBlog` works, and an `ExampleBlog` veteran user would expect an `ExampleBlog` charm to allow them to toggle modes by writing to that config file. (I sense a patch to upstream brewing, but let's assume, for our example, that we can't dynamically load the config).
 
 Here's a simplified charm code snippet that will allow us to toggle the state of an already running instance of `ExampleBlog`.
 
 ```python
-def _on_config_changed(self, event):
+def _on_config_changed(self, event: ops.ConfigChangedEvent):
     mode = self.model.config['mode']
+    if mode not in ('production', 'test'):
+        self.unit.status = ops.BlockedStatus(f'Invalid mode: {mode!r}')
+        return
 
     with open('/etc/example_blog/mode', 'w') as mode_file:
         mode_file.write(f'{mode}\n')
 
     self._restart()
 ```
 
-Assume that `_restart` does something sensible to restart the service -- e.g., calls `service_restart` from the [systemd](https://charmhub.io/operator-libs-linux/libraries/systemd) library in a machine version of this charm.
+Assume that `_restart` does something sensible to restart the service -- for example, calls `service_restart` from the [systemd](https://charmhub.io/operator-libs-linux/libraries/systemd) library in a machine version of this charm.
 
 ## A problematic solution
 
-The problem with the code as written is that the `ExampleBlog` daemon will restart every time the config-changed hooked fires. That's definitely unwanted downtime! We might be tempted to solve the issue with `StoredState`:
+The problem with the code as written is that the `ExampleBlog` daemon will restart every time the `config-changed` hook fires. That's definitely unwanted downtime! We might be tempted to solve the issue with `StoredState`:
 
 ```python
-def __init__(self, *args):
-    super().__init__(*args)
-    self._stored.set_default(current_mode="test")
+def __init__(self, framework: ops.Framework):
+    super().__init__(framework)
+    framework.observe(self.on.config_changed, self._on_config_changed)
+    self._stored.set_default(current_mode='test')
 
 def _on_config_changed(self, event):
     mode = self.model.config['mode']
     if self._stored.current_mode == mode:
         return
+    if mode not in ('production', 'test'):
+        self.unit.status = ops.BlockedStatus(f'Invalid mode: {mode!r}')
+        return
 
     with open('/etc/example_blog/mode', 'w') as mode_file:
         mode_file.write('{}\n'.format(mode))
@@ -49,18 +60,14 @@ def _on_config_changed(self, event):
     self._stored.current_mode = mode
 ```
 
-The `StoredState` [docs](https://juju.is/docs/sdk/constructs#heading--stored-state) advise against doing this, for good reason. We have added one to the list of places that attempt to track `ExampleBlog`'s "mode". In addition to the config file on disk, the juju config, and the actual state of the running code, we've added a fourth "instance" of the state: "current_mode" in our `StoredState` object. We've doubled the number of possible states of this part of the system from 8 to 16, without increasing the number of correct states. There are still only two: all set to `test`, or all set to `production`. We have essentially halved the reliability of this part of our code.
+We advise against doing this. We have added one to the list of places that attempt to track `ExampleBlog`'s "mode". In addition to the config file on disk, the Juju config, and the actual state of the running code, we've added a fourth "instance" of the state: "current_mode" in our `StoredState` object. We've doubled the number of possible states of this part of the system from 8 to 16, without increasing the number of correct states. There are still only two: all set to `test`, or all set to `production`. We have essentially halved the reliability of this part of our code.
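The 8-to-16 arithmetic can be checked by enumeration. This is a minimal sketch that treats each place tracking the mode as an independent two-valued variable; the labels in `three` and `four` are illustrative names only, not `ops` APIs.

```python
from itertools import product

MODES = ('test', 'production')


def count_states(places):
    """Return (total, consistent) counts for independently tracked copies of the mode."""
    combos = list(product(MODES, repeat=len(places)))
    # A combination is consistent only when every place agrees on the mode.
    consistent = [combo for combo in combos if len(set(combo)) == 1]
    return len(combos), len(consistent)


# Illustrative labels for the places that track the mode.
three = ('file on disk', 'Juju config', 'running process')
four = three + ('StoredState',)

print(count_states(three))  # (8, 2)
print(count_states(four))   # (16, 2)
```

Adding a fourth copy doubles the total while the number of consistent combinations stays at two.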
 
 ## Differences in StoredState behaviour across substrates
 
 Let's say the charm is running on Kubernetes, and the container it is running in gets destroyed and recreated. This might happen due to events outside of an operator's control -- perhaps the underlying Kubernetes service rescheduled the pod, for example. In this scenario the `StoredState` will go away, and the flags will be reset.
 
 Do you see the bug in our example code? We could fix it by setting the initial value in our `StoredState` to something other than `test` or `production`. E.g., `self._stored.set_default(current_mode="unset")`. This will never match the actual intended state, and we'll thus always invoke the codepath that loads the operator's intended state after a pod restart, and write that to the new local disk.
 
-What if we are tracking some piece of information that *should* survive a pod restart?
-
-In this case, charm authors can pass `use_juju_for_storage=True` to the charm's `main` routine ([example](https://github.com/canonical/alertmanager-k8s-operator/blob/8371a1424c0a73d62d249ca085edf693c8084279/src/charm.py#L454)). This will allocate some space on the controller to store per unit data, and that data will persist through events that could kill and recreate the underlying pod. Keep in mind that this can cause trouble! In the case of `ExampleBlog`, we clearly would not want the `StoredState` record for "mode" to survive a pod restart -- the correct state is already appropriately stored in Juju's config, and stale state in the controller's storage might result in the charm skipping a necessary config write and restart cycle.
-
 ## Practical suggestions and solutions
 
 _Most of the time, charm authors should not track state in a charm._
@@ -70,10 +77,15 @@ More specifically, authors should only use `StoredState` when they are certain t
 In our example code, for instance, we might think about the fact that `config_changed` hooks, even in a busy cloud, fire with a frequency measured in seconds. It's not particularly expensive to read the contents of a small file every few seconds, and so we might implement the following, which is stateless (or at least, does not hold state in the charm):
 
 ```python
-def _on_config_changed(self, event):
+def _on_config_changed(self, event: ops.ConfigChangedEvent):
+    mode = self.model.config['mode']
+    if mode not in ('production', 'test'):
+        self.unit.status = ops.BlockedStatus(f'Invalid mode: {mode!r}')
+        return
+
     with open('/etc/example_blog/mode') as mode_file:
         prev_mode = mode_file.read().strip()
-    if self.model.config['mode'] == prev_mode:
+    if mode == prev_mode:
         return
 
     with open('/etc/example_blog/mode', 'w') as mode_file:
@@ -82,9 +94,7 @@ def _on_config_changed(self, event):
     self._restart()
 ```
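The read-compare-write step can also be pulled out into a small helper that is easy to test without Juju. This is a sketch under assumptions: `write_mode_if_changed` is a hypothetical name, and treating a missing file as "changed" is one possible choice for a freshly created machine or container.

```python
from pathlib import Path


def write_mode_if_changed(path: Path, mode: str) -> bool:
    """Write mode to path only if it differs; return True when a restart is needed."""
    try:
        prev_mode = path.read_text().strip()
    except FileNotFoundError:
        # No file yet (for example, a freshly created container): treat as changed.
        prev_mode = None
    if prev_mode == mode:
        return False
    path.write_text(f'{mode}\n')
    return True
```

A handler could then call `self._restart()` only when the helper returns `True`, with the file on disk remaining the single record of the current mode.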
 
-One common scenario where charm authors get tempted to use `StoredState`, when a no-op would be better, is to use `StoredState` to cache information from the Juju model. The Operator Framework already caches information about relations, unit and application names, etc. It reads and loads the charm's config into memory during each hook execution. Authors can simply fetch model and config information as needed, trusting that the Operator Framework is avoiding extra work where it can, and doing extra work to avoid cache coherency issues where it must. 
-
-Another temptation is to track the occurrence of certain events like [`pebble-ready`](https://juju.is/docs/sdk/events#heading--pebble-ready). This is dangerous. The emission of a `pebble-ready` event means that Pebble was up and running when the hook was invoked, but makes no guarantees about the future. Pebble may not remain running -- see the note about the Kubernetes scheduler above -- meaning your `StoredState` contains an invalid cache value which will likely lead to bugs. In cases where charm authors want to perform an action if and only if the workload container is up and running, they should guard against Pebble issues by catching `ops.pebble.ConnectionError`:
+One common scenario where charm authors get tempted to use `StoredState` is to track the occurrence of certain events like [](ops.PebbleReadyEvent). This is dangerous. The emission of a `pebble-ready` event means that Pebble was up and running when the hook was invoked, but makes no guarantees about the future. Pebble may not remain running -- see the note about the Kubernetes scheduler above -- meaning your `StoredState` contains an invalid cache value which will likely lead to bugs. In cases where charm authors want to perform an action if and only if the workload container is up and running, they should guard against Pebble issues by catching [](ops.pebble.ConnectionError):
 
 ```python
 def some_event_handler(event):
@@ -95,10 +105,16 @@ def some_event_handler(event):
         return
 ```
 
-You shouldn't use the container's `can_connect()` method for the same reason - it's a point-in-time check, and Pebble could go away between calling `can_connect()` and when the actual change is executed - ie. you've introduced a race condition.
+In the other cases where state is needed, authors ideally want to relate a charm to a database, attach storage (see Juju storage), or simply be opinionated, and hard code the single "correct" state into the charm. (Perhaps `ExampleBlog` should always be run in `production` mode when deployed as a charm?)
 
-In the other cases where state is needed, authors ideally want to relate a charm to a database, attach storage ([see Juju storage](https://juju.is/docs/sdk/storage)), or simply be opinionated, and hard code the single "correct" state into the charm. (Perhaps `ExampleBlog` should always be run in `production` mode when deployed as a charm?) 
+<!-- UPDATE LINKS
+"Juju Storage", above
+-->
 
 In the cases where it is important to share some lightweight configuration data between units of an application, charm authors should look into [peer relations](https://juju.is/docs/sdk/integration#heading--peer-integrations). And in the cases where data must be written to a container's local file system (Canonical's Kubeflow bundle, for example, must do this, because the sheer number of services means that we run into limitations on attached storage in the underlying cloud), authors should do so mindfully, with an understanding of the pitfalls involved.
 
+<!-- UPDATE LINKS
+"peer relations", above
+-->
+
 In sum: use state mindfully, with well chosen tools, only when necessary.
diff --git a/docs/howto/index.md b/docs/howto/index.md
@@ -16,6 +16,7 @@ Manage leadership changes <manage-leadership-changes>
 Manage libraries <manage-libraries>
 Manage interfaces <manage-interfaces>
 Manage secrets <manage-secrets>
+Manage stored state <manage-stored-state>
 Manage the charm version <manage-the-charm-version>
 Manage the workload version <manage-the-workload-version>
 Get started with charm testing <get-started-with-charm-testing>

diff --git a/docs/howto/manage-stored-state.md b/docs/howto/manage-stored-state.md
@@ -0,0 +1,182 @@
+(manage-stored-state)=
+# How to manage stored state
+
+> See first: [](storedstate-uses-limitations)
+
+Data stored on a charm instance will not persist beyond the current Juju event,
+because a new charm instance is created to handle each event. In general, charms
+should be stateless, but in some situations storing state is required. There are
+two approaches (outside of using a database or Juju storage): storing state in
+the charm machine or (for Kubernetes charms) container - for state that should
+have the same lifetime as the machine or container, and storing state in a Juju
+peer relation - for state that should have the same lifetime as the application.
+
+## Storing state for the lifetime of the charm container or machine
+
+Where some state is required, and the state should share the same lifetime as
+the machine or (for Kubernetes charms) container, `ops` provides
+[](ops.StoredState) where data is persisted to the `ops` unit database in the
+charm machine or container.
+
+[caution]
+Note that for Kubernetes charms, container recreation is expected: even if there
+are no errors that require the container to be recreated, the container will be
+recreated with every charm update.
+[/caution]
+
+[note]
+In Kubernetes charms that use the older 'podspec' model, rather than the sidecar
+pattern, or when the `use_juju_for_storage` option is set, this data will be
+stored in Juju instead, and will persist for the life of the application.
+Avoid using `StoredState` objects in these situations.
+[/note]
+
+A `StoredState` object is capable of persisting simple data types, such as
+integers, strings, or floats, and lists, sets, and dictionaries containing those
+types. For more complex data, serialise the data first, for example to JSON.
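One way to serialise more complex data is to round-trip it through JSON before storing it. This is a minimal sketch that uses only the standard library; `BackupInfo`, `dump_backup`, and `load_backup` are hypothetical names chosen for illustration.

```python
import dataclasses
import json


@dataclasses.dataclass
class BackupInfo:
    """Hypothetical structured value that is too complex to store directly."""

    path: str
    size_bytes: int
    tags: list


def dump_backup(info: BackupInfo) -> str:
    """Serialise the dataclass to a JSON string, which is a simple storable type."""
    return json.dumps(dataclasses.asdict(info))


def load_backup(raw: str) -> BackupInfo:
    """Rebuild the dataclass from the JSON string read back from storage."""
    return BackupInfo(**json.loads(raw))
```

In a charm, the string returned by `dump_backup` could be assigned to a stored attribute (for example `self._stored.last_backup`, a hypothetical attribute name) and passed to `load_backup` in a later event.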
+
+### Implement the feature
+
+To store data in the unit state database, in your `src/charm.py` file, add a
+`StoredState` object to your charm class -- this is typically called `_stored`.
+You then need to use `set_default` to set an initial value; for example:
+
+```python
+class MyCharm(ops.CharmBase):
+
+    _stored = ops.StoredState()
+
+    def __init__(self, framework):
+        super().__init__(framework)
+        self._stored.set_default(expensive_value=None)
+```
+
+> See more: [](ops.StoredState)
+
+Now, in any event handler, you can read or write data to the object you are
+storing, and it will be persisted across Juju events.
+
+```python
+def _on_start(self, event: ops.StartEvent):
+    if self._stored.expensive_value is None:
+        self._stored.expensive_value = self._calculate_expensive_value()
+
+def _on_install(self, event: ops.InstallEvent):
+    # We can use self._stored.expensive_value here, and it will have the value
+    # set in the start event.
+    logger.info("Current value: %s", self._stored.expensive_value)
+```
+
+> Examples: [Kubernetes-Dashboard stores core settings](https://github.com/charmed-kubernetes/kubernetes-dashboard-operator/blob/03bf0f64d943e39176c804cd796a7a9838bf13ab/src/charm.py#L42)
+
+### Test the feature
+
+> See first: {ref}`get-started-with-charm-testing`
+
+You'll want to add unit tests.
+
+For integration tests: stored state isn't a feature, it's functionality that
+enables features, so your integration tests that make use of the stored state
+will verify that it works correctly. There are no special constructs to use in
+an integration test: just trigger multiple Juju events.
+
+#### Write unit tests
+
+> See first: {ref}`write-scenario-tests-for-a-charm`
+
+Add `StoredState` objects to the `State` with any content that you want to mock
+having persisted from a previous event. For example, in your
+`tests/unit/test_charm.py` file, provide a `_stored` attribute that has an
+`expensive_value` key:
+
+```python
+def test_charm_sets_stored_state():
+    ctx = testing.Context(MyCharm)
+    state_in = testing.State()
+    state_out = ctx.run(ctx.on.start(), state_in)
+    ss = state_out.get_stored_state("_stored", owner_path="MyCharm")
+    assert ss.content["expensive_value"] == 42
+
+def test_charm_logs_stored_state():
+    ctx = testing.Context(MyCharm)
+    state_in = testing.State(stored_states={
+        testing.StoredState(
+            "_stored",
+            owner_path="MyCharm",
+            content={
+                'expensive_value': 42,
+            })
+    })
+    state_out = ctx.run(ctx.on.install(), state_in)
+    assert ctx.juju_log[0].message == "Current value: 42"
+```
+
+## Storing state for the lifetime of the application
+
+To store state for the lifetime of the application, add a peer relation and
+store the data in the relation databag.
+
+### Implement the feature
+
+#### Define a peer relation
+
+Update the `charmcraft.yaml` file to add a `peers` block, as below:
+
+```yaml
+peers:
+  charm-peer:
+    interface: my_charm_peers
+```
+
+<!-- UPDATE LINKS
+> Read more: [File `charmcraft.yaml`]()
+-->
+
+#### Set and get data from the peer relation databag
+
+In your `src/charm.py` file, set and get the data from the peer relation
+databag. For example, to store an expensive calculation:
+
+```python
+def _on_start(self, event: ops.StartEvent):
+    peer = self.model.get_relation('charm-peer')
+    peer.data[self.app]['expensive-value'] = str(self._calculate_expensive_value())
+
+def _on_stop(self, event: ops.StopEvent):
+    peer = self.model.get_relation('charm-peer')
+    logger.info('Value at stop is: %s', peer.data[self.app]['expensive-value'])
+```
+
+[caution]
+Peer relations are not available early in the charm lifecycle, so you'll need
+to wait until later events, like `start`, to store and retrieve data.
+[/caution]
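Relation databags hold string keys and string values, so non-string data has to be encoded on write and decoded on read. This is a minimal sketch with a plain `dict` standing in for the databag; `store_value` and `load_value` are hypothetical helper names, and JSON is one reasonable encoding rather than a requirement.

```python
import json


def store_value(databag, key, value):
    """Encode value as a JSON string before writing it to the databag."""
    databag[key] = json.dumps(value)


def load_value(databag, key, default=None):
    """Decode a value written by store_value, or return default if the key is absent."""
    raw = databag.get(key)
    return default if raw is None else json.loads(raw)


# A plain dict stands in for a databag such as peer.data[self.app].
bag = {}
store_value(bag, 'expensive-value', 42)
print(bag['expensive-value'])               # 42 (stored as the string '42')
print(load_value(bag, 'expensive-value'))   # 42 (decoded back to an int)
```

The `default` argument covers events that run before any unit has written the key.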
+
+
+### Test the feature
+
+> See first: {ref}`get-started-with-charm-testing`
+
+You'll want to add unit tests.
+
+For integration tests: stored state isn't a feature, it's functionality that
+enables features, so your integration tests that make use of the stored state
+will verify that it works correctly. There are no special constructs to use in
+an integration test: just trigger multiple Juju events.
+
+#### Write unit tests
+
+> See first: {ref}`write-scenario-tests-for-a-charm`
+
+In your `tests/unit/test_charm.py` file, add tests that have an initial state
+that includes a [](ops.testing.PeerRelation) object.
+
+```python
+def test_charm_sets_stored_state():
+    ctx = testing.Context(MyCharm)
+    peer = testing.PeerRelation('charm-peer')
+    state_in = testing.State(relations={peer})
+    state_out = ctx.run(ctx.on.start(), state_in)
+    rel = state_out.get_relation(peer.id)
+    assert rel.local_app_data["expensive-value"] == "42"
+```