Initial version of APPL STATE DB & Response Path HLD #846

Open
wants to merge 7 commits into
base: master
271 changes: 271 additions & 0 deletions doc/pins/appl_state_db_response_path_hld.md
@@ -0,0 +1,271 @@
# APPL STATE DB & Response Path HLD

## Table of Content

* [Table of Content](#table-of-content)
* [Revision](#revision)
* [Scope](#scope)
* [Definitions/Abbreviations](#definitions-abbreviations)
* [Overview](#overview)
* [Requirements](#requirements)
    + [Functional Requirements](#functional-requirements)
* [Architecture Design](#architecture-design)
* [High-Level Design](#high-level-design)
    + [Application Layer Notification Channel](#application-layer-notification-channel)
    + [Return Status Code](#return-status-code)
    + [APPL STATE DB](#appl-state-db)
    + [APPL DB Cleanup](#appl-db-cleanup)
    + [PINS P4RT Examples (with APPL DB Cleanup)](#pins-p4rt-examples--with-appl-db-cleanup-)
* [SAI API](#sai-api)
* [Configuration and management](#configuration-and-management)
* [Warmboot and Fastboot Design Impact](#warmboot-and-fastboot-design-impact)
* [Restrictions/Limitations](#restrictions-limitations)
* [Testing Requirements/Design](#testing-requirements-design)
    + [Unit Test cases](#unit-test-cases)
* [Open/Action items - if any](#open-action-items---if-any)

## Revision

<table>
<tr>
<td>Rev
</td>
<td>Rev Date
</td>
<td>Author(s)
</td>
<td>Change Description
</td>
</tr>
<tr>
<td>v0.1
</td>
<td>6/18/2021
</td>
<td>Runming Wu, Srikishen Pondicherry Shanmugam
</td>
<td>Initial version
</td>
</tr>
</table>

## Scope

This document describes the high-level design of the APPL State DB and the application response path feature.

## Definitions/Abbreviations

**PINS**: P4 Integrated Network Stack (PINS) is a project that provides a remote interface to SAI using P4.

## Overview

The current SONiC system follows an eventual consistency model to operate the switch. When an application wants to perform an action, it writes its "intent" into the APPL DB without receiving any indication of success. The lower layers of the system make a best effort to fulfill the intent. When the intent cannot be satisfied, the APPL DB and the real state are out of sync.

When PINS is deployed in a controller-based SDN network, the eventual consistency model is not sufficient:

* The SDN controller needs to know the exact switch state. For each programming request, the controller should get an acknowledgement of success or failure in a timely manner. In current SONiC, applications only write their intents to APPL DB and receive no notification of success or failure. In PINS, handling a P4 RPC request must be synchronous from the application level down to the hardware ASIC, and the RPC response must contain the real system status.
* There is no real "state" DB that represents the system state at the application layer in current SONiC:
    * APPL DB is mostly intent-based, as explained above.
    * The existing STATE DB is mainly used for resolving dependencies between different components. It does not include the detailed status of the entries programmed by the applications.
    * In synchronous mode, ASIC DB represents the hardware state. However, its schema is defined in SAI format and is not meant for applications.

In PINS, there are applications such as P4RT and telemetry that need to report the real system state to the SDN controller. Those applications cannot easily read the system state from the DBs.

This document introduces the application response path feature, through which applications receive timely notifications of the status of their programming requests. It also introduces a new DB, APPL STATE DB, that applications can query for the system state.

## Requirements

### Functional Requirements

* A notification channel that tells applications whether their requests have been fulfilled. Each result should carry a status code and, in case of failure, the failure reason.
* A new DB that represents the real system state, so that applications can read system state information from it.

Contributor: Do we support multiple applications subscribing to the same APPL STATE DB channel?

Collaborator Author: Yes. APPL STATE DB is just a regular Redis hash; there is no special channel and nothing special about it. Applications can use the Redis keyspace notification channel to subscribe to APPL STATE DB. SONiC already has a library for this purpose: SubscriberStateTable.

Collaborator Author: Forgot to mention that the SubscriberStateTable API can support multiple clients. So yes, multiple applications can subscribe/listen to the same APPL STATE DB table.

We are in the process of updating this doc to address some of the review comments from the last meeting.


## Architecture Design

![drawing](images/appl_state_db_response_path_hld_design.png)

As illustrated in the diagram above, the red components are the SONiC architectural changes introduced by this feature:

* Enable synchronous mode in orchagent and syncd. This enables the return path from syncd to orchagent.
    * Orchagent will receive the correct hardware response of a SAI call.
    * ASIC DB will represent the real hardware state instead of the intent.
    * Syncd won't throw an exception to shut down the system in case of a SAI error.
* Add a new notification channel from orchagent to the applications to report the return status.
* Add the new APPL STATE DB. Orchagent will write an APPL DB entry into APPL STATE DB if the entry has been successfully programmed. The APPL STATE DB will have exactly the same schema as APPL DB, but it will contain only the APPL DB entries that were successfully programmed.

## High-Level Design

### Application Layer Notification Channel

The application layer will use the NotificationProducer & NotificationConsumer APIs to return the status. These APIs are implemented on plain Redis channels, so the Redis DB namespace does not apply to them. Each APPL DB table will use a different response channel name. The channel name format is: APPL_DB_&lt;table_name&gt;_RESPONSE_CHANNEL. Orchagent will write the response status into the response channel whether the programming succeeds or fails. Applications can listen to the response channel to receive the return status if they need to.

A message in the NotificationProducer & NotificationConsumer APIs contains an operation, data, and a list of field-value-pairs. In the pop() API, the arguments are operation, data, and field-value-pair list. However, in the pops() API, the three arguments are represented in a KeyOpFieldsValuesTuple structure. The corresponding fields in KeyOpFieldsValuesTuple are operation, key, and values. The document will refer to the arguments by using the pop() API naming. The notification channel will have the following schema:

* data: Exact same key format in APPL DB tables. This should uniquely identify an entry in an APPL DB table. This field is the key field in KeyOpFieldsValuesTuple for pops() API.
* operation: The string representation of the return status. This field is the operation field in KeyOpFieldsValuesTuple for pops() API.
* field-value-pair list: The first field will be "err_str" which contains the error message of the status. The "err_str" field will always be the first field. If there is no error string, "err_str" field will be an empty string. The rest of the fields will be exactly the same as the original request attributes. For delete operation, the list will only include the "err_str".
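The schema above can be sketched in plain Python. This is only an illustration of the message layout, not the swss-common implementation: it does not call NotificationProducer or NotificationConsumer, and the helper names (`response_channel_name`, `build_response`, `parse_response`) as well as the table name, key, and attributes in the usage lines are hypothetical.

```python
from typing import Dict, List, Tuple

FieldValues = List[Tuple[str, str]]


def response_channel_name(table_name: str) -> str:
    """Build the per-table channel name: APPL_DB_<table_name>_RESPONSE_CHANNEL."""
    return f"APPL_DB_{table_name}_RESPONSE_CHANNEL"


def build_response(status: str, key: str, request_attrs: FieldValues,
                   err_str: str = "", is_delete: bool = False):
    """Return (operation, data, field-value list) as laid out in the schema.

    "err_str" is always the first field. For a delete request the list
    carries only "err_str"; otherwise the request attributes follow it.
    """
    fvs: FieldValues = [("err_str", err_str)]
    if not is_delete:
        fvs.extend(request_attrs)
    return status, key, fvs


def parse_response(operation: str, data: str, fvs: FieldValues) -> Dict[str, object]:
    """Split a received message into status, key, error string and attributes."""
    if not fvs or fvs[0][0] != "err_str":
        raise ValueError("the first field of a response must be 'err_str'")
    return {
        "status": operation,
        "key": data,
        "err_str": fvs[0][1],
        "attrs": dict(fvs[1:]),
    }


# Hypothetical table name, key and attributes, used only for illustration.
channel = response_channel_name("EXAMPLE_TABLE")
message = build_response("SWSS_RC_SUCCESS", "example_key", [("field", "value")])
parsed = parse_response(*message)
```

Keeping "err_str" in a fixed first position lets a receiver read the error text without scanning the list, and everything after it can be compared directly with the attributes of the original request.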

There are certain rare scenarios that applications need to be aware of:

* Different applications might write to the same APPL DB table. If so, each application needs to be able to identify the entries that it manages, because a response it receives might be intended for another application.
* One application might write to multiple APPL DB tables. In that case it will have multiple NotificationConsumers listening to multiple channels. When a response is received, the application needs to identify the NotificationConsumer or the channel it came from, so that it can determine the corresponding table.
* When an application sends multiple requests for the same key in a short time, orchagent might receive them as one aggregated request. Orchagent will then send a single notification for the aggregated entry instead of one per original request. It is recommended that the application wait for the response to the previous request before sending another request for the same key.
* Applications should wait for the response with a timeout. In the rare case that an application does not receive a response from orchagent, it can assume failure instead of waiting indefinitely.
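The last two points can be sketched as a wait loop with a deadline. In this minimal sketch a `queue.Queue` stands in for one application's NotificationConsumer stream; it is not the swss-common API, and `wait_for_response` is a hypothetical helper name. Responses for other keys (for example, ones meant for another application writing to the same table) are skipped, and `None` is returned on timeout so the caller can treat the request as failed.

```python
import queue
import time
from typing import List, Optional, Tuple

Response = Tuple[str, str, List[Tuple[str, str]]]  # (operation, data, fvs)


def wait_for_response(channel: "queue.Queue[Response]", expected_key: str,
                      timeout_s: float) -> Optional[Response]:
    """Wait up to timeout_s seconds for the response whose data equals expected_key.

    Returns the matching response, or None if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            response = channel.get(timeout=remaining)
        except queue.Empty:
            return None
        if response[1] == expected_key:
            return response
        # The response belongs to a different key, so keep waiting.


# Illustration with hypothetical keys: the first response is not ours.
stream: "queue.Queue[Response]" = queue.Queue()
stream.put(("SWSS_RC_SUCCESS", "other_key", [("err_str", "")]))
stream.put(("SWSS_RC_INVALID_PARAM", "my_key", [("err_str", "bad value")]))
result = wait_for_response(stream, "my_key", timeout_s=1.0)
```

Tracking a single deadline, rather than restarting the timeout for every received message, keeps the total wait bounded even when the channel is busy with responses for other entries.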

### Return Status Code

A list of return codes will be defined for application layer response. The string representation of the code is encoded in the "operation" field in the notification channel as mentioned before. For SAI errors, orchagent will map the SAI error into the corresponding return code.

<table>
  <tr>
   <td><strong>Return Status Code</strong></td>
   <td><strong>SAI Error</strong></td>
  </tr>
  <tr>
   <td>SWSS_RC_SUCCESS</td>
   <td>SAI_STATUS_SUCCESS</td>
  </tr>
  <tr>
   <td>SWSS_RC_INVALID_PARAM</td>
   <td>SAI_STATUS_INVALID_PARAMETER</td>
  </tr>
  <tr>
   <td>SWSS_RC_DEADLINE_EXCEEDED</td>
   <td></td>
  </tr>
  <tr>
   <td>SWSS_RC_UNAVAIL</td>
   <td></td>
  </tr>
  <tr>
   <td>SWSS_RC_NOT_FOUND</td>
   <td>SAI_STATUS_ITEM_NOT_FOUND</td>
  </tr>
  <tr>
   <td>SWSS_RC_NO_MEMORY</td>
   <td>SAI_STATUS_NO_MEMORY</td>
  </tr>
  <tr>
   <td>SWSS_RC_EXISTS</td>
   <td>SAI_STATUS_ITEM_ALREADY_EXISTS</td>
  </tr>
  <tr>
   <td>SWSS_RC_PERMISSION_DENIED</td>
   <td></td>
  </tr>
  <tr>
   <td>SWSS_RC_FULL</td>
   <td>SAI_STATUS_INSUFFICIENT_RESOURCES<br>SAI_STATUS_TABLE_FULL</td>
  </tr>
  <tr>
   <td>SWSS_RC_IN_USE</td>
   <td>SAI_STATUS_OBJECT_IN_USE</td>
  </tr>
  <tr>
   <td>SWSS_RC_INTERNAL</td>
   <td>SAI_STATUS_FAILURE</td>
  </tr>
  <tr>
   <td>SWSS_RC_UNIMPLEMENTED</td>
   <td>SAI_STATUS_NOT_SUPPORTED<br>SAI_STATUS_NOT_IMPLEMENTED</td>
  </tr>
  <tr>
   <td>SWSS_RC_UNKNOWN</td>
   <td>Other SAI errors</td>
  </tr>
</table>

Contributor: Can we use markdown format here?

Collaborator Author: Done.

Contributor: Do we have a state for "in progress"? For example, route sync is waiting for the nexthop object to be synced.

Collaborator Author: This is designed to be synchronous. The request goes from the application all the way down to the hardware, and the response comes back to the application.

In p4orch, we avoid retry. If we retry, we should not send the response. We should only send one response for each request. So an "in progress" error code does not sound synchronous. Should we send another response after it completes? If a request fails because of dependencies that are missing at the moment, p4orch will return "invalid param" and not retry.

In the P4RT use case, the controller won't send a batch request with internal dependencies, so dependencies are not an issue in P4RT requests. That's one reason p4orch does not retry like other orchs.

Follow-up comment: To be more specific, P4 programming is synchronous; what Runming is describing is the behavior of P4Orch when using the response path. For existing orchs that do retry, if applications want to know whether a request is being retried, we can extend the design to add a "status" attribute (pending/done, etc.). But that is currently not in the scope of the initial MVP.
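The mapping in the return status table of this section can be sketched as a lookup with a fallback. This is an illustration only: it works on the status names as strings rather than on the numeric SAI status values, and `map_sai_status` is a hypothetical helper name, not an orchagent function.

```python
# SAI status name -> return status code, copied from the table in this section.
SAI_TO_SWSS_RC = {
    "SAI_STATUS_SUCCESS": "SWSS_RC_SUCCESS",
    "SAI_STATUS_INVALID_PARAMETER": "SWSS_RC_INVALID_PARAM",
    "SAI_STATUS_ITEM_NOT_FOUND": "SWSS_RC_NOT_FOUND",
    "SAI_STATUS_NO_MEMORY": "SWSS_RC_NO_MEMORY",
    "SAI_STATUS_ITEM_ALREADY_EXISTS": "SWSS_RC_EXISTS",
    "SAI_STATUS_INSUFFICIENT_RESOURCES": "SWSS_RC_FULL",
    "SAI_STATUS_TABLE_FULL": "SWSS_RC_FULL",
    "SAI_STATUS_OBJECT_IN_USE": "SWSS_RC_IN_USE",
    "SAI_STATUS_FAILURE": "SWSS_RC_INTERNAL",
    "SAI_STATUS_NOT_SUPPORTED": "SWSS_RC_UNIMPLEMENTED",
    "SAI_STATUS_NOT_IMPLEMENTED": "SWSS_RC_UNIMPLEMENTED",
}


def map_sai_status(sai_status: str) -> str:
    """Map a SAI status name to a return code; any other SAI error is UNKNOWN."""
    return SAI_TO_SWSS_RC.get(sai_status, "SWSS_RC_UNKNOWN")
```

SWSS_RC_DEADLINE_EXCEEDED, SWSS_RC_UNAVAIL, and SWSS_RC_PERMISSION_DENIED have no SAI counterpart in the table, so they never appear as values in this lookup.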

### APPL STATE DB

To record the successfully programmed entries in the APPL DB, a new DB is introduced: APPL STATE DB (ID: APPL_STATE_DB):

* The APPL STATE DB will use a new redis DB namespace, since all the keys are the same as the APPL DB.
* The APPL STATE DB will use the exact same keys, same schema as the existing APPL DB.
* When an orchagent operation succeeds, orchagent will apply the same update to the APPL STATE DB as the request made in the APPL DB. When an orchagent operation fails, orchagent will not update the APPL STATE DB, so the APPL STATE DB always holds the correct system state.
* Applications can read/subscribe to the APPL STATE DB to get the system state.

In implementation, there will be no specific APIs for APPL STATE DB. This is different from the APPL DB, which uses ProducerStateTable and ConsumerStateTable as APIs. The APPL STATE DB will simply be a redis hash:

* Orchagent will directly set the redis hash once it needs to update the APPL STATE DB. This can be done by using the raw table APIs.
* Application can read the APPL STATE DB by using raw table APIs. Or it can subscribe to the APPL STATE DB tables by using the SubscriberStateTable API.
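The update rule for APPL STATE DB can be modelled with a plain dictionary. This is a sketch under stated assumptions, not orchagent code: the dictionary stands in for the Redis hashes, `apply_result` is a hypothetical helper, the operation names "SET" and "DEL" are assumed request types, and a successful "SET" is assumed to store the request attributes as the whole entry.

```python
from typing import Dict

Entry = Dict[str, str]


def apply_result(appl_state_db: Dict[str, Entry], key: str, op: str,
                 attrs: Entry, succeeded: bool) -> None:
    """Mirror a request into the modelled APPL STATE DB only if it succeeded."""
    if not succeeded:
        # Failed requests leave APPL STATE DB untouched, so it keeps
        # describing what is actually programmed.
        return
    if op == "SET":
        # Assumption: the stored entry becomes a copy of the request attributes.
        appl_state_db[key] = dict(attrs)
    elif op == "DEL":
        appl_state_db.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {op}")


# Illustration with hypothetical keys and attributes.
state_db: Dict[str, Entry] = {}
apply_result(state_db, "example_key", "SET", {"field": "v1"}, succeeded=True)
apply_result(state_db, "example_key", "SET", {"field": "v2"}, succeeded=False)
```

After these two calls the modelled APPL STATE DB still holds the first, successfully programmed value for `example_key`, while the APPL DB would hold the failed intent.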

### APPL DB Cleanup

Optionally, applications can choose to remove an intent from APPL DB if it failed to be programmed into the hardware. This can be useful for warmboot, since the device should have the same hardware programming after warmboot as before. Removing the failed entries from APPL DB prevents those entries from being re-programmed after warmboot. Since the previous intent has failed, this cleanup operation in APPL DB is only at the DB level; orchagent does not need to process it. Applications should not use the ProducerStateTable for the cleanup operation, because that would trigger notifications for orchagent to process the cleanup. Applications should modify APPL DB directly to revert the failed intent.

The following are the types of failed intent and the possible cleanup operations:

* Failed create \
Delete the new key in APPL DB to clean up.
* Failed modify \
Read APPL STATE DB for the original entry. Wipe the entry in APPL DB, and re-write it with the APPL STATE DB entry to clean up.
* Failed delete \
Read APPL STATE DB for the original entry. Write it into APPL DB to clean up.
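The three cleanup cases can be sketched with dictionaries standing in for APPL DB and APPL STATE DB. This is only a model of the DB-level revert: `cleanup_failed_intent` is a hypothetical helper, and a real application would use direct table writes instead of ProducerStateTable, as noted above. The sketch relies on the observation that after a failed create the key has no entry in APPL STATE DB, whereas after a failed modify or delete the last programmed entry is still there.

```python
from typing import Dict

Entry = Dict[str, str]


def cleanup_failed_intent(appl_db: Dict[str, Entry],
                          appl_state_db: Dict[str, Entry], key: str) -> None:
    """Revert a failed intent for key in the modelled APPL DB."""
    original = appl_state_db.get(key)
    if original is None:
        # Failed create: nothing was ever programmed, so drop the new key.
        appl_db.pop(key, None)
    else:
        # Failed modify or failed delete: replace whatever is in APPL DB
        # with a copy of the last successfully programmed entry.
        appl_db[key] = dict(original)


# Illustration with hypothetical keys: a failed modify of "route_a".
appl_db = {"route_a": {"field": "new"}}
appl_state_db = {"route_a": {"field": "old"}}
cleanup_failed_intent(appl_db, appl_state_db, "route_a")
```

Assigning a fresh copy of the APPL STATE DB entry covers both "wipe and re-write" for a failed modify and "write it back" for a failed delete, so the two cases share one branch.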

### PINS P4RT Examples (with APPL DB Cleanup)

![drawing](images/appl_state_db_response_path_hld_example_success.png)

![drawing](images/appl_state_db_response_path_hld_example_failure.png)

## SAI API

N/A

## Configuration and management

N/A

## Warmboot and Fastboot Design Impact

N/A

## Restrictions/Limitations

N/A

## Testing Requirements/Design

### Unit Test cases

The notification channel and APPL STATE DB are tested in the swss py test. The test verifies the following, in both the success case and the failure case:

* The correct schema in the notification channel.
* The correct schema in the APPL STATE DB.

## Open/Action items - if any

N/A