Energy error codes for issues in data received by DH from SDH #506
Comments
Overview
The most significant difference between the existing CDR landscape and the upcoming energy sector activation is that data holders will deliver data sourced, essentially transparently, from secondary data holder(s), ostensibly AEMO. This method creates challenges, particularly during error conditions: without a suitable description, an ADR cannot determine whether an error is due to a system error on the Holder side or simply reflects an error received via the back channel from AEMO.
This is particularly relevant where error conditions may be transient or network path dependent, for instance a faulty record stored within the AEMO data store, a transient API gateway issue within AEMO, a non-compliance with the specification by AEMO, a network connectivity issue with AEMO internet connectivity, with AEMO MarketNet termination (which by default is an active/standby topology and is chosen on a case-by-case basis by a Retailer), or with middleware between the CDR service delivery and AEMO (i.e. a transformation proxy managed by a separate infrastructure team). The inverse is also true: a Holder may have a transient error while accessing AEMO APIs (or preparing to access them), and yet the error response for the endpoint would likely be the same. These combinations represent a situation where all parties (Recipient, Holder, Secondary Holder) are unable to disambiguate the source of encountered problems.
In addition, Data Recipients can only open incidents with Holders, which will potentially leave Holders responsible for managing an increasing number of support requests on behalf of the Secondary Holder. The result is essentially "baked in" double handling of incidents by Holders, through no fault of their own, between one regulator (ACCC) and another (AEMO).
Scenarios
The following is a non-exhaustive list of scenarios where it would benefit the ADR to understand the source of a failure, but where that source cannot currently be sufficiently communicated:
As a third party providing SaaS CDR solutions, Biza.io is quite familiar with the challenges of the Secondary Data Holder model because, in simplistic terms, our customers are accessed using a similar topology. As such, we cannot state that the above is exhaustive: error conditions and behaviours have been discovered over years of development, which is why our preferred solution is one of a global nature.
Potential Solutions
We propose one of the following solutions:
Our current preference is (1) on the basis that it provides flexibility to communicate these errors in a consistent context with existing error behaviour.
AGL has reviewed the previous comment from biza-io and concurs with the proposed Overview, Scenarios and Potential Solutions. From an AGL standpoint, the current standards implementation for SDH does not support a mechanism to identify and respond with errors that are initiated by the SDH (in this case AEMO, although this would apply equally to VEC and EME). With the current implementation, all AEMO errors will be returned as AGL errors. This makes it challenging on several fronts, namely:
All scenarios highlighted by Biza-io are concerning with regard to the points made above. In particular, Scenario 6 is quite complicated as AGL needs to put forward an intentional design for this. For example, if the NFR between the ADR and the ADH is 30 seconds, then AGL would ideally respond to the ADR within 30 seconds with an appropriate error code indicating that AEMO hasn’t responded. With the current implementation, there is no way to indicate who caused the NFR to be missed. Furthermore, AEMO does not have the same level of non-functional obligations as ADHs. If AGL were to establish a true “shared responsibility” arrangement with a third party (in this case AEMO), it would always establish a contractual arrangement with that third party to ensure SLAs for non-functional obligations are clear and agreed by both parties. This is not possible with the current CDR model and so a solution is needed.
With regard to Potential Solutions, AGL’s preference is (1) from Biza-io, i.e. “1. Introduce a new error sub-type of cds-sdh allowing for all error codes to be communicated under this namespace on a 1:1 basis”. This is a low impact solution that leverages the existing error codes and extends them to support the SDH concept.
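To make the 30 second example concrete, here is a minimal sketch of a holder that stops waiting on the secondary data holder at a deadline and tells the ADR which party was slow. Everything named here is an assumption for illustration: `fetch_with_deadline`, the `source` field and the `SDH_TIMEOUT` marker are hypothetical and not part of the standards.

```python
import concurrent.futures
import time

# Hypothetical marker for "the secondary data holder did not answer in time";
# it is not a code defined by the standards.
SDH_TIMEOUT = "SDH_TIMEOUT"


def fetch_with_deadline(call_sdh, deadline_seconds):
    """Run call_sdh in a worker thread and stop waiting after deadline_seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(call_sdh)
    try:
        data = future.result(timeout=deadline_seconds)
        return {"source": "DH", "data": data}
    except concurrent.futures.TimeoutError:
        # The holder still answers the ADR, attributing the delay to the SDH.
        return {"source": "SDH", "error": SDH_TIMEOUT}
    finally:
        # Do not block on the still-running call; let it finish in the background.
        executor.shutdown(wait=False)


def slow_sdh():
    time.sleep(0.3)  # stands in for an SDH that misses the deadline
    return "usage data"


print(fetch_with_deadline(lambda: "usage data", 1.0))
print(fetch_with_deadline(slow_sdh, 0.05))
```

In practice the deadline would be set somewhat below the NFR so the holder still has time to build and send its own response.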
It would appear that there is a clear need to convey to the ADR that an error has been returned from the Secondary Data Holder for a variety of reasons. The initial assumption of the DSB was that errors would simply be propagated, but a good case is made here that there would still be value in distinctly identifying propagated errors as coming from the Secondary Data Holder specifically. Based on review of the feedback, we would propose the addition of a new error code to the Primary Data Holder variants of the Shared Responsibility APIs (i.e. the contracts called by the ADRs). This error code would be as follows: Name: 500 - Secondary Data Holder Error The
This would provide a single identifiable error type to the ADR but would also propagate all of the underlying detail that may be needed to understand any issues that occurred downstream. It is important to note that in some scenarios a Primary Data Holder will receive an error from the Secondary Data Holder as part of a valid interaction, and that error should not be propagated. For instance, if the ADR calls the Get Service Points endpoint and the Primary Data Holder translates that into requests for three specific NMIs, one of which is invalid, the Primary Data Holder will receive an error that it should process correctly. This scenario should not result in error propagation.
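The Get Service Points example can be sketched as follows. This is only an illustration under stated assumptions: the exception classes, `get_service_points` and `fake_sdh` are hypothetical names, and "process correctly" is assumed here to mean omitting the invalid NMI from the result.

```python
class SdhInvalidNmi(Exception):
    """Hypothetical: the SDH reports that one requested NMI is invalid."""


class SdhUnavailable(Exception):
    """Hypothetical: the SDH itself failed (outage, gateway error, and so on)."""


def get_service_points(nmis, fetch_from_sdh):
    """Collect service points for each NMI, handling expected SDH errors locally."""
    service_points = []
    for nmi in nmis:
        try:
            service_points.append(fetch_from_sdh(nmi))
        except SdhInvalidNmi:
            # Expected outcome of a valid request: handle it here and
            # do not surface it to the ADR as a secondary data holder error.
            continue
        # SdhUnavailable is deliberately not caught: a genuine SDH failure
        # should reach the caller so it can be reported as such.
    return service_points


def fake_sdh(nmi):
    # Stand-in for the SDH call; "BAD" plays the role of the invalid NMI.
    if nmi == "BAD":
        raise SdhInvalidNmi(nmi)
    return {"nmi": nmi}


print(get_service_points(["A", "BAD", "B"], fake_sdh))
```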
Origin Energy concurs with the concerns raised by Biza and AGL above regarding error handling and the need for specific error codes to differentiate errors from the primary data holder from those of the secondary data holder. Origin has raised this concern previously during multiple calls, DPs and consultations such as DP 154. Considering the tight timelines, we support Option 1 from Biza's suggested solutions: “1. Introduce a new error sub-type of cds-sdh allowing for all error codes to be communicated under this namespace on a 1:1 basis”.
It's worth noting here that I think stuffing these errors into a new 500 error actually makes things harder for the ADR, not easier. For errors that are about the ADR's input, like an invalid field, the ADR now has to look in two places in their code to figure out if they messed up an input. It's much more consistent for a client to have all 400 errors come out with the same HTTP code and in the same format as usual. Their error processing code giving feedback to their user is very unlikely to care whether the error comes from the secondary or primary data holder, but the information is nonetheless useful for their logging, so that an ADR can talk directly to the secondary data holder if the error clearly came from them. There are two distinct issues here:
The 500 and new code make point 2 very obvious, in that the error came from the secondary data holder, but make point 1 a lot harder because the ADR now needs two error parsing code paths for the secondary and primary data holders. Using the same HTTP codes but a separate error code in the JSON would make things a lot easier. The ADR would then match the error code in a cds-sdh or cds-au namespace, treating them as equal for UI purposes but differently for logging. It's worth noting that this takes the code matching away from a simple string equality, but makes the most sense for maximum gain on the two points above.
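A minimal sketch of that namespace-aware matching, assuming error codes are URNs of the form `urn:au-cds:error:<namespace>:<code>`. The `cds-sdh` namespace is the one proposed in this thread rather than an adopted standard, and the function names are hypothetical.

```python
PREFIX = "urn:au-cds:error:"
SDH_NAMESPACE = "cds-sdh"  # namespace proposed in this thread, assumed here


def parse_error_code(urn):
    """Split an error URN into (namespace, code)."""
    if not urn.startswith(PREFIX):
        raise ValueError(f"not a CDS error URN: {urn}")
    namespace, _, code = urn[len(PREFIX):].partition(":")
    return namespace, code


def classify(urn):
    """Return the code used for UI handling and the source used for logging."""
    namespace, code = parse_error_code(urn)
    source = "secondary" if namespace == SDH_NAMESPACE else "primary"
    return code, source


# Both errors map to the same UI code; only the logged source differs.
print(classify("urn:au-cds:error:cds-all:Field/Invalid"))
print(classify("urn:au-cds:error:cds-sdh:Field/Invalid"))
```

The UI path only looks at the first element of the tuple, so a single error handling code path serves both holders, while the second element is kept for logs and incident routing.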
Below is the summary of the solution that was proposed, discussed and agreed by the participants during the MI call on 31st August 2022:
Summary
Key benefits
Proposal
Based on the above, the DSB recommends the following addition to the Error response structure
Additional Notes
Feedback on the above is welcome.
This change has been staged for review: ConsumerDataStandardsAustralia/standards-staging@release/1.20.0...maintenance/506
Note: The FDO has been changed to May 15th 2023 in alignment with the tranche 2 release of the Energy sector, as per feedback.
Description
This CR is being raised to consult on the need for specific error codes to deal with issues in data received by an energy data holder from the secondary data holder. The need for this change request was identified during consultation on issue #478.
Area Affected
Energy APIs
Change Proposed
This CR is currently a placeholder. The DSB will publish recommended changes to be consulted on within the Maintenance Iteration it gets prioritised for.
DSB Proposed Solution
The current DSB proposal for this issue is in this comment.