Separating statements in wrapper and payload #316

Closed
timgdavies opened this issue Jun 1, 2020 · 5 comments

@timgdavies
Contributor

timgdavies commented Jun 1, 2020

This issue outlines a proposal for restructuring BODS statements to more clearly separate the statement meta-data from the statement payload.

Internal note: diagram sources are in this document

Motivation

Separating statements into a meta-data wrapper (id, type, date, disclosureID, publicationDetails, source and annotations) and a payload (the substantive entity, person or interests) will make it easier to communicate the BODS data model. Explanations can start from the substantive concepts, before introducing the statement wrapper required to handle change over time and aggregation of heterogeneous data sources.

What will change

New schema objects for Entity and Person will be created, and all the substantive properties from EntityStatement and PersonStatement will be moved into these respective objects. Interests are already contained in their own object nested within an array in the OwnershipOrControlStatement.

This structural change will be accompanied by updated documentation that explains:

  • Beneficial Ownership information is about linking entities and people via interests.

  • Because, when dealing with heterogeneous data, we can’t guarantee that entities and people have stable identifiers, we have to create ‘temporary’ statement identifiers that show the connections that have been asserted in a particular disclosure.

  • Consuming and producing applications may adopt different strategies to go from these connections asserted in the disclosures, to creating or updating their record of the interest connections between entities and people in their data.

Worked example (diagram)

The following is a rough sketch only. We should work on consistent presentation in documentation - including considering use of colour to represent different statement types.

Using BODS building blocks to collect and structure data

Source data may be collected through online systems or paper forms. In both cases, the forms collect information on entities, people and the interests that connect them.

An online system might assign an internal database ID to each of these, and record the connection between them using those internal database identifiers.

image

On a paper form, the connection may be evident simply as a result of these sections existing on the same form.

image

The BODS building blocks for entity, interest and person should be used to guide the data model in forms and systems.

This means that any data collected from front-end form components should map onto the underlying BODS models for these objects.

Using BODS statements to exchange data

Statements are a wrapper around the information being shared about an entity, interest or person. They provide the meta-data required for another system to access and work with this data.

A statement takes the information recorded about an entity, interest or person, and expresses this as information about X expressed by Y at time T.

image
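
As a rough illustration of the wrapping step described above, the sketch below takes an entity or person payload and adds the statement meta-data around it. The function name `wrap_in_statement` and the `PAYLOAD_KEYS` mapping are hypothetical and not part of BODS; only statementID, statementType and statementDate are covered, so the "expressed by Y" part (source, publicationDetails) is left out.

```python
import uuid
from datetime import date

# Hypothetical mapping from statement type to the payload key proposed in
# this issue ("entity" and "person"). Not part of any published BODS schema.
PAYLOAD_KEYS = {"entityStatement": "entity", "personStatement": "person"}


def wrap_in_statement(statement_type, payload, statement_date=None, statement_id=None):
    """Wrap an entity or person payload in minimal statement meta-data."""
    if statement_type not in PAYLOAD_KEYS:
        raise ValueError(f"unsupported statement type: {statement_type}")
    return {
        # Assumption: a random UUID is an acceptable statement identifier.
        "statementID": statement_id or str(uuid.uuid4()),
        "statementType": statement_type,
        "statementDate": (statement_date or date.today()).isoformat(),
        PAYLOAD_KEYS[statement_type]: payload,
    }
```

Calling `wrap_in_statement("entityStatement", {"name": "Company B"})` would return a dictionary with the three wrapper fields plus an `entity` key holding the untouched payload.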

Consuming data from a BODS statement

Most use cases for BODS data will require it to be converted from its statement form, into a graph that directly connects entities, interests and people.

This involves each application making choices about how to reconcile each statement about an entity, person or interest, to local database records about those entities, persons or interests, and how to handle statements that appear to update past data.

For example:

  • Application A may choose to only keep the latest set of statements about an entity and person, using the identifiers block to provide a primary key for each entity and person record in the database, and trusting that the data source they are using uniquely and reliably uses these identifiers.

By contrast:

  • Application B may choose to keep a copy of each statement, but mark the most recent set as ‘active’ so that queries can either look up all historical data, or look up the latest stated set of relationships. Application B may assign an internal identifier to each entity and person, implementing a matching algorithm that looks at both identifiers, and other identifying information (names, addresses and dates) to decide when two statements are about the same entity or person.

The right approach will depend on both the data sources being used, and the needs of the application.
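
To make the first strategy more concrete, here is a minimal sketch of what Application A might do, assuming the restructured model with `entity` and `person` payload objects and assuming statementDate values are ISO 8601 strings (which sort correctly as text). The function name `latest_records` is hypothetical, and the sketch simply trusts the identifiers, as Application A is described as doing.

```python
def latest_records(statements):
    """Keep only the most recently stated payload per (scheme, id) identifier."""
    records = {}
    # Process oldest first so that later statements overwrite earlier ones.
    for statement in sorted(statements, key=lambda s: s["statementDate"]):
        payload = statement.get("entity") or statement.get("person")
        if payload is None:
            continue  # e.g. an ownershipOrControlStatement
        for identifier in payload.get("identifiers", []):
            # Assumption: every identifier has both a scheme and an id.
            key = (identifier["scheme"], identifier["id"])
            records[key] = payload
    return records
```

Payloads without any identifiers are dropped by this sketch. An application following the second strategy would need a matching step instead of a plain dictionary lookup.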

Worked example (data)

Below is an example disclosure restructured into this new model through the addition of 'person' and 'entity' objects. Interests require no change.

[
    {
        "statementID": "243e8f5b-8699-4fac-9b88-21713e973951",
        "statementType": "entityStatement",
        "statementDate": "2019-05-16",
        "entity": {
            "entityType": "registeredEntity",
            "name": "Company B",
            "identifiers": [
                {
                    "scheme": "UA-EDR",
                    "id": "UA-XE-02"
                }
            ]
        }
    },
    {
        "statementID": "2ba6417f-bd77-4e8c-a4c6-6d6f10315bdb",
        "statementType": "personStatement",
        "statementDate": "2018-12-17",
        "person":{
            "personType": "knownPerson",
            "names": [
                {
                    "type": "individual",
                    "fullName": "Person 1"
                }
            ],
            "nationalities": [
                {
                    "name": "Ukrainian",
                    "code": "UA"
                }
            ],
            "birthDate": "1965-10"
        }
    },
    {
        "statementID": "4ddf22ac-1936-4d5f-acf4-4e2247679379",
        "statementType": "ownershipOrControlStatement",
        "statementDate": "2018-12-17",
        "subject": {
            "describedByEntityStatement": "243e8f5b-8699-4fac-9b88-21713e973951"
        },
        "interestedParty": {
            "describedByPersonStatement": "2ba6417f-bd77-4e8c-a4c6-6d6f10315bdb"
        },
        "interests": [
            {
                "type": "shareholding",
                "interestLevel": "indirect",
                "beneficialOwnershipOrControl": true,
                "share": {
                    "exact": 60,
                    "maximum": 60,
                    "minimum": 60
                },
                "startDate": "2017-11-01"
            }
        ]
    }
]
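
A consuming application could resolve the cross-references in a disclosure like the one above into direct links between payloads. The sketch below is illustrative only: `resolve_party` and `build_edges` are hypothetical names, and it assumes every referenced statement is present in the same list.

```python
def resolve_party(reference, by_id):
    """Follow a describedBy... reference to the payload it points at."""
    if "describedByEntityStatement" in reference:
        return by_id[reference["describedByEntityStatement"]]["entity"]
    if "describedByPersonStatement" in reference:
        return by_id[reference["describedByPersonStatement"]]["person"]
    return None  # assumption: anything else is treated as unresolvable


def build_edges(statements):
    """Return (interested party, interest, subject) triples for a disclosure."""
    by_id = {s["statementID"]: s for s in statements}
    edges = []
    for statement in statements:
        if statement["statementType"] != "ownershipOrControlStatement":
            continue
        subject = resolve_party(statement["subject"], by_id)
        party = resolve_party(statement["interestedParty"], by_id)
        for interest in statement.get("interests", []):
            edges.append((party, interest, subject))
    return edges
```

Each triple links the interested party's payload to the subject's payload via one interest, which is the graph form most use cases are expected to need.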

Backwards compatibility

Data in this updated model can be transformed into 0.2 data by a simple conversion operation, moving everything from person and entity objects up one level.
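
The conversion described above could look roughly like the following sketch. The function name `to_flat_statement` is hypothetical, and the sketch assumes that no payload property shares a name with a wrapper property (if one did, the payload value would win here).

```python
def to_flat_statement(statement):
    """Move everything inside 'entity' or 'person' up one level."""
    payload_keys = ("entity", "person")
    # Start from the wrapper fields only.
    flat = {k: v for k, v in statement.items() if k not in payload_keys}
    # Then merge the payload properties into the top level.
    for key in payload_keys:
        flat.update(statement.get(key, {}))
    return flat
```

Ownership-or-control statements pass through unchanged, since their interests are not nested under a payload key in this proposal.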

@timgdavies
Contributor Author

An additional consequence of this change may be that we can update immutability rules, so that:

  • The payload identified by any statement must be immutable, but certain fields of the wrapper can be allowed to change.

@stevenday

I like this idea and I agree with the benefits for understanding and explaining the metadata, but I wonder why you didn't move everything in an ownershipOrControlStatement under a single top-level key as well? I think the consistency of having it all under ownershipOrControl would be beneficial, i.e.:

{
    "statementID": "4ddf22ac-1936-4d5f-acf4-4e2247679379",
    "statementType": "ownershipOrControlStatement",
    "statementDate": "2018-12-17",
    "ownershipOrControl": {
      "subject": {
          "describedByEntityStatement": "243e8f5b-8699-4fac-9b88-21713e973951"
      },
      "interestedParty": {
          "describedByPersonStatement": "2ba6417f-bd77-4e8c-a4c6-6d6f10315bdb"
      },
      "interests": [
          {
              "type": "shareholding",
              "interestLevel": "indirect",
              "beneficialOwnershipOrControl": true,
              "share": {
                  "exact": 60,
                  "maximum": 60,
                  "minimum": 60
              },
              "startDate": "2017-11-01"
          }
      ]
    }
}

@timgdavies
Contributor Author

@stevenday Good question.

Conceptually, I was thinking of the 'interest' as the payload that would be stored in a database, and of the subject and interestedParty cross-references as part of the statement meta-data, which an individual application would want to resolve rather than store in its database as part of the payload.

@kd-ods
Collaborator

kd-ods commented Jan 18, 2021

An additional consequence of this change may be that we can update immutability rules, so that:

  • The payload identified by any statement must be immutable, but certain fields of the wrapper can be allowed to change.

I'm not sure what the benefit of this would be. (If there's no clear benefit, then immutability will be retained for the restructured payload-wrapper statements and there are no implications for handling change over time.)

Regarding @stevenday's suggestion about the ownership-or-control statement payload: I agree that conceptually it makes sense to have references out to the subject and interested party as part of the payload.

@kathryn-ods
Contributor

Closing as we have done this in 0.4
