Amir Malka edited this page Feb 7, 2024 · 3 revisions

Welcome to the synchronizer wiki!

Synchronizer core and adapters

flowchart LR
    subgraph Cluster 
    deployments---client1(client)
    pods---client2(client)
    sbomsspdxv2p3s---client3(client)
    profiles---client4(client)
    client1---inclusteradapter(In-cluster adapter)
    client2---inclusteradapter
    client3---inclusteradapter
    client4---inclusteradapter
    inclusteradapter---synchronizerIC(synchronizer)
    end
    subgraph Backend
    synchronizerIC---synchronizerBE(synchronizer)
    synchronizerBE---backendadapter(backend adapter)
    backendadapter---client
    client---pulsar
    end

Event based flows

The synchronization process is a fully event-based flow. It does not rely on distributed transactions to achieve synchronization between in-cluster and backend objects.

flowchart LR
    subgraph Cluster 
        ETCD---synchronizerincluster(Synchronizer In-Cluster)
    end
    subgraph Backend
    synchronizerincluster---synchronizerBE(Synchronizer BE)---Pulsar---Ingester---Postgres
    end

There are 3 components to consider:

  • synchronizerIC (in-cluster)
    • sourcing from ETCD and synchronizerBE (backend)
    • writing to ETCD
  • synchronizerBE
    • sourcing from Pulsar and synchronizerIC
    • writing to Pulsar
  • ingester
    • sourcing from Pulsar
    • writing to Postgres

The synchronizers are connected directly through a WebSocket over the Internet. Because that link can fail, each synchronizer keeps a dead letter queue (or possibly a stack; this is still undecided) for messages that could not be delivered to the other end. Processing of these messages will be described further below.
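
As a rough illustration of the dead letter idea, the sketch below parks undelivered messages and retries them once the peer is reachable again. It assumes FIFO ordering (the queue-or-stack question is still open), and `Message`, `SendFunc`, `DeadLetterQueue` and `ErrPeerUnreachable` are hypothetical names, not types from the synchronizer code base.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPeerUnreachable is a hypothetical error for a dropped connection.
var ErrPeerUnreachable = errors.New("peer unreachable")

// Message is a hypothetical envelope exchanged between synchronizers.
type Message struct {
	ID      string
	Payload []byte
}

// SendFunc delivers one message to the peer; it stands in for the WebSocket write.
type SendFunc func(Message) error

// DeadLetterQueue keeps undelivered messages in arrival order (FIFO is an assumption).
type DeadLetterQueue struct {
	pending []Message
}

// Send tries to deliver msg. If older messages are still parked, or delivery
// fails, msg is parked behind them so that ordering is preserved.
func (q *DeadLetterQueue) Send(msg Message, send SendFunc) {
	if len(q.pending) > 0 || send(msg) != nil {
		q.pending = append(q.pending, msg)
	}
}

// Flush retries parked messages oldest first and stops at the first failure.
// It returns the number of messages delivered.
func (q *DeadLetterQueue) Flush(send SendFunc) int {
	delivered := 0
	for len(q.pending) > 0 {
		if err := send(q.pending[0]); err != nil {
			break
		}
		q.pending = q.pending[1:]
		delivered++
	}
	return delivered
}

// Len reports how many messages are waiting for redelivery.
func (q *DeadLetterQueue) Len() int { return len(q.pending) }

func main() {
	connected := false
	var received []string
	send := func(m Message) error {
		if !connected {
			return ErrPeerUnreachable
		}
		received = append(received, m.ID)
		return nil
	}

	q := &DeadLetterQueue{}
	q.Send(Message{ID: "a"}, send) // fails, parked
	q.Send(Message{ID: "b"}, send) // fails, parked
	fmt.Println("parked:", q.Len())

	connected = true // the connection is back
	fmt.Println("redelivered:", q.Flush(send), received)
}
```

FIFO is used here only because the flow expects ordering within a topic; a stack would redeliver the newest message first.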

Cluster -> backend

The sequence diagrams are simplified for clarity: in practice, every message passes from one synchronizer to the other. Once a message reaches Pulsar, the sequence is cut and a new asynchronous process is started. Message ordering within the same topic should be guaranteed. Cluster events are generated by Kubernetes client-go watches.

Added or Modified

Optimization for Modified: the flow can start directly at the step after “GetObject -> Get -> Resource”, skipping the initial round trip for checksum verification, since there is little to no chance that the backend already holds the modified object.
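
The checksum comparison that this round trip performs could look roughly like the sketch below. The page does not say which hash or serialization the synchronizer uses, so SHA-256 over the JSON encoding is an assumption, and `Checksum` and `NeedsObject` are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Checksum returns a hex SHA-256 digest of the object's JSON encoding.
// encoding/json writes map keys in sorted order, so equal objects give equal digests.
// The hash choice is an assumption, not taken from the synchronizer.
func Checksum(obj map[string]any) (string, error) {
	raw, err := json.Marshal(obj)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// NeedsObject reports whether the backend has to ask the cluster for the
// object: either nothing is stored yet, or the stored copy hashes differently.
func NeedsObject(stored map[string]any, clusterChecksum string) (bool, error) {
	if stored == nil {
		return true, nil // missing
	}
	local, err := Checksum(stored)
	if err != nil {
		return true, err
	}
	return local != clusterChecksum, nil // invalid checksum
}

func main() {
	inCluster := map[string]any{"kind": "Pod", "name": "web-0"}
	sum, err := Checksum(inCluster)
	if err != nil {
		panic(err)
	}

	// Nothing stored yet: the backend would send GetObject.
	need, _ := NeedsObject(nil, sum)
	fmt.Println("missing object, request it:", need)

	// Identical copy stored: no further messages are needed.
	need, _ = NeedsObject(map[string]any{"name": "web-0", "kind": "Pod"}, sum)
	fmt.Println("identical object, request it:", need)
}
```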

sequenceDiagram
    participant ETCD
    participant SynchronizerIC
    participant SynchronizerBE
    participant Pulsar
    participant Ingester
    participant Postgres
    ETCD->>+SynchronizerIC: Added/Modified
    SynchronizerIC->>+SynchronizerIC: Calculate checksum
    SynchronizerIC->>+Ingester: Checksum
    Ingester->>+Postgres:Select
    Postgres->>+Ingester:Resource
    Ingester->>+Ingester: Calculate checksum
    alt missing or invalid checksum
        Ingester->>+SynchronizerIC: GetObject
        SynchronizerIC->>+ETCD: Get
        ETCD->>+SynchronizerIC: Resource
        alt no patch
            SynchronizerIC->>+Ingester: Object (new)
            Ingester->>+Postgres: Insert
        else
            SynchronizerIC->>+SynchronizerIC: Calculate patch
            SynchronizerIC->>+Ingester: Patch-1
            Ingester->>+Postgres: Select
            Postgres->>+Ingester:Resource
            Ingester->>+Ingester:Apply patch
            alt patch OK
                Ingester->>+Postgres: Insert
            else
                Ingester->>+SynchronizerIC: Object (shadow)
                SynchronizerIC->>+SynchronizerIC: Update shadow
                SynchronizerIC->>+SynchronizerIC: Calculate patch
                SynchronizerIC->>+Ingester: Patch-2
                Ingester->>+Postgres:Select
                Postgres->>+Ingester:Resource
                Ingester->>+Ingester: Apply patch
                Ingester->>+Postgres:Insert
            end
        end
    end
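
The patch branch of the diagram can be sketched as follows. This is one possible reading and not the synchronizer's actual implementation: it assumes a merge-patch style diff modeled on JSON Merge Patch (RFC 7386), and it assumes that "patch OK" means the patched result matches a checksum sent with the patch. `Object`, `Diff`, `Apply` and `ApplyVerified` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"reflect"
)

// Object is a decoded JSON object.
type Object = map[string]any

// checksum hashes the JSON encoding; encoding/json sorts map keys, so it is stable.
func checksum(o Object) string {
	raw, _ := json.Marshal(o) // objects in this sketch hold only JSON-safe values
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// Diff builds a merge patch that turns from into to; nil marks a removed key.
func Diff(from, to Object) Object {
	patch := Object{}
	for k := range from {
		if _, ok := to[k]; !ok {
			patch[k] = nil
		}
	}
	for k, v := range to {
		old, exists := from[k]
		if exists && reflect.DeepEqual(old, v) {
			continue
		}
		oldMap, okOld := old.(Object)
		newMap, okNew := v.(Object)
		if okOld && okNew {
			patch[k] = Diff(oldMap, newMap) // recurse into nested objects
		} else {
			patch[k] = v
		}
	}
	return patch
}

// Apply returns a patched copy of target and leaves target untouched.
func Apply(target, patch Object) Object {
	out := Object{}
	for k, v := range target {
		out[k] = v
	}
	for k, v := range patch {
		sub, isMap := v.(Object)
		switch {
		case v == nil:
			delete(out, k)
		case isMap:
			cur, _ := out[k].(Object)
			out[k] = Apply(cur, sub)
		default:
			out[k] = v
		}
	}
	return out
}

// ApplyVerified applies patch to the stored copy and accepts the result only
// if its checksum matches the one sent alongside the patch.
func ApplyVerified(stored, patch Object, want string) (Object, bool) {
	result := Apply(stored, patch)
	return result, checksum(result) == want
}

func main() {
	current := Object{"kind": "Deployment", "spec": Object{"replicas": 3}}
	// shadow is what the cluster side believes the backend holds.
	shadow := Object{"kind": "Deployment", "spec": Object{"replicas": 1}}
	// stored is what the backend really holds; it has drifted from the shadow.
	stored := Object{"kind": "Deployment", "spec": Object{"replicas": 1, "paused": true}}

	// Patch-1 is computed against the stale shadow, so verification fails.
	if _, ok := ApplyVerified(stored, Diff(shadow, current), checksum(current)); !ok {
		shadow = stored // "Object (shadow)": the backend returns its copy
	}
	// Patch-2 is computed against the refreshed shadow.
	result, ok := ApplyVerified(stored, Diff(shadow, current), checksum(current))
	fmt.Println(ok, result)
}
```

Sending a patch instead of the full object saves bandwidth when the shadow is accurate; the fallback costs one extra round trip only when the two copies have drifted apart.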

Deleted

sequenceDiagram
    participant ETCD
    participant SynchronizerIC
    participant SynchronizerBE
    participant Pulsar
    participant Ingester
    participant Postgres
    ETCD->>+SynchronizerIC: Deleted
    SynchronizerIC->>+Ingester: Delete
    Ingester->>+Postgres:Drop 

Backend -> Cluster

Backend events are sourced from Pulsar. They might be generated by the UI, synchronizers or ingesters.

Added or Modified

sequenceDiagram
    participant ETCD
    participant SynchronizerIC
    participant SynchronizerBE
    participant Pulsar
    participant Ingester
    participant Postgres
    Pulsar->>+Ingester: Object
    Ingester->>+Postgres:Insert
    Pulsar->>+SynchronizerIC:Object
    SynchronizerIC->>+ETCD:Create/Replace

Deleted

sequenceDiagram
    participant ETCD
    participant SynchronizerIC
    participant SynchronizerBE
    participant Pulsar
    participant Ingester
    participant Postgres
    Pulsar->>+Ingester: Delete
    Ingester->>+Postgres:Drop
    Pulsar->>+SynchronizerIC:Delete
    SynchronizerIC->>+ETCD:Delete
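
Both backend-to-cluster diagrams apply the same event to two stores. A minimal sketch of that fan-out, with in-memory maps standing in for Postgres and ETCD, might look like this. `Event`, `EventType` and `Store` are hypothetical names, and the real consumers would use the Pulsar client and the Kubernetes API rather than maps.

```go
package main

import "fmt"

// EventType mirrors the three event kinds used in the diagrams.
type EventType string

const (
	Added    EventType = "Added"
	Modified EventType = "Modified"
	Deleted  EventType = "Deleted"
)

// Event is a hypothetical message read from a Pulsar topic.
type Event struct {
	Type   EventType
	Key    string // for example "namespace/name"
	Object map[string]any
}

// Store is an in-memory stand-in for Postgres or ETCD.
type Store map[string]map[string]any

// Apply creates or replaces the object on Added/Modified and removes it on Deleted.
func (s Store) Apply(ev Event) error {
	switch ev.Type {
	case Added, Modified:
		s[ev.Key] = ev.Object
	case Deleted:
		delete(s, ev.Key)
	default:
		return fmt.Errorf("unknown event type %q", ev.Type)
	}
	return nil
}

func main() {
	postgres, etcd := Store{}, Store{}
	events := []Event{
		{Type: Added, Key: "default/nginx", Object: map[string]any{"replicas": 1}},
		{Type: Modified, Key: "default/nginx", Object: map[string]any{"replicas": 3}},
		{Type: Deleted, Key: "default/nginx"},
	}
	// Each event reaches both consumers, in topic order.
	for _, ev := range events {
		for _, target := range []Store{postgres, etcd} {
			if err := target.Apply(ev); err != nil {
				fmt.Println("skipping event:", err)
			}
		}
	}
	fmt.Println("objects left:", len(postgres), len(etcd))
}
```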
    

Both directions

For bi-directional synchronization, we need to take into account the version of objects (in both locations) and consider Pulsar events as the reference.

TBD