feat(system-service-deployer): introduce new system service deployment system [fixes NET-487] #1623

Merged
merged 27 commits into from
Jun 20, 2023

@kmd-fl kmd-fl commented Jun 5, 2023

  1. New crate system-service-deployer. Installs all system services.
  2. Move decider and aqua-ipfs config to the node's config. Added default values for easier running.
  3. Refactor a few other small things. This probably belonged in a separate commit/PR, sorry.

ATM the PR still contains some leftovers from testing (like crates with a "-test" suffix) that I haven't removed yet, but I will.

Some description of the PR.


Service Configuration

Service configuration for aqua-ipfs and decider is moved to the node's config:

[system_services]
enable = [
    "aqua-ipfs",
    "registry"
]

[system_services.decider]
decider_period_sec = 1000
worker_period_sec = 123

[system_services.aqua_ipfs]
local_api_multiaddr = "/ip4/127.0.0.1/tcp/5001"

UPD: added enabled services

We may additionally list the system services we want to be deployed. By default, we enable ALL system services.

Note that, ATM, newly disabled services aren't removed yet.
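
The "enable everything by default" rule could be sketched as below. This is only an illustration: the `SystemService` enum, `ALL_SERVICES`, and `enabled_services` are hypothetical names, and treating "decider" and "trust-graph" as valid entries of the `enable` list is an assumption based on the services listed in this description.

```rust
use std::collections::BTreeSet;

/// The system services named in this PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum SystemService {
    AquaIpfs,
    Registry,
    TrustGraph,
    Decider,
}

const ALL_SERVICES: [SystemService; 4] = [
    SystemService::AquaIpfs,
    SystemService::Registry,
    SystemService::TrustGraph,
    SystemService::Decider,
];

/// Hypothetical helper: `None` models a config file without an `enable` list,
/// in which case every system service is enabled.
fn enabled_services(enable: Option<&[SystemService]>) -> BTreeSet<SystemService> {
    match enable {
        Some(list) => list.iter().copied().collect(),
        None => ALL_SERVICES.iter().copied().collect(),
    }
}
```

With the config example above (`"aqua-ipfs"` and `"registry"` listed), this would yield a set of two services; with no list at all, all four.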

The defaults are the values we already use in our network, so right now we don't need
to change the config in nox-distro.

However, configuration for trust-graph (fluence certificates) is distributed within the crate, since
we decided that this configuration is part of the trust-graph and not of the node.

The old ENVs are supported and override the configuration file.
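
A minimal sketch of that precedence (ENV over config file over built-in default). Both helpers are hypothetical and not the node's actual code; the names of the ENV variables are not given in this description, so the caller has to supply them.

```rust
/// Hypothetical resolution order: environment value first, then the
/// config-file value, then the built-in default.
fn resolve<T>(env: Option<T>, file: Option<T>, default: T) -> T {
    env.or(file).unwrap_or(default)
}

/// Reads and parses an environment variable. Returns `None` if the variable
/// is unset or does not parse as a number.
fn env_u64(name: &str) -> Option<u64> {
    std::env::var(name).ok()?.parse().ok()
}
```

Whether an unparsable ENV value should silently fall back to the file value, as it does here, or abort startup is a design choice this sketch does not settle.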

System Service Distribution

There's no unified approach for this atm since I don't know how to provide a unified library for it.

I came up with the following approach. These structures are now created in the system service deployer:

// For describing services
struct ServiceDistro {
    modules: HashMap<&'static str, &'static [u8]>,
    config: &'static [u8],
    name: &'static str, // is used as alias
}

// For describing spells
struct SpellDistro {
    name: &'static str, // is used as alias
    air: &'static str, // air script of the spell
    kv: HashMap<&'static str, JValue>, // initial KV of the spell
    trigger_config: TriggerConfig,
}

The situation is slightly more difficult for spells, since we need to somehow provide initial values from the node's config
in a format the spell understands.

Atm for decider, we provide a structure with the config to the decider distro crate; the crate takes the values
and returns us SpellDistro with spell-compatible KV init data.
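
Roughly, the conversion done by the decider distro crate could look like the sketch below. It is an assumption-heavy illustration: `DeciderConfig` only mirrors the two fields from the config example, `decider_init_kv` is a hypothetical name, the KV keys are guesses, and values are plain strings here to stay std-only, while the real `SpellDistro` uses `JValue`.

```rust
use std::collections::HashMap;

/// Mirrors the `[system_services.decider]` section from the config example.
struct DeciderConfig {
    decider_period_sec: u32,
    worker_period_sec: u32,
}

/// Hypothetical conversion of the node-side config into initial spell KV data.
/// Which keys the decider actually reads is not stated in this description.
fn decider_init_kv(config: &DeciderConfig) -> HashMap<&'static str, String> {
    HashMap::from([
        ("decider_period_sec", config.decider_period_sec.to_string()),
        ("worker_period_sec", config.worker_period_sec.to_string()),
    ])
}
```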

I also wanted to create a PackageDistro which would unite spells and services of one package (like with decider),
but it doesn't unify properly because of the need to initialize aqua-ipfs and trust-graph.

Running deployer

System services deployment happens after running the main node loop. This is required for subscribing
system spells.

There's no specific reason for it. Maybe we should move it into the main node loop's initialization phase,
where all the node's subsystems are started. This would let us stop the node easily if the system service deployer
fails.

ATM, if the deployer fails to deploy everything on the first try, it doesn't retry and does nothing further.

I will change it in the following PR.

Deployment process

Atm we need to install 3 stand-alone services and 1 spell with an aux service:

  1. aqua-ipfs, requires initialization with the local and external API multiaddrs of an IPFS node, provided in the node config; note that these values are barely used in our spells, so maybe we can remove them
  2. registry
  3. trust-graph, requires initialization with the Fluence certificates distributed with the service
  4. decider and its connector, require a load of initial values for the spell provided in the node config; the connector itself doesn't need any initialization

Deployment of every service and spell happens similarly:

  1. Detect if we need to install/update the service
  2. Remove old service/spell
  3. Install/Update
  4. Initialize if needed

In the code, steps 1 and 2 are combined into a single function: deploy_system_service for services and deploy_system_spell for spells.
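
The four steps above can be sketched as one generic flow. This is not the PR's code: the `Deployable` trait, `Status`, `deploy`, and the in-memory `FakeService` are hypothetical stand-ins, and running initialization only after a fresh install is an assumption of the sketch.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Absent,
    Outdated,
    UpToDate,
}

/// Hypothetical abstraction over "a service or spell that can be deployed".
trait Deployable {
    fn status(&self) -> Status;
    fn remove_old(&mut self) -> Result<(), String>;
    fn install(&mut self) -> Result<(), String>;
    fn initialize(&mut self) -> Result<(), String>;
}

/// Runs detect -> remove -> install -> initialize and returns the steps taken.
fn deploy(target: &mut dyn Deployable) -> Result<Vec<&'static str>, String> {
    let mut steps = vec!["detect"];
    match target.status() {
        Status::UpToDate => return Ok(steps),
        Status::Outdated => {
            // The removal result is deliberately ignored: the description says
            // installation currently proceeds however the removal ended.
            let _ = target.remove_old();
            steps.push("remove");
        }
        Status::Absent => {}
    }
    target.install()?;
    steps.push("install");
    target.initialize()?;
    steps.push("initialize");
    Ok(steps)
}

/// In-memory stand-in used only to exercise the flow.
struct FakeService {
    status: Status,
    remove_fails: bool,
}

impl Deployable for FakeService {
    fn status(&self) -> Status {
        self.status
    }
    fn remove_old(&mut self) -> Result<(), String> {
        if self.remove_fails {
            Err("remove failed".to_string())
        } else {
            Ok(())
        }
    }
    fn install(&mut self) -> Result<(), String> {
        self.status = Status::UpToDate;
        Ok(())
    }
    fn initialize(&mut self) -> Result<(), String> {
        Ok(())
    }
}
```

Deploying an outdated `FakeService` whose removal fails still goes through install and initialize, which matches the current behaviour described in this PR.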

Find existing services and spells

The rule is simple: two services/spells are the same if they share the same alias.

For services, if the new blueprint differs from the old one, we install a new service.

For spells, we don't compare blueprints, only scripts and trigger configs. If the script or the config
is different, we update the spell.
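
A small sketch of these comparison rules. All names are hypothetical; blueprints, scripts, and trigger configs are modelled as plain strings, and the `Option` argument stands for the result of looking up an existing instance by alias.

```rust
#[derive(Debug, PartialEq)]
enum Action {
    Install,
    Update,
    Keep,
}

/// `installed_blueprint` is the blueprint of the service currently holding
/// the alias, if there is one.
fn service_action(installed_blueprint: Option<&str>, new_blueprint: &str) -> Action {
    match installed_blueprint {
        None => Action::Install,
        Some(old) if old != new_blueprint => Action::Update,
        Some(_) => Action::Keep,
    }
}

/// Only the parts of a spell that are compared: script and trigger config.
#[derive(Debug, PartialEq)]
struct SpellVersion {
    script: String,
    trigger_config: String,
}

/// Spells are updated when either the script or the trigger config changed.
fn spell_action(installed: Option<&SpellVersion>, new: &SpellVersion) -> Action {
    match installed {
        None => Action::Install,
        Some(old) if old != new => Action::Update,
        Some(_) => Action::Keep,
    }
}
```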

Remove old service/spell

If we find an existing service/spell and detect that it needs an update, we first remove the old instance.

For services, we simply try to remove the service.

For spells, first, we try to unsubscribe them from the triggers. We use the same function as for Spell.remove. If removing fails,
we try just to unsubscribe the spell and update its trigger config to an empty trigger config to avoid resubscribing on restart.

ATM we install the new service/spell regardless of how the removal ended.

I think we will change it in the following PRs.
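
One way to read the spell removal fallback described above, as a sketch. `FakeHost` and its methods are hypothetical stand-ins for the node's spell machinery; the order (full removal first, then unsubscribe plus an empty trigger config) is my interpretation of the paragraph, not a copy of the implementation.

```rust
#[derive(Debug, PartialEq)]
enum Removal {
    /// The spell was removed completely.
    Removed,
    /// Removal failed, but the spell is unsubscribed and has an empty trigger config.
    Disabled,
    /// Neither removal nor the fallback worked.
    Failed,
}

/// In-memory stand-in that records which operations were attempted.
struct FakeHost {
    remove_ok: bool,
    calls: Vec<&'static str>,
}

impl FakeHost {
    fn remove_spell(&mut self) -> Result<(), String> {
        self.calls.push("remove");
        if self.remove_ok {
            Ok(())
        } else {
            Err("remove failed".to_string())
        }
    }
    fn unsubscribe(&mut self) -> Result<(), String> {
        self.calls.push("unsubscribe");
        Ok(())
    }
    fn set_empty_trigger_config(&mut self) -> Result<(), String> {
        self.calls.push("set_empty_trigger_config");
        Ok(())
    }
}

fn remove_old_spell(host: &mut FakeHost) -> Removal {
    if host.remove_spell().is_ok() {
        return Removal::Removed;
    }
    // Fallback: make sure the old spell is not resubscribed on restart.
    let unsubscribed = host.unsubscribe().is_ok();
    let cleared = host.set_empty_trigger_config().is_ok();
    if unsubscribed && cleared {
        Removal::Disabled
    } else {
        Removal::Failed
    }
}
```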

Service/Spell Deployment

Service installation doesn't require much:

  1. Add modules of the service to the module repo to get a blueprint
  2. Create a service
  3. Add an alias

For spells:

  1. Create a new spell (using the same function as Spell.install)
  2. Add an alias

System Service Owner

The owner of system services (aka controller) is the node itself: services are controlled by HOST_PEER_ID.
Management PeerId is used only for assigning aliases.

This is done this way because worker_id and owner_id in the current node implementation are considered to be the same entity.

It would be nice to be able to control the system services with Management PeerId.

Initialization

Initialization happens after installation. It's not a unified process; it requires a manual
implementation per service. For example, aqua-ipfs requires setting the API multiaddrs by calling two functions,
set_local_api_multiaddr and set_external_api_multiaddr; trust-graph, on the other hand,
wants us to call set_root with the address of a root node and then call insert_cert for
every certificate in the provided array.
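
To make the difference concrete, here is a sketch that only builds the list of calls each service would need. The function names come from the paragraph above; the `Call` struct, the builder functions, and the single-string argument lists are assumptions, since the real signatures aren't given in this description.

```rust
/// A planned service call: which service, which function, which arguments.
#[derive(Debug, PartialEq)]
struct Call {
    service: &'static str,
    function: &'static str,
    args: Vec<String>,
}

/// aqua-ipfs: two calls, one per API multiaddr.
fn aqua_ipfs_init(local_api: &str, external_api: &str) -> Vec<Call> {
    vec![
        Call {
            service: "aqua-ipfs",
            function: "set_local_api_multiaddr",
            args: vec![local_api.to_string()],
        },
        Call {
            service: "aqua-ipfs",
            function: "set_external_api_multiaddr",
            args: vec![external_api.to_string()],
        },
    ]
}

/// trust-graph: one `set_root` call, then one `insert_cert` call per certificate.
fn trust_graph_init(root: &str, certs: &[&str]) -> Vec<Call> {
    let mut calls = vec![Call {
        service: "trust-graph",
        function: "set_root",
        args: vec![root.to_string()],
    }];
    calls.extend(certs.iter().map(|cert| Call {
        service: "trust-graph",
        function: "insert_cert",
        args: vec![(*cert).to_string()],
    }));
    calls
}
```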

Previously, service initialization was implemented via air scripts which were provided by the services.
We removed this to simplify the node.

One approach to unifying this initialization step is to ask the services to provide
setup or init functions, so the system deployer could do something like:

fn initialize_service(&self, service: ServiceDistro, service_config: Option<ServiceSpecificConfig>) {
    // The distro turns the node-side config into the JSON payload for its init function
    let init_data_json = service.init_data(service_config);
    self.call_service(service.name, service.init_function_name, init_data_json);
    ...
}

Cons:

  • Requires rewriting the existing services
  • We should probably avoid service initialization altogether and leave this possibility to spells only

Service Calls

During the deployment process, we use

const SYSTEM_SERVICE_DEPLOYER_TTL: u64 = 60_000;

as TTL for the service calls.

So, ATM, if a service call takes longer than this TTL, the whole initialization fails. Good to know, right?
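
The consequence can be illustrated with a tiny sketch. It assumes the TTL is in milliseconds (the unit isn't stated here), and `within_ttl` and `run_init_calls` are hypothetical helpers that work on pre-measured call durations instead of real service calls.

```rust
use std::time::Duration;

const SYSTEM_SERVICE_DEPLOYER_TTL: u64 = 60_000;

/// Assumption: the TTL is expressed in milliseconds.
fn within_ttl(call_duration: Duration) -> bool {
    call_duration <= Duration::from_millis(SYSTEM_SERVICE_DEPLOYER_TTL)
}

/// Returns the number of completed calls, or an error naming the first call
/// that exceeded the TTL. One slow call aborts everything after it.
fn run_init_calls(calls: &[(&str, Duration)]) -> Result<usize, String> {
    for (name, duration) in calls {
        if !within_ttl(*duration) {
            return Err(format!("{} exceeded the TTL, initialization aborted", name));
        }
    }
    Ok(calls.len())
}
```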

Questions

  1. Do we need to stop the node if the system deployer fails?
    • Yes, we do.
  2. Do we want to allow service initialization? Do we want to leave this option to spells only (via init data or KV)?
    • Yes, we want to allow system service initialization.
  3. Do we want to assign the HOST_PEER_ID as worker_id for the system services? Can't assign Management PeerId.
    • For now, we will use HOST_PEER_ID as an owner of all system services and spells.
  4. Do we want to remove old services? Do we want to remove old spells?
    • Yes, we want to remove old services. We want to try to update old spells.
  5. How do we update a spell without erasing the old state? How do we check whether an update is a breaking change for the old state?
    • We will try to implement it in the next PRs.
  6. Do we want to be able to re-initialize or re-install services/spells when the system service config, which is part of the node's config, changes? Some services cannot be re-initialized, only reinstalled (like aqua-ipfs).
    • We want to try where applicable, and to re-initialize the spells' KV on each run.

@kmd-fl kmd-fl requested review from gurinderu, folex and justprosh June 5, 2023 17:09

@folex folex left a comment

Well done!

folex commented Jun 6, 2023

[system_services_config.decider]

Maybe remove the _config part? Just [system_services.decider] would look nicer as a config parameter.
