(WIP) Backend restructure - Integrations #1171

Open
Benaiah opened this issue Mar 13, 2018 · 9 comments
@Benaiah
Contributor

Benaiah commented Mar 13, 2018

Integrations

What are they?


Integrations hook into a number of different places in the CMS to replace different calls, most (all?) of which would normally go to a backend. Currently, the assets store (at least) requires additional work. There are currently two integration providers:

  • Algolia: wraps searching and entry retrieval for large repositories.
  • Assets store: stores binary assets in a separate data store from entry data. (Currently unclear how this interacts with the media library)

Integrations vs integration providers

Ostensibly, the integration providers are implementations of generalized APIs for a named set of integration hooks:

  • assetStore - global
  • listEntries - collection-specific
  • search - collection-specific

Each of these APIs indicates that certain methods will be available on the provider instance. There doesn't seem to be a list of these anywhere in the code, as they're implicitly required, but I've compiled one here:

  • assetStore
    • upload(file, privateUpload) -> { success, url, asset } (todo: determine the required shape of the file object)
    • retrieve(query, page, privateUpload) -> fileList
    • delete(assetID) -> OptionPromise (OptionPromise here just means that the value of the Promise doesn't appear to matter, just whether it succeeds or fails).
  • listEntries: listEntries(collection, page) -> entryList
  • search
    • search(collections, searchTerm, page) -> entryList
    • searchBy(field, collection, query) -> entryList (the field here is expected to be a string, which doesn't seem to match how it's called in src/actions/search.js/query).

There are some other methods implemented by integration providers, like getEntry in the Algolia provider, but they don't appear to be used currently.
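Since the method list above is only implied by the call sites, it can help to see it written down as data. The following is a minimal sketch, not code from the CMS: `REQUIRED_METHODS`, `missingMethods`, and `stubProvider` are hypothetical names, and the return value of the stub is a placeholder because the real `entryList` shape is not documented here.

```javascript
// Methods each hook implicitly expects, copied from the list above.
const REQUIRED_METHODS = {
  assetStore: ['upload', 'retrieve', 'delete'],
  listEntries: ['listEntries'],
  search: ['search', 'searchBy'],
};

// Hypothetical helper (not part of the CMS): returns the names of any
// methods the given hook expects that the provider does not implement.
function missingMethods(hook, provider) {
  const required = REQUIRED_METHODS[hook] || [];
  return required.filter((name) => typeof provider[name] !== 'function');
}

// A stub provider that only covers the listEntries hook.
// The resolved value is a placeholder, not the real entryList shape.
const stubProvider = {
  listEntries(collection, page) {
    return Promise.resolve({ entries: [], page });
  },
};
```

Calling `missingMethods('search', stubProvider)` reports both `search` and `searchBy` as missing, which is the kind of mismatch that currently only surfaces at call time.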

How they work currently


Configuration

The integrations are added by creating an integrations list in the config. This looks like the following (all currently supported integration providers and options are listed here):

integrations:
  - provider: algolia

    # provider-specific options
    applicationID: example # algolia API application ID
    apiKey: example # algolia API key
    indexPrefix: example-cms # prefix for index names, which are in the form "${indexPrefix}${collection}"

    # hooks to attach the algolia functions to
    hooks:
      - search
      - listEntries

    # collections to apply hooks to
    collections:
      - example-entries
      - example-pages

  - provider: assetStore

    # provider-specific options
    getSignedFormURL: /example-asset-store/ # Used to generate URLs to the asset store endpoint. This is a constant, not a getter. I'm not entirely sure what this represents.
    shouldConfirmUpload: true # should confirm the upload by PUTing `{ state: "uploaded" }` to the asset URL

    # The assetStore hook is not tied to a specific collection
    hooks:
      - assetStore

Redux state init

In reducers/integrations.js, the exported reducer updates its integrations on a CONFIG_SUCCESS action. Each integration is an object, with several special keys:

  • provider: integration key.
  • hooks: a Map that connects hook names to the provider handling them
  • collections: a list of collection names to apply hooks to, or the string * to apply to all collections (but not the global hooks table)

Other keys are stored as provider data. The final state produced by this has two main keys:

  • providers: provider data - a Map of each integration's provider to a Map of any unknown keys in the integration object.
  • hooks: a Map of hooks

If collections is not present on the integration object, then each hook name in hooks is set as a key in the state.hooks Map, with the provider name as the value.

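If collections is present on the integration object, then for each listed collection (or for every configured collection, if collections is *), each hook name in hooks is set as a key in a nested Map stored under the collection name in state.hooks, with the provider name as the value.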

Since this is implemented by reducing over the configured list of integrations, the precedence of integration hooks is determined by the order of the integrations list in the config, so re-ordering the list in the config can cause errors if the precedence is important. This makes generating a config which includes integrations unsafe unless you strictly control the order of the generated integrations list.
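To make the precedence behaviour concrete, here is a rough plain-object sketch of the reduction described in this section. It is an approximation for illustration only: the real reducer stores Immutable Maps, `buildIntegrationState` is a hypothetical name, and `otherSearch` is an invented second provider.

```javascript
// Plain-object sketch of the reduction over the integrations list.
function buildIntegrationState(integrations, allCollectionNames) {
  return integrations.reduce(
    (acc, integration) => {
      const { provider, hooks, collections, ...providerData } = integration;
      // Any keys other than provider/hooks/collections become provider data.
      acc.providers[provider] = providerData;

      if (!collections) {
        // Global hooks: hook name -> provider name.
        hooks.forEach((hook) => { acc.hooks[hook] = provider; });
        return acc;
      }

      // Collection hooks: collection name -> { hook name -> provider name }.
      const names = collections === '*' ? allCollectionNames : collections;
      names.forEach((collection) => {
        acc.hooks[collection] = acc.hooks[collection] || {};
        hooks.forEach((hook) => { acc.hooks[collection][hook] = provider; });
      });
      return acc;
    },
    { providers: {}, hooks: {} }
  );
}

const exampleConfig = [
  { provider: 'algolia', hooks: ['search', 'listEntries'], collections: ['posts'], indexPrefix: 'example-cms' },
  { provider: 'otherSearch', hooks: ['search'], collections: '*' }, // hypothetical provider
];
const exampleState = buildIntegrationState(exampleConfig, ['posts', 'pages']);
```

In this sketch `exampleState.hooks.posts.search` ends up pointing at `otherSearch` because that integration is listed last; reversing the list hands the hook back to `algolia` with no other change to the config.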

Collection names can collide with hook names, since they're both used as keys to the same Map. (e.g., a collection named listEntries would collide with the listEntries hook name). This includes the names in both the collections list in the integration object and, if any integration sets its collections to *, the collections listed in the main CMS config. This has two effects, depending on whether the integration config that sets the global hook comes before or after the integration config that sets the collection hooks for the colliding collection:

  • If the integration configuration with the global hook comes first, the provider name set as the value of that global hook will have the collection-specific hooks set directly on its string value.

  • If the integration configuring collection-specific keys for the colliding collection name comes first, the Map of collection hooks for that collection will be overwritten with the provider name of the global hook.

Since the order of the integrations is dependent on the order of their listing in the config.yml, bugs due to this can show up in previously-working configs if that list is reordered, even without any other changes made.
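One way to see when the collision described above can occur is to compute which names would be used both as a global hook key and as a collection key. The helper below is purely illustrative and hypothetical (nothing like `findHookKeyCollisions` exists in the CMS); it assumes the same config shape as the integrations list shown earlier.

```javascript
// Hypothetical helper, not part of the CMS: lists names that would be used
// both as a global hook key and as a collection key in the hooks Map.
function findHookKeyCollisions(integrations, allCollectionNames) {
  const globalHooks = new Set();
  const collectionKeys = new Set();

  integrations.forEach(({ hooks = [], collections }) => {
    if (!collections) {
      // Hooks listed without collections are stored at the top level.
      hooks.forEach((hook) => globalHooks.add(hook));
      return;
    }
    // Collection names (including those expanded from "*") share that level.
    const names = collections === '*' ? allCollectionNames : collections;
    names.forEach((name) => collectionKeys.add(name));
  });

  return [...collectionKeys].filter((name) => globalHooks.has(name));
}
```

For example, a site with a collection named `assetStore`, a global `assetStore` hook, and any integration using `collections: "*"` would be reported as colliding on `assetStore`.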

There are two ways to create hooks that apply to the whole CMS: either list hooks without collections in an integration, or set * as the integration's collections to apply collection-specific hooks to all configured collections. It's unclear what the precise roles of these different categories of hooks are.

The source of the reducer described above is as follows:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/reducers/integrations.js#L4-L29

Integration provider init

The integration providers themselves are initialized in src/integrations/index.js, in resolveIntegrations, which creates the configured integration providers when called (if the integration is included more than once). resolveIntegrations is called by getIntegrationProvider on its first call, and the result is cached (note that getIntegrationProvider is declared in an IIFE, so the integrations declaration happens immediately when the file is loaded). This caching means that integration providers are invisible to the rest of the code and cannot be reinitialized. The integrations variable here is effectively a singleton, just implemented with a closure instead of a class:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/integrations/index.js#L21-L32
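The closure-as-singleton pattern can be reproduced in isolation. In the sketch below, `resolveIntegrations` is a stand-in that builds placeholder objects rather than the real provider classes, and `resolveCalls` is a counter added only to make the caching visible.

```javascript
// Stand-in for resolveIntegrations: builds one placeholder instance per
// configured provider instead of constructing the real provider classes.
let resolveCalls = 0;
function resolveIntegrations(integrationsConfig, getToken) {
  resolveCalls += 1;
  const instances = {};
  integrationsConfig.forEach(({ provider, ...options }) => {
    instances[provider] = { provider, options, getToken };
  });
  return instances;
}

// The IIFE runs when the file loads; `integrations` lives only in this closure.
const getIntegrationProvider = (function () {
  let integrations = null;
  return (integrationsConfig, getToken, provider) => {
    if (!integrations) {
      integrations = resolveIntegrations(integrationsConfig, getToken);
    }
    return integrations[provider];
  };
})();

const firstLookup = getIntegrationProvider([{ provider: 'algolia', apiKey: 'a' }], null, 'algolia');
// A second call with a different config still returns the cached instance.
const secondLookup = getIntegrationProvider([{ provider: 'algolia', apiKey: 'b' }], null, 'algolia');
```

Here `secondLookup` is the same object as `firstLookup` and still carries the first config, which illustrates why providers cannot be reinitialized once the first call has happened.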

getIntegrationProvider is called in a few places:

Whichever of these lines runs first in a particular CMS setup determines when the integration objects are set up. For most setups this probably occurs in loadEntries, as this is called whenever a collection is displayed, which is the default view when logging in to the CMS.

Hooks

As described above, hooks are configured and stored as lists of strings. The strings must match both the predefined hook name and a method on the integration provider object.

A hook is called via the following process (this process is not wrapped by anything else, so every site where a hook may be called must call selectIntegration and getIntegrationProvider as described below):

  • selectIntegration is called with the following arguments:

    • state: the integrations list from the config
    • collection: the name of the collection (this is set to null for assetStore, the only non-collection-specific hook)
    • hook: the name of the hook

    It then returns, from the integration state in the Redux store, either hooks.<collection name>.<hook> or hooks.<hook>, depending on whether the collection name is set. The value returned from here will either be null or the name of an integrations provider (see the "Redux state init" section above for the shape of the state.integration.hooks Map).

  • If an integration exists, getIntegrationProvider is called with the following arguments:

    • integrationsConfig: the integrations list from the config
    • getToken: an async function for retrieving a token to authenticate to the backend (used only by the assetStore integration)
    • provider: name of the integrations provider

    It then returns the actual provider instance (initializing the provider instances with resolveIntegrations as necessary).

  • The provider instance is called with whatever methods can be assumed to exist on the provider instance based on the API specified in the call to selectIntegration. The methods are listed above.

An example hook call follows:

https://github.com/netlify/netlify-cms/blob/b4b584682473924556a41cae10f64f085b0f432b/src/actions/entries.js#L241-L256
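The three steps above can be condensed into a small sketch. This is a simplification under stated assumptions: the state is a plain object rather than an Immutable Map, `listEntriesFor` is a hypothetical call site, and a `providers` lookup table stands in for the call to getIntegrationProvider.

```javascript
// Plain-object stand-in for selectIntegration: returns a provider name or null.
function selectIntegration(state, collection, hook) {
  const hooks = state.hooks || {};
  const found = collection ? (hooks[collection] || {})[hook] : hooks[hook];
  return found || null;
}

// Hypothetical call site: use the integration if one is configured for this
// collection and hook, otherwise fall back to the backend.
function listEntriesFor(state, providers, backend, collection, page) {
  const providerName = selectIntegration(state, collection, 'listEntries');
  // `providers[providerName]` stands in for
  // getIntegrationProvider(integrationsConfig, getToken, providerName).
  const source = providerName ? providers[providerName] : backend;
  return source.listEntries(collection, page);
}
```

Every place that wants to honour a hook has to repeat this select, resolve, and call sequence itself, which is the duplication the section describes.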

Comparison of potential integration designs


Integrations as is

Integrations are currently a very complex API, with multiple layers of setup and configuration. Parts of the API are very general and indirected (e.g., the integration/integration provider distinction), and others are tightly coupled to specific implementations of both integration providers and backends (e.g., getToken or the AssetProxy system, both of which require explicit support from backends).

This leads to a situation where the API is simultaneously so flexible in principle that it's difficult to follow or implement, while so specific in operation that it requires indirect support across wide swaths of the codebase.

It's also inherently stateful: the list of provider instances is a singleton, and providers themselves are class instances which store information on the instance's properties.

Finally, the integrations API has sole responsibility for some concerns, meaning they cannot be implemented by backends. Search, for instance, is either done locally or through an integration - there's no ability for a backend to implement server-side search. Adding this to the backend API as well would introduce further API duplication between the backend and integration API.

TODO: expand

Integrations as backend composition

One potential approach would be to unify backends and integrations into a single API, allowing them to be combined with normal code. For instance, an Algolia integration could be defined as a function which wraps an existing backend and calls its functions, except for search, searchBy, getEntry, and listEntries.

Benefits of this could include removing getToken from the backend API (currently only used for integrations), and unifying the media library integration and backend APIs.
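As a rough illustration of the composition idea, the sketch below wraps a backend and overrides only the four functions named above. It assumes backends are plain objects of functions (object spread does not copy prototype methods from class instances), and `withAlgolia` and the `algolia` client argument are hypothetical names, not an existing API.

```javascript
// Hypothetical wrapper: returns a new backend whose search-related functions
// are served by `algolia`, with everything else taken from `backend`.
// Assumes `backend` is a plain object of functions, not a class instance.
function withAlgolia(backend, algolia) {
  return {
    ...backend,
    search: (collections, searchTerm, page) => algolia.search(collections, searchTerm, page),
    searchBy: (field, collection, query) => algolia.searchBy(field, collection, query),
    getEntry: (...args) => algolia.getEntry(...args),
    listEntries: (collection, page) => algolia.listEntries(collection, page),
  };
}
```

The CMS would then call the wrapped object exactly as it calls a backend today, without needing a separate selectIntegration step at each call site.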

TODO: expand

Integrations as middleware

Redux allows intercepting actions before they hit reducers using middleware. This is very powerful, but it does allow unrestricted access to our Redux actions, essentially making our current action structure the public integrations API.
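For reference, a middleware-based integration might look roughly like the sketch below. The `store => next => action` shape is the standard Redux middleware signature; the action types `SEARCH_REQUEST` and `SEARCH_SUCCESS` and the `searchClient` argument are hypothetical and do not correspond to the CMS's real actions. The search call is synchronous here only to keep the example short.

```javascript
// Hypothetical action types; the CMS's real action names may differ.
const SEARCH_REQUEST = 'SEARCH_REQUEST';
const SEARCH_SUCCESS = 'SEARCH_SUCCESS';

// Standard Redux middleware shape: store => next => action.
const searchIntegration = (searchClient) => (store) => (next) => (action) => {
  if (action.type !== SEARCH_REQUEST) {
    // The middleware still sees every action, even those it passes through.
    return next(action);
  }
  // Intercept: the original action never reaches the reducers.
  const results = searchClient.search(action.payload.searchTerm);
  return next({ type: SEARCH_SUCCESS, payload: results });
};
```

Because the middleware has to match on action types and payload shapes, those shapes effectively become the public contract for integration authors.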

TODO: expand

@erquhart
Contributor

Great breakdown here, love it.

Couple of thoughts:

Middleware API

The middleware option doesn't have to involve exposing raw state - we can process the middleware functions however we like. I'd expect that we'd transform the state into a shape matching our published API, allow that to travel through the middleware functions, and then transform the result from there.

The precedence problem

You make it clear that precedence matters here, as it does in almost any plugin architecture. I'm wondering if we can construct the API in such a way that a backend/integration must declare what parts of the API it handles in order to be allowed to handle those parts - e.g., Algolia, or the CMS config, declares that it handles search requests, and is therefore given the ability to handle those requests.

This allows us to statically determine handlers for each action and where overrides occur.

This could easily be an enhancement for later.
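A rough sketch of what declared handling could look like, purely as an assumption about one possible shape: each backend or integration lists the parts of the API it handles, and a resolver works out the final handler for each part along with any overrides. `resolveHandlers` and the `{ name, handles }` registration shape are hypothetical.

```javascript
// Hypothetical registration shape: each plugin declares what it handles.
// Later plugins override earlier ones, and every override is recorded.
function resolveHandlers(plugins) {
  const handlers = {};
  const overrides = [];
  plugins.forEach(({ name, handles }) => {
    handles.forEach((capability) => {
      if (handlers[capability]) {
        overrides.push({ capability, from: handlers[capability], to: name });
      }
      handlers[capability] = name;
    });
  });
  return { handlers, overrides };
}

const resolved = resolveHandlers([
  { name: 'github', handles: ['listEntries', 'search'] },
  { name: 'algolia', handles: ['search'] },
]);
```

In this example `resolved.handlers.search` is `algolia`, and `resolved.overrides` records that it replaced `github` for `search`, so the override is visible before any request is made.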

@Benaiah
Contributor Author

Benaiah commented Mar 29, 2018

@erquhart good point on middleware - if we do use that kind of design, we'd definitely want an abstraction layer.

@erquhart erquhart mentioned this issue Apr 17, 2018
10 tasks
@knpwrs

knpwrs commented Apr 18, 2018

Is there a PR open for this anywhere? Is there anything the community can do to help with this?

@Benaiah
Contributor Author

Benaiah commented Apr 18, 2018

@knpwrs currently I'm working on getting the GitLab and BitBucket backends wrapped up, as well as working on some of the preliminary refactoring that'll be required before we move to a new backend API. You can follow along with that work here: #517 (GitLab) and #525 (BitBucket).

Once that work is done, the last step to prepare for combining integrations and backends will be to move the media library integrations into backend.js, so the core code will be calling a single endpoint for all backend and integration functionality. At that point, we can replace the backend API with a compatible version that includes integrations without affecting anything above src/backends/backend.js.

As for the new backend API itself, it's still very much in the design phase. The primary issue for the backend API design is here: #1134. The best way to help out with that now is to add suggestions and critiques to that issue. As a quick intro, the core ideas of the backend restructure are as follows:

  • Backends should be stateless functions (not instances of classes) which have state and async setState(newState) passed in as arguments.
  • The behavior currently implemented by integrations can be done by composing backends (either by simply changing/wrapping the functions that are returned to the CMS, or by some more complex API such as hook-based middleware).
  • We use a lot of layers to add functionality to backends, but their implementations are coupled to current backend implementations (i.e., everything assumes that the backend is filesystem-based, to the point that there's a backend API method deleteFile which accepts a path argument). These layers can be better implemented as shared code, so backends can choose to reimplement functionality if our built-in version is incompatible.
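To illustrate the first idea in the list above, here is a minimal sketch of a stateless backend function. Everything in it is hypothetical: `authenticate`, the token format, and `createStateHolder` are invented for the example, and the real design is still being worked out in #1134.

```javascript
// Hypothetical stateless backend function: everything it needs arrives as
// arguments, and it reports new state through the async setState callback.
async function authenticate(state, setState, credentials) {
  const token = `token-for-${credentials.user}`; // placeholder for a real auth call
  await setState({ ...state, token });
  return token;
}

// Hypothetical host-side holder: the CMS, not the backend, owns the state.
function createStateHolder(initial) {
  let current = initial;
  return {
    getState: () => current,
    setState: async (next) => { current = next; },
  };
}
```

A caller would run something like `authenticate(holder.getState(), holder.setState, { user: 'example' })`, and because the function keeps nothing on an instance, wrapping or replacing it is just ordinary function composition.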

@erquhart
Contributor

@Benaiah for when you dig into this:

  • Provide a heavily condensed, high level proposal (per recent discussion)
  • Let's go docs-driven for this: the documentation would chart out how to create a custom integration/backend. I recommend staying high level to avoid getting bogged down; we can improve the final docs later. Go for speed.

@jnthnclrk

Is this released?

@erquhart
Contributor

No, this is currently in proposal stage.

@stale

stale bot commented Oct 29, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@theetrain
Contributor

Hello, is there a status update on this? I've been following #432 searching for a way to integrate NetlifyCMS with another media store such as S3 or GCP Storage. If there already exists an API (such as the one proposed in #1602 (comment)) I'd be happy to help contribute documentation and leverage Netlify Functions if a backend is needed.


6 participants