
Data normalization

Eduardo Bouças edited this page Mar 16, 2020 · 5 revisions

Source plugins are expected to add data to two specific data buckets: models and objects.

models

Products like content management systems typically have the concept of models or content types. Source plugins are expected to add information about these models to the models data bucket, as objects with the following properties:

  • fieldNames (Array): An array containing the names of the model fields, e.g. ['title', 'subtitle', 'author']
  • source (String): The name of the source plugin, as used in its package.json, e.g. sourcebit-source-contentful
  • modelName (String): The ID or machine-friendly name of the model, e.g. blog
  • modelLabel (String): The human-friendly name of the model, e.g. Blog Posts
  • projectId (String): The ID of the project within the source platform, e.g. the Contentful space ID
  • projectEnvironment (String): The environment within the source platform, e.g. the Contentful space environment

💡 For data sources that don't have the concept of a project ID or environment, these values can be set to an empty string.
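As a rough illustration, a plugin could assemble a models entry like this. It is a minimal sketch, assuming a hypothetical raw content-type shape (`RawContentType`) returned by some source API; `ModelInfo` and `toModelInfo` are illustrative names that simply mirror the list above, not types or functions provided by Sourcebit.

```typescript
// Shape of one entry in the models data bucket, mirroring the list above.
// ModelInfo is an illustrative name, not a type exported by Sourcebit.
interface ModelInfo {
  fieldNames: string[];
  source: string;
  modelName: string;
  modelLabel: string;
  projectId: string;
  projectEnvironment: string;
}

// Hypothetical shape of a content type as returned by some source API.
interface RawContentType {
  id: string;
  name: string;
  fields: { id: string }[];
}

// Builds a models bucket entry. projectId and projectEnvironment default to
// empty strings for sources that have no such concept.
function toModelInfo(
  raw: RawContentType,
  source: string,
  projectId = "",
  projectEnvironment = ""
): ModelInfo {
  return {
    fieldNames: raw.fields.map((field) => field.id),
    source,
    modelName: raw.id,
    modelLabel: raw.name,
    projectId,
    projectEnvironment,
  };
}

// Example: a blog model coming from a Contentful-like source.
const model = toModelInfo(
  {
    id: "blog",
    name: "Blog Posts",
    fields: [{ id: "title" }, { id: "subtitle" }, { id: "author" }],
  },
  "sourcebit-source-contentful",
  "1q2w3e4r",
  "master"
);
```

Defaulting the last two parameters to empty strings keeps the models bucket uniform even for sources that have no notion of projects or environments.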

objects

The objects data bucket contains all entries coming from the various data sources. Source plugins must normalize all entries before adding them to the data bucket. This normalization consists of adding a property called __metadata, containing an object with the following properties:

  • id (String): A unique identifier for the object, e.g. 123456789
  • source (String): The name of the source plugin, as used in its package.json, e.g. sourcebit-source-contentful
  • modelName (String): The ID or machine-friendly name of the model, e.g. blog
  • modelLabel (String): The human-friendly name of the model, e.g. Blog Posts
  • projectId (String): The ID of the project within the source platform, e.g. the Contentful space ID
  • projectEnvironment (String): The environment within the source platform, e.g. the Contentful space environment
  • createdAt (String): The ISO 8601 representation of the entry's creation date, e.g. 2011-10-05T14:48:00.000Z
  • updatedAt (String): The ISO 8601 representation of the entry's last update date, e.g. 2011-10-05T15:30:00.000Z
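Because every entry must carry all of these properties, a plugin might check its output before adding it to the bucket. The sketch below is a hypothetical validator (`hasValidMetadata` is not part of Sourcebit). It treats a date as valid only when it round-trips unchanged through `Date.prototype.toISOString()`, which is stricter than ISO 8601 itself but matches the format used in the examples.

```typescript
// The eight properties every __metadata object is expected to contain.
const METADATA_KEYS = [
  "id", "source", "modelName", "modelLabel",
  "projectId", "projectEnvironment", "createdAt", "updatedAt",
] as const;

// True only when the string is exactly what Date.prototype.toISOString()
// produces for that instant, e.g. "2011-10-05T14:48:00.000Z".
function isCanonicalIsoDate(value: string): boolean {
  const date = new Date(value);
  return !Number.isNaN(date.getTime()) && date.toISOString() === value;
}

// Hypothetical validator: checks that an entry has a __metadata object whose
// properties are all strings (empty strings allowed) and whose dates are
// canonical ISO 8601 strings.
function hasValidMetadata(entry: unknown): boolean {
  if (typeof entry !== "object" || entry === null) return false;
  const metadata = (entry as { __metadata?: unknown }).__metadata;
  if (typeof metadata !== "object" || metadata === null) return false;
  const record = metadata as Record<string, unknown>;
  if (!METADATA_KEYS.every((key) => typeof record[key] === "string")) return false;
  return (
    isCanonicalIsoDate(record["createdAt"] as string) &&
    isCanonicalIsoDate(record["updatedAt"] as string)
  );
}

const validEntry = {
  title: "Normalizing entries",
  __metadata: {
    id: "123456789",
    source: "sourcebit-source-contentful",
    modelName: "blog",
    modelLabel: "Blog Posts",
    projectId: "",
    projectEnvironment: "",
    createdAt: "2011-10-05T14:48:00.000Z",
    updatedAt: "2011-10-05T15:30:00.000Z",
  },
};
```

Note that a string such as `2011-10-05T14:48:00Z` is valid ISO 8601 but fails this check because it lacks milliseconds; a looser validator is a reasonable alternative if a source emits such values.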

All content fields should be placed at the root level of the entry object.

  • 🚫

    {
      "type": "blog",
      "meta": {
        "_id": "123456789",
        "created_at": "2011-10-05T14:48:00.000Z",
        "updated_at": "2011-10-05T15:30:00.000Z"
      },
      "fields": {
        "title": "Normalizing entries",
        "subtitle": "Because normal is good"
      }
    }
  • ✅

    {
      "title": "Normalizing entries",
      "subtitle": "Because normal is good",
      "__metadata": {
        "id": "123456789",
        "source": "sourcebit-source-contentful",
        "modelName": "blog",
        "modelLabel": "Blog posts",
        "projectId": "1q2w3e4r",
        "projectEnvironment": "master",
        "createdAt": "2011-10-05T14:48:00.000Z",
        "updatedAt": "2011-10-05T15:30:00.000Z"
      }
    }
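The transformation from the first shape to the second could be sketched as follows. This assumes the raw entry looks exactly like the 🚫 example (a `type` string, a `meta` block with `_id`, `created_at` and `updated_at`, and a `fields` object); those names belong to that illustrative raw format, not to any particular platform's API. `SourceContext` and `normalizeEntry` are hypothetical names, and the project details and model labels are supplied by the caller.

```typescript
// Raw entry shape from the 🚫 example above (illustrative only).
interface RawEntry {
  type: string;
  meta: { _id: string; created_at: string; updated_at: string };
  fields: Record<string, unknown>;
}

// Hypothetical per-source settings supplied by the plugin.
interface SourceContext {
  source: string;
  projectId: string;
  projectEnvironment: string;
  modelLabels: Record<string, string>; // modelName -> human-friendly label
}

interface ObjectMetadata {
  id: string;
  source: string;
  modelName: string;
  modelLabel: string;
  projectId: string;
  projectEnvironment: string;
  createdAt: string;
  updatedAt: string;
}

type NormalizedEntry = Record<string, unknown> & { __metadata: ObjectMetadata };

function normalizeEntry(raw: RawEntry, context: SourceContext): NormalizedEntry {
  return {
    // Content fields move to the root level of the entry object.
    ...raw.fields,
    __metadata: {
      id: raw.meta._id,
      source: context.source,
      modelName: raw.type,
      // Fall back to the machine-friendly name if no label is known.
      modelLabel: context.modelLabels[raw.type] ?? raw.type,
      projectId: context.projectId,
      projectEnvironment: context.projectEnvironment,
      createdAt: new Date(raw.meta.created_at).toISOString(),
      updatedAt: new Date(raw.meta.updated_at).toISOString(),
    },
  };
}

const normalized = normalizeEntry(
  {
    type: "blog",
    meta: {
      _id: "123456789",
      created_at: "2011-10-05T14:48:00.000Z",
      updated_at: "2011-10-05T15:30:00.000Z",
    },
    fields: { title: "Normalizing entries", subtitle: "Because normal is good" },
  },
  {
    source: "sourcebit-source-contentful",
    projectId: "1q2w3e4r",
    projectEnvironment: "master",
    modelLabels: { blog: "Blog posts" },
  }
);
```

Spreading the fields first and writing `__metadata` last means a raw field that happens to be called `__metadata` cannot overwrite the normalized block.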

💡 For data sources that don't have the concept of a unique ID for each entry, you're advised to auto-generate one using a package like https://www.npmjs.com/package/uuid.
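A minimal sketch of such a fallback is shown below. Instead of the uuid package, it swaps in Node's built-in `randomUUID()` from `node:crypto` (available since Node.js 14.17), which also produces random version 4 UUIDs without an extra dependency. `ensureId` is a hypothetical helper name.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: keeps the source's own ID when there is one,
// otherwise generates a random (version 4) UUID.
function ensureId(rawId: string | undefined | null): string {
  return rawId !== undefined && rawId !== null && rawId !== ""
    ? rawId
    : randomUUID();
}

const existingId = ensureId("123456789"); // keeps the source ID
const generatedId = ensureId(undefined); // a freshly generated UUID string
```

Random IDs differ on every run, so an entry fetched twice will get two different IDs; if stability across runs matters, deriving the ID from some stable property of the entry may be a better choice.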

Assets

Asset objects are subject to an additional normalization routine: they are restructured so that they always contain the following properties, in addition to any user-defined properties the source supports:

  • contentType (String): The MIME type describing the asset type
  • fileName (String): The name of the original file
  • url (String): The full asset URL

A __metadata block will still be added, but the value of the modelName property will be set to __asset.
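The asset routine might look roughly like this. It is a sketch assuming a hypothetical raw asset shape (`RawAsset`) with a `file` object holding the MIME type, file name and URL; real platforms name these differently. User-defined fields are kept alongside the three required properties, and `modelName` is fixed to `__asset`. Because some sources return protocol-relative URLs (starting with `//`), the sketch prefixes `https:` so that `url` is a full URL. The `modelLabel` value is an assumption, since this page does not specify one.

```typescript
// Hypothetical raw asset shape; actual field names depend on the source.
interface RawAsset {
  id: string;
  createdAt: string;
  updatedAt: string;
  file: { contentType: string; fileName: string; url: string };
  extraFields?: Record<string, unknown>; // user-defined properties, if supported
}

// Hypothetical per-source settings supplied by the plugin.
interface SourceContext {
  source: string;
  projectId: string;
  projectEnvironment: string;
}

// Turns a protocol-relative URL ("//host/path") into a full https URL.
function toFullUrl(url: string): string {
  return url.startsWith("//") ? `https:${url}` : url;
}

function normalizeAsset(raw: RawAsset, context: SourceContext) {
  return {
    // User-defined properties first, so the required ones always win.
    ...raw.extraFields,
    contentType: raw.file.contentType,
    fileName: raw.file.fileName,
    url: toFullUrl(raw.file.url),
    __metadata: {
      id: raw.id,
      source: context.source,
      modelName: "__asset",
      modelLabel: "Asset", // assumed label; not specified by this page
      projectId: context.projectId,
      projectEnvironment: context.projectEnvironment,
      createdAt: new Date(raw.createdAt).toISOString(),
      updatedAt: new Date(raw.updatedAt).toISOString(),
    },
  };
}

const asset = normalizeAsset(
  {
    id: "a1",
    createdAt: "2011-10-05T14:48:00.000Z",
    updatedAt: "2011-10-05T15:30:00.000Z",
    file: {
      contentType: "image/png",
      fileName: "cover.png",
      url: "//cdn.example.com/cover.png",
    },
    extraFields: { title: "Cover image" },
  },
  { source: "sourcebit-source-contentful", projectId: "1q2w3e4r", projectEnvironment: "master" }
);
```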
