Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relations and referential integrity in NoSQL databases #2127

Closed
bajtos opened this issue Dec 6, 2018 · 8 comments
Closed

Relations and referential integrity in NoSQL databases #2127

bajtos opened this issue Dec 6, 2018 · 8 comments
Labels
needs discussion Relations Model relations (has many, etc.) Repository Issues related to @loopback/repository package

Comments

@bajtos
Copy link
Member

bajtos commented Dec 6, 2018

Our current implementation of model relations (has-many, has-one, belongs-to) is deeply rooted in SQL and based on the assumption that the database take care of referential integrity for us.

Example 1: "Customer has many Order instances" and "Order belongs to Customer". When creating a new Order instance, we expect the database to verify that Order.customerId is matching the id value of an existing Customer record. We don't have any reliable & atomic way to do this check at LoopBack side.

Example 2: "Customer has one Credentials instance". When creating a new Credentials instance, we expect the database to verify that there are no other Credentials instances already created for the user. We don't have any reliable & atomic way to do this check at LoopBack side.

SQL databases provide FOREIGN KEY and UNIQUE constraints that work great for this flavor of relations.

The situation becomes more tricky when we try to map this approach to NoSQL databases. Many NoSQL databases do not provide FOREIGN KEY and UNIQUE constraints, this is often a constraint caused by CAP theorem.

For example, it's not possible to enforce UNIQUE constraint when the model data is stored in multiple physical machines and a network partition occurs (a node performing a write operation is not able to reach other nodes because of networking problems, and thus is cannot verify that the new value is not violating uniqueness constraint for records stored on those nodes).

I think we should rethink the way how we are modelling relations and offer different flavors optimized for different backends.

For example, instead of storing a foreign key in the target model, we can store id of related model(s) in the source model and use optimistic locking scheme to enforce the constraints

class Customer {
  // hasMany relation
  orderIds: string[];

  // hasOne relation
  credentialsId: string;
}

// Algorithm for creating a new Order

// 1. Check that the customer exists
const customer = await customerRepo.findById(customerId);

// 2. Create the order
const order = await orderRepo.create(orderData);
try {
  // 3. Add the new order to the customer
  customer.orderIds.push(order.id);
  await customerRepo.replace(customer);
} catch (err) {
  if (/* a conflict occurred, e.g. somebody deleted the Customer */) {
    // 4. Roll back on conflict
    await orderRepo.deleteById(order.id);
  }
  throw err;
}

We can even store the related models as embedded documents, this should work great for Document databases.

class Customer {
  // hasMany relation
  orders: Order[];

  // hasOne relation
  credentials: Credentials;
}

Related issues & discussions:

@strongloop/loopback-next @strongloop/loopback-maintainers thoughts?

@bajtos bajtos added discussion Relations Model relations (has many, etc.) Repository Issues related to @loopback/repository package labels Dec 6, 2018
@b-admike
Copy link
Contributor

b-admike commented Dec 7, 2018

@bajtos thank you for starting the discussion around NoSQL backend support for relations; I get the gist of your proposal and since we're not using the LB3 relation engine, it's logical to introduce different flavours of relations for NoSQL DBs.

@b-admike
Copy link
Contributor

b-admike commented Dec 7, 2018

I do like to add though that the bulk of the issues that arose with MongoDB and relations in LB4 have to do with the strictObjectIdCoercion flag and how the connector treats ID values. Maybe it's worth it to test it out with cloudant and see the behaviour there as well.

@raymondfeng
Copy link
Contributor

For example, instead of storing a foreign key in the target model, we can store id of related model(s) in the source model and use optimistic locking scheme to enforce the constraints

The referencesMany and referencesOne relations store FKs with the source model. For example:

Customer
  - publicProfileId ( a customer has a public profile)

Customer
  - emailIds ( a customer has multiple emails)

@bajtos
Copy link
Member Author

bajtos commented Dec 13, 2018

See my #1718 (comment). I think we need to look at the bigger picture first.

@bajtos
Copy link
Member Author

bajtos commented Feb 4, 2019

Based on @raymondfeng's comment above and the discussion we had elsewhere, I am proposing the following:

  • Treat 1-N relation as a concept that requires different implementation (schema design) depending on the database used. Explain this in our documentation for relations.

  • Make it very clear in our documentation that hasMany and belongsTo works best with SQL databases that are able to enforce referential integrity. When these relations are used with NoSQL, referential integrity is not enforced and we end up with a "weak relation"

  • In the docs for hasMany and belongsTo, explain NoSQL users that hasMany/belongsTo relations are not suited for NoSQL databases and that we are working on a better solution.

  • Create a new section (or even a new page) in our docs for relations and explain users what relation type to choose depending on which database they use.

    For example: if you are building a 1-N relation, if you have SQL - use HasMany, if you have Cloudant - use ReferencesMany, if you have MongoDB - use embedsMany, etc.

    We should verify that our recommended solution is actually a good one that works well for the target database. This will most likely require research on our side, as we don't have deep knowledge of different NoSQL databases.

  • Create a new spike user story for each NoSQL database we have an official connector for. In this spike, we will research what tools are offered for ensuring referential integrity, come up with a proposal on how LoopBack applications should implement 1-N (hasMany), N-1 (belongsTo) and 1-1 (hasOne/belongsTo) relations. For example, we can recommend referencesMany or embedsMany, but also find out that a completely new relation type is needed.

  • Based on the outcome of these spikes, create user stories to implement the missing relation types identified as needed for the databases we support. These stories should cover both the implementation and documentation updates. For many (if not all) relation types, a spike story to figure out implementation details may be needed first.

I am going to convert this GH issue into an Epic.

@bajtos bajtos added the epic label Feb 4, 2019
@bajtos
Copy link
Member Author

bajtos commented Feb 4, 2019

Converting back into issue in favor of a newly created epic #2331

@bajtos bajtos removed the epic label Feb 4, 2019
@raymondfeng
Copy link
Contributor

Based on @raymondfeng's comment above and the discussion we had elsewhere, I am proposing the following:

Great summary. +1.

@bajtos
Copy link
Member Author

bajtos commented Feb 5, 2019

  • Treat 1-N relation as a concept that requires different implementation (schema design) depending on the database used. Explain this in our documentation for relations.
  • Make it very clear in our documentation that hasMany and belongsTo works best with SQL databases that are able to enforce referential integrity. When these relations are used with NoSQL, referential integrity is not enforced and we end up with a "weak relation"
  • In the docs for hasMany and belongsTo, explain NoSQL users that hasMany/belongsTo relations are not suited for NoSQL databases and that we are working on a better solution.

Moved these steps into a new issue: Document hasMany/belongsTo/hasOne limitations for NoSQL databases #2340

Create a new section (or even a new page) in our docs for relations and explain users what relation type to choose depending on which database they use.

See #2341

Create a new spike user story for each NoSQL database we have an official connector for. In this spike, we will research what tools are offered for ensuring referential integrity, come up with a proposal on how LoopBack applications should implement 1-N (hasMany), N-1 (belongsTo) and 1-1 (hasOne/belongsTo) relations. For example, we can recommend referencesMany or embedsMany, but also find out that a completely new relation type is needed.

Created the following spike stories:

I feel the discussion is over and since we have a list of small actionable tasks to follow up, I am closing this GH issue as done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs discussion Relations Model relations (has many, etc.) Repository Issues related to @loopback/repository package
Projects
None yet
Development

No branches or pull requests

4 participants