Better replication techniques for Fedora / Drupal synchronization. #121

Closed
daniel-dgi opened this issue Nov 26, 2015 · 1 comment

Comments

@daniel-dgi
Contributor

The current sync machinery will inevitably yield inconsistencies under load. In a recent committers call (Nov 25th, 2015), a handful of techniques for better replication were discussed.

@nigelgbanks mentioned Bitcoin's block chain algorithm. Another option is per-resource version vectors. There are other options as well, such as hash histories (which would lead to a simplistic block chain algorithm).
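
To make the version vector option concrete, here is a minimal sketch of how a per-resource vector could detect concurrent edits. It is an illustration only: the writer names `drupal` and `fedora` and the functions `increment`, `compare`, and `merge` are hypothetical, not part of any existing code in this project.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "equal"
    BEFORE = "before"          # a happened before b
    AFTER = "after"            # a happened after b
    CONCURRENT = "concurrent"  # neither dominates: a conflict to resolve


def increment(vector, node):
    """Return a copy of the vector with this node's counter bumped by one."""
    updated = dict(vector)
    updated[node] = updated.get(node, 0) + 1
    return updated


def compare(a, b):
    """Compare two version vectors (dicts of node name -> counter)."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return Order.CONCURRENT
    if a_ahead:
        return Order.AFTER
    if b_ahead:
        return Order.BEFORE
    return Order.EQUAL


def merge(a, b):
    """Element-wise maximum, applied once a conflict has been resolved."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Hypothetical writers "drupal" and "fedora" both edit the same resource.
base = increment({}, "drupal")
from_drupal = increment(base, "drupal")
from_fedora = increment(base, "fedora")
```

Here `compare(from_drupal, from_fedora)` reports a concurrent edit, which is the case the current sync machinery cannot see; how to resolve it is a separate policy decision.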

Please use this issue to discuss pros/cons of various approaches, and how we can best handle the multi-master setup that our project requirements dictate.

@DiegoPino
Contributor

It depends (as always) on which system our master will be.
We are defining Drupal as the "main" originator (not exclusive, since this is multi-master), and on the Drupal side we are tied to our ability to lock/wait on concurrent DB writes, holding the lock until:

  • we get our updated info back from Fedora4, which means:
    • the transaction ends,
    • the ActiveMQ messages get written,
    • they are consumed, and
    • Drupal is updated back.

On the other hand, we don't control blocking on Fedora4, but we do control it on Drupal, so basically our whole chain is based on what happens after the message is in ActiveMQ and we are writing back to Drupal (at least that is how I see it).

If this is so, then ActiveMQ is really our master. Some basic useful info here; we still have to research which values we actually have (for example, is the transaction id in the message?).

So, questions/assertions:

  • We can't block on Fedora4 (that is done internally); we can only block on Drupal, right?
  • We can ensure our messages get delivered to Fedora4 in a certain order. Do we need to make sure they are handled by Camel as a chained pipe? Example (creation):
  1. Create rdf/ldp base resource
  2. Create container for binary
  3. Create binary resource in container
  4. some other operation with the binary resource (scaling)
  5. Create derivative container for second binary
  6. Create binary resource in second container
  7. etc.
  • If using transactions, do we have to wait for activeMQ messages after each previous step?
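
As a rough illustration of the ordering question above, the sketch below buffers per-resource messages and applies them strictly in sequence, even if they arrive out of order. It assumes each message carries a resource id and a sequence number, which is not confirmed for our ActiveMQ messages; `OrderedApplier` and its methods are hypothetical names, not Camel or ActiveMQ APIs.

```python
class OrderedApplier:
    """Applies per-resource messages strictly in sequence order."""

    def __init__(self, handler):
        self.handler = handler  # callable(resource_id, step)
        self.next_seq = {}      # resource_id -> next expected sequence number
        self.pending = {}       # resource_id -> {seq: step} waiting for their turn

    def receive(self, resource_id, seq, step):
        """Buffer a message, then apply every step that is now in order."""
        expected = self.next_seq.get(resource_id, 1)
        if seq < expected:
            return  # already applied, ignore the duplicate
        waiting = self.pending.setdefault(resource_id, {})
        waiting[seq] = step
        while expected in waiting:
            self.handler(resource_id, waiting.pop(expected))
            expected += 1
        self.next_seq[resource_id] = expected


applied = []
applier = OrderedApplier(lambda rid, step: applied.append(step))
# Messages arrive out of order; nothing is applied until step 1 shows up.
applier.receive("res-1", 2, "create container for binary")
applier.receive("res-1", 3, "create binary resource in container")
applier.receive("res-1", 1, "create rdf/ldp base resource")
```

After the third call, `applied` holds the three steps in creation order. Whether this kind of resequencing belongs in Camel, in the consumer, or is unnecessary because delivery order is already guaranteed is exactly the open question.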

I really like the version vectors idea. I'm not sure it's optimal in a really distributed, multi-origin environment, but if we rely only on ActiveMQ as the source for "what comes next" then it's OK.
This one is my favourite so far: https://github.com/ricardobcl/Interval-Tree-Clocks. The ability to join, etc., without relying on a pre-known origin and discrete steps works for me. The Java implementation could also work fine for us.

In a simpler scenario, we can use http://symfony.com/doc/current/components/filesystem/lock_handler.html
based on resource ID + transaction
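
To show what a lock keyed on resource ID + transaction could look like, here is a small in-process sketch. It is not the Symfony lock handler API and only works within a single process; `resource_lock` and the key layout are assumptions made for illustration.

```python
import threading
from contextlib import contextmanager

_registry_guard = threading.Lock()
_locks = {}  # (resource_id, transaction_id) -> threading.Lock


@contextmanager
def resource_lock(resource_id, transaction_id, timeout=5.0):
    """Hold a lock keyed on resource ID + transaction for the with-block."""
    key = (resource_id, transaction_id)
    # Guard the registry so two threads never create two locks for one key.
    with _registry_guard:
        lock = _locks.setdefault(key, threading.Lock())
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"could not lock {key} within {timeout}s")
    try:
        yield key
    finally:
        lock.release()


# Usage: the write-back to Drupal happens while the lock is held.
with resource_lock("res-1", "tx-1") as key:
    locked_key = key
```

A real deployment would need a lock shared across PHP workers (file or database backed), which is what the links here point at.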

and/or (drupal 8 always in my mind)
https://api.drupal.org/api/drupal/vendor%21symfony%21http-foundation%21Session%21Storage%21Handler%21PdoSessionHandler.php/class/PdoSessionHandler/8

As I see it, our current lock needs (or, better said, what we are in control of) are on the Drupal side. What I would not like to see is the UI blocked during operations. We could just create a Drupal entity with RDF and leave all the other expensive operations to dependent entities, created in the background as ActiveMQ messages get consumed. Then we can "sync" these dependent entities in the background (locking briefly) to a version of the main one and discard them afterwards.

In any scenario we should define a few things as needed/enforced:

  • Fedora resource versioning must be enforced. We can't pretend to make every change on the base version of a resource.
  • Change topic to queue in ActiveMQ. We need a way to make messages persist for multiple consumers.
  • Drupal entity versioning also. Every "form submit" should generate a new version. We can then later, in the background, sync versions coming from Drupal to Fedora and from Fedora to Drupal.
  • Sync should be a resource by resource option.
    • No need to automagically remove a resource from Drupal when it is removed from Fedora4 if the user does not want this.
    • No need to sync a Drupal entity to Fedora4 in real time if the user is doing multiple small tasks on a resource. Basically the user should be in control of this.
    • There could be "show and display"-only Drupal instances that fetch stuff from drupal but don't generate an operation on fedora4.
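
The per-resource sync options above could be captured as a small policy object. This is only a sketch of the idea; `SyncPolicy`, its field names, and the two decision functions are hypothetical and do not exist in Drupal or Fedora4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPolicy:
    propagate_deletes: bool = False  # remove from Drupal when removed in Fedora4
    realtime: bool = False           # push every save to Fedora4 immediately
    read_only: bool = False          # display-only instance, never writes to Fedora4


def should_push(policy, user_requested_sync):
    """Decide whether a Drupal save should produce an operation on Fedora4."""
    if policy.read_only:
        return False
    return policy.realtime or user_requested_sync


def should_delete_locally(policy):
    """Decide whether a Fedora4 delete should remove the Drupal entity."""
    return policy.propagate_deletes


# A user doing many small edits keeps realtime off and syncs when ready.
draft_policy = SyncPolicy()
display_only = SyncPolicy(read_only=True)
```

With the defaults, nothing is pushed or deleted unless the user asks for it, which matches the "user should be in control" point.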


4 participants