This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

Distribute replica forests evenly #330

Closed
dmcassel opened this issue Jan 25, 2019 · 6 comments

@dmcassel

Given some number of forests per host and a number of replicas (in my case, 1), ml-gradle currently puts each forest's replica on the next host, so all of host1's forests are replicated on host2, all of host2's forests on host3, and so on. Erin Miller's Hardware Reference Architecture: Direct Attached Storage recommends not doing this: "Assuming 6 primary and 6 replica forests per host, it’s important to distribute forests equally across hosts. Specifically, you don’t want to replicate all forests from host 1 to host 2. If host 1 then goes down, host 2 will be supporting 12 primary forests, since the six replicas will have changed roles to primary."

Modify the replica host assignment such that replicas for a host's forests are evenly distributed around the cluster.
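The arithmetic behind the reference architecture's warning can be sketched as below. This is a minimal model, not ml-gradle code: it assumes every host holds the same number of primaries, one replica per forest, and a single host failure. The class and method names are hypothetical, and the six-host cluster in `main` is an assumption borrowed from the production cluster size mentioned later in this thread.

```java
// Hypothetical model: worst-case primary count on a surviving host after one host fails,
// assuming equal primaries per host and exactly one replica per forest.
class FailoverLoad {

    // "Next host" layout: every replica of the failed host's forests lives on one neighbor,
    // so that neighbor inherits all of them on top of its own primaries.
    static int worstCaseGrouped(int primariesPerHost) {
        return primariesPerHost + primariesPerHost;
    }

    // Even layout: the failed host's replicas are spread over the other hostCount - 1 hosts,
    // so the busiest survivor inherits ceil(primariesPerHost / (hostCount - 1)).
    static int worstCaseDistributed(int primariesPerHost, int hostCount) {
        if (hostCount < 2) {
            throw new IllegalArgumentException("need at least two hosts");
        }
        int others = hostCount - 1;
        int inherited = (primariesPerHost + others - 1) / others; // integer ceiling
        return primariesPerHost + inherited;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseGrouped(6));        // 12, the figure in the quote
        System.out.println(worstCaseDistributed(6, 6)); // 8, assuming six hosts
    }
}
```

With 6 primaries per host, the "next host" layout leaves one survivor serving 12 primaries, while spreading the replicas over five other hosts caps any survivor at 6 + 2 = 8.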

@rjrudin
Contributor

rjrudin commented Jan 28, 2019

Thanks @dmcassel - are you able to use the preview feature described at https://github.com/marklogic-community/ml-gradle/wiki/Creating-forests#previewing-forest-creation to show what the forest plan is for your config?

I haven't tried to reproduce this yet, but I don't recall adding support for this.

@dmcassel
Author

It looks like the preview feature doesn't read the forests directory. My staging database has forests laid out under ml-config/forests/(staging-db-name)/staging-forests.json, with a total of 54 forests over six hosts. (I'm hoping to make this property-driven at some point, but that's what we have right now.) When I run the preview command, it doesn't see that forest config:

> gradle -Pdatabase=my-staging mlPrintForestPlan

> Task :mlPrintForestPlan
{
  "forest-name" : "my-staging-1",
  "host" : "ml1.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml2.local",
    "replica-name" : "my-staging-1-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-2",
  "host" : "ml2.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml3.local",
    "replica-name" : "my-staging-2-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-3",
  "host" : "ml3.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml1.local",
    "replica-name" : "my-staging-3-replica-1"
  } ]
}

The 3 forests (and replicas if applicable) that will be created the next time the database 'my-staging' is deployed (e.g. via the mlDeploy task) are listed above.

Note that I'm exploring this with a 3-node Docker cluster, rather than the six nodes we have in prod.

@dmcassel
Author

I just added mlForestsPerHost=my-staging,2 to my properties file and the preview does pay attention to that:

> gradle -Pdatabase=my-staging mlPrintForestPlan

> Task :mlPrintForestPlan
{
  "forest-name" : "my-staging-1",
  "host" : "ml1.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml2.local",
    "replica-name" : "my-staging-1-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-2",
  "host" : "ml1.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml2.local",
    "replica-name" : "my-staging-2-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-3",
  "host" : "ml2.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml3.local",
    "replica-name" : "my-staging-3-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-4",
  "host" : "ml2.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml3.local",
    "replica-name" : "my-staging-4-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-5",
  "host" : "ml3.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml1.local",
    "replica-name" : "my-staging-5-replica-1"
  } ]
}
{
  "forest-name" : "my-staging-6",
  "host" : "ml3.local",
  "database" : "my-staging",
  "forest-replica" : [ {
    "host" : "ml1.local",
    "replica-name" : "my-staging-6-replica-1"
  } ]
}

The 6 forests (and replicas if applicable) that will be created the next time the database 'my-staging' is deployed (e.g. via the mlDeploy task) are listed above.

BUILD SUCCESSFUL in 1s
1 actionable task: 1 executed

What I'd like to see is my-staging-1's replica on ml2.local and my-staging-2's replica on ml3.local, so that ml1.local's replicas are split across the other two hosts. I haven't thought through the generalized algorithm yet, but I think you see what I'm going for, right?
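One possible generalization, sketched below under stated assumptions, is to rotate each host's forests through the other hosts: the i-th forest on host h gets its replica on host h + 1 + (i mod (n - 1)), wrapping around n hosts, which can never land on h itself. This is a hypothetical rule for a single replica per forest, not the algorithm that was eventually merged; the class and method names are made up for illustration. The "grouped" method reproduces the plan printed above for 3 hosts with 2 forests per host.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: n hosts, forestsPerHost primaries on each, one replica per forest.
// Hosts and forests are 0-based internally and printed 1-based like the plan above.
class ReplicaPlanSketch {

    // Forests are assigned to hosts in order: forestsPerHost on ml1, the next batch on ml2, ...
    static int primaryHost(int f, int forestsPerHost) {
        return f / forestsPerHost;
    }

    // Current behavior seen in mlPrintForestPlan: every forest on host h is replicated to h + 1.
    static int groupedReplicaHost(int f, int forestsPerHost, int hostCount) {
        return (primaryHost(f, forestsPerHost) + 1) % hostCount;
    }

    // Rotating rule: offsets 1..n-1 cycle through the other hosts, never the primary's own host.
    static int distributedReplicaHost(int f, int forestsPerHost, int hostCount) {
        int h = primaryHost(f, forestsPerHost);
        int i = f % forestsPerHost;
        return (h + 1 + i % (hostCount - 1)) % hostCount;
    }

    static List<String> plan(int hostCount, int forestsPerHost, boolean distributed) {
        if (hostCount < 2) {
            throw new IllegalArgumentException("replicas need at least two hosts");
        }
        List<String> lines = new ArrayList<>();
        for (int f = 0; f < hostCount * forestsPerHost; f++) {
            int replica = distributed
                ? distributedReplicaHost(f, forestsPerHost, hostCount)
                : groupedReplicaHost(f, forestsPerHost, hostCount);
            lines.add("my-staging-" + (f + 1) + ": ml" + (primaryHost(f, forestsPerHost) + 1)
                + ".local -> replica on ml" + (replica + 1) + ".local");
        }
        return lines;
    }

    public static void main(String[] args) {
        plan(3, 2, false).forEach(System.out::println);
        System.out.println();
        plan(3, 2, true).forEach(System.out::println);
    }
}
```

For 3 hosts and 2 forests per host, the rotating rule puts my-staging-1's replica on ml2.local and my-staging-2's on ml3.local, so each host's two replicas end up on two different hosts.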

@dmcassel
Author

@rjrudin FYI, I'm working on a PR for this

@rjrudin rjrudin transferred this issue from marklogic/ml-gradle Jan 31, 2019
@rjrudin rjrudin added this to the 3.12.0 milestone Feb 8, 2019
@rjrudin
Contributor

rjrudin commented Feb 8, 2019

I added this commit after merging in the PR:

d546775

BuildForestTest is passing, and I used ConfigureReplicaForestsDebug on a local 3-host cluster to try out both strategies, and all appears well.
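A quick way to compare two plans on a small cluster is to count which host serves each forest after one host fails. The helper below is a hypothetical sketch, not ConfigureReplicaForestsDebug or BuildForestTest (their code isn't shown in this thread). The hard-coded arrays are 0-based host indexes for 3 hosts with 2 forests each: the grouped layout from the mlPrintForestPlan output in this thread, and one distributed layout that matches the placement requested for my-staging-1 and my-staging-2.

```java
import java.util.Arrays;

// Hypothetical check: after host `failed` goes down, each forest is served by its primary host,
// or by its replica host if the primary was on the failed host (the replica is promoted).
class FailoverCheck {

    static int[] primariesAfterFailure(int[] primaryHost, int[] replicaHost, int hostCount, int failed) {
        int[] counts = new int[hostCount];
        for (int f = 0; f < primaryHost.length; f++) {
            int serving = primaryHost[f] == failed ? replicaHost[f] : primaryHost[f];
            counts[serving]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] primary = {0, 0, 1, 1, 2, 2};     // my-staging-1..6 on ml1, ml1, ml2, ml2, ml3, ml3
        int[] grouped = {1, 1, 2, 2, 0, 0};     // replicas as printed by mlPrintForestPlan
        int[] distributed = {1, 2, 2, 0, 0, 1}; // each host's replicas split over the other two
        System.out.println(Arrays.toString(primariesAfterFailure(primary, grouped, 3, 0)));
        System.out.println(Arrays.toString(primariesAfterFailure(primary, distributed, 3, 0)));
    }
}
```

If ml1.local fails, the grouped layout leaves ml2.local serving 4 forests and ml3.local serving 2, while the distributed layout leaves each survivor serving 3.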

I may do a 3.12.beta release of ml-gradle, though, so you can try this out ASAP.

@rjrudin rjrudin closed this as completed Feb 8, 2019
@rjrudin
Contributor

rjrudin commented Feb 8, 2019

Added docs at https://github.com/marklogic-community/ml-gradle/wiki/Creating-forests#replica-forest-creation . This is in case we run into any issues with the Distributed implementation in the 3.12.0 release.
