Indexing: ability to load balance indexing tasks across multiple app (Glassfish) servers #1757

Closed
pdurbin opened this issue Mar 25, 2015 · 4 comments

@pdurbin
Member

pdurbin commented Mar 25, 2015

@scolapasta and I have been discussing, both in person and in the Dataverse 4.0 Search Index Functional Requirements Document, adding the ability to load balance indexing tasks across multiple app (Glassfish) servers.

Imagine having three Glassfish servers, each doing one part of "index all".

I stubbed out some code and tests in c4fb067, but we need to keep working on this to make it real.

@scolapasta please advise on when this work should continue and who should do it.

@scolapasta
Contributor

SQL query to select only the rows in one modular partition (here, partition 1 of 3):
select * from dataverse where id % 3 = 1;

It looks like EJBQL supports this with the MOD function, e.g. MOD(7, 3) = 1, so we probably don't have to write it as a native query.
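A minimal sketch of the modular assignment in bash, assuming an object's database id alone determines its partition (partition = id % numPartitions). The helper name `partition_members` and the id list are hypothetical, chosen only for illustration. With 2 partitions, partition 0 gets the even ids, which is consistent with the preview output later in this thread (dataverses 2, 4, 6, 8 and dataset 10).

```bash
#!/usr/bin/env bash
# Sketch: list which ids fall into a given partition when work is split
# by "id % numPartitions", the same condition as:
#   select * from dataverse where id % 3 = 1;
# partition_members is a hypothetical helper, not a Dataverse API.

partition_members() {
  local num_partitions=$1 partition_id=$2
  shift 2
  local id members=()
  for id in "$@"; do
    # An id belongs to this partition when its remainder equals partition_id.
    if (( id % num_partitions == partition_id )); then
      members+=("$id")
    fi
  done
  echo "${members[*]}"
}

# Usage: three servers, partition 1, over hypothetical ids 1..10.
partition_members 3 1 1 2 3 4 5 6 7 8 9 10   # prints: 1 4 7 10
```

Because every id has exactly one remainder, the partitions never overlap and together cover all ids, so no object is indexed twice or skipped.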

@pdurbin
Member Author

pdurbin commented Mar 26, 2015

As of 317bcbf, "index all" can be load balanced across multiple Glassfish servers. I provided a script that explains how to choose the number of partitions (i.e., the number of Glassfish servers) and, for a given server, which partitionId it should process.

Here's the script: https://github.com/IQSS/dataverse/blob/master/scripts/search/index

Here's some example output:

curl 'http://localhost:8080/api/admin/index?numPartitions=2&partitionIdToProcess=0&previewOnly=true'

{
  "data": {
    "availablePartitionIds": [
      0,
      1
    ],
    "args": {
      "partitionIdToProcess": 0,
      "numPartitions": 2
    },
    "previewOfPartitionWorkload": {
      "partitionId": 0,
      "datasetCount": 1,
      "dataverseCount": 4,
      "dvContainerIds": {
        "datasets": [
          10
        ],
        "dataverses": [
          2,
          4,
          6,
          8
        ]
      }
    }
  },
  "status": "OK"
}

curl 'http://localhost:8080/api/admin/index?numPartitions=2&partitionIdToProcess=0'

{
  "data": {
    "message": "indexAllOrSubset has begun of 4 dataverses and 1 datasets.",
    "args": {
      "partitionIdToProcess": 0,
      "numPartitions": 2
    },
    "availablePartitionIds": [
      0,
      1
    ]
  },
  "status": "OK"
}
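The two calls above generalize to any number of servers: each server sends the same request with a shared numPartitions and its own partitionIdToProcess, from 0 to numPartitions - 1 (the values listed in availablePartitionIds). Below is a minimal sketch, assuming the caller knows each server's index (0, 1, 2, ...) and that the API is reachable at http://localhost:8080 as in the examples. The `index_url` helper is hypothetical; the scripts/search/index script linked above remains the reference.

```bash
#!/usr/bin/env bash
# Sketch: build the "index all" request each Glassfish server should run.
# index_url is a hypothetical helper; host and port match the examples above.

index_url() {
  local num_partitions=$1 partition_id=$2 preview=${3:-false}
  # Valid partition ids are 0 .. num_partitions-1 (see availablePartitionIds).
  if (( partition_id < 0 || partition_id >= num_partitions )); then
    echo "partitionIdToProcess must be between 0 and $((num_partitions - 1))" >&2
    return 1
  fi
  local url="http://localhost:8080/api/admin/index?numPartitions=${num_partitions}&partitionIdToProcess=${partition_id}"
  if [[ $preview == true ]]; then
    url+="&previewOnly=true"
  fi
  echo "$url"
}

# Print the preview command for each of three servers. On server i you would
# run only the line for partition i (omit "true" to start indexing for real).
for i in 0 1 2; do
  echo "curl '$(index_url 3 "$i" true)'"
done
```

Running a preview on each server first shows that server's share of the workload before any indexing starts.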

Passing to QA. Heads up to @scolapasta, @landreev, and @ekraffmiller.

@pdurbin pdurbin removed their assignment Mar 26, 2015
@kcondon kcondon self-assigned this Apr 1, 2015
@pdurbin
Member Author

pdurbin commented Jul 2, 2015

Since this issue is still open I figure it's fair game to leave a comment here. :)

I just wanted to note that in 772813e @scolapasta added a "continue" endpoint. I haven't used it personally, but it is a way to pick up indexing where you left off.

@kcondon and I discussed how much of this we should document in the guides, and we're not at all sure that these endpoints should be documented and encouraged in their current form. Perhaps we need a new issue about defining what we want, cleaning up those endpoints, and documenting them. It would be good to look at the Dataverse 4.0 Search Index Functional Requirements Document if we do. There are a number of indexing issues still open and not yet in QA, such as #50, #702, #1408, #1749, and #2279.

@kcondon
Contributor

kcondon commented Jul 28, 2015

@scolapasta @pdurbin The current mod implementation has a continue flag and is intended to resume where it left off, but it doesn't seem to work. This may be because it does not update indextime in the db as it indexes, so there is no reference point to continue from.


4 participants