tsdb: Fix split16 to avoid indexing 14 docs from the end #378

Merged: 1 commit, Feb 6, 2023

@pquentin (Member) commented Feb 6, 2023

We drop 2 documents from the corpus so that the document count is a multiple of 16, which allows Rally to split it at exact boundaries.
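A minimal sketch of the arithmetic, assuming each of the 16 clients gets a contiguous slice whose boundaries are `i * N / 16` rounded to a whole document. Rally's real partitioning code may differ. The helper names `split_bounds` and `trim_to_multiple` are hypothetical, and 34 is an invented corpus size chosen only because 34 mod 16 = 2.

```python
"""Sketch: why a corpus whose size is a multiple of 16 splits cleanly across 16 clients.

Assumption (not Rally's actual code): client i indexes the contiguous slice
[round(i * n / k), round((i + 1) * n / k)).
"""


def split_bounds(total_docs: int, num_clients: int) -> list[tuple[int, int]]:
    """Return (start, end) document offsets for each client (hypothetical helper)."""
    bounds = []
    for i in range(num_clients):
        start = round(total_docs * i / num_clients)
        end = round(total_docs * (i + 1) / num_clients)
        bounds.append((start, end))
    return bounds


def trim_to_multiple(total_docs: int, k: int) -> int:
    """Number of docs kept after dropping the remainder so the count divides by k."""
    return total_docs - total_docs % k


if __name__ == "__main__":
    n = 34  # hypothetical corpus size; 34 % 16 == 2
    uneven = split_bounds(n, 16)
    # Rounding makes some slices one document longer than others.
    print(sorted({end - start for start, end in uneven}))
    kept = trim_to_multiple(n, 16)  # drop the 2 extra docs, leaving 32
    even = split_bounds(kept, 16)
    # Every boundary is now an exact integer, so all slices have the same size.
    print(sorted({end - start for start, end in even}))
```

With a count that divides by 16, no boundary needs rounding, so every client's slice starts and ends at a predictable offset.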

To confirm the fix, I ran the following query when indexing was at 2%, and nothing from 2021-04-29 had been indexed:

% curl http://localhost:39200/tsdb/_search --data-binary @d.json -H 'Content-Type: application/json' | jq .
{
  "took": 2557,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "NAME": {
      "buckets": [
        {
          "key_as_string": "2021-04-28T17:00:00.000Z",
          "key": 1619629200000,
          "doc_count": 2099499
        },
        {
          "key_as_string": "2021-04-28T18:00:00.000Z",
          "key": 1619632800000,
          "doc_count": 484385
        }
      ]
    }
  }
}

with d.json having the following contents:

{
  "size": 0,
  "aggs": {
    "NAME": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}
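The same check can be scripted. This is a sketch using only the Python standard library: `QUERY` mirrors d.json, the URL is the one from the curl command above, and `fetch_histogram` and `days_indexed` are hypothetical helper names. The sample buckets are copied from the response shown above, so the final assertion runs without a live cluster.

```python
import json
import urllib.request

# Same request body as d.json: an hourly date_histogram on @timestamp, no hits returned.
QUERY = {
    "size": 0,
    "aggs": {"NAME": {"date_histogram": {"field": "@timestamp", "calendar_interval": "hour"}}},
}


def fetch_histogram(url: str = "http://localhost:39200/tsdb/_search") -> dict:
    """POST the query to a running cluster (URL from the curl command; adjust as needed)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(QUERY).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # curl --data-binary also sends a POST
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def days_indexed(response: dict) -> set[str]:
    """Collect the calendar days (YYYY-MM-DD) that have at least one document."""
    buckets = response["aggregations"]["NAME"]["buckets"]
    return {b["key_as_string"][:10] for b in buckets if b["doc_count"] > 0}


# Buckets copied from the response above; no document from 2021-04-29 appears.
sample = {
    "aggregations": {
        "NAME": {
            "buckets": [
                {"key_as_string": "2021-04-28T17:00:00.000Z", "key": 1619629200000, "doc_count": 2099499},
                {"key_as_string": "2021-04-28T18:00:00.000Z", "key": 1619632800000, "doc_count": 484385},
            ]
        }
    }
}
assert "2021-04-29" not in days_indexed(sample)
```

Against a live cluster, `days_indexed(fetch_histogram())` would give the same check without reading the buckets by eye.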

@pquentin added the bug label Feb 6, 2023
@pquentin self-assigned this Feb 6, 2023
@dliappis (Contributor) left a comment

LGTM

### Generating the split16 corpus

By default, with N indexing clients Rally will split documents.json into N parts and bulk index from
them in parallel. As a result, by default ingest is not done in order, which makes TSDB sorting

These two sentences might be worth adding to the Rally docs.
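The two quoted README sentences say that with N clients Rally splits documents.json into N parts and indexes them in parallel, so ingest is not in timestamp order. A toy simulation of that effect, assuming clients simply take turns sending one bulk request each (real scheduling is concurrent and nondeterministic). The helpers `split_contiguous` and `interleave`, the 4 clients, and the bulk size of 2 are all hypothetical.

```python
def split_contiguous(docs: list[int], n: int) -> list[list[int]]:
    """Split a list into n contiguous parts (assumes len(docs) is a multiple of n)."""
    assert len(docs) % n == 0, "sketch only handles evenly divisible corpora"
    size = len(docs) // n
    return [docs[i * size:(i + 1) * size] for i in range(n)]


def interleave(parts: list[list[int]], bulk_size: int) -> list[int]:
    """Simulate clients taking turns, each sending one bulk of `bulk_size` docs per turn."""
    order = []
    offsets = [0] * len(parts)
    while any(off < len(p) for off, p in zip(offsets, parts)):
        for i, part in enumerate(parts):
            chunk = part[offsets[i]:offsets[i] + bulk_size]
            order.extend(chunk)
            offsets[i] += len(chunk)
    return order


timestamps = list(range(16))  # hypothetical docs, already sorted by @timestamp
ingested = interleave(split_contiguous(timestamps, 4), bulk_size=2)
print(ingested)  # every doc arrives exactly once, but not in timestamp order
```

Each client reads its own part in order, yet the cluster sees timestamps jump between parts.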
