layavadi changed the title to "[BUG] Training step crashes without any error for FAISS IVF with quantisation step" on Oct 24, 2024
What is the bug?
While training a k-NN model on the Cohere 100k corpus with 30k training vectors and a PQ encoder (m=8, code_size=8), the training step fails silently without any error:
Running train-knn-model [ 0% done]
[ERROR] Cannot execute-test. Error in load generator [0]
Cannot run task [train-knn-model]: Failed to create model: test-model within 100 retries
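When the benchmark gives up after its retries, the model's own metadata often carries the real failure reason. A hedged diagnostic sketch (assumes the model ID `test-model` from the log above and a reachable cluster at `$ENDPOINT`):

```shell
# The k-NN model GET API reports "state" (TRAINING/CREATED/FAILED) and,
# on failure, an "error" field with the reason training aborted.
MODEL_URL="/_plugins/_knn/models/test-model?filter_path=state,error"
if [ -n "${ENDPOINT:-}" ]; then
  curl -s "${ENDPOINT}${MODEL_URL}"
else
  # No cluster configured; just show the request that would be issued.
  echo "GET ${MODEL_URL}"
fi
```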
How can one reproduce the bug?
PARAM file used in the benchmark
cat train-faiss-cohere-100k-768-ip.json
{
"target_index_name": "target_index",
"target_field_name": "target_field",
"target_index_body": "indices/faiss-index.json",
"target_index_primary_shards": 1,
"target_index_replica_shards": 0,
"target_index_dimension": 768,
"target_index_space_type": "innerproduct",
"target_index_bulk_size": 100,
"target_index_bulk_index_data_set_format": "hdf5",
"target_index_bulk_index_data_set_corpus": "cohere-100k",
    "target_index_bulk_indexing_clients": 10
}
Running
opensearch-benchmark execute-test --target-hosts ${ENDPOINT} --workload vectorsearch --workload-params ${PARAMS_FILE} --pipeline benchmark-only --test-procedure train-test --kill-running-processes
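For context, the failing `train-knn-model` task boils down to a k-NN Train API call. A hedged sketch of an equivalent manual request, mirroring the workload parameters above; the training index/field names here are assumptions, not values taken from the workload:

```shell
# Train request body: 768-dim, innerproduct space, FAISS IVF with a
# PQ encoder (m=8, code_size=8), matching the reported setup.
BODY='{
  "training_index": "train_index",
  "training_field": "train_field",
  "dimension": 768,
  "method": {
    "name": "ivf",
    "engine": "faiss",
    "space_type": "innerproduct",
    "parameters": {
      "encoder": { "name": "pq", "parameters": { "m": 8, "code_size": 8 } }
    }
  }
}'
# Validate the body locally; only call the cluster if ENDPOINT is set.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "body OK"
if [ -n "${ENDPOINT:-}" ]; then
  curl -s -X POST "${ENDPOINT}/_plugins/_knn/models/test-model/_train" \
       -H 'Content-Type: application/json' -d "$BODY"
fi
```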
Also set the cluster setting:
"knn.model.cache.size.limit": "25%"
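The cache limit above is applied as a cluster setting; a hedged sketch of how it can be set (persistent scope assumed):

```shell
# Raise the k-NN model cache ceiling to 25% of the heap via the
# cluster settings API.
BODY='{ "persistent": { "knn.model.cache.size.limit": "25%" } }'
# Validate the body locally; only call the cluster if ENDPOINT is set.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "settings body OK"
if [ -n "${ENDPOINT:-}" ]; then
  curl -s -X PUT "${ENDPOINT}/_cluster/settings" \
       -H 'Content-Type: application/json' -d "$BODY"
fi
```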
What is the expected behavior?
Training should complete without any failure.
What is your host/environment?
Running on an r6i.4xlarge node (single data node) with a pod configured as:
opensearchJavaOpts: "-Xmx12G -Xms12G"
resources:
  requests:
    cpu: "2000m"
    memory: "8Gi"
  limits:
    cpu: "4"
    memory: "32Gi"
OS version 15.0
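For rough sizing against the pod limits above, a back-of-the-envelope estimate of the raw training-vector footprint (float32 assumed; FAISS IVF-PQ training allocates additional native memory on top of this, outside the 12G JVM heap):

```shell
# Raw size of the training vectors alone: 30k vectors * 768 dims * 4 bytes.
RAW_BYTES=$((30000 * 768 * 4))
echo "${RAW_BYTES} bytes ~= $((RAW_BYTES / 1024 / 1024)) MiB"
```

At roughly 88 MiB of raw vectors, the training set itself is small relative to the 32Gi pod limit, so the silent failure is unlikely to be explained by the vector payload alone.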