Adds Open Distro Elasticsearch's KNN plugin support. Closes #174. #202
Conversation
@alexklibisz could you please help to review?
Yeah, you're limited in terms of memory and cores by Docker, so I don't think changing the config within the container will gain you anything. Almost all algorithms benefit from more memory/CPU/cores, so we try to keep the benchmark simple.
This looks great – thanks!
Almost certainly CPU bottlenecked, but you never know without profiling. You can add
I'm not sure about that setting, but yes, you generally want to specify parallelism=1 at every level. The runner pins your container to a single core, so if you try to do parallel CPU-bound work on many threads you'll either get killed by the container runtime or waste time context switching.
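For a concrete picture of that constraint, here is a minimal docker-py sketch of pinning a container to one core with a hard memory cap. This is not the actual ann-benchmarks runner code; the image tag, environment, and limits below are illustrative assumptions.

```python
# Minimal sketch (not the actual runner code): start an Elasticsearch-based
# container pinned to one core with a hard memory cap via the docker-py SDK.
# The image tag, environment, and limits are illustrative assumptions.
import docker

client = docker.from_env()
container = client.containers.run(
    "amazon/opendistro-for-elasticsearch:1.11.0",   # assumed tag for this example
    detach=True,
    cpuset_cpus="0",      # single core: extra CPU-bound threads only add context switching
    mem_limit="4500m",    # hard cap; heap plus native memory must fit under it
    environment={
        "discovery.type": "single-node",
        "ES_JAVA_OPTS": "-Xms3g -Xmx3g",            # heap cap, not the container's total footprint
    },
)
print(container.stats(stream=False)["memory_stats"]["usage"])  # current usage in bytes
```

Under a cap like this, raising in-container parallelism can't buy throughput; parallelism across queries comes from running more containers, which is why --parallelism 3 shows up as three containers of roughly this size in the docker stats output below.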
Thank you for reviewing @alexklibisz. I think EC2 c5.xlarge is insufficient for --parallelism 3 since it only has 8GB RAM while we need 3*3GB. Instead, I've run it successfully on a GCP n1-standard-4 (4 vCPUs, 15 GB RAM) machine.
Here is a snapshot of docker stats during the query (I wasn't successful in getting VisualVM to show the stats). I'm not sure why the memory usage for each container is exceeding 4GB despite setting -Xmx3G. What do you think?
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
21d5eae537df beautiful_hermann 98.53% 4.019GiB / 4.511GiB 89.10% 1.44kB / 0B 23.3MB / 1.11GB 59
73b22733b190 dazzling_swartz 98.79% 4.209GiB / 4.512GiB 93.29% 1.44kB / 0B 43.5MB / 942MB 59
5d08d3625cfa pedantic_ishizaka 98.95% 4.24GiB / 4.511GiB 93.98% 1.44kB / 0B 722MB / 571MB 59
Oops, I meant 4xlarge! Must've been a typo. Open Distro uses nmslib to run the HNSW model, and that likely consumes a chunk of memory. The JVM itself also uses memory beyond the heap; the 3GB is just what's allocated for ES.
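A back-of-the-envelope sketch of that accounting for fashion-mnist (60k vectors of 784 float32 dimensions), using assumed figures: HNSW M=16 and a guessed ~0.7 GiB of non-heap JVM overhead.

```python
# Rough sketch with assumed figures: why each container's footprint exceeds the -Xmx3G heap cap.
GiB = 2 ** 30
heap = 3.0                                  # GiB reserved by -Xmx3G (JVM heap only)
vectors = 60_000 * 784 * 4 / GiB            # float32 vectors held natively by nmslib, ~0.18 GiB
hnsw_links = 60_000 * 16 * 2 * 4 / GiB      # HNSW link lists, assuming M=16, ~0.01 GiB
jvm_overhead = 0.7                          # guess: metaspace, thread stacks, GC structures, direct buffers
print(f"~{heap + vectors + hnsw_links + jvm_overhead:.1f} GiB")  # ~3.9 GiB, in the ballpark of the observed ~4-4.2 GiB
```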
is this ready to be merged?
I'm good to merge unless @alexklibisz has any concerns?
LGTM. Excited to see all the results side-by-side.
@erikbern slight tangent: did travis get removed for the repo? If you'd be interested, I've recently converted some other repos to use GitHub Actions. I could take a pass at that here.
i'm not sure what happened to travis. it's still running, but something is broken with the PR integration, I think. Will merge this!
Strange, the master build failed. Though the PR itself passed... |
i'll take a look at it |
found the issue @erikbern: Open Distro 1.12.0 was released on the 14th and breaks the installation. I've submitted a PR with the fix.
I would actually like to see CI via github actions. I like that we could just upload the produced plot artifacts to the builds. |
I'm not sure if I follow this part:
Sorry for the imprecision. I would like the plots generated via https://github.com/erikbern/ann-benchmarks/blob/master/.travis.yml#L43-L44 to be uploaded to the CI build via https://github.com/actions/upload-artifact. This could be useful for performance bug hunting.
that would be a good idea – I'd be supportive of the change! |
Adds Open Distro Elasticsearch's KNN plugin support. Closes #174.
Adding Open Distro Elasticsearch KNN plugin support by borrowing from the ES setup in `elasticsearch` and `elastiknn`. Below is a comparison on `fashion-mnist` between this `opendistroknn` and `elastiknn`, where `opendistroknn` has ~3X better queries/s at comparable recall.