LabelEncoder kernel creation improvement #16516

Merged

@adityagoel4512 (Contributor) commented Jun 28, 2023

Description

This PR updates the initialisation of the _map in LabelEncoder_2 to be more memory efficient.

  1. Firstly, we switch from std::unordered_map to absl::flat_hash_map. The latter has a more compact layout and doesn't pay the overhead of keeping references to elements valid across rehashes, as std::unordered_map does. We do not need that guarantee here, since _map is initialised once at kernel creation and afterwards only used for lookups during compute. Abseil is already used extensively within onnxruntime.
  2. Secondly, capacity for all entries is reserved before inserting, which avoids repeated rehashing and reallocation as the map grows.

Motivation and Context

For very large lookup tables, creating the LabelEncoder kernel can require more RAM than necessary. These simple changes to _map initialisation make initialisation faster and reduce the memory allocated.

@adityagoel4512 changed the title from "Preallocate unordered_map in label encoder and use emplace" to "Make LabelEncoder more memory efficient." Jul 2, 2023
@adityagoel4512 changed the title from "Make LabelEncoder more memory efficient." to "Make LabelEncoder creation more memory efficient." Jul 2, 2023
@adityagoel4512 (Contributor, Author)

Closes #16575

@adityagoel4512 changed the title from "Make LabelEncoder creation more memory efficient." to "LabelEncoder kernel creation improvement" Jul 3, 2023
@adityagoel4512 (Contributor, Author)

@baijumeswani not sure if you are the right person to ask, but would it be possible to get a review on this?

@baijumeswani (Contributor)

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux OpenVINO CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, Linux QNN CI Pipeline

@baijumeswani (Contributor)

/azp run Windows CPU CI Pipeline, Windows GPU CI Pipeline, Windows GPU TensorRT CI Pipeline, Windows ARM64 QNN CI Pipeline, orttraining-linux-ci-pipeline, orttraining-linux-gpu-ci-pipeline, orttraining-ortmodule-distributed, ONNX Runtime React Native CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 9 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 8 pipeline(s).

@adityagoel4512 (Contributor, Author)

@baijumeswani looks like the CI has passed

@baijumeswani merged commit 9799d43 into microsoft:main Jul 5, 2023
@baijumeswani (Contributor)

Thank you for your contribution.

@adityagoel4512 deleted the preallocate_label_encoder_map branch July 5, 2023 14:36