Proportional Uneven Inference Sharding
Summary:
Support bucketization-aware inference sharding in TGIF, using the ZCH bucket boundaries from training.
A "best effort" sharding is performed across bucket boundaries, in proportion to the memory list.
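
As a rough illustration of what "best effort" proportional sharding across bucket boundaries could look like, here is a minimal sketch. It is not the TGIF or torchrec implementation. The function name `proportional_bucket_shards`, the assumption of equal-sized buckets, and the reading of the memory list as one memory budget per shard are all assumptions made for this example.

```python
from typing import List, Tuple


def proportional_bucket_shards(
    num_rows: int, num_buckets: int, memory_list: List[int]
) -> List[Tuple[int, int]]:
    """Return (row_offset, row_count) per shard, cutting only at bucket boundaries.

    Hypothetical helper: assumes equal-sized buckets and one memory entry per shard.
    """
    if num_buckets <= 0 or num_rows % num_buckets != 0:
        raise ValueError("assumes equal-sized buckets that evenly divide num_rows")
    total_memory = sum(memory_list)
    if total_memory <= 0:
        raise ValueError("memory_list must have a positive total")
    bucket_size = num_rows // num_buckets

    shards: List[Tuple[int, int]] = []
    prev_cut = 0  # bucket index where the previous shard ended
    cumulative = 0  # memory consumed by shards processed so far
    for mem in memory_list:
        cumulative += mem
        # Best effort: round the ideal cumulative share to the nearest
        # bucket boundary (integer round-half-up of cumulative * n / total).
        cut = (2 * cumulative * num_buckets + total_memory) // (2 * total_memory)
        shards.append((prev_cut * bucket_size, (cut - prev_cut) * bucket_size))
        prev_cut = cut
    return shards
```

Because cuts are snapped to bucket boundaries, shard sizes only approximate the memory ratios, and a shard with a very small memory share can end up with zero rows.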

Differential Revision: D69057627
kausv authored and facebook-github-bot committed Feb 7, 2025
1 parent 26e0732 commit 04276ed
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions torchrec/distributed/quant_state.py
@@ -441,6 +441,7 @@ def sharded_tbes_weights_spec(
shard_sizes: List[int] = [table.local_rows, table.local_cols]
shard_offsets: List[int] = table_metadata.shard_offsets
s: str = "embedding_bags" if is_sqebc else "embeddings"
+ s = ("_embedding_module." if is_sqmcec else "") + s
unsharded_fqn_weight: str = f"{module_fqn}.{s}.{table_name}.weight"

sharded_fqn_weight: str = (
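
The unsharded FQN assembly visible in this hunk can be tried in isolation with the sketch below. The wrapper function `unsharded_weight_fqn` and the module and table names in the usage note are hypothetical; only the three string-building statements are taken from the diff.

```python
def unsharded_weight_fqn(
    module_fqn: str, table_name: str, is_sqebc: bool, is_sqmcec: bool
) -> str:
    """Build the unsharded weight FQN as the hunk does (hypothetical wrapper)."""
    # Pick the container attribute name based on the module kind.
    s: str = "embedding_bags" if is_sqebc else "embeddings"
    # For the is_sqmcec case, prefix the inner "_embedding_module." attribute.
    s = ("_embedding_module." if is_sqmcec else "") + s
    return f"{module_fqn}.{s}.{table_name}.weight"
```

For example, with the made-up names `module_fqn="sparse.mcec"` and `table_name="t1"`, setting `is_sqmcec=True` and `is_sqebc=False` yields `sparse.mcec._embedding_module.embeddings.t1.weight`.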