
Reduce memory usage in TF building #24046

Merged · 3 commits into main · Jun 6, 2023
Conversation

Rocketknight1
Member

This PR reduces the default shape of dummy inputs from (3, 3) to (2, 2). This slightly reduces the memory usage when building TF models, which should hopefully fix some of our pipeline tests.

We could replace the dummy inputs with symbolic tensors, which would mean we could build TF models with 0 memory usage, but this would make TF model building slower (~4X) because it would implicitly compile the model when building, which is probably not an acceptable tradeoff.
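The shape-filling logic at the heart of this change can be sketched in plain Python (a simplified stand-in for the actual `dummy_inputs` property; the `Spec` type here is a hypothetical substitute for `tf.TensorSpec`):

```python
from typing import NamedTuple, Optional, Tuple

class Spec(NamedTuple):
    # Hypothetical stand-in for tf.TensorSpec: None marks an unknown dimension
    shape: Tuple[Optional[int], ...]

def dummy_shape(spec: Spec, fill: int) -> list:
    # Unknown (None) dimensions are filled with a small arbitrary size
    return [dim if dim is not None else fill for dim in spec.shape]

input_ids = Spec(shape=(None, None))   # batch and sequence dims unknown
print(dummy_shape(input_ids, fill=3))  # old default: [3, 3] -> 9 elements
print(dummy_shape(input_ids, fill=2))  # new default: [2, 2] -> 4 elements
```

The saving per input tensor looks modest (9 vs. 4 elements here), but the dummies flow through the whole forward pass during building, so the activation memory allocated along the way shrinks as well.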

cc @ydshieh and @amyeroberts as core maintainers

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jun 6, 2023

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@ydshieh ydshieh left a comment

Thanks, the change itself looks good to me.

@ydshieh
Collaborator

ydshieh commented Jun 6, 2023

Let me run it on CI and see.

Collaborator

@amyeroberts amyeroberts left a comment

Change LGTM - thanks for updating!

Happy to merge once @ydshieh gives the 👍 from CI runs

@@ -1116,16 +1116,16 @@ def dummy_inputs(self) -> Dict[str, tf.Tensor]:
         dummies = {}
         sig = self._prune_signature(self.input_signature)
         for key, spec in sig.items():
-            # 3 is the most correct arbitrary size. I will not be taking questions
-            dummies[key] = tf.ones(shape=[dim if dim is not None else 3 for dim in spec.shape], dtype=spec.dtype)
+            # 2 is the most correct arbitrary size. I will not be taking questions
+            dummies[key] = tf.ones(shape=[dim if dim is not None else 2 for dim in spec.shape], dtype=spec.dtype)
Collaborator

I wish to file this diff as evidence to the contrary #team3

@Rocketknight1
Member Author

Sorry for the delay - there's an issue with Funnel that wasn't reproducing on my machine. I eventually figured out that the problem is the classic TF one: indices for tf.gather are validated on CPU but not on GPU, so the bug only becomes apparent on CPU. Will fix in just a sec!
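For illustration, here is a pure-Python sketch of that behavioral difference (hypothetical toy functions, not TensorFlow's actual kernels): the CPU gather kernel validates indices and raises on out-of-range ones, while the GPU kernel skips validation and silently writes 0 for them, so a bad index can go unnoticed on GPU.

```python
def gather_cpu(params, indices):
    # CPU-style gather: indices are validated, out-of-range ones raise
    for i in indices:
        if not 0 <= i < len(params):
            raise IndexError(f"index {i} not in [0, {len(params)})")
    return [params[i] for i in indices]

def gather_gpu(params, indices):
    # GPU-style gather: no validation, out-of-range reads yield 0 silently
    return [params[i] if 0 <= i < len(params) else 0 for i in indices]

params = [10, 20, 30]
print(gather_gpu(params, [0, 5]))  # [10, 0] -- the bad index is hidden
# gather_cpu(params, [0, 5]) would raise IndexError -- the bug is exposed
```

This is why a model with an out-of-range index can pass on a GPU dev machine yet fail on CPU-only CI.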

@ydshieh
Collaborator

ydshieh commented Jun 6, 2023

I also tried to run the change in this PR, and got

FAILED tests/pipelines/test_pipelines_common.py::PipelineUtilsTest::test_load_default_pipelines_tf - tensorflow.python.framework.errors_impl.ResourceExhaustedError: {{function_node __wrapped__Transpose_device_/job:localhost/replica:0/task:0/device:GPU:0}} OOM when allocating tensor with shape[768,768] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Transpose]
FAILED tests/pipelines/test_pipelines_common.py::PipelineUtilsTest::test_load_default_pipelines_tf_table_qa - tensorflow.python.framework.errors_impl.ResourceExhaustedError: Exception encountered when calling layer 'tapas' (type TFTapasMainLayer).

{{function_node __wrapped__StatelessTruncatedNormalV2_device_/job:localhost/replica:0/task:0/device:GPU:0}} OOM when allocating tensor with shape[30522,768] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:StatelessTruncatedNormalV2]

Call arguments received by layer 'tapas' (type TFTapasMainLayer):
  • input_ids=tf.Tensor(shape=(2, 2), dtype=int32)
  • attention_mask=tf.Tensor(shape=(2, 2), dtype=float32)
  • token_type_ids=tf.Tensor(shape=(2, 2, 7), dtype=int32)
  • position_ids=None
  • head_mask=None
  • inputs_embeds=None
  • output_attentions=False
  • output_hidden_states=False
  • return_dict=True
  • training=False

and 5 other failures (probably caused by the one above).

@Rocketknight1 I think we will have to iterate (change → run → change → run) a bit more before we merge.

@Rocketknight1
Member Author

Yep, working on it now!

@ydshieh
Collaborator

ydshieh commented Jun 6, 2023

The tests/pipelines/test_pipelines_common.py::PipelineUtilsTest::test_load_default_pipelines_tf test runs against a list of models, so it's fairly normal for it to fail on other models even after some fixes have been made.

I'm happy to trigger the run (a subset) whenever you feel it's time. Otherwise, I can show you a modified workflow file that you can trigger manually.

@Rocketknight1
Member Author

@ydshieh the issues with Funnel have been resolved, so this should be ready for a CI run now!

@ydshieh
Collaborator

ydshieh commented Jun 6, 2023

You can watch it live here. It will take 20-30 min to finish.

@Rocketknight1
Member Author

Looks like they're still failing even with very small dummies - which is odd, since the new dummies should be strictly smaller than the old ones! I'll investigate those models and try to figure out why.

@Rocketknight1
Member Author

Maybe this is a sign that we should transition the dummies to symbolic tensors for those models, even if it's probably too slow for our tests to do it across the whole codebase.

@Rocketknight1 Rocketknight1 merged commit 7203ea6 into main Jun 6, 2023
@Rocketknight1 Rocketknight1 deleted the lower_dummy_memory_usage branch June 6, 2023 17:29
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
* Make the default dummies (2, 2) instead of (3, 3)

* Fix for Funnel

* Actually fix Funnel