Port IDEFICS to TensorFlow (#26870)
* Initial commit

* Just a copy of modeling_idefics.py that will be ported to TF

* - Prepend TF to the name of all classes
- Convert pytorch ops to TF (not all operations are converted yet)

* Add TF imports

* Add autotranslated files

* Add TF classes to model_tf_auto.py

* Add the TF classes in model_doc

* include auto-translated code

* Adopted from auto-translated version

* Add a forgotten super().build

* Add test code for TF version.

* Fix indentation and load pytorch weights for now

* Some fixes. Many tests are still failing, but some are passing now.

- I have added TODOs for some of the hacks I made to unblock myself
  and I will address them soon
- I have processing_idefics.py hacked up locally to support TF temporarily

* Add ALL_LAYERNORM_LAYERS to match pytorch

* Revert "Add ALL_LAYERNORM_LAYERS to match pytorch"

This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it
is not needed in the TF implementation.

* Fix freeze_relevant_params()

* Some more fixes

* Fix test_attention_outputs

* Add tf stuff to processing_idefics.py

processing_idefics.py supports both PyTorch and TF now.

test_processor_idefics.py for PyTorch is passing, so I didn't break anything,
but there are still some issues with TF. I also need to add TF tests in
test_processor_idefics.py.
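
A minimal sketch of the kind of `return_tensors` dispatch involved (illustrative only; the helper name and shapes are assumptions, not the actual IdeficsProcessor code):

```python
import numpy as np


def convert_pixel_batch(pixel_values, return_tensors="pt"):
    """Hypothetical helper: build framework-agnostic arrays first, then convert
    once at the end based on `return_tensors` ("pt" for PyTorch, "tf" for TF)."""
    arr = np.asarray(pixel_values, dtype=np.float32)
    if return_tensors == "pt":
        import torch

        return torch.from_numpy(arr)
    if return_tensors == "tf":
        import tensorflow as tf

        return tf.convert_to_tensor(arr)
    return arr
```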

* Pass return_tensors to image processing code and fix test

* Pass return_tensors to the image processor __init__

* Fix several test cases

- Make input to some of the forward pass of type `TFModelInputType`
- Decorate main layer forward pass with `@unpack_inputs`
- Decorate main layer with `@keras_serializable`
- Pass `inputs` to TFIdeficsModel
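
A minimal sketch of the decorator pattern described above (not the actual TFIdeficsMainLayer; the class body here is a placeholder):

```python
import tensorflow as tf

from transformers import IdeficsConfig
from transformers.modeling_tf_utils import keras_serializable, unpack_inputs


@keras_serializable
class TFIdeficsMainLayerSketch(tf.keras.layers.Layer):
    # `config_class` is what @keras_serializable uses to (de)serialize the layer.
    config_class = IdeficsConfig

    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.config = config

    @unpack_inputs
    def call(self, input_ids=None, attention_mask=None, training=False):
        # @unpack_inputs lets callers pass a dict, tuple, or keyword arguments
        # interchangeably, matching the other TF models in the library.
        return input_ids
```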

* Some more fixes forgotten in last commit

* Fix processing code and vision_tf.py

* Fix perceiver bug

* Import from

* Auto-add build() methods + style pass

* Fix build() errors due to `None` being passed as shape to some layers

* Change `name` in TFIdeficsForVisionText2Text to match the attribute in IdeficsForVisionText2Text

* Fix PyTorch weight loading for TF2

A lot of `name=` arguments were missing in the weight initialization code.
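
An illustrative (hypothetical) layer showing why the `name=` arguments matter when cross-loading a PyTorch checkpoint into TF:

```python
import tensorflow as tf


class TwoLayerMLPSketch(tf.keras.layers.Layer):
    def __init__(self, hidden_size, **kwargs):
        super().__init__(**kwargs)
        # Without `name=`, Keras auto-names these ("dense", "dense_1", ...),
        # and the resulting variable paths no longer line up with the PyTorch
        # state-dict keys ("fc1.weight", "fc2.weight") when cross-loading.
        self.fc1 = tf.keras.layers.Dense(hidden_size, name="fc1")
        self.fc2 = tf.keras.layers.Dense(hidden_size, name="fc2")

    def call(self, x):
        return self.fc2(tf.nn.gelu(self.fc1(x)))
```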

* Attempt to fix CI

* Add back accidentally removed line

* Remove torch-specific stuff from the TF test file

* make fix-copies, make style, remove autotranslated files

* Fixes to imports/docstrings

* Let's try the from future import in desperation

* Fix the core random_attention_mask fn to match the torch/flax behaviour
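
A sketch of the behaviour being matched, assuming the convention is "random 0/1 mask, but guarantee at least one attended token per row" (not the exact test helper):

```python
import tensorflow as tf


def random_attention_mask_sketch(shape, seed=0):
    # Random 0/1 mask, with the final position of every row forced to 1 so that
    # no sequence ends up fully masked.
    mask = tf.random.stateless_uniform(shape, seed=[seed, 0], minval=0, maxval=2, dtype=tf.int32)
    ones_col = tf.ones_like(mask[:, -1:])
    return tf.concat([mask[:, :-1], ones_col], axis=-1)
```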

* Clean random_attention_mask up correctly

* Remove torch-only test

* Fix loss shape, couple of nits

* make style

* Don't test for OOB embeddings because IDEFICS uses those deliberately

* Fix loss computation to handle masking
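
A hedged sketch of that masking logic, assuming the usual convention that label `-100` means "ignore this position" (not the exact loss code in the PR):

```python
import tensorflow as tf


def masked_clm_loss_sketch(labels, logits):
    # labels: [batch, seq] int tensor, logits: [batch, seq, vocab] float tensor.
    # Shift so that each position predicts the next token, then drop positions
    # whose label is -100, mirroring PyTorch's CrossEntropyLoss(ignore_index=-100).
    shifted_logits = logits[:, :-1, :]
    shifted_labels = labels[:, 1:]
    mask = tf.cast(shifted_labels != -100, tf.float32)
    # Clamp ignored labels to a valid index; the mask zeroes them out afterwards.
    safe_labels = tf.where(shifted_labels == -100, tf.zeros_like(shifted_labels), shifted_labels)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(
        from_logits=True, reduction=tf.keras.losses.Reduction.NONE
    )
    per_token = loss_fn(safe_labels, shifted_logits)
    return tf.reduce_sum(per_token * mask) / tf.maximum(tf.reduce_sum(mask), 1.0)
```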

* Fix test failures when flattening

* Fix some test failures

- Add cross attention gate, which was missing and wasn't being passed around
  (see the sketch below)
- Fix overwriting of image_attention_mask due to a hack I had for dummy inputs
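
For the gate, a hedged sketch of the assumed semantics (a text position that attends to no image gets no cross-attention residual), not the exact gating code in this PR:

```python
import tensorflow as tf


def apply_cross_attention_gate_sketch(hidden_states, cross_attn_output, image_attention_mask):
    # Assumed semantics: if a text position's image_attention_mask row is all
    # zeros (it can see no image), zero out its cross-attention residual.
    sees_an_image = tf.reduce_any(image_attention_mask > 0, axis=-1)  # [batch, seq_len]
    gate = tf.cast(sees_an_image, cross_attn_output.dtype)[..., tf.newaxis]
    return hidden_states + gate * cross_attn_output
```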

* Add a proper stateless scaled_dot_product_attention
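
A minimal stateless version for reference (the signature actually used in the PR may differ):

```python
import tensorflow as tf


def scaled_dot_product_attention_sketch(query, key, value, attn_mask=None, scale=None):
    # Stateless: no layers, no weights, just the math, so it can be shared by
    # the self-attention and cross-attention modules.
    # query: [..., seq_q, d], key: [..., seq_k, d], value: [..., seq_k, d_v].
    d = tf.cast(tf.shape(query)[-1], query.dtype)
    scale = scale if scale is not None else 1.0 / tf.sqrt(d)
    scores = tf.matmul(query, key, transpose_b=True) * scale
    if attn_mask is not None:
        # Additive mask: 0 where attending is allowed, a large negative where not.
        scores += attn_mask
    weights = tf.nn.softmax(scores, axis=-1)
    return tf.matmul(weights, value)
```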

* make style

* Adding missing attribute from the PyTorch version

* Small cleanups to the decoupled linear layer in case that helps

* Pass epsilon to LayerNormalization

* Attempt to fix PyTorch weight cross-loading for TFIdeficsEmbedding

* Fix a bug in TFIdeficsGatedCrossAttentionLayer

* Patching up build() methods

* Constant self.inv_freq

* Constant self.inv_freq

* First working version

The TF implementation works now. There was a bug in TFIdeficsDecoupledLinear
where the weights were mis-initialized as (in_features, out_features)
when they should be (out_features, in_features).

I have tested this so far with tiny-random and idefics-9b-instruct
and it gives correct output.

I also dumped the final outputs for both PyTorch and TF
and they are identical.
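
For reference, a small sketch of the shape convention the fix describes (illustrative, not the actual TFIdeficsDecoupledLinear):

```python
import tensorflow as tf


class PTCompatibleLinearSketch(tf.keras.layers.Layer):
    def __init__(self, in_features, out_features, **kwargs):
        super().__init__(**kwargs)
        self.in_features = in_features
        self.out_features = out_features

    def build(self, input_shape):
        # PyTorch's nn.Linear stores its weight as (out_features, in_features),
        # so a TF port that cross-loads those weights must create the variable
        # with the same shape and transpose in call().
        self.weight = self.add_weight(
            name="weight", shape=(self.out_features, self.in_features), trainable=True
        )
        super().build(input_shape)

    def call(self, x):
        return tf.matmul(x, self.weight, transpose_b=True)
```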

* Fix some test failures

* remove print statement

* Fix return_tensors

* Fix CI test failure check_code_quality

* Attempt to fix CI failures by running `make fixup`

The hardcoded IDs in test_modeling_tf_idefics.py are for the integration
test; they make that file unreadable and should probably be moved to a separate file.

* Attempt to fix tests_pr_documentation_tests

* Fix a test failure in test_image_processing_idefics.py

* Fix test test_pt_tf_model_equivalence

* Fix a few failures

* Tiny fix

* Some minor fixes

* Remove a duplicate test

* Override a few test failures for IDEFICS

- `test_keras_save_load` is passing now
- `test_compile_tf_model` is still failing

* Fix processing_idefics.py after rebase

* Guard import keras with is_tf_available
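
The general shape of such a guard (the exact import in the PR may differ):

```python
from transformers.utils import is_tf_available

if is_tf_available():
    import tensorflow as tf
    from tensorflow import keras
else:
    # Keep the module importable in torch-only environments; code that actually
    # needs Keras will then fail later with a clear error instead of an import error.
    tf = None
    keras = None
```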

* fix check code quality

* fix check code quality

* Minor fixes

* Skip test_save_load temporarily

This test passed on my local box but fails on the CI; skipping it
for now to see if there are other remaining failures on the CI.

* Run `ruff format tests src utils`

* Fix last failing test, `test_compile_tf_model`

* Add fixes for vision_tf.py

I forgot to add this file in last commit.

* Minor fixes

* Replace "<<<" with "<<" for doc tests

IDEFICS-9B is too big for doctest runner, so don't run it there

* Make code more readable

* Fix bug after code review

I added a layer_norm_eps to IdeficsConfig but I don't even need it
since the vision config has a layer_norm_eps.

* Fix after code review

Use the original tokenizer.convert_tokens_to_ids code

* Keep PyTorch as the default return_tensors

* Fixes to modeling_tf after code review

* Fixes from code review

- Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST`
- Pass 1e-5 to LayerNormalization in perceiver

* Run ruff

* Undo a change

* Refactor processing code after Matt's suggestion

* Remove TODO's that aren't needed anymore

* For PyTorch, use the original PyTorch processing code from main

Since this PR is a TF port it shouldn't make any modifications
to the PyTorch IDEFICS code. This change undoes the PyTorch processing
modifications I made and uses the original code from main.

* Update tests/models/idefics/test_modeling_idefics.py

* Update tests/models/idefics/test_modeling_tf_idefics.py

* Add missing imports for is_pt_tf_cross_test

* [DO NOT MERGE]: This is a commit for debugging and will be reverted

The cross test `test_pt_tf_model_equivalence` passes locally but
fails when running on the CI. This commit is to help debug that
and will be reverted.

* Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted"

This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b.

* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted

* [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted

* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"

This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89.

* Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted"

This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4.

* Don't skip test_save_load

IIRC test_save_load was also failing on the CI but not on my local
box; it might be easier to debug that on the CI first than the cross tests.

* Debugging commit, will be reverted

* Revert "Debugging commit, will be reverted"

This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5.

* Override `test_save_load` and push the model it saves

Maybe this will help me repro this weird bug

* pass my repo_id

* add endpoint

* Pass a temp (write) token just for this CI

* Undo last few commits, still pushing to hub for model debugging

The issue seems to be with save_pretrained(): when I looked at the model saved
from the CI test failure, it is basically empty and has no weights.
`self.save_weights(..)` seems to be failing in save_pretrained() but needs
more debugging.

* Add logging to modeling tf utils, will be reverted just for debugging

* Debugging, will revert

* Revert "Debugging, will revert"

This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3.

* Revert "Add logging to modeling tf utils, will be reverted just for debugging"

This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617.

* Remove `test_save_load`

The CI failures are gone after my latest rebase, no idea why,
but I was still saving the model to my Hub on HF and the tf_model.h5
file now has everything.

* Run make fix-copies

* Run ruff format tests src utils

* Debugging commit, will be reverted

* Run ruff, also trigger CI run

* Run ruff again

* Undo debugging commit

---------

Co-authored-by: Matt <[email protected]>
Co-authored-by: Matt <[email protected]>
3 people authored May 13, 2024
1 parent de2f722 commit 9430635
Showing 17 changed files with 3,392 additions and 49 deletions.
2 changes: 1 addition & 1 deletion docs/source/en/index.md
@@ -160,7 +160,7 @@ Flax), PyTorch, and/or TensorFlow.
| [HerBERT](model_doc/herbert) | ✅ | ✅ | ✅ |
| [Hubert](model_doc/hubert) | ✅ | ✅ | ❌ |
| [I-BERT](model_doc/ibert) | ✅ | ❌ | ❌ |
| [IDEFICS](model_doc/idefics) | ✅ | ❌ | ❌ |
| [IDEFICS](model_doc/idefics) | ✅ | ✅ | ❌ |
| [Idefics2](model_doc/idefics2) | ✅ | ❌ | ❌ |
| [ImageGPT](model_doc/imagegpt) | ✅ | ❌ | ❌ |
| [Informer](model_doc/informer) | ✅ | ❌ | ❌ |
10 changes: 10 additions & 0 deletions docs/source/en/model_doc/idefics.md
@@ -52,6 +52,16 @@ To train a new IDEFICS model from scratch use the m4 codebase (a link will be pr
[[autodoc]] IdeficsForVisionText2Text
    - forward

## TFIdeficsModel

[[autodoc]] TFIdeficsModel
    - call

## TFIdeficsForVisionText2Text

[[autodoc]] TFIdeficsForVisionText2Text
    - call

## IdeficsImageProcessor

[[autodoc]] IdeficsImageProcessor
14 changes: 14 additions & 0 deletions src/transformers/__init__.py
@@ -3862,6 +3862,15 @@
            "TFHubertPreTrainedModel",
        ]
    )

    _import_structure["models.idefics"].extend(
        [
            "TFIdeficsForVisionText2Text",
            "TFIdeficsModel",
            "TFIdeficsPreTrainedModel",
        ]
    )

    _import_structure["models.layoutlm"].extend(
        [
            "TFLayoutLMForMaskedLM",
@@ -7905,6 +7914,11 @@
            TFHubertModel,
            TFHubertPreTrainedModel,
        )
        from .models.idefics import (
            TFIdeficsForVisionText2Text,
            TFIdeficsModel,
            TFIdeficsPreTrainedModel,
        )
        from .models.layoutlm import (
            TFLayoutLMForMaskedLM,
            TFLayoutLMForQuestionAnswering,
2 changes: 2 additions & 0 deletions src/transformers/models/auto/modeling_tf_auto.py
@@ -58,6 +58,7 @@
        ("gptj", "TFGPTJModel"),
        ("groupvit", "TFGroupViTModel"),
        ("hubert", "TFHubertModel"),
        ("idefics", "TFIdeficsModel"),
        ("layoutlm", "TFLayoutLMModel"),
        ("layoutlmv3", "TFLayoutLMv3Model"),
        ("led", "TFLEDModel"),
@@ -112,6 +113,7 @@
        ("funnel", "TFFunnelForPreTraining"),
        ("gpt-sw3", "TFGPT2LMHeadModel"),
        ("gpt2", "TFGPT2LMHeadModel"),
        ("idefics", "TFIdeficsForVisionText2Text"),
        ("layoutlm", "TFLayoutLMForMaskedLM"),
        ("lxmert", "TFLxmertForPreTraining"),
        ("mobilebert", "TFMobileBertForPreTraining"),
30 changes: 29 additions & 1 deletion src/transformers/models/idefics/__init__.py
@@ -13,7 +13,13 @@
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import OptionalDependencyNotAvailable, _LazyModule, is_torch_available, is_vision_available
from ...utils import (
    OptionalDependencyNotAvailable,
    _LazyModule,
    is_tf_available,
    is_torch_available,
    is_vision_available,
)


_import_structure = {"configuration_idefics": ["IdeficsConfig"]}
@@ -39,6 +45,17 @@
    ]
    _import_structure["processing_idefics"] = ["IdeficsProcessor"]

try:
    if not is_tf_available():
        raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
    pass
else:
    _import_structure["modeling_tf_idefics"] = [
        "TFIdeficsForVisionText2Text",
        "TFIdeficsModel",
        "TFIdeficsPreTrainedModel",
    ]

if TYPE_CHECKING:
    from .configuration_idefics import IdeficsConfig
@@ -64,6 +81,17 @@
        )
        from .processing_idefics import IdeficsProcessor

    try:
        if not is_tf_available():
            raise OptionalDependencyNotAvailable()
    except OptionalDependencyNotAvailable:
        pass
    else:
        from .modeling_tf_idefics import (
            TFIdeficsForVisionText2Text,
            TFIdeficsModel,
            TFIdeficsPreTrainedModel,
        )

else:
    import sys
6 changes: 3 additions & 3 deletions src/transformers/models/idefics/image_processing_idefics.py
@@ -92,8 +92,9 @@ def preprocess(
        image_mean: Optional[Union[float, List[float]]] = None,
        image_std: Optional[Union[float, List[float]]] = None,
        transform: Callable = None,
        return_tensors: Optional[Union[str, TensorType]] = TensorType.PYTORCH,
        **kwargs,
    ) -> TensorType.PYTORCH:
    ) -> TensorType:
        """
        Preprocess a batch of images.
@@ -162,7 +163,6 @@ def preprocess(
        images = [self.rescale(image=image, scale=1 / 255) for image in images]
        images = [self.normalize(x, mean=image_mean, std=image_std) for x in images]
        images = [to_channel_dimension_format(x, ChannelDimension.FIRST) for x in images]
        # TODO: this converts to torch tensors - switch to convert_to_tensors once it becomes available
        images = BatchFeature(data={"pixel_values": images}, tensor_type=TensorType.PYTORCH)["pixel_values"]
        images = BatchFeature(data={"pixel_values": images}, tensor_type=return_tensors)["pixel_values"]

        return images