support using cached data and re-splitting for huggingface datasets #302

yxdyc · 2022-08-08T12:16:12Z

This PR adds support for:

- using cached data for huggingface datasets;
- re-splitting the GLUE versions into custom FL versions.
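
The description does not spell out how the re-splitting is done. As a rough illustration only: GLUE's public test splits ship without gold labels, so one plausible approach is to pool the labeled examples and cut them again by ratio. The sketch below assumes that approach; `resplit` and the 0.8/0.1/0.1 ratios are hypothetical and are not taken from this PR.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def resplit(examples: Sequence[T],
            splits: Sequence[float] = (0.8, 0.1, 0.1),
            seed: int = 0) -> Dict[str, List[T]]:
    """Shuffle labeled examples and cut them into train/val/test by ratio.

    Hypothetical helper: the name, defaults, and strategy are assumptions.
    """
    if len(splits) != 3 or abs(sum(splits) - 1.0) > 1e-8:
        raise ValueError("splits must be three ratios summing to 1")
    shuffled = list(examples)
    # A seeded RNG keeps the split reproducible across runs and clients.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * splits[0])
    n_val = int(len(shuffled) * splits[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        # The remainder goes to test, so rounding never drops an example.
        "test": shuffled[n_train + n_val:],
    }
```

Fixing the seed matters in an FL setting, since every participant that re-splits locally needs to arrive at the same partition.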

rayrayraykk

Good job. Using the cache should speed up loading of the huggingface data files.

rayrayraykk · 2022-08-08T12:30:36Z

federatedscope/core/auxiliaries/data_builder.py

-            tokenizer = AutoTokenizer.from_pretrained(
-                config.model.type.split('@')[0])
+
+            try:


Why is a try needed here?

yxdyc · Aug 10, 2022

In case the cached file does not exist.
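
The diff above is truncated, so the actual `try` body is not visible here. A minimal sketch of the pattern being discussed, assuming the intent is "use the local cache if possible, otherwise load normally": `load_with_fallback` is a hypothetical helper, and catching `OSError` is an assumption about what a missing cache raises.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def load_with_fallback(load_cached: Callable[[], T],
                       load_remote: Callable[[], T]) -> T:
    """Try the local cache first; fall back to a regular load if it fails."""
    try:
        return load_cached()
    except OSError as err:  # FileNotFoundError is a subclass of OSError
        logger.warning("cache load failed (%s); falling back", err)
        return load_remote()


# Hypothetical usage with transformers (not executed here):
#   name = config.model.type.split('@')[0]
#   tokenizer = load_with_fallback(
#       lambda: AutoTokenizer.from_pretrained(name, local_files_only=True),
#       lambda: AutoTokenizer.from_pretrained(name))
```

Attempting the load and handling the failure avoids a separate existence check, which would still miss a partially written or corrupted cache.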

rayrayraykk · 2022-08-08T12:31:04Z

federatedscope/core/auxiliaries/data_builder.py

@@ -402,6 +413,7 @@ def load_torch_geometric_data(name, splits=None, config=None):

    def load_huggingface_datasets_data(name, splits=None, config=None):
        from datasets import load_dataset
+        from datasets import load_from_disk


Merge lines 415 and 416 into a single import statement.
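
For illustration, a sketch of how the merged import and a disk cache might fit together. `has_local_copy`, `load_glue_task`, and the `cache_dir` argument are hypothetical names, and the save-on-first-load behavior is an assumption rather than what this PR implements. `load_dataset`, `load_from_disk`, and `save_to_disk` are existing `datasets` calls.

```python
import os
from typing import Optional


def has_local_copy(cache_dir: Optional[str]) -> bool:
    """Return True if cache_dir names an existing directory."""
    return bool(cache_dir) and os.path.isdir(cache_dir)


def load_glue_task(name: str, cache_dir: Optional[str] = None):
    """Load a GLUE task, preferring a copy previously saved to disk."""
    # Function-local import, as in the diff, merged into one statement.
    from datasets import load_dataset, load_from_disk

    if has_local_copy(cache_dir):
        # Reads a dataset that was written earlier with save_to_disk().
        return load_from_disk(cache_dir)
    dataset = load_dataset("glue", name)
    if cache_dir:
        # Assumption: persist the download so the next call hits the disk copy.
        dataset.save_to_disk(cache_dir)
    return dataset
```

Keeping the import inside the function means `datasets` is only required when a huggingface dataset is actually requested.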

rayrayraykk

LGTM

yxdyc added 2 commits August 8, 2022 20:11

support using cached data for huggingface datasets; and re-splitting the GLUE versions into custom FL versions

0df2b4e

support using cached data for huggingface datasets; and re-splitting the GLUE versions into custom FL versions

2acbeb4

yxdyc added the enhancement label Aug 8, 2022

yxdyc requested review from rayrayraykk and xieyxclack August 8, 2022 12:22

rayrayraykk reviewed Aug 8, 2022

yxdyc added 3 commits August 10, 2022 16:47

Merge remote-tracking branch 'upstream/master' into Feature/dataset_enhance

f796bd8

minor fix according to weirui's comments

795da26

minor fix for unittest

d2056dd

rayrayraykk approved these changes Aug 22, 2022

rayrayraykk merged commit da04d7d into alibaba:master Aug 22, 2022

Schichael pushed a commit to Schichael/FederatedScope_thesis that referenced this pull request Sep 7, 2022

support using cached data and re-splitting for huggingface datasets (alibaba#302)

12e28e9
