Train_DataLoader's sampler changed from RandomSampler -> SequentialSampler #25862

Closed
2 of 4 tasks
dumpmemory opened this issue Aug 30, 2023 · 3 comments

@dumpmemory (Contributor)

System Info

  • transformers version: 4.32.1
  • Platform: Linux-5.4.119-19.0009.28-x86_64-with-glibc2.35
  • Python version: 3.10.6
  • Huggingface_hub version: 0.16.4
  • Safetensors version: 0.3.3
  • Accelerate version: 0.22.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0a0+b5021ba (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I found that after

train_dataloader = self.get_train_dataloader()

the output of

logger.info(
    f"{type(train_dataloader)}, {type(train_dataloader.sampler)}, {type(train_dataloader.batch_sampler)}"
)

is

<class 'accelerate.data_loader.DataLoaderShard'>, <class 'torch.utils.data.sampler.SequentialSampler'>,<class 'accelerate.data_loader.BatchSamplerShard'>

The arguments used to build the train_dataloader are:

{'batch_size': 4, 'collate_fn': <function default_data_collator at 0x7f404cf33520>, 'num_workers': 0, 'pin_memory': True, 'sampler': <torch.utils.data.sampler.RandomSampler object at 0x7f404cbd26e0>, 'drop_last': False, 'worker_init_fn': <function seed_worker at 0x7f4061da8820>}

Why did the sampler change from RandomSampler to SequentialSampler?
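
For reference, the same type reports can be reproduced outside of Trainer with plain Accelerate (a minimal sketch, assuming a toy TensorDataset in place of the real training set; Trainer passes its dataloader through accelerator.prepare in the same way):

import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset
from accelerate import Accelerator

# Toy stand-in for the real training set.
dataset = TensorDataset(torch.arange(100, dtype=torch.float32))
dataloader = DataLoader(dataset, batch_size=4, sampler=RandomSampler(dataset))

prepared = Accelerator().prepare(dataloader)
print(type(prepared))                # accelerate.data_loader.DataLoaderShard
print(type(prepared.sampler))        # SequentialSampler (see the discussion below)
print(type(prepared.batch_sampler))  # BatchSamplerShard in multi-process runs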

Expected behavior

The sampler should stay a RandomSampler.

@amyeroberts (Collaborator)

cc @muellerzr @pacman100

@dumpmemory (Contributor, Author)

It might be related to https://github.com/huggingface/accelerate/blob/69e4c3c54da3201eda288b500d138761e7a5221c/src/accelerate/data_loader.py#L709

I am checking train_dataloader.batch_sampler.batch_sampler.

@dumpmemory (Contributor, Author)

train_dataloader.batch_sampler.batch_sampler.sampler is a torch.utils.data.sampler.RandomSampler, so the original random sampling is still in place.
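
That matches PyTorch's DataLoader behavior: when a DataLoader is rebuilt around a batch_sampler (as Accelerate does when it wraps the original batch sampler in a BatchSamplerShard), the sampler argument must be None, and PyTorch fills the .sampler attribute with a default SequentialSampler that is never actually consulted. A minimal sketch of just that fallback:

from torch.utils.data import DataLoader

# The batch_sampler alone determines iteration order here, so the
# .sampler attribute is only an unused default placeholder.
loader = DataLoader(list(range(8)), batch_sampler=[[0, 1], [2, 3], [4, 5], [6, 7]])
print(type(loader.sampler))  # <class 'torch.utils.data.sampler.SequentialSampler'>

So training randomness is unaffected; the surprising SequentialSampler only appears on the outer wrapper.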
