
Features/584 data parallel #660

Merged
merged 223 commits into from
Feb 26, 2021

Conversation

@coquelin77 (Member) commented Aug 31, 2020

Description

Implementation of Data Parallel Neural Networks. Documentation on how to use it is still being written. This is not yet a final iteration; many things will be tuned in the background.

Issue/s resolved: #584 #585 #603 #604 #605 #606

Changes proposed:

  • Implement DataParallel class (a usage sketch follows this list)
  • Implement DataLoader class
  • Implement Dataset class (template)
  • Call through to torch.nn modules and torch.nn.functional routines
  • Added MNIST example as a working data parallel NN to examples/nn/mnist
  • MNISTDataset added for loading data and preparing it for the network architecture
  • ImageNet example added
  • Partial data loading from an HDF5 file is made available with an iterator and dataset class (sketched below)
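As a rough illustration of what the DataParallel/DataLoader combination is meant to do, below is a minimal sketch of blocking data-parallel training written against plain PyTorch and mpi4py. It is not the API added by this PR; the model, shapes, and names are placeholders, and the real implementation handles the gradient synchronization internally.

```python
# Minimal sketch of the data-parallel pattern: every rank trains on its own
# shard of the data, and gradients are averaged across ranks after backward().
# Plain PyTorch + mpi4py for illustration only; NOT the Heat API in this PR.
import torch
import torch.nn as nn
from mpi4py import MPI

comm = MPI.COMM_WORLD
nprocs = comm.size

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Placeholder shard: in the actual examples each process loads only its own
# slice of the MNIST samples.
local_x = torch.randn(64, 784)
local_y = torch.randint(0, 10, (64,))

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(local_x), local_y)
    loss.backward()
    # Average gradients over all ranks so every process applies the same update.
    for p in model.parameters():
        comm.Allreduce(MPI.IN_PLACE, p.grad.numpy(), op=MPI.SUM)
        p.grad /= nprocs
    optimizer.step()
```

Run with, e.g., `mpirun -np 4 python train.py`; each rank would load a different shard of the dataset.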
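The last bullet describes partial loading from HDF5. A minimal sketch of that pattern, using h5py and a standard torch Dataset rather than the classes added here (file path and dataset keys are placeholders):

```python
# Lazy HDF5-backed dataset: only the requested slice is read from disk per access,
# so the whole file never has to fit in memory.
import h5py
import torch
from torch.utils.data import Dataset

class LazyHDF5Dataset(Dataset):
    """Reads samples from an HDF5 file on demand instead of loading it fully."""

    def __init__(self, path, data_key="data", label_key="labels"):
        self.path = path
        self.data_key = data_key
        self.label_key = label_key
        with h5py.File(path, "r") as f:
            self.length = f[data_key].shape[0]
        self._file = None  # opened lazily, once per worker process

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        x = torch.as_tensor(self._file[self.data_key][idx])
        y = torch.as_tensor(self._file[self.label_key][idx])
        return x, y
```

A DataLoader over such a dataset then streams batches without ever holding the full file in memory.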

Type of change

  • New feature (non-breaking change which adds functionality)
  • Documentation update

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

Notes

There will be backend changes for efficiency coming in the future. However, this feature is becoming rather large, and a review/merge now will make it more manageable going forward.

  • changelog still to be fixed!

coquelin77 and others added 30 commits June 16, 2020 13:51
@coquelin77 (Member, Author) commented:

This branch is up to date and the tests run clean. There is no way to increase the coverage for this branch without overhauling how coverage is reported.

It is also important to note that since this development, there have been many improvements to the data parallel neural networks which build on these additions. If there are questions or comments on this PR, I will give it until the end of the year; then it will be merged into master. I also apologize for the length of this PR; it was unavoidable for this feature.

@coquelin77 (Member, Author) commented:

rerun tests

1 similar comment

@coquelin77 (Member, Author) commented:

This is going to be forced in. Other changes are waiting and can be made on the fly.

Labels
testing: Implementation of tests, or test-related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature: DataParallel Class
3 participants