Features/361 pad #572

Merged
merged 86 commits into from
Sep 22, 2020

Conversation

@lenablind (Collaborator) commented May 25, 2020

Description

Implementation of the function pad for mode "constant".
The syntax is nearly the same as for numpy, while torch.nn.functional.pad is used internally.

Syntactical differences

Although numpy uses different value keywords for the corresponding modes (specifically constant_values and end_values), I decided to use a single keyword (values) for ease of use, since only one mode can be active at a time anyway.
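
For illustration, a hypothetical call mirroring the description above (the exact heat signature may differ; values is the single keyword replacing numpy's mode-specific constant_values):

```python
import numpy as np
import heat as ht

a = np.arange(6).reshape(2, 3)
# numpy: mode-specific keyword
np.pad(a, pad_width=1, mode="constant", constant_values=7)

t = ht.zeros((2, 3))
# heat (as described in this PR): a single keyword `values` for all modes
ht.pad(t, pad_width=1, mode="constant", values=7)
```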

Also, my implementation lacks two numpy keywords, as the corresponding modes are not available in this version:

  • ‘stat_length’ used in numpy modes ‘maximum’, ‘mean’, ‘median’, ‘minimum’
  • ‘reflect_type’ used in numpy modes ‘reflect’ and ‘symmetric’.

Strategy

Hint: Torch allows only one padding value to be specified for all dimensions, whereas numpy offers the possibility to define one per dimension. Therefore, to simulate the numpy functionality while keeping the performance of torch, I decided to call torch once for each value specified in values.

Preparation

  • handle the different types of numpy shortcuts for pad_width and transform them into one torch pad tuple (-> shortcuts: see numpy docs)
  • handle the different types of numpy shortcuts for values and transform them into one tuple (value_tuple) if several values are included (-> shortcuts: see numpy docs); a sketch of the shortcut handling follows this list
  • calculate the gshape of the resulting DNDarray
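
A minimal sketch of the pad_width shortcut handling (not the PR's actual code; the helper name and exact conventions are illustrative), assuming numpy-style shortcuts as input and a flat torch pad tuple as output:

```python
import numpy as np

def to_torch_pad_tuple(pad_width, ndim):
    # Normalize the numpy shortcuts (scalar, (before, after), per-axis pairs)
    # into an (ndim, 2) layout of (before, after) pairs.
    per_axis = np.broadcast_to(np.asarray(pad_width), (ndim, 2)).tolist()
    # torch.nn.functional.pad expects a flat tuple that starts with the LAST
    # dimension: (last_before, last_after, ..., first_before, first_after).
    flat = []
    for before, after in reversed(per_axis):
        flat.extend((before, after))
    return tuple(flat)

# to_torch_pad_tuple(1, 2)                 -> (1, 1, 1, 1)
# to_torch_pad_tuple(((1, 2), (3, 4)), 2)  -> (3, 4, 1, 2)
```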

Actual Padding

CASE 0 : input tensor contains no data

  • Return the empty tensor with the adapted lshape (necessary for remapping in case of distribution and general consistency)

CASE 1 : Padding in non-split dimension or no distribution at all

  • If only one value is specified for all dimensions, pad the tensor with torch as usual; otherwise:
  • iterate through value_tuple in reverse order (since torch starts padding with the last dimension, contrary to numpy) and call the torch pad function with the corresponding value, the pad tuple and the progressively padded tensor (see the sketch after this list).
    In other words, each dimension is padded with the value specified for it in value_tuple.
    This is necessary to provide the numpy functionality (-> hint above).
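
A minimal sketch of this per-dimension padding loop (illustrative only, not the PR's code; it assumes value_tuple holds one constant per dimension in numpy order and pad_tuple is already in torch order):

```python
import torch
import torch.nn.functional as F

def pad_constant_per_dim(tensor, pad_tuple, value_tuple):
    # torch accepts only a single constant per call, so pad one dimension at a time.
    padded = tensor
    for i, value in enumerate(reversed(value_tuple)):
        # keep only the (before, after) pair of the current dimension,
        # zero out all other entries of the torch pad tuple
        dim_pad = [0] * len(pad_tuple)
        dim_pad[2 * i] = pad_tuple[2 * i]
        dim_pad[2 * i + 1] = pad_tuple[2 * i + 1]
        padded = F.pad(padded, dim_pad, mode="constant", value=value)
    return padded

# pad_constant_per_dim(torch.zeros(2, 3), (1, 1, 2, 2), (7, 9))
# pads dimension 1 with 9s and dimension 0 with 7s.
```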

CASE 2 : Padding in split dimension and function runs on more than 1 process

  • Pad only the first/last tensor portion on a node (i.e. only the beginning/end in the split dimension)
  • "Calculate" the pad_tuple for the corresponding tensor portion, i.e. the two indices which have to be set to zero in the original/undistributed pad_tuple, depending on the dimension:
    Therefore: calculate the index of the first element in the pad tuple that has to be set to zero (the following one is the second).
    The pad tuples can hereby be divided into three categories:
    • pad_beginning (first process)
    • pad_end (last process)
    • pad_middle (all other processes)
      This is only a mathematical transcription of the manner in which each tensor chunk has to be padded (a rough sketch follows this list).
  • Balance the tensor and return it
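
A rough sketch of this case (names and index computation are illustrative, not the PR's code; it assumes pad_tuple is in torch order, split is the split dimension, and more than one process is involved):

```python
def local_pad_tuple(pad_tuple, split, ndim, rank, nprocs):
    # Index of the "before" entry of the split dimension: torch pad tuples
    # address the last dimension first, so dimension `split` starts here.
    first = 2 * (ndim - 1 - split)
    local = list(pad_tuple)
    if rank == 0:                 # pad_beginning: keep "before", drop "after"
        local[first + 1] = 0
    elif rank == nprocs - 1:      # pad_end: keep "after", drop "before"
        local[first] = 0
    else:                         # pad_middle: no padding in the split dimension
        local[first] = 0
        local[first + 1] = 0
    return tuple(local)

# e.g. 3 processes, ndim=2, split=0, pad_tuple=(1, 1, 2, 2):
# rank 0 -> (1, 1, 2, 0), rank 1 -> (1, 1, 0, 0), rank 2 -> (1, 1, 0, 2)
```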

Docs numpy: https://numpy.org/devdocs/reference/generated/numpy.pad.html
Docs pytorch: https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.pad

Issue/s resolved: #361

Changes proposed:

  • Additional modes (PyTorch itself offers far fewer than numpy)

Overview of modes (and their differences between numpy and torch)

Equivalent modes in numpy and torch

These might be implemented most easily, though there are some restrictions.
More specifically, only 3D, 4D and 5D inputs are currently supported by torch for non-constant padding. Additionally, some scalability issues might occur for these modes.
For example, padding a 9-element tensor with 'reflect' might already result in a RuntimeError.
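
A hedged illustration of these restrictions (torch behavior at the time of writing; exact error types and messages may differ between versions):

```python
import torch
import torch.nn.functional as F

t = torch.arange(9.0)                # 9-element 1D tensor
# F.pad(t, (2, 2), mode="reflect")   # fails: non-constant modes required a 3D/4D/5D input
t3 = t.reshape(1, 1, 9)              # add batch and channel dimensions
F.pad(t3, (8, 8), mode="reflect")    # works: padding size < input size
# F.pad(t3, (9, 9), mode="reflect")  # RuntimeError: padding must be smaller than the input dimension
```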

| Numpy | Torch | Description | Available dimensions (Torch) |
| --- | --- | --- | --- |
| constant | constant | Pads the input tensor boundaries with a constant value | arbitrary |
| reflect | reflect | Pads the input tensor using the reflection of the input boundary | last 2 of 4D, last of 3D |
| edge | replicate | Pads the input tensor using the replication of the input boundary | last 3 of 5D, last 2 of 4D |
| wrap | circular | Pads with the wrap of the vector along the axis; the first values are used to pad the end and the end values are used to pad the beginning | |

Numpy modes which might result in constant padding with calculated padding values

| Mode | Pads with the ... |
| --- | --- |
| linear_ramp | ... linear ramp between end_value and the array edge value |
| maximum | ... maximum value of all or part of the vector along each axis |
| mean | ... mean value of all or part of the vector along each axis |
| median | ... median value of all or part of the vector along each axis |
| minimum | ... minimum value of all or part of the vector along each axis |

The referred 'part of the vector' can furthermore be specified with the corresponding keyword
(-> see numpy docs).
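
For example, numpy computes the fill values for these modes from the array itself:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
np.pad(a, 2, mode="mean")       # -> [2.5, 2.5, 1., 2., 3., 4., 2.5, 2.5]
np.pad(a, 2, mode="maximum")    # -> [4., 4., 1., 2., 3., 4., 4., 4.]
np.pad(a, 2, mode="linear_ramp", end_values=0)
                                # -> [0., 0.5, 1., 2., 3., 4., 2., 0.]
```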

Type of change

  • New feature (non-breaking change which adds functionality)

Due Diligence

  • All split configurations tested
  • Multiple dtypes tested in relevant functions
  • Documentation updated (if needed)
  • Updated changelog.md under the title "Pending Additions"

Does this change modify the behaviour of other functions? If so, which?

no

@ClaudiaComito (Contributor) left a comment:

Good job @lenablind, needs a few more changes!

@mtar (Collaborator) commented Aug 31, 2020

GPU cluster tests are currently disabled on this Pull Request.

@lenablind (Collaborator, Author) replied, quoting @mtar:
GPU cluster tests are currently disabled on this Pull Request.

@mtar Thank you for letting me know. Is there a reason for that or to put it differently, are these needed for this PR and if that is the case, could you explain to me why?

@mtar (Collaborator) commented Aug 31, 2020

The CI system that I was setting up recently has a life of its own 😃
It will be important in the future.

@mtar (Collaborator) commented Sep 21, 2020

ok to test

@mtar (Collaborator) commented Sep 22, 2020

rerun tests

ClaudiaComito previously approved these changes Sep 22, 2020
@coquelin77 merged commit a9efe03 into master on Sep 22, 2020
@coquelin77 deleted the features/361-pad branch on September 22, 2020 12:23
Successfully merging this pull request may close these issues:

implement pad/padding function