This repository has been archived by the owner on Jul 1, 2024. It is now read-only.

implement sparse categorical crossentropy, enable unitests #145

Merged
merged 10 commits into from
Jul 29, 2018

Conversation

@roywei roywei commented Jul 25, 2018

This PR completes the implementation of sparse categorical crossentropy.

  1. Implement sparse categorical crossentropy.

  2. Fix Bug in K.equal operator #146, found while fixing sparse categorical accuracy; fixed all element-wise operators.
    K.equal() can now handle two variables with different shapes using broadcast operations.

  3. Enabled unit tests:

  • test_training.py: test_model_with_crossentropy_losses_channels_first
  • losses_test.py: test_sparse_categorical_crossentropy
  • losses_test.py: test_cce_one_hot
  • losses_test.py: test_sparse_categorical_crossentropy_4d
  • metrics_test.py: test_sparse_metrics: sparse_categorical_crossentropy
  • metrics_test.py: test_sparse_metrics: sparse_categorical_accuracy

  4. Enabled/tested end-to-end examples:

  • examples/mnist_acgan.py

  5. Performance tests:

  • Comparison with categorical_crossentropy: sparse categorical crossentropy is faster when dealing with a large number of exclusive labels, 1.78x faster with 1000 classes and 10000 samples. See detailed results in the comments.
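The sparse loss itself boils down to integer indexing plus a negative log; a minimal NumPy sketch (the helper name is hypothetical; the actual MXNet backend uses mx.sym.pick on symbols):

```python
import numpy as np

def sparse_categorical_crossentropy_np(y_true, y_pred, eps=1e-7):
    """NumPy sketch: pick each sample's predicted probability at its
    integer label, then take the negative log."""
    # scale predictions so each sample's class probabilities sum to 1
    y_pred = y_pred / y_pred.sum(axis=-1, keepdims=True)
    # clip to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # integer indexing replaces the one-hot multiply-and-sum
    picked = y_pred[np.arange(len(y_true)), y_true]
    return -np.log(picked)

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = sparse_categorical_crossentropy_np(labels, probs)
```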

@roywei roywei mentioned this pull request Jul 27, 2018
@sandeep-krishnamurthy
Super cool work @roywei

@@ -1614,11 +1614,14 @@ def equal(x, y):
if isinstance(y, KerasSymbol):
y = y.symbol
scalar = True
- if scalar:
+ if scalar and x.infer_shape() == y.infer_shape():


There is no need for this if..else. broadcast_equal will work for both same shape and shapes that require broadcasting.


Same for all other operators below.
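The same observation holds in NumPy: a single broadcasting comparison covers both same-shape and broadcastable inputs, so no shape check is needed:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

# same shape: plain element-wise comparison
same = np.equal(a, np.array([[1, 0],
                             [3, 4]]))

# different shapes: (2, 2) vs (2,) broadcasts across rows
bcast = np.equal(a, np.array([1, 4]))
```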

@@ -2902,7 +2927,29 @@ def sparse_categorical_crossentropy(target, output, from_logits=False, axis=-1):
# Returns
Output tensor.
"""
raise NotImplementedError('MXNet Backend: Sparse operations are not supported yet.')
output_dimensions = list(range(len(int_shape(output))))


len(int_shape(output)) => ndim(output)?

if isinstance(x, mx.sym.Symbol) and isinstance(y, mx.sym.Symbol):
# use broadcasting to do element-wise comparison if both x and y are mxnet symbol


Usually it is best practice not to have multiple returns in the same function (hard to maintain and debug). Can you restructure? Same in other places.
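A minimal sketch of the single-return restructure being suggested (illustrative only, not the backend's actual code): compute the result in each branch and return once at the end:

```python
import numpy as np

def greater(x, y):
    """Element-wise greater-than with a single exit point."""
    if isinstance(x, np.ndarray) or isinstance(y, np.ndarray):
        # array path: NumPy broadcasts mismatched shapes
        out = np.greater(x, y)
    else:
        # scalar path
        out = x > y
    return out
```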

@roywei roywei changed the title [WIP]implement sparse categorical crossentropy, enable unitests implement sparse categorical crossentropy, enable unitests Jul 27, 2018
roywei commented Jul 27, 2018

@sandeep-krishnamurthy @kalyc
PR is ready for review; comments addressed. I ran a performance benchmark using the following script: sparse categorical crossentropy is 1.78x faster.

import time

import numpy as np
from keras import backend as K
from keras import losses

num_classes = 1000
num_samples = 10000
pred = np.random.dirichlet(np.ones(num_classes), size=num_samples)
# note: labels here are drawn from the first 10 classes only
true = np.random.randint(0, 10, (num_samples,))
y_pred = K.variable(pred)
y_true = K.variable(true)

# sparse path: integer labels, no one-hot conversion needed
start = time.time()
loss = K.eval(losses.sparse_categorical_crossentropy(y_true, y_pred))
end = time.time()
print("sparse_categorical_crossentropy result:", np.mean(loss))
print("sparse_categorical_crossentropy time:", end - start)

# dense path: convert labels to one-hot first
start = time.time()
y_true = K.one_hot(y_true, num_classes)
loss = K.eval(losses.categorical_crossentropy(y_true, y_pred))
end = time.time()
print("categorical_crossentropy result:", np.mean(loss))
print("categorical_crossentropy time:", end - start)

Output:

sparse_categorical_crossentropy result: 7.4769287
sparse_categorical_crossentropy time: 0.14634299278259277
categorical_crossentropy result: 7.4769287
categorical_crossentropy time: 0.2528080940246582
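The identical loss values are expected: picking the probability at the integer label is mathematically the same as the one-hot multiply-and-sum; it just skips materializing the one-hot matrix. A NumPy sanity check (illustrative only, not the backend code):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_samples = 100, 1000
pred = rng.dirichlet(np.ones(num_classes), size=num_samples)
true = rng.integers(0, num_classes, num_samples)

# sparse path: pick each row's probability at its integer label
sparse_loss = -np.log(pred[np.arange(num_samples), true])

# dense path: one-hot multiply-and-sum
one_hot = np.eye(num_classes)[true]
dense_loss = -np.log((pred * one_hot).sum(axis=-1))

# both paths recover the same per-sample losses
assert np.allclose(sparse_loss, dense_loss)
```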

@roywei roywei requested a review from kalyc July 27, 2018 21:15
@kalyc kalyc left a comment


Thanks for the contributions, added comments inline

try:
out = np.greater(x, y)
except:
raise TypeError('MXNet Backend: The inputs are not valid for not_equal operation.')

modify error message to reflect operator name - 'The inputs are not valid for greater operation'

try:
out = np.less(x, y)
except:
raise TypeError('MXNet Backend: The inputs are not valid for not_equal operation.')

The inputs are not valid for less operation

try:
out = np.less(x, y)
except:
raise TypeError('MXNet Backend: The inputs are not valid for not_equal operation.')

less_equal operation*


@kalyc kalyc left a comment


Added a few more comments

out = KerasSymbol(mx.sym.Cast(mx.sym.broadcast_lesser_equal(lhs=x, rhs=y), dtype='uint8'))
elif scalar:
# directly use '<=' operator for element-wise comparison
out = KerasSymbol(mx.sym.Cast(x <= y, dtype='uint8'))

[minor] check style


[adjust spaces around '=']

Author

@kalyc usually there is no space around '=' for optional parameter values; refer to any of the other operators.

# directly use '<' operator for element-wise comparison
out = KerasSymbol(mx.sym.Cast(x < y, dtype='uint8'))
else:
# use numpy if x and x are all numbers or numpy arrays

if x & y*

else:
raise TypeError('MXNet Backend: The inputs are not valid for greater_equal operation.')
# use numpy if x and x are all numbers or numpy arrays

if x & y*

else:
raise TypeError('MXNet Backend: The inputs are not valid for greater operation.')
# use numpy if x and x are all numbers or numpy arrays

if x and y*

else:
raise TypeError('MXNet Backend: The inputs are not valid for equal operation.')
# use numpy if x and x are all numbers or numpy arrays

if x and y*

else:
raise TypeError('MXNet Backend: The inputs are not valid for not_equal operation.')
# use numpy if x and x are all numbers or numpy arrays

if x and y*

if isinstance(x, mx.sym.Symbol) and isinstance(y, mx.sym.Symbol):
return KerasSymbol(mx.sym.Cast(mx.sym.broadcast_greater(lhs=x, rhs=y), dtype='uint8'))
# use broadcasting to do element-wise comparison if both x and y are mxnet symbol

Please follow the docstring conventions described here - https://www.python.org/dev/peps/pep-0257/#one-line-docstrings

Also, the comments are not necessary per operator; consider adding a blurb about how the operators are implemented for each data type at the beginning of the file.

'which has {} dimensions.'.format(len(int_shape(output)))))

mx_output = output.symbol
# scale predictions so that the class probas of each sample sum to 1

see comment above about writing the docstring - https://www.python.org/dev/peps/pep-0257/#one-line-docstrings

Author

@kalyc compared to the element-wise comparison operators (e.g. K.equal), sparse_categorical_crossentropy and categorical_crossentropy are more complicated, and each backend needs different processing logic. It's important to keep the inline comments; moving them into the docstring would confuse users. See tensorflow_backend.py for reference.


Thanks for the explanation!
[minor] change probas to probabilities

mx_output = mx.sym.clip(mx_output, a_min=epsilon(), a_max=1.0 - epsilon())
# For this operation, the probability of a given label is considered exclusive.
mx_output = mx.sym.pick(mx_output, target.symbol, axis=axis, keepdims=True)
mx_output = - mx.sym.log(mx_output, axis=axis)

what does this do? Why are you making this value negative? Why not use mx.sym here instead of -?


@kalyc kalyc left a comment


Why not commit the benchmarking script in this dir? - https://github.com/awslabs/keras-apache-mxnet/tree/master/benchmark/scripts


mx_output = mx.sym.broadcast_div(mx_output, mx.sym.sum(mx_output,
axis=axis,
keepdims=True))
# clip to prevent NaN's and Inf's

'infinities'

Author

'nan' and 'inf' are Python float values
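The clipping in question matters because log(0) is negative infinity, which would poison the mean loss; a small NumPy illustration:

```python
import numpy as np

eps = 1e-7
probs = np.array([0.0, 0.5, 1.0])

with np.errstate(divide="ignore"):
    raw = -np.log(probs)  # -log(0) is inf and would dominate any mean
# clipping into [eps, 1 - eps] keeps every loss term finite
clipped = -np.log(np.clip(probs, eps, 1.0 - eps))
```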


roywei commented Jul 27, 2018

@kalyc addressed comments.
The performance script is just a simple sanity check of the operator.
Only complete, useful models should be added to the benchmark scripts.


@kalyc kalyc left a comment


LGTM!

@sandeep-krishnamurthy sandeep-krishnamurthy merged commit 10e7087 into awslabs:dev2 Jul 29, 2018
roywei added a commit that referenced this pull request Aug 8, 2018
* Add saving mxnet model checkpoint callback (#132)

* Add saving mxnet model checkpoint callback

* Add tests for MXNetModelCheckpoint callback

* Fix MXNetModelCheckpoint test case

* Fix MXNetModelCheckpoint tests. Add dependency on keras_applications and keras_preprocessing

* Fixed CR comments. Split tests into multiple independent tests

* Fix CR comments on the code documentation

* Add additional test to verify only one model is saved

* Add examples of monitors

* update pr and nightly buildspec,add into source control (#141)

* Fix batchnorm gamma (#137)

* fix gamma and beta equal to None

* fix style

* fix initializer, enable unit test

* update comments

* remove +, remove repeated install, add clear message (#142)

* fix conv1d channels first (#143)

* fix conv1d channels first

* update data format for causal test

* fix style

* Adding get_mxnet_model_info API to allow users to query underlying MXNet model info (#144)

* Adding get_mxnet_model_info API to allow users to query underlying MXNet model info

* resolve merge conflicts in conv1d

* Add more tests - functional model, compare with return values of save_mxnet_model API

* update save mxnet model API document. (#147)

Co-authored-by: Sandeep Krishnamurthy <[email protected]>

* implement sparse categorical crossentropy, enable unitests (#145)

* implement sparse categorical crossentropy, enable unitests

* fix elementwise operators, fix sparse categorical accuract, enabled unitests

* fix element wise opreators

* simplify using ndim

* fix style

* reduce number of returns

* fix operator name in error message

* update comments and doc string

* update comment spelling

* update documentation, improve CI test commands (#151)

* update documentation, improve CI test commands

* fix conv1d initialization, fix conv1d unit test

* fix documentation hyperlink

* update buildspec

* remove official keras installed with dependencies

* update ci commands
sandeep-krishnamurthy pushed a commit that referenced this pull request Aug 14, 2018