CV collection: image classification #654

Merged Jun 3, 2020 · 34 commits
Commits
44077f9
CV collection init: MNIST image classification
tkornuta-nvidia May 22, 2020
519b663
Ported FFN and TensorReshaper, MNIST classification working on CPU
tkornuta-nvidia May 22, 2020
56fc0b3
reformatted code
tkornuta-nvidia May 22, 2020
6f99dcb
convnet encoder ported, example working... but showing that CNNs are …
tkornuta-nvidia May 22, 2020
9f037f6
format fix
tkornuta-nvidia May 22, 2020
f244a30
Trainable NM fix - removing no_grad()
tkornuta-nvidia May 22, 2020
4fdaa03
CIFAR10 working
tkornuta-nvidia May 22, 2020
8d051ca
Made the types of FFN and ReshapeTensor more
tkornuta-nvidia May 23, 2020
364aef4
formatting fix
tkornuta-nvidia May 23, 2020
eee5915
LGTM unused import fixes
tkornuta-nvidia May 23, 2020
cf237a4
LGTM fixes: unused variable in the loop
tkornuta-nvidia May 23, 2020
b165288
GenericImageEncoder ported + CIFAR10 VGG16 classification example
tkornuta-nvidia May 23, 2020
a58d3dd
Merge branch 'master' of github.com:NVIDIA/NeMo into dev-cv-image-cla…
tkornuta-nvidia May 23, 2020
f0331a9
Added NonLinearity component, simplified the FFN, cifar10 - ResNet50 …
tkornuta-nvidia May 23, 2020
7ab19d7
LGTM fixes
tkornuta-nvidia May 23, 2020
37ed483
Stronger typing in CV modules and examples, introduced several new El…
tkornuta-nvidia May 28, 2020
050b424
formatting
tkornuta-nvidia May 28, 2020
e43d428
Merge branch 'master' of github.com:NVIDIA/NeMo into dev-cv-image-cla…
tkornuta-nvidia May 28, 2020
5566638
updated requirements, docs, setup, added information about CV collect…
tkornuta-nvidia May 28, 2020
af20a4f
updated description in changelog
tkornuta-nvidia May 28, 2020
516642d
minor comment polish
tkornuta-nvidia May 28, 2020
ee6be29
rst fix
tkornuta-nvidia May 28, 2020
be0c6ea
minor nemo typing fix - imagetype
tkornuta-nvidia May 28, 2020
44cfdea
Merge branch 'master' of github.com:NVIDIA/NeMo into dev-cv-image-cla…
tkornuta-nvidia Jun 2, 2020
ee6ec4a
polished datalayers, added CIFAR100, added Index and Label types, pol…
tkornuta-nvidia Jun 2, 2020
3730537
formatting fix
tkornuta-nvidia Jun 2, 2020
456c6d3
GenericImageEncoder -> ImageEncoder, updated readme file
tkornuta-nvidia Jun 2, 2020
d3ab6ff
changed assert to get_value_from_dict
tkornuta-nvidia Jun 2, 2020
2160027
formatting fix
tkornuta-nvidia Jun 2, 2020
8c3a9fd
added python 3 typing to all inits, fixed LGTM issue, formatted
tkornuta-nvidia Jun 2, 2020
2cd0ac5
Updated docstrings
tkornuta-nvidia Jun 2, 2020
bba75f9
raise ConfigurationError
tkornuta-nvidia Jun 2, 2020
6613d40
reshape tensor docstring update
tkornuta-nvidia Jun 2, 2020
eee4c1e
Label -> StringLabel, description of ImageEncoder
tkornuta-nvidia Jun 3, 2020
9 changes: 2 additions & 7 deletions CHANGELOG.md
@@ -79,6 +79,8 @@ To release a new version, please update the changelog as followed:
 - ContextNet Encoder + Decoder Initial Support ([PR #630](https://github.com/NVIDIA/NeMo/pull/630)) - @titu1994
 - Added finetuning with Megatron-LM ([PR #601](https://github.com/NVIDIA/NeMo/pull/601)) - @ekmb
 - Added documentation for 8 kHz model ([PR #632](https://github.com/NVIDIA/NeMo/pull/632)) - @jbalam-nv
+- The Neural Graph is a high-level abstract concept empowering the users to build graphs consisting of many, interconnected Neural Modules. A user in his/her application can build any number of graphs, potentially spanning over the same modules. The import/export options combined with the lightweight API make Neural Graphs a perfect tool for rapid prototyping and experimentation. ([PR #413](https://github.com/NVIDIA/NeMo/pull/413)) - @tkornuta-nvidia
+- Created the NeMo CV collection, added MNIST and CIFAR10 thin datalayers, implemented/ported several general usage trainable and non-trainable modules, added several new ElementTypes ([PR #654](https://github.com/NVIDIA/NeMo/pull/654)) - @tkornuta-nvidia


### Changed
@@ -94,13 +96,6 @@ To release a new version, please update the changelog as followed:
 
 ### Security
 
-### Contributors
-
-## [0.10.2] - 2020-05-05
-
-### Added
-- The Neural Graph is a high-level abstract concept empowering the users to build graphs consisting of many, interconnected Neural Modules. A user in his/her application can build any number of graphs, potentially spanning over the same modules. The import/export options combined with the lightweight API make Neural Graphs a perfect tool for rapid prototyping and experimentation. ([PR #413](https://github.com/NVIDIA/NeMo/pull/413)) - @tkornuta
-
 ## [0.10.0] - 2020-04-03
 
 ### Added
1 change: 1 addition & 0 deletions docs/sources/source/collections/modules.rst
@@ -8,5 +8,6 @@ NeMo Collections API
 
    core
    nemo_asr
+   nemo_cv
    nemo_tts
    nemo_nlp
34 changes: 34 additions & 0 deletions docs/sources/source/collections/nemo_cv.rst
@@ -0,0 +1,34 @@
NeMo CV collection
==================

DataLayers
----------
.. automodule:: nemo.collections.cv.modules.data_layers
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: forward

Trainable Modules
-----------------
.. automodule:: nemo.collections.cv.modules.trainables
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: forward

NonTrainable Modules
--------------------
.. automodule:: nemo.collections.cv.modules.non_trainables
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: forward

Losses
------
.. automodule:: nemo.collections.cv.modules.losses
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: forward
5 changes: 2 additions & 3 deletions nemo/backends/pytorch/nm.py
@@ -156,8 +156,7 @@ def __init__(self, name=None):
     def __call__(self, force_pt=False, *input, **kwargs):
         pt_call = len(input) > 0 or force_pt
         if pt_call:
-            with t.no_grad():
-                return self.forward(*input, **kwargs)
+            return self.forward(*input, **kwargs)
         else:
             return NeuralModule.__call__(self, **kwargs)
 
@@ -305,13 +304,13 @@ def dataset(self):
         pass
 
     @property
-    @abstractmethod
     def data_iterator(self):
         """"Iterator over the dataset. It is a good idea to return
         torch.utils.data.DataLoader here. Should implement either this or
         `dataset`.
         If this is implemented, `dataset` property should return None.
         """
+        return None
 
     @property
     def batch_size(self):
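The first hunk above is the substance of the "Trainable NM fix - removing no_grad()" commit: wrapping `forward()` in `torch.no_grad()` stops autograd from recording the computation graph, so `backward()` can never reach the module's weights and training silently does nothing. A minimal plain-PyTorch sketch (not NeMo code) of the difference:

```python
# Why wrapping forward() in no_grad() breaks training: under no_grad(),
# no autograd graph is built, so the output cannot backpropagate.
import torch

layer = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

with torch.no_grad():
    out_frozen = layer(x)
print(out_frozen.requires_grad)  # False - no gradient graph was recorded

out = layer(x)                   # same call outside no_grad()
print(out.requires_grad)         # True - backward() can reach the weights
out.sum().backward()
print(layer.weight.grad is not None)  # True
```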
6 changes: 6 additions & 0 deletions nemo/collections/cv/README.md
@@ -0,0 +1,6 @@
NeMo CV Collection: Neural Modules for Computer Vision
====================================================================

The NeMo CV collection offers modules useful for the following computer vision applications:
1. Image Classification
    * classification of MNIST digits using LeNet-5 (a classic "hello world")
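For illustration, a sketch of what that MNIST "hello world" could look like with this collection, mirroring the CIFAR10 examples further down in this PR. Note this is hypothetical: `MNISTDataLayer`, `LeNet5`, and their argument/output names are assumed, since the MNIST example itself is not part of the diff shown here.

```python
# Hypothetical sketch only: MNISTDataLayer and LeNet5 are assumed names,
# patterned on the CIFAR10 examples in this PR.
from nemo.collections.cv.modules.data_layers import MNISTDataLayer  # assumed name
from nemo.collections.cv.modules.losses import NLLLoss
from nemo.collections.cv.modules.trainables import LeNet5  # assumed name
from nemo.core import NeuralGraph, NeuralModuleFactory, OperationMode

nf = NeuralModuleFactory()

mnist_dl = MNISTDataLayer(train=True)
lenet5 = LeNet5()
nll_loss = NLLLoss()

with NeuralGraph(operation_mode=OperationMode.training) as training_graph:
    img, tgt = mnist_dl()          # output arity/names assumed
    pred = lenet5(images=img)      # parameter name assumed
    loss = nll_loss(predictions=pred, targets=tgt)
    training_graph.outputs["loss"] = loss

nf.train(
    training_graph=training_graph,
    optimization_params={"num_epochs": 10, "lr": 0.001},
    optimizer="adam",
)
```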
20 changes: 20 additions & 0 deletions nemo/collections/cv/__init__.py
@@ -0,0 +1,20 @@
# =============================================================================
# Copyright (c) 2020 NVIDIA. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================

from nemo.collections.cv.modules import *

# __version__ = "0.1"
# __name__ = "nemo.collections.cv"
@@ -0,0 +1,83 @@
# =============================================================================
# Copyright (c) 2020 NVIDIA. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================

import argparse

import nemo.utils.argparse as nm_argparse
from nemo.collections.cv.modules.data_layers import CIFAR10DataLayer
from nemo.collections.cv.modules.losses import NLLLoss
from nemo.collections.cv.modules.non_trainables import NonLinearity, ReshapeTensor
from nemo.collections.cv.modules.trainables import ConvNetEncoder, FeedForwardNetwork
from nemo.core import (
    DeviceType,
    NeuralGraph,
    NeuralModuleFactory,
    OperationMode,
    SimpleLossLoggerCallback,
    WandbCallback,
)
from nemo.utils import logging

if __name__ == "__main__":
    # Create the default parser.
    parser = argparse.ArgumentParser(parents=[nm_argparse.NemoArgParser()], conflict_handler='resolve')
    # Parse the arguments.
    args = parser.parse_args()

    # Instantiate Neural Factory.
    nf = NeuralModuleFactory(local_rank=args.local_rank, placement=DeviceType.CPU)

    # Data layer for training.
    cifar10_dl = CIFAR10DataLayer(train=True)
    # The "model".
    cnn = ConvNetEncoder(input_depth=3, input_height=32, input_width=32)
    reshaper = ReshapeTensor(input_sizes=[-1, 16, 2, 2], output_sizes=[-1, 64])
    ffn = FeedForwardNetwork(input_size=64, output_size=10, dropout_rate=0.1)
    nl = NonLinearity(type="logsoftmax", sizes=[-1, 10])
    # Loss.
    nll_loss = NLLLoss()

    # Create a training graph.
    with NeuralGraph(operation_mode=OperationMode.training) as training_graph:
        img, tgt = cifar10_dl()
        feat_map = cnn(inputs=img)
        res_img = reshaper(inputs=feat_map)
        logits = ffn(inputs=res_img)
        pred = nl(inputs=logits)
        loss = nll_loss(predictions=pred, targets=tgt)
        # Set output - that output will be used for training.
        training_graph.outputs["loss"] = loss

    # Display the graph summary.
    logging.info(training_graph.summary())

    # SimpleLossLoggerCallback will print loss values to console.
    callback = SimpleLossLoggerCallback(
        tensors=[loss], print_func=lambda x: logging.info(f'Training Loss: {str(x[0].item())}')
    )

    # Log training metrics to W&B.
    wand_callback = WandbCallback(
        train_tensors=[loss], wandb_name="simple-mnist-fft", wandb_project="cv-collection-image-classification",
    )

    # Invoke the "train" action.
    nf.train(
        training_graph=training_graph,
        callbacks=[callback, wand_callback],
        optimization_params={"num_epochs": 10, "lr": 0.001},
        optimizer="adam",
    )
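The reshaper's sizes encode the arithmetic linking the encoder to the FFN: per the script, `ConvNetEncoder` emits a `[-1, 16, 2, 2]` feature map for 32x32 CIFAR10 inputs, and 16 * 2 * 2 = 64 matches `FeedForwardNetwork(input_size=64)`. A quick plain-PyTorch check of that flattening:

```python
# Sanity check of the reshape arithmetic used above: a [N, 16, 2, 2]
# feature map flattens to [N, 64], matching FeedForwardNetwork(input_size=64).
import torch

feat_map = torch.randn(8, 16, 2, 2)      # encoder output shape, per the script
flat = feat_map.reshape(-1, 16 * 2 * 2)  # 16 * 2 * 2 == 64
assert flat.shape == (8, 64)
print(flat.shape)  # torch.Size([8, 64])
```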
@@ -0,0 +1,79 @@
# =============================================================================
# Copyright (c) 2020 NVIDIA. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================

import argparse

import nemo.utils.argparse as nm_argparse
from nemo.collections.cv.modules.data_layers import CIFAR10DataLayer
from nemo.collections.cv.modules.losses import NLLLoss
from nemo.collections.cv.modules.non_trainables import NonLinearity
from nemo.collections.cv.modules.trainables import GenericImageEncoder
from nemo.core import (
    DeviceType,
    NeuralGraph,
    NeuralModuleFactory,
    OperationMode,
    SimpleLossLoggerCallback,
    WandbCallback,
)
from nemo.utils import logging

if __name__ == "__main__":
    # Create the default parser.
    parser = argparse.ArgumentParser(parents=[nm_argparse.NemoArgParser()], conflict_handler='resolve')
    # Parse the arguments.
    args = parser.parse_args()

    # Instantiate Neural Factory.
    nf = NeuralModuleFactory(local_rank=args.local_rank, placement=DeviceType.CPU)

    # Data layer - upscale the CIFAR10 images to ImageNet resolution.
    cifar10_dl = CIFAR10DataLayer(height=224, width=224, train=True)
    # The "model".
    image_classifier = GenericImageEncoder(model_type="resnet50", output_size=10, pretrained=True, name="resnet50")
    nl = NonLinearity(type="logsoftmax", sizes=[-1, 10])
    # Loss.
    nll_loss = NLLLoss()

    # Create a training graph.
    with NeuralGraph(operation_mode=OperationMode.training) as training_graph:
        img, tgt = cifar10_dl()
        logits = image_classifier(inputs=img)
        pred = nl(inputs=logits)
        loss = nll_loss(predictions=pred, targets=tgt)
        # Set output - that output will be used for training.
        training_graph.outputs["loss"] = loss

    # Display the graph summary.
    logging.info(training_graph.summary())

    # SimpleLossLoggerCallback will print loss values to console.
    callback = SimpleLossLoggerCallback(
        tensors=[loss], print_func=lambda x: logging.info(f'Training Loss: {str(x[0].item())}')
    )

    # Log training metrics to W&B.
    wand_callback = WandbCallback(
        train_tensors=[loss], wandb_name="simple-mnist-fft", wandb_project="cv-collection-image-classification",
    )

    # Invoke the "train" action.
    nf.train(
        training_graph=training_graph,
        callbacks=[callback, wand_callback],
        optimization_params={"num_epochs": 10, "lr": 0.001},
        optimizer="adam",
    )
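`GenericImageEncoder(model_type="resnet50", output_size=10, pretrained=True)` presumably wraps torchvision's pretrained ResNet-50 and swaps its 1000-way ImageNet head for a 10-way one. A rough plain-torchvision approximation of that configuration (an assumption about the module's behavior, not its actual implementation):

```python
# Rough torchvision equivalent of the encoder configuration above:
# pretrained ResNet-50 with its classification head replaced by a
# 10-way linear layer (downloads ImageNet weights on first use).
import torch
import torchvision

model = torchvision.models.resnet50(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 10)  # 2048 -> 10 classes

dummy = torch.randn(2, 3, 224, 224)  # CIFAR10 upscaled to ImageNet resolution
print(model(dummy).shape)            # torch.Size([2, 10])
```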
@@ -0,0 +1,84 @@
# =============================================================================
# Copyright (c) 2020 NVIDIA. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# =============================================================================

import argparse

import nemo.utils.argparse as nm_argparse
from nemo.collections.cv.modules.data_layers import CIFAR10DataLayer
from nemo.collections.cv.modules.losses import NLLLoss
from nemo.collections.cv.modules.non_trainables import NonLinearity, ReshapeTensor
from nemo.collections.cv.modules.trainables import FeedForwardNetwork, GenericImageEncoder
from nemo.core import (
    DeviceType,
    NeuralGraph,
    NeuralModuleFactory,
    OperationMode,
    SimpleLossLoggerCallback,
    WandbCallback,
)
from nemo.utils import logging

if __name__ == "__main__":
    # Create the default parser.
    parser = argparse.ArgumentParser(parents=[nm_argparse.NemoArgParser()], conflict_handler='resolve')
    # Parse the arguments.
    args = parser.parse_args()

    # Instantiate Neural Factory.
    nf = NeuralModuleFactory(local_rank=args.local_rank, placement=DeviceType.CPU)

    # Data layer - upscale the CIFAR10 images to ImageNet resolution.
    cifar10_dl = CIFAR10DataLayer(height=224, width=224, train=True)
    # The "model".
    image_encoder = GenericImageEncoder(model_type="vgg16", return_feature_maps=True, pretrained=True, name="vgg16")
    reshaper = ReshapeTensor(input_sizes=[-1, 7, 7, 512], output_sizes=[-1, 25088])
    ffn = FeedForwardNetwork(input_size=25088, output_size=10, hidden_sizes=[1000, 1000], dropout_rate=0.1)
    nl = NonLinearity(type="logsoftmax", sizes=[-1, 10])
    # Loss.
    nll_loss = NLLLoss()

    # Create a training graph.
    with NeuralGraph(operation_mode=OperationMode.training) as training_graph:
        img, tgt = cifar10_dl()
        feat_map = image_encoder(inputs=img)
        res_img = reshaper(inputs=feat_map)
        logits = ffn(inputs=res_img)
        pred = nl(inputs=logits)
        loss = nll_loss(predictions=pred, targets=tgt)
        # Set output - that output will be used for training.
        training_graph.outputs["loss"] = loss

    # Freeze the pretrained encoder.
    training_graph.freeze(["vgg16"])
    logging.info(training_graph.summary())

    # SimpleLossLoggerCallback will print loss values to console.
    callback = SimpleLossLoggerCallback(
        tensors=[loss], print_func=lambda x: logging.info(f'Training Loss: {str(x[0].item())}')
    )

    # Log training metrics to W&B.
    wand_callback = WandbCallback(
        train_tensors=[loss], wandb_name="simple-mnist-fft", wandb_project="cv-collection-image-classification",
    )

    # Invoke the "train" action.
    nf.train(
        training_graph=training_graph,
        callbacks=[callback, wand_callback],
        optimization_params={"num_epochs": 10, "lr": 0.001},
        optimizer="adam",
    )
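The reshaper sizes in this script follow from VGG16's convolutional stack: at 224x224 input it produces a 512x7x7 feature map, i.e. 512 * 7 * 7 = 25088 features once flattened, matching `FeedForwardNetwork(input_size=25088)`. A quick torchvision check of that shape (assuming the encoder wraps torchvision's VGG16 feature extractor, as `model_type="vgg16"` suggests):

```python
# Check of the VGG16 feature-map size assumed by the reshaper above:
# torchvision's VGG16 convolutional stack maps 224x224 inputs to
# [N, 512, 7, 7], i.e. 25088 features once flattened.
import torch
import torchvision

features = torchvision.models.vgg16(pretrained=False).features
with torch.no_grad():
    out = features(torch.randn(1, 3, 224, 224))
print(out.shape)                # torch.Size([1, 512, 7, 7])
print(out.flatten(1).shape[1])  # 25088, matching FeedForwardNetwork(input_size=25088)
```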