- Basic introduction to convnets
- First steps with PyTorch
You will need the following Python packages:
- numpy
- matplotlib
- torch
- torchvision
The latter two can be installed through Anaconda:
conda install pytorch torchvision -c pytorch
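If Anaconda is not available, PyTorch and torchvision can, at the time of writing, also be installed with pip (an alternative not mentioned in the original handout):
pip install torch torchvision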
In this exercise, we follow one of the PyTorch tutorials and train a very simple network to recognize objects in 32x32 pixel images. Because this involves lots of framework functions, it will be more of a "guided copy and paste" than actual programming.
Let's start with some imports that we will need:
import torch
import torchvision
import torchvision.transforms as transforms
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
We want to train and test on the CIFAR-10 dataset. Torchvision already contains code for downloading and loading this dataset. For most learning approaches, it is beneficial to scale and shift the input data so that it has zero mean and unit variance. This can be done inside the loader. Since the CIFAR-10 data lies between 0 and 1, we assume a mean of 0.5 and a standard deviation of 0.5 per channel, which lets the loader rescale the data into the range -1 to 1.
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
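To see what the normalization does numerically, here is a small sanity check (not part of the original exercise): ToTensor() maps pixel values into [0, 1], and Normalize then computes (x - 0.5) / 0.5, so 0 becomes -1 and 1 becomes +1.
sample = torch.rand(3, 32, 32)  # fake image with values in [0, 1]
normalized = transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))(sample)
print(normalized.min(), normalized.max())  # roughly -1 and +1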
With this transform at hand, load the training data. Upon first call, this will download the data from the internet.
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
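As a small optional check (not part of the original handout), you can verify that the download worked by looking at the dataset size:
print(len(trainset))  # 50000 training images in CIFAR-10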
Implement a second loader for the test dataset.
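A minimal sketch of such a test loader, assuming the names testset and testloader (the latter is used again for the accuracy computation at the end):
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)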
The following code fetches a mini-batch of training data (4 samples):
dataiter = iter(trainloader)
images, labels = next(dataiter)
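To see what such a mini-batch looks like (a small optional check), you can print its dimensions and labels:
print(images.size())  # torch.Size([4, 3, 32, 32]): batch x channels x height x width
print(labels)         # four integer class indices between 0 and 9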
Display the class names of the four samples. The names can be looked up in this tuple:
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
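One way to print the four class names, as a sketch:
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))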
Display the corresponding four images using:
def imshow(img):
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()

imshow(torchvision.utils.make_grid(images))
The following code builds a network with one convolution layer (operating on 3 input channels, with 6 filters of size 5x5) followed by ReLU and max pooling, a flattening operation, and a fully connected layer:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(6 * 14 * 14, 10)

    def forward(self, x):
        # x is 3x32x32 (channels first)
        x = self.pool(F.relu(self.conv1(x)))
        # x is now 6x14x14: 5x5 filters without padding take away 4 pixels (32 -> 28),
        # and max pooling halves the resolution (28 -> 14)
        x = x.view(-1, 6 * 14 * 14)
        x = self.fc1(x)
        return x

net = Net()
print(net)
The last layer has no ReLU activation and one output per class. Extend the network to two blocks of conv + ReLU + pool followed by three fully connected layers. Remember that, except for the last one, the fully connected layers also need ReLU activations.
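One possible extension, as a sketch (the second convolution's 16 filters and the hidden sizes 120 and 84 are assumptions, not prescribed by the exercise):
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x32x32 -> 6x14x14
        x = self.pool(F.relu(self.conv2(x)))  # 6x14x14 -> 16x5x5
        x = x.view(-1, 16 * 5 * 5)            # flatten
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)                       # no ReLU on the output layer
        return x

net = Net()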
Train the network on the training data for 2 epochs (you can also try 10 epochs if you have lots of time) using cross-entropy loss and stochastic gradient descent:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')
Load a mini-batch of "images" and "labels" from the test dataset (just as for the training data, but from testloader). Run the trained network on this mini-batch of test data:
outputs = net(Variable(images))
_, predicted = torch.max(outputs.data, 1)
imshow(torchvision.utils.make_grid(images))
Look at the images and compare the predicted labels "predicted" to the ground truth labels "labels".
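A short sketch for this comparison, printing the class names side by side (the formatting is an assumption, not prescribed by the exercise):
print('GroundTruth:', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
print('Predicted:  ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))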
Compute the accuracy on the entire test dataset:
correct = 0
total = 0
for data in testloader:
    images, labels = data
    # run the batch through the network and update the number of total and correct labels

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
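One possible way to fill in the loop, as a sketch in the same style as the rest of the handout (the .item() call assumes PyTorch 0.4 or newer, as in the training loop above):
correct = 0
total = 0
for data in testloader:
    images, labels = data
    outputs = net(Variable(images))
    _, predicted = torch.max(outputs.data, 1)
    total += labels.size(0)                        # samples in this batch
    correct += (predicted == labels).sum().item()  # correctly classified samples

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))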