- Chapter 1
- Chapter 2
- Label / Class: The tag that a given image falls into
- Ex. if you’re given an image of a dog, the label is ‘dog’
- Image classification
- Classifying objects within an image
- Object localization
- Identifying objects positions within an image
- Object detection
- Identifying and labeling objects within an image
- Scene classification
- Classifying the given situation within an image
- Scene parsing
- Dividing an image into regions which relate to the contents of the image
- Starting image, ex. Dog
- Image gets resized and normalized
- Preprocessed into a torch.Tensor (vector)
- Forward passed into model, outputs a vector of scores
- Scores map one-to-one to label
- Each score represents the score associated with a given class
- Some good models include:
- AlexNet, ResNet, and Inception v3
Models can be found within torchvision.models, theres a bunch of them
from torchvision import models
list_models = dir(models)
print("Num of models: ", len(list_models))
print(list_models[0:5])
print(list_models[170:175])
- NOTE: Notice how theres both uppercase / lowercase models?
- Uppercase represent classes that implement varying amount of models
- Lowercase represent convince functions that return an instance of that model type
- ex. ‘resnet50’ returns an instance of model ‘resnet’ with 50 layers
Initializing the network is as easy as:
alexnet = models.AlexNet()
However, this creates a randomized AlexNet object, with no pretrained weights
We can use one of those handy-dandy lowercased models, which instantiates a model for us with pretrained models.For example, creating instance of resnet with 101 layers:
resnet = models.resnet101(pretrained=True)
Prior to using any neural network, we have to process and normalize the input, so the network can work When processing images, we can preprocess them with torchvision’s transforms module. Here is an example:
from torchvision import transforms
preprocess = transforms.Compose([
transforms.Resize(256), # resizes to 256 x 256
transforms.CenterCrop(224), # crops 224 x 224 around center of new image
transforms.ToTensor(), # converts image to tensor
transforms.Normalize( # normalizes color channels (RGB)
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]
)
])
NOTE: transforms.Compose() returns a tensor with those given parameters
Lets say you have a dog image located at PATH=”~/dog.png”. To feed it into the neural network, we would have to:
- Preprocess it ->
img_tensor = preprocess(img)
- Create an inference of the neural network we’re using
- An inference is just putting the NN in an evaluate state where it can produce actual results
- Done so by doing
resnet.eval()
- Feeding our
img_tensor
into the NNoutput = resnet(img_tensor)
- Read lines from “imagenet_classes.txt”
- Get the maximum score from
output
via_, index = torch.max(output, 1)
- Get the models confidence level for the prediction
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100 labels[index[0], percentage[index[0]]].item()
Should output: (‘golden retriever’, 96.293…) Therefore the model predicted with 96.293% certainty that the dog image was a golden retriever
- Listing out the labels of the other predictions
_, indices = torch.sort(out, descending=True) other_predictions_and_percentages = [(labels[idx], percentage[idx].item()) for idx in indices[0][:5]]
GAN Diagram Two neural networks that compete against each other, where one produces an output, while the other interprets and identifies the mistakes in the output
- Generator: produces output from an arbitrary input
- Its end goal is fool the discriminator with its image outputs
- Discriminator: analyzes whether or not an input image was fabricated or belongs to a real set of images
- Its end goal is to know when its being fooled
CycleGAN Diagram Note that the D’s in the image are discriminators
- Can turn images from one domain into images from another (and back)
- Ex. turning an image into a Zebra, form a picture of a horse
Natural language model for describing scenes within an image