PyTorch has sort of become the de facto standard for creating neural networks now, and I love its interface. Yet it is somewhat difficult for beginners to get a hold of.
I remember picking PyTorch up only after some extensive experimentation a couple of years back. To tell you the truth, it took me a lot of time to pick it up, but am I glad that I made the switch. With its high customizability and pythonic syntax, PyTorch is just a joy to work with, and I would recommend it to anyone who wants to do some heavy lifting with deep learning.
So, in this PyTorch guide, I will try to ease some of the pain PyTorch causes for starters and go through some of the most important classes and modules that you will require while creating any neural network with PyTorch.
That is not to say this guide is aimed at beginners only, as I will also talk about the high customizability PyTorch provides and cover custom layers, datasets, dataloaders, and loss functions.
So let’s get some coffee ☕ and start it up.
1. Create a Tensor
We can create a PyTorch tensor in multiple ways, including converting from a NumPy array. Below is just a small gist with some examples to start with, but you can do a whole lot more with tensors, just as you can with NumPy arrays.
import torch
import numpy as np

# Using torch.Tensor
t = torch.Tensor([[1,2,3],[3,4,5]])
print(f"Created Tensor Using torch.Tensor:\n{t}")
# Using torch.randn
t = torch.randn(3, 5)
print(f"Created Tensor Using torch.randn:\n{t}")
# using torch.[ones|zeros](*size)
t = torch.ones(3, 5)
print(f"Created Tensor Using torch.ones:\n{t}")
t = torch.zeros(3, 5)
print(f"Created Tensor Using torch.zeros:\n{t}")
# using torch.randint - a tensor of size (4,5) with entries between 0 and 10 (excluded)
t = torch.randint(low = 0,high = 10,size = (4,5))
print(f"Created Tensor Using torch.randint:\n{t}")
# Using from_numpy to convert from Numpy Array to Tensor
a = np.array([[1,2,3],[3,4,5]])
t = torch.from_numpy(a)
print(f"Convert to Tensor From Numpy Array:\n{t}")
# Using .numpy() to convert from Tensor to Numpy array
t = t.numpy()
print(f"Convert to Numpy Array From Tensor:\n{t}")
2. Tensor Operations
Again, there are a lot of operations you can do on these tensors. The full list of tensor functions can be found in the PyTorch documentation.
A = torch.randn(3,4)
W = torch.randn(4,2)
# Multiply Matrix A and W
t = A.mm(W)
print(f"Created Tensor t by Multiplying A and W:\n{t}")
# Transpose Tensor t
t = t.t()
print(f"Transpose of Tensor t:\n{t}")
# Square each element of t
t = t**2
print(f"Square each element of Tensor t:\n{t}")
# return the size of a tensor
print(f"Size of Tensor t using .size():\n{t.size()}")
Note: What are PyTorch Variables? In previous versions of Pytorch, Tensors and Variables used to be distinct and provided different functionality, but the Variable API has since been deprecated, and all methods for Variables now work with Tensors. So, if you don’t know about them, it’s fine as they’re not needed, and if you know them, you can forget about them.
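If you are curious, the autograd functionality that Variables used to provide now lives directly on tensors; here is a minimal sketch (not from the original post):
x = torch.tensor([2.0, 3.0], requires_grad=True)   # a tensor that tracks gradients
y = (x ** 2).sum()
y.backward()                                       # compute d(y)/d(x)
print(x.grad)                                      # tensor([4., 6.])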
Here comes the fun part, as we are now going to talk about some of the most used constructs in Pytorch while creating deep learning projects. nn.Module lets you create your deep learning models as a class. You can inherit from nn.Module to define any model as a class. Every model class necessarily contains an __init__ block and a block for the forward pass:
- In the __init__ part, the user defines all the layers the network is going to have, but doesn’t yet define how those layers get connected to each other.
- In the forward pass block, the user defines how data flows from one layer to another inside the network.
import torch.nn as nn

class myNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin1 = nn.Linear(784, 30)
        self.lin2 = nn.Linear(30, 10)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin1(x)
        x = self.lin2(x)
        return x
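We can instantiate this class and pass a batch of inputs through it, just as we will do with the larger networks below — a quick sanity check:
x = torch.randn((100, 784))
model = myNeuralNet()
print(model(x).size())
--------------------------
torch.Size([100, 10])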
Since the forward pass is just Python code, we can connect the layers in any way we like. For example, this network reuses a layer and adds a skip connection:
class myCrazyNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin1 = nn.Linear(784, 30)
        self.lin2 = nn.Linear(30, 784)
        self.lin3 = nn.Linear(30, 10)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x_lin1 = self.lin1(x)
        x_lin2 = x + self.lin2(x_lin1)
        x_lin2 = self.lin1(x_lin2)
        x = self.lin3(x_lin2)
        return x
x = torch.randn((100,784))
model = myCrazyNeuralNet()
model(x).size()
--------------------------
torch.Size([100, 10])
Pytorch is pretty powerful, and you can actually create any new experimental layer by yourself using nn.Module. For example, rather than using the predefined linear layer nn.Linear from Pytorch above, we could have created our own custom linear layer:
class myCustomLinearLayer(nn.Module):
    def __init__(self, in_size, out_size):
        super().__init__()
        self.weights = nn.Parameter(torch.randn(in_size, out_size))
        self.bias = nn.Parameter(torch.zeros(out_size))

    def forward(self, x):
        return x.mm(self.weights) + self.bias
Parameters are Tensor subclasses that have a very special property when used with Modules: when they’re assigned as Module attributes, they are automatically added to the list of the module’s parameters and will appear in the parameters() iterator. As you will later see, the model.parameters() iterator will be an input to the optimizer. But more on that later.
We can now use this custom layer in any PyTorch network, just like any other layer.
class myCustomNeuralNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin1 = myCustomLinearLayer(784, 10)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin1(x)
        return x
x = torch.randn((100,784))
model = myCustomNeuralNet()
model(x).size()
------------------------------------------
torch.Size([100, 10])
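As a quick check of the nn.Parameter behaviour described above, we can list what the custom layer has registered — a small sketch:
layer = myCustomLinearLayer(784, 10)
for name, param in layer.named_parameters():
    print(name, param.size())
------------------------------------------
weights torch.Size([784, 10])
bias torch.Size([10])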
The nn module also comes with a wide range of predefined layers. For example, here is a 2D convolution layer applied to a batch of images:
conv_layer = nn.Conv2d(in_channels = 3, out_channels = 64, kernel_size = (3,3), stride = 1, padding=1)
x = torch.randn((100,3,24,24))
conv_layer(x).size()
--------------------------------
torch.Size([100, 64, 24, 24])
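The spatial size stays at 24 here because, with a 3x3 kernel, stride 1, and padding 1, the standard convolution output formula gives (24 - 3 + 2*1)/1 + 1 = 24 for each side, while the channel dimension goes from 3 to 64 as requested.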
For image data, the built-in ImageFolder dataset expects the images to be arranged in a folder structure where each class gets its own subfolder, for example:
data
  train
    sailboat
    kayak
    ...
We can use the torchvision.datasets.ImageFolder dataset to get an example image like below:
from torchvision import transforms
from torchvision.datasets import ImageFolder
traindir = "data/train/"
t = transforms.Compose([
transforms.Resize(size=256),
transforms.CenterCrop(size=224),
transforms.ToTensor()])
train_dataset = ImageFolder(root=traindir,transform=t)
print("Num Images in Dataset:", len(train_dataset))
print("Example Image and Label:", train_dataset[2])
We could now iterate through this dataset one image at a time:
for i in range(0, len(train_dataset)):
    image, label = train_dataset[i]
    pred = model(image)
But that is not optimal. We want to do batching. We could write some more code to stack images and labels into batches and then pass them to the neural network, but Pytorch already provides us with a utility iterator, torch.utils.data.DataLoader, to do precisely that. We can simply wrap our train_dataset in a DataLoader, and we will get batches instead of individual examples.
from torch.utils.data import DataLoader

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=10)
for image_batch, label_batch in train_dataloader:
    print(image_batch.size(), label_batch.size())
    break
------------------------------------------------------------------
torch.Size([64, 3, 224, 224]) torch.Size([64])
Putting it all together, we can use the dataloader to feed batches of images to a network:
t = transforms.Compose([
transforms.Resize(size=256),
transforms.CenterCrop(size=224),
transforms.ToTensor()])
train_dataset = ImageFolder(root=traindir, transform=t)
train_dataloader = DataLoader(train_dataset,batch_size = 64, shuffle=True, num_workers=10)
for image_batch, label_batch in train_dataloader:
    pred = myImageNeuralNet(image_batch)
To write our custom datasets, we can make use of the abstract class torch.utils.data.Dataset provided by Pytorch. We need to inherit from this Dataset class and define two methods to create a custom dataset:
- __len__: a function that returns the size of the dataset. This one is pretty simple to write in most cases.
- __getitem__: a function that takes an index i as input and returns the sample at index i.
For example, we can create a simple custom dataset that returns an image and a label from a folder. Note that most of the work happens in the __init__ part, where we use glob.glob to get the image names and do some general preprocessing.
from glob import glob
from PIL import Image
from torch.utils.data import Dataset
class customImageFolderDataset(Dataset):
"""Custom Image Loader dataset."""
def __init__(self, root, transform=None):
"""
Args:
root (string): Path to the images organized in a particular folder structure.
transform: Any Pytorch transform to be applied
"""
# Get all image paths from a directory
self.image_paths = glob(f"{root}/*/*")
# Get the labels from the image paths
self.labels = [x.split("/")[-2] for x in self.image_paths]
# Create a dictionary mapping each label to a index from 0 to len(classes).
self.label_to_idx = {x:i for i,x in enumerate(set(self.labels))}
self.transform = transform
def __len__(self):
# return length of dataset
return len(self.image_paths)
def __getitem__(self, idx):
# open and send one image and label
img_name = self.image_paths[idx]
label = self.labels[idx]
image = Image.open(img_name)
if self.transform:
image = self.transform(image)
return image,self.label_to_idx[label]
Also, note that we open each image one at a time in the __getitem__ method and not while initializing. We don't do this in __init__ because we don't want to load all our images into memory; we only need to load the required ones.
We can now use this dataset with the utility DataLoader just like before. It works just like the previous dataset provided by PyTorch, only without some of its utility functions.
t = transforms.Compose([
transforms.Resize(size=256),
transforms.CenterCrop(size=224),
transforms.ToTensor()])
train_dataset = customImageFolderDataset(root=traindir,transform=t)
train_dataloader = DataLoader(train_dataset,batch_size = 64, shuffle=True, num_workers=10)
for image_batch, label_batch in train_dataloader:
    pred = myImageNeuralNet(image_batch)
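As with the built-in ImageFolder, we can also index a single example to sanity-check the custom dataset — a quick sketch (the exact image size assumes the same transforms as above and RGB images):
image, label_idx = train_dataset[0]
print(image.size(), label_idx)   # e.g. torch.Size([3, 224, 224]) and an integer class index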
This particular section is a little advanced and can be skipped, as it will not be needed in a lot of situations. But I am adding it here for completeness.
So, let's say you are looking to provide batches to a network that processes text input, and the network can take sequences of any length as long as the length stays constant within a batch. For example, we can have a BiLSTM network that can process sequences of any length. It's alright if you don't understand the layers used in it right now; just know that it can process sequences with variable sizes.
class BiLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden_size = 64
        drp = 0.1
        max_features, embed_size = 10000, 300
        self.embedding = nn.Embedding(max_features, embed_size)
        self.lstm = nn.LSTM(embed_size, self.hidden_size, bidirectional=True, batch_first=True)
        self.linear = nn.Linear(self.hidden_size*4, 64)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(drp)
        self.out = nn.Linear(64, 1)

    def forward(self, x):
        h_embedding = self.embedding(x)
        h_embedding = torch.squeeze(torch.unsqueeze(h_embedding, 0))
        h_lstm, _ = self.lstm(h_embedding)
        avg_pool = torch.mean(h_lstm, 1)
        max_pool, _ = torch.max(h_lstm, 1)
        conc = torch.cat((avg_pool, max_pool), 1)
        conc = self.relu(self.linear(conc))
        conc = self.dropout(conc)
        out = self.out(conc)
        return out
model = BiLSTM()
input_batch_1 = torch.randint(low = 0,high = 10000, size = (100,10))
input_batch_2 = torch.randint(low = 0,high = 10000, size = (100,25))
print(model(input_batch_1).size())
print(model(input_batch_2).size())
------------------------------------------------------------------
torch.Size([100, 1])
torch.Size([100, 1])
class CustomTextDataset(Dataset):
    '''
    Simple Dataset initializes with X and y vectors
    We start by sorting our X and y vectors by sequence lengths
    '''
    def __init__(self, X, y=None):
        self.data = list(zip(X, y))
        # Sort by length of first element in tuple
        self.data = sorted(self.data, key=lambda x: len(x[0]))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]
import numpy as np
train_data_size = 1024
sizes = np.random.randint(low=50,high=300,size=(train_data_size,))
X = [np.random.randint(0,10000, (sizes[i])) for i in range(train_data_size)]
y = np.random.rand(train_data_size).round()
#checking one example in dataset
print((X[0],y[0]))
Example of one random sequence and label. Each integer in the sequence corresponds to a word in the sentence.
We can now use the custom dataset:
train_dataset = CustomTextDataset(X, y)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=False, num_workers=10)
for xb, yb in train_dataloader:
    print(xb.size(), yb.size())
This, however, fails with the default settings, because the default collate function cannot stack sequences of different lengths into a single batch tensor. So, how do we iterate through this dataset so that each batch has sequences with the same length, while different batches may have different sequence lengths?
We can use the collate_fn parameter of the DataLoader, which lets us define how to stack sequences in a particular batch. To use it, we need to define a function that takes a batch as input and returns (x_batch, y_batch) with the sequences padded to the maximum sequence length in the batch. The operations I have used in the function below are simple NumPy operations, and the function is commented so you can follow what is happening.
def collate_text(batch):
    # get text sequences in batch
    data = [item[0] for item in batch]
    # get labels in batch
    target = [item[1] for item in batch]
    # get max_seq_length in batch
    max_seq_len = max([len(x) for x in data])
    # pad text sequences based on max_seq_len
    data = [np.pad(p, (0, max_seq_len - len(p)), 'constant') for p in data]
    # convert data and target to tensor
    data = torch.LongTensor(data)
    target = torch.LongTensor(target)
    return [data, target]
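Before plugging collate_text into the DataLoader, we can sanity-check it on a tiny hand-made batch (the values here are made up):
batch = [(np.array([1, 2, 3]), 1), (np.array([4, 5]), 0)]
xb, yb = collate_text(batch)
print(xb)   # tensor([[1, 2, 3], [4, 5, 0]]) -- the shorter sequence is zero-padded
print(yb)   # tensor([1, 0])
With the collate function in place, we pass it to the DataLoader via the collate_fn argument: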
train_dataloader = DataLoader(train_dataset,batch_size = 64, shuffle=False, num_workers=10,collate_fn = collate_text)
for xb, yb in train_dataloader:
    print(xb.size(), yb.size())
See that the batches have different sequence lengths now
It works this time because we have provided a custom collate_fn, and you can see that the batches now have different sequence lengths. Thus, we can train our BiLSTM using variable input sizes, just like we wanted.
We know how to create a neural network using nn.Module. But how do we train it? Any neural network that has to be trained will have a training loop that looks something like this:
num_epochs = 5
for epoch in range(num_epochs):
    # Set model to train mode
    model.train()
    for x_batch, y_batch in train_dataloader:
        # Clear gradients
        optimizer.zero_grad()
        # Forward pass - Predicted outputs
        pred = model(x_batch)
        # Find Loss and backpropagation of gradients
        loss = loss_criterion(pred, y_batch)
        loss.backward()
        # Update the parameters
        optimizer.step()

    model.eval()
    for x_batch, y_batch in valid_dataloader:
        pred = model(x_batch)
        val_loss = loss_criterion(pred, y_batch)
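Note that this loop assumes a model, a loss criterion, an optimizer, and the two dataloaders already exist. Here is a minimal sketch of that setup, using the myNeuralNet class from earlier; the particular loss and learning rate are just placeholders for illustration:
model = myNeuralNet()                       # any nn.Module works here
loss_criterion = nn.CrossEntropyLoss()      # suits a 10-class classifier with raw outputs
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# train_dataloader / valid_dataloader are assumed to yield (x_batch, y_batch) pairs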
Stepping through the loop:
- model(x_batch) runs the forward pass and gives us the predicted outputs.
- loss_criterion compares the predictions with the targets and computes the loss.
- The gradients are computed in the loss.backward() call. We don't have to worry about the calculation of the gradients at all, as this simple call does it all for us.
- The weights of the network get modified in optimizer.step(), using the gradients calculated in the loss.backward() call.
- For validation, we switch the model to evaluation mode with model.eval(). Please note we don't back-propagate losses in eval mode.
Till now, we have talked about how to use nn.Module to create networks and how to use custom Datasets and DataLoaders with Pytorch. So let's talk about the various options available for loss functions and optimizers.
Pytorch provides us with a variety of loss functions for our most common tasks, like classification and regression. You can read the documentation of each loss function, but to explain how to use them, I will go through the example of nn.NLLLoss, the negative log-likelihood loss.
NLLLoss expects an input of shape (batch_size x num_classes) — these are the predictions from the neural network we have created — and it expects these to be log probabilities, which is why we add a LogSoftmax layer as the last layer of our network. The target is simply a tensor of class indices of shape (batch_size).
So, we can try to use this loss function for a simple classification network. Please note the LogSoftmax layer after the final linear layer. If you don't want to use this LogSoftmax layer, you could have just used nn.CrossEntropyLoss, which combines LogSoftmax and NLLLoss in a single step:
class myClassificationNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Define all Layers Here
        self.lin = nn.Linear(784, 10)
        self.logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # Connect the layer Outputs here to define the forward pass
        x = self.lin(x)
        x = self.logsoftmax(x)
        return x
# some random input:
X = torch.randn(100,784)
y = torch.randint(low = 0,high = 10,size = (100,))
model = myClassificationNet()
preds = model(X)
criterion = nn.NLLLoss()
loss = criterion(preds,y)
loss
------------------------------------------
tensor(2.4852, grad_fn=<NllLossBackward>)
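As a quick numerical check of the nn.CrossEntropyLoss shortcut mentioned above — it really is LogSoftmax followed by NLLLoss — we can compare the two on some random logits:
logits = torch.randn(100, 10)
targets = torch.randint(low=0, high=10, size=(100,))
ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(torch.allclose(ce, nll))   # True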
Defining your own custom loss functions is again a piece of cake, and you should be fine as long as you use tensor operations in your loss function. For example, here is a custom mean squared error loss, customMseLoss:
def customMseLoss(output, target):
    loss = torch.mean((output - target)**2)
    return loss
output = model(x)
loss = customMseLoss(output, target)
loss.backward()
We can also write a custom loss as a class by subclassing nn.Module. For example, here is the negative log-likelihood loss written as a custom loss class:
class CustomNLLLoss(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x, y):
        # x should be output from LogSoftmax Layer
        log_prob = -1.0 * x
        # Get log_prob based on y class_index as loss=-mean(ylogp)
        loss = log_prob.gather(1, y.unsqueeze(1))
        loss = loss.mean()
        return loss
criterion = CustomNLLLoss()
loss = criterion(preds,y)
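Since this class implements the same negative log-likelihood computation as nn.NLLLoss, we can quickly verify that the two agree:
print(torch.allclose(criterion(preds, y), nn.NLLLoss()(preds, y)))   # True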
Once we get the gradients using the loss.backward() call, we need to take an optimizer step to change the weights of the whole network. Pytorch provides a variety of ready-to-use optimizers through the torch.optim module, the most widely used being Adam. To use the Adam optimizer from PyTorch, we can simply instantiate it with:
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, betas=(0.9, 0.999))
We then call optimizer.zero_grad() and optimizer.step() while training the model.
I am not discussing how to write custom optimizers, as it is an infrequent use case, but if you want even more optimizers, do check out the third-party optimizer libraries out there, which provide a lot of the optimizers used in research papers. And if you do want to create your own optimizer, you can take inspiration from the source code of the optimizers already implemented in PyTorch.
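To make the zero_grad() / step() sequence concrete, here is a minimal single-step sketch with a toy linear model and made-up data:
model = nn.Linear(10, 1)                                # a toy model, just for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
x, target = torch.randn(8, 10), torch.randn(8, 1)

optimizer.zero_grad()                                   # clear any old gradients
loss = torch.mean((model(x) - target) ** 2)             # a simple MSE loss
loss.backward()                                         # compute gradients for all parameters
optimizer.step()                                        # update the weights using those gradients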
Till now, whatever we have done has been on the CPU. If you want to use a GPU, you can move your model to the GPU using model.to('cuda'). Or, if you want to use multiple GPUs, you can use nn.DataParallel. Here is a utility snippet that checks the number of GPUs on the machine and sets up parallel training automatically using DataParallel if needed.
# Whether to train on a gpu
train_on_gpu = torch.cuda.is_available()
print(f'Train on gpu: {train_on_gpu}')

# Number of gpus
if train_on_gpu:
    gpu_count = torch.cuda.device_count()
    print(f'{gpu_count} gpus detected.')
    if gpu_count > 1:
        multi_gpu = True
    else:
        multi_gpu = False

if train_on_gpu:
    model = model.to('cuda')
    if multi_gpu:
        model = nn.DataParallel(model)
num_epochs = 5
for epoch in range(num_epochs):
    model.train()
    for x_batch, y_batch in train_dataloader:
        if train_on_gpu:
            x_batch, y_batch = x_batch.cuda(), y_batch.cuda()
        optimizer.zero_grad()
        pred = model(x_batch)
        loss = loss_criterion(pred, y_batch)
        loss.backward()
        optimizer.step()

    model.eval()
    for x_batch, y_batch in valid_dataloader:
        if train_on_gpu:
            x_batch, y_batch = x_batch.cuda(), y_batch.cuda()
        pred = model(x_batch)
        val_loss = loss_criterion(pred, y_batch)
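As a side note, an equivalent pattern you will often see (and which works on both CPU and GPU without the if-checks) is to keep a torch.device object around and move the model and every batch with .to(device) — a minimal sketch:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
for x_batch, y_batch in train_dataloader:
    x_batch, y_batch = x_batch.to(device), y_batch.to(device)
    pred = model(x_batch)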
Full disclosure: There are some affiliate links in this post to relevant resources, as sharing knowledge is never a bad idea.