Following is the official documentation of torchy, a PyTorch wrapper, covering all of the additional functions the wrapper provides. It contains detailed descriptions of every argument that can be passed to the new utilities of nn.Module, as well as guides and tutorials to help you make the best use of the torchy wrapper.
model.fit() accepts the following arguments:

Argument | Default | Description |
---|---|---|
train_dataloader | N/A | Pass either the training dataset as a TensorDataset or a DataLoader for training. Note: it's up to you to pass a validation DataLoader if you choose to pass a training DataLoader. You don't have to worry about the validation DataLoader if you passed a TensorDataset. |
loss_fn | N/A | The loss function used to calculate the loss. Ideally the loss function should come from nn.functional, but a custom loss function that can handle tensors will work fine. |
opt | N/A | The optimizer the model should use to tune its parameters. Initialize an optimizer from torch.optim, set the appropriate hyperparameters, and pass it to .fit(). |
epochs | N/A | The number of epochs. Pass an integer. |
valid_dataloader | None | Pass a validation DataLoader if you have passed a training DataLoader instead of a Dataset. |
valid_pct | 30 | The percentage of the training TensorDataset to set aside as the validation dataset. Usual values are between 10 and 30. |
batch_size | 32 | The batch size of the training DataLoader that will be created when you pass a TensorDataset. Ignored if you passed a DataLoader. |
accuracy | False | Whether or not to calculate the accuracy of the model. Pass a Boolean, True or False. |
device | CPU | The device the model and its dataset should be moved to. Only provide a device if you passed a TensorDataset. The value can be any available device ('cpu' or 'cuda'). |
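For example, a minimal end-to-end call where torchy builds the DataLoaders itself might look like the sketch below. The data, model, and hyperparameter values are illustrative, and it's assumed that layers built from torchy.nn inherit the patched nn.Module:

```python
import torch
import torchy.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset

# toy regression data: 100 examples, 3 features each (illustrative values)
x = torch.randn(100, 3)
y = torch.randn(100, 1)
dataset = TensorDataset(x, y)

# assumption: torchy's layers inherit the patched nn.Module, so they gain .fit()
model = nn.Linear(3, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# torchy splits off a validation set and builds both DataLoaders itself
hist = model.fit(dataset, F.mse_loss, opt, 5, valid_pct=20, batch_size=32)
```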
torchy's batch-loss utility accepts the following arguments:

Argument | Description |
---|---|
batch | The TensorDataset, batch, or tuple containing the input and output tensors in the form (x, y) whose loss is to be calculated. |
loss_fn | The loss function from nn.functional should be passed. |
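The method's name isn't shown in this table, so as a rough illustration, here is what that per-batch loss computation looks like in plain PyTorch (loss_on_batch is a hypothetical name, not torchy's API):

```python
import torch
import torch.nn.functional as F

def loss_on_batch(model, batch, loss_fn):
    # unpack the (input, target) pair and run the forward pass
    x, y = batch
    preds = model(x)
    return loss_fn(preds, y)

# usage sketch with illustrative shapes
model = torch.nn.Linear(3, 1)
batch = (torch.randn(8, 3), torch.randn(8, 1))
print(loss_on_batch(model, batch, F.mse_loss))
```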
The accuracy utility accepts the following arguments:

Argument | Description |
---|---|
labels | The actual outputs. |
preds | The predicted outputs. |
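As a rough illustration of the computation these arguments describe, here is a standard accuracy formulation in plain PyTorch (the function is illustrative, not necessarily torchy's exact code):

```python
import torch

def accuracy(labels, preds):
    # take the most likely class per example and compare with the targets
    pred_classes = torch.argmax(preds, dim=1)
    return (pred_classes == labels).float().mean()

# usage sketch: 4 examples, 3 classes (illustrative values)
preds = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])
print(accuracy(labels, preds))
```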
model.validate() accepts the following arguments:

Argument | Description |
---|---|
valid_dl | The validation DataLoader that should be used to run validation. |
loss_fn | The loss function from nn.functional should be passed. |
PS: When using model.validate() you don't need to call model.eval() beforehand, as the method is already decorated with @torch.no_grad() and calls self.eval() within the method.
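A quick usage sketch, assuming a model and a validation DataLoader from the earlier examples (the return value is assumed here to be the validation metrics):

```python
import torch.nn.functional as F

# no model.eval() or torch.no_grad() needed; validate() handles both
val_metrics = model.validate(valid_dataloader, F.mse_loss)
```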
DeviceDL is a helper tool that puts your DataLoader onto a given device in the most efficient way. In PyTorch it's not recommended to move an entire DataLoader onto a device at once, so DeviceDL only puts the current batch on the specified device, and PyTorch automatically frees the batch from the device after it's processed.
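Conceptually, the wrapper only needs to move each batch to the device as it is yielded. A minimal sketch of that idea (DeviceDLSketch is illustrative, not torchy's actual implementation):

```python
import torch

class DeviceDLSketch:
    """Illustrative stand-in for torchy's DeviceDL: wraps a DataLoader
    and moves each batch to the target device lazily, one at a time."""

    def __init__(self, dl, dev):
        self.dl = dl
        self.dev = dev

    def __iter__(self):
        for batch in self.dl:
            # move every tensor in the batch to the device on demand
            yield tuple(t.to(self.dev) for t in batch)

    def __len__(self):
        return len(self.dl)
```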
DeviceDL can be used in the following way:
```python
import torch
from torchy.utils.data import DeviceDL, DataLoader, TensorDataset

# create a TensorDataset
dataset = TensorDataset(x, y)

# create your desired DataLoader with the hyperparameters
dataloader = DataLoader(dataset, ...)

# now put your DataLoader onto the appropriate device using DeviceDL
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dataloader_device = DeviceDL(dataloader, device)
```
DeviceDL accepts the following arguments:

Argument | Description |
---|---|
dl | The DataLoader that you created for your model. |
dev | The device that the given DataLoader should be moved to. |
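Once wrapped, the DataLoader is iterated exactly like a normal one, and each batch arrives already on the target device (the loop body is illustrative):

```python
# each (xb, yb) pair is moved onto `device` just before it is used
for xb, yb in dataloader_device:
    preds = model(xb)
```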
Do you find it a meh experience when you have to count the exact number of examples you want in your training and validation sets when using random_split? Don't want to install sklearn in your virtual environment just to split your dataset into train and validation sets?
If so, SplitPCT is a native-PyTorch implementation that splits the given dataset into training and validation sets based on the percentage of data you want in the training set, rather than the number of examples each split should contain.
SplitPCT can be used in the following way:
```python
from torchy.utils.data import SplitPCT, TensorDataset

# create a dataset
dataset = TensorDataset(x, y)

# determine what percentage of data should be in your training dataset
training_pct = 75
dataset = SplitPCT(dataset, training_pct)

# get the training dataset and validation dataset as attributes
train_ds, validation_ds = dataset.train_ds, dataset.validation_ds

# get the original TensorDataset
orig_dataset = dataset.tensor_dataset
```
PS: the dataset passed to SplitPCT can be any type of PyTorch Dataset; it's not limited to TensorDataset.
SplitPCT accepts the following arguments:

Argument | Description |
---|---|
tensor_dataset | The dataset that you created for your model. |
train_pct | The percentage of data that should be in the training dataset; the rest becomes the validation dataset. |
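Under the hood, a percentage-based split only needs to translate the percentage into example counts for PyTorch's random_split. A minimal sketch of that idea (split_pct is illustrative, not necessarily torchy's exact code):

```python
import torch
from torch.utils.data import TensorDataset, random_split

def split_pct(dataset, train_pct):
    # convert the percentage into concrete example counts
    n_train = int(len(dataset) * train_pct / 100)
    n_valid = len(dataset) - n_train
    return random_split(dataset, [n_train, n_valid])

# usage sketch with illustrative data
dataset = TensorDataset(torch.randn(100, 3), torch.randn(100, 1))
train_ds, valid_ds = split_pct(dataset, 75)
print(len(train_ds), len(valid_ds))  # 75 25
```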
By definition, torchy is a PyTorch wrapper, so there are no changes to torch.nn or any other torch functionality. torchy.nn can replace torch.nn without introducing any unsolvable errors.
Examples and a quick-start guide for torchy can also be found in the project README on GitHub.
Since torchy is just a wrapper and doesn't implement everything from scratch, it's recommended to use the wrapper only for nn.Module.
Recommended
```python
import torch
import torchy.nn as nn
import torch.nn.functional as F
```
Make your models as you would using torch.nn
```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)

model = Model()
```
Choose your loss function and optimizer

```python
loss_fn = F.mse_loss
opt = torch.optim.SGD(model.parameters(), lr=0.001)
```
Then, you can use torchy's DeviceDL to put your DataLoaders onto the given device

```python
from torchy.utils.data import DeviceDL

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
train_dataloader = DeviceDL(old_dataloader, device)
valid_dataloader = DeviceDL(old_valid_dataloader, device)
```
Now you can fit the model
```python
hist = model.fit(
    train_dataloader,
    loss_fn,
    opt,
    epochs,
    valid_dataloader,
    batch_size=64,
    accuracy=True,
    device=device
)
```
If you don't want to go through the hassle of making a DataLoader, then don't worry; torchy will do it for you.
PS: Torchy requires a TensorDataset to be passed for the following implementation to work.
```python
hist = model.fit(
    dataset,
    loss_fn,
    opt,
    epochs,
    valid_pct=30,
    batch_size=64,
    accuracy=True,
    device=device
)
print(hist)
```
It looks pretty much the same, and just as simple; the only change is that you provide the percentage of the dataset that should go into the validation set, and torchy builds the validation DataLoader for you.
That's because the wheel doesn't need to be reinvented when using torchy. The end user can use torchy just like torch and simply learn a few new, handy methods on nn.Module.