A fully custom implementation of the Transformer model ("Attention Is All You Need") in Python, featuring tokenization, token embedding, positional encoding, self-attention, masked self-attention, cross-attention, and layer normalization, built entirely from scratch. The model is also trained for a few iterations on English-to-Bengali translation.

Transformer-based English-Bengali Translation Model

This repository demonstrates the implementation of a Transformer model for translation tasks, specifically for translating from English to Bengali. The project focuses on understanding the workings of the Transformer architecture and coding it from scratch using PyTorch.


Table of Contents

  • Introduction
  • Dataset
  • Model Architecture
  • Training Details
  • Results
  • Future Work
  • Acknowledgments
  • Disclaimer

Introduction

Transformers have become the de facto standard for translation tasks thanks to their ability to process sequences in parallel and to leverage the attention mechanism for fast, efficient translation.

This project implements a basic Transformer model trained on the Samantar Dataset. The model is far from being a standard English-Bengali translator and currently produces nearly random Bengali word sequences for given English input sequences.

The primary goal of this project is to understand the Transformer architecture and build it from scratch.
Character-level encoding is used here for simplicity, unlike the Byte Pair Encoding (BPE) algorithm, which is widely used and far better at capturing the syntactic and semantic information of tokens.
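
Since tokenization is built from scratch in this repository, here is a minimal sketch of what a character-level tokenizer can look like; the class name CharTokenizer and the choice of special tokens are illustrative assumptions, not taken from this codebase.

```python
# Minimal character-level tokenizer sketch (illustrative; not the repo's exact code).

class CharTokenizer:
    def __init__(self, corpus, specials=("<pad>", "<sos>", "<eos>", "<unk>")):
        chars = sorted(set("".join(corpus)))
        self.itos = list(specials) + chars                      # index -> token
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}   # token -> index

    def encode(self, text):
        unk = self.stoi["<unk>"]
        return [self.stoi["<sos>"]] + [self.stoi.get(ch, unk) for ch in text] + [self.stoi["<eos>"]]

    def decode(self, ids):
        specials = {"<pad>", "<sos>", "<eos>", "<unk>"}
        return "".join(tok for tok in (self.itos[i] for i in ids) if tok not in specials)

# Usage:
# tok = CharTokenizer(["hello world"])
# ids = tok.encode("hello")
# print(tok.decode(ids))  # -> "hello"
```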


Dataset

The model is trained on the Samantar Dataset, which contains English-Bengali parallel text.

  • Dataset Link: Samantar Dataset
  • The dataset was preprocessed and tokenized to suit the input format required by the Transformer model (a sketch of this step follows below).
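
As an illustration of the preprocessing step, the sketch below shows one way the parallel text could be turned into padded tensors; the tab-separated file format, the helper names, and the reuse of the character-level tokenizer sketched above are assumptions rather than the repository's actual pipeline.

```python
# Hedged sketch of turning parallel English-Bengali text into padded tensors.
import torch
from torch.nn.utils.rnn import pad_sequence

def load_pairs(path):
    # Assumes one tab-separated "english<TAB>bengali" pair per line (illustrative format).
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n").split("\t") for line in f if "\t" in line]

def collate(batch, en_tok, bn_tok, pad_id=0):
    # en_tok / bn_tok are character-level tokenizers like the sketch above.
    src = [torch.tensor(en_tok.encode(en)) for en, _ in batch]
    tgt = [torch.tensor(bn_tok.encode(bn)) for _, bn in batch]
    # Pad every sequence in the batch to the longest one (shape: [batch, seq_len]).
    src = pad_sequence(src, batch_first=True, padding_value=pad_id)
    tgt = pad_sequence(tgt, batch_first=True, padding_value=pad_id)
    return src, tgt
```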

Model Architecture

The Transformer model used in this project has the following specifications:

  • Layers: 2
  • Model Dimension (d_model): 256
  • Number of Epochs: 1
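
To make these hyperparameters concrete, the following sketch wires them into an encoder-decoder model. torch.nn.Transformer stands in here for the from-scratch attention, masking, and layer-normalization modules this repository implements; the number of heads and the vocabulary sizes are illustrative assumptions, and positional encoding is omitted for brevity.

```python
# Hedged sketch: wiring d_model=256 and 2 layers into an encoder-decoder model.
import torch
import torch.nn as nn

class TranslationModel(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=256, num_layers=2, nhead=8):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src, tgt):
        # Causal mask so each decoder position only attends to earlier positions.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=tgt_mask)
        return self.out(h)  # [batch, tgt_len, tgt_vocab]
```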

Key Features:

  • Parallel processing of sequences.
  • Scaled dot-product attention mechanism.
  • Encoder-decoder architecture.

For details, refer to the original Transformer paper by Vaswani et al., "Attention Is All You Need".
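
As a concrete illustration of the attention mechanism listed above, here is a minimal sketch of scaled dot-product attention as defined in the paper; the tensor shapes and masking convention are assumptions for illustration, not copied from this repository.

```python
# Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: [batch, heads, seq_len, d_k]
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # [batch, heads, q_len, k_len]
    if mask is not None:
        # Positions where mask == 0 are blocked (e.g. padding or future tokens).
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights
```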


Training Details

Due to limited computational resources, we trained a very shallow version of the Transformer model:

Parameter          Value
Layers             2
Model Dimension    256
Epochs             1

Expected Performance:
With the following configuration, decent translation performance can be expected:

  • Layers: 6
  • Model Dimension (d_model): 512
  • Epochs: 7–8
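
For reference, a training loop consistent with the setup above might look like the sketch below (teacher forcing with cross-entropy loss); the optimizer choice and learning rate are assumptions, and `model` / `loader` refer back to the earlier sketches rather than the repository's exact training script.

```python
# Hedged sketch of one training epoch with teacher forcing.
import torch
import torch.nn as nn

def train_one_epoch(model, loader, pad_id=0, lr=3e-4, device="cpu"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)   # ignore padded positions
    for src, tgt in loader:
        src, tgt = src.to(device), tgt.to(device)
        # Decoder input is tgt shifted right; loss is computed against the next token.
        logits = model(src, tgt[:, :-1])
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```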

Results

The current model generates sequences of random Bengali words for a given English input sequence.

This is due to:

  1. Limited training (a shallow architecture and very few training epochs).
  2. Insufficient computational resources for training a deeper model.

Future Work

  • Train a deeper Transformer model with d_model = 512, layers = 6, and epochs = 7–8.
  • Explore hyperparameter tuning to improve translation accuracy.
  • Experiment with pre-trained embeddings for better initialization.
  • Use GPUs for faster and more efficient training.

Acknowledgments

  • The Samantar Dataset for providing the parallel English-Bengali text corpus.
  • The PyTorch community for its excellent documentation and resources.
  • The authors of the Transformer architecture paper (Attention Is All You Need).

Disclaimer

This project is purely educational and aims to demonstrate the working of the Transformer model. The current implementation is not intended for production use.
