This repository contains a simple English to French translator implemented using a sequence-to-sequence (seq2seq) model. It includes a dataset file (fra.txt
) containing English and corresponding French translations, as well as a Python script (English_to_french_translator.py
) which trains the model, makes predictions, and saves the trained model.
-
fra.txt
: This dataset file contains English sentences and their corresponding French translations. Each line represents a pair of sentences separated by a tab. -
English_to_french_translator.py
: This Python script contains the implementation of the English to French translation model. It includes code for data loading, preprocessing, model definition, training, and prediction.
- The model utilizes a seq2seq architecture for translation.
- The model is trained using TensorFlow and Keras libraries.
- It preprocesses the text data by converting it to lowercase and removing punctuation.
- Tokenization is applied to both the source (French) and target (English) language sentences.
- The model architecture includes an embedding layer, LSTM layers, and a dense output layer with softmax activation.
- The training process involves splitting the data into train and test sets, compiling the model, and fitting it to the training data.
- After training, the model is used to make predictions on the test set.
- The predicted sequences are decoded to obtain the translated sentences.
To use the translator model:
- Ensure you have the necessary dependencies installed, including TensorFlow and Keras.
- Clone or download this repository to your local machine.
- Place your own English-French dataset (in the format similar to
fra.txt
) or use the provided dataset (fra.txt
). - Run the
English_to_french_translator.py
script to train the model and generate translations. - The trained model will be saved as
seq2seq_model.h5
, which can be loaded and used for translation in any application.
- The seq2seq model is a powerful architecture commonly used for tasks like machine translation.
- The saved model (
seq2seq_model.h5
) can be utilized in any application by loading it using TensorFlow/Keras APIs.