advanced_transformer_machine_translation

Advanced - Transformer for Machine Translation

Description

This project provides a from-scratch implementation of a simplified Transformer model for machine translation: it translates short English sentences into Spanish. It is intended as an educational tool for understanding the core components of the Transformer architecture, which has become the foundation of most state-of-the-art NLP models.

Note: This is a simplified implementation. A production-level machine translation system would require a much larger dataset (such as WMT or Tatoeba), a deeper and wider model, and significantly more training time and computational resources.

Functionality

Dataset Preparation: The script uses a small, hard-coded list of English-Spanish sentence pairs as its training corpus.
Text Vectorization: It uses TensorFlow's TextVectorization layer to tokenize the text, build a vocabulary for each language, and convert the sentences into integer sequences.
Custom Transformer Components: The key components of the Transformer are built as custom Keras layers:
- Positional Embedding: A layer that injects positional information into the word embeddings. This is crucial because self-attention treats its input as an unordered set and therefore has no built-in sense of sequence order.
- Multi-Head Self-Attention: The core mechanism of the Transformer, which lets each token weigh the relevance of every other token in the input and output sequences.
- Transformer Encoder Layer: A block containing a multi-head attention layer and a feed-forward network.
- Transformer Decoder Layer: A block containing two multi-head attention layers (one for self-attention and one for cross-attention with the encoder's output) and a feed-forward network.
Model Construction: The full Encoder-Decoder model is assembled by stacking the custom layers.
Custom Training: The model is trained using a custom learning rate schedule (CustomSchedule) and a custom loss function that masks out padding tokens, which is standard practice for NLP sequence tasks.
Inference: A basic translation function demonstrates how to use the trained model to translate a new English sentence. It generates the Spanish output with simple greedy decoding.
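The two training components listed above (the learning rate schedule and the padding-masked loss) can be sketched in NumPy. This sketch assumes that CustomSchedule follows the warm-up schedule from the original Transformer paper, lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), and that padding uses token ID 0, which is the TextVectorization default. The function names and the defaults d_model=128 and warmup_steps=4000 are illustrative assumptions, not values taken from the script.

```python
import numpy as np

def custom_schedule(step, d_model=128, warmup_steps=4000):
    """Warm-up learning rate schedule (assumed to match CustomSchedule). Steps start at 1."""
    step = np.asarray(step, dtype=np.float64)
    # Rises linearly for warmup_steps, then decays proportionally to 1/sqrt(step).
    return d_model ** -0.5 * np.minimum(step ** -0.5, step * warmup_steps ** -1.5)

def masked_sparse_ce(labels, logits, pad_id=0):
    """Mean cross-entropy over non-padding positions only.

    labels: (batch, T) integer token IDs; logits: (batch, T, vocab_size).
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability of the true token at each position.
    token_nll = -np.take_along_axis(log_probs, labels[..., None], axis=-1)[..., 0]
    # Padding positions get weight 0 and are excluded from the average.
    mask = (labels != pad_id).astype(logits.dtype)
    return (token_nll * mask).sum() / mask.sum()

# Usage: two real tokens followed by two padding tokens, uniform predictions over 8 tokens.
labels = np.array([[5, 3, 0, 0]])
logits = np.zeros((1, 4, 8))
print(masked_sparse_ce(labels, logits))  # log(8), about 2.079
```

Dividing by the number of real tokens rather than by the full sequence length keeps short, heavily padded sentences from pulling the loss toward zero.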

Architecture

TensorFlow & Keras: The project is built entirely with TensorFlow, using the Keras API to define custom layers and build the model.
Encoder-Decoder Structure: The model follows the classic Transformer architecture, with an encoder stack to process the input sequence and a decoder stack to generate the output sequence.
Attention Mechanism: The model's logic is driven by multi-head attention, including self-attention in the encoder and decoder, and cross-attention between the decoder and encoder.
Positional Embeddings: These are added to the word embeddings to provide the model with information about the order of tokens.
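The README does not say whether the positional embedding is learned or fixed. As one common option, the fixed sinusoidal encoding from the original Transformer paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), can be computed in NumPy and added element-wise to the word embeddings. The function name is hypothetical, and the project's own layer may use learned position embeddings instead.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of fixed sinusoidal position encodings."""
    if d_model % 2:
        raise ValueError("d_model must be even so sine/cosine dimensions pair up")
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]         # even dimension indices 2i
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                     # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                     # odd dimensions: cosine
    return pe

# Usage: add position information to 10 token embeddings of width 16.
token_embeddings = np.random.default_rng(0).normal(size=(10, 16))
x = token_embeddings + sinusoidal_positional_encoding(10, 16)
```

Because each dimension oscillates at a different frequency, every position gets a distinct pattern, and the encoding can be computed for any sequence length without extra parameters.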

How to Run

Prerequisites

Make sure you have Python installed, along with the required libraries. You can install them using pip:

pip install tensorflow numpy matplotlib

Execution

To run the project, navigate to the project directory and execute the following command:

python advanced_transformer_machine_translation.py

The script will build the model, train it on the small dataset for a few epochs, and then print the translation results for a few sample sentences.
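The printed translations come from greedy decoding: at each step the decoder picks the single most likely next token, appends it to its own input, and repeats until it produces an end token or reaches a length limit. In the sketch below, predict_fn is a hypothetical stand-in for the trained model, and the start/end token IDs and max_len are assumptions; the script's actual special tokens and maximum length may differ.

```python
import numpy as np

def greedy_decode(predict_fn, src_ids, start_id, end_id, max_len=20):
    """Generate target token IDs one at a time, always taking the argmax.

    predict_fn(src_ids, tgt_ids) is a hypothetical stand-in for the trained
    model; it must return logits of shape (len(tgt_ids), vocab_size).
    """
    tgt_ids = [start_id]
    for _ in range(max_len):
        logits = predict_fn(src_ids, np.array(tgt_ids))
        next_id = int(np.argmax(logits[-1]))  # only the last position predicts the next token
        if next_id == end_id:
            break
        tgt_ids.append(next_id)
    return tgt_ids[1:]  # drop the start token

# Toy stand-in model: emits the source tokens in order, then the end token.
def toy_predict(src_ids, tgt_ids, vocab_size=10, end_id=2):
    logits = np.zeros((len(tgt_ids), vocab_size))
    for pos in range(len(tgt_ids)):
        target = src_ids[pos] if pos < len(src_ids) else end_id
        logits[pos, target] = 1.0
    return logits

print(greedy_decode(toy_predict, np.array([7, 4, 9]), start_id=1, end_id=2))  # [7, 4, 9]
```

Greedy decoding is cheap but commits to each choice immediately; beam search, which keeps several candidate prefixes, often produces better translations at a higher cost.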

Concepts Covered

Transformer Architecture: The overall encoder-decoder structure.
Attention and Self-Attention: The core mechanism that allows the model to focus on relevant parts of the sequence.
Multi-Head Attention: The technique of running several attention "heads" in parallel, each with its own projections of the queries, keys, and values, so that the model can capture different types of relationships.
Positional Encoding: The method for incorporating sequence order into a non-recurrent model.
Sequence-to-Sequence (Seq2Seq) Tasks: The class of problems that involve transforming an input sequence into an output sequence (e.g., translation, summarization).
Tokenization and Text Vectorization: The process of converting raw text into a numerical format suitable for a neural network.
Masking: The technique of ignoring padding tokens in loss and attention calculations.
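The attention and masking concepts above fit into a short NumPy sketch. It implements scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, splits the projected inputs into several heads, and combines a padding mask with a look-ahead (causal) mask. The README only mentions padding masks; the look-ahead mask reflects how decoder self-attention is usually implemented and is an assumption here. The random weight matrices and all function names are illustrative; in a trained model the projections are learned.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v, mask=None):
    """q: (..., Tq, d), k and v: (..., Tk, d). mask is True where attention is allowed."""
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # blocked positions get ~zero weight
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def padding_mask(token_ids, pad_id=0):
    """(T,) -> (1, T): padding keys cannot be attended to by any query."""
    return (token_ids != pad_id)[None, :]

def look_ahead_mask(t):
    """(T, T) lower-triangular mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((t, t), dtype=bool))

def multi_head_attention(x_q, x_kv, w_q, w_k, w_v, w_o, num_heads, mask=None):
    """Project, split into heads, attend per head, then concatenate and project back."""
    def split(x):  # (T, d_model) -> (num_heads, T, d_model // num_heads)
        t, d = x.shape
        return x.reshape(t, num_heads, d // num_heads).transpose(1, 0, 2)
    q, k, v = split(x_q @ w_q), split(x_kv @ w_k), split(x_kv @ w_v)
    out, weights = scaled_dot_product_attention(q, k, v, mask)  # mask broadcasts over heads
    out = out.transpose(1, 0, 2).reshape(x_q.shape[0], -1)      # concatenate the heads
    return out @ w_o, weights

# Usage: decoder-style self-attention over 3 real tokens followed by 2 padding tokens.
rng = np.random.default_rng(0)
d_model, heads = 8, 2
ids = np.array([4, 7, 5, 0, 0])
x = rng.normal(size=(len(ids), d_model))  # stand-in for embeddings plus positions
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
mask = look_ahead_mask(len(ids)) & padding_mask(ids)
out, w = multi_head_attention(x, x, w_q, w_k, w_v, w_o, heads, mask)
print(out.shape, w.shape)  # (5, 8) (2, 5, 5)
```

For encoder self-attention only the padding mask would apply; for cross-attention, x_q comes from the decoder, x_kv from the encoder output, and the padding mask is built from the source token IDs.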

Files and Subdirectories

📄 advanced_transformer_machine_translation.py