Building a GPT Architecture from Scratch
A project focused on constructing a Generative Pre-trained Transformer (GPT) model from the ground up. It covers the entire process, including model architecture, tokenization, data preparation, and training techniques, providing a comprehensive guide for anyone interested in understanding and building their own language models.
Technologies used:
- Python
- PyTorch
- TensorFlow
- NumPy
Approach:
- Learning about LLMs (Large Language Models)
- Stages of building an LLM
- Data preprocessing
- Cleaning and tokenizing text (a tokenizer sketch follows this list)
- Transformer architecture
- Attention mechanisms (Multi-Head Attention; an attention sketch follows this list)
- Coding and training the model (a training-loop sketch follows this list)
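For illustration, here is a minimal word-level tokenizer sketch in Python. It is not the project's exact tokenizer (GPT models typically use a BPE tokenizer such as tiktoken); the class and method names here are hypothetical.

```python
# Minimal word-level tokenizer sketch (illustrative; real GPTs use BPE).
import re

class SimpleTokenizer:
    def __init__(self, text):
        # Split on punctuation and whitespace, keeping delimiters as tokens
        tokens = re.split(r'([,.:;?_!"()\']|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        vocab = sorted(set(tokens))
        self.str_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_str = {i: tok for tok, i in self.str_to_id.items()}

    def encode(self, text):
        tokens = re.split(r'([,.:;?_!"()\']|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        return [self.str_to_id[t] for t in tokens]

    def decode(self, ids):
        # Naive space-joined reconstruction; fine for a sketch
        return " ".join(self.id_to_str[i] for i in ids)

tokenizer = SimpleTokenizer("Hello, world. This is a test.")
ids = tokenizer.encode("Hello, world.")
print(ids, "->", tokenizer.decode(ids))
```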
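Below is a minimal PyTorch sketch of causal multi-head self-attention, the core mechanism listed above. The dimensions, names, and dropout rate are illustrative assumptions, not the project's exact code.

```python
# Minimal GPT-style multi-head self-attention sketch with a causal mask.
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads, context_len, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        self.dropout = nn.Dropout(dropout)
        # Causal mask: each position may only attend to itself and the past
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = (weights @ v).transpose(1, 2).contiguous().view(b, t, d)
        return self.proj(out)

x = torch.randn(2, 8, 64)                       # (batch, tokens, d_model)
attn = MultiHeadAttention(d_model=64, num_heads=4, context_len=8)
print(attn(x).shape)                            # torch.Size([2, 8, 64])
```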
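And a minimal next-token training-loop sketch; the stand-in model, toy data, and hyperparameters are assumptions for demonstration only.

```python
# Minimal next-token prediction training loop (stand-in model, toy data).
import torch
import torch.nn as nn

vocab_size, context_len, d_model = 100, 8, 64

# Stand-in model: embedding -> linear head (a real GPT stacks transformer blocks)
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy batch: targets are inputs shifted left by one token
# (the wraparound from torch.roll is acceptable for a toy example)
inputs = torch.randint(0, vocab_size, (4, context_len))
targets = torch.roll(inputs, shifts=-1, dims=1)

for step in range(10):
    logits = model(inputs)                               # (batch, tokens, vocab)
    loss = nn.functional.cross_entropy(
        logits.flatten(0, 1), targets.flatten()          # flatten for CE loss
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```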
Skills I gained and things I learned:
- Handling matrices and tensors
- Vector operations
- Data preprocessing
- Tokenization
- Transformers
- Attention types
- Normalization layers
- Dropout
- GELU activation function (a sketch combining these three follows this list)
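As a rough illustration of how the last three items fit together, here is a minimal GPT-style feed-forward sub-block sketch combining a normalization layer, the GELU activation, and dropout; all sizes are illustrative assumptions.

```python
# Minimal GPT-style feed-forward sub-block: LayerNorm + GELU + Dropout.
import torch
import torch.nn as nn

class FeedForwardBlock(nn.Module):
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)           # normalization layer
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),        # expand hidden size
            nn.GELU(),                              # smooth activation
            nn.Linear(4 * d_model, d_model),        # project back down
            nn.Dropout(dropout),                    # regularization
        )

    def forward(self, x):
        # Pre-norm residual connection, as used in GPT-2-style blocks
        return x + self.ff(self.norm(x))

x = torch.randn(2, 8, 64)
block = FeedForwardBlock(d_model=64)
print(block(x).shape)  # torch.Size([2, 8, 64])
```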
Mentor:
Although these concepts may seem simple at first, each one holds a lot of depth. Thanks to Sir Raj Abhijit Dandekar of the Vizuara team, I am able to understand how things actually work.
Resources:
- Handwritten Notes
- YouTube Playlist
- Complete Code