Building a GPT Architecture from Scratch
A project focused on constructing a Generative Pre-trained Transformer (GPT) model from the ground up. It covers the entire process, including model architecture, tokenization, data preparation, and training techniques, providing a comprehensive guide for anyone interested in understanding and building their own language models.
Technologies used:
- Python
- PyTorch
- TensorFlow
- NumPy
Approach:
- Learning about LLMs (Large Language Models)
- Stages of building an LLM
- Data preprocessing
- Cleaning and tokenizing text (a tokenizer sketch follows this list)
- Transformer architecture
- Attention mechanisms (Multi-Head Attention; an attention sketch follows this list)
- Coding and training the model (a training-loop sketch follows this list)
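For illustration, here is a minimal word-level tokenizer sketch in Python. It is not the project's exact tokenizer (GPT models typically use a BPE tokenizer such as tiktoken); the class and method names here are hypothetical.

```python
# Minimal word-level tokenizer sketch (illustrative; real GPTs use BPE).
import re

class SimpleTokenizer:
    def __init__(self, text):
        # Split on punctuation and whitespace, keeping delimiters as tokens
        tokens = re.split(r'([,.:;?_!"()\']|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        vocab = sorted(set(tokens))
        self.str_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_str = {i: tok for tok, i in self.str_to_id.items()}

    def encode(self, text):
        tokens = re.split(r'([,.:;?_!"()\']|\s)', text)
        tokens = [t.strip() for t in tokens if t.strip()]
        return [self.str_to_id[t] for t in tokens]

    def decode(self, ids):
        # Naive space-joined reconstruction; fine for a sketch
        return " ".join(self.id_to_str[i] for i in ids)

tokenizer = SimpleTokenizer("Hello, world. This is a test.")
ids = tokenizer.encode("Hello, world.")
print(ids, "->", tokenizer.decode(ids))
```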
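Below is a minimal PyTorch sketch of causal multi-head self-attention, the core mechanism listed above. The dimensions, names, and dropout rate are illustrative assumptions, not the project's exact code.

```python
# Minimal GPT-style multi-head self-attention sketch with a causal mask.
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, num_heads, context_len, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection
        self.dropout = nn.Dropout(dropout)
        # Causal mask: each position may only attend to itself and the past
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention with the causal mask applied
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        out = (weights @ v).transpose(1, 2).contiguous().view(b, t, d)
        return self.proj(out)

x = torch.randn(2, 8, 64)                       # (batch, tokens, d_model)
attn = MultiHeadAttention(d_model=64, num_heads=4, context_len=8)
print(attn(x).shape)                            # torch.Size([2, 8, 64])
```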
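And a minimal next-token training-loop sketch; the stand-in model, toy data, and hyperparameters are assumptions for demonstration only.

```python
# Minimal next-token prediction training loop (stand-in model, toy data).
import torch
import torch.nn as nn

vocab_size, context_len, d_model = 100, 8, 64

# Stand-in model: embedding -> linear head (a real GPT stacks transformer blocks)
model = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy batch: targets are inputs shifted left by one token
# (the wraparound from torch.roll is acceptable for a toy example)
inputs = torch.randint(0, vocab_size, (4, context_len))
targets = torch.roll(inputs, shifts=-1, dims=1)

for step in range(10):
    logits = model(inputs)                               # (batch, tokens, vocab)
    loss = nn.functional.cross_entropy(
        logits.flatten(0, 1), targets.flatten()          # flatten for CE loss
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.4f}")
```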
Skills I gained and things I learned:
- Handling matrices and tensors
- Vector operations
- Data preprocessing
- Tokenization
- Transformers
- Attention types
- Normalization layers
- Dropout
- GELU activation function (a sketch combining these three follows this list)
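As a rough illustration of how the last three items fit together, here is a minimal GPT-style feed-forward sub-block sketch combining a normalization layer, the GELU activation, and dropout; all sizes are illustrative assumptions.

```python
# Minimal GPT-style feed-forward sub-block: LayerNorm + GELU + Dropout.
import torch
import torch.nn as nn

class FeedForwardBlock(nn.Module):
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)           # normalization layer
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),        # expand hidden size
            nn.GELU(),                              # smooth activation
            nn.Linear(4 * d_model, d_model),        # project back down
            nn.Dropout(dropout),                    # regularization
        )

    def forward(self, x):
        # Pre-norm residual connection, as used in GPT-2-style blocks
        return x + self.ff(self.norm(x))

x = torch.randn(2, 8, 64)
block = FeedForwardBlock(d_model=64)
print(block(x).shape)  # torch.Size([2, 8, 64])
```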
Mentor:
Although these concepts may seem simple at first, each one holds a lot of depth. Thanks to Sir Raj Abhijit Dandekar of the Vizuara team, I am able to understand how things actually work.
Resources:
- Handwritten Notes
- YouTube Playlist
- Complete Code