Building the GPT architecture from scratch

Building GPT from Scratch: A project focused on constructing a Generative Pre-trained Transformer (GPT) model from the ground up. It covers the entire process, including model architecture, tokenization, data preparation, and training techniques, providing a comprehensive guide for those interested in understanding and building their own language models.
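
As a taste of the data-preparation side mentioned above, here is a minimal, illustrative word-level tokenizer in Python. This is only a sketch with names of my own choosing: a real GPT pipeline would typically use byte-pair encoding (for example via the tiktoken library) rather than simple regex splitting.

```python
import re

# Toy word-level tokenizer: maps raw text to integer token IDs.
# Real GPT models usually use byte-pair encoding, but the core idea
# of turning text into a sequence of IDs is the same.
def build_vocab(text):
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())   # words and punctuation
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}

def encode(text, vocab):
    tokens = re.findall(r"\w+|[^\w\s]", text.lower())
    return [vocab[tok] for tok in tokens]

text = "Building GPT from scratch, one token at a time."
vocab = build_vocab(text)
print(encode(text, vocab))   # a list of integer IDs fed to the model
```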

Technologies used:

  • Python
  • PyTorch
  • TensorFlow
  • NumPy

Approach:

  • Learning about LLMs (Large Language Models)
  • Stages of building LLM
  • Data preprocessing
  • Cleaning and tokenizing text
  • Transformer architecture
  • Attention mechanisms (Multi-Head Attention); see the sketch after this list
  • Coding and training the model
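
The attention mechanism listed above is the heart of the architecture. Below is a minimal PyTorch sketch of causal multi-head self-attention, using illustrative dimensions and my own class and variable names; the project's actual implementation may differ in its details.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Causal multi-head self-attention, roughly as used in GPT-style blocks."""
    def __init__(self, d_model, num_heads, context_len, dropout=0.1):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # joint query/key/value projection
        self.out = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        # Upper-triangular mask so each token attends only to earlier positions
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim)
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        scores = scores.masked_fill(self.mask[:t, :t], float("-inf"))
        weights = self.dropout(torch.softmax(scores, dim=-1))
        context = (weights @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(context)

x = torch.randn(2, 8, 64)                              # (batch, tokens, embedding dim)
attn = MultiHeadAttention(d_model=64, num_heads=4, context_len=8)
print(attn(x).shape)                                   # torch.Size([2, 8, 64])
```

The causal mask is what makes this a decoder-style (GPT) attention layer: each position can only look at itself and the tokens before it.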

Skills I gained or Things I learned:

  • Handling matrices and tensors
  • Vector operations
  • Data preprocessing
  • Tokenization
  • Transformers
  • Attention types
  • Normalization layers
  • Dropouts
  • GELU activation function (see the sketch after this list)
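
To show how several of these pieces (layer normalization, GELU, and dropout) typically come together, here is a minimal PyTorch sketch of a pre-norm feed-forward sub-block with a residual connection. The sizes and names are illustrative assumptions, not the project's exact code.

```python
import torch
import torch.nn as nn

class FeedForwardBlock(nn.Module):
    """Pre-norm feed-forward sub-block: LayerNorm -> Linear -> GELU -> Linear -> Dropout."""
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)          # normalization layer
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),       # expand the hidden dimension
            nn.GELU(),                             # GELU activation
            nn.Linear(4 * d_model, d_model),       # project back down
            nn.Dropout(dropout),                   # dropout for regularization
        )

    def forward(self, x):
        return x + self.ff(self.norm(x))           # residual connection

x = torch.randn(2, 8, 64)
block = FeedForwardBlock(d_model=64)
print(block(x).shape)                              # torch.Size([2, 8, 64])
```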

Mentor:

Although these concepts may seem simple at first, each one holds a lot of depth. Thanks to Sir Raj Abhijit Dandekar of the Vizuara team, I was able to understand how things actually work.

Resources:

  • Handwritten Notes
  • YouTube Playlist
  • Complete Code

Feel free to contribute :)