GPT-2 Architecture
Create GPT-2 from scratch
GPT-2 is a transformer-based autoregressive language model created by OpenAI. I implemented it from scratch: a model with an embedding dimension of 768 and around 117 million parameters, together with an inference loop and a training loop.
The transformer architecture follows the GPT-2 paper, with attention split across multiple heads.
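To make the multi-head split concrete, here is a minimal sketch in plain Python of causal multi-head self-attention. It is not the code from this repository: the function names (`split_heads`, `causal_attention`, `multi_head_attention`), the toy sizes, and the omission of bias terms are all assumptions made for illustration. The demo uses 8 dimensions and 2 heads so it runs instantly; the smallest GPT-2 configuration uses 768 dimensions split across 12 heads of 64 dimensions each.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(x, n_head):
    """Split a (seq_len, d_model) matrix into n_head (seq_len, d_head) matrices."""
    d_model = len(x[0])
    assert d_model % n_head == 0, "d_model must be divisible by n_head"
    d_head = d_model // n_head
    return [[row[h * d_head:(h + 1) * d_head] for row in x] for h in range(n_head)]


def causal_attention(q, k, v):
    """Scaled dot-product attention; position i only attends to positions <= i."""
    d_head = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        # Only keys 0..i are scored, which is equivalent to a causal mask.
        scores = [
            sum(a * b for a, b in zip(q_i, k[j])) / math.sqrt(d_head)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(d_head)]
        )
    return out


def multi_head_attention(x, w_q, w_k, w_v, w_o, n_head):
    """Project to Q/K/V, attend per head, concatenate heads, project out."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    heads = [
        causal_attention(qh, kh, vh)
        for qh, kh, vh in zip(
            split_heads(q, n_head), split_heads(k, n_head), split_heads(v, n_head)
        )
    ]
    # Concatenate the head outputs back into (seq_len, d_model).
    merged = [[val for head in heads for val in head[t]] for t in range(len(x))]
    return matmul(merged, w_o)


def rand_matrix(rows, cols):
    """Hypothetical helper: small random weights for the demo."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, n_head = 4, 8, 2  # toy sizes, not the real model's
x = rand_matrix(seq_len, d_model)
w_q, w_k, w_v, w_o = (rand_matrix(d_model, d_model) for _ in range(4))

out = multi_head_attention(x, w_q, w_k, w_v, w_o, n_head)
print(len(out), len(out[0]))  # 4 8
```

Because each position only scores keys at or before itself, the first output row depends on the first token alone, which is the property that makes the model autoregressive. A tensor library would normally do the split as a reshape and transpose rather than list slicing.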