Implements a character-level GPT-style Transformer:
- model.py: CausalSelfAttention, FeedForward, TransformerBlock, LLM
- tokenizer.py: CharTokenizer (char -> int mapping)
- train.py: training loop with AdamW, gradient clipping, checkpointing, sampling
- generate.py: load checkpoint and generate text from a prompt
Verified working on a built-in Shakespeare excerpt (805k-parameter model).
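
For reference, the char -> int mapping listed for tokenizer.py can be sketched roughly as below. This is a minimal illustration, not the code in this commit: the attribute names (stoi, itos, vocab_size) and the encode/decode method names are assumptions, and the real CharTokenizer may handle unknown characters or persistence differently.

```python
class CharTokenizer:
    """Illustrative sketch only: one integer id per distinct character."""

    def __init__(self, text: str) -> None:
        # Sort the distinct characters so ids are deterministic across runs.
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}  # char -> int
        self.itos = {i: ch for ch, i in self.stoi.items()}  # int -> char

    @property
    def vocab_size(self) -> int:
        return len(self.stoi)

    def encode(self, s: str) -> list[int]:
        # Raises KeyError for characters not seen when the vocab was built.
        return [self.stoi[ch] for ch in s]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


tok = CharTokenizer("to be or not to be")
ids = tok.encode("to be")
print(tok.vocab_size)   # 7
print(ids)              # [6, 4, 0, 1, 2]
print(tok.decode(ids))  # to be
```

Building the vocabulary from the training text keeps the embedding table small, at the cost of failing on characters absent from that text.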
https://claude.ai/code/session_01SWXLQb3nFTiygbp74dpjVa