Inside LLMs — Series | Gabriel Caiana

1
From Common Crawl to the Base Model: How an LLM Learns Language
How an LLM goes from raw web data to a base model: the Transformer, Common Crawl, FineWeb, tokenization and pre-training, explained clearly and without hype.

Jul 2, 2026 · 12 min read

2
From Base Model to ChatGPT: SFT, Tools and Reinforcement Learning
How an LLM base model becomes ChatGPT: supervised fine-tuning, tool use, hallucination reduction and reinforcement learning explained clearly and in depth.

Jul 2, 2026 · 12 min read