https://gabrielcaiana.com/blog/from-common-crawl-to-the-base-model-how-an-llm-learns-language/