This 2024 IOCCC (International Obfuscated C Code Contest) entry by Adrian Cable "implements an LLM inference engine in an impossibly minimal quantity of maximally incomprehensible C code". It can run llama2-7b in only 1750 bytes of C. That's basically a paragraph of C code to run an LLM 🤯
I eventually convinced ChatGPT that it actually works by asking it to de-obfuscate and explain the code.
I tried it on my 8-year-old ThinkPad (X1 Carbon 5th gen) with 16 GB of RAM and got surprisingly good output at about 1 token/second!
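
For intuition on how an inference engine can be that small: almost all of a transformer's work is matrix-vector multiplies over weights streamed from the multi-gigabyte model file, so the program itself is mostly tight loops. Here's a minimal sketch of that core operation (my own illustration, not Adrian Cable's actual code):

```c
#include <stdio.h>

/* Sketch of the operation that dominates LLM inference: multiply a
 * weight matrix (read from the model file in the real thing) by an
 * activation vector. A tiny engine is essentially loops like this,
 * plus file I/O and a handful of other ops (softmax, RMSNorm, etc.). */
static void matvec(float *out, const float *w, const float *x,
                   int rows, int cols) {
    for (int r = 0; r < rows; r++) {
        float sum = 0.0f;
        for (int c = 0; c < cols; c++)
            sum += w[r * cols + c] * x[c]; /* dot product: one weight row with the input */
        out[r] = sum;
    }
}

int main(void) {
    /* Toy 2x3 example just to show the shape of the computation. */
    float w[6] = {1, 2, 3, 4, 5, 6};
    float x[3] = {1, 0, -1};
    float out[2];
    matvec(out, w, x, 2, 3);
    printf("%g %g\n", out[0], out[1]); /* prints: -2 -2 */
    return 0;
}
```

The obfuscated entry crams this kind of logic (plus the attention and feed-forward plumbing) into expression-dense C; the 7B model's "knowledge" lives entirely in the external weights file, which is why the code can stay paragraph-sized.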