We Hit 10,500 Tokens/Sec on B200

Technical deep-dive: custom CUDA kernels + speculative execution for 2.3x speedup

Tejas Bhakta
September 15, 2025 · 4 min read