
Hi all,

I spent a couple of months earlier this year learning GPU programming by trying to optimize inference for a robotics paper from last year (https://diffusion-policy.cs.columbia.edu/). I've consumed so much helpful content from blog posts on the internet, and I wanted to contribute my own series detailing everything I learned while speeding up inference ~3.4x over native PyTorch eager mode. Code for all the posts can be found here - https://github.com/vdesai2014/inference-optimization-blog-po...

The series works up from GPU architecture to higher-level details like profiling PyTorch, optimizing CUDA kernels, and integrating them into PyTorch with CUDA graphs. Hope it's helpful for anyone interested in this space!
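
To give a flavor of those last two pieces, here's a minimal sketch of profiling with torch.profiler and then capturing/replaying a forward pass with a CUDA graph. This isn't the repo's code - the Linear model and shapes are stand-ins for the diffusion policy network - but it's roughly the shape of what the posts walk through:

    import torch
    from torch.profiler import profile, ProfilerActivity

    # Stand-in for the real diffusion policy model (assumption, not the repo's model).
    model = torch.nn.Linear(512, 512).cuda().eval()
    static_input = torch.randn(1, 512, device="cuda")  # placeholder shape

    # 1. Profile eager-mode inference to find hotspots.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        with torch.no_grad():
            model(static_input)
    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

    # 2. Capture the forward pass into a CUDA graph. Warm up on a
    #    side stream first, as the PyTorch CUDA graphs docs recommend.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s), torch.no_grad():
        for _ in range(3):
            static_output = model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g), torch.no_grad():
        static_output = model(static_input)

    # 3. Replay: copy fresh data into the captured input tensor,
    #    then relaunch the whole graph with a single call, skipping
    #    per-op kernel launch overhead.
    static_input.copy_(torch.randn(1, 512, device="cuda"))
    g.replay()
    result = static_output.clone()

The key constraint is that a captured graph replays with fixed tensor addresses, which is why inputs are copied into the same static buffer rather than passed in fresh each call.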