Hey, I'm Pranav Deepak. Blue Morphism is a blog/digital garden themed around mathematics and code, two things I really enjoy. I hope to learn and get better as I go. There will be some longer-running threads as well as random interesting pieces that are self-contained. Expect plenty of mistakes and a lack of polish, and take everything here with a fat fistful of salt. If you do find mistakes or just want to talk, email bluemorphism@gmail.com, join the Blue Morphism Discord server, or reach out on Twitter.
I've spent the past 2–3 years doing a bunch of math, deep learning, reinforcement learning, and so on. More recently I've fallen in love with GPU kernels and found my north star problem: can mathematics give a universal language for stating, per operation, the necessary and sufficient model of a reduced instruction set plus GPU, in a way that exposes a tractable search space towards faster kernels?
(Hint: PTX has hundreds, if not thousands, of instructions, but a really fast matmul uses only a handful of load/store and tensor core ops, which is a far smaller surface to model than the whole instruction set.)
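To make that hint concrete, here is a minimal sketch: one warp computing a single 16×16×16 tile of C = A·B through CUDA's WMMA API. The names, shapes, and layouts are all illustrative, not anyone's real kernel, but the point survives: the whole thing lowers to a few PTX instruction families (fragment loads, one mma variant, stores) rather than the full instruction set.

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// Illustrative sketch: one warp computes a 16x16x16 tile of C = A*B
// on tensor cores (requires sm_70+). A real fast matmul adds tiling,
// shared-memory staging, and pipelining, but the instruction
// "vocabulary" stays roughly this small.
__global__ void tile_matmul(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);
    wmma::load_matrix_sync(fa, a, 16);  // lowers to load instructions
    wmma::load_matrix_sync(fb, b, 16);  // likewise
    wmma::mma_sync(fc, fa, fb, fc);     // lowers to mma.sync tensor core ops
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);  // stores
}
```

Compile for a tensor-core architecture (e.g. `nvcc -arch=sm_80`) and dump the PTX with `nvcc -ptx` or `cuobjdump -ptx` to see how small the emitted instruction subset actually is.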
I have also collected a whole bunch of interesting math and code over this time, and intend to polish it, tie up loose ends, and throw it out there.