Despite being mathematical systems, deep learning models are rarely approached systematically. Proper analysis requires multiple abstractions. Mathematically, we are interested in how functions compose, how they are parallelized, and how simple linear operations can be rearranged. Practically, we are interested in the resource costs of achieving some mathematical goal, which let us find the optimal execution strategy on parallelized GPU hardware. Typical approaches struggle to accommodate these different lenses.
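As a minimal illustration of why rearranging linear operations matters for resource costs (a hypothetical example, not drawn from the source), consider computing a chained matrix product: regrouping the parentheses preserves the result exactly but changes the arithmetic cost dramatically.

```python
import numpy as np

# Hypothetical sketch: (A @ B) @ x versus A @ (B @ x).
# Both orderings compute the same vector, but their costs differ,
# which is the kind of rearrangement a systematic analysis must track.
rng = np.random.default_rng(0)
d = 512
A = rng.standard_normal((d, d))
B = rng.standard_normal((d, d))
x = rng.standard_normal((d, 1))

left = (A @ B) @ x    # materializes a d x d product first
right = A @ (B @ x)   # two cheap matrix-vector products

flops_left = d * d * d + d * d    # ~134 million multiply-adds
flops_right = 2 * d * d           # ~0.5 million multiply-adds

assert np.allclose(left, right)
print(flops_left // flops_right)  # the regrouped order is far cheaper
```

Here the regrouping is mathematically trivial, yet it reduces the multiply-add count by a factor of roughly d/2; execution strategies for real models hinge on systematically finding such rearrangements.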
Category theory's tools for studying abstractions allow us to relate these approaches. Furthermore, category theory lets us develop a rigorous diagrammatic language that reflects these abstractions. Our methods have successfully expressed a variety of models in full detail, exactly describing their constituent functions along with the associated parallelization and linearity properties. Additionally, in FlashAttention on a Napkin, we used diagrams to quickly derive optimized execution strategies and performance models; with typical methods, such derivations take years of laborious research.
This research opens many avenues for further work at the intersection of category theory and the practical aspects of deep learning design, including optimizing resource usage. So far, we have published FlashAttention on a Napkin in addition to Vincent Abbott's previous works.
Future work will encompass: