
Yep! Think of LoRA for network fine-tuning. Monarch (linked above) uses lots of block diagonality. These ideas are also what make FlashAttention flash.
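
For intuition, here's a minimal sketch of the low-rank idea behind LoRA (the names, sizes, and init scheme below are illustrative, not taken from any particular implementation):

    import torch

    # LoRA sketch: instead of updating the full (d_out, d_in) weight during
    # fine-tuning, learn a low-rank correction B @ A with rank r << d.
    # Sizes are illustrative; the usual alpha/r scaling is omitted for brevity.
    d_out, d_in, r = 512, 512, 8
    W0 = torch.randn(d_out, d_in)     # frozen pretrained weight
    A = torch.randn(r, d_in) * 0.01   # trainable rank-r factor
    B = torch.zeros(d_out, r)         # trainable, zero-init so W0 is unchanged at start

    x = torch.randn(d_in)
    y = W0 @ x + B @ (A @ x)          # forward pass: frozen weight + low-rank update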

I haven't seen banded matrices used as much, though with weight sharing they're just convolutions. One nice feature of block diagonality is that you can express it as a batched matrix multiplication, reusing all the existing matmul kernels.
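
Roughly, in PyTorch (assuming equally sized blocks just to keep the shapes simple):

    import torch

    # A block-diagonal matrix with B blocks of size (n, m) never needs to be
    # materialized: reshape the input into B chunks and use a batched matmul,
    # which runs on the same dense matmul kernels.
    B, n, m = 4, 8, 8
    blocks = torch.randn(B, n, m)          # store only the nonzero blocks
    x = torch.randn(B * m)

    y_batched = torch.bmm(blocks, x.view(B, m, 1)).reshape(B * n)

    # Reference: build the dense block-diagonal matrix explicitly
    y_dense = torch.block_diag(*blocks) @ x
    assert torch.allclose(y_batched, y_dense, atol=1e-5)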


