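A brief introduction to splitting the matrix into blocks (so that each block fits in the cache). The implementation is shown in the figure below.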

image.png
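The blocking idea can be sketched as below. The note does not say which operation is being blocked, so this assumes matrix multiplication. The function names `matmul_naive` and `matmul_blocked` and the default `block=64` are hypothetical, chosen for illustration, and are not taken from the implementation in the figure.

```python
def matmul_naive(a, b):
    """Reference triple-loop product of two list-of-lists matrices."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_blocked(a, b, block=64):
    """Same product, computed tile by tile.

    `block` is an assumed tile edge length; in practice it would be tuned
    so that the three tiles being touched fit in the target cache level.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # The outer three loops walk over tiles; the inner three loops do the
    # ordinary multiply-accumulate restricted to one tile of each matrix.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += aik * row_b[j]
    return c
```

Pure Python lists do not show the cache effect the way a compiled language would, so this only illustrates the loop structure. The `min(...)` bounds handle sizes that are not a multiple of `block`.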

For a matrix of size 384, we have:

Screenshot 2025-08-19 at 9.12.48 PM.png

For a matrix of size 1024, we have:

Screenshot 2025-08-19 at 9.27.17 PM.png