![abhishek on X: "In the forward function, we apply the formula for self-attention. softmax(Q.K´/ dim(k))V. torch.bmm does matrix multiplication of batches. dim(k) is the sqrt of k. Please note: q, k, v (](https://pbs.twimg.com/media/FGfroicWQAAiIf1.jpg)
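The tweet describes scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, computed per batch with `torch.bmm`. A minimal sketch of that formula (the `self_attention` helper and the tensor shapes are our own, chosen for illustration):

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for q, k, v of shape (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    # torch.bmm multiplies batches of matrices:
    # (b, n, d_k) @ (b, d_k, n) -> (b, n, n) attention scores
    scores = torch.bmm(q, k.transpose(1, 2)) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)  # each row sums to 1
    # (b, n, n) @ (b, n, d_k) -> (b, n, d_k) weighted sum of values
    return torch.bmm(weights, v)

q = k = v = torch.randn(2, 5, 8)
out = self_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 8])
```

Note the scaling factor is the square root of the key dimension d_k, which keeps the softmax inputs in a reasonable range as d_k grows.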
![PyTorch Basics Part 2. This is the Second part of the PyTorch… | by saketh-saraswathi | Chatbots Life](https://miro.medium.com/v2/resize:fit:1400/1*_ikG6rs8ac35r6X9afehpQ.jpeg)
![Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch | AI Summer](https://theaisummer.com/static/4cc18938d1acf254e759f2e2870e9964/ee604/einsum-attention.png)
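The AI Summer article expresses the same attention computation with `torch.einsum`, where the subscript string spells out which axes are contracted. A single-head sketch under assumed shapes (adding a head axis `h` to each subscript, e.g. `'bhid,bhjd->bhij'`, gives the multi-head form):

```python
import torch

b, n, d = 2, 5, 8          # batch, sequence length, feature dim (assumptions)
q = torch.randn(b, n, d)
k = torch.randn(b, n, d)
v = torch.randn(b, n, d)

# 'bid,bjd->bij': dot product of every query i with every key j, per batch b
scores = torch.einsum('bid,bjd->bij', q, k) / d ** 0.5
weights = scores.softmax(dim=-1)
# 'bij,bjd->bid': attention-weighted sum over values
attn_out = torch.einsum('bij,bjd->bid', weights, v)
print(attn_out.shape)  # torch.Size([2, 5, 8])
```

The einsum form makes the contraction explicit and avoids the manual `transpose` that `torch.bmm` requires.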
![[Diagram] How to use torch.gather() Function in PyTorch with Examples - MLK - Machine Learning Knowledge](https://machinelearningknowledge.ai/wp-content/uploads/2022/11/torch.gather-with-Dim0-Example-1.jpg)
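The MLK diagram illustrates `torch.gather` along `dim=0`: the output takes `out[i][j] = input[index[i][j]][j]`, i.e. the index tensor selects a row for each column position. A small example with made-up values:

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])
idx = torch.tensor([[0, 2, 1],
                    [2, 0, 0]])

# dim=0: out[i][j] = t[idx[i][j]][j]
g = torch.gather(t, 0, idx)
print(g)  # tensor([[1, 8, 6],
          #         [7, 2, 3]])
```

The index tensor must have the same number of dimensions as the input, and the output takes the index tensor's shape.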