Self-Attention

Self-attention is a mechanism that lets a model weigh different parts of the input sequence against each other while processing it. Instead of treating each element in isolation, self-attention computes a new representation for every element as a weighted sum over all elements in the sequence, where the weights come from how strongly each pair of elements relates to one another. This lets the model capture dependencies and context between elements regardless of how far apart they are in the sequence.
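To make the weighted sum concrete, here is a minimal sketch of one common formulation, scaled dot-product self-attention, in plain Python. The text above does not commit to a specific variant, so treat this as an illustration under assumptions: the helper names (`matmul`, `softmax`, `self_attention`), the projection matrices `w_q`, `w_k`, `w_v`, and the toy input are all hypothetical, and identity projections stand in for the weights a real model would learn.

```python
import math

def matmul(a, b):
    # Multiply two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    # Project the same input into queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    # Pairwise similarity between every query and every key, scaled by sqrt(d_k).
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    # Each row becomes a set of weights over all positions that sums to 1.
    weights = [softmax(row) for row in scores]
    # Each output row is the weighted sum of all value vectors.
    return matmul(weights, v), weights

# Toy input: 3 elements with 2 features each (assumed values for illustration).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
output, weights = self_attention(x, identity, identity, identity)
```

Each row of `weights` shows how much one element attends to every element in the sequence, including itself, and the matching row of `output` is that element's context-aware representation. Because the weights depend only on content similarity, not on position, distant elements can influence each other as easily as neighboring ones.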

Visit the following resources to learn more:

@article@What is self-attention?
@video@Why is Self Attention called "Self"? | Self Attention Vs Luong Attention in Depth Lecture