Multi-Head Attention


Multi-head attention is an attention mechanism that runs several attention computations in parallel, each with its own learned projections of the queries, keys, and values. Each independent attention computation is called a "head." The outputs of all the heads are concatenated and linearly transformed to produce the final output. This allows the model to attend to different parts of the input sequence with different learned representations, capturing a richer set of relationships than a single attention head could.
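The split-attend-concatenate flow described above can be sketched in NumPy. This is a minimal illustration, not a production implementation: the function name, the single-matrix projections `w_q`, `w_k`, `w_v`, `w_o`, and the random test data are all assumptions chosen for brevity (real implementations typically also support masking, batching, and separate encoder/decoder inputs).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    # x: (seq_len, d_model); each projection matrix: (d_model, d_model).
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project, then split the model dimension into heads:
    # (seq_len, d_model) -> (num_heads, seq_len, d_head)
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)

    # Each head runs scaled dot-product attention independently.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    heads = weights @ v                                  # (heads, seq, d_head)

    # Concatenate the heads back to (seq_len, d_model), then mix them
    # with a final linear transformation.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # the output keeps the input shape: (4, 8)
```

Note that the per-head dimension is `d_model // num_heads`, so the total computation stays comparable to a single full-width attention while each head learns its own projection subspace.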

Visit the following resources to learn more: