🥭 Mango Encyclopedia

Comprehensive Guide to Attention Mechanisms in Deep Learning

Welcome to Mango

This encyclopedia covers the attention mechanisms used in modern deep learning, particularly in Transformer architectures. Each topic includes a detailed explanation, mathematical formulations, implementation insights, and test questions to reinforce learning.

Total Topics: 72 | Format: Mobile-friendly HTML

Core Attention Mechanisms (1-15)

01. Attention Motivation
02. Encoder-Decoder Attention
03. Additive (Bahdanau) Attention
04. Multiplicative (Luong) Attention
05. Query-Key-Value Mechanism
06. Alignment Scores
07. Context Vectors
08. Soft Attention
09. Hard Attention
10. Self-Attention
11. Scaled Dot-Product Attention
12. Multi-Head Attention
13. Positional Encoding
14. Learned Positional Embeddings
15. Relative Positional Encoding

Positional Encodings & Advanced Attention (16-30)

16. Rotary Position Embeddings (RoPE)
17. ALiBi Attention Bias
18. Transformer Encoder
19. Transformer Decoder
20. Masked Self-Attention
21. Cross-Attention
22. Feed-Forward Layers
23. Residual Connections
24. Layer Normalization
25. Causal Masking
26. Padding Masking
27. Attention Masks
28. Sparse Attention
29. Local Attention
30. Global Attention

Efficient & Vision Attention (31-45)

31. Sliding Window Attention
32. Block Attention
33. Longformer Attention
34. BigBird Attention
35. Linear Attention
36. Kernelized Attention
37. Performer Attention
38. Low-Rank Attention
39. FlashAttention
40. Memory-Efficient Attention
41. Vision Attention
42. Vision Transformer (ViT)
43. Window Attention
44. Swin Transformer Attention
45. Deformable Attention

Cross-Modal & Specialized Attention (46-60)

46. Cross-Modal Attention
47. Image-Text Attention
48. Audio-Text Attention
49. Video Attention
50. Retrieval Attention
51. Memory Attention
52. Chunked Attention
53. Ring Attention
54. KV Cache
55. Grouped Query Attention (GQA)
56. Multi-Query Attention (MQA)
57. Attention Visualization
58. Attention Interpretability
59. Attention Rollout
60. Quadratic Complexity Problem

Advanced Architectures & Future Directions (61-72)

61. Long-Context Attention Issues
62. Lost-in-the-Middle Problem
63. Hierarchical Attention
64. Graph Attention Networks
65. Reinforcement Learning Attention
66. Mixture-of-Experts Attention
67. Transformer Alternatives
68. State Space Models (Mamba)
69. Hybrid Attention Architectures
70. Multimodal Transformers
71. Diffusion Transformer Attention
72. Agentic Memory Attention