Introduction
Memory attention is attention computed over an external memory module that stores information. It extends the Transformer's attention mechanism so a model can access large external stores of knowledge, much as a computer addresses data held in memory.
Memory-Augmented Neural Networks
Introduced in Memory Networks (with the end-to-end variant MemN2N) and in the Neural Turing Machine:
Read: Attention(query, Memory_keys, Memory_values)
Output: Weighted combination of memory values
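To make this concrete, here is a minimal NumPy sketch of the read operation: the query scores every memory slot, and the output is the attention-weighted sum of the stored values. All names and shapes are illustrative assumptions, not any particular paper's API.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query, mem_keys, mem_values):
    """query: (d,), mem_keys: (N, d), mem_values: (N, d_v)."""
    scores = mem_keys @ query       # content score for each memory slot
    weights = softmax(scores)       # attention weights over slots
    return weights @ mem_values     # weighted combination of memory values

# Toy usage: a memory with 4 slots of 8-dim keys and values.
rng = np.random.default_rng(0)
K_mem = rng.normal(size=(4, 8))
V_mem = rng.normal(size=(4, 8))
q = rng.normal(size=8)
out = memory_read(q, K_mem, V_mem)  # (8,) read vector
```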
Different Types of Memory
1. Key-Value Memory
Keys K_mem, Values V_mem
Attention(query, K_mem) → weights
Output = Σ_i weights_i × V_mem[i]
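The same read, written as an explicit key-value store. The design point is that keys are used only for addressing while values carry the content, so the two can live in different spaces (e.g., keys encoding questions and values encoding answers). A minimal sketch; the class name and shapes are assumptions.

```python
import numpy as np

class KeyValueMemory:
    def __init__(self, keys, values):
        self.K = np.asarray(keys)    # (N, d_k) addressing keys
        self.V = np.asarray(values)  # (N, d_v) contents; d_v may differ from d_k

    def read(self, query):
        scores = self.K @ query           # (N,) dot-product similarity
        w = np.exp(scores - scores.max())
        w /= w.sum()                      # softmax -> attention weights
        return w @ self.V                 # Σ_i weights_i × V_mem[i]
```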
2. Content-Based Memory
Memory slots are addressed by content similarity between the query and each stored entry (typically cosine or dot-product similarity).
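A sketch of content-based addressing in the style of the Neural Turing Machine: cosine similarity between the query and each memory row, sharpened by a strength parameter; the name `beta` and its default value are assumptions for illustration.

```python
import numpy as np

def content_address(query, memory, beta=5.0):
    """Return addressing weights over memory rows by cosine similarity."""
    q = query / (np.linalg.norm(query) + 1e-8)
    M = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    sims = M @ q                 # cosine similarity per slot
    e = np.exp(beta * sims)      # beta sharpens the focus
    return e / e.sum()
```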
3. Addressable Memory
Combines content-based addressing with location-based addressing, which shifts attention to nearby slots, as in the Neural Turing Machine.
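Location-based addressing can be layered on top of the content weights. The sketch below applies a circular shift in the spirit of the NTM's rotation step; restricting the shift distribution to offsets -1, 0, +1 is an assumption for illustration.

```python
import numpy as np

def shift_address(w_content, shift=(0.0, 1.0, 0.0)):
    """Rotate addressing weights by a distribution over offsets (-1, 0, +1)."""
    w = np.zeros_like(w_content)
    for offset, s in zip((-1, 0, 1), shift):
        w += s * np.roll(w_content, offset)  # circular shift of the weights
    return w
```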
Transformer as Memory
A standard Transformer can itself be viewed as having memory:
Previous tokens act as "memory" for current token
Self-attention: the current query attends to the keys and values of previous tokens
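This view becomes concrete in a decode step with a key/value cache: the current token's query attends over the cached keys and values of all earlier tokens, which is exactly a memory read over a memory that grows by one slot per step. A hedged sketch with illustrative names and shapes, not any specific framework's API.

```python
import numpy as np

def decode_step(q_t, k_t, v_t, K_cache, V_cache):
    """One decode step: previous tokens' cached K/V act as the memory."""
    K = np.vstack([K_cache, k_t[None]])   # append this step's key
    V = np.vstack([V_cache, v_t[None]])   # append this step's value
    scores = K @ q_t / np.sqrt(q_t.size)  # scaled dot-product attention
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V, K, V                    # output plus the grown "memory"
```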
External Memory in Models
- BERT-style attention: tokens attend to all other tokens in the sequence
- Retrieval models: the query attends to an external document store (see the sketch after this list)
- Memory models: dedicated memory modules that are updated over time
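For the retrieval case, a minimal sketch of attending over an external document store: each stored document embedding is scored against the query, and attention weights are placed over the top-k hits. The embeddings, the `top_k` cutoff, and the function name are assumptions for illustration.

```python
import numpy as np

def retrieve(query, doc_embeddings, docs, top_k=2):
    """Attend over an external store: return (document, weight) pairs."""
    scores = doc_embeddings @ query           # similarity per document
    top = np.argsort(scores)[::-1][:top_k]    # indices of the best matches
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                              # attention over the top-k
    return [(docs[i], wi) for i, wi in zip(top, w)]
```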