This write-up explores the intersection of computer vision and natural language processing (NLP), specifically how attention mechanisms bridge the gap between seeing and describing. 👁️ Core Concept: The Bridge
Assigns weights to different image regions. Attention and Vision in Language Processing
The weighted sum of visual features used to inform the word choice. 📈 Evolution of Techniques This write-up explores the intersection of computer vision
Picks one specific region to focus on. It is non-differentiable and requires Reinforcement Learning (Policy Gradient). Attention and Vision in Language Processing
A global approach where every pixel gets a weight. It is differentiable and easy to train via backpropagation.