We propose to learn a visual embedding of images in the emoji space using a massive dataset of weakly labeled photos from Twitter. We demonstrate that the learned embedding generalizes much better to visual sentiment analysis tasks than the commonly used object-based representations (e.g., learned from ImageNet). Our model achieves state-of-the-art performance on visual sentiment analysis and fine-grained visual emotion prediction. Additionally, it enables the novel task of zero-shot visual sentiment learning.
Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification on large datasets like ImageNet. However, objects are sentiment-neutral, which limits the expected gains of transfer learning for such tasks. In this work, we propose to overcome this problem by learning a novel sentiment-aligned image embedding that is better suited for subsequent visual sentiment analysis. Our embedding leverages the intricate relation between emojis and images in large-scale and readily available data from social media. Emojis are language-agnostic, consistent, and carry a clear sentiment signal, which makes them an excellent proxy for learning a sentiment-aligned embedding. Hence, we construct a novel dataset of 4 million images collected from Twitter together with their associated emojis. We train a deep neural model for image embedding using emoji prediction as a proxy task. Our evaluation demonstrates that the proposed embedding consistently outperforms its popular object-based counterpart across several sentiment analysis benchmarks. Furthermore, without bells and whistles, our compact, effective, and simple embedding outperforms more elaborate and customized state-of-the-art deep models on these public benchmarks. Additionally, we introduce a novel emoji representation based on their visual emotional response, which supports a deeper understanding of the emoji modality and its usage on social media.
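To make the proxy task concrete, here is a minimal training sketch in PyTorch. The ResNet-50 backbone, the emoji vocabulary size, and the multi-label binary cross-entropy loss are illustrative assumptions, not necessarily the exact configuration used in the paper.

import torch
import torch.nn as nn
import torchvision.models as models

NUM_EMOJIS = 64  # hypothetical emoji vocabulary size

class EmojiEmbeddingNet(nn.Module):
    """CNN backbone with an emoji-prediction head; the pooled feature
    before the head serves as the sentiment-aligned image embedding."""
    def __init__(self, num_emojis=NUM_EMOJIS):
        super().__init__()
        backbone = models.resnet50(weights=None)  # assumed backbone choice
        backbone.fc = nn.Identity()               # expose the 2048-d pooled feature
        self.backbone = backbone
        self.head = nn.Linear(2048, num_emojis)   # emoji prediction head

    def forward(self, images):
        embedding = self.backbone(images)  # sentiment-aligned image embedding
        logits = self.head(embedding)      # per-emoji scores
        return embedding, logits

model = EmojiEmbeddingNet()
criterion = nn.BCEWithLogitsLoss()  # a tweet image can carry several emojis
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

images = torch.randn(8, 3, 224, 224)                    # dummy batch
targets = torch.randint(0, 2, (8, NUM_EMOJIS)).float()  # weak emoji labels
_, logits = model(images)
loss = criterion(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()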
Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis
Ziad Al-Halah, Andrew Aitken, Wenzhe Shi and Jose Caballero
IEEE International Conference on Computer Vision (ICCV) Workshops, October 2019.
[paper]
[supplementary]
[arXiv]
@inproceedings{al-halah2019,
  title     = {Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis},
  author    = {Ziad Al-Halah and Andrew Aitken and Wenzhe Shi and Jose Caballero},
  booktitle = {IEEE International Conference on Computer Vision Workshops},
  arxivId   = {1907.06160},
  year      = {2019}
}
Given an image, our model predicts emojis that are aligned with the sentiment and the emotions conveyed in the image. In contrast to an object-based representation (e.g., learned from ImageNet), where an image's sentiment is scored based on the object type (e.g., all dog images are positive), our model captures the overall context and produces diverse outputs even for images of the same category, as seen for the dog (first row) and car (second row) images.
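One possible readout for such per-image emoji predictions, using a model like the sketch above: predict_emojis and the emoji_vocab lookup table (mapping head indices to emoji characters) are hypothetical helpers, and the sigmoid-plus-top-k scoring is an assumption rather than the paper's exact procedure.

import torch

@torch.no_grad()
def predict_emojis(model, image, emoji_vocab, k=5):
    """Return the k emojis the model scores highest for a single image."""
    model.eval()
    _, logits = model(image.unsqueeze(0))     # add a batch dimension
    probs = torch.sigmoid(logits).squeeze(0)  # independent per-emoji scores
    scores, indices = probs.topk(k)
    return [(emoji_vocab[i], s.item()) for i, s in zip(indices.tolist(), scores)]

# e.g., predict_emojis(model, preprocess(img), emoji_vocab, k=3)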
Our model learns a unique signature in the emotional space for each emoji based on our learned embedding. Click on the dropdown lists to select different emojis and see their respective emotional signatures. Curves with a green color code show a positive sentiment, while those in blue generally convey a negative sentiment.
The following projection shows the similarity between emojis in the emotion space, as learned by our model's embedding and projected with t-SNE.
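For readers who want to reproduce a projection of this kind, a short sketch using scikit-learn's t-SNE follows; the signatures matrix is a random stand-in for the per-emoji emotional signatures learned by the model, and the dimensions are hypothetical.

import numpy as np
from sklearn.manifold import TSNE

num_emojis, emotion_dims = 64, 8  # hypothetical sizes
signatures = np.random.rand(num_emojis, emotion_dims)  # stand-in for the real signatures

coords = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(signatures)
# coords[i] is the 2-D position of emoji i; nearby points indicate emojis
# with similar emotional signatures.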
You can find quantitative results on visual sentiment analysis, fine-grained emotion recognition, zero-shot sentiment learning, and more interesting findings in our paper.