
Investigation of multimodal aspect-based sentiment analysis using a Crossmodal Model

Williams, Jacob
Multimodal aspect-based sentiment analysis, the task of identifying a target aspect and predicting the sentiment expressed toward it, has been gaining increasing attention in the natural language processing community. Although the field began with a focus on textual data alone, datasets such as Twitter 2015 and Twitter 2017 require models to attend to both textual and visual information. In this work, we propose the Cross Modal Model (CMM). The model uses a BERT encoder and a CNN to extract textual and visual features, applies attention over these features, and finally concatenates them to produce the sentiment prediction. The model achieves significant performance gains, with breakthrough results on the Twitter 2015 and Twitter 2017 datasets. These results suggest that our method could be applied to other multimodal datasets and, potentially, to other multimodal problems.
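As a rough illustration only (not the authors' implementation), the fusion step described above — attention over pre-extracted textual and visual features followed by concatenation and classification — might be sketched as follows. The function names, feature shapes, and three-class label set are assumptions for the sketch; the features stand in for BERT token embeddings and CNN region embeddings.

```python
import numpy as np

def attend(query, keys):
    # Scaled dot-product attention: weight each key vector by its
    # similarity to the query, then return the weighted sum.
    scores = keys @ query / np.sqrt(query.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ keys

def crossmodal_fuse(text_feats, image_feats, W, b):
    # text_feats: (num_tokens, d) stand-in for BERT token features
    # image_feats: (num_regions, d) stand-in for CNN region features
    # Attend to each modality using a summary of the other, then
    # concatenate the two context vectors and classify.
    text_ctx = attend(image_feats.mean(axis=0), text_feats)
    img_ctx = attend(text_feats.mean(axis=0), image_feats)
    fused = np.concatenate([text_ctx, img_ctx])   # shape (2d,)
    logits = fused @ W + b                        # W: (2d, num_classes)
    return int(logits.argmax())  # e.g. 0=negative, 1=neutral, 2=positive
```

In a trained model, W and b (and the encoders producing the features) would be learned end-to-end; here they are placeholders to show the data flow.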