Processing the labelled captions of the video