Dublin City University Participation in the VTT Track at TRECVid 2017


Haithem Afli, Feiyan Hu, Jinhua Du, Daniel Cosgrove, Kevin McGuinness, Noel O'Connor, Eric Arazo Sanchez, Jiang Zhou, Alan Smeaton

Publication Type: 
Refereed Conference Meeting Proceeding
Dublin City University participated in the video-to-text (VTT) caption generation task at TRECVid 2017, and this paper describes the three approaches we took for our four submitted runs. The first approach extracts regularly spaced keyframes from a video, generates a text caption for each keyframe, and then combines the keyframe captions into a single caption. The second approach uses a saliency map to detect image crops in those keyframes that include as much of the attractive part of the image as possible, generates a caption for each crop in each keyframe, and combines the captions into one. The third approach is an end-to-end system, a true deep learning submission based on MS-COCO, an externally available set of training captions. The paper describes each approach and presents its official results.
Conference Name: 
TRECVid workshop
Proceedings of TRECVid workshop
Digital Object Identifier (DOI):
Publication Date: 
Conference Location: 
United States of America
Research Group: 
Dublin City University (DCU)
Open access repository: