
Automatic MOOC video classification using transcript features and convolutional neural networks


Houssem Chatbri, Kevin McGuinness, Suzanne Little, Jiang Zhou, Keisuke Kameyama, Paul Kwan, Noel O'Connor

Publication Type: 
Refereed Conference Meeting Proceeding
The amount of MOOC video material has grown exponentially in recent years. Its storage and analysis therefore need to be as fully automated as possible to maintain the quality of its management. In this work, we present a method for automatic topic classification of MOOC videos using speech transcripts and convolutional neural networks (CNNs). Our method works as follows. First, speech recognition is used to generate video transcripts. Then, the transcripts are converted into images using a statistical co-occurrence transformation that we designed. Finally, a CNN predicts video category labels from each transcript image. For our data, we use the Khan Academy on a Stick dataset, which contains 2,545 videos, each labeled with one or two of 13 categories. Experiments show that our method is strongly competitive with other methods that are also based on transcript features and supervised learning.
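The abstract does not specify the authors' co-occurrence transformation or how the CNN output is turned into one or two labels. The sketch below is only an illustration under stated assumptions. It builds a generic word co-occurrence matrix over a fixed vocabulary and treats it as a grayscale image. It then decodes per-category scores, for example sigmoid outputs of a CNN that is not shown, into one or two labels. The function names `transcript_to_image` and `decode_labels` and the parameters `window`, `threshold` and `max_labels` are hypothetical and do not come from the paper.

```python
from collections import Counter


def transcript_to_image(transcript, vocabulary, window=2):
    """Hypothetical sketch: map a transcript to an n x n co-occurrence
    "image" (n = len(vocabulary)) with pixel values in [0, 1].
    Not the authors' transformation, which the abstract does not describe."""
    index = {word: i for i, word in enumerate(vocabulary)}
    # Keep only in-vocabulary tokens (lowercased), preserving their order;
    # the window is therefore measured over in-vocabulary tokens.
    tokens = [t for t in transcript.lower().split() if t in index]
    counts = Counter()
    for i, left in enumerate(tokens):
        # Count each pair of tokens at most `window` positions apart.
        for right in tokens[i + 1 : i + 1 + window]:
            a, b = index[left], index[right]
            if a == b:
                counts[(a, a)] += 1
            else:
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # keep the matrix symmetric
    n = len(vocabulary)
    peak = max(counts.values(), default=0)
    image = [[0.0] * n for _ in range(n)]
    for (a, b), c in counts.items():
        image[a][b] = c / peak  # scale intensities so the largest count is 1.0
    return image


def decode_labels(scores, categories, threshold=0.5, max_labels=2):
    """Hypothetical sketch: turn per-category scores into 1..max_labels labels.
    The top-scoring category is always kept; further ones only if >= threshold."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = [ranked[0]] + [i for i in ranked[1:max_labels] if scores[i] >= threshold]
    return [categories[i] for i in chosen]


if __name__ == "__main__":
    vocab = ["derivative", "integral", "cell", "energy"]
    img = transcript_to_image("the derivative and integral derivative", vocab)
    print(img[0][1], img[0][0])  # 1.0 0.5
    print(decode_labels([0.1, 0.9, 0.6, 0.2], ["a", "b", "c", "d"]))  # ['b', 'c']
```

Capping the output at two labels mirrors the dataset description, in which each video carries one or two of the 13 categories. In practice, the image size would be set by the vocabulary size, and the CNN would be trained on these images with a multi-label loss.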
Conference Name: 
ACM Multimedia 2017 - MultiEdTech Workshop
Proceedings of ACM Multimedia 2017 - MultiEdTech Workshop
Digital Object Identifier (DOI): 
Publication Date: 
Conference Location: 
United States of America
Research Group: 
Dublin City University (DCU)
Open access repository: