Genre Recognition of Musical Recordings Using Deep Learning

Submitted By: Daniel Lederman.

The project was supervised by Dr. Dan Feldman from the Department of Computer Science and by Dr. Alon Schab from the Department of Music at the University of Haifa, with additional consultation from Dr. Dan Tidhar of the University of Cambridge. The project was developed as part of the University of Haifa’s Musicological Lab initiative.

I would like to thank Alastair Porter from the Music Technology Group at the Universitat Pompeu Fabra in Barcelona for providing me with access to the ISMIR2004 genre dataset, which was essential to the project.

This project implements a classifier based on a Convolutional Neural Network for the task of Music Genre Recognition, following the work presented in “Costa, Y. M., Oliveira, L. S., & Silla Jr., C. N. (2017). An evaluation of convolutional neural networks for music classification using spectrograms. Applied Soft Computing, 52, 28-38”. The project also includes a proof-of-concept app for Genre Recognition and genre-based Music Recommendation built on the classifier.

Music Genre Recognition is a task in MIR (Music Information Retrieval) research whose goal is as follows: given a piece of music (in either symbolic or audio form) and a set of possible musical genres, determine which genre the piece belongs to. Knowing the genre of a piece is useful for real-life tasks such as genre-based music recommendation and the indexing of large untagged music collections.

In short, the classifier works as follows:

1.      Training phase:

a.      For each musical piece / file in the training set:

i.     A clip of about 60 seconds is selected from a single channel, around the one-minute mark (pieces shorter than 90 seconds are treated slightly differently).

ii.     A Short Time Fourier Transform is applied to the musical clip to create its spectrogram.

iii.     The spectrogram is divided into 50 segments, each ~1.2 seconds long.

b.      A CNN (Convolutional Neural Network) is trained to classify the individual spectrogram segments.

2.      Testing / inference phase:

a.      For each musical piece / file:

i.     Apply steps 1.a.i – 1.a.iii

ii.     Pass each segment through the CNN to obtain its class (genre) probability vector.

iii.     Average all the class probability vectors to get a final probability vector for the musical piece / file.

iv.     Predict the genre of the musical piece / file to be the genre that has the highest probability.

The project was implemented using the following tools:

  • Development languages: Python 2.7, Bash
  • Tools:
    • Caffe – deep learning framework
    • SoX – audio processing and visualization tool
    • Mp3Info – mp3 information extraction tool
  • IDE: JetBrains PyCharm
  • External Python libraries: PyCaffe, NumPy, Pandas, Matplotlib, OpenCV2, lmdb

For more details, please see the project summary below.


ISMIR2004 test set evaluation log with accuracy report

Classifier’s confusion matrix on ISMIR2004 test set

Project summary


A Demo of a PoC App for Genre Recognition and Music Recommendation based on the CNN classifier.

For more information and source code, contact me at
