Classify Audio using ANN


In this blog, we discuss audio classification using an ANN (Artificial Neural Network). Before jumping into the ANN model, you first need to understand how to work with audio files. You might need to split or trim audio files based on your requirements. Follow this link to get an initial idea of how to split an audio file using Python.

Read Audio files

The first step of audio classification using ANN is to import the required libraries and load the audio file.

import glob
import os

import IPython.display as ipd
import librosa
import librosa.display
import pandas as pd

# IPython magic: pulls numpy (np) and matplotlib (plt) into the notebook namespace
%pylab inline


Audio Data Sampling

Extract data from audio file

A digital audio signal is characterized by its sampling rate (also called the sampling frequency), which is the number of audio samples captured per second. It is measured in hertz (Hz).

Let's extract the sampling rate of the audio data in Python.

data, sampling_rate = librosa.load('recordings/Train/0_jackson_0.wav')
print('sampling_rate:', sampling_rate)
print('data', data)
print('data shape', data.shape)

sampling_rate: 22050
data [-0.01095867 -0.01327632 -0.01378769 ... 0.00736098 0.00378776 0. ]
data shape (14190,)
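Since the sampling rate is the number of samples per second, the length of a clip in seconds is simply the number of samples divided by the sampling rate. The short sketch below applies this to the values printed above; the variable names are our own. Note that librosa.load resamples to 22050 Hz by default unless sr=None is passed, so the rate shown may differ from the rate stored in the file.

```python
# Values taken from the output above
num_samples = 14190      # length of the `data` array
sampling_rate = 22050    # samples per second (Hz)

# Duration in seconds = number of samples / samples per second
duration_seconds = num_samples / sampling_rate
print(f"duration: {duration_seconds:.3f} s")
```

This clip is roughly 0.64 seconds long, which is plausible for a single spoken digit.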

Plot audio data file

Try plotting the audio data, using the sampling rate to scale the time axis, to understand the pattern of different audio signals. You will notice that similar audio files produce similar graphs.

%pylab inline
import librosa.display

plt.figure(figsize=(12, 4))
# draw the waveform; waveshow replaces waveplot, which was removed in librosa 0.10
librosa.display.waveshow(data, sr=sampling_rate)

Audio classification using ANN

Prepare audio training data

Prepare a table containing the file names and their corresponding labels (digits), and use it to build the training dataset.

train = pd.read_csv('recordings/audio-number.csv')
train.head()

                ID  Class
0  0_jackson_0.wav      0
1  0_jackson_1.wav      0
2  0_jackson_2.wav      0
3  0_jackson_3.wav      0
4  0_jackson_4.wav      0

Verify the digits and the number of training samples per digit.

train.Class.value_counts()

9 50
8 50
7 50
6 50
5 50
4 50
3 50
2 50
1 50
0 50
Name: Class, dtype: int64
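If you do not already have such a CSV file, one way to build it is to derive the label from the file name. The sketch below assumes the files follow the digit_speaker_index.wav pattern seen in the table above; the file list and the label_from_name helper are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical file names following the <digit>_<speaker>_<index>.wav pattern
file_names = ["0_jackson_0.wav", "0_jackson_1.wav", "9_jackson_3.wav"]

def label_from_name(file_name):
    """Return the spoken digit encoded as the first underscore-separated field."""
    return int(file_name.split("_")[0])

# Write ID/Class rows to an in-memory CSV
# (use open(path, "w", newline="") instead of io.StringIO to save to disk)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ID", "Class"])
for name in file_names:
    writer.writerow([name, label_from_name(name)])

print(buffer.getvalue())
```

In practice you could collect the file names with glob.glob or os.listdir over the training folder instead of hard-coding them.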

Prepare audio features

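In this stage, we use the training data to generate Mel-frequency cepstral coefficient (MFCC) features.

MFCCs are derived from the Fourier transform of short frames of a signal: the power spectrum of each frame is mapped onto the mel scale, its logarithm is taken, and a discrete cosine transform is applied. MFCCs are commonly used as features in speech processing. More information about MFCC can be found here.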
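The "mel" in MFCC refers to the mel scale, a frequency scale that is roughly linear at low frequencies and logarithmic at high frequencies, similar to how we perceive pitch. As a small illustration, the sketch below uses one widely used conversion, the HTK-style formula mel = 2595 * log10(1 + f / 700). This is an assumption for illustration only: librosa uses a slightly different (Slaney-style) variant by default, and the function names here are our own.

```python
import math

def hz_to_mel(frequency_hz):
    """Convert a frequency in Hz to mels using the HTK-style formula."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Equal steps in Hz become smaller and smaller steps in mels at high frequencies
for f in (100, 1000, 4000, 8000):
    print(f"{f:>5} Hz -> {hz_to_mel(f):7.1f} mel")
```

With this formula, 1000 Hz maps to approximately 1000 mel, and higher frequencies are compressed more and more.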

def parser(row):
    # build the path to the audio file for this row
    file_name = os.path.join('recordings', 'Train', str(row.ID))
    try:
        # kaiser_fast is a faster, lower-quality resampling mode
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # extract 40 MFCCs per frame, then average over all frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    except Exception as e:
        print("Error encountered while parsing file: ", file_name, e)
        return None, None

    feature = mfccs
    label = row.Class

    return [feature, label]

# result_type='expand' turns the returned [feature, label] pair into two columns
traningdata = train.apply(parser, axis=1, result_type='expand')
traningdata.columns = ['feature', 'label']
traningdata.head()

                                             feature  label
0  [-301.09082, 193.94702, -16.108715, -1.6334546…      0
1  [-343.8217, 207.41052, -4.387587, 7.7349954, 2…      0
2  [-319.12198, 198.0649, 2.989768, 2.6435905, 22…      0
3  [-332.68658, 196.18948, -5.666796, 6.148818, 2…      0
4  [-333.14938, 206.82094, -7.1719756, 2.4563458,…      0
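The key step in the parser is the averaging: librosa.feature.mfcc returns a matrix with one row per coefficient and one column per frame, and taking the mean over the frames collapses it into a single fixed-length vector regardless of how long the clip is. The sketch below shows only that step, using random numbers as a stand-in for real MFCC output; the frame count of 28 is an arbitrary choice for illustration.

```python
import numpy as np

# Stand-in for librosa.feature.mfcc output: 40 coefficients x 28 frames of random numbers
rng = np.random.default_rng(0)
mfcc_matrix = rng.normal(size=(40, 28))

# Transpose to (frames, coefficients), then average over the frame axis
feature = np.mean(mfcc_matrix.T, axis=0)

print(mfcc_matrix.shape)  # (40, 28)
print(feature.shape)      # (40,)
```

Because every clip ends up as a vector of 40 numbers, clips of different lengths can be fed to the same network. The trade-off is that the averaging discards how the sound changes over time.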

Create ANN Model

We use the Keras library for audio classification using ANN. Ensure that the input dimension of the first layer matches the n_mfcc parameter; in this case, n_mfcc = 40. The first layer has 256 nodes and uses the 'relu' activation function. Similarly, the second layer also has 256 nodes, and the final layer has 10 nodes, equal to the number of labels.

We use 'categorical_crossentropy' as the loss function.

We use the 'adam' stochastic gradient descent algorithm as the optimizer and 'accuracy' as the metric. Follow the steps below to create a model for audio classification using ANN.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

X = np.array(traningdata.feature.tolist())
y = np.array(traningdata.label.tolist())

# one-hot encode the digit labels
lb = LabelEncoder()
y = to_categorical(lb.fit_transform(y))

num_labels = y.shape[1]

# build model
model = Sequential()

# first layer: 256 nodes, input dimension equal to n_mfcc
model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))

# second layer: 256 nodes
model.add(Dense(256))
model.add(Activation('relu'))

# output layer: one node per label (softmax is assumed here; it pairs with categorical_crossentropy)
model.add(Dense(num_labels))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# train the model, holding out 20% of the data for validation
model.fit(X, y, batch_size=10, epochs=150, validation_split=0.2)
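As a quick sanity check on the architecture described above, you can count the trainable parameters by hand: a fully connected layer has one weight per input-node pair plus one bias per node. The sketch below assumes the network contains only the three dense layers described in the text (40 inputs, then 256, 256 and 10 nodes); any additional trainable layers would change the totals.

```python
# Layer sizes as described in the text: 40 MFCC inputs, two hidden layers, 10 outputs
layer_sizes = [40, 256, 256, 10]

def dense_params(n_inputs, n_nodes):
    """Weights (n_inputs * n_nodes) plus one bias per node."""
    return n_inputs * n_nodes + n_nodes

# Pair each layer size with the next one to get (inputs, nodes) per layer
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)
print(total)
```

You can compare these numbers with the output of model.summary() to confirm that the model was built as intended.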

Evaluate the model on held-out test data (X_test and y_test are assumed to be prepared in the same way as X and y):

model.evaluate(X_test, y_test, verbose=2)


[0.054617074131965634, 0.98]
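The two numbers returned by model.evaluate are the loss and the accuracy, in that order. To get a feel for what a categorical cross-entropy value means, the sketch below computes it by hand for a single sample. The target and the two probability vectors are hypothetical, and the eps guard against log(0) is our own choice rather than the exact value Keras uses.

```python
import math

def categorical_crossentropy(one_hot_target, predicted_probs, eps=1e-7):
    """Cross-entropy for one sample: -sum(t_i * log(p_i))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_target, predicted_probs))

# Hypothetical example: the true digit is 3 (one-hot over 10 classes)
target = [0.0] * 10
target[3] = 1.0

confident = [0.01] * 10
confident[3] = 0.91          # most probability mass on the correct class
uncertain = [0.1] * 10       # uniform guess over all 10 digits

print(categorical_crossentropy(target, confident))
print(categorical_crossentropy(target, uncertain))
```

A uniform guess over 10 classes gives a loss of ln(10), about 2.3, so a loss near 0.05 indicates that the model assigns high probability to the correct digit on most test clips.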

Save the model to the disk:

Save the model so that the same model can be used in a later stage.

model.save("Saved Models/model.h5")

Predict using the model on test data:

predictions = model.predict(X_test)
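With a softmax output layer, model.predict returns one row of class probabilities per audio clip, not the digit itself. A minimal sketch of turning those probabilities into predicted digits is shown below; the predictions array here is made up for illustration and stands in for the real model output.

```python
import numpy as np

# Hypothetical model output for 3 clips and 10 digit classes (each row sums to 1)
predictions = np.full((3, 10), 0.02)
predictions[0, 7] = 0.82
predictions[1, 0] = 0.82
predictions[2, 4] = 0.82

# The predicted class is the index of the largest probability in each row
predicted_classes = np.argmax(predictions, axis=1)
print(predicted_classes)
```

If the labels were encoded with the LabelEncoder from the training step, lb.inverse_transform(predicted_classes) maps these indices back to the original labels.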


In this blog, we discussed how to read audio files, prepare audio data for training, and classify audio using an ANN. Browse other blogs to learn more about Machine Learning and Data Science. Happy learning.
