Выбор последовательных функций для звуковых функций

В этом примере используются:

В этом примере показан типичный рабочий процесс для выбора функций, применяемых к задаче распознавания речевых цифр.

При последовательном выборе функций необходимо обучить сеть заданному набору функций, а затем постепенно добавлять или удалять функции до тех пор, пока не будет достигнута наивысшая точность [1]. В этом примере к задаче распознавания речевых цифр применяется последовательный прямой выбор с помощью набора данных Free Spoke Digit [2].

Потоковое распознавание речевых цифр

Чтобы мотивировать пример, начните с загрузки предварительно обученной сети, audioFeatureExtractor Объект (Audio Toolbox), используемый для обучения сети, и коэффициенты нормализации для элементов.

load('network_Audio_SequentialFeatureSelection.mat','bestNet','afe','normalizers');

Создание audioDeviceReader (Audio Toolbox) для считывания звука с микрофона. Создать три dsp.AsyncBuffer (DSP System Toolbox) объекты: один для буферизации звука, считанного с микрофона, один для буферизации кратковременной энергии входного звука для обнаружения речи и один для буферизации предсказаний.

fs = afe.SampleRate;

deviceReader = audioDeviceReader('SampleRate',fs,'SamplesPerFrame',256);

audioBuffer = dsp.AsyncBuffer(fs*3);
steBuffer = dsp.AsyncBuffer(1000);
predictionBuffer = dsp.AsyncBuffer(5);

Создайте график для отображения потокового звука, вероятности вывода сети во время вывода и прогнозирования.

fig = figure;

streamAxes = subplot(3,1,1);
streamPlot = plot(zeros(fs,1));
ylabel('Amplitude')
xlabel('Time (s)')
title('Audio Stream')
streamAxes.XTick = [0,fs];
streamAxes.XTickLabel = [0,1];
streamAxes.YLim = [-1,1];

analyzedAxes = subplot(3,1,2);
analyzedPlot = plot(zeros(fs/2,1));
title('Analyzed Segment')
ylabel('Amplitude')
xlabel('Time (s)')
set(gca,'XTickLabel',[])
analyzedAxes.XTick = [0,fs/2];
analyzedAxes.XTickLabel = [0,0.5];
analyzedAxes.YLim = [-1,1];

probabilityAxes = subplot(3,1,3);
probabilityPlot = bar(0:9,0.1*ones(1,10));
axis([-1,10,0,1])
ylabel('Probability')
xlabel('Class')

Выполните потоковое распознавание цифр (цифры от 0 до 9) в течение 20 секунд. Во время цикла произносите одну из цифр и проверьте ее точность.

Во-первых, определите кратковременный порог энергии, при котором предполагается, что сигнал не содержит речи.

steThreshold = 0.015;
idxVec = 1:fs;
tic
while toc < 20
    
    % Read in a frame of audio from your device.
    audioIn = deviceReader();
    
    % Write the audio into a the buffer.
    write(audioBuffer,audioIn);
    
    % While 200 ms of data is unused, continue this loop.
    while audioBuffer.NumUnreadSamples > 0.2*fs
        
        % Read 1 second from the audio buffer. Of that 1 second, 800 ms
        % is rereading old data and 200 ms is new data.
        audioToAnalyze = read(audioBuffer,fs,0.8*fs);
        
        % Update the figure to plot the current audio data.
        streamPlot.YData = audioToAnalyze;

        ste = mean(abs(audioToAnalyze));
        write(steBuffer,ste);
        if steBuffer.NumUnreadSamples > 5
            abc = sort(peek(steBuffer));
            steThreshold = abc(round(0.4*numel(abc)));
        end
        if ste > steThreshold
            
            % Use the detectSpeeech function to determine if a region of speech
            % is present.
            idx = detectSpeech(audioToAnalyze,fs);
            
            % If a region of speech is present, perform the following.
            if ~isempty(idx)
                % Zero out all parts of the signal except the speech
                % region, and trim to 0.5 seconds.
                audioToAnalyze = HelperTrimOrPad(audioToAnalyze(idx(1,1):idx(1,2)),fs/2);
                
                % Normalize the audio.
                audioToAnalyze = audioToAnalyze/max(abs(audioToAnalyze));
                
                % Update the analyzed segment plot
                analyzedPlot.YData = audioToAnalyze;

                % Extract the features and transpose them so that time is
                % across columns.
                features = (extract(afe,audioToAnalyze))';

                % Normalize the features.
                features = (features - normalizers.Mean) ./ normalizers.StandardDeviation;
                
                % Call classify to determine the probabilities and the
                % winning label.
                features(isnan(features)) = 0;
                [label,probs] = classify(bestNet,features);
                
                % Update the plot with the probabilities and the winning
                % label.
                probabilityPlot.YData = probs;
                write(predictionBuffer,probs);

                if predictionBuffer.NumUnreadSamples == predictionBuffer.Capacity
                    lastTen = peek(predictionBuffer);
                    [~,decision] = max(mean(lastTen.*hann(size(lastTen,1)),1));
                    probabilityAxes.Title.String = num2str(decision-1);
                end
            end
        else
            % If the signal energy is below the threshold, assume no speech
            % detected.
             probabilityAxes.Title.String = '';
             probabilityPlot.YData = 0.1*ones(10,1);
             analyzedPlot.YData = zeros(fs/2,1);
             reset(predictionBuffer)
        end
        
        drawnow limitrate
    end
end

Остальная часть примера иллюстрирует, как была обучена сеть, используемая при обнаружении потоковой передачи, и как были выбраны функции, подаваемые в сеть.

Создание наборов данных поезда и проверки

Загрузите набор данных свободной речевой цифры (FSDD) [2]. FSDD состоит из коротких аудиофайлов с произносимыми цифрами (0-9).

url = "https://zenodo.org/record/1342401/files/Jakobovski/free-spoken-digit-dataset-v1.0.8.zip";
downloadFolder = tempdir;
datasetFolder = fullfile(downloadFolder,'FSDD');

if ~exist(datasetFolder,'dir')
    fprintf('Downloading Free Spoken Digit Dataset ...\n')
    unzip(url,datasetFolder)
end

Создание audioDatastore (Audio Toolbox) для указания на записи. Получение частоты дискретизации набора данных.

ads = audioDatastore(datasetFolder,'IncludeSubfolders',true);
[~,adsInfo] = read(ads);
fs = adsInfo.SampleRate;

Первым элементом имен файлов является цифра, произносимая в файле. Получить первый элемент имен файлов, преобразовать их в категориальные, а затем установить Labels имущества audioDatastore.

[~,filenames] = cellfun(@(x)fileparts(x),ads.Files,'UniformOutput',false);
ads.Labels = categorical(string(cellfun(@(x)x(1),filenames)));

Чтобы разделить хранилище данных на набор разработок и набор проверки, используйте splitEachLabel (Панель звуковых инструментов). Распределите 80% данных для разработки и оставшиеся 20% для проверки.

[adsTrain,adsValidation] = splitEachLabel(ads,0.8);

Настройка экстрактора аудиофайлов

Создание audioFeatureExtractor (Audio Toolbox) объект для извлечения звуковых функций в течение 30 мс окна с частотой обновления 10 мс. Установите все функции, которые вы хотите протестировать в этом примере, чтобы true.

win = hamming(round(0.03*fs),"periodic");
overlapLength = round(0.02*fs);

afe = audioFeatureExtractor( ...
    'Window',       win, ...
    'OverlapLength',overlapLength, ...
    'SampleRate',   fs, ...
    ...
    'linearSpectrum',      false, ...
    'melSpectrum',         false, ...
    'barkSpectrum',        false, ...
    'erbSpectrum',         false, ...
    ...
    'mfcc',                true, ...
    'mfccDelta',           true, ...
    'mfccDeltaDelta',      true, ...
    'gtcc',                true, ...
    'gtccDelta',           true, ...
    'gtccDeltaDelta',      true, ...
    ...
    'spectralCentroid',    true, ...
    'spectralCrest',       true, ...
    'spectralDecrease',    true, ...
    'spectralEntropy',     true, ...
    'spectralFlatness',    true, ...
    'spectralFlux',        true, ...
    'spectralKurtosis',    true, ...
    'spectralRolloffPoint',true, ...
    'spectralSkewness',    true, ...
    'spectralSlope',       true, ...
    'spectralSpread',      true, ...
    ...
    'pitch',               false, ...
    'harmonicRatio',       false);

Определение слоев и параметров обучения

Определение списка уровней глубокого обучения и trainingOptions используется в этом примере. Первый слой, sequenceInputLayer, это просто местозаполнитель. В зависимости от того, какие элементы тестируются при последовательном выборе элементов, первый слой заменяется на sequenceInputLayer соответствующего размера.

numUnits = 100;
layers = [ ...
    sequenceInputLayer(1)
    bilstmLayer(numUnits,"OutputMode","last")
    fullyConnectedLayer(numel(categories(adsTrain.Labels)))
    softmaxLayer
    classificationLayer];

options = trainingOptions("adam", ...
    "LearnRateSchedule","piecewise", ...
    "Shuffle","every-epoch", ...
    "Verbose",false, ...
    "MaxEpochs",20);

Выбор последовательного элемента

В основной форме последовательного выбора функций вы обучаете сеть данному набору функций, а затем постепенно добавляете или удаляете функции до тех пор, пока точность не улучшится [1].

Прямой выбор

Рассмотрим простой случай прямого выбора для набора из четырех функций. В первом цикле прямого выбора каждая из четырех функций тестируется независимо путем обучения сети и сравнения их точности проверки. Отмечается особенность, которая привела к наивысшей точности проверки. Во втором контуре выбора вперед лучший элемент из первого контура комбинируется с каждым из остальных элементов. Теперь каждая пара функций используется для тренировок. Если точность во втором цикле не улучшилась по сравнению с точностью в первом цикле, процесс выбора заканчивается. В противном случае выбирается новый набор лучших элементов. Цикл прямого выбора продолжается до тех пор, пока точность больше не улучшится.

Обратный выбор

При обратном выборе элементов начинается с обучения набору элементов, состоящему из всех элементов, и проверяется, повышается ли точность при удалении элементов.

Последовательно выполнять выбор элементов

Вспомогательные функции (HelperSFS, HelperTrainAndValidateNetwork и HelperTrimOrPad) реализуют прямой или обратный последовательный выбор функций. Укажите хранилище данных обучения, хранилище данных проверки, экстрактор звуковых функций, сетевые уровни, сетевые параметры и направление. Как правило, выберите вперед, если вы ожидаете небольшой набор элементов, или назад, если вы ожидаете большой набор элементов.

direction = 'forward';
[logbook,bestFeatures,bestNet,normalizers] = HelperSFS(adsTrain,adsValidation,afe,layers,options,direction);

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

logbook вывод из HelperFeatureExtractor - таблица, содержащая все проверенные конфигурации элементов и соответствующую точность проверки.

logbook

logbook=48×2 table
                 Features                 Accuracy
    __________________________________    ________

    "mfcc, gtcc"                           97.333 
    "mfcc, mfccDelta, gtcc"                    97 
    "mfcc, gtcc, spectralEntropy"              97 
    "mfcc, gtcc, spectralFlatness"             97 
    "mfcc, gtcc, spectralFlux"                 97 
    "mfcc, gtcc, spectralSpread"               97 
    "gtcc"                                 96.667 
    "gtcc, spectralCentroid"               96.667 
    "gtcc, spectralFlux"                   96.667 
    "mfcc, gtcc, spectralRolloffPoint"     96.667 
    "mfcc, gtcc, spectralSkewness"         96.667 
    "gtcc, spectralEntropy"                96.333 
    "mfcc, gtcc, gtccDeltaDelta"           96.333 
    "mfcc, gtcc, spectralKurtosis"         96.333 
    "mfccDelta, gtcc"                          96 
    "gtcc, gtccDelta"                          96 
      ⋮

bestFeatures вывод из HelperSFS содержит структуру с оптимальными элементами, заданными как true.

bestFeatures

bestFeatures = struct with fields:
                    mfcc: 1
               mfccDelta: 0
          mfccDeltaDelta: 0
                    gtcc: 1
               gtccDelta: 0
          gtccDeltaDelta: 0
        spectralCentroid: 0
           spectralCrest: 0
        spectralDecrease: 0
         spectralEntropy: 0
        spectralFlatness: 0
            spectralFlux: 0
        spectralKurtosis: 0
    spectralRolloffPoint: 0
        spectralSkewness: 0
           spectralSlope: 0
          spectralSpread: 0

Вы можете установить audioFeatureExtractor с использованием структуры.

set(afe,bestFeatures)
afe

afe = 
  audioFeatureExtractor with properties:

   Properties
                     Window: [240×1 double]
              OverlapLength: 160
                 SampleRate: 8000
                  FFTLength: []
    SpectralDescriptorInput: 'linearSpectrum'

   Enabled Features
     mfcc, gtcc

   Disabled Features
     linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfccDelta, mfccDeltaDelta
     gtccDelta, gtccDeltaDelta, spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy
     spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope
     spectralSpread, pitch, harmonicRatio


   To extract a feature, set the corresponding property to true.
   For example, obj.mfcc = true, adds mfcc to the list of enabled features.

HelperSFS также выводит наилучшую производительность сети и коэффициенты нормализации, которые соответствуют выбранным функциям. Для сохранения сети настроено audioFeatureExtractorи факторы нормализации, раскомментируют эту строку:

% save('network_Audio_SequentialFeatureSelection.mat','bestNet','afe','normalizers')

Заключение

Этот пример иллюстрирует рабочий процесс для последовательного выбора функций для рекуррентной нейронной сети (LSTM или BiLSTM). Его можно легко адаптировать для рабочих процессов CNN и RNN-CNN.

Вспомогательные функции

HelperTrainAndValidateNetwork

function [trueLabels,predictedLabels,net,normalizers] = HelperTrainAndValidateNetwork(adsTrain,adsValidation,afe,layers,options)
% Train and validate a network.
%
%   INPUTS:
%   adsTrain      - audioDatastore object that points to training set
%   adsValidation - audioDatastore object that points to validation set
%   afe           - audioFeatureExtractor object.
%   layers        - Layers of LSTM or BiLSTM network
%   options       - trainingOptions object
%
%   OUTPUTS:
%   trueLabels      - true labels of validation set
%   predictedLabels - predicted labels of validation set
%   net             - trained network
%   normalizers     - normalization factors for features under test

% Copyright 2019 The MathWorks, Inc.

% Convert the data to tall arrays.
tallTrain      = tall(adsTrain);
tallValidation = tall(adsValidation);

% Extract features from the training set. Reorient the features so that
% time is along rows to be compatible with sequenceInputLayer.
fs = afe.SampleRate;
tallTrain         = cellfun(@(x)HelperTrimOrPad(x,fs/2),tallTrain,"UniformOutput",false);
tallTrain         = cellfun(@(x)x/max(abs(x),[],'all'),tallTrain,"UniformOutput",false);
tallFeaturesTrain = cellfun(@(x)extract(afe,x),tallTrain,"UniformOutput",false);
tallFeaturesTrain = cellfun(@(x)x',tallFeaturesTrain,"UniformOutput",false);  %#ok<NASGU>
[~,featuresTrain] = evalc('gather(tallFeaturesTrain)'); % Use evalc to suppress command-line output.

tallValidation         = cellfun(@(x)HelperTrimOrPad(x,fs/2),tallValidation,"UniformOutput",false);
tallValidation         = cellfun(@(x)x/max(abs(x),[],'all'),tallValidation,"UniformOutput",false);
tallFeaturesValidation = cellfun(@(x)extract(afe,x),tallValidation,"UniformOutput",false);
tallFeaturesValidation = cellfun(@(x)x',tallFeaturesValidation,"UniformOutput",false); %#ok<NASGU>
[~,featuresValidation] = evalc('gather(tallFeaturesValidation)'); % Use evalc to suppress command-line output.

% Use the training set to determine the mean and standard deviation of each
% feature. Normalize the training and validation sets.
allFeatures = cat(2,featuresTrain{:});
M = mean(allFeatures,2,'omitnan');
S = std(allFeatures,0,2,'omitnan');
featuresTrain = cellfun(@(x)(x-M)./S,featuresTrain,'UniformOutput',false);
for ii = 1:numel(featuresTrain)
    idx = find(isnan(featuresTrain{ii}));
    if ~isempty(idx)
        featuresTrain{ii}(idx) = 0;
    end
end
featuresValidation = cellfun(@(x)(x-M)./S,featuresValidation,'UniformOutput',false);
for ii = 1:numel(featuresValidation)
    idx = find(isnan(featuresValidation{ii}));
    if ~isempty(idx)
        featuresValidation{ii}(idx) = 0;
    end
end

% Replicate the labels of the train and validation sets so that they are in
% one-to-one correspondence with the sequences.
labelsTrain = adsTrain.Labels;

% Update input layer for the number of features under test.
layers(1) = sequenceInputLayer(size(featuresTrain{1},1));

% Train the network.
net = trainNetwork(featuresTrain,labelsTrain,layers,options);

% Evaluate the network. Call classify to get the predicted labels for each
% sequence.
predictedLabels = classify(net,featuresValidation);
trueLabels = adsValidation.Labels;

% Save the normalization factors as a struct.
normalizers.Mean = M;
normalizers.StandardDeviation = S;
end

HelperSFS

function [logbook,bestFeatures,bestNet,bestNormalizers] = HelperSFS(adsTrain,adsValidate,afeThis,layers,options,direction)
%
%   INPUTS:
%   adsTrain    - audioDatastore object that points to training set
%   adsValidate - audioDatastore object that points to validation set
%   afe         - audioFeatureExtractor object. Set all features to test to true
%   layers      - Layers of LSTM or BiLSTM network
%   options     - trainingOptions object
%   direction   - SFS direction, specify as 'forward' or 'backward'
%
%   OUTPUTS:
%   logbook         - table containing feature configurations tested and corresponding validation accuracies
%   bestFeatures    - struct containg best feature configuration
%   bestNet         - Trained network with highest validation accuracy
%   bestNormalizers - Feature normalization factors for best features

% Copyright 2019 The MathWorks, Inc.

afe = copy(afeThis);
featuresToTest = fieldnames(info(afe));
N = numel(featuresToTest);
bestValidationAccuracy = 0;

% Set the initial feature configuration: all on for backward selection
% or all off for forward selection.
featureConfig = info(afe);
for i = 1:N
    if strcmpi(direction,"backward")
        featureConfig.(featuresToTest{i}) = true;
    else
        featureConfig.(featuresToTest{i}) = false;
    end
end

% Initialize logbook to track feature configuration and accuracy.
logbook = table(featureConfig,0,'VariableNames',["Feature Configuration","Accuracy"]);

% Perform sequential feature evaluation.
wrapperIdx = 1;
bestAccuracy = 0;
while wrapperIdx <= N
    % Create a cell array containing all feature configurations to test
    % in the current loop.
    featureConfigsToTest = cell(numel(featuresToTest),1);
    for ii = 1:numel(featuresToTest)
        if strcmpi(direction,"backward")
            featureConfig.(featuresToTest{ii}) = false;
        else
            featureConfig.(featuresToTest{ii}) = true;
        end
        featureConfigsToTest{ii} = featureConfig;
        if strcmpi(direction,"backward")
            featureConfig.(featuresToTest{ii}) = true;
        else
            featureConfig.(featuresToTest{ii}) = false;
        end
    end

    % Loop over every feature set.
    for ii = 1:numel(featureConfigsToTest)

        % Determine the current feature configuration to test. Update
        % the feature afe.
        currentConfig = featureConfigsToTest{ii};
        set(afe,currentConfig)

        % Train and get k-fold cross-validation accuracy for current
        % feature configuration.
        [trueLabels,predictedLabels,net,normalizers] = HelperTrainAndValidateNetwork(adsTrain,adsValidate,afe,layers,options);
        valAccuracy = mean(trueLabels==predictedLabels)*100;
        if valAccuracy > bestValidationAccuracy
            bestValidationAccuracy = valAccuracy;
            bestNet = net;
            bestNormalizers = normalizers;
        end

        % Update Logbook
        result = table(currentConfig,valAccuracy,'VariableNames',["Feature Configuration","Accuracy"]);
        logbook = [logbook;result]; %#ok<AGROW> 

    end

    % Determine and print the setting with the best accuracy. If accuracy
    % did not improve, end the run.
    [a,b] = max(logbook{:,'Accuracy'});
    if a <= bestAccuracy
        wrapperIdx = inf;
    else
        wrapperIdx = wrapperIdx + 1;
    end
    bestAccuracy = a;

    % Update the features-to-test based on the most recent winner.
    winner = logbook{b,'Feature Configuration'};
    fn = fieldnames(winner);
    tf = structfun(@(x)(x),winner);
    if strcmpi(direction,"backward")
        featuresToRemove = fn(~tf);
    else
        featuresToRemove = fn(tf);
    end
    for ii = 1:numel(featuresToRemove)
        loc =  strcmp(featuresToTest,featuresToRemove{ii});
        featuresToTest(loc) = [];
        if strcmpi(direction,"backward")
            featureConfig.(featuresToRemove{ii}) = false;
        else
            featureConfig.(featuresToRemove{ii}) = true;
        end
    end

end

% Sort the logbook and make it more readable.
logbook(1,:) = []; % Delete placeholder first row.
logbook = sortrows(logbook,{'Accuracy'},{'descend'});
bestFeatures = logbook{1,'Feature Configuration'};
m = logbook{:,'Feature Configuration'};
fn = fieldnames(m);
myString = strings(numel(m),1);
for wrapperIdx = 1:numel(m)
    tf = structfun(@(x)(x),logbook{wrapperIdx,'Feature Configuration'});
    myString(wrapperIdx) = strjoin(fn(tf),", ");
end
logbook = table(myString,logbook{:,'Accuracy'},'VariableNames',["Features","Accuracy"]);
end

HelperTrimOrPad

function y = HelperTrimOrPad(x,n)
% y = HelperTrimOrPad(x,n) trims or pads the input x to n samples. If x is
% trimmed, it is trimmed equally on the front and back. If x is padded, it is
% padded equally on the front and back with zeros. For odd-length trimming or
% padding, the extra sample is trimmed or padded from the back.

% Copyright 2019 The MathWorks, Inc.
a = size(x,1);
if a < n
    frontPad = floor((n-a)/2);
    backPad = n - a - frontPad;
    y = [zeros(frontPad,1);x;zeros(backPad,1)];
elseif a > n
    frontTrim = floor((a-n)/2)+1;
    backTrim = a - n - frontTrim;
    y = x(frontTrim:end-backTrim);
else
    y = x;
end
end

Ссылки

[1] Джайн, А. и Д. Цонгкер. «Выбор компонентов: оценка, приложение и малая производительность». Транзакции IEEE по анализу шаблонов и машинному интеллекту. Том 19, выпуск 2, 1997, стр. 153-158.

[2] Якобовски. «Jakobovski/Free-Spoken-Digit-Dataset». GitHub, 30 мая 2019 года. https://github.com/Jakobovski/free-spoken-digit-dataset.

Документация