function scale = SIN_CalAudio(REF, varargin)
%% DESCRIPTION:
%
% The calibration procedure for SIN sets the HINT-Noise.wav (a speech
% shaped noise sample) to 0 dB and scales all other stimuli or stimulus
% sets to also rest at 0 dB. The user can then present the HINT-Noise
% stimulus through their sound playback system and adjust (via hardware)
% the sound pressure level (SPL) to the desired level. Following this
% procedure, all stimuli/stimulus sets should have a nearly identical
% SPL, provided that the frequency response of the playback/recording
% loop is flat (enough). CWB recommends using hardware (e.g., a graphical
% equalizer) to flatten the frequency response of your playback/recording
% loop.
%
% INPUT:
%
% REF: path to reference file. (e.g., fullfile('playback', 'Noise', 'HINT-Noise.wav'))
%
% General Parameters:
%
% 'testID': string, testID used in call to SIN_TestSetup that is then
% used to gather stimulus information
%
% 'nmixer': Nx1 scaling matrix, where N is the number of data channels
% in the reference signal (typically 2x1 for HINT-Noise.wav)
%
% 'targetdB': double, the relative decibel level to scale output stimuli
% to. This is useful if, for instance, the reference sound is
% calibrated to 65 dB, but the user wants the remaining sound
% files to be calibrated to 80 dB. In this example,
% 'targetdB' would be +15.
%
% 'removesilence': bool, remove silence from beginning and end of
% sounds. If true, requires the 'ampthresh' argument below.
% CWB generally recommends this since excess silence
% at beginning and end of sounds can reduce RMS
% estimates considerably (and variably) depending on
% the length of the silent period.
%
% 'ampthresh': double, amplitude threshold used to remove silence from the
% beginning and end of acoustic waveforms prior to RMS estimation.
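%
% Illustrative sketch (not necessarily what concat_audio_files does
% internally): silence lowers an RMS estimate because it adds samples
% without adding energy. Padding a 1 s sound of RMS r with 1 s of
% digital silence gives r/sqrt(2), about 3 dB lower. The lines below
% ASSUME 'ampthresh' is an absolute linear amplitude; x and thr are
% hypothetical.
%
%     x    = [zeros(100,1); 0.5*randn(1000,1); zeros(100,1)];
%     thr  = 0.01;                            % hypothetical threshold
%     keep = find(max(abs(x), [], 2) > thr);  % samples above threshold
%     x    = x(keep(1):keep(end), :);         % trim leading/trailing silence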
%
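% 'bitdepth': bit depth for written .wav files (passed to audiowrite as
% 'BitsPerSample'). A .wav file is written for every stimulus,
% including MP4s. The audio track placed into a rewritten MP4 is
% instead encoded using 'bitrate' below.
%
% Note: audioread does not give bitdepth information from
% MP4s, so CWB cannot confirm the bitdepth of the audio in
% these files. But the log readout from FFmpeg suggests
% 16-bit audio. Unconfirmed.
%
% 'bitrate': bit rate, in kbit/s, passed to audiowrite when writing the
% temporary MP4 (AAC) audio track for MP4 stimuli.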
%
% 'suffix': string to append to end of file name for newly created
% files.
%
% 'tmixer': Dx1 scaling matrix, where D is the number of data channels
% in the to-be-calibrated target files. The code will
% essentially combine (linearly) data from the D channels in
% each file. These data are then used (perhaps with some
% additional processing) to estimate the RMS of the target
% file.
%
% 'omixer': 1xP scaling matrix that maps the single channel produced by
% 'tmixer' onto the P output channels of the generated file
% (.wav or .mp4).
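%
% Example (hypothetical values): for a stereo target X (samples x 2),
% a 'tmixer' of [0.5; 0.5] averages the two channels into one column,
% and an 'omixer' of [1 1] copies that column to two output channels.
% A 'tmixer' of [1; 0] would keep only the first channel instead.
%
%     X = [0.2 0.4; -0.2 0.0];  % 2 samples x 2 channels
%     m = X * [0.5; 0.5];       % -> [0.3; -0.1]
%     Y = m * [1 1];            % -> [0.3 0.3; -0.1 -0.1]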
%
% 'writeref': bool, flag to write the (mixed and, if requested, filtered)
% reference data to file, next to REF with 'suffix' appended
% to its name.
%
% 'wav_regexp': regular expression. If provided, the
% specific.wav_regexp field is overwritten with this
% parameter. This proved necessary to guarantee that the
% user knows which files are being used for calibration
% purposes.
%
% 'overwritemp4': bool, if true, automatically overwrite MP4s. If false,
% user is prompted for each overwrite. False is safer,
% but time consuming.
%
% Filtering Parameters:
%
% Parameters below allow the user to specify filtering settings for
% sound files. When 'apply_filter' is true, the same filter is applied to
% both the reference sound AND the to-be-calibrated sounds.
%
% Note that filtering is achieved using MATLAB's butter and filtfilt
% functions.
%
% 'apply_filter': bool. If true, all sounds are filtered using the
% filter specifications below. If false, no filtering is
% applied.
%
% 'filter_type': string, filter type supported by butter. These
% include the following (see doc butter for details):
%
% 'low': lowpass
% 'high': highpass
% 'stop': bandstop
% 'bandpass': bandpass
%
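% 'frequency_cutoff': cutoff frequency(ies) in Hz: a scalar for 'low' or
% 'high', a two-element vector for 'stop' or 'bandpass'. These
% values are converted to butter's normalized Wn argument by
% dividing by the Nyquist frequency (FS/2) of the reference file.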
%
% 'filter_order': filter order
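%
% Example (hypothetical values): to band-pass all sounds between 125 Hz
% and 10 kHz, one might pass
%
%     'apply_filter', true, 'filter_type', 'bandpass', ...
%     'frequency_cutoff', [125 10000], 'filter_order', 4
%
% For a 44.1 kHz reference, the cutoffs become [125 10000] / 22050,
% roughly [0.0057 0.4535]. Note that butter returns an order-2n design
% for 'bandpass' and 'stop', and filtfilt's forward/backward pass doubles
% the effective order again (while cancelling phase distortion).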
%
% AV Alignment Options (only applies to MP4s presently)
%
% Note that these procedures circularly shift the data to account for
% changes in timing. So it's important that there be significant periods
% of silence (at least relative silence) at the beginning and end of each
% track. If there is NOT, data may be moved from the beginning of the
% sound to the end or vice versa. Also, if the noise floor is relatively
% high, there may be transients introduced following this procedure. If
% this is the case, then try fading sounds in/out prior to applying a
% shift.
%
% Transcoding Correction:
%
% These options have been removed after CWB learned that FFmpeg was not
% introducing misalignments in audiovisual files, but instead prepending
% a duplicate video frame and delaying the audio by the duration of this
% frame.
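%
% EXAMPLE:
%
% A sketch of a call. 'MyTestID' and every value below are hypothetical
% placeholders, not recommended settings. The parameter names match the
% fields this function reads from its parameter struct.
%
%     ref   = fullfile('playback', 'Noise', 'HINT-Noise.wav');
%     scale = SIN_CalAudio(ref, 'testID', 'MyTestID', ...
%         'nmixer', [1; 0], 'tmixer', [1; 0], 'omixer', [1 1], ...
%         'targetdB', 0, 'removesilence', true, 'ampthresh', 1e-4, ...
%         'bitdepth', 24, 'bitrate', 128, 'suffix', '-cal', ...
%         'writeref', false, 'overwritemp4', false, 'apply_filter', false);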
%
% Development:
%
% None (anymore)
%
% Christopher W. Bishop
% University of Washington
% 9/14
%% GATHER PARAMETERS
d=varargin2struct(varargin{:});
%% SET THE RMS ESTIMATOR
rms_function = @rms;
%% GET TEST INFORMATION
% The general field also has information we'll need regarding the
% location of noise files.
opts = SIN_TestSetup(d.testID, '');
opts = opts(1);
%% OVERWRITE FILE FILTER IF NECESSARY
if isfield(d, 'wav_regexp')
% Error check to make sure we haven't changed the field name to
% something else. Do this before the prompt, which reads the field.
if ~isfield(opts.specific, 'wav_regexp')
error('wav_regexp field name may have changed');
end % if ~isfield
input(['Overwriting ' opts.specific.wav_regexp ' with ' d.wav_regexp '. Press enter to continue']);
opts.specific.wav_regexp = d.wav_regexp;
end % isfield
%% LOAD THE REFERENCE NOISE
[ref_data, FS] = audioread(REF);
%% SCALE NOISE
ref_data = ref_data*d.nmixer;
%% FILTER REFERENCE DATA:
% Filter the reference data. This is often needed for Project AD in which
% we want to bandpass all sounds between [0.125 10] kHz.
if d.apply_filter
% Convert cutoff frequencies to normalized units (normalized to
% Nyquist)
d.frequency_cutoff = d.frequency_cutoff / (FS/2);
% Get filter coefficients
[b, a] = butter(d.filter_order, d.frequency_cutoff, d.filter_type);
% Apply filter to reference sound
ref_data = filtfilt(b, a, ref_data);
end % if d.apply_filter
%% WRITE SCALED REFERENCE
% Write the (scaled) reference file back to disk?
if d.writeref
% Get file parts
[PATHSTR,NAME,EXT] = fileparts(REF);
% Create output filename
audio_file_out = fullfile(PATHSTR, [NAME d.suffix EXT]);
% Write to file
audiowrite(audio_file_out, ref_data, FS, 'BitsperSample', d.bitdepth);
end % if d.writeref
%% WRITE CALIBRATED STIMULI
% - Now, we load in the stimuli for the specific test we want to rewrite
% stimuli for.
% - Grab file names
[~, files] = SIN_stiminfo(opts);
%% LOAD ALL FILES IN STIMULUS LIST
% - Load all files, concatenate into larger file for RMS estimation
% - We need to save the (potentially filtered) audio files for use below.
% fs is the sampling rate of the to-be-scaled stimuli. All stimuli must
% share one sampling rate; a mismatch between stimuli raises an error with
% no recovery. If that rate differs from the reference rate (FS), each
% stimulus is resampled to FS.
fs = [];
audio_data = {};
for i=1:numel(files)
% Loop through files
for k=1:numel(files{i})
% Load data file
[audio_data{i}{k}, nfs] = SIN_loaddata(files{i}{k});
% Sampling rate check
if isempty(fs)
fs = nfs;
elseif fs ~= nfs
error('Sampling rates do not match');
end % if isempty(fs)
% Resample to the reference sampling rate (FS) if necessary
if fs ~= FS
audio_data{i}{k} = resample(audio_data{i}{k}, FS, nfs);
display(['Resampling ' files{i}{k} ]);
end % if fs ~= FS
% Apply appropriate mixer
% This should reduce the data to a single channel.
audio_data{i}{k} = audio_data{i}{k}*d.tmixer;
% Do we filter the waveforms before concatenating them?
if d.apply_filter
% Note that d.frequency_cutoff was already converted to Nyquist
% normalized values above.
% Get filter coefficients
[b, a] = butter(d.filter_order, d.frequency_cutoff, d.filter_type);
% Apply filter to the stimulus
audio_data{i}{k} = filtfilt(b, a, audio_data{i}{k});
end % if d.apply_filter
end % for k=1:numel(files{i})
end % for i=1:length(files)
% Concatenate files
% Now we'll concatenate the filtered files. All stimuli are at FS at this
% point (see resampling above), so pass FS as the sampling rate.
concat = concat_audio_files(concatenate_lists(audio_data), ...
'fs', FS, ...
'remove_silence', d.removesilence, ...
'amplitude_threshold', d.ampthresh, ...
'mixer', 1); % set mixer to 1 since we've already collapsed to a single
% channel above
%% CALCULATE SCALING FACTOR
%
% The target tracks (generally speech) are scaled so that their RMS matches
% the RMS of the reference, offset by d.targetdB.
scale = db2amp(db(rms_function(ref_data)) - db(rms_function(concat)) + d.targetdB);
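% Worked example (hypothetical numbers; db2amp is assumed here to be the
% inverse of db, i.e. 10^(x/20)): with rms(ref_data) = 0.1 (-20 dB),
% rms(concat) = 0.05 (about -26.02 dB) and targetdB = 0, the difference is
% about +6.02 dB. scale is then about 2, so the stimuli are doubled in
% amplitude to match the reference RMS.
%
%   x = 20*log10(0.1) - 20*log10(0.05) + 0;   % ~6.0206 dB
%   s = 10^(x/20);                            % ~2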
%% APPLY SCALING FACTOR, WRITE STIMULI
% Now, scale each stimulus and write it to file.
for i=1:numel(files)
for k=1:numel(files{i})
% We need to do something different depending on the file format.
[PATHSTR,NAME,EXT] = fileparts(files{i}{k});
% Create output file name
% We always want to write audio data as wav files
audio_file_out = fullfile(PATHSTR, [NAME d.suffix '.wav']);
% Scale audio data
audio_data{i}{k} = audio_data{i}{k}.*scale;
% Multiply by omixer to generate appropriately sized output
% matrix.
audio_data{i}{k} = audio_data{i}{k}*d.omixer;
if max(max(abs(audio_data{i}{k}))) > 1, error('Signal Clipped'); end
% Write audio track(s) to file
audiowrite(audio_file_out, audio_data{i}{k}, FS, 'BitsperSample', d.bitdepth);
% Format-specific processing. MP3 and WAV inputs need nothing beyond the
% .wav written above. MP4s are rewritten with FFmpeg.
switch lower(EXT)
case {'.mp3', '.wav'}
% Nothing else to do here, it's all done above.
case {'.mp4'}
% Now we need to rewrite the MP4s with the scaled (and
% potentially filtered) audio data. It turns out that the
% encoder used for the .wav files written above is
% incompatible with MP4 files. So we have to rewrite the
% audio track as a temporary MP4 file, then use that audio
% track (with the proper AAC encoder) as the audio track
% for the output MP4.
temp_audio_file = fullfile(PATHSTR, [NAME d.suffix '_temp.mp4']);
audiowrite(temp_audio_file, audio_data{i}{k}, FS, 'Bitrate', d.bitrate);
% If overwritemp4 is set, pass -y so FFmpeg overwrites existing
% output files without prompting.
if d.overwritemp4
cmd = 'ffmpeg -y ';
else
cmd = 'ffmpeg ';
end % if d.overwritemp4
% Replace audio in MP4
% Revised to use only MP4 as video and audio inputs, so
% we can copy the codecs and save time/improve data
% precision.
mp4_file_out = fullfile(PATHSTR, [NAME d.suffix '.mp4']);
cmd = [cmd '-i "' files{i}{k} '" -i "' temp_audio_file '"' ...
' -map 0:0 -map 1 -c copy "' mp4_file_out '"'];
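% Illustration (hypothetical file name): for files{i}{k} = 'clip.mp4',
% d.suffix = '-cal' and d.overwritemp4 = true, cmd reads
%
%   ffmpeg -y -i "clip.mp4" -i "clip-cal_temp.mp4" -map 0:0 -map 1 -c copy "clip-cal.mp4"
%
% "-map 0:0" keeps the first stream of the original file (assumed to be
% the video), "-map 1" takes all streams of the temporary file (the
% scaled audio), and "-c copy" muxes them without re-encoding.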
% Issue system call
[status, cmdout] = system(cmd);
% Remove the temporary audio file
delete(temp_audio_file);
% If there's an error, then print the command output.
% Otherwise, post the file name to give the user some
% information about progress.
if status
display(cmdout);
else
display([num2str(sum(cellfun(@numel, files(1:i-1))) + k) ' of ' num2str(sum(cellfun(@numel, files)))]);
end % if status
otherwise
error('Unknown file extension');
end % switch EXT
% MP4s are remuxed with FFmpeg above (see MLST_makemono for details)
end % for k=1:numel(files{i})
end % for i=1:numel(files)