This glossary collects key terms from the field of computational audio content analysis. It was originally created to provide Finnish translations for key terms in the field in order to stabilize the terminology. To make the list useful to researchers outside Finland as well, a brief English definition is added for each term with links to Wikipedia and Wiktionary. To make the list more accessible in other languages, some terms have also been translated into German, Spanish, and French through Wikipedia. The glossary is a work in progress and does not aim to be complete.
The data file used to create this glossary is published as a repository:
If you see an error, or you want to contribute, make a pull request to the repository or send me an email.
Special thanks to Tomasz Mąka for the Polish translations and Irene Martin Morato for the Spanish translations.
Dictionaries and glossaries from related fields:
- English-Finnish dictionary for general audio signal processing terms by Vesa Välimäki
- English-Finnish dictionary for statistics and probability theory terms by Petri Koistinen
- English-Finnish dictionary/glossary for language technology by Kimmo Koskenniemi
- Bank of Finnish terminology in arts and sciences
- Glossary of statistical terms by ISI
- Machine learning glossary by Google
- Tilastotieteen sanasto by Juha Alho, Elja Arjas, Esa Läärä and Pekka Pere
Terms 586, Translations 475 128 138 259 120, Updated 2024-04-26
A
accuracy
The fraction of system output which was predicted correctly
See also: evaluation metric
acoustic feature
See also: feature
acoustic model
in speech recognition system, model learned from acoustic data
acoustic monitoring
acoustic pattern recognition
acoustic scene
acoustic scene analysis
acoustics
activation function
in neural network, a function to define the output of a neuron
additive model
additive noise
agglomerative hierarchical clustering
aggregation
algorithm
aliasing
Amazon Mechanical Turk (AMT)
crowdsourcing marketplace enabling the use of human intelligence to perform tasks
annotation
adding metadata to audio
annotation granularity
annotator
anomalous sound detection
area under the curve (AUC)
in binary classification, an evaluation metric that considers all classification thresholds
See also: receiver operating characteristic curve
artificial general intelligence (AGI)
See also: strong artificial intelligence, full artificial intelligence
artificial intelligence (AI)
an ability to have machines act with apparent intelligence
artificial neural network
See also: neural network
assisted living
a housing facility for people with disabilities or for adults who cannot or choose not to live independently
attention
attention mechanism
See also: attention
attribute
audification
audio analysis
audio caption
See also: automated audio captioning, audio captioning
audio captioning
See also: automated audio captioning
audio classification
See also: classification
audio dataset
a collection of audio examples used for system development
audio retrieval
audio signal processing
audio source separation
audio tagging
audio-visual, audiovisual
audiovisual data
auditory
relating to hearing
auditory event
Subjective perception of sound
See also: auditory scene
auditory scene
auditory scene analysis (ASA)
a model proposed by Albert Bregman for the basis of auditory perception
augmented intelligence
auralization
automated audio captioning (AAC)
B
background noise
backpropagation, backprop
method used in neural networks to compute the gradients needed for gradient descent
backpropagation algorithm
bag of frames
representing frames without taking into account their order
balanced accuracy (BACC)
baseline
batch
in neural network training, a set of examples used in one iteration for model training
batch normalization (BN)
a technique for improving the performance and stability of neural networks
See also: deep neural network, neural network, batch
beamforming
technique used in sensor arrays for directional signal reception or transmission
belief network
bias
big data
bigram
binary classification
a type of classification which outputs one of two mutually exclusive classes
binary mask
binaural
related to two ears
bioacoustics
cross-disciplinary science that combines biology and acoustics
block mixing
data augmentation technique
See also: data augmentation
boosting
a machine learning technique which iteratively combines weak classifiers into a classifier with higher accuracy
brute-force search
Systematically going through all possible candidate solutions for the problem
See also: exhaustive search
C
category
a group to which items are assigned based on similarity or defined criteria
cepstrum
class label
classification
identification of which category or categories an item belongs to
classification error
classification model
See also: model, classification
classification of events, activities and relationships (CLEAR)
evaluation campaign organized in 2006 and 2007
classification threshold
See also: classification
classifier
See also: classification
closed set
closed set classification
See also: open set classification, closed set
cluster
See also: cluster analysis
cluster analysis
clustering
grouping related examples together
cognitive modeling
collaborative learning
See also: federated learning
computational audio content analysis
See also: content analysis
computational auditory scene analysis (CASA)
computational linguistics
an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective
computational modeling
computer audition (CA)
field of study of algorithms and systems for audio understanding by machine
confidence interval (CI)
interval estimate computed from the statistics of the observed data, that might contain the true value of an unknown population parameter
confusion matrix
an NxN table to summarize classification performance (predicted class versus actual class)
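As a minimal sketch of the definition above, the following Python function counts (actual, predicted) pairs into an NxN table. The function name `confusion_matrix`, the example labels, and the convention that rows index the actual class are assumptions made for illustration; some tools use the transposed layout.

```python
def confusion_matrix(actual, predicted, n_classes):
    """Count (actual, predicted) pairs into an n_classes x n_classes table.

    Rows index the actual class, columns the predicted class
    (an illustrative convention, not a fixed standard).
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical integer class labels for a three-class problem.
actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(actual, predicted, n_classes=3)

# Correct predictions lie on the diagonal, so accuracy is the
# diagonal sum divided by the number of examples.
accuracy = sum(cm[i][i] for i in range(3)) / len(actual)
```

Off-diagonal cells show which classes are confused with each other, which a single accuracy value cannot reveal.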
connectionist temporal classification (CTC)
constant-Q cepstral coefficients (CQCC)
constant-Q transform (CQT)
content analysis
context
context-aware
context vector
convolution
mathematical operation of two functions to produce a third function that expresses how the shape of one is modified by the other
convolution kernel
convolutional neural network (CNN)
a neural network with convolutional layers along with pooling and fully connected layers
See also: neural network, deep neural network
convolutional recurrent neural network (CRNN)
See also: neural network, deep neural network
corpus
cost function
See also: loss function
cross-attention
cross-entropy
cross-validation
a method for estimating how well a system generalizes to new data by repeatedly reserving a subset of the dataset only for testing
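A minimal sketch of one common variant, k-fold cross-validation, is given below. The helper name `k_fold_splits` is hypothetical, and the sketch splits indices in their original order; in practice the examples are often shuffled or stratified first.

```python
def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) pairs, one per fold.

    Every example appears in exactly one test fold, and the
    training indices of a fold never overlap its test indices.
    """
    indices = list(range(n_examples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_splits(n_examples=10, k=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

A system would be trained and evaluated once per fold, and the per-fold scores averaged to estimate performance on new data.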
crowdsourcing
D
data
data acquisition
data augmentation
artificially increasing the number of training examples
data mining
data post-processing
See also: data preprocessing
data preprocessing
See also: data post-processing
dataset
decision boundary
learned separating boundary between classes
decision function
decision tree
a learning method using tree-like decision graph
decorrelated
deep belief network (DBN)
deep learning (DL)
machine learning approach using multi-layer models that gradually learn representations at higher levels of abstraction
deep machine learning (DML)
See also: deep learning
deep neural network (DNN)
neural network containing multiple hidden layers
denoising
detection
detection and classification of acoustic scenes and events (DCASE)
detection error tradeoff (DET)
Plot of the false rejection rate versus false acceptance rate for classification systems
deterministic
diarization error rate (DER)
dimensionality
direction of arrival (DOA)
direction of arrival estimation (DOAE)
See also: direction of arrival
discrete cosine transform (DCT)
Transform to represent data points with a sum of cosine functions
discrete Fourier transform (DFT)
discriminant analysis
discriminative learning
modeling the dependence of a target variable y on an observed variable x
See also: generative learning
disentangled representation
disentangled representation learning
dissimilarity
See also: similarity
distance measure
domain
domain adaptation
field of machine learning dealing with cases in which a model trained on a source distribution is used on a different target distribution
domain generalization
domain shift
downmixing, down-mixing
mixing audio channels together
downstream task
duration
dynamic range
E
early fusion
features from multiple sources are combined into a single feature set before feeding to a classifier.
See also: feature level fusion
edge AI
Utilizing artificial intelligence in an edge-computing environment.
See also: edge computing
edge computing
embeddings
a low-dimensional space into which high-dimensional vectors can be translated
See also: word embedding
empirical
ensemble
ensemble learning
using multiple learning algorithms to obtain better predictive performance than any of the constituent learning algorithms alone
envelope function
epoch
while training neural networks, one pass over the full training set
See also: deep neural network, neural network
equal error rate (EER)
error rate
Euclidean distance
evaluation metric
event-based metric
See also: evaluation metric
event offset
event onset
everyday environment
everyday listening
the interpretation of the sound in terms of its source
exhaustive search
See also: brute-force search
expectation maximization (EM)
an iterative method to find maximum likelihood or maximum a posteriori estimates of parameters in statistical models
experiment
experimental design
F
F-score, f1-score
an evaluation metric that takes into account both precision and recall; the F1-score is their harmonic mean
See also: evaluation metric
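As a small worked sketch, precision, recall, and the F1-score can be computed from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than a universal one.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw prediction counts."""
    # Assumed convention: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because the harmonic mean is dominated by the smaller value, a system cannot reach a high F1-score by maximizing only one of the two measures.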
false negative (FN)
an example wrongly predicted as negative class
false positive (FP)
an example wrongly predicted as positive class
fast Fourier transform (FFT)
feature
a measurable property of the acoustic signal
See also: acoustic feature
feature engineering
using domain knowledge of the data to manually create suitable features for machine learning
See also: feature learning
feature extraction
feature learning
automatically discover needed representations
See also: feature engineering
feature level fusion
features from multiple sources are combined into a single feature set before feeding to a classifier.
See also: early fusion
feature selection
federated learning
machine learning technique to train a model across multiple devices without sharing local data examples
See also: collaborative learning
feed forward network
See also: deep neural network, neural network
feedback
feedforward
feedforward neural network (FNN)
See also: deep neural network, neural network
few-shot learning
See also: one-shot learning
filter
filter bank
an array of band-pass filters
foley
Reproduction of everyday sounds in filmmaking
folksonomy
classification based on users' tags
frame
frame blocking
See also: frame
frame stacking
free field
frequency domain
See also: time domain
frequency resolution
full artificial intelligence (Full AI)
fully connected layer
See also: deep neural network, neural network
fundamental frequency
fuzzy logic
G
gammatone feature cepstral coefficients (GFCC)
gammatone filter
gated recurrent unit (GRU)
See also: neural network, deep neural network
Gaussian mixture model (GMM)
See also: mixture model
generative adversarial network (GAN)
technique where a generator generates data candidates and a discriminator evaluates them.
See also: deep neural network, neural network
generative learning
See also: discriminative learning
gradient descent
ground truth
See also: reference label, annotation
H
hand-crafted feature
using domain knowledge of the data to manually create suitable features for machine learning
See also: feature engineering
harmonic
head-related transfer function (HRTF)
heterogeneous
heterogeneous dataset
heuristic
a practical and suboptimal solution
hidden layer
in neural network, layer between the input layer and the output layer
See also: deep neural network, neural network
hidden Markov model (HMM)
hierarchical classification
histogram of oriented gradients (HOG)
holdout data
examples which are only used for testing the system's performance
See also: cross-validation
homogeneous
hyperparameter
in machine learning, a variable which is set before the learning process starts
See also: parameter
hyponym
I
i-vector
implementation
impulse response
independent component analysis (ICA)
indexing
inference
information retrieval
input
input layer
See also: deep neural network, neural network
inter-annotator agreement
a measure of how well human annotators agree in an annotation task
interclass correlation
intermediate statistics
intraclass correlation
inverse fast Fourier transform (IFFT)
J
jackknife estimator
jackknife method
jitter
K
k-fold cross-validation
See also: cross-validation
k-nearest-neighbor (kNN)
See also: nearest neighbor
kernel
knowledge
Kullback–Leibler divergence
L
labeled data
labeled example
an example with audio and assigned category label
labeling
language acquisition
language-based audio retrieval
See also: audio retrieval
language model
language-queried audio source separation
See also: audio source separation
late fusion
Combining outputs from multiple classifiers.
latent variable
layer
in neural network, a set of neurons
See also: deep neural network, neural network
leaderboard
a board showing the ranking of participants in a competition
learning rate
a hyperparameter to control the size of the learning step, gradient step
See also: deep neural network, neural network
leave-one-out cross-validation (LOOCV)
likelihood
likelihood ratio
likelihood ratio test
linear discriminant analysis (LDA)
linear prediction
a mathematical operation to estimate future values as a linear function of previous values
linear prediction cepstral coefficients (LPCC)
linear regression
local binary patterns (LBP)
localization
log-likelihood
logistic regression
statistical model to use logistic function to model a binary dependent variable
long short-term memory (LSTM)
See also: deep neural network, neural network
loss function
a function that measures how far predictions are from their labels
See also: cost function
loudness
loudness level
low-complexity model
M
machine learning (ML)
field of artificial intelligence that uses statistical techniques to give computer systems the ability to "learn" from data, without being explicitly programmed
machine learning operations (MLOps)
machine listening
field of study of algorithms and systems for audio understanding by machine
machine-to-machine interaction
macro-averaging
See also: evaluation metric
magnitude response
majority voting
masked multi-head attention
See also: attention mechanism, multi-head attention
maximum a posteriori estimator (MAP estimator)
maximum likelihood estimator (MLE)
mean absolute error (MAE)
mean square error (MSE)
See also: root mean square error
mean squared error (MSE)
mel-frequency cepstral coefficients (MFCCs)
mel scale
non-linear perceptual frequency scale on which listeners judge pitches to be equal in distance from one another.
mel-scaled spectrogram
meta learning
metadata
micro-averaging
See also: evaluation metric
mini-batch
See also: batch
misclassification
missing label
mixture model
See also: Gaussian mixture model
mixture signal
modal
modality
model
in machine learning system, a parameter set learned from the training data
modeling
monaural
related to one ear
monitoring
monoaural
See also: monophonic
monophonic
See also: monoaural
multi-annotator
multi-class classification
classification type where prediction is done between three or more classes
multi-condition training
multi-head attention
See also: attention mechanism
multi-label classification
classification type where multiple class labels may be assigned to each instance
See also: single-label classification
multi-task learning
approach where multiple learning tasks are solved at the same time
multichannel, multiple channel
See also: single-channel
multiclass classification
multilayer perceptron (MLP)
See also: neural network, deep neural network
multimodal
multiple kernel learning (MKL)
machine learning method to use a predefined set of kernels and learn optimal combination of these kernels
music information retrieval (MIR)
interdisciplinary science of retrieving information from music
N
naive Bayesian classification
naive listener
narrowband
natural language processing (NLP)
near field
nearest neighbor
See also: k-nearest-neighbor
neural network (NN)
network of (artificial) neurons
neuron
a node in a neural network taking in multiple values and generating single value as an output
noise
noise suppression
noisy label
non-negative matrix factorization (NMF)
nonlinear, non-linear
normal distribution
normalization
converting values into standard range of values
null hypothesis
general statement that there is no relationship between two measured phenomena
O
objective
a metric the algorithm tries to optimize
one-hot encoding
representing categorical variables as binary vectors so that only a single element is set to one
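A minimal sketch of this encoding is shown below. The helper name `one_hot` and the class list are hypothetical and chosen only for illustration.

```python
def one_hot(label, classes):
    """Return a binary vector with a single 1 at the position of `label`."""
    vector = [0] * len(classes)
    vector[classes.index(label)] = 1
    return vector


# Hypothetical class list for an audio classification task.
classes = ["car", "dog", "speech"]
encoded = one_hot("dog", classes)
print(encoded)
```

The vector length equals the number of classes, and the position of the single non-zero element identifies the class.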
one-shot learning
machine learning approach where aim is to learn from a single training example
ontology
a structure of concepts or entities within a domain which are organized by relationships
open data
open set
open set classification
See also: closed set classification, open set
optimization
optimizer
in neural network, an implementation of gradient descent algorithm
order of magnitude
outliers
observation points that are distant from other observations
output
See also: input
output layer
last layer of a neural network outputting predictions
See also: deep neural network, neural network
overfitting
a model that fits the training data too closely and fails to predict correctly on new data
See also: underfitting
P
paralinguistics
parallel
parameter
in machine learning, a variable which is adjusted during the learning process
See also: hyperparameter
parametrization
parsing
part of speech (POS)
See also: part-of-speech tagging
part-of-speech tagging (POS tagging)
the process of marking up a word in a text as corresponding to a particular part of speech
See also: part of speech
pattern
pattern recognition
perception
perceptual
See also: perception
perceptual spread
performance
in machine learning, refers to the goodness of the model's predictions
pilot study
pitch
pitch shifting
See also: pitch
polyphonic annotation
pooling
in neural network, reducing matrix into a smaller matrix
See also: deep neural network, neural network
posterior probability
pre-trained model
a model which has been already trained
precision
a measure of how often a prediction is correct when the positive class is predicted
See also: recall, F-score, evaluation metric
prediction error
principal component analysis (PCA)
a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components
prior distribution
prior probability, a priori probability
See also: prior distribution
probability
probability measure
prototypical network
pruning
psychoacoustics
the scientific study of sound perception and audiology
Q
quantization, quantizing
R
random effect
random forest (RF)
an ensemble learning method which constructs multiple decision trees at the training stage
random noise
random selection
randomization
randomness
ranking
recall
a measure of how many of the actual positive examples were correctly predicted
See also: precision, F-score, evaluation metric
receptive field
receiver operating characteristic curve (ROC curve)
a curve of true positive rate versus false positive rate at different classification thresholds
recognition
record
recurrent neural network (RNN)
a neural network that models sequential interactions through a hidden state or memory
See also: neural network, deep neural network
recursive quantitative analysis
reference data
reference label
See also: ground truth, annotation
regression
regression analysis
regularization
in machine learning, penalizing a model's complexity in order to prevent overfitting
reinforcement learning
machine learning technique focusing on performance, finding a balance between exploration of new knowledge and exploitation of current knowledge
repeatability
replicability
repository
reproducibility
retrieval
reverberation
reverberation time (RT)
robotics
robust classification
robust estimator
robustness
room acoustics
room response
room simulation
root mean square error (RMSE)
See also: mean square error
roughness
S
saliency
salient
sample
sample space
sampling frequency
See also: sampling rate
sampling rate
See also: sampling frequency
scalability
search algorithm
segment-based metric
See also: evaluation metric
segmentation
self-attention
See also: attention mechanism
self-organizing map (SOM)
artificial neural network that is trained using unsupervised learning to produce a low-dimensional discretized representation of the input space
self-supervised learning (SSL)
semantic information
semi-supervised learning
machine learning technique to use small amount of labeled data and large amount of unlabeled data in the learning stage
sensitivity
See also: evaluation metric
sensor
sensor node
See also: sensor
sequential analysis
sharpness
short-time Fourier transform (STFT)
signal modeling
signal processing
signal-to-interference ratio (SIR)
signal-to-noise ratio (SNR)
significance
significance level
significance level, level of significance
similarity
similarity matrix
similarity measure
a measure to determine how similar two examples are
See also: similarity
single-channel
See also: multichannel
single-label classification
classification type where exactly one class label is assigned to each instance
See also: binary classification, multi-label classification
sinusoidal modeling
situational awareness (SA)
the perception of environmental elements and events with respect to time or space, the comprehension of their meaning, and the projection of their future status
smoothing
sonification
sound event
sound event detection (SED)
See also: sound event
sound event instance
See also: sound event
sound event localization and detection (SELD)
sound pressure
sound pressure level (SPL)
See also: sound pressure
sound quality
sound scene synthesis
sound source
sound source distance estimation
soundscape
source distance estimation (SDE)
source proximity
source separation
See also: audio source separation
sparse matrix
matrix in which most elements are zero
sparsity
number of zero elements in a matrix divided by the total number of elements
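The ratio in this definition can be sketched in a few lines of Python. The function name `sparsity` and the example matrix are hypothetical, and the matrix is assumed to be a non-empty list of rows.

```python
def sparsity(matrix):
    """Return the fraction of elements in `matrix` that are zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


# Hypothetical 3x3 matrix with seven zero elements out of nine.
matrix = [
    [0, 0, 3],
    [0, 5, 0],
    [0, 0, 0],
]
value = sparsity(matrix)
print(f"sparsity = {value:.3f}")
```

A value close to 1 indicates a sparse matrix in the sense of the preceding entry.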
speaker diarisation
Process of splitting an audio signal into segments according to the speakers
specificity
See also: evaluation metric
spectral analysis
spectral centroid
spectral clustering
grouping related examples together using the eigenvalues of similarity matrix
spectral envelope
spectral flatness
spectral flux
spectral moments
spectral roll-off
spectral slope
spectrogram
spectrum
speech analysis
speech enhancement
improvement of speech quality by using various algorithms
speech processing
speech recognition
speech segmentation
speech separation
spoken language understanding (SLU)
Extraction of the meaning out of the speech by using automatic speech recognition and natural language understanding.
standard deviation
statistic
statistical
statistical model
statistical significance
statistics
stochastic
stochastic gradient descent (SGD)
stochastic model
stopping rule
stratification
stratified sampling
stride
in convolution or pooling, the step size in the horizontal or vertical dimension between successive input slices
strong annotation
See also: annotation, weak annotation
strong artificial intelligence (Strong AI)
See also: artificial general intelligence, full artificial intelligence
strong label
See also: strong annotation, weak label, weak annotation
study
subband power distribution (SPD)
subsampling layer
supervised learning
learning method which learns from labeled examples
See also: unsupervised learning
support vector machine (SVM)
survey
system
system development
T
t-distributed stochastic neighbor embedding (t-SNE)
tag
taxonomy
a classification in a hierarchical system
temporal integration
tensor
test example
test set
subset of data used to test the system, disjoint from the training set
See also: training set, validation set
testing data
See also: test set
textual label
texture
threshold value
timbre
time-dependent
time domain
time domain envelope
time-frequency distribution
time-frequency representation
time stretching
changing the duration of an audio signal without affecting its pitch
See also: pitch shifting
timestamp
tolerance
training
a process of determining the optimal parameters of the model
training data
See also: training set
training example
training set
subset of data used to train the system, disjoint from the test set
See also: test set, validation set
transfer function
transfer learning
a research problem focusing on storing knowledge gained while solving one problem and applying it to a different problem
transformation
transformer
deep learning model that uses a self-attention mechanism to solve sequence-to-sequence tasks.
transient
transition probability
trigram
true negative (TN)
an example correctly predicted as negative class
true negative rate (TNR)
true positive (TP)
an example correctly predicted as positive class
true positive rate (TPR)
U
unbalanced data
unbiased
uncertainty
uncorrelated
underfitting
a model with low predictive ability because it neither models the training data well nor generalizes to new data
See also: overfitting
uniform distribution
unlabeled data
unsupervised learning
machine learning technique to learn from unlabeled data
See also: supervised learning
V
validation
validation data
validation example
validation set
subset of data used to adjust hyperparameters, disjoint from the training set and test set
See also: training set, test set
variability
variable
visualisation, visualization
vocalization, vocalisation
W
waveform
wavelets
weak annotation
See also: annotation
weak artificial intelligence (Weak AI)
weak label
See also: weak annotation, strong label, strong annotation
weak supervision
learning approach where noisy, limited or imprecise data is used to supervise the labelling process of larger training data to be used in supervised learning setting
See also: supervised learning, unsupervised learning
weakly labeled
See also: annotation, weak annotation
wideband
wildlife monitoring
windowing
See also: windowing function
windowing function
function that is zero-valued outside of some chosen interval
See also: windowing
word embedding
mapping a word or phrase from the vocabulary into a vector of real numbers
See also: embeddings
word error rate (WER)
word sense disambiguation
identifying which sense of a word is used in a sentence, when the word has multiple meanings