Welcome to Phonet’s documentation!

This toolkit computes posterior probabilities of phonological classes from audio files for several groups of phonemes, according to the mode and manner of articulation.

If you are not sure what phonological classes are, have a look at this Phonological classes tutorial.

The code for this project is available at https://github.com/jcvasquezc/phonet .

The available phonological classes, and the phonemes that activate each class, are listed in the following table.

Phonological class	Phonemes
vocalic	/a/, /e/, /i/, /o/, /u/
consonantal	/b/, /tS/, /d/, /f/, /g/, /x/, /k/, /l/, /ʎ/, /m/, /n/, /p/, /ɾ/, /r/, /s/, /t/
back	/a/, /o/, /u/
anterior	/e/, /i/
open	/a/, /e/, /o/
close	/i/, /u/
nasal	/m/, /n/
stop	/p/, /b/, /t/, /k/, /g/, /tS/, /d/
continuant	/f/, /b/, /tS/, /d/, /s/, /g/, /ʎ/, /x/
lateral	/l/
flap	/ɾ/
trill	/r/
voiced	/a/, /e/, /i/, /o/, /u/, /b/, /d/, /l/, /m/, /n/, /r/, /g/, /ʎ/
strident	/f/, /s/, /tS/
labial	/m/, /p/, /b/, /f/
dental	/t/, /d/
velar	/k/, /g/, /x/
pause	/sil/
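The table can also be held as a plain lookup structure, which is handy for checking which classes a given phoneme should activate. The sketch below only transcribes the table; the names `PHONOLOGICAL_CLASSES` and `classes_for_phoneme` are illustrative and are not part of Phonet.

```python
# Phonological classes and their phonemes, transcribed from the table above.
# The dictionary and function names are illustrative, not part of Phonet.
PHONOLOGICAL_CLASSES = {
    "vocalic": ["a", "e", "i", "o", "u"],
    "consonantal": ["b", "tS", "d", "f", "g", "x", "k", "l", "ʎ",
                    "m", "n", "p", "ɾ", "r", "s", "t"],
    "back": ["a", "o", "u"],
    "anterior": ["e", "i"],
    "open": ["a", "e", "o"],
    "close": ["i", "u"],
    "nasal": ["m", "n"],
    "stop": ["p", "b", "t", "k", "g", "tS", "d"],
    "continuant": ["f", "b", "tS", "d", "s", "g", "ʎ", "x"],
    "lateral": ["l"],
    "flap": ["ɾ"],
    "trill": ["r"],
    "voiced": ["a", "e", "i", "o", "u", "b", "d", "l", "m", "n", "r", "g", "ʎ"],
    "strident": ["f", "s", "tS"],
    "labial": ["m", "p", "b", "f"],
    "dental": ["t", "d"],
    "velar": ["k", "g", "x"],
    "pause": ["sil"],
}


def classes_for_phoneme(phoneme):
    """Return the names of all classes in the table that contain `phoneme`."""
    return [name for name, phones in PHONOLOGICAL_CLASSES.items()
            if phoneme in phones]
```

For example, `classes_for_phoneme("m")` returns the consonantal, nasal, voiced, and labial classes, in table order.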

Supported features:

phonet.model() - The architecture used to estimate the phonological classes with a multitask learning strategy. It consists of two bidirectional GRU layers followed by a time-distributed dense layer.
phonet.get_phon_wav() - Estimate the phonological classes using the BGRU models for an audio file (.wav).
phonet.get_phon_path() - Estimate the phonological classes using the BGRU models for all the (.wav) audio files inside a directory.
phonet.get_posteriorgram() - Estimate the posteriorgram for an audio file (.wav) sampled at 16 kHz.
phonet.get_PLLR() - Estimate the phonological log-likelihood ratio (PLLR) features for an audio file (.wav) sampled at 16 kHz.

Installation

From source:

git clone https://github.com/jcvasquezc/phonet
cd phonet
python setup.py install

Methods

class phonet.Phonet(phonological_classes)

Phonet computes posterior probabilities of phonological classes from audio files for several groups of phonemes.

Parameters: phonological_classes – phonological class to be evaluated (“consonantal”, “back”, “anterior”, “open”, “close”, “nasal”, “stop”, “continuant”, “lateral”, “flap”, “trill”, “voice”, “strident”, “labial”, “dental”, “velar”, “pause”, “vocalic”, “all”).
Returns: Phonet Object (see Examples).

Passing "all" as phonological_classes computes the phonological posteriors for the complete list of phonological classes.

get_PLLR(audio_file, feat_file='', projected=True, plot_flag=False)

Estimate the phonological log-likelihood ratio (PLLR) features for an audio file (.wav) sampled at 16 kHz.

Parameters

audio_file – audio file (.wav) sampled at 16 kHz
feat_file – .csv file in which to save the PLLR features for the phonological classes. The default ("") does not save a csv file
projected – whether to project the PLLR feature space according to [1], in order to avoid the bounding effect
plot_flag – True or False, whether to plot the distributions of the feature space

Returns

A pandas DataFrame with the PLLR features.

>>> from phonet.phonet import Phonet
>>> phon=Phonet(["all"])
>>> file_audio=PATH+"/audios/sentence.wav"
>>> phon.get_PLLR(file_audio)
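As background, log-likelihood ratio features of this kind are commonly obtained by applying the logit function to each posterior probability, as in [1]. The sketch below illustrates only that mapping; it is not Phonet's code. The function name `pllr_from_posteriors` and the clipping constant `eps` are assumptions made for the example, and the projection step controlled by `projected` is not shown.

```python
import math


def pllr_from_posteriors(posteriors, eps=1e-10):
    """Map posterior probabilities in [0, 1] to log-likelihood ratios (logit).

    Illustrative helper, not part of Phonet. `eps` is an assumed clipping
    constant that keeps the logarithm finite when a posterior is 0 or 1.
    """
    ratios = []
    for p in posteriors:
        p = min(max(p, eps), 1.0 - eps)         # clip away from 0 and 1
        ratios.append(math.log(p / (1.0 - p)))  # logit(p)
    return ratios
```

A posterior of 0.5 maps to 0, posteriors above 0.5 map to positive values, and posteriors below 0.5 map to negative values.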

References:

[1] Diez, M., Varona, A., Penagarikano, M., Rodriguez-Fuentes, L. J., & Bordel, G. (2014). On the projection of PLLRs for unbounded feature distributions in spoken language recognition. IEEE Signal Processing Letters, 21(9), 1073-1077.

[2] Abad, A., Ribeiro, E., Kepler, F., Astudillo, R. F., & Trancoso, I. (2016). Exploiting Phone Log-Likelihood Ratio Features for the Detection of the Native Language of Non-Native English Speakers. In INTERSPEECH (pp. 2413-2417).

get_feat(signal, fs)

This method extracts log-Mel-filterbank energies used as inputs of the model.

Parameters

signal – the audio signal from which to compute features. Should be a one-dimensional array of N samples.
fs – the sample rate of the signal we are working with, in Hz.

Returns

A numpy array of size (NUMFRAMES by 33 log-Mel-filterbank energies) containing features. Each row holds 1 feature vector.

get_phon_path(audio_path, feat_path, plot_flag=False)

Estimate the phonological classes using the BGRU models for all the (.wav) audio files inside a directory.

Parameters

audio_path – directory with (.wav) audio files inside, sampled at 16 kHz
feat_path – directory where the computed phonological posteriors will be stored, as one (.csv) file per (.wav) file from the input directory
plot_flag – True or False, whether to plot the phonological classes

Returns

A directory with csv files created with the posterior probabilities for the phonological classes.

>>> from phonet.phonet import Phonet
>>> phon=Phonet(["vocalic", "strident", "nasal", "back", "stop", "pause"])
>>> phon.get_phon_path(PATH+"/audios/", PATH+"/phonclasses2/")
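To preview which files a directory run would cover, the one-csv-per-wav mapping described above can be sketched with the standard library. The helper name `plan_outputs` and the output naming (same file stem with a `.csv` extension) are assumptions for illustration; the names Phonet actually writes may differ.

```python
from pathlib import Path


def plan_outputs(audio_path, feat_path):
    """Pair every .wav file in `audio_path` with a .csv path in `feat_path`.

    Illustrative helper, not part of Phonet. The output name (same stem,
    .csv extension) is an assumption about the naming convention.
    """
    audio_dir = Path(audio_path)
    feat_dir = Path(feat_path)
    pairs = []
    for wav in sorted(audio_dir.glob("*.wav")):  # non-wav files are skipped
        pairs.append((wav, feat_dir / (wav.stem + ".csv")))
    return pairs
```

The function only builds paths; it does not create the output directory or read any audio.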

get_phon_wav(audio_file, feat_file='', plot_flag=True)

Estimate the phonological classes using the BGRU models for an audio file (.wav).

Parameters

audio_file – audio file (.wav) sampled at 16 kHz
feat_file – file (.csv) in which to save the posteriors for the phonological classes. The default ("") does not save a csv file
plot_flag – True or False, whether to plot the phonological classes

Returns

A pandas DataFrame with the posterior probabilities for the phonological classes.

>>> from phonet.phonet import Phonet
>>> phon=Phonet(["stop"]) # get the "stop" phonological posterior from a single file
>>> file_audio=PATH+"/audios/pataka.wav"
>>> file_feat=PATH+"/phonclasses/pataka"
>>> phon.get_phon_wav(file_audio, file_feat, True)

>>> file_audio=PATH+"/audios/sentence.wav"
>>> file_feat=PATH+"/phonclasses/sentence_nasal"
>>> phon=Phonet(["nasal"]) # get the "nasal" phonological posterior from a single file
>>> phon.get_phon_wav(file_audio, file_feat, True)

>>> file_audio=PATH+"/audios/sentence.wav"
>>> file_feat=PATH+"/phonclasses/sentence_nasal"
>>> phon=Phonet(["strident", "nasal", "back"]) # get "strident, nasal, and back" phonological posterior from a single file
>>> phon.get_phon_wav(file_audio, file_feat, True)
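Since the methods documented here expect audio sampled at 16 kHz, it can help to check a file before processing it. The following is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files; the helper name `check_sampling_rate` is hypothetical and not part of Phonet.

```python
import wave


def check_sampling_rate(audio_file, expected_hz=16000):
    """Return True if the WAV header reports the expected sampling rate.

    Illustrative helper, not part of Phonet.
    """
    with wave.open(audio_file, "rb") as wav:
        return wav.getframerate() == expected_hz
```

Files that fail the check would need to be resampled with an external tool before being passed to Phonet.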

get_posteriorgram(audio_file)

Estimate the posteriorgram for an audio file (.wav) sampled at 16 kHz.

Parameters: audio_file – audio file (.wav) sampled at 16 kHz
Returns: plot of the posteriorgram

>>> from phonet.phonet import Phonet
>>> phon=Phonet(["vocalic", "strident", "nasal", "back", "stop", "pause"])
>>> file_audio=PATH+"/audios/sentence.wav"
>>> phon.get_posteriorgram(file_audio)

mask_correction(posterior, threshold=0.5)

Applies a mask to correct the posterior probabilities.

Parameters

posterior – phonological posterior.
threshold – threshold for correction

Returns

Corrected phonological posterior.
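The description above does not spell out the masking rule. One plausible reading, shown purely as an illustration and not as Phonet's actual implementation, is to smooth isolated frames: a single frame that falls on the opposite side of the threshold from both of its neighbours is replaced by the neighbours' average. The function name `smooth_isolated_frames` is hypothetical.

```python
def smooth_isolated_frames(posterior, threshold=0.5):
    """Replace single frames that disagree with both neighbours.

    Illustrative only: this is an assumed rule, not Phonet's mask_correction.
    The input sequence is copied, so the caller's data is left unchanged.
    """
    corrected = list(posterior)
    for j in range(1, len(corrected) - 1):
        prev_on = corrected[j - 1] >= threshold
        curr_on = corrected[j] >= threshold
        next_on = corrected[j + 1] >= threshold
        # Neighbours agree with each other but not with the current frame.
        if prev_on == next_on and curr_on != prev_on:
            corrected[j] = (corrected[j - 1] + corrected[j + 1]) / 2
    return corrected
```

Under this rule a one-frame dip inside an active region, or a one-frame spike inside an inactive region, is removed, while longer runs are left untouched.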

model(input_size)

This is the architecture used to estimate the phonological classes with a multitask learning strategy. It consists of two bidirectional GRU layers followed by a time-distributed dense layer.

Parameters: input_size – size of input for the BGRU layers (number of features x sequence length).
Returns: A Keras model of a 2-layer BGRU neural network.

modelp(input_size)

This is the architecture used for phoneme recognition. It consists of two bidirectional GRU layers followed by a time-distributed dense layer.

Parameters: input_size – size of input for the BGRU layers (number of features x sequence length).
Returns: A Keras model of a 2-layer BGRU neural network.

number2phoneme(seq)

Converts the prediction of the neural network for phoneme recognition to a list of phonemes.

Parameters: seq – sequence of integers obtained from the prediction of the neural network for phoneme recognition.
Returns: A list of strings of the phonemes recognized for each time-frame.
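Conceptually, this conversion is an index lookup: each per-frame integer selects one entry from a fixed phoneme inventory. The sketch below shows the idea with a made-up table. `EXAMPLE_PHONEMES` and `indices_to_phonemes` are hypothetical, and the real index order is defined inside Phonet.

```python
# Hypothetical index-to-phoneme table, for illustration only.
# The ordering Phonet actually uses is defined inside the library.
EXAMPLE_PHONEMES = ["sil", "a", "e", "i", "o", "u", "p", "t", "k"]


def indices_to_phonemes(seq, phoneme_table=EXAMPLE_PHONEMES):
    """Map each per-frame integer label to its phoneme string."""
    return [phoneme_table[int(index)] for index in seq]
```

With this example table, the sequence `[0, 6, 1, 7, 1, 8, 1]` maps to `sil p a t a k a`, one phoneme per time frame.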

Help

If you have trouble with Phonet, please write to Camilo Vasquez at: juan.vasquez@fau.de