Aalto Demos

Waveform generation from MFCCs with different excitation models (for voiced speech). 'Impulse' uses a simple impulse-train excitation, 'DNN' is an excitation pulse model trained with least squares, and 'Residual GAN' is a GAN-based additive noise model. All systems use the same LSTM-based model to predict F0 from MFCCs.
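
To make the 'Impulse' baseline concrete, the following is a minimal sketch of impulse-train excitation driven by a frame-level F0 contour. It is not the code behind these demos: the function name `impulse_train`, the 16 kHz sample rate, the 80-sample frame shift, and the convention that F0 = 0 marks an unvoiced frame are all assumptions made for illustration.

```python
def impulse_train(f0_per_frame, sample_rate=16000, frame_shift=80):
    """Return a list of samples with one unit pulse per pitch period.

    f0_per_frame: F0 values in Hz, one per frame; 0 means unvoiced (assumed convention).
    sample_rate, frame_shift: illustrative defaults, not taken from the demo.
    """
    excitation = []
    phase = 0.0  # fraction of the current pitch period already elapsed
    for f0 in f0_per_frame:
        for _ in range(frame_shift):
            if f0 <= 0.0:
                # Unvoiced frame: emit silence and restart the period count.
                excitation.append(0.0)
                phase = 0.0
                continue
            # Advance by one sample's worth of the pitch period.
            phase += f0 / sample_rate
            if phase >= 1.0:
                # A full period has elapsed: place a pulse and keep the remainder.
                excitation.append(1.0)
                phase -= 1.0
            else:
                excitation.append(0.0)
    return excitation


# Example: ten voiced frames at a constant 125 Hz.
excitation = impulse_train([125.0] * 10)
pulse_positions = [i for i, x in enumerate(excitation) if x == 1.0]
```

At 125 Hz and 16 kHz the pitch period is exactly 128 samples, so `pulse_positions` holds sample indices spaced 128 apart. In a full vocoder this signal would then be passed through a filter derived from the spectral envelope; that step is omitted here.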