Waveform generation from MFCCs with different excitation models (for voiced speech).
'Impulse' uses a simple impulse-train excitation, 'DNN' is an excitation pulse model trained with a least-squares loss, and 'Residual GAN' is a GAN-based additive noise model.
All systems use the same LSTM-based model to predict F0 from MFCCs.
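
For readers unfamiliar with the 'Impulse' baseline, here is a minimal sketch of how an impulse-train excitation can be built from a frame-level F0 contour. It is an illustration only, not the implementation behind these systems: the function name `impulse_train`, the 5 ms frame shift, the 16 kHz sample rate, the unit pulse amplitude, and the convention that F0 = 0 marks an unvoiced frame are all assumptions made for the example.

```python
import numpy as np

def impulse_train(f0_hz, frame_shift_s=0.005, sample_rate=16000):
    """Return a unit-amplitude impulse train for a frame-level F0 contour.

    Hypothetical helper: frames with F0 <= 0 are treated as unvoiced
    and produce no pulses.
    """
    hop = int(round(frame_shift_s * sample_rate))  # samples per frame
    # Hold each frame's F0 constant over the samples it covers.
    f0_per_sample = np.repeat(np.asarray(f0_hz, dtype=float), hop)
    excitation = np.zeros(f0_per_sample.size)
    phase = 0.0  # fraction of the current pitch period elapsed
    for n, f0 in enumerate(f0_per_sample):
        if f0 <= 0.0:
            phase = 0.0  # reset the pitch phase in unvoiced regions
            continue
        phase += f0 / sample_rate
        if phase >= 1.0:  # one full pitch period has elapsed
            phase -= 1.0
            excitation[n] = 1.0
    return excitation

# One second of a constant 100 Hz contour gives pulses about 160 samples apart.
pulses = impulse_train([100.0] * 200)
```

In a vocoder of this kind, such a train would then be passed through a synthesis filter derived from the spectral envelope; the 'DNN' and 'Residual GAN' systems replace or augment the fixed unit pulse with learned excitation components.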