NOTE! These pages are deprecated and retained only for archiving purposes. Our new location is https://speechprocessingbook.aalto.fi.




List of authors: Tom Bäckström, Okko Räsänen, Abraham Zewoudie, Pablo Pérez Zarazaga, Liisa Koivusalo

Includes contributions from Sneha Das



Table of contents

  1. Introduction
    1. Why speech processing?
    2. Speech production and acoustic properties
    3. Speech perception (Wikipedia)
    4. Linguistic structure of speech
    5. Speech-Language pathology (Wikipedia)
    6. Applications and systems structures
    7. Social and cognitive processes involved in human communication (external)
  2. Basic representations and models
    1. Waveform
    2. Windowing
    3. Signal energy, loudness and decibel
    4. Spectrogram and the STFT
    5. Autocorrelation and autocovariance
    6. Cepstrum and MFCC
    7. Linear prediction
    8. Fundamental frequency (F0)
    9. Zero-crossing rate
    10. Deltas and Delta-deltas
    11. PSOLA
    12. Jitter and shimmer (also Jitter, shimmer, harmonicity etc. (external link))
    13. Crest factor (Wikipedia)
  3. Pre-processing
    1. Pre-emphasis
    2. Noise gate (Wikipedia)
    3. Dynamic Range Compression (Wikipedia)
    4. Voice activity detection (VAD)
    5. Speech enhancement
  4. Modelling tools in speech processing
    1. Linear regression
    2. Sub-space models
    3. Vector quantization (VQ)
    4. Gaussian mixture model (GMM)
    5. Neural networks
    6. Non-negative Matrix and Tensor Factorization
  5. Evaluation of speech processing methods
    1. Subjective quality evaluation
    2. Objective quality evaluation
    3. Other performance measures
    4. Analysis of evaluation results
  6. Speech analysis
    1. Fundamental frequency estimation
    2. Formant estimation and tracking
    3. Inverse filtering for glottal activity estimation
  7. Recognition tasks in speech processing
    1. Voice activity detection (VAD)
    2. Keyword or wake-word spotting
    3. Speech recognition
    4. Speaker recognition and verification
    5. Speaker diarization
    6. Paralinguistic speech processing
  8. Natural language processing
  9. Speech synthesis
    1. Concatenative speech synthesis
    2. Statistical parametric speech synthesis
  10. Transmission, storage and telecommunication
    1. Design goals
    2. Basic tools
      1. Modified discrete cosine transform (MDCT)
      2. Entropy coding
      3. Perceptual modelling in speech and audio coding
      4. Vector quantization (VQ)
      5. Linear prediction
    3. Code-excited linear prediction (CELP)
    4. Frequency-domain coding
  11. Speech enhancement
    1. Noise attenuation
    2. Echo cancellation
    3. Bandwidth extension (BWE)
    4. Dereverberation
    5. Source separation
    6. Beamforming
  12. Voice and speech analysis (Wikipedia)
    1. Measurements for medical applications
      1. Electroglottography (Wikipedia)
      2. Stroboscopy and videokymography (Wikipedia)
      3. High-speed camera
      4. MRI
      5. Rothenberg mask
      6. Glottal inverse filtering
    2. Forensic analysis
  13. Chatbots / Conversational design (external link)
  14. Computational models of human language processing
  15. Security and privacy in speech technology
  16. References


