Speech And Audio
17 lessons
01 Audio Fundamentals — Waveforms, Sampling, Fourier Transform
02 Spectrograms, Mel Scale & Audio Features
03 Audio Classification — From k-NN on MFCCs to AST and BEATs
04 Speech Recognition (ASR) — CTC, RNN-T, Attention
05 Whisper — Architecture & Fine-Tuning
06 Speaker Recognition & Verification
07 Text-to-Speech (TTS) — From Tacotron to F5 and Kokoro
08 Voice Cloning & Voice Conversion
09 Music Generation — MusicGen, Stable Audio, Suno, and the Licensing Earthquake
10 Audio-Language Models — Qwen2.5-Omni, Audio Flamingo, GPT-4o Audio
11 Real-Time Audio Processing
12 Build a Voice Assistant Pipeline — The Phase 6 Capstone
13 Neural Audio Codecs — EnCodec, SNAC, Mimi, DAC and the Semantic-Acoustic Split
14 Voice Activity Detection & Turn-Taking — Silero, Cobra, and the Flush Trick
15 Streaming Speech-to-Speech — Moshi, Hibiki, and Full-Duplex Dialogue
16 Voice Anti-Spoofing & Audio Watermarking — ASVspoof 5, AudioSeal, WaveVerify
17 Audio Evaluation — WER, MOS, UTMOS, MMAU, FAD, and the Open Leaderboards