🍉's Blog

Speech And Audio

17 lessons

01 Audio Fundamentals — Waveforms, Sampling, Fourier Transform
02 Spectrograms, Mel Scale & Audio Features
03 Audio Classification — From k-NN on MFCCs to AST and BEATs
04 Speech Recognition (ASR) — CTC, RNN-T, Attention
05 Whisper — Architecture & Fine-Tuning
06 Speaker Recognition & Verification
07 Text-to-Speech (TTS) — From Tacotron to F5 and Kokoro
08 Voice Cloning & Voice Conversion
09 Music Generation — MusicGen, Stable Audio, Suno, and the Licensing Earthquake
10 Audio-Language Models — Qwen2.5-Omni, Audio Flamingo, GPT-4o Audio
11 Real-Time Audio Processing
12 Build a Voice Assistant Pipeline — The Phase 6 Capstone
13 Neural Audio Codecs — EnCodec, SNAC, Mimi, DAC and the Semantic-Acoustic Split
14 Voice Activity Detection & Turn-Taking — Silero, Cobra, and the Flush Trick
15 Streaming Speech-to-Speech — Moshi, Hibiki, and Full-Duplex Dialogue
16 Voice Anti-Spoofing & Audio Watermarking — ASVspoof 5, AudioSeal, WaveVerify
17 Audio Evaluation — WER, MOS, UTMOS, MMAU, FAD, and the Open Leaderboards

© 2026 🍉's Blog · All rights reserved