ibm.com
Faster LLMs: Accelerate Inference with Speculative Decoding
Isaac Ke explains speculative decoding, a technique that accelerates LLM inference speeds by 2-4x without compromising output quality. Learn how "draft and verify" pairs smaller and larger models to optimize token generation, GPU usage, and resource efficiency.
8 months ago
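The "draft and verify" pairing described above can be sketched in miniature. This is a toy illustration, not a real LLM pipeline: `draft_model` and `target_model` are hypothetical stand-in functions that deterministically map a token sequence to its next token, standing in for a small drafter and a large verifier under greedy decoding.

```python
# Toy sketch of greedy "draft and verify" speculative decoding.
# The "models" are stand-ins: each maps a token sequence to one next token.

def draft_model(ctx):
    # Cheap drafter: guesses next token as last token + 1.
    return ctx[-1] + 1

def target_model(ctx):
    # Expensive verifier: same rule, except it skips multiples of 4.
    nxt = ctx[-1] + 1
    return nxt + 1 if nxt % 4 == 0 else nxt

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them with the target model.

    With greedy decoding the output is identical to running the target
    model alone; the speedup comes from the target verifying all k
    drafted positions in one (batched) forward pass instead of k passes.
    """
    # 1) Draft phase: propose k tokens autoregressively with the cheap model.
    drafts, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_model(tmp)
        drafts.append(t)
        tmp.append(t)
    # 2) Verify phase: target model checks each drafted position in order.
    accepted, tmp = [], list(ctx)
    for t in drafts:
        expected = target_model(tmp)   # in practice: one batched target pass
        if t == expected:
            accepted.append(t)         # draft matches: token accepted "for free"
            tmp.append(t)
        else:
            accepted.append(expected)  # mismatch: take target's token, stop
            break
    else:
        # All k drafts accepted: the target's pass still yields one bonus token.
        accepted.append(target_model(tmp))
    return accepted

seq = [1]
while len(seq) < 12:
    seq.extend(speculative_step(seq))
print(seq)
```

Because verification is exact, the generated sequence matches what the target model would produce on its own; when the drafter agrees often (as in the videos' claimed 2-4x speedups), most tokens are accepted per verification pass.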
0:46
As AI labs race to train and deploy new frontier models, existing models become more affordable with better tokenomics. ✨ "Everybody's trying to get to the next frontier. And every time they get to the next frontier, the last generation AI tokens, the cost starts to decline about a factor of 10x every year," said NVIDIA CEO Jensen Huang in a recent keynote. Model optimization techniques such as speculative decoding and multi-token prediction, combined with inference serving platforms like NVIDIA
Facebook
NVIDIA AI
12.3K views
1 month ago
7:07
Unlocking AI Speed: How KV Caching and MLA Make Transformers 20x Faster
YouTube
Skill Advancement
62 views
1 month ago
32:14
Hardwear.io NL 2025: Modern memory error exploitation via speculative execution attacks: Anil Kurmus
YouTube
hardwear.io
122 views
2 weeks ago
Top videos
How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
qualcomm.com
Aug 1, 2024
1:32
Speculative Decoding: The Easiest Way to Speed Up LLMs
YouTube
FriendliAI
3 views
1 week ago
Speculative Decoding — Think Fast⚡, Then Think Right✅
substack.com
10 months ago
2:31
Megatron : Transformers Via : h__super | Gundam World : The Legion | Facebook
Facebook
Gundam World : The Legion
4.1M views
3 weeks ago
42:34
What's new at AWS | Dec 03, 2025
YouTube
What's new at AWS
4 views
2 months ago
8:26
Beyond Speculative Decoding: Jacobi Forcing in LLMs
YouTube
Tales Of Tensors
4 views
1 week ago
6:18
What is Speculative Sampling? | Boosting LLM inference speed
3.8K views
Nov 20, 2024
YouTube
AssemblyAI
14:37
Understanding Speculative Decoding: Boosting LLM Efficienc…
374 views
10 months ago
YouTube
MLWorks
0:18
Speculative Decoding for Faster LLMs
129 views
2 months ago
YouTube
Zaharah
8:44
How to PROPERLY Use Speculative Decoding in LM Studio to DOUBL…
2 views
2 weeks ago
YouTube
AsapGuide
0:46
Speculative Decoding Turbocharge Your LLM Inference! #ai, #llm, #inf…
25 views
1 month ago
YouTube
The Code Architect
22:36
MASSIVELY speed up local AI models with Speculative Decodin…
19.6K views
1 year ago
YouTube
GosuCoder
1:06
This Trick Makes LLMs 2X Faster
499 views
1 week ago
YouTube
OpenCV University
52:54
LLMs | Efficient LLM Decoding-II | Lec15.2
1.8K views
Oct 9, 2024
YouTube
LCS2
9:39
Faster LLMs: Accelerate Inference with Speculative Decoding
18.9K views
9 months ago
YouTube
IBM Technology
24:17
Fast Inference from Transformers via Speculative Decoding
1.2K views
Sep 12, 2023
YouTube
Arxiv Papers
37:34
Speculative Decoding Explained
7.7K views
Dec 21, 2023
YouTube
Trelis Research
17:56
Behind the Stack, Ep 11 - Speculative Decoding
63 views
3 months ago
YouTube
Doubleword
12:46
Speculative Decoding: When Two LLMs are Faster than One
26.1K views
Oct 12, 2023
YouTube
Efficient NLP
12:42
Fast Inference from Transformers via Speculative Decoding
134 views
Nov 5, 2024
YouTube
AI Papers Podcast Daily
0:36
How AI Replies So Fast! ⚡ Speculative Decoding
130 views
2 months ago
YouTube
Mr. Doubty – Short. Smart. Techy
6:53
How Speculative Decoding Makes LLMs 2.5x Faster (The Secret to F…
121 views
5 months ago
YouTube
FranksWorld of AI
15:21
What is Speculative Sampling?
2.8K views
Sep 1, 2023
YouTube
DataScienceCastnet
DFlash Boosts Speculative Decoding with Lightweight Block…
2 views
1 month ago
linkedin.com
19:54
Behind the Stack, Ep. 13 - Faster Inference: Speculative Decoding f…
1 view
2 months ago
YouTube
Doubleword
36:12
Deep Dive: Optimizing LLM inference
45.4K views
Mar 11, 2024
YouTube
Julien Simon
59:06
The Future of Efficient LLM Serving: A Deep Dive with Travis Adair l Pr…
137 views
6 months ago
YouTube
Predibase by Rubrik
0:54
Speculative Decoding explained
3.1K views
3 weeks ago
YouTube
IndividualKex
44:58
Implementation and optimization of MTP for DeepSeek R1 in TensorR…
1.4K views
8 months ago
YouTube
NVIDIA Developer
15:15
How to make LLMs fast: KV Caching, Speculative Decoding, a…
12.1K views
Oct 9, 2024
YouTube
Lex Clips
1:02:23
EP5: Speculative Decoding with Nadav Timor
5 months ago
YouTube
The Information Bottleneck
41:10
Inference Office Hours with SGLang: Performance Optimizations for LL…
1K views
3 weeks ago
YouTube
NVIDIA Developer