ibm.com
Faster LLMs: Accelerate Inference with Speculative Decoding
Isaac Ke explains speculative decoding, a technique that accelerates LLM inference speeds by 2-4x without compromising output quality. Learn how "draft and verify" pairs smaller and larger models to optimize token generation, GPU usage, and resource efficiency.
8 months ago
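The "draft and verify" pairing described above can be sketched in miniature. This is a toy illustration, not a real LLM pipeline: `draft_model` and `target_model` are hypothetical stand-in functions that deterministically map a token sequence to its next token, standing in for a small drafter and a large verifier under greedy decoding.

```python
# Toy sketch of greedy "draft and verify" speculative decoding.
# The "models" are stand-ins: each maps a token sequence to one next token.

def draft_model(ctx):
    # Cheap drafter: guesses next token as last token + 1.
    return ctx[-1] + 1

def target_model(ctx):
    # Expensive verifier: same rule, except it skips multiples of 4.
    nxt = ctx[-1] + 1
    return nxt + 1 if nxt % 4 == 0 else nxt

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them with the target model.

    With greedy decoding the output is identical to running the target
    model alone; the speedup comes from the target verifying all k
    drafted positions in one (batched) forward pass instead of k passes.
    """
    # 1) Draft phase: propose k tokens autoregressively with the cheap model.
    drafts, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_model(tmp)
        drafts.append(t)
        tmp.append(t)
    # 2) Verify phase: target model checks each drafted position in order.
    accepted, tmp = [], list(ctx)
    for t in drafts:
        expected = target_model(tmp)   # in practice: one batched target pass
        if t == expected:
            accepted.append(t)         # draft matches: token accepted "for free"
            tmp.append(t)
        else:
            accepted.append(expected)  # mismatch: take target's token, stop
            break
    else:
        # All k drafts accepted: the target's pass still yields one bonus token.
        accepted.append(target_model(tmp))
    return accepted

seq = [1]
while len(seq) < 12:
    seq.extend(speculative_step(seq))
print(seq)
```

Because verification is exact, the generated sequence matches what the target model would produce on its own; when the drafter agrees often (as in the videos' claimed 2-4x speedups), most tokens are accepted per verification pass.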
0:46
As AI labs race to train and deploy new frontier models, existing models become more affordable with better tokenomics. ✨ "Everybody's trying to get to the next frontier. And every time they get to the next frontier, the last generation AI tokens, the cost starts to decline about a factor of 10x every year," said NVIDIA CEO Jensen Huang in a recent keynote. Model optimization techniques such as speculative decoding and multi-token prediction, combined with inference serving platforms like NVIDIA
Facebook
NVIDIA AI
12.3K views
1 month ago
7:07
Unlocking AI Speed: How KV Caching and MLA Make Transformers 20x Faster
YouTube
Skill Advancement
62 views
1 month ago
32:14
Hardwear.io NL 2025: Modern memory error exploitation via speculative execution attacks: Anil Kurmus
YouTube
hardwear.io
122 views
2 weeks ago
Top videos
How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
qualcomm.com
Aug 1, 2024
1:32
Speculative Decoding: The Easiest Way to Speed Up LLMs
YouTube
FriendliAI
3 views
1 week ago
Speculative Decoding — Think Fast⚡, Then Think Right✅
substack.com
10 months ago
2:31
Megatron : Transformers Via : h__super | Gundam World : The Legion | Facebook
Facebook
Gundam World : The Legion
4.1M views
3 weeks ago
42:34
What's new at AWS | Dec 03, 2025
YouTube
What's new at AWS
4 views
2 months ago
8:26
Beyond Speculative Decoding: Jacobi Forcing in LLMs
YouTube
Tales Of Tensors
4 views
1 week ago
6:18
What is Speculative Sampling? | Boosting LLM inference speed
3.8K views
Nov 20, 2024
YouTube
AssemblyAI
14:37
Understanding Speculative Decoding: Boosting LLM Efficienc…
374 views
10 months ago
YouTube
MLWorks
0:18
Speculative Decoding for Faster LLMs
129 views
2 months ago
YouTube
Zaharah
8:44
How to PROPERLY Use Speculative Decoding in LM Studio to DOUBL…
2 views
2 weeks ago
YouTube
AsapGuide
0:46
Speculative Decoding Turbocharge Your LLM Inference! #ai, #llm, #inf…
25 views
1 month ago
YouTube
The Code Architect
22:36
MASSIVELY speed up local AI models with Speculative Decodin…
19.6K views
1 year ago
YouTube
GosuCoder
1:06
This Trick Makes LLMs 2X Faster
499 views
1 week ago
YouTube
OpenCV University
52:54
LLMs | Efficient LLM Decoding-II | Lec15.2
1.8K views
Oct 9, 2024
YouTube
LCS2
9:39
Faster LLMs: Accelerate Inference with Speculative Decoding
18.9K views
9 months ago
YouTube
IBM Technology
24:17
Fast Inference from Transformers via Speculative Decoding
1.2K views
Sep 12, 2023
YouTube
Arxiv Papers
37:34
Speculative Decoding Explained
7.7K views
Dec 21, 2023
YouTube
Trelis Research
17:56
Behind the Stack, Ep 11 - Speculative Decoding
63 views
3 months ago
YouTube
Doubleword
12:46
Speculative Decoding: When Two LLMs are Faster than One
26.1K views
Oct 12, 2023
YouTube
Efficient NLP
12:42
Fast Inference from Transformers via Speculative Decoding
134 views
Nov 5, 2024
YouTube
AI Papers Podcast Daily
0:36
How AI Replies So Fast! ⚡ Speculative Decoding
130 views
2 months ago
YouTube
Mr. Doubty – Short. Smart. Techy
6:53
How Speculative Decoding Makes LLMs 2.5x Faster (The Secret to F…
121 views
5 months ago
YouTube
FranksWorld of AI
15:21
What is Speculative Sampling?
2.8K views
Sep 1, 2023
YouTube
DataScienceCastnet
DFlash Boosts Speculative Decoding with Lightweight Block…
2 views
1 month ago
linkedin.com
19:54
Behind the Stack, Ep. 13 - Faster Inference: Speculative Decoding f…
1 view
2 months ago
YouTube
Doubleword
36:12
Deep Dive: Optimizing LLM inference
45.4K views
Mar 11, 2024
YouTube
Julien Simon
59:06
The Future of Efficient LLM Serving: A Deep Dive with Travis Adair l Pr…
137 views
6 months ago
YouTube
Predibase by Rubrik
0:54
Speculative Decoding explained
3.1K views
3 weeks ago
YouTube
IndividualKex
44:58
Implementation and optimization of MTP for DeepSeek R1 in TensorR…
1.4K views
8 months ago
YouTube
NVIDIA Developer
15:15
How to make LLMs fast: KV Caching, Speculative Decoding, a…
12.1K views
Oct 9, 2024
YouTube
Lex Clips
1:02:23
EP5: Speculative Decoding with Nadav Timor
5 months ago
YouTube
The Information Bottleneck
41:10
Inference Office Hours with SGLang: Performance Optimizations for LL…
1K views
3 weeks ago
YouTube
NVIDIA Developer