Local LabTested
The LM Studio Long-Context Test: Where a 35B Local Model Started to Hit the Wall
A practical LM Studio long-context benchmark showing how time to first token changed from 2K to 49K tokens on a 35B local model.
Setup
The run used LM Studio against qwen3.6-35b-a3b through the local OpenAI-compatible API at localhost:1234. It tested context windows from 2048 tokens through 49152 tokens, ran three trials per context size, and capped each response at 64 max tokens. Output token counts and throughput were approximate because tokenizer usage was not available for this run.
Findings
LM Studio averaged 0.718s TTFT at 2048 tokens, 0.719s at 4096, 1.440s at 8192, 2.997s at 16384, 3.653s at 24576, 4.758s at 32768, and 5.914s at 40960. All three 49152-token trials failed before the first token arrived. Throughput after the first token moved from about 115.991 tok/s at 2048 tokens to 70.570 tok/s at 40960 tokens.
Verification Proof Path
Claim
Hype Audit
Deconstruct the marketing claims, checking for verification risks.
Setup
Local Assembly
Rebuild the workflow in a local, private container environment.
Benchmark
Runtime Testing
Measure execution speeds, resource usage, and token response latency.
Workflow
Efficiency Compression
Streamline the processes into reusable, repeatable scripts.
Verdict
Tool Rating
Final rating and practicality score determination.
Sources
LM Studio TTFT Benchmark ReportAI Efficiency Toolbox · Jun 7, 2026
LM Studio TTFT Benchmark CSV ResultsAI Efficiency Toolbox · Jun 7, 2026
Share
Join the discussion
Log in with an account to comment. Comments are reviewed before they appear.
Log in to comment