AI Efficiency ToolboxNewsletter
Back to Local Lab
Hero graphic for an LM Studio long-context latency benchmark on a 35B local model
Local LabTested

The LM Studio Long-Context Test: Where a 35B Local Model Started to Hit the Wall

A practical LM Studio long-context benchmark showing how time to first token changed from 2K to 49K tokens on a 35B local model.

Setup

The run used LM Studio against qwen3.6-35b-a3b through the local OpenAI-compatible API at localhost:1234. It tested context windows from 2048 tokens through 49152 tokens, ran three trials per context size, and capped each response at 64 max tokens. Output token counts and throughput were approximate because tokenizer usage was not available for this run.

Findings

LM Studio averaged 0.718s TTFT at 2048 tokens, 0.719s at 4096, 1.440s at 8192, 2.997s at 16384, 3.653s at 24576, 4.758s at 32768, and 5.914s at 40960. All three 49152-token trials failed before the first token arrived. Throughput after the first token moved from about 115.991 tok/s at 2048 tokens to 70.570 tok/s at 40960 tokens.

Verification Proof Path

Claim

Hype Audit

Deconstruct the marketing claims, checking for verification risks.

Setup

Local Assembly

Rebuild the workflow in a local, private container environment.

Benchmark

Runtime Testing

Measure execution speeds, resource usage, and token response latency.

Workflow

Efficiency Compression

Streamline the processes into reusable, repeatable scripts.

Verdict

Tool Rating

Final rating and practicality score determination.

Sources

LM Studio TTFT Benchmark ReportAI Efficiency Toolbox · Jun 7, 2026
LM Studio TTFT Benchmark CSV ResultsAI Efficiency Toolbox · Jun 7, 2026

Share

Join the discussion

Log in with an account to comment. Comments are reviewed before they appear.

Log in to comment