
LM Studio vs oMLX on a MacBook Pro: The 35B Local AI Test That Actually Changed My Default

A practical benchmark of LM Studio and oMLX on a MacBook Pro running a 35B local model, including sustained generation and prefix-cache behavior.

Setup

The benchmark compared LM Studio and oMLX on a MacBook Pro (model identifier Mac17,8) with an Apple M5 Pro, 18-core CPU, 20-core GPU, and 48 GB of unified memory. Both runtimes used the Qwen3.6 35B A3B model family with identical sampling settings: temperature 0.2, top-p 0.9, min-p 0.05, repeat penalty 1.05, and top-k 0. Each run used a single active session and request, with controlled benchmark prompts.
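One way to keep the sampling settings identical across both runtimes is to define them once and check for drift before each run. The sketch below is illustrative only: the `SamplingSettings` class, its field names, and `diff_settings` are hypothetical, and the request keys each runtime actually accepts may be named differently.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class SamplingSettings:
    """Sampling values as listed in the Setup section (field names are hypothetical)."""
    temperature: float = 0.2
    top_p: float = 0.9
    min_p: float = 0.05
    repeat_penalty: float = 1.05
    top_k: int = 0


def diff_settings(a: SamplingSettings, b: SamplingSettings) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    left, right = asdict(a), asdict(b)
    return {k: (left[k], right[k]) for k in left if left[k] != right[k]}


# Both runtimes are meant to share one configuration, so the diff should be empty.
lm_studio = SamplingSettings()
omlx = SamplingSettings()
mismatch = diff_settings(lm_studio, omlx)
if mismatch:
    raise SystemExit(f"Sampling settings differ between runtimes: {mismatch}")
print("Sampling settings match:", asdict(lm_studio))
```

An empty diff before each run is a cheap guard against comparing runtimes under different sampling conditions.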

Findings

In the sustained-generation run, LM Studio produced 5382 completion tokens in 60.78 s (88.55 tok/s) with a 0.23 s time to first token (TTFT). oMLX produced 6144 completion tokens in 70.47 s (87.19 tok/s) with a 4.17 s TTFT. Throughput was within about 2% between the two runtimes; the gap was in TTFT.

In the shared-prefix test, LM Studio's TTFT improved from 3.37 s to 0.27 s. oMLX with cache enabled improved from 6.95 s to 3.78 s and reported 6144 cached prompt tokens.
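The figures above can be cross-checked with simple arithmetic. The sketch below assumes throughput is completion tokens divided by the stated duration (the reported rates are consistent with that) and expresses the shared-prefix improvement as a percentage reduction in time to first token. The function names are illustrative.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput as completion tokens divided by elapsed seconds."""
    return tokens / seconds


def ttft_reduction_pct(before_s: float, after_s: float) -> float:
    """Percentage drop in time to first token between two runs."""
    return (before_s - after_s) / before_s * 100.0


# (completion tokens, seconds) from the sustained-generation run
generation = {"LM Studio": (5382, 60.78), "oMLX": (6144, 70.47)}
# (TTFT before, TTFT after) in seconds from the shared-prefix test
shared_prefix = {"LM Studio": (3.37, 0.27), "oMLX": (6.95, 3.78)}

for name, (tokens, seconds) in generation.items():
    print(f"{name}: {tokens_per_second(tokens, seconds):.2f} tok/s")

for name, (before_s, after_s) in shared_prefix.items():
    print(f"{name}: TTFT reduced by {ttft_reduction_pct(before_s, after_s):.1f}%")
```

On these numbers, LM Studio's shared-prefix TTFT drops by roughly 92%, while oMLX's drops by roughly 46%.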

Verification Proof Path

Claim: Hype Audit. Deconstruct the marketing claims, checking for verification risks.

Setup: Local Assembly. Rebuild the workflow in a local, private container environment.

Benchmark: Runtime Testing. Measure execution speeds, resource usage, and token response latency.

Workflow: Efficiency Compression. Streamline the processes into reusable, repeatable scripts.

Verdict: Tool Rating. Final rating and practicality score determination.
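The Runtime Testing step measures execution speed and token response latency. A minimal sketch of how time to first token and total duration could be captured from any streamed response follows. It is not the harness used for this benchmark: `time_stream`, `StreamTiming`, and `fake_stream` are hypothetical names, and the function accepts any iterable of text chunks so it does not depend on a specific runtime's API.

```python
import time
from typing import Callable, Iterable, Iterator, NamedTuple


class StreamTiming(NamedTuple):
    ttft_s: float   # seconds from request start to first non-empty chunk
    total_s: float  # seconds from request start to end of stream
    chunks: int     # non-empty chunks seen (not necessarily equal to tokens)


def time_stream(chunks: Iterable[str],
                clock: Callable[[], float] = time.perf_counter) -> StreamTiming:
    """Consume a stream of text chunks and record first-chunk and total latency."""
    start = clock()
    first = None
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # ignore empty keep-alive chunks
        if first is None:
            first = clock()
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no non-empty chunks")
    return StreamTiming(first - start, end - start, count)


def fake_stream() -> Iterator[str]:
    """Stand-in for a real streaming response (assumption for the demo)."""
    time.sleep(0.05)  # simulated prompt processing before the first chunk
    for piece in ("Hello", ",", " world"):
        time.sleep(0.01)
        yield piece


timing = time_stream(fake_stream())
print(f"TTFT {timing.ttft_s:.3f}s, total {timing.total_s:.3f}s, chunks {timing.chunks}")
```

Because a chunk can carry more than one token, token counts for throughput are better taken from the runtime's own usage report than from the chunk count.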

Sources

Final LM Studio vs oMLX 35B Hermes Run (AI Efficiency Toolbox · Jun 7, 2026)
Consolidated LM Studio vs oMLX Findings (AI Efficiency Toolbox · Jun 7, 2026)
