AI Efficiency Toolbox

Rapid-MLX on the Same Qwen3.6 35B A3B Model: Fast First Token, Slower Sustained Run

A Rapid-MLX benchmark using the same Qwen3.6 35B A3B model path normally loaded in LM Studio, compared with prior LM Studio and oMLX results.

Setup

Rapid-MLX was cloned into /Users/jason/Developer/projects/tool-rapid-mlx and installed from source as an editable package inside a virtualenv. Before the run, LM Studio was unloaded with lms unload --all. Rapid-MLX then served the model at ~/.lmstudio/models/mlx-community/Qwen3.6-35B-A3B-4bit on localhost:8000 with the flags --served-model-name qwen3.6-35b-a3b, --max-concurrent-requests 1, --max-num-seqs 1, --no-thinking, and --no-mllm. The --no-mllm flag was needed because the LM Studio model directory includes processor files, while the text-only Rapid-MLX install does not include mlx-vlm.
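The write-up does not show the timing harness, so the following is only a minimal sketch of how time to first token and end-to-end throughput could be measured from any streaming response. The names measure_stream and StreamStats are hypothetical, and the sketch assumes each streamed chunk carries exactly one token, which may not hold for a real server.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StreamStats:
    tokens: int      # number of chunks seen (assumed: one token per chunk)
    ttft_s: float    # seconds from request start to the first chunk
    total_s: float   # seconds from request start to the last chunk

    @property
    def tok_per_s(self) -> float:
        # End-to-end rate: the wait for the first token is included.
        return self.tokens / self.total_s if self.total_s > 0 else 0.0


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> StreamStats:
    """Consume a stream of chunks and record arrival timings."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in chunks:
        last = clock()          # arrival time of this chunk
        if first is None:
            first = last        # first chunk defines TTFT
        count += 1
    ttft = (first - start) if first is not None else 0.0
    return StreamStats(tokens=count, ttft_s=ttft, total_s=last - start)


if __name__ == "__main__":
    # Stand-in generator; a real run would iterate over the server's stream.
    def fake_stream():
        for word in ["hello", "from", "a", "local", "model"]:
            time.sleep(0.01)
            yield word

    stats = measure_stream(fake_stream())
    print(stats.tokens, round(stats.ttft_s, 3), round(stats.tok_per_s, 1))
```

Passing the clock in as a parameter keeps the function testable with fixed timestamps, and the same function can wrap whatever streaming client is used against localhost:8000.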

Findings

Rapid-MLX produced 6144 completion tokens in 87.21s, or 70.45 tok/s, with a 0.18s time to first token (TTFT). The first token arrived quickly, but sustained throughput trailed both earlier runs: about 20% below the prior LM Studio result of 88.55 tok/s and about 19% below the prior oMLX result of 87.19 tok/s. In the shared-prefix benchmark, Request A TTFT was 3.21s and Request B TTFT was 3.23s, so the second request saw no speedup from the shared prefix. Rapid-MLX did not report cached token counts through the API in this run, and the server logs showed cache misses for both long-prefix requests.
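The figures above can be rechecked with a few lines of arithmetic. This is a small sketch using only the numbers reported in this section; the helper names throughput and relative_gap are illustrative and not part of any tool.

```python
def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second over the whole run."""
    return tokens / seconds


def relative_gap(value: float, baseline: float) -> float:
    """Signed fractional difference of value against baseline."""
    return (value - baseline) / baseline


# Numbers reported in the findings.
rapid_mlx = throughput(6144, 87.21)
lm_studio = 88.55
omlx = 87.19

gap_vs_lm_studio = relative_gap(rapid_mlx, lm_studio)
gap_vs_omlx = relative_gap(rapid_mlx, omlx)

# Shared-prefix check: a ratio near 1.0 means Request B gained nothing.
prefix_ttft_ratio = 3.23 / 3.21

if __name__ == "__main__":
    print(f"Rapid-MLX: {rapid_mlx:.2f} tok/s")
    print(f"vs LM Studio: {gap_vs_lm_studio:+.1%}")
    print(f"vs oMLX: {gap_vs_omlx:+.1%}")
    print(f"Request B / Request A TTFT: {prefix_ttft_ratio:.3f}")
```

A working prefix cache would be expected to push the Request B to Request A ratio well below 1.0; a value just above 1.0 is consistent with the cache misses seen in the server logs.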

Verification Proof Path

Claim

Hype Audit

Deconstruct the marketing claims, checking for verification risks.

Setup

Local Assembly

Rebuild the workflow in a local, private container environment.

Benchmark

Runtime Testing

Measure execution speeds, resource usage, and token response latency.

Workflow

Efficiency Compression

Streamline the processes into reusable, repeatable scripts.

Verdict

Tool Rating

Final rating and practicality score determination.

Sources

Rapid-MLX Qwen3.6 35B A3B Benchmark Run · AI Efficiency Toolbox · Jun 7, 2026
Final LM Studio vs oMLX 35B Hermes Run · AI Efficiency Toolbox · Jun 7, 2026
