Simon Willison 3/19/2026

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally


The article details an experiment by Dan Woods to run the Qwen3.5-397B-A17B MoE model on a 48GB MacBook Pro M3 Max. It leverages Apple's "LLM in a Flash" paper for efficient inference directly from flash storage, and uses AI-assisted coding with Claude for optimization, quantization, and performance tuning to reach several tokens per second.
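The core idea behind "LLM in a Flash", as applied to an MoE model, is that only a small fraction of the weights (the experts the router actually selects) need to be in RAM for any given token, so the rest can stay on flash and be paged in on demand. A minimal sketch of that pattern, using a NumPy memory map as a stand-in for flash storage (all shapes, file names, and routing details here are illustrative assumptions, not Qwen's or Apple's actual layout):

```python
import os
import tempfile

import numpy as np

# Illustrative sketch: keep MoE expert weights in a file standing in for
# flash storage, and read only the experts the router picks per token.
NUM_EXPERTS, D_IN, D_OUT = 8, 16, 16

# Write fake expert weights to disk (stand-in for the model file on flash).
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
rng = np.random.default_rng(0)
rng.standard_normal((NUM_EXPERTS, D_IN, D_OUT)).astype(np.float32).tofile(path)

# Memory-map the file: no expert is read into RAM until its slice is
# touched, so resident memory scales with the *active* experts only.
experts = np.memmap(path, dtype=np.float32,
                    shape=(NUM_EXPERTS, D_IN, D_OUT), mode="r")

def moe_forward(x, router_logits, top_k=2):
    """Route x through the top-k experts, loading only their weights."""
    chosen = np.argsort(router_logits)[-top_k:]
    gates = np.exp(router_logits[chosen])
    gates /= gates.sum()  # softmax over the selected experts
    out = np.zeros(D_OUT, dtype=np.float32)
    for gate, idx in zip(gates, chosen):
        w = np.asarray(experts[idx])  # reads just this expert from "flash"
        out += gate * (x @ w)
    return out

x = rng.standard_normal(D_IN).astype(np.float32)
logits = rng.standard_normal(NUM_EXPERTS)
y = moe_forward(x, logits)
```

With 397B total parameters but only ~17B active per token (the "A17B" in the model name), this kind of on-demand loading is what makes a 48GB machine plausible at all; the article's several tokens per second then comes down to how fast those expert reads and the quantized matmuls can go.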
