
Base model Mac Mini M4 is a great coding machine

You can tell your aunt and nephew that a 16GB M4 Apple computer is fine for school. It's even decent for professional coding.

In the past, I've been reluctant to make the recommendation above because I primarily use an M1 Max with 64GB of memory at work. I've long suspected that most people don't need 64GB of memory, but I felt hypocritical claiming this without more hands-on experience using a lower-spec computer.

I've now bought the base model M4 Mac Mini for $600 and come to the opinion that it's an incredibly versatile and powerful computer, even with only 16GB of memory and a 256GB disk.

Benchmark

To demonstrate how good the base model Mac Mini is, this post benchmarks the M4 against an M1 Max and an MSI Titan (high-end Windows laptop) on several real-world coding tasks.

| Name | CPU | Memory | Disk | Cost | Other |
|---|---|---|---|---|---|
| M4 | M4 (4 performance cores, 6 efficiency cores) | 16GB | 256GB | $600 in November 2024 | Base model with no additions. |
| M1 Max | M1 Max (8 performance cores, 2 efficiency cores) | 64GB | 1TB | ~$4,000 at time of purchase in 2022 | Has Jamf (device management) installed, which slows down the computer, so it's not an apples-to-apples comparison with respect to the hardware. |
| MSI Titan | Intel i9-12980HX (8 performance cores, 16 efficiency cores) | 64GB | 4TB | ~$6,000 at time of purchase in 2023 | Expensive because it has an Nvidia RTX 4090 GPU. Runs Ubuntu Linux via WSL. |

The benchmarks stress the computers at different coding-related tasks:

I'm not intimately familiar with the Ripgrep or Bun codebases, so my benchmarks may be flawed there. I included them to give a broader representation of different languages and ecosystems.

CPU

Let's get straight to the numbers. The chart below plots each computer's performance relative to the M4 against the duration of the benchmarks (in seconds, logarithmic scale).

A dot below the reference line means the competing computer performed faster than the M4 on that benchmark. A dot above the reference line means the competing computer was slower than the M4.

A few interpretations of the chart:

Where it was practical, I ran the commands via hyperfine to get stable numbers (auto-tuned or 5 runs with appropriate warmup). I ensured the benchmarks were not measuring network I/O or cached runs.
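As an illustration, a hyperfine invocation of this kind looks roughly like the sketch below. The benchmarked command is a placeholder; substitute the real one (e.g. `./scalafmt` or `cargo test`). The `--warmup` and `--runs` flags are hyperfine's standard options for discarding cold-cache runs and averaging over several timed runs.

```shell
# Sketch of how a benchmark can be timed with hyperfine:
# 2 warmup runs keep disk caches hot (so we don't measure cold-start
# I/O), then the mean and stddev over 5 timed runs are reported.
RUNS=5
if command -v hyperfine >/dev/null; then
  hyperfine --warmup 2 --runs "$RUNS" 'ls'   # placeholder command
fi
```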

Below are more detailed numbers of the benchmarks and descriptions of the exact commands I ran.

| Benchmark | Description | M4 | M1 Max | MSI Titan (Windows) | MSI Titan (Linux/WSL) |
|---|---|---|---|---|---|
| Scala Format | Format the com-lihaoyi/mill codebase with the standalone fat jar: time ./scalafmt | 1x, ~3.8s | 1.32x, ~5s | 1.84x, ~7s | 0.9x, ~3.4s |
| Scala Compile | Compile the com-lihaoyi/mill codebase with the command mill __.compile | 1x, ~60s | 0.73x, ~44s | n/a (couldn't get it compiling) | 0.63x, ~38s |
| TypeScript Compile | Build the sourcegraph/cody codebase with the command pnpm build after doing a clean install. This reflects something I do regularly in my work. | 1x, ~8.7s | 1.6x, ~14s | 1.38x, ~12s | 0.98x, ~8.56s |
| TypeScript Test | Run one integration test suite in the sourcegraph/cody repo with the command pnpm test src/index.test.ts under the agent/ subdirectory. I chose this benchmark because it reflects something I do very frequently at work. | 1x, 3.69s | 1.15x, 4.23s | 1.59x, 5.87s | 1.2x, 4.44s |
| LLM | Run the local Qwen2.5-Coder 7b model via Ollama with the command ollama run --verbose qwen2.5-coder:latest 'compare graphql with rest in 2 examples'. The reported number is the "eval rate". | 1x, 20 tokens/s | 0.46x, 43 tokens/s | 0.25x, 80 tokens/s | 0.24x, 84 tokens/s |
| Rust Release Build | Compile a release build of the BurntSushi/ripgrep repo with the command cargo build --release. Worth keeping in mind this is a flawed benchmark because link times vary between operating systems. | 1x, ~8.5s | 1.08x, ~9.16s | 1.18x, ~10s | 0.91x, ~7.8s |
| Rust Test | Run the ripgrep test suite with the command cargo test | 1x, ~985ms | 1.21x, ~1.19s | 1.48x, ~1.46s | 0.57x, ~563ms |
| Bun Build | Compile a debug build of the oven-sh/bun repo based on the contributing guide using the command bun run build. I picked this example because it's a reasonably sized C++/Zig codebase, and Bun is very cool. | 1x, 5.92min | 1.11x, 6.56min | n/a (Windows build has more complicated dependencies) | 0.97x, 5.72min |

Memory

While benchmarking the M4 and maxing out available CPUs and memory, it stood out to me how responsive the computer remained. For example, IntelliJ remained usable (autocomplete, refactoring, ...) while running heavy tests in the terminal.

In the video below, I'm compiling the Mill codebase (multi-core usage), sbt is file watching my personal website, I have 10 open tabs in Google Chrome, a local 7b LLM is streaming a reply, and I'm doing a refactoring inside IntelliJ in the Scalafmt codebase. The tool btop reports I have 1% memory left and it's still a usable computer!

Make sure to increase the video quality to 1440p.

Even under these extreme conditions, the Mill compilation is only 10 seconds slower than in the benchmark (70s instead of 60s), and the LLM responds 5 tokens/s slower than in the benchmark (15 tokens/s instead of 20 tokens/s).

In comparison, the MSI Titan quickly starts freezing up while running a CPU-heavy task. Not to mention, the fans are loud!

Disk

The 1TB disk on my work computer fills up quickly from using Docker, running local LLMs, and installing several JetBrains IDEs. However, when I was cleaning up my work computer the other day, I found out that a lot of disk usage also came from surprising applications. The worst culprit was the Biome VS Code extension, which had written 70GB of logs! After that came Slack with >30GB of caches.
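Culprits like these can also be hunted down from the terminal. A minimal sketch, assuming the typical macOS cache locations (adjust the paths for your setup):

```shell
# List the largest cache/support directories under the home folder,
# biggest first. 2>/dev/null hides permission errors for protected dirs.
du -sh ~/Library/Caches/* ~/Library/Application\ Support/* 2>/dev/null \
  | sort -rh | head -15
```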

Knowing this, I was curious to see how far I could take the 256GB disk on the Mac Mini. Once the disk is full, the computer pretty much becomes unusable so it's important to keep an eye on this number.

After running the benchmarks, I have now installed most of the software I need to work professionally on several large codebases across different language ecosystems, and I have 150GB of free disk left on the computer. This includes downloading Qwen2.5-Coder 7b via Ollama, installing IntelliJ CE and Xcode(!), and a fair number of build caches and dependencies from npm, Maven Central, crates.io, and Homebrew.

Running GrandPerspective on the disk reveals that most of the usage so far comes from cloning large repos and building large codebases (dependency cache and compile cache).

Based on my experience, I will probably run out of space after a few weeks of active coding. However, since the usage is mostly from caches, I could clean it up regularly to keep the computer responsive.
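Since the usage is mostly caches, the cleanup is largely one-liners. A sketch using each tool's standard cache-pruning subcommand, with guards so missing tools are simply skipped:

```shell
# Prune dependency caches; each step runs only when the tool exists.
prune_caches() {
  if command -v npm  >/dev/null; then npm cache clean --force; fi
  if command -v pnpm >/dev/null; then pnpm store prune; fi
  if command -v brew >/dev/null; then brew cleanup --prune=all; fi
  # Per-repo Rust build output: run `cargo clean` inside the checkout.
}
prune_caches
```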

Another option is to permanently attach a separate SSD via Thunderbolt. I'm not sure what the implications are with respect to performance or setup complications. For example, I'd guess one challenge is configuring apps to write to that disk.
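For that last challenge, many developer tools can be pointed at another disk through environment variables. A sketch, assuming a hypothetical mount point of /Volumes/Ext (the variable names are the ones these tools document):

```shell
# Hypothetical mount point for a Thunderbolt SSD; adjust to your volume.
EXT=/Volumes/Ext
export npm_config_cache="$EXT/npm-cache"   # npm download cache
export CARGO_HOME="$EXT/cargo"             # Rust registry + toolchain data
export OLLAMA_MODELS="$EXT/ollama"         # local LLM weights
export GRADLE_USER_HOME="$EXT/gradle"      # JVM dependency cache
```

Put these in your shell profile so every new terminal session picks them up.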

Conclusion

The base model Mac Mini M4 is an incredibly capable computer at $600. I'm particularly impressed with how responsive this computer remains at 1% free memory; Apple is doing some magic there.

Between CPU, memory, and disk, I am most concerned about the 256GB disk on this computer. If you can afford the extra $200, I would consider the 512GB upgrade. However, $200 is a third of the price of this entire computer so it feels painful to pay that much for such a small upgrade.

If you are a professional developer, I would consider getting the M4 Pro with more performance cores, at least 24GB of memory, and ideally a 1TB disk. At that point you're at $1,600, which is an entirely different price point from the $600 base model. For myself, I would likely go for the 48GB memory option, taking the total to $2,000, still not a bad deal considering what a powerful computer you're getting.
