Not every prompt needs to be sent to a frontier model. For small things like generating a commit message, naming a branch, or remembering how to iterate over a bash array, a small local model is plenty, and it’s fast.

I’ve been using gemma-4-E2B for this. The trick on Apple Silicon is to load the community MLX build into LM Studio rather than the official Google GGUF, since MLX models run noticeably faster on that hardware.

Install LM Studio:

brew install --cask lm-studio

Open it and install the model:

https://huggingface.co/lmstudio-community/gemma-4-E2B-it-MLX-4bit

There’s an 8-bit build too. For my workflows the 4-bit is faster and the accuracy tradeoff is fine. If you want a little more accuracy, grab the 8-bit.

Example 1: asking questions at the shell

Once LM Studio is running with the model loaded, it exposes a local HTTP API. Here’s the script I drop into ~/bin/gemma:

#!/usr/bin/env bash
set -euo pipefail
curl -sS http://localhost:1234/api/v1/chat \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg input "$*" '{
    model: "gemma-4-e2b-it-mlx@4bit",
    system_prompt: "You are a senior software engineer. Be very concise.",
    input: $input
  }')" \
  | jq -r '.output[] | select(.type=="message") | .content'
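One-time setup so the `gemma` command resolves at the shell (assuming you keep scripts in ~/bin):

```shell
# Make the script executable and put ~/bin on PATH.
mkdir -p ~/bin
touch ~/bin/gemma                  # no-op if you've already saved the script here
chmod +x ~/bin/gemma
export PATH="$HOME/bin:$PATH"      # add to ~/.zshrc or ~/.bashrc to persist
```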

Then at the shell the following command takes 1-2s:

$ gemma how do i iterate over a bash array
Use a `for` loop with `"${array[@]}"`.

**Example:**

```bash
my_array=("apple" "banana" "cherry")
for item in "${my_array[@]}"; do
  echo "$item"
done
```

Faster than a man page lookup, faster than a web search. And it’s convenient to get the answer right there in the terminal.

Example 2: generating git commit messages

I also use Gemma in my git commit script. It takes the staged diff, sends it to Gemma, and uses the response as the commit message.

The round trip is under three seconds, which is fast enough that I don’t think twice about using it. The message isn’t always exactly what I’d write, but it’s close enough most of the time, and easy to amend when it isn’t.
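Stripped to its core, the script looks roughly like this (a sketch, assuming the `gemma` wrapper from Example 1 is on PATH; the prompt wording here is illustrative):

```shell
# Roughly: take the staged diff, ask the model for a message, commit with it.
gcy() {
  local diff msg
  diff=$(git diff --cached)        # only the staged changes
  if [ -z "$diff" ]; then
    echo "gcy: nothing staged" >&2
    return 1
  fi
  msg=$(gemma "Write a one-line git commit message for this diff: $diff")
  git commit -m "$msg"
}
```

And amending when the message misses is just `git commit --amend`.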

$ gcy
● gemma (2.689426s)
[rust-publish-v2-workflow d4ba7f8] publish v2 workflow implementation
 3 files changed, 32 insertions(+), 1 deletion(-)
 create mode 100644 .github/workflows/rust-publish-v2.yml
 create mode 120000 rust-publish-v2/workflow.yml

Example 3: web and code search via MCP

For prompts that do need fresh information, a third variant of the script wires in exa and grep.app as MCP servers for web search and GitHub code search:

#!/usr/bin/env bash
set -euo pipefail
curl -sS http://localhost:1234/api/v1/chat \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg input "$*" '{
    model: "gemma-4-e2b-it-mlx@4bit",
    system_prompt: "You are a senior software engineer. Be very concise. Use exa mcp to websearch. Use grep to find code on github.",
    input: $input,
    integrations: [
      { type: "ephemeral_mcp", server_label: "exa", server_url: "https://mcp.exa.ai/mcp" },
      { type: "ephemeral_mcp", server_label: "grep", server_url: "https://mcp.grep.app" }
    ]
  }')" \
  | jq -r '.output[] | select(.type=="message") | .content'

Then at the shell the following command takes ~9s:

$ gemma-web grep mcp: find repos using soroban-sdk
I found several repositories related to `soroban-sdk`. The results include:

1.  **OpenZeppelin/stellar-contracts**: Shows usage of `soroban-sdk` for fixed-point mathematics.
2.  **stellar/stellar-cli**: Shows dependency checks for `soroban-sdk` as a workspace dependency.
3.  **OpenZeppelin/contracts-wizard**: Shows `soroban-sdk` being listed as a dependency in generated contract files.
4.  **stellar/rs-soroban-sdk**: Contains source code for the `soroban-sdk` crate, including details on enabling the `alloc` feature.
5.  **stellar/rs-soroban-sdk (Makefile)**: Contains build instructions for the `soroban-sdk`.
6.  **stellar/rs-soroban-sdk (Cargo.lock)**: Shows dependencies for `soroban-sdk` and related crates like `soroban-sdk-macros`.
7.  **stellar/stellar_core**: Shows `soroban-sdk` as a dependency in its Cargo.lock.

These results suggest that `soroban-sdk` is used within the Stellar ecosystem, likely for building smart contracts or related tools.

Gemma’s ability to use MCPs seems somewhat primitive though. I’ve had to be very explicit to get it to call them, and one thing I noticed: attaching the MCP servers, even on prompts that don’t need them, seems to weaken Gemma’s ability to follow the “be concise” instruction in the system prompt. Answers get chattier, with more preamble and more explanation. Drop the integrations array and the same prompts come back terser. My guess is that the tool schemas consume enough of the small model’s attention that the style instruction gets diluted.
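A crude way to measure the effect (function name mine; assumes both wrapper scripts are on PATH): run the same prompt through each variant and compare word counts.

```shell
# Run one prompt through both wrappers and compare output length,
# to eyeball how much chattier the MCP-enabled variant is.
compare_verbosity() {
  local q="$1" plain with_mcp
  plain=$(gemma "$q" | wc -w | tr -d ' \t')
  with_mcp=$(gemma-web "$q" | wc -w | tr -d ' \t')
  echo "plain: $plain words, with MCP: $with_mcp words"
}
```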

So what am I using it for?

Things I am using Gemma 4 for:

  • Generic knowledge: “how do I do X in bash”.
  • Basic generation: commit messages, branch names.
  • Things where “roughly right, right now” beats “exactly right, ten seconds from now”, and where no specialist knowledge is needed.

Things I wouldn’t use it for:

  • Anything needing specialised or recent knowledge, unless you hook a tool in (see the third script above).
  • Anything where being wrong has a cost. Save the big models for that.