Eigen Labs built EigenLayer, the Ethereum restaking protocol that lets validators commit their staked ETH as collateral for other services. The design philosophy there was that cryptographic trust minimization — not legal contracts — should be the primary security mechanism. On April 15 they applied the same thinking to AI inference and shipped Darkbloom as a research preview.

The pitch: over 100 million Apple Silicon Macs sit idle for 18 or more hours a day. Collectively that’s substantial compute. Darkbloom connects that idle capacity to inference demand via an OpenAI-compatible API, priced at roughly 70% less than major cloud providers, with node operators keeping 95% of revenue. That’s the economic story. The technical story is more interesting.

The privacy problem with centralized inference

When you send a prompt to a cloud inference provider, you’re trusting the provider not to log, inspect, or train on it. With smaller API aggregators routing traffic through shared infrastructure, you’re sometimes trusting several layers of vendors. The legal paper protecting you is a ToS and privacy policy; the cryptographic protection is none, because the provider needs to decrypt your request to run it.

Darkbloom’s claim is that it can route requests through third-party hardware — someone’s MacBook Pro sitting in an apartment — while making it technically impossible for that Mac owner to read your prompt or response.

How the security model works

The approach layers four mechanisms:

Hardware-bound keys via Secure Enclave. Each Mac generates encryption keys inside Apple’s Secure Enclave, a tamper-resistant coprocessor that refuses to export private keys. The attestation chain is traceable back to Apple’s root CA and published publicly. What this means in practice is that the key material never exists outside the secure hardware — not in RAM, not on disk.

End-to-end encryption. Requests are encrypted on the client before transmission. The Darkbloom coordinator routes ciphertext without decrypting it. Only the target node’s hardware-bound key can decrypt the request, and only after the attestation check passes.

Hardened runtime. The inference process runs with OS-level restrictions: debugger attachment is blocked, external memory inspection is disabled, and binary integrity is verified. This prevents a technically savvy operator from patching the provider software to dump plaintext.

Output signing. Every response is signed by the specific machine that produced it, with the full attestation chain published. This gives users a verifiable record of which hardware handled their request.

The architecture is similar in spirit to Intel SGX trusted execution environments, but leverages Apple’s existing consumer hardware rather than requiring specialized server chips. Apple Silicon’s Secure Enclave is more mature and better audited than early SGX implementations were, which matters.

What’s running on it

The current model roster includes Gemma 4 26B, Qwen3.5 variants (27B and a 122B MoE), and MiniMax M2.5 at 239B parameters. For image generation there’s FLUX.2, and speech-to-text uses Cohere Transcribe. A single M4 Max Mac can run 235-billion-parameter models at meaningful throughput; electricity costs run $0.01–$0.03/hour depending on workload.

What to be skeptical about

The GitHub repository is at Layr-Labs/d-inference with v0.3.8 tagged April 15. The project description explicitly says: “actively under development and has not been audited.” That’s honest. Security claims about hardware attestation need external audits to be meaningful — the design can be sound while the implementation has bugs that break the guarantees.

The privacy argument also depends on Apple’s Secure Enclave working as documented, which is a reasonable assumption but not a verified one for this specific use case. A motivated operator with physical access to a machine has side channels that software protections can’t close.

The economic model also raises questions about adversarial operators. Nodes earn revenue, which creates incentives for operators to run many nodes and optimize them in ways that might not match user expectations. The coordinator selection logic isn’t public yet.

None of this means the project is broken. It means it’s a research preview that’s applying a genuinely interesting design to a real problem, and should be treated as such. The CUDA-Q-style question of whether distributed consumer hardware can actually supplement data center inference at meaningful scale remains open. But the approach is worth tracking: if verified confidential computing on consumer hardware works at inference speeds, the privacy implications for AI are real.