Compatibility matrix

PHP versions

| Version | Status | Notes |
|---|---|---|
| 8.3 | ✅ supported | Security-only upstream through end of 2026. |
| 8.4 | ✅ supported | Active support. |
| 8.5 | ✅ supported | Current release. |
| 8.2 and earlier | ❌ not supported | composer.json declares php: ^8.3. |

Every released binary is built against a specific PHP minor. A binary built for PHP 8.4 will not load into PHP 8.5 or 8.3. PIE handles this automatically (it picks the right tarball); manual installs need to match versions explicitly.
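For manual installs, the minor-version match can be checked before a binary is copied into place. The following is a minimal sketch, not part of the project's tooling: `php_minor`, `minor_matches`, and the `binary_minor` value are hypothetical names introduced for illustration, and it assumes you already know which PHP minor the downloaded binary targets.

```shell
#!/bin/sh
# Extract "MAJOR.MINOR" from a full PHP version string such as "8.4.12".
php_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Succeed only when the runtime's minor ($1, full version) equals the
# minor the binary was built for ($2, e.g. "8.4").
minor_matches() {
    [ "$(php_minor "$1")" = "$2" ]
}

# Hypothetical value: the PHP minor the downloaded binary was built for.
binary_minor="8.4"

# Only query the runtime when a php executable is actually on PATH.
if command -v php >/dev/null 2>&1; then
    runtime_version=$(php -r 'echo PHP_VERSION;')
    if minor_matches "$runtime_version" "$binary_minor"; then
        echo "OK: PHP $runtime_version matches a $binary_minor build"
    else
        echo "Mismatch: PHP $runtime_version cannot load a $binary_minor build"
    fi
fi
```

Comparing only the first two version components mirrors the rule above: patch releases within one minor share a binary, while different minors do not.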

Operating systems

| Platform | Status | Notes |
|---|---|---|
| macOS arm64 | ✅ supported | Apple Silicon. Tested on macOS 14+. |
| macOS x86_64 | ⚠️ not in release matrix | Builds from source. We don’t ship binaries. |
| Linux x86_64 (glibc) | ✅ supported | Ubuntu 22.04+, Debian 12+, RHEL 9+. Most modern distros. |
| Linux arm64 (glibc) | ✅ supported | Ubuntu 24.04 arm64, Debian 12 arm64, AWS Graviton. |
| Linux musl (Alpine) | ⚠️ builds from source | .cargo/config.toml has the right crt-static opt-out; no released binary. |
| FreeBSD / OpenBSD | ⚠️ builds from source | Untested but should work; the build script handles non-Linux non-macOS as Linux. |
| Windows | ❌ excluded | os-families-exclude: ["windows"] in composer.json. Out of scope for v0.1. |
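To see which of the platform rows above applies to a given machine, the lookup can be sketched in a few lines of shell. This is an illustrative helper, not something the project ships: `platform_status` and its status labels are hypothetical, and the musl check (looking for "musl" in the `ldd --version` output) is a heuristic assumption.

```shell
#!/bin/sh
# Map (kernel, machine, libc) to the status listed for that platform.
# $1 mirrors `uname -s`, $2 mirrors `uname -m`, $3 is "glibc", "musl", or "n/a".
platform_status() {
    case "$1" in
        Darwin)
            case "$2" in
                arm64)  echo "supported" ;;
                x86_64) echo "build-from-source" ;;
                *)      echo "not-listed" ;;
            esac ;;
        Linux)
            case "$3:$2" in
                glibc:x86_64|glibc:aarch64|glibc:arm64) echo "supported" ;;
                musl:*) echo "build-from-source" ;;
                *)      echo "not-listed" ;;
            esac ;;
        FreeBSD|OpenBSD) echo "build-from-source" ;;
        # Windows is excluded in composer.json; anything else is not in the matrix.
        *) echo "not-listed" ;;
    esac
}

# Detect the current machine. libc only matters on Linux.
detected_libc="n/a"
if [ "$(uname -s)" = "Linux" ]; then
    if ldd --version 2>&1 | grep -qi musl; then
        detected_libc="musl"
    else
        detected_libc="glibc"
    fi
fi

echo "$(uname -s) $(uname -m) ($detected_libc): $(platform_status "$(uname -s)" "$(uname -m)" "$detected_libc")"
```

Note that Linux reports arm64 machines as `aarch64` in `uname -m`, which is why both spellings are accepted.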

Threading

ext-infer is thread-safe by design — the LlamaBackend singleton is guarded by a Sync mutex, the underlying LlamaModel’s weights are read-only after load (llama.cpp explicitly supports many contexts on one model), and each chat() / raw() / embed() call builds its own per-call LlamaContext. Two threads calling Model::chat() concurrently on the same handle is the supported, intended shape.

| PHP build | Status | Notes |
|---|---|---|
| NTS | ✅ supported, the default | What every release binary targets today. |
| ZTS | ✅ supported (support-zts: true in composer.json) | Not yet exercised in CI. See Threading & ZTS. |

Acceleration backends

| Backend | Status | Notes |
|---|---|---|
| CPU (default) | ✅ supported, the default | Portable, no hardware requirements. |
| Apple Metal | ⚠️ opt-in via cargo feature | make release FEATURES=metal. See Apple Metal. |
| CUDA (NVIDIA GPU) | ❌ not yet | llama-cpp-2 supports it via a cargo feature; we haven’t exposed or tested it. |
| ROCm / Vulkan | ❌ not yet | Same: supported upstream, not surfaced. |

If you want CUDA or other GPU acceleration sooner rather than later, open an issue describing your use case. Surfacing the feature is small work; testing it across the GPU landscape is the hard part.

Tested model families

What the maintainers have actually exercised end-to-end. Other GGUF-supported families almost certainly work; this is the “we’ve seen it produce sensible output” list.

| Family | Used for |
|---|---|
| Qwen3 (Instruct) | Chat completions, reasoning splitting. |
| Qwen3-Embedding | Embeddings, cosine similarity. |
| Llama 3 / 3.1 / 3.2 | Chat completions. No reasoning. |
| Mistral | Chat completions. |
| BGE / E5 / GTE | Embeddings. |

Versioning policy

  • Pre-1.0 (0.x.y), breaking changes happen between minors (0.1.x → 0.2.x), not patches.
  • Once v1.0.0 ships, the class / method / argument surface is frozen. New features land additively; behavioral changes that affect existing callers wait for the next major.
  • See RELEASE.md for the cut-a-release flow.
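The policy above can be read as a small decision rule on version numbers. The sketch below is one interpretation of it, not project tooling: `upgrade_risk` and its two labels are hypothetical, versions are assumed to be plain MAJOR.MINOR.PATCH strings, and treating any major bump (including 0.x to 1.0) as potentially breaking is an assumption.

```shell
#!/bin/sh
# Print "may-break" or "non-breaking" for an upgrade from $1 to $2.
upgrade_risk() {
    from_major=${1%%.*}                       # text before the first dot
    to_major=${2%%.*}
    from_rest=${1#*.}; from_minor=${from_rest%%.*}   # second component
    to_rest=${2#*.};   to_minor=${to_rest%%.*}

    if [ "$from_major" != "$to_major" ]; then
        echo "may-break"        # major bump
    elif [ "$from_major" = "0" ] && [ "$from_minor" != "$to_minor" ]; then
        echo "may-break"        # pre-1.0 minor bump
    else
        echo "non-breaking"     # patch bump, or post-1.0 minor bump
    fi
}

upgrade_risk 0.1.3 0.1.4   # prints: non-breaking
upgrade_risk 0.1.4 0.2.0   # prints: may-break
upgrade_risk 1.2.0 1.3.0   # prints: non-breaking
upgrade_risk 1.3.0 2.0.0   # prints: may-break
```

In practice this suggests pinning to a single minor while the extension is pre-1.0 and relaxing the constraint to the major once v1.0.0 ships.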

Reporting compatibility issues

If you hit a “should work but doesn’t” combination on this matrix, the issue template asks for:

  • PHP version (php --version)
  • OS / arch (uname -a)
  • libc (Linux: ldd --version | head -1)
  • ZTS or NTS (php -i | grep 'Thread Safety')
  • Whether the extension was installed via PIE, make install, or loaded with -d extension=…

Three of those five are usually enough to triage.
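The commands listed above can be gathered in one pass and pasted into an issue. This is a convenience sketch only: the `report` helper is hypothetical, it keeps just the first line of each command's output, and the install method still has to be filled in by hand.

```shell
#!/bin/sh
# Print "label: value" using the first output line of a command.
# $1 = label, $2 = tool that must exist on PATH, $3 = command line to run.
# Falls back to "unavailable" when the tool is missing or prints nothing.
report() {
    if command -v "$2" >/dev/null 2>&1; then
        value=$(sh -c "$3" 2>&1 | head -n 1)
    else
        value=""
    fi
    echo "$1: ${value:-unavailable}"
}

report "PHP version"   php   "php --version"
report "OS / arch"     uname "uname -a"
# Only meaningful on Linux; on other platforms this prints "unavailable".
report "libc"          ldd   "ldd --version"
report "Thread safety" php   "php -i | grep 'Thread Safety'"
# Placeholder to fill in manually: PIE, make install, or -d extension=...
echo "Install method: (fill in)"
```

Stderr is merged into the captured output because some `ldd` implementations print their version banner there.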