Introduction

ext-infer is a PHP 8.3+ extension that loads a GGUF model and runs LLM inference inside the PHP process via llama.cpp. PHP-native semantic search, RAG pipelines, and CLI / worker inference run without shelling out to Python or hitting a remote API.

It is written in Rust on top of ext-php-rs and the llama-cpp-2 bindings. The public PHP surface is designed to feel native: a fluent, role-aware Prompt builder; a Response that splits reasoning from answer; an Embedding that knows how to normalize itself and compute cosine similarity. You should rarely, if ever, need to think about <|im_start|> tokens.
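For readers new to embeddings, the two operations mentioned above can be sketched in plain PHP. This is only an illustration of the math on ordinary float arrays, not the extension's Embedding API: the function names l2_normalize and cosine_similarity are hypothetical and exist only for this example.

```php
<?php
declare(strict_types=1);

/**
 * Scale a vector to unit length (L2 norm of 1).
 * Hypothetical helper for illustration; not part of ext-infer.
 *
 * @param list<float> $v
 * @return list<float>
 */
function l2_normalize(array $v): array
{
    $norm = sqrt(array_sum(array_map(fn (float $x): float => $x * $x, $v)));
    if ($norm == 0.0) {
        throw new InvalidArgumentException('Cannot normalize a zero vector.');
    }
    return array_map(fn (float $x): float => $x / $norm, $v);
}

/**
 * Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
 * Hypothetical helper for illustration; not part of ext-infer.
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosine_similarity(array $a, array $b): float
{
    if ($a === [] || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $dot   += $x * $b[$i];
        $normA += $x * $x;
        $normB += $b[$i] * $b[$i];
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

$unit = l2_normalize([3.0, 4.0]);
printf("%.1f %.1f\n", $unit[0], $unit[1]);                      // prints "0.6 0.8"
printf("%.2f\n", cosine_similarity([3.0, 4.0], [4.0, 3.0]));    // prints "0.96"
```

Once both vectors are normalized, cosine similarity reduces to a plain dot product, which is why normalizing once up front is a common choice for semantic search.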

Why an extension?

Three reasons local inference belongs in PHP rather than next to it:

  • Latency. A subprocess fork or HTTP round trip costs milliseconds at a minimum, often tens of milliseconds. An in-process call adds no such overhead; it is bounded only by decode time.
  • Operational surface. No Python sidecar to package, no daemon to supervise, no inference server to scale alongside FPM. The PHP process is the inference server.
  • API ergonomics. Calling a local LLM should be as natural in PHP as calling intl or pdo. The extension API is shaped to match that — see Prompts and Chat completions.

What’s here

This guide is split into five layers, navigable from the sidebar:

Section          What you'll find
Getting Started  Install, run hello-world, verify it loaded.
Guide            Conceptual walkthroughs of each public class. Read in order on first pass.
Recipes          Copy-paste-ready patterns: multi-turn chat, semantic search, RAG, worker pools.
Reference        Complete API listing, exceptions, environment variables, compatibility matrix.
Advanced         Threading model, Apple Metal, performance tuning.

Status

ext-infer is pre-release: the class surface is stable, but the first tagged release (v0.1.0) is still in flight. See RELEASE.md for the release process and PLAN.md for what's coming next.

Conventions in this guide

  • Code blocks are runnable as written, with one exception: PHP code assumes the extension is loaded. Either install it system-wide or prepend -d extension=… to your php command. See Installation.
  • Model without a namespace prefix means Displace\Infer\Model; the same goes for Prompt, Response, and Embedding. Real code needs the corresponding use statements at the top of the file.
  • CLI snippets are written for a POSIX shell (bash / zsh). Adjust for fish / PowerShell as needed; differences are usually only quoting.
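
As a minimal sketch of the first two conventions, the script below imports the four class names listed above and reports whether PHP can see them. It uses only PHP's built-in class_exists and calls no extension methods; the helper name missing_infer_classes is hypothetical and exists only for this example.

```php
<?php
declare(strict_types=1);

// The use statements that the short names in this guide assume.
use Displace\Infer\Embedding;
use Displace\Infer\Model;
use Displace\Infer\Prompt;
use Displace\Infer\Response;

/**
 * Return the fully qualified names of any expected classes that are
 * not defined in the current PHP process.
 * Hypothetical helper for illustration; not part of ext-infer.
 *
 * @return list<string>
 */
function missing_infer_classes(): array
{
    // ::class resolves to a string at compile time, so this line is safe
    // even when the extension is not loaded.
    $expected = [Model::class, Prompt::class, Response::class, Embedding::class];

    return array_values(array_filter(
        $expected,
        fn (string $class): bool => !class_exists($class)
    ));
}

$missing = missing_infer_classes();

if ($missing === []) {
    echo "All ext-infer classes are available.\n";
} else {
    echo "Extension not loaded? Missing: ", implode(', ', $missing), "\n";
}
```

If the script reports missing classes, the extension is most likely not loaded in the PHP binary you ran it with; see Installation.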