Options reference

Every option that any ext-infer method accepts, in one table per method. For conceptual context on individual options, follow the links in the rightmost column.

Model::load($path, $options)

The second argument is an associative array. Keys are kept as snake-case strings (like PHP ini settings) because load-time tuning is rare and the array form composes well with config arrays loaded from disk.

| Key | Type | Default | See |
|---|---|---|---|
| n_gpu_layers | int | 0 | Performance tuning |
| use_mmap | bool | true | Performance tuning |
| use_mlock | bool | false | Performance tuning |
| embedding | bool | false | Embeddings |
| pooling | string | 'unspecified' | Embeddings |

Validation rules

  • Unknown keys are not rejected; they are silently ignored. This is deliberate (forward compatibility for callers loading config from files), but it means a typo in a key name produces no error. If you suspect a typo, var_dump the options array and compare each key against the table above before reporting a bug.
  • Type mismatches are rejected, with a clear message: invalid option n_gpu_layers: expected integer.
  • Negative integers and out-of-range values for n_gpu_layers are rejected: invalid option n_gpu_layers: must be non-negative.
  • pooling accepts only the six strings listed in Embeddings → Pooling.
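
Because unknown keys are silently ignored, a caller can check for typos before calling Model::load. The following is a minimal sketch in plain PHP that needs no extension. It assumes the five keys in the table above are the complete set, and findUnknownLoadOptions is a hypothetical helper, not part of ext-infer.

```php
<?php
declare(strict_types=1);

// Keys documented in the Model::load table. Assumption: this list is complete
// for the installed version of the extension.
const KNOWN_LOAD_OPTIONS = ['n_gpu_layers', 'use_mmap', 'use_mlock', 'embedding', 'pooling'];

/**
 * Hypothetical helper (not part of ext-infer). Maps each unrecognized key to
 * the closest documented key, or to null when nothing is within 3 edits.
 *
 * @param array<string, mixed> $options
 * @return array<string, ?string>
 */
function findUnknownLoadOptions(array $options): array
{
    $unknown = [];
    foreach (array_keys($options) as $key) {
        $key = (string) $key;
        if (in_array($key, KNOWN_LOAD_OPTIONS, true)) {
            continue;
        }
        $suggestion = null;
        $best = 4; // only suggest keys at most 3 edits away
        foreach (KNOWN_LOAD_OPTIONS as $known) {
            $distance = levenshtein($key, $known);
            if ($distance < $best) {
                $best = $distance;
                $suggestion = $known;
            }
        }
        $unknown[$key] = $suggestion;
    }
    return $unknown;
}

// 'n_gpu_layer' is missing its trailing "s" and would otherwise be ignored.
$options = ['n_gpu_layer' => 32, 'use_mmap' => true];
foreach (findUnknownLoadOptions($options) as $key => $suggestion) {
    echo "unknown option {$key}", $suggestion !== null ? " (did you mean {$suggestion}?)" : '', PHP_EOL;
}
```

Running a check like this on config loaded from disk keeps the forward-compatible behavior of load() while still surfacing misspelled keys during development.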

Model::chat($prompt, ...)

Named arguments, not an array. PHP 8.0+ named-argument syntax uses the parameter name verbatim, so you write maxTokens: 256 (camelCase, per PSR-12).

| Argument | Type | Default | See |
|---|---|---|---|
| $prompt | \Displace\Infer\Prompt | required | Prompts |
| maxTokens | int | 128 | Chat completions |
| nCtx | int | 2048 | Chat completions |
| temperature | float | 0.0 | Chat completions |
| seed | int | 1234 | Chat completions |

Behavior

  • temperature = 0.0 is greedy (deterministic) decoding. Any value above 0.0 samples, with the randomness controlled by seed.
  • seed is only consulted when temperature > 0.
  • maxTokens caps generation. Hitting it sets Response::finishReason() to 'length'.
  • nCtx is the context window for this call. If the rendered prompt exceeds it, InferenceException is raised before generation starts.
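
To show how the named-argument form behaves at the call site, here is a small sketch that runs without the extension. chatArgumentsSketch is a hypothetical stand-in that copies the parameter names and defaults from the table above. It takes a plain string where the real method takes a Prompt, and it only reports the resolved arguments rather than running a model.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in mirroring the documented names and defaults of chat().
// It does not load or run a model; it returns the arguments as PHP resolved them.
function chatArgumentsSketch(
    string $prompt,
    int $maxTokens = 128,
    int $nCtx = 2048,
    float $temperature = 0.0,
    int $seed = 1234
): array {
    return compact('prompt', 'maxTokens', 'nCtx', 'temperature', 'seed');
}

// Named arguments may appear in any order; omitted ones keep their defaults.
$resolved = chatArgumentsSketch('Hello', temperature: 0.7, maxTokens: 256);
print_r($resolved);

// A misspelled argument name raises an Error at call time, unlike a
// misspelled key in the load() options array, which is silently ignored.
try {
    chatArgumentsSketch('Hello', max_tokens: 256);
} catch (Error $e) {
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```

The second call illustrates one practical consequence of the named-argument design: PHP itself catches misspelled names, so no extra validation is needed on this path.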

Model::raw($prompt, ...)

Same named-argument shape as chat(), except that $prompt is a plain string and addBos is added.

| Argument | Type | Default | See |
|---|---|---|---|
| $prompt | string | required | Raw completions |
| maxTokens | int | 128 | Chat completions |
| nCtx | int | 2048 | Chat completions |
| temperature | float | 0.0 | Chat completions |
| seed | int | 1234 | Chat completions |
| addBos | bool | true | Raw completions → addBos |

Model::embed($text)

Just the text. Pooling and embedding-mode are configured at load time (see Model::load above).

| Argument | Type | Default | See |
|---|---|---|---|
| $text | string | required | Embeddings |

Embedding math

Embedding is read-only; the math methods return new instances rather than mutating.

| Method | Returns |
|---|---|
| vector() | list<float> |
| dimensions() | int |
| norm() | float |
| normalize() | new Embedding |
| cosineSimilarity(Embedding $other) | float (in [-1, 1]) |

cosineSimilarity throws InferenceException on a dimension mismatch — see Embeddings → vector math.
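
For readers who want the arithmetic behind norm() and cosineSimilarity(), the sketch below computes both on plain PHP arrays. It illustrates the standard formulas only and is not the extension's implementation. The function names are hypothetical, the built-in InvalidArgumentException stands in for InferenceException, and rejecting zero vectors is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Euclidean (L2) norm of a vector.
 *
 * @param list<float> $v
 */
function l2Norm(array $v): float
{
    $sum = 0.0;
    foreach ($v as $x) {
        $sum += $x * $x;
    }
    return sqrt($sum);
}

/**
 * Cosine similarity: dot(a, b) / (|a| * |b|), which lies in [-1, 1].
 *
 * @param list<float> $a
 * @param list<float> $b
 */
function cosineSimilaritySketch(array $a, array $b): float
{
    if (count($a) !== count($b)) {
        // The extension raises InferenceException here; this sketch uses a built-in type.
        throw new InvalidArgumentException('dimension mismatch');
    }
    $dot = 0.0;
    foreach ($a as $i => $x) {
        $dot += $x * $b[$i];
    }
    $norms = l2Norm($a) * l2Norm($b);
    if ($norms === 0.0) {
        // Assumption of this sketch: similarity is undefined for a zero vector.
        throw new InvalidArgumentException('zero vector');
    }
    return $dot / $norms;
}

echo cosineSimilaritySketch([3.0, 4.0], [4.0, 3.0]), PHP_EOL;  // same length, different direction
echo cosineSimilaritySketch([1.0, 0.0], [0.0, 1.0]), PHP_EOL;  // orthogonal vectors
```

When both inputs are already unit length, the denominator is 1 and the similarity reduces to a plain dot product.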

Prompt

Static factories + immutable with* builders.

| Method | Returns |
|---|---|
| Prompt::system($content) | new Prompt |
| Prompt::user($content) | new Prompt |
| withSystem($content) | new Prompt |
| withUser($content) | new Prompt |
| withAssistant($content) | new Prompt |
| messages() | list<Message> |
| lastRole() | ?string |
| count() | int |
| isEmpty() | bool |

See Prompts for the immutability semantics.
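
As a rough illustration of the static-factory and with* pattern, the sketch below builds a small immutable class in plain PHP. PromptSketch is a hypothetical stand-in, not the extension's Prompt. It stores messages as arrays rather than Message objects, and the role strings 'system', 'user', and 'assistant' are assumptions.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for illustration only; not the extension's Prompt class.
final class PromptSketch
{
    /** @param list<array{role: string, content: string}> $messages */
    private function __construct(private array $messages)
    {
    }

    public static function system(string $content): self
    {
        return new self([['role' => 'system', 'content' => $content]]);
    }

    public function withUser(string $content): self
    {
        return $this->with('user', $content);
    }

    public function withAssistant(string $content): self
    {
        return $this->with('assistant', $content);
    }

    public function lastRole(): ?string
    {
        if ($this->messages === []) {
            return null;
        }
        return $this->messages[count($this->messages) - 1]['role'];
    }

    public function count(): int
    {
        return count($this->messages);
    }

    // Copies the message list, appends to the copy, and wraps it in a new
    // instance, so the receiver is never modified.
    private function with(string $role, string $content): self
    {
        $messages = $this->messages;
        $messages[] = ['role' => $role, 'content' => $content];
        return new self($messages);
    }
}

$base = PromptSketch::system('You are terse.');
$asked = $base->withUser('Name one prime.');
echo $base->count(), ' ', $asked->count(), ' ', $asked->lastRole(), PHP_EOL; // prints "1 2 user"
```

Because each with* call returns a fresh instance, a shared base prompt can be reused across several conversations without one call site affecting another.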

Response

Read-only. Six getters.

| Method | Returns |
|---|---|
| text() | string |
| reasoning() | ?string |
| answer() | string |
| hasReasoning() | bool |
| finishReason() | string: 'eos', 'length', or 'stop' |
| tokensGenerated() | int |

See Chat completions → Inspecting a Response.

Environment

Not strictly an option, but it bears mentioning here:

| Variable | Effect |
|---|---|
| EXT_INFER_LOG=1 | Restore llama.cpp’s verbose stderr logging (silenced by default). |

See Environment variables.