# LlamaCppEx v0.8.27 - API Reference

## Modules

- [LlamaCppEx.ChatCompletion](LlamaCppEx.ChatCompletion.md): OpenAI-compatible chat completion response struct.
- [LlamaCppEx.ChatCompletionChunk](LlamaCppEx.ChatCompletionChunk.md): OpenAI-compatible streaming chat completion chunk struct.
- [LlamaCppEx.MTP](LlamaCppEx.MTP.md): Multi-Token Prediction (MTP) speculative decoding.
- [LlamaCppEx.ModelManager](LlamaCppEx.ModelManager.md): Holds multiple models resident and routes requests to them by id.
- [LlamaCppEx.ModelManager.Backend](LlamaCppEx.ModelManager.Backend.md): Behaviour for the model I/O the manager performs on the write path.
- [LlamaCppEx.ModelManager.Budget](LlamaCppEx.ModelManager.Budget.md): Advisory, placement-aware memory budgeting for `LlamaCppEx.ModelManager`.
- [LlamaCppEx.ModelManager.Entry](LlamaCppEx.ModelManager.Entry.md): A single resident-model record held in the `LlamaCppEx.ModelManager` ETS table.
- [LlamaCppEx.ModelManager.ModelIO](LlamaCppEx.ModelManager.ModelIO.md): Default `LlamaCppEx.ModelManager.Backend` implementation.
- [LlamaCppEx.ModelSupervisor](LlamaCppEx.ModelSupervisor.md): Opt-in supervisor for the multi-model manager.
- [LlamaCppEx.Server.Strategy.Batch](LlamaCppEx.Server.Strategy.Batch.md): Shared batch-assembly helpers used by the batching strategies.
- [LlamaCppEx.Thinking](LlamaCppEx.Thinking.md): Parser for `<think>...</think>` blocks in thinking model output.

- High-Level API
  - [LlamaCppEx](LlamaCppEx.md): Elixir bindings for llama.cpp.

- Core Modules
  - [LlamaCppEx.Chat](LlamaCppEx.Chat.md): Chat template formatting using llama.cpp's Jinja template engine.
  - [LlamaCppEx.Context](LlamaCppEx.Context.md): Inference context with KV cache.
  - [LlamaCppEx.Embedding](LlamaCppEx.Embedding.md): Generate embeddings from text using an embedding model.
  - [LlamaCppEx.Grammar](LlamaCppEx.Grammar.md): Converts JSON Schema to GBNF grammar for constrained generation.
  - [LlamaCppEx.Hub](LlamaCppEx.Hub.md): Download GGUF models from HuggingFace Hub.
  - [LlamaCppEx.Model](LlamaCppEx.Model.md): Model loading and introspection.
  - [LlamaCppEx.Sampler](LlamaCppEx.Sampler.md): Token sampling configuration.
  - [LlamaCppEx.Schema](LlamaCppEx.Schema.md): Converts Ecto schema modules to JSON Schema maps for structured output.
  - [LlamaCppEx.Server](LlamaCppEx.Server.md): GenServer for continuous batched multi-sequence inference.
  - [LlamaCppEx.Tokenizer](LlamaCppEx.Tokenizer.md): Text tokenization and detokenization.

- Batching Strategies
  - [LlamaCppEx.Server.BatchStrategy](LlamaCppEx.Server.BatchStrategy.md): Behaviour for batch-building strategies.
  - [LlamaCppEx.Server.Strategy.Balanced](LlamaCppEx.Server.Strategy.Balanced.md): Balanced batching strategy.
  - [LlamaCppEx.Server.Strategy.DecodeMaximal](LlamaCppEx.Server.Strategy.DecodeMaximal.md): Decode-maximal batching strategy.
  - [LlamaCppEx.Server.Strategy.PrefillPriority](LlamaCppEx.Server.Strategy.PrefillPriority.md): Prefill-priority batching strategy.

