LlamaCppEx.Embedding (LlamaCppEx v0.8.36)

Generate embeddings from text using an embedding model.

Summary

Functions

embed(model, text, opts \\ [])

Computes an embedding for a single text.

embed_batch(model, texts, opts \\ [])

Computes embeddings for multiple texts.

Types

t()

@type t() :: [float()]

Functions

embed(model, text, opts \\ [])

@spec embed(LlamaCppEx.Model.t(), String.t(), keyword()) ::
  {:ok, t() | binary()} | {:error, String.t()}

Computes an embedding for a single text.

Options

:n_ctx - Context size. Defaults to 2048.
:pooling_type - Pooling type. Defaults to :unspecified (model's default). Values: :unspecified, :none, :mean, :cls, :last.
:normalize - Normalization mode. 2 = L2 (default), 0 = max-abs, -1 = none.
:format - Output format. :list (default) returns a list of floats. :binary returns the raw native-endian f32 binary as produced by the NIF; it is zero-copy and can be loaded directly via Nx.from_binary(bin, :f32).
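As a rough illustration of how the :format and :normalize options affect what callers do with the result, the sketch below decodes a native-endian f32 binary without Nx and compares two vectors with a dot product. The module EmbeddingHelpers and its functions are hypothetical helpers, not part of LlamaCppEx, and the sketch assumes the binary holds nothing but packed 32-bit floats.

```elixir
defmodule EmbeddingHelpers do
  # Hypothetical helper module for illustration; not part of LlamaCppEx.

  # Decode a native-endian f32 binary (the layout described for
  # `format: :binary`) into a list of floats, 4 bytes per element.
  def decode_f32(bin) when is_binary(bin) and rem(byte_size(bin), 4) == 0 do
    for <<x::float-32-native <- bin>>, do: x
  end

  # Dot product of two equal-length float lists. For L2-normalized vectors
  # (the default `normalize: 2`) this equals their cosine similarity.
  def dot(a, b) when length(a) == length(b) do
    a
    |> Enum.zip(b)
    |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
  end
end

# Usage, assuming `model` is an already loaded LlamaCppEx.Model:
#
#   {:ok, bin} = LlamaCppEx.Embedding.embed(model, "hello", format: :binary)
#   floats = EmbeddingHelpers.decode_f32(bin)
```

If Nx is available, Nx.from_binary(bin, :f32) is likely the better choice, since it avoids materializing a list at all.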

embed_batch(model, texts, opts \\ [])

@spec embed_batch(LlamaCppEx.Model.t(), [String.t()], keyword()) ::
  {:ok, [t()]} | {:error, String.t()}

Computes embeddings for multiple texts.

Packs multiple texts into a single context as distinct sequences and decodes them in batches, rather than allocating a fresh context (and KV cache) per text. Accepts the same options as embed/3, plus:

:max_batch_sequences - Max texts per decode batch. Defaults to 64.

Batching applies to pooled embeddings only. When :pooling_type is :none, no per-sequence pooled vector exists, so embed_batch/3 falls back to one context per text.
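To make the effect of :max_batch_sequences concrete, here is a minimal sketch of how a list of texts could be partitioned into decode batches. BatchPlan is a hypothetical module written for illustration only; the library's actual batching logic is not shown in this page and may differ.

```elixir
defmodule BatchPlan do
  # Hypothetical illustration; not the library's implementation.
  # Splits texts into groups of at most `:max_batch_sequences` items,
  # using the documented default of 64.
  def decode_batches(texts, opts \\ []) do
    max = Keyword.get(opts, :max_batch_sequences, 64)
    Enum.chunk_every(texts, max)
  end
end

texts = for i <- 1..130, do: "document #{i}"

texts
|> BatchPlan.decode_batches()
|> Enum.map(&length/1)
|> IO.inspect()
# prints [64, 64, 2]
```

Under this assumption, 130 texts with the default setting would need three decode batches in one shared context, rather than 130 separate contexts.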