Open models done the easy way.
No cloud. No API key. Your GPU, your models, your data.
## Simple by design
```sh
koda pull llama3.2   # download a model from Hugging Face
koda run llama3.2    # chat in your terminal, streamed live
koda serve           # start the local API on :11434
koda list            # see everything you've downloaded
```
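Once `koda serve` is running, any HTTP client can talk to it. As a sketch, this is the JSON body a client would POST to `/v1/chat/completions` (the path follows the OpenAI convention the server mimics); the snippet only builds and prints the payload, it does not send it:

```python
import json

# Request body for POST http://localhost:11434/v1/chat/completions,
# following the OpenAI chat-completions shape that `koda serve` mimics.
body = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,  # set True to receive incremental chunks
}
payload = json.dumps(body)
print(payload)
```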
## Models
| Model | Description | Size |
|---|---|---|
| llama3.2 | Meta Llama 3.2 3B Instruct | 2.0 GB |
| llama3.1 | Meta Llama 3.1 8B Instruct | 4.9 GB |
| mistral | Mistral 7B Instruct v0.3 | 4.4 GB |
| phi3 | Microsoft Phi-3 Mini 4K | 2.2 GB |
| gemma2 | Google Gemma 2 2B Instruct | 1.6 GB |
| qwen2.5 | Qwen 2.5 7B Instruct | 4.7 GB |
| deepseek-r1 | DeepSeek R1 Distill Qwen 7B | 4.7 GB |
## OpenAI-compatible API
Drop Koda anywhere you use OpenAI — same API, fully local.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="koda",  # placeholder key for the local server
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```
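The same client can stream tokens as they arrive; a small sketch, assuming `koda serve` supports the standard `stream=True` chat-completions protocol. The `collect_stream` helper and the stand-in chunks are illustrative, not part of Koda:

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Join the incremental text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # the terminal chunk usually carries no content
            parts.append(delta)
    return "".join(parts)

# With the OpenAI client, the real call would look like:
#   stream = client.chat.completions.create(
#       model="llama3.2",
#       messages=[{"role": "user", "content": "hello"}],
#       stream=True,
#   )
#   print(collect_stream(stream))

# Offline demo with stand-in chunks shaped like the client's stream events:
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

demo = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
print(collect_stream(demo))  # → Hello
```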