Open models done the easy way.

No cloud. No API key. Your GPU, your models, your data.

Simple by design

koda pull llama3.2    Download a model from Hugging Face
koda run llama3.2     Chat in your terminal, streamed live
koda serve            Start the local API on :11434
koda list             See everything you've downloaded
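
The server behind koda serve speaks the OpenAI wire format (see the API section below), so the programmatic counterpart of koda list is a models query against the local endpoint. A minimal sketch, assuming the server also implements the standard OpenAI /v1/models listing route:

from openai import OpenAI

# Assumes koda serve is already running on the default port.
# The key is required by the client library but unused by the local server.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="koda")

# List every model the server has available locally.
for model in client.models.list():
    print(model.id)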

Models

Model          Description                    Size
llama3.2       Meta Llama 3.2 3B Instruct     2.0 GB
llama3.1       Meta Llama 3.1 8B Instruct     4.9 GB
mistral        Mistral 7B Instruct v0.3       4.4 GB
phi3           Microsoft Phi-3 Mini 4K        2.2 GB
gemma2         Google Gemma 2 2B Instruct     1.6 GB
qwen2.5        Qwen 2.5 7B Instruct           4.7 GB
deepseek-r1    DeepSeek R1 Distill Qwen 7B    4.7 GB

OpenAI-compatible API

Drop Koda in anywhere you use OpenAI: same API, fully local.

from openai import OpenAI

# Point the client at the local server started by koda serve.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="koda",  # required by the client library, ignored locally
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
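
Streaming works through the same flag the OpenAI API uses, which is how you get the live token-by-token output that koda run shows in the terminal. A sketch reusing the client from the example above, assuming the local server streams responses the way the hosted API does:

# Ask for a stream of chunks instead of one final response.
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

# Each chunk carries a small delta of the reply; print them as they arrive.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()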