Open models done the easy way.
No cloud. No API key. Your GPU, your models, your data.
## Simple by design
```sh
koda pull llama3.2   # download a model from Hugging Face
koda run llama3.2    # chat in your terminal, streamed live
koda serve           # start the local API on :11434
koda list            # see everything you've downloaded
```
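Once `koda serve` is running, any HTTP client can talk to it. As a sketch, this is the JSON body a client would POST to `/v1/chat/completions` (the path follows the OpenAI convention the server mimics); the snippet only builds and prints the payload, it does not send it:

```python
import json

# Request body for POST http://localhost:11434/v1/chat/completions,
# following the OpenAI chat-completions shape that `koda serve` mimics.
body = {
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "hello"}],
    "stream": False,  # set True to receive incremental chunks
}
payload = json.dumps(body)
print(payload)
```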
## Models
| Model | Description | Size |
|---|---|---|
| llama3.2 | Meta Llama 3.2 3B Instruct | 2.0 GB |
| llama3.1 | Meta Llama 3.1 8B Instruct | 4.9 GB |
| mistral | Mistral 7B Instruct v0.3 | 4.4 GB |
| phi3 | Microsoft Phi-3 Mini 4K | 2.2 GB |
| gemma2 | Google Gemma 2 2B Instruct | 1.6 GB |
| qwen2.5 | Qwen 2.5 7B Instruct | 4.7 GB |
| deepseek-r1 | DeepSeek R1 Distill Qwen 7B | 4.7 GB |
## OpenAI-compatible API
Drop Koda anywhere you use OpenAI — same API, fully local.
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="koda",  # placeholder key for the local server
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)
```
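The same client can stream tokens as they arrive; a small sketch, assuming `koda serve` supports the standard `stream=True` chat-completions protocol. The `collect_stream` helper and the stand-in chunks are illustrative, not part of Koda:

```python
from types import SimpleNamespace

def collect_stream(stream):
    """Join the incremental text deltas of a streamed chat completion."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta is not None:  # the terminal chunk usually carries no content
            parts.append(delta)
    return "".join(parts)

# With the OpenAI client, the real call would look like:
#   stream = client.chat.completions.create(
#       model="llama3.2",
#       messages=[{"role": "user", "content": "hello"}],
#       stream=True,
#   )
#   print(collect_stream(stream))

# Offline demo with stand-in chunks shaped like the client's stream events:
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

demo = [fake_chunk("Hel"), fake_chunk("lo"), fake_chunk(None)]
print(collect_stream(demo))  # → Hello
```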