docs/GitHub-Models-Setup.md
This guide will walk you through setting up and using GitHub Models with Fabric CLI. GitHub Models provides free access to multiple AI models from OpenAI, Meta, Microsoft, DeepSeek, xAI, and other providers using only your GitHub credentials.
GitHub Models is a free AI inference API platform that allows you to access multiple AI models using only your GitHub account. It's powered by Azure AI infrastructure and provides free access to a wide range of models without requiring separate API keys for each provider.
GitHub Models uses Personal Access Tokens (PAT) instead of separate API keys.
Sign in to GitHub at github.com
Navigate to Token Settings:
Generate New Token:
Give it a descriptive name such as Fabric CLI - GitHub Models and select the models:read scope.
Save Your Token: copy it now and store it somewhere safe (it starts with github_pat_ or ghp_). Fabric stores it as GITHUB_API_KEY, not GITHUB_TOKEN.
Using fabric --setup is the easiest and safest method:
Run Fabric Setup:
fabric --setup
Select GitHub from the Menu:
Look for an entry like [8] GitHub (configured) or similar, type its number (e.g., 8) and press Enter.
Enter Your GitHub Token: paste your Personal Access Token (it starts with github_pat_ or ghp_).
Verify Base URL (Optional): the default is https://models.github.ai/inference and usually does not need changing.
Save and Exit:
If you prefer to manually edit the configuration file:
Edit Environment File:
nano ~/.config/fabric/.env
Add GitHub Configuration:
# GitHub Models API Key (your Personal Access Token)
GITHUB_API_KEY=github_pat_YOUR_TOKEN_HERE
# GitHub Models API Base URL (default, usually don't need to change)
GITHUB_API_BASE_URL=https://models.github.ai/inference
Save and exit (Ctrl+X, then Y, then Enter)
Note: The environment variable is GITHUB_API_KEY, not GITHUB_TOKEN.
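If your token is already exported in your shell as GITHUB_TOKEN, one optional shortcut (assuming that variable is set) is to append it to Fabric's environment file instead of typing it by hand:
# Copy an existing shell token into Fabric's .env (optional shortcut)
echo "GITHUB_API_KEY=$GITHUB_TOKEN" >> ~/.config/fabric/.env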
Check that your configuration is properly set:
grep GITHUB_API_KEY ~/.config/fabric/.env
You should see:
GITHUB_API_KEY=github_pat_...
Or run setup again to verify:
fabric --setup
Look for [8] GitHub (configured) in the list.
Verify that Fabric can connect to GitHub Models and fetch the model list:
fabric --listmodels | grep GitHub
Expected Output (the index numbers on the left will vary; fabric -L is the short form of fabric --listmodels):
$ fabric -L | grep GitHub
[65] GitHub|ai21-labs/ai21-jamba-1.5-large
[66] GitHub|cohere/cohere-command-a
[67] GitHub|cohere/cohere-command-r-08-2024
[68] GitHub|cohere/cohere-command-r-plus-08-2024
[69] GitHub|deepseek/deepseek-r1
[70] GitHub|deepseek/deepseek-r1-0528
[71] GitHub|deepseek/deepseek-v3-0324
[72] GitHub|meta/llama-3.2-11b-vision-instruct
[73] GitHub|meta/llama-3.2-90b-vision-instruct
... (and more)
Test a basic chat completion with a small, fast model:
# Use gpt-4o-mini (fast and has generous rate limits)
fabric --vendor GitHub -m openai/gpt-4o-mini 'Why is the sky blue?'
Expected: You should see a response explaining Rayleigh scattering.
Tip: Model names from --listmodels can be used directly (e.g., openai/gpt-4o-mini, openai/gpt-4o, meta/llama-4-maverick-17b-128e-instruct-fp8).
Use one of Fabric's built-in patterns:
echo "Artificial intelligence is transforming how we work and live." | \
fabric --pattern summarize --vendor GitHub --model "openai/gpt-4o-mini"
Verify streaming responses work:
echo "Count from 1 to 100" | \
fabric --vendor GitHub --model "openai/gpt-4o-mini" --stream
You should see the response appear progressively, word by word.
Try a Meta Llama model:
# Use a Llama model
echo "Explain quantum computing" | \
fabric --vendor GitHub --model "meta/Meta-Llama-3.1-8B-Instruct"
At this point, --listmodels shows GitHub models and your setup is working.
GitHub Models provides access to models from multiple providers. Models use the format: {publisher}/{model-name}
OpenAI:

| Model ID | Description | Tier | Best For |
|---|---|---|---|
| openai/gpt-4.1 | Latest flagship GPT-4 | High | Complex tasks, reasoning |
| openai/gpt-4o | Optimized GPT-4 | High | General purpose, fast |
| openai/gpt-4o-mini | Compact, cost-effective | Low | Quick tasks, high volume |
| openai/o1 | Advanced reasoning | High | Complex problem solving |
| openai/o3 | Next-gen reasoning | High | Cutting-edge reasoning |

Meta:

| Model ID | Description | Tier | Best For |
|---|---|---|---|
| meta/llama-3.1-405b | Largest Llama model | High | Complex tasks, accuracy |
| meta/llama-3.1-70b | Mid-size Llama | Low | Balanced performance |
| meta/llama-3.1-8b | Compact Llama | Low | Fast, efficient tasks |

Microsoft:

| Model ID | Description | Tier | Best For |
|---|---|---|---|
| microsoft/phi-4 | Latest Phi generation | Low | Efficient reasoning |
| microsoft/phi-3-medium | Mid-size variant | Low | General tasks |
| microsoft/phi-3-mini | Smallest Phi | Low | Quick, simple tasks |

DeepSeek:

| Model ID | Description | Tier | Special |
|---|---|---|---|
| deepseek/deepseek-r1 | Reasoning model | Very Limited | 8 requests/day |
| deepseek/deepseek-r1-0528 | Updated version | Very Limited | 8 requests/day |

xAI:

| Model ID | Description | Tier | Special |
|---|---|---|---|
| xai/grok-3 | Latest Grok | Very Limited | 15 requests/day |
| xai/grok-3-mini | Smaller Grok | Very Limited | 15 requests/day |
To see all currently available models:
fabric --listmodels | grep GitHub
Or for a formatted list with details, you can query the GitHub Models API directly:
curl -H "Authorization: Bearer $GITHUB_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://models.github.ai/catalog/models | jq '.[] | {id, publisher, tier: .rate_limit_tier}'
GitHub Models has tiered rate limits based on model complexity. Understanding these helps you use the free tier effectively.
Low tier (most generous limits)
Models: gpt-4o-mini, llama-3.1-*, phi-*
Best practices: Use these for most Fabric patterns and daily tasks.
High tier (tighter limits)
Models: gpt-4.1, gpt-4o, o1, o3, llama-3.1-405b
Best practices: Save for complex tasks, important queries, or when you need maximum quality.
Very limited tier
Models: deepseek-r1, grok-3
Best practices: Use only for special experiments or when you specifically need these models.
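One lightweight way to follow this advice is a shell alias for the everyday low-tier model, so high-tier models are only used when you spell them out explicitly. This is just a convenience sketch; the alias name is made up here:
# Everyday alias for the generous low-tier model
alias fab-mini='fabric --vendor GitHub --model openai/gpt-4o-mini'
# Use it like normal fabric, e.g. with a pattern:
echo "Artificial intelligence is transforming how we work." | fab-mini --pattern summarize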
If you have a GitHub Copilot subscription, you get higher limits.
When you exceed a limit, you'll receive an HTTP 429 error with a message like:
Rate limit exceeded. Try again in X seconds.
Fabric will display this error. Wait for the reset time and try again.
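If you script Fabric calls, a simple wait-and-retry loop can keep a batch job alive across occasional 429s. The function below is a minimal illustrative sketch (the function name, model, and retry counts are arbitrary), not a Fabric feature:
# Retry a fabric call up to 3 times, backing off between attempts
run_with_retry() {
  local prompt="$1" attempt=1
  until echo "$prompt" | fabric --vendor GitHub --model openai/gpt-4o-mini; do
    [ "$attempt" -ge 3 ] && echo "Giving up after $attempt attempts" >&2 && return 1
    sleep $((30 * attempt))   # wait longer before each retry
    attempt=$((attempt + 1))
  done
}
run_with_retry "Why is the sky blue?"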
Alternatively, switch to a lower-tier model (e.g., gpt-4o-mini, llama-3.1-8b).
Cause: Invalid or missing GitHub token
Solutions:
Verify token is in .env file:
grep GITHUB_API_KEY ~/.config/fabric/.env
Check that the token has the models:read permission.
Re-run setup to reconfigure:
fabric --setup
# Select GitHub (number 8 or similar)
# Enter your token again
Generate a new token if needed (tokens expire)
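To check quickly whether the token itself is valid, independent of Fabric, you can request the Models catalog and look only at the HTTP status: it should print 200 if the token authenticates and 401 if it is invalid or expired. This assumes GITHUB_API_KEY is exported in your shell; otherwise substitute your token directly:
# Print only the HTTP status code for an authenticated catalog request
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer $GITHUB_API_KEY" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  https://models.github.ai/catalog/models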
Cause: Too many requests in a short time period
Solutions:
Check which tier your model is in (see Rate Limits)
Wait for the reset (check error message for wait time)
Switch to a lower-tier model:
# Instead of gpt-4.1 (high tier)
fabric --vendor GitHub --model openai/gpt-4.1 ...
# Use gpt-4o-mini (low tier)
fabric --vendor GitHub --model openai/gpt-4o-mini ...
Cause: Model name format incorrect or model not available
Solutions:
Use correct format: {publisher}/{model-name}, e.g., openai/gpt-4o-mini
# ❌ Wrong
fabric --vendor GitHub --model gpt-4o-mini
# ✅ Correct
fabric --vendor GitHub --model openai/gpt-4o-mini
List available models to verify name:
fabric --listmodels --vendor GitHub | grep -i "gpt-4"
Cause: API endpoint issue or authentication problem
Solutions:
Test direct API access:
curl -H "Authorization: Bearer $GITHUB_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://models.github.ai/catalog/models
If curl works but Fabric doesn't, rebuild Fabric:
cd /path/to/fabric
go build ./cmd/fabric
Check for network/firewall issues blocking models.github.ai
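A quick reachability probe that involves no authentication can help separate network problems from token problems, for example:
# Should print an HTTP status line if models.github.ai is reachable at all
curl -sI https://models.github.ai/catalog/models | head -n 1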
Cause: This should be fixed in the latest version with direct fetch fallback
Solutions:
Make sure you are running the latest version of Fabric, which includes the FetchModelsDirectly fallback.
Cause: High tier models have limited concurrency, or the GitHub Models API is congested
Solutions:
Switch to faster models:
openai/gpt-4o-mini instead of gpt-4.1
meta/llama-3.1-8b instead of llama-3.1-405b
Check your internet connection
Try again later (API may be experiencing high traffic)
Cause: Tokens have expiration dates or can be revoked
Solutions:
Generate a new token and update your .env file with the new token.
You can specify which model to use with any pattern:
# Use GPT-4.1 with the analyze_claims pattern
cat article.txt | fabric --pattern analyze_claims \
--vendor GitHub --model openai/gpt-4.1
# Use Llama for summarization
cat document.txt | fabric --pattern summarize \
--vendor GitHub --model meta/llama-3.1-70b
Set default models for specific patterns using environment variables:
Edit ~/.config/fabric/.env:
# Use GPT-4.1 for complex analysis
FABRIC_MODEL_analyze_claims=GitHub|openai/gpt-4.1
FABRIC_MODEL_extract_wisdom=GitHub|openai/gpt-4.1
# Use GPT-4o-mini for simple tasks
FABRIC_MODEL_summarize=GitHub|openai/gpt-4o-mini
FABRIC_MODEL_extract_article_wisdom=GitHub|openai/gpt-4o-mini
# Use Llama for code tasks
FABRIC_MODEL_explain_code=GitHub|meta/llama-3.1-70b
Now when you run:
cat article.txt | fabric --pattern analyze_claims
It will automatically use GitHub|openai/gpt-4.1 without needing to specify the vendor and model.
Compare how different models respond to the same input:
# OpenAI GPT-4o-mini
echo "Explain quantum computing" | \
fabric --vendor GitHub --model openai/gpt-4o-mini > response_openai.txt
# Meta Llama
echo "Explain quantum computing" | \
fabric --vendor GitHub --model meta/llama-3.1-70b > response_llama.txt
# Microsoft Phi
echo "Explain quantum computing" | \
fabric --vendor GitHub --model microsoft/phi-4 > response_phi.txt
# Compare
diff response_openai.txt response_llama.txt
Find the best model for your use case:
# Create a test script
cat > test_models.sh << 'EOF'
#!/bin/bash
INPUT="Explain the concept of recursion in programming"
PATTERN="explain_code"
for MODEL in "openai/gpt-4o-mini" "meta/llama-3.1-8b" "microsoft/phi-4"; do
echo "=== Testing $MODEL ==="
echo "$INPUT" | fabric --pattern "$PATTERN" --vendor GitHub --model "$MODEL"
echo ""
done
EOF
chmod +x test_models.sh
./test_models.sh
If you want to quickly test without running full setup, you can set the environment variable directly:
# Temporary test (this session only)
export GITHUB_API_KEY=github_pat_YOUR_TOKEN_HERE
# Test immediately
fabric --listmodels --vendor GitHub
This is useful for quick tests, but we recommend using fabric --setup for permanent configuration.
For long-form content, use streaming to see results as they generate:
cat long_article.txt | \
fabric --pattern summarize \
--vendor GitHub --model openai/gpt-4o-mini \
--stream
Monitor your usage to stay within rate limits:
# Create a simple usage tracker
echo "$(date): Used gpt-4.1 for analyze_claims" >> ~/.config/fabric/usage.log
# Check daily usage
grep "$(date +%Y-%m-%d)" ~/.config/fabric/usage.log | wc -l
Create different profiles for different use cases:
# Development profile (uses free GitHub Models)
cat > ~/.config/fabric/.env.dev << EOF
GITHUB_API_KEY=github_pat_dev_token_here
DEFAULT_VENDOR=GitHub
DEFAULT_MODEL=openai/gpt-4o-mini
EOF
# Production profile (uses paid OpenAI)
cat > ~/.config/fabric/.env.prod << EOF
OPENAI_API_KEY=sk-prod-key-here
DEFAULT_VENDOR=OpenAI
DEFAULT_MODEL=gpt-4
EOF
# Switch profiles
ln -sf ~/.config/fabric/.env.dev ~/.config/fabric/.env
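If you switch often, a tiny helper function saves retyping the symlink command. This is just a convenience sketch built on the same symlink approach (the function name is made up):
# Usage: fabric-profile dev   |   fabric-profile prod
fabric-profile() {
  ln -sf ~/.config/fabric/.env."$1" ~/.config/fabric/.env
  echo "Switched Fabric to the $1 profile"
}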
GitHub Models provides an excellent way to experiment with AI models through Fabric without managing multiple API keys or incurring costs. Key points:
✅ Free to start: No credit card required, 50-150 requests/day
✅ Multiple providers: OpenAI, Meta, Microsoft, DeepSeek, xAI
✅ Simple setup: Just one GitHub token via fabric --setup
✅ Great for learning: Try different models and patterns
✅ Production path: Can upgrade to paid tier when ready
# 1. Get GitHub token with models:read scope from:
# https://github.com/settings/tokens
# 2. Configure Fabric
fabric --setup
# Select [8] GitHub
# Paste your token when prompted
# 3. List available models
fabric --listmodels --vendor GitHub | grep gpt-4o
# 4. Try it out with gpt-4o-mini
echo "What is AI?" | fabric --vendor GitHub --model "gpt-4o-mini"
Recommended starting point: Use gpt-4o-mini for most patterns - it's fast, capable, and has generous rate limits (150 requests/day).
Available Models: gpt-4o, gpt-4o-mini, Meta-Llama-3.1-8B-Instruct, Meta-Llama-3.1-70B-Instruct, Mistral-large-2407, and more. Use --listmodels to see the complete list.
Happy prompting! 🚀