tools/compare-models/README.md
A standalone CLI tool to compare the tagging performance of AI models using your existing Karakeep bookmarks.
Required environment variables:
# Karakeep API configuration
KARAKEEP_API_KEY=your_api_key_here
KARAKEEP_SERVER_ADDR=https://your-karakeep-instance.com
# Comparison mode (default: model-vs-model)
# - "model-vs-model": Compare two models against each other
# - "model-vs-existing": Compare a model against existing AI tags
COMPARISON_MODE=model-vs-model
# Models to compare
# MODEL1_NAME: The new model to test (always required)
# MODEL2_NAME: The second model to compare against (required only for model-vs-model mode)
MODEL1_NAME=gpt-4o-mini
MODEL2_NAME=claude-3-5-sonnet
# OpenAI/OpenRouter API configuration (for running inference)
OPENAI_API_KEY=your_openai_or_openrouter_key
OPENAI_BASE_URL=https://openrouter.ai/api/v1 # Optional, defaults to OpenAI
# Optional: Number of bookmarks to test (default: 10)
COMPARE_LIMIT=10
For OpenRouter, set:
OPENAI_BASE_URL=https://openrouter.ai/api/v1
OPENAI_API_KEY=your_openrouter_key
For OpenAI directly:
OPENAI_API_KEY=your_openai_key
# OPENAI_BASE_URL can be omitted for direct OpenAI
cd tools/compare-models
pnpm install
pnpm run
Create a .env file:
KARAKEEP_API_KEY=your_api_key
KARAKEEP_SERVER_ADDR=https://your-karakeep-instance.com
MODEL1_NAME=gpt-4o-mini
MODEL2_NAME=claude-3-5-sonnet
OPENAI_API_KEY=your_openai_key
COMPARE_LIMIT=10
Then run:
pnpm run
If you prefer to run the compiled JavaScript directly:
pnpm build
export KARAKEEP_API_KEY=your_api_key
export KARAKEEP_SERVER_ADDR=https://your-karakeep-instance.com
export MODEL1_NAME=gpt-4o-mini
export MODEL2_NAME=claude-3-5-sonnet
export OPENAI_API_KEY=your_openai_key
node dist/index.js
Compare two different AI models against each other:
COMPARISON_MODE=model-vs-model
MODEL1_NAME=gpt-4o-mini
MODEL2_NAME=claude-3-5-sonnet
This mode runs inference with both models on each bookmark and lets you choose which tags are better.
Compare a new model against existing AI-generated tags on your bookmarks:
COMPARISON_MODE=model-vs-existing
MODEL1_NAME=gpt-4o-mini
# MODEL2_NAME is not required in this mode
This mode is useful for:
Note: This mode only compares bookmarks that already have AI-generated tags (tags with attachedBy: "ai"). Bookmarks without AI tags are automatically filtered out.
The tool fetches your latest link bookmarks from Karakeep
For each bookmark, it randomly assigns the options to "Model A" or "Model B" and runs tagging
You'll see a side-by-side comparison (randomly shuffled each time):
=== Bookmark 1/10 ===
How to Build Better AI Systems
https://example.com/article
This article explores modern approaches to...
─────────────────────────────────────
Model A (blind):
• ai
• machine-learning
• engineering
Model B (blind):
• artificial-intelligence
• ML
• software-development
─────────────────────────────────────
Which tags do you prefer? [1=Model A, 2=Model B, s=skip, q=quit] >
Choose your preference:
1 - Vote for Model A2 - Vote for Model Bs or skip - Skip this comparisonq or quit - Exit early and show current resultsAfter completing all comparisons (or quitting early), results are displayed:
───────────────────────────────────────
=== FINAL RESULTS ===
───────────────────────────────────────
gpt-4o-mini: 6 votes
claude-3-5-sonnet: 3 votes
Skipped: 1
Errors: 0
───────────────────────────────────────
Total bookmarks tested: 10
🏆 WINNER: gpt-4o-mini
───────────────────────────────────────
The actual model names are only shown in the final results - during voting you see only "Model A" and "Model B"
The tool currently tests only:
attachedBy: "ai")This tool leverages Karakeep's shared infrastructure:
@karakeep/sdk for type-safe API interactions with proper authentication@karakeep/shared/inference for OpenAI client with structured output support@karakeep/shared/prompts for consistent tagging prompt generation with token managementTo build a standalone binary:
pnpm build
The built binary will be in dist/index.js.
includeContent=true from Karakeep API@karakeep/sdk) for type-safe API interactionspnpm run for the best experience (uses tsx for development)