context/chats/max-min-tokens.txt.md
We want to refine how Task Master handles AI model token limits to be more precise, by:
- Introducing separate maxInputTokens and maxOutputTokens settings in the configuration.
- Dynamically adjusting maxOutputTokens for an API call based on the actual prompt length, to stay within the model's total context window (or respecting separate input/output limits if the API and model support that).
- Ensuring ai-services-unified.js uses these more granular limits.

This is a good improvement for both cost control and preventing errors.
Here's a plan:
Phase 1: Configuration and Core Logic Updates

.taskmasterconfig Structure:
- Update the models section in your .taskmasterconfig. For each role (main, research, fallback), maxTokens will be replaced with maxInputTokens and maxOutputTokens.
- Defaults for the new fields can be derived from the current maxTokens and the model's known limits to make an initial guess.

MODEL_MAP in ai-services-unified.js:
- Each model entry should carry its absolute token limits (e.g., model_max_input_tokens, model_max_output_tokens). If these fields are not present, they will need to be added. The values in .taskmasterconfig will then represent user-defined operational limits, which should ideally be validated against these absolute maximums.

config-manager.js:
- getParametersForRole will be updated to fetch maxInputTokens and maxOutputTokens instead of the singular maxTokens.
- The model's absolute limits will be read from MODEL_MAP via config-manager.js.

ai-services-unified.js (_unifiedServiceRunner):
- Token counting: before each call, count the tokens in systemPrompt and userPrompt.
  - This requires a tokenizer or a way to count tokens for a given string.
  - We could use tiktoken for OpenAI/Anthropic models or investigate provider-specific tokenization. This could be complex as tokenization varies between models.
- Limits: fetch configured_max_input_tokens and configured_max_output_tokens from config-manager.js for the current role, and model_absolute_max_input_tokens and model_absolute_max_output_tokens from MODEL_MAP.
- Input validation: compute prompt_tokens = countTokens(systemPrompt + userPrompt). If prompt_tokens > configured_max_input_tokens OR prompt_tokens > model_absolute_max_input_tokens, throw an error. (We might also consider auto-truncation strategies in the future, but an error is safer to start.)
- Output limit: the max_tokens parameter sent to the API (which typically refers to the max number of tokens to generate) should be configured_max_output_tokens. We also need to ensure this doesn't violate the model's absolute output limit.
- Total context window: for models that only define a combined limit, the max_tokens parameter for the API call might need to be min(configured_max_output_tokens, model_absolute_total_tokens - prompt_tokens). However, many modern APIs handle separate input and output limits, simplifying this. We'll aim to pass configured_max_output_tokens as the API's max_tokens parameter, assuming the input validation has passed.
- Update callParams to use the correctly determined max_tokens for generation (which will be our configured_max_output_tokens).

Phase 2: Error Handling and Refinements
- Add validation (in config-manager.js or when task-master models --setup is run) to ensure configured_max_input_tokens and configured_max_output_tokens in .taskmasterconfig do not exceed the model_absolute_max_input_tokens and model_absolute_max_output_tokens from MODEL_MAP.

Let's start by inspecting scripts/modules/config-manager.js to see how parameters and MODEL_MAP are currently handled. This will inform the changes needed for .taskmasterconfig and ai-services-unified.js.
I will now read scripts/modules/config-manager.js.
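The Phase 2 validation step from the plan (configured limits must not exceed the model's absolute limits) could be sketched roughly as follows. This is only a sketch under stated assumptions: validateRoleLimits is a hypothetical helper, not an existing function in config-manager.js, and the numbers in the usage example are illustrative rather than official model limits.

```javascript
// Hypothetical helper (not part of config-manager.js): checks one role's
// configured limits against the model's absolute limits and returns a list
// of human-readable problems instead of throwing.
function validateRoleLimits(role, configured, modelLimits) {
  const problems = [];
  const checks = [
    ['maxInputTokens', modelLimits.maxInputTokens],
    ['maxOutputTokens', modelLimits.maxOutputTokens]
  ];

  for (const [field, absoluteMax] of checks) {
    const value = configured[field];
    if (!Number.isInteger(value) || value <= 0) {
      problems.push(`${role}.${field} must be a positive integer (got ${value}).`);
    } else if (Number.isFinite(absoluteMax) && value > absoluteMax) {
      // Only compare when the model entry actually defines a finite limit.
      problems.push(
        `${role}.${field} (${value}) exceeds the model's absolute limit (${absoluteMax}).`
      );
    }
  }
  return problems;
}

// Usage with illustrative numbers: the input limit is too high, the output limit is fine.
const problems = validateRoleLimits(
  'main',
  { maxInputTokens: 250000, maxOutputTokens: 4096 },
  { maxInputTokens: 190000, maxOutputTokens: 8192 }
);
console.log(problems);
```

Returning a list rather than throwing lets a setup command report every misconfigured field at once; a loader could still choose to throw if the list is non-empty.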
Okay, I've reviewed scripts/modules/config-manager.js. Here's what I've gathered and how it impacts the plan:
- MODEL_MAP Source: MODEL_MAP is loaded from supported-models.json. This is excellent, as it means we can update the model capabilities (including new token limits) in a structured JSON file.
- Configuration loading: _loadAndValidateConfig handles loading .taskmasterconfig and merging it with DEFAULTS.
- Getters: there are role-specific getters for maxTokens (e.g., getMainMaxTokens). We will need to update these.
- getParametersForRole: This function already exists and is a good candidate for returning the new maxInputTokens and maxOutputTokens.

Here's the refined plan:
Phase 1: Configuration and Data Structures
Update supported-models.json:
For each model entry in supported-models.json, we need to add maxInputTokens and maxOutputTokens fields. These should reflect the absolute maximums for that specific model.
Example entry change:
// Before
{
  "id": "claude-3-7-sonnet-20250219",
  "name": "Claude 3.7 Sonnet (Preview)",
  "context_window": 200000, // This might be total context
  "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
}

// After (assuming 200k is total, and we make a reasonable split or find official numbers)
{
  "id": "claude-3-7-sonnet-20250219",
  "name": "Claude 3.7 Sonnet (Preview)",
  "maxInputTokens": 190000, // Example
  "maxOutputTokens": 8192, // Example, often smaller for generation
  "cost_per_1m_tokens": { "input": 3, "output": 15, "currency": "USD" }
}
I will need to find the official input/output token limits for the models currently in your MODEL_MAP. If you have this information handy, it would speed things up. Otherwise, I can make educated guesses or search for them.
Update .taskmasterconfig (Defaults and User File):
- In scripts/modules/config-manager.js, modify the DEFAULTS object. For each role (main, research, fallback), replace maxTokens with:
  - maxInputTokens: A sensible default (e.g., a large portion of the model's capability, but user-configurable).
  - maxOutputTokens: A sensible default for generation (e.g., 4096 or 8192).
- You will need to update your .taskmasterconfig file to reflect this new structure. I can provide the snippet for you to paste.

Update config-manager.js:
- getParametersForRole(role, explicitRoot = null):
  - Currently it returns maxTokens and temperature.
  - It should return maxInputTokens, maxOutputTokens, and temperature from the loaded config for the given role.
- Remove or update the role-specific getMaxTokens functions (e.g., getMainMaxTokens, getResearchMaxTokens, getFallbackMaxTokens). The getParametersForRole will be the central way to get these.
- Consider adding a function getModelCapabilities(providerName, modelId) that reads from MODEL_MAP to return the absolute maxInputTokens and maxOutputTokens for a given model. This would be useful for validating the user's settings in .taskmasterconfig.

Phase 2: Core Logic in ai-services-unified.js
Token Counting (_unifiedServiceRunner):
This is the most complex part. We need a reliable way to count tokens for the prompts.
- Strategy: use a library like tiktoken for models compatible with OpenAI's tokenization. For other models (including Anthropic's, where tiktoken counts are only an approximation), we might need provider-specific tokenizers or make estimations (less ideal).
- Initial approach: start with tiktoken as a common case and acknowledge that other models might need specific handling later.

The function _unifiedServiceRunner will call this token counting utility:
// Placeholder for token counting.
// Real tokenization is model-specific; until a proper tokenizer is wired in,
// estimate roughly 3.5 characters per token.
function countTokens(text, modelId /* or providerName; unused by this estimate */) {
  if (!text) return 0;
  return Math.ceil(text.length / 3.5); // Very rough general estimate
}

const promptTokens = countTokens(systemPrompt) + countTokens(prompt);
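One way to move past the placeholder without committing to a tokenizer yet is to let callers plug in an encoder per provider and keep the character estimate as the fallback. The sketch below assumes this design; createTokenCounter and the encoders argument are hypothetical names, and the whitespace-splitting encoder in the usage example is only a stand-in for a real tokenizer (for example, an encode function obtained from tiktoken).

```javascript
// Fallback ratio taken from the placeholder above; a rough estimate only.
const CHARS_PER_TOKEN_FALLBACK = 3.5;

// Hypothetical factory: `encoders` maps a lowercase provider name to a
// function that turns text into an array of tokens.
function createTokenCounter(encoders = {}) {
  return function countTokens(text, providerName) {
    if (!text) return 0;
    const encode = encoders[providerName?.toLowerCase()];
    if (typeof encode === 'function') {
      // A real tokenizer is available for this provider: count its tokens.
      return encode(text).length;
    }
    // No tokenizer registered: fall back to the character-based estimate.
    return Math.ceil(text.length / CHARS_PER_TOKEN_FALLBACK);
  };
}

// Usage with a stand-in encoder (whitespace split) instead of a real tokenizer.
const countTokens = createTokenCounter({
  openai: (text) => text.split(/\s+/).filter(Boolean)
});

console.log(countTokens('count these four words', 'openai'));
console.log(countTokens('abcdefg', 'some-other-provider'));
```

Injecting the encoder keeps _unifiedServiceRunner independent of any particular tokenizer package, so a provider-specific counter can be added later without touching the call sites.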
Dynamic Output Token Calculation & Validation (_unifiedServiceRunner):
Fetch parameters using the updated getParametersForRole from config-manager.js:
const roleParams = getParametersForRole(currentRole, effectiveProjectRoot);
// roleParams will now contain { maxInputTokens, maxOutputTokens, temperature }
Fetch model's absolute limits from MODEL_MAP (we'll need a helper in config-manager.js or do it directly in ai-services-unified.js for now):
// Simplified; ideally use a robust getter from config-manager
const modelInfo = MODEL_MAP[providerName?.toLowerCase()]?.find(
  (m) => m.id === modelId
);
const modelAbsoluteMaxInput = modelInfo?.maxInputTokens || Infinity;
// Fall back to the configured value if the model is not in the map
const modelAbsoluteMaxOutput =
  modelInfo?.maxOutputTokens || roleParams.maxOutputTokens;
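The "robust getter" mentioned in the comment above might look something like the following. It is a sketch only: getModelCapabilities is the helper proposed earlier in this plan rather than an existing export, the MODEL_MAP literal is a stand-in for the data loaded from supported-models.json, and the limit values are the example figures from this discussion, not official numbers.

```javascript
// Stand-in for the MODEL_MAP loaded from supported-models.json (assumed shape).
const MODEL_MAP = {
  anthropic: [
    { id: 'claude-3-7-sonnet-20250219', maxInputTokens: 190000, maxOutputTokens: 8192 }
  ]
};

// Proposed helper: returns the model's absolute limits, or Infinity for any
// limit that is unknown so callers can still apply Math.min safely.
function getModelCapabilities(providerName, modelId, modelMap = MODEL_MAP) {
  const models = modelMap[providerName?.toLowerCase()] ?? [];
  const modelInfo = models.find((m) => m.id === modelId);
  return {
    known: Boolean(modelInfo),
    maxInputTokens: modelInfo?.maxInputTokens ?? Infinity,
    maxOutputTokens: modelInfo?.maxOutputTokens ?? Infinity
  };
}

const caps = getModelCapabilities('Anthropic', 'claude-3-7-sonnet-20250219');
console.log(caps);
```

Returning Infinity for unknown limits keeps the later Math.min(roleParams.maxOutputTokens, ...) call meaningful, and the known flag lets the caller log a warning when a model is missing from the map.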
Input Validation:
if (promptTokens > roleParams.maxInputTokens) {
  throw new Error(
    `Prompt (${promptTokens} tokens) exceeds configured max input tokens (${roleParams.maxInputTokens}) for role '${currentRole}'.`
  );
}
if (promptTokens > modelAbsoluteMaxInput) {
  throw new Error(
    `Prompt (${promptTokens} tokens) exceeds model's absolute max input tokens (${modelAbsoluteMaxInput}) for ${modelId}.`
  );
}
API Call Parameter (max_tokens for generation):
The max_tokens parameter passed to the provider's API call should be roleParams.maxOutputTokens. We must also ensure this doesn't exceed the model's absolute output capability.
const apiMaxOutputTokens = Math.min(
  roleParams.maxOutputTokens,
  modelAbsoluteMaxOutput
);
const callParams = {
  apiKey,
  modelId,
  // Use apiMaxOutputTokens for the actual API call
  maxTokens: apiMaxOutputTokens, // This usually means max tokens to GENERATE
  temperature: roleParams.temperature,
  messages,
  baseUrl,
  ...(serviceType === 'generateObject' && { schema, objectName }),
  ...restApiParams
};
Update logAiUsage:
- logAiUsage seems correct as it already uses inputCost and outputCost per 1M tokens. This part should remain compatible.

Next Steps:
- supported-models.json: I need the content of this file to propose the new maxInputTokens and maxOutputTokens fields. Please provide it.
- Tokenizer: are you okay with starting with tiktoken initially?

Once I have the content of supported-models.json, I can draft the changes for it and then proceed to config-manager.js and .taskmasterconfig.
Okay, I have the contents of supported-models.json.
It seems many models have a max_tokens field, which I assume is the total context window. Some don't have it, and for others (like OpenAI gpt-4o), the listed max_tokens (16384) is known to be its output token limit when using the Chat Completions API, while the context window is much larger (128k). This highlights the complexity: max_tokens in supported-models.json is used inconsistently.
Revised Plan for supported-models.json and Token Definitions:
To bring clarity, we'll introduce two new fields and ensure their meaning is consistent:
- contextWindowTokens: The total number of tokens the model can process (input + output). This would replace the current ambiguous max_tokens.
- maxOutputTokens: The maximum number of tokens the model can generate in a single response. This is often smaller than the total context window, especially for larger models.

If a model only specifies a total context window, we'll have to make a reasonable assumption for maxOutputTokens (e.g., 4096 or 8192, or a fraction of the total context window). If it only specifies an output token limit (like some OpenAI models in certain API modes), we'll need to find its total context window.
Updated supported-models.json Structure (Example):
// For a model like Anthropic Claude 3.7 Sonnet (Preview)
{
  "id": "claude-3-7-sonnet-20250219",
  "swe_score": 0.623,
  "cost_per_1m_tokens": { "input": 3.0, "output": 15.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 200000, // From Anthropic's documentation
  "maxOutputTokens": 8192 // Anthropic default, user can override in .taskmasterconfig
}

// For a model like OpenAI GPT-4o
{
  "id": "gpt-4o",
  "swe_score": 0.332,
  "cost_per_1m_tokens": { "input": 2.5, "output": 10.0 },
  "allowed_roles": ["main", "fallback"],
  "contextWindowTokens": 128000, // Official context window (the model card mentions 128k total)
  // Max output for chat completions is 16k for gpt-4o (used to be 4k/8k for older gpt-4).
  // The API call parameter `max_tokens` sets this output cap.
  "maxOutputTokens": 16384
}
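With contextWindowTokens and maxOutputTokens defined this way, the per-call output cap for a model that shares one window between input and output can be derived from the prompt length. The sketch below assumes a hypothetical helper named resolveOutputBudget and uses the gpt-4o example values from above; it is not the project's implementation.

```javascript
// Hypothetical helper: derives the max_tokens value for one API call from the
// role's configured cap, the model's output cap, and what is left of the
// context window after the prompt.
function resolveOutputBudget({ promptTokens, configuredMaxOutput, model }) {
  const remainingContext = model.contextWindowTokens - promptTokens;
  if (remainingContext <= 0) {
    throw new Error(
      `Prompt (${promptTokens} tokens) leaves no room in the ` +
        `${model.contextWindowTokens}-token context window of ${model.id}.`
    );
  }
  return Math.min(configuredMaxOutput, model.maxOutputTokens, remainingContext);
}

// Example entry using the illustrative gpt-4o numbers.
const gpt4o = { id: 'gpt-4o', contextWindowTokens: 128000, maxOutputTokens: 16384 };

// Short prompt: the configured cap wins.
console.log(resolveOutputBudget({ promptTokens: 1000, configuredMaxOutput: 4096, model: gpt4o }));
// Long prompt: the remaining context (128000 - 120000) becomes the cap.
console.log(resolveOutputBudget({ promptTokens: 120000, configuredMaxOutput: 16384, model: gpt4o }));
```

Throwing when no context remains mirrors the earlier decision to fail on oversized prompts rather than truncate them silently.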
I will proceed to generate the updated supported-models.json content. This will be a significant change. I will make my best effort to find the correct contextWindowTokens and maxOutputTokens for each model. If official numbers are ambiguous or not readily available, I'll use sensible defaults and add a comment.
Regarding Tokenizer Strategy:
- Vercel AI SDK: The ai package (Vercel AI SDK) itself is a lightweight wrapper. Tokenization is usually handled by the underlying provider-specific SDKs (e.g., @anthropic-ai/sdk, openai). The Vercel SDK doesn't provide a universal tokenizer.
- Provider SDKs:
  - @anthropic-ai/sdk: Does not expose a public tokenizer directly. Anthropic recommends counting based on their guidelines (roughly 3.5 chars/token for English).
  - openai (for Node.js): The gpt-3-encoder or tiktoken libraries are commonly used. tiktoken is more current.
- tiktoken: This is a good general-purpose library for OpenAI models and those that use compatible tokenization (many models fine-tuned from Llama, etc., might be close). It supports various encodings.
- Recommendation:
  - Start by integrating tiktoken. We can select the encoding based on the provider or even specific model if needed (e.g., cl100k_base for gpt-4 and gpt-3.5-turbo, o200k_base for gpt-4o).
  - For providers where tiktoken is not accurate (e.g., Google's Gemini, Perplexity if they have proprietary tokenizers), we might have to rely on character-based estimations (e.g., length / N) as a rougher guide or investigate if their specific SDKs offer token counting.
  - The Vercel AI SDK's experimental_streamObject and other functions might internally handle some aspects of token counting for request preparation, but we need explicit counting for our logic before the call.

For now, let's assume we'll add a countTokens utility that uses tiktoken for OpenAI and Anthropic (Anthropic's guidance is character-based, but cl100k_base is often a decent, though imperfect, proxy for rough estimates), and a character-based fallback for others.
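The selection logic behind that assumption could be kept in one small lookup, sketched below. pickCountingStrategy, the provider names it matches, and the shape of its return value are all hypothetical; the encoding names are tiktoken's, and the model-name matching is deliberately narrow so that unrecognized models are flagged as approximate.

```javascript
// Hypothetical lookup: decides how to count tokens for a provider/model pair.
// Returns either a tiktoken encoding name or a characters-per-token ratio.
function pickCountingStrategy(providerName, modelId = '') {
  const provider = (providerName ?? '').toLowerCase();

  if (provider === 'openai') {
    if (modelId.startsWith('gpt-4o')) {
      return { kind: 'tiktoken', encoding: 'o200k_base', approximate: false };
    }
    if (modelId === 'gpt-4' || modelId.startsWith('gpt-4-') || modelId.startsWith('gpt-3.5-turbo')) {
      return { kind: 'tiktoken', encoding: 'cl100k_base', approximate: false };
    }
    // Unrecognized OpenAI model: still use tiktoken, but treat the count as an estimate.
    return { kind: 'tiktoken', encoding: 'cl100k_base', approximate: true };
  }

  if (provider === 'anthropic') {
    // cl100k_base is only a rough proxy for Anthropic's tokenization.
    return { kind: 'tiktoken', encoding: 'cl100k_base', approximate: true };
  }

  // Everyone else: character-based estimate.
  return { kind: 'chars', charsPerToken: 3.5, approximate: true };
}

console.log(pickCountingStrategy('openai', 'gpt-4o'));
console.log(pickCountingStrategy('google', 'some-model'));
```

Carrying an approximate flag alongside the strategy would let the input validation leave a safety margin when the count is only an estimate.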
I will now generate the proposed changes for scripts/modules/supported-models.json. This will be a large diff.
After this, we'll move to:
- Updating .taskmasterconfig defaults in config-manager.js and providing you the snippet for your user file.
- Updating config-manager.js getters.
- Implementing the logic in ai-services-unified.js.