docs/local/TEMPLATE_MINING_ANALYSIS.md
Date: 2025-10-02
Context: Analyzing whether to fix get_node_for_task (28% failure rate) or replace it with template-based configuration extraction
RECOMMENDATION: Replace get_node_for_task with template-based configuration extraction. The template database contains 2,646 real-world workflows with rich node configurations that far exceed the 31 hardcoded task templates.
Top node types by template usage:
3,820 templates: n8n-nodes-base.httpRequest (144% of total templates!)
3,678 templates: n8n-nodes-base.set
2,445 templates: n8n-nodes-base.code
1,700 templates: n8n-nodes-base.googleSheets
1,471 templates: @n8n/n8n-nodes-langchain.agent
1,269 templates: @n8n/n8n-nodes-langchain.lmChatOpenAi
792 templates: n8n-nodes-base.telegram
702 templates: n8n-nodes-base.httpRequestTool
596 templates: n8n-nodes-base.gmail
466 templates: n8n-nodes-base.webhook
Comparison:
CREATE TABLE templates (
id INTEGER PRIMARY KEY,
workflow_id INTEGER UNIQUE NOT NULL,
name TEXT NOT NULL,
description TEXT,
-- Node information
nodes_used TEXT, -- JSON array: ["n8n-nodes-base.httpRequest", ...]
workflow_json_compressed TEXT, -- Base64 encoded gzip of full workflow
-- Metadata (100% coverage)
metadata_json TEXT, -- AI-generated structured metadata
-- Stats
views INTEGER DEFAULT 0,
created_at DATETIME,
-- ...
);
Simple URL fetch:
{
"url": "https://api.example.com/data",
"options": {}
}
With authentication:
{
"url": "=https://api.wavespeed.ai/api/v3/predictions/{{ $json.data.id }}/result",
"options": {},
"authentication": "genericCredentialType",
"genericAuthType": "httpHeaderAuth"
}
Complex expressions:
{
"url": "=https://image.pollinations.ai/prompt/{{$('Social Media Content Factory').item.json.output.description.replaceAll(' ','-').replaceAll(',','').replaceAll('.','') }}",
"options": {}
}
Basic webhook:
{
"path": "ytube",
"options": {},
"httpMethod": "POST",
"responseMode": "responseNode"
}
With binary data:
{
"path": "your-endpoint",
"options": {
"binaryPropertyName": "data"
},
"httpMethod": "POST"
}
Each template has structured metadata including:
{
"categories": ["automation", "integration", "data processing"],
"complexity": "medium",
"use_cases": [
"Extract transaction data from Gmail",
"Automate bookkeeping",
"Expense tracking"
],
"estimated_setup_minutes": 30,
"required_services": ["Gmail", "Google Sheets", "Google Gemini"],
"key_features": [
"Fetch emails by label",
"Extract transaction data",
"Use LLM for structured output"
],
"target_audience": ["Accountants", "Small business owners"]
}
Pros:
Cons:
Pros:
Cons:
Create a new service: TemplateConfigExtractor
interface ExtractedNodeConfig {
nodeType: string;
configuration: Record<string, any>;
source: {
templateId: number;
templateName: string;
templateViews: number;
useCases: string[];
complexity: 'simple' | 'medium' | 'complex';
};
patterns: {
hasAuthentication: boolean;
hasExpressions: boolean;
hasOptionalFields: boolean;
};
}
class TemplateConfigExtractor {
async extractConfigsForNode(
nodeType: string,
options?: {
complexity?: 'simple' | 'medium' | 'complex';
requiresAuth?: boolean;
limit?: number;
}
): Promise<ExtractedNodeConfig[]> {
// 1. Query templates containing nodeType
// 2. Decompress workflow_json_compressed
// 3. Extract node configurations
// 4. Rank by popularity + complexity match
// 5. Return top N configurations
}
}
Option A: Enhance get_node_essentials
includeExamples: boolean parameterOption B: Enhance get_node_info
examples section with template-sourced configsOption C: New tool get_node_examples
-- Fast: Get templates for a node type (no decompression)
SELECT id, name, views, metadata_json
FROM templates
WHERE nodes_used LIKE '%n8n-nodes-base.httpRequest%'
ORDER BY views DESC
LIMIT 10;
-- Then decompress only top matches
Original P0-R3 Plan: Expand task library from 31 to 100+ tasks using fuzzy matching
New Approach: Mine 2,646 templates for real configurations
Impact Assessment:
| Metric | Original Plan | Template Mining |
|---|---|---|
| Configuration examples | 100 (estimated) | 2,646+ actual |
| Node coverage | ~20% | 103% |
| Maintenance | High (manual) | Low (auto-fetch) |
| Accuracy | Curated | Production-tested |
| Context richness | Limited | Rich metadata |
| Development time | 2-3 weeks | 1 week |
Recommendation: PIVOT to template mining approach for P0-R3
TemplateConfigExtractor serviceget_node_essentials with examplesTotal: 2 weeks vs 3 weeks for original plan
// Test: Extract HTTP Request configs
const configs = await extractor.extractConfigsForNode(
'n8n-nodes-base.httpRequest',
{ complexity: 'simple', limit: 5 }
);
// Expected: 5 configs from top templates
// - Simple URL fetch
// - With authentication
// - With custom headers
// - With expressions
// - With error handling
// Test: Extract webhook configs
const webhookConfigs = await extractor.extractConfigsForNode(
'n8n-nodes-base.webhook',
{ limit: 3 }
);
// Expected: 3 configs showing different patterns
// - Basic POST webhook
// - With response node
// - With binary data handling
Before (get_node_for_task):
Target (template-based):
TemplateConfigExtractor serviceget_node_essentials with examplesThe template database provides a vastly superior alternative to hardcoded task templates:
Recommendation: Pivot P0-R3 from "expand task library" to "mine template configurations"