packages/@n8n/ai-workflow-builder.ee/evaluations/programmatic/python/CONFIGURATION.md
This document provides comprehensive documentation for the workflow comparison configuration format.
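Configuration files can be written in either YAML or JSON format. YAML is recommended for readability and easier maintenance. The configuration controls edit costs, similarity groups, ignore rules, parameter comparison rules, exemptions, connection handling, and output formatting.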
version: "1.0"
name: "my-config"
description: "My custom configuration"
costs:
  nodes:
    insertion: 10.0
    deletion: 10.0
    # ... more costs
similarity_groups:
  triggers:
    - "n8n-nodes-base.webhook"
    - "n8n-nodes-base.manualTrigger"
ignore:
  node_types:
    - "n8n-nodes-base.stickyNote"
parameter_comparison:
  numeric_tolerance:
    - parameter: "options.temperature"
      tolerance: 0.1
output:
  max_edits: 15
{
  "version": "1.0",
  "name": "my-config",
  "description": "My custom configuration",
  "costs": {
    "nodes": {
      "insertion": 10.0,
      "deletion": 10.0
    }
  },
  "similarity_groups": {
    "triggers": [
      "n8n-nodes-base.webhook",
      "n8n-nodes-base.manualTrigger"
    ]
  }
}
version (string, required)
Configuration format version. Currently "1.0".
version: "1.0"
name (string, optional)
A unique identifier for this configuration.
name: "my-custom-config"
description (string, optional)
Human-readable description of what this configuration does.
description: "Strict comparison for production workflows"
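The three metadata fields above could be checked before a configuration is used. The following is a minimal sketch, assuming the file has already been parsed into a dictionary. `validate_metadata` and `SUPPORTED_VERSIONS` are hypothetical names for illustration and are not part of `config_loader`.

```python
# Hypothetical helper: not part of config_loader.
SUPPORTED_VERSIONS = {"1.0"}  # the only version this document mentions


def validate_metadata(config: dict) -> list[str]:
    """Return a list of problems found in the top-level metadata fields."""
    problems = []

    # version is required and must be a string
    version = config.get("version")
    if version is None:
        problems.append("version is required")
    elif not isinstance(version, str):
        problems.append('version must be a string, e.g. "1.0"')
    elif version not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {version}")

    # name and description are optional, but must be strings when present
    for key in ("name", "description"):
        value = config.get(key)
        if value is not None and not isinstance(value, str):
            problems.append(f"{key} must be a string")

    return problems
```

One reason to check the type: in YAML, an unquoted `version: 1.0` parses as a float rather than a string, so keeping the quotes around "1.0" matters.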
The costs section defines penalties for different graph edit operations. These costs directly impact the similarity score.
costs:
  nodes:
    insertion: <float>
    deletion: <float>
    substitution:
      same_type: <float>
      similar_type: <float>
      different_type: <float>
      trigger_mismatch: <float>
  edges:
    insertion: <float>
    deletion: <float>
    substitution: <float>
  parameters:
    mismatch_weight: <float>
    nested_weight: <float>
costs.nodes.insertion (float, default: 10.0)
Cost penalty when a node exists in the ground truth but is missing from the generated workflow.
Use case: Set higher for stricter matching (e.g., 15.0), lower for lenient matching (e.g., 5.0).
costs:
  nodes:
    insertion: 10.0
costs.nodes.deletion (float, default: 10.0)
Cost penalty when a node exists in the generated workflow but not in the ground truth.
Use case: Set higher to penalize extra nodes more severely.
costs:
  nodes:
    deletion: 15.0
costs.nodes.substitution.same_type (float, default: 1.0)
Cost when two nodes have the same type but different parameters.
costs:
  nodes:
    substitution:
      same_type: 1.0
costs.nodes.substitution.similar_type (float, default: 5.0)
Cost when two nodes are in the same similarity group (see Similarity Groups).
Example: Replacing lmChatOpenAi with lmChatAnthropic (both are LLMs).
costs:
  nodes:
    substitution:
      similar_type: 5.0
costs.nodes.substitution.different_type (float, default: 15.0)
Cost when replacing a node with a completely different type.
Example: Replacing httpRequest with webhook.
costs:
  nodes:
    substitution:
      different_type: 15.0
costs.nodes.substitution.trigger_mismatch (float, default: 50.0)
Special high-cost penalty for trigger node mismatches. Triggers are critical to workflow functionality.
Use case: Keep this high (50.0-100.0) to ensure trigger correctness.
costs:
  nodes:
    substitution:
      trigger_mismatch: 50.0
costs.edges.insertion (float, default: 5.0)
Cost for a missing connection between nodes.
costs:
  edges:
    insertion: 5.0
costs.edges.deletion (float, default: 5.0)
Cost for an extra connection that shouldn't exist.
costs:
  edges:
    deletion: 5.0
costs.edges.substitution (float, default: 3.0)
Cost for changing the type or properties of a connection.
costs:
  edges:
    substitution: 3.0
costs.parameters.mismatch_weight (float, default: 0.5)
Weight multiplier for parameter mismatches within a node.
Formula: parameter_cost = base_cost * mismatch_weight * num_mismatches
costs:
  parameters:
    mismatch_weight: 0.5
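The formula above can be worked through with concrete numbers. This is only an arithmetic illustration: the document does not say what `base_cost` is, so using the `same_type` default of 1.0 for it is an assumption.

```python
def parameter_cost(base_cost: float, mismatch_weight: float, num_mismatches: int) -> float:
    """parameter_cost = base_cost * mismatch_weight * num_mismatches"""
    return base_cost * mismatch_weight * num_mismatches


# Assumption: base_cost is 1.0 (the same_type default); 3 parameters differ.
print(parameter_cost(base_cost=1.0, mismatch_weight=0.5, num_mismatches=3))  # 1.5
```

Under that assumption, halving `mismatch_weight` to 0.25 would halve the resulting cost to 0.75 for the same three mismatches.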
costs.parameters.nested_weight (float, default: 0.3)
Weight multiplier for nested/deep parameter differences.
Use case: Set lower to be more forgiving about deep configuration differences.
costs:
  parameters:
    nested_weight: 0.3
Similarity groups define sets of node types that should be considered "similar" rather than "different" when substituted. Nodes within the same group incur the similar_type cost instead of different_type.
similarity_groups:
  <group_name>:
    - "<node_type_1>"
    - "<node_type_2>"
    - "<node_type_3>"
similarity_groups:
  triggers:
    - "n8n-nodes-base.webhook"
    - "n8n-nodes-base.manualTrigger"
    - "n8n-nodes-base.scheduleTrigger"
  ai_llms:
    - "@n8n/n8n-nodes-langchain.lmChatOpenAi"
    - "@n8n/n8n-nodes-langchain.lmChatAnthropic"
    - "@n8n/n8n-nodes-langchain.lmChatOllama"
    - "@n8n/n8n-nodes-langchain.lmChatMistralCloud"
  http_requests:
    - "n8n-nodes-base.httpRequest"
    - "@n8n/n8n-nodes-langchain.toolHttpRequest"
  ai_agents:
    - "n8n-nodes-langchain.agent"
    - "@n8n/n8n-nodes-langchain.agent"
    - "n8n-nodes-langchain.basicAgent"
  ai_tools:
    - "@n8n/n8n-nodes-langchain.toolHttpRequest"
    - "@n8n/n8n-nodes-langchain.toolCalculator"
    - "@n8n/n8n-nodes-langchain.toolCode"
    - "@n8n/n8n-nodes-langchain.toolWorkflow"
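How a similarity group changes the substitution cost can be sketched as follows. This is an illustration of the rule described above, not the comparison tool's actual code: `substitution_cost` is a hypothetical function, and `trigger_mismatch` handling is left out because this section does not specify when it applies.

```python
def substitution_cost(type_a: str, type_b: str, groups: dict, costs: dict) -> float:
    """Pick a substitution cost for replacing a node of type_a with type_b."""
    if type_a == type_b:
        return costs["same_type"]
    # A node type may appear in several groups; any shared group counts.
    for members in groups.values():
        if type_a in members and type_b in members:
            return costs["similar_type"]
    return costs["different_type"]


groups = {
    "ai_llms": [
        "@n8n/n8n-nodes-langchain.lmChatOpenAi",
        "@n8n/n8n-nodes-langchain.lmChatAnthropic",
    ],
}
costs = {"same_type": 1.0, "similar_type": 5.0, "different_type": 15.0}

# Both types are in ai_llms, so the cheaper similar_type cost applies.
print(substitution_cost(
    "@n8n/n8n-nodes-langchain.lmChatOpenAi",
    "@n8n/n8n-nodes-langchain.lmChatAnthropic",
    groups,
    costs,
))  # 5.0
```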
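Ignore rules allow you to exclude certain nodes or parameters from comparison. This is useful for decorative nodes such as sticky notes, UI-specific metadata, and parameters that legitimately vary between environments.
ignore:
  node_types: [...]
  nodes: [...]
  global_parameters: [...]
  node_type_parameters: {...}
  parameter_paths: [...]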
ignore.node_types (list of strings)
Completely ignore nodes of specific types.
Use case: Ignore decorative nodes like sticky notes.
ignore:
  node_types:
    - "n8n-nodes-base.stickyNote"
    - "n8n-nodes-base.comment"
ignore.nodes (list of objects)
Flexible rules for ignoring nodes based on name patterns or other criteria.
Structure:
ignore:
  nodes:
    - pattern: "<regex_pattern>"
      reason: "Why this is ignored"
    - name: "<exact_node_name>"
      reason: "Why this is ignored"
    - node_type: "<node_type>"
      reason: "Why this is ignored"
Example:
ignore:
  nodes:
    - pattern: "^Temp.*"
      reason: "Temporary debugging nodes"
    - name: "Development Only"
      reason: "Used only in development"
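The three rule shapes (pattern, name, node_type) could be evaluated roughly as below. This is a sketch under assumptions: `is_ignored` is a hypothetical helper, nodes are assumed to be dictionaries with `name` and `type` keys as in n8n workflow JSON, and using `re.search` for `pattern` is a guess at the intended regex semantics.

```python
import re


def is_ignored(node: dict, rules: list[dict]) -> bool:
    """Return True if any ignore rule matches the node."""
    name = node.get("name", "")
    for rule in rules:
        # Regex rule on the node name
        if "pattern" in rule and re.search(rule["pattern"], name):
            return True
        # Exact node name
        if "name" in rule and rule["name"] == name:
            return True
        # Exact node type
        if "node_type" in rule and rule["node_type"] == node.get("type"):
            return True
    return False


rules = [
    {"pattern": "^Temp.*", "reason": "Temporary debugging nodes"},
    {"name": "Development Only", "reason": "Used only in development"},
]
print(is_ignored({"name": "Temp fetch", "type": "n8n-nodes-base.httpRequest"}, rules))  # True
print(is_ignored({"name": "Send email", "type": "n8n-nodes-base.emailSend"}, rules))    # False
```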
ignore.global_parameters (list of strings)
Parameter names to ignore across all node types.
Common use case: Ignore UI-specific metadata.
ignore:
  global_parameters:
    - "position"
    - "id"
    - "notes"
    - "notesInFlow"
    - "color"
    - "disabled"
ignore.node_type_parameters (object)
Parameters to ignore for specific node types.
Structure:
ignore:
  node_type_parameters:
    "<node_type>":
      - "<parameter_path_1>"
      - "<parameter_path_2>"
Example:
ignore:
  node_type_parameters:
    "@n8n/n8n-nodes-langchain.agent":
      - "options.systemMessage"  # Allow different prompts
      - "options.maxIterations"  # Allow iteration variance
    "n8n-nodes-base.httpRequest":
      - "options.timeout"  # Timeout can vary by environment
ignore.parameter_paths (list of strings)
Ignore parameters using path patterns. Supports wildcards:
* - matches any single path segment
** - matches any number of path segments
Example:
ignore:
  parameter_paths:
    - "options.*.timeout"  # Ignore timeout in any option
    - "**.temperature"  # Ignore temperature at any nesting level
    - "options.advanced.**"  # Ignore all advanced options
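The wildcard rules can be illustrated with a small matcher over dot-separated path segments. This is a sketch, not the tool's implementation: `path_matches` is a hypothetical name, and treating `**` as "zero or more segments" is an assumption (the text only says "any number").

```python
def path_matches(pattern: str, path: str) -> bool:
    """Match a dotted parameter path against a pattern with * and ** wildcards."""
    return _match(pattern.split("."), path.split("."))


def _match(pat: list[str], segs: list[str]) -> bool:
    if not pat:
        # Pattern exhausted: match only if the path is exhausted too
        return not segs
    head, rest = pat[0], pat[1:]
    if head == "**":
        # Assumption: ** may consume zero or more segments
        return any(_match(rest, segs[i:]) for i in range(len(segs) + 1))
    if not segs:
        return False
    if head == "*" or head == segs[0]:
        # * consumes exactly one segment; a literal must match exactly
        return _match(rest, segs[1:])
    return False


print(path_matches("options.*.timeout", "options.http.timeout"))         # True
print(path_matches("options.*.timeout", "options.timeout"))              # False
print(path_matches("**.temperature", "options.llm.temperature"))         # True
print(path_matches("options.advanced.**", "options.advanced.retry.max")) # True
```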
Parameter comparison rules allow for flexible matching of specific parameters, such as numeric tolerance or semantic similarity.
parameter_comparison:
  fuzzy_match: [...]
  numeric_tolerance: [...]
fuzzy_match rules are for semantic or approximate text matching.
Structure:
parameter_comparison:
  fuzzy_match:
    - parameter: "<parameter_path>"
      type: "semantic"
      threshold: <float>
      cost_if_below: <float>
      options:
        <key>: <value>
Example:
parameter_comparison:
  fuzzy_match:
    - parameter: "options.systemMessage"
      type: "semantic"
      threshold: 0.8
      cost_if_below: 3.0
      options:
        model: "sentence-transformers"
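The threshold logic of a fuzzy_match rule might look like the sketch below. To keep it dependency-free, it swaps in `difflib.SequenceMatcher` from the standard library, which measures character-level similarity and is not a semantic model. `fuzzy_cost` is a hypothetical function, and reading `cost_if_below` as "added when similarity falls below `threshold`" is an assumption based on the field names.

```python
from difflib import SequenceMatcher


def fuzzy_cost(expected: str, actual: str, threshold: float, cost_if_below: float) -> float:
    """Return 0.0 when the texts are similar enough, otherwise cost_if_below."""
    # Stand-in similarity in [0, 1]; a real semantic rule would use embeddings.
    similarity = SequenceMatcher(None, expected, actual).ratio()
    return 0.0 if similarity >= threshold else cost_if_below


same = "You are a helpful assistant."
print(fuzzy_cost(same, same, threshold=0.8, cost_if_below=3.0))    # 0.0
print(fuzzy_cost("abc", "xyz", threshold=0.8, cost_if_below=3.0))  # 3.0
```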
numeric_tolerance rules are for numeric parameters that should be "close enough" rather than exact.
Structure:
parameter_comparison:
  numeric_tolerance:
    - parameter: "<parameter_path>"
      tolerance: <float>
      cost_if_exceeded: <float>
Example:
parameter_comparison:
  numeric_tolerance:
    - parameter: "options.temperature"
      tolerance: 0.1
      cost_if_exceeded: 2.0
    - parameter: "options.maxTokens"
      tolerance: 100
      cost_if_exceeded: 1.0
    - parameter: "options.topP"
      tolerance: 0.05
      cost_if_exceeded: 1.5
How it works:
If |value1 - value2| <= tolerance, the parameters are considered equal (no cost).
If |value1 - value2| > tolerance, cost_if_exceeded is added to the edit cost.
Parameter paths support wildcards:
parameter_comparison:
  numeric_tolerance:
    - parameter: "options.*.temperature"
      tolerance: 0.1
      cost_if_exceeded: 2.0
This applies to options.llm.temperature, options.model.temperature, etc.
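The tolerance rule reduces to a single comparison. A minimal sketch, with `tolerance_cost` as a hypothetical function name:

```python
def tolerance_cost(value1: float, value2: float, tolerance: float, cost_if_exceeded: float) -> float:
    """Return 0.0 if the values are within tolerance, else cost_if_exceeded."""
    return 0.0 if abs(value1 - value2) <= tolerance else cost_if_exceeded


# options.temperature with tolerance 0.1
print(tolerance_cost(0.7, 0.75, tolerance=0.1, cost_if_exceeded=2.0))  # 0.0
print(tolerance_cost(0.2, 0.9, tolerance=0.1, cost_if_exceeded=2.0))   # 2.0

# options.maxTokens with tolerance 100
print(tolerance_cost(1000, 1100, tolerance=100, cost_if_exceeded=1.0))  # 0.0
print(tolerance_cost(1000, 1101, tolerance=100, cost_if_exceeded=1.0))  # 1.0
```

With float parameters, differences that land exactly on the tolerance can be affected by floating-point rounding, so it may be worth choosing tolerances slightly wider than the largest difference you want to accept.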
Exemptions reduce penalties for certain nodes that are optional or conditionally required.
exemptions:
  optional_in_generated: [...]
  optional_in_ground_truth: [...]
exemptions.optional_in_generated (list of objects)
Nodes that can be missing from the generated workflow without full penalty.
Use case: Ground truth has optional nodes that aren't critical.
Structure:
exemptions:
  optional_in_generated:
    - name_pattern: "<regex>"
      penalty: <float>
      reason: "Why this is optional"
    - node_type: "<node_type>"
      penalty: <float>
      when:
        <condition_key>: <condition_value>
Example:
exemptions:
  optional_in_generated:
    - node_type: "@n8n/n8n-nodes-langchain.memoryBufferWindow"
      penalty: 2.0
      reason: "Memory is optional for simple workflows"
    - name_pattern: ".*Debug.*"
      penalty: 1.0
      reason: "Debug nodes are optional in production"
exemptions.optional_in_ground_truth (list of objects)
Nodes that can exist in the generated workflow as extras without full penalty.
Use case: Generated workflow includes helpful but non-essential nodes.
Example:
exemptions:
  optional_in_ground_truth:
    - node_type: "n8n-nodes-base.set"
      penalty: 3.0
      reason: "Set nodes for data transformation are okay to add"
    - node_type: "@n8n/n8n-nodes-langchain.toolCalculator"
      penalty: 2.0
      reason: "Extra tools are acceptable"
Use the when clause to apply exemptions conditionally:
exemptions:
  optional_in_generated:
    - node_type: "n8n-nodes-base.errorTrigger"
      penalty: 1.0
      when:
        disabled: true
      reason: "Disabled error handlers are optional"
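One way an exemption could replace the full insertion cost is sketched below. The names and matching details are assumptions for illustration: `missing_node_penalty` is hypothetical, nodes are assumed to be dictionaries with `name` and `type` keys, and `when` conditions are assumed to compare against top-level node attributes.

```python
import re


def missing_node_penalty(node: dict, exemptions: list[dict], default_cost: float) -> float:
    """Return the reduced penalty of the first matching exemption, else default_cost."""
    for rule in exemptions:
        if "node_type" not in rule and "name_pattern" not in rule:
            continue  # a rule with no selector matches nothing in this sketch
        if "node_type" in rule and rule["node_type"] != node.get("type"):
            continue
        if "name_pattern" in rule and not re.search(rule["name_pattern"], node.get("name", "")):
            continue
        # Assumption: every key in `when` must equal the node's attribute
        when = rule.get("when", {})
        if any(node.get(key) != value for key, value in when.items()):
            continue
        return rule["penalty"]
    return default_cost


exemptions = [
    {"node_type": "n8n-nodes-base.errorTrigger", "penalty": 1.0, "when": {"disabled": True}},
]
disabled = {"name": "On error", "type": "n8n-nodes-base.errorTrigger", "disabled": True}
enabled = {"name": "On error", "type": "n8n-nodes-base.errorTrigger"}
print(missing_node_penalty(disabled, exemptions, default_cost=10.0))  # 1.0
print(missing_node_penalty(enabled, exemptions, default_cost=10.0))   # 10.0
```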
The connections section defines rules for handling workflow connections (edges).
connections:
  ignore_connection_types: [...]
  equivalent_types: [...]
connections.ignore_connection_types (list of strings)
Connection types to completely ignore during comparison.
connections:
  ignore_connection_types:
    - "main"  # Ignore main data flow connections
connections.equivalent_types (list of lists)
Define groups of connection types that should be treated as equivalent.
Example:
connections:
  equivalent_types:
    - ["main", "ai"]
    - ["error", "fallback"]
This means:
main and ai connections are interchangeable
error and fallback connections are interchangeable
The output section controls how results are formatted and presented.
output:
  max_edits: <integer>
  group_by: "<grouping_strategy>"
  include_explanations: <boolean>
  include_suggestions: <boolean>
output.max_edits (integer, default: 15)
Maximum number of edit operations to return in the results.
output:
  max_edits: 15
output.group_by (string, default: "priority")
How to group edit operations in the output.
Options:
"priority": Group by priority (critical, major, minor)
"type": Group by edit type (node, edge, parameter)
"cost": Order by cost (highest first)
output:
  group_by: "priority"
output.include_explanations (boolean, default: true)
Include detailed explanations for each edit operation.
output:
  include_explanations: true
output.include_suggestions (boolean, default: true)
Include suggestions for how to fix issues.
output:
  include_suggestions: true
For production workflows where exact matching is critical:
version: "1.0"
name: "production-strict"
description: "Strict matching for production workflows"
costs:
  nodes:
    insertion: 20.0
    deletion: 20.0
    substitution:
      same_type: 0.5
      similar_type: 10.0
      different_type: 30.0
      trigger_mismatch: 100.0
  edges:
    insertion: 10.0
    deletion: 10.0
    substitution: 5.0
  parameters:
    mismatch_weight: 1.0
    nested_weight: 0.8
similarity_groups:
  triggers:
    - "n8n-nodes-base.webhook"
    - "n8n-nodes-base.scheduleTrigger"
ignore:
  node_types:
    - "n8n-nodes-base.stickyNote"
  global_parameters:
    - "position"
    - "id"
parameter_comparison:
  numeric_tolerance:
    - parameter: "options.temperature"
      tolerance: 0.05
      cost_if_exceeded: 5.0
output:
  max_edits: 20
  group_by: "priority"
  include_explanations: true
  include_suggestions: true
For development workflows where flexibility is needed:
version: "1.0"
name: "development-lenient"
description: "Lenient matching for development and testing"
costs:
  nodes:
    insertion: 5.0
    deletion: 5.0
    substitution:
      same_type: 1.0
      similar_type: 3.0
      different_type: 8.0
      trigger_mismatch: 20.0
  edges:
    insertion: 2.0
    deletion: 2.0
    substitution: 1.0
  parameters:
    mismatch_weight: 0.3
    nested_weight: 0.1
similarity_groups:
  ai_llms:
    - "@n8n/n8n-nodes-langchain.lmChatOpenAi"
    - "@n8n/n8n-nodes-langchain.lmChatAnthropic"
    - "@n8n/n8n-nodes-langchain.lmChatOllama"
  ai_tools:
    - "@n8n/n8n-nodes-langchain.toolHttpRequest"
    - "@n8n/n8n-nodes-langchain.toolCalculator"
    - "@n8n/n8n-nodes-langchain.toolCode"
ignore:
  node_types:
    - "n8n-nodes-base.stickyNote"
  global_parameters:
    - "position"
    - "id"
    - "notes"
    - "notesInFlow"
    - "color"
    - "disabled"
  node_type_parameters:
    "@n8n/n8n-nodes-langchain.agent":
      - "options.systemMessage"
      - "options.maxIterations"
parameter_comparison:
  numeric_tolerance:
    - parameter: "options.temperature"
      tolerance: 0.2
      cost_if_exceeded: 1.0
    - parameter: "options.maxTokens"
      tolerance: 500
      cost_if_exceeded: 0.5
exemptions:
  optional_in_generated:
    - node_type: "@n8n/n8n-nodes-langchain.memoryBufferWindow"
      penalty: 1.0
      reason: "Memory is optional"
  optional_in_ground_truth:
    - node_type: "n8n-nodes-base.set"
      penalty: 2.0
      reason: "Data transformation nodes are okay to add"
output:
  max_edits: 10
  group_by: "priority"
  include_explanations: true
  include_suggestions: false
Optimized for AI workflow comparisons:
version: "1.0"
name: "ai-workflows"
description: "Specialized configuration for AI agent workflows"
costs:
  nodes:
    insertion: 10.0
    deletion: 10.0
    substitution:
      same_type: 1.0
      similar_type: 4.0
      different_type: 15.0
      trigger_mismatch: 50.0
  edges:
    insertion: 5.0
    deletion: 5.0
    substitution: 3.0
  parameters:
    mismatch_weight: 0.4
    nested_weight: 0.2
similarity_groups:
  ai_agents:
    - "n8n-nodes-langchain.agent"
    - "@n8n/n8n-nodes-langchain.agent"
    - "n8n-nodes-langchain.basicAgent"
  ai_llms:
    - "@n8n/n8n-nodes-langchain.lmChatOpenAi"
    - "@n8n/n8n-nodes-langchain.lmChatAnthropic"
    - "@n8n/n8n-nodes-langchain.lmChatOllama"
    - "@n8n/n8n-nodes-langchain.lmChatMistralCloud"
    - "@n8n/n8n-nodes-langchain.lmChatAws"
  ai_tools:
    - "@n8n/n8n-nodes-langchain.toolHttpRequest"
    - "@n8n/n8n-nodes-langchain.toolCalculator"
    - "@n8n/n8n-nodes-langchain.toolCode"
    - "@n8n/n8n-nodes-langchain.toolWorkflow"
  memory_types:
    - "@n8n/n8n-nodes-langchain.memoryBufferWindow"
    - "@n8n/n8n-nodes-langchain.memoryConversation"
ignore:
  node_types:
    - "n8n-nodes-base.stickyNote"
  global_parameters:
    - "position"
    - "id"
    - "notes"
    - "color"
  node_type_parameters:
    "@n8n/n8n-nodes-langchain.agent":
      - "options.systemMessage"  # Prompts can legitimately vary
    "@n8n/n8n-nodes-langchain.lmChatOpenAi":
      - "options.modelName"  # Different models okay
    "@n8n/n8n-nodes-langchain.lmChatAnthropic":
      - "options.modelName"
parameter_comparison:
  numeric_tolerance:
    - parameter: "**.temperature"
      tolerance: 0.15
      cost_if_exceeded: 2.0
    - parameter: "**.maxTokens"
      tolerance: 200
      cost_if_exceeded: 1.0
    - parameter: "**.topP"
      tolerance: 0.1
      cost_if_exceeded: 1.5
exemptions:
  optional_in_generated:
    - node_type: "@n8n/n8n-nodes-langchain.memoryBufferWindow"
      penalty: 2.0
      reason: "Memory is optional for stateless workflows"
    - node_type: "@n8n/n8n-nodes-langchain.toolCalculator"
      penalty: 3.0
      reason: "Calculator tool is optional"
  optional_in_ground_truth:
    - node_type: "@n8n/n8n-nodes-langchain.toolCode"
      penalty: 2.0
      reason: "Additional code tools are acceptable"
output:
  max_edits: 15
  group_by: "priority"
  include_explanations: true
  include_suggestions: true
# Python CLI
uvx --from . python -m compare_workflows workflow1.json workflow2.json --preset standard
# Python API
from config_loader import load_config
config = load_config("preset:standard")
# Python CLI
uvx --from . python -m compare_workflows workflow1.json workflow2.json --config my-config.yaml
# Python API
from config_loader import load_config
config = load_config("/path/to/my-config.yaml")
from config_loader import WorkflowComparisonConfig
# Create from dictionary
config_dict = {
    "version": "1.0",
    "name": "custom",
    "costs": {
        "nodes": {
            "insertion": 12.0
        }
    }
}
config = WorkflowComparisonConfig._from_dict(config_dict)
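The dictionary above sets only one cost. Whether omitted keys fall back to defaults is not stated here, so if a complete dictionary is needed, partial overrides could be deep-merged onto a full set of defaults first. The sketch below uses a hypothetical `deep_merge` helper and a shortened `defaults` dictionary. Neither comes from `config_loader`.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override applied on top of base, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shortened defaults for illustration; values taken from the documented defaults.
defaults = {
    "version": "1.0",
    "costs": {"nodes": {"insertion": 10.0, "deletion": 10.0}},
}
overrides = {"name": "custom", "costs": {"nodes": {"insertion": 12.0}}}

merged = deep_merge(defaults, overrides)
print(merged["costs"]["nodes"])  # {'insertion': 12.0, 'deletion': 10.0}
```

Because the merge works on a copy, the `defaults` dictionary is left unchanged and can be reused for other configurations.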
Start with a preset: Begin with standard, strict, or lenient and customize from there.
Test iteratively: Make small changes and test to understand the impact on similarity scores.
Use similarity groups: Group related node types to avoid harsh penalties for equivalent substitutions.
Ignore UI elements: Always ignore cosmetic parameters like position, id, color, etc.
Set appropriate tolerances: Use numeric tolerances for parameters that shouldn't need exact matches (e.g., temperature, maxTokens).
Document your changes: Use the description field and comments to explain why you made specific choices.
Version control: Keep configuration files in version control alongside your workflows.
Environment-specific configs: Create different configurations for development, testing, and production environments.