docs/ref/modules/engine/README.md
The engine is responsible for transforming raw data into standardized schema documents, enriching it with threat intelligence, and forwarding it to designated destinations.
The data flow begins when an event enters the orchestrator and continues until it is processed by the security policy. Below is a high-level flowchart illustrating this process.
flowchart LR
classDef EventBoxClass font-size: 15px,stroke-width:2px, color:#fff, fill:#3f51b5
%% Router Table
subgraph routerTable["Router Table"]
direction TB
routeC
routeB
routeA
subgraph routeA["Route prod"]
direction LR
policyA("Security Policy")
end
subgraph routeB["Route QA"]
direction LR
policyB("Security Policy")
end
subgraph routeC["Route Dev"]
direction LR
policyC("Security Policy")
end
end
%% Orchestrator
subgraph orchestrator["Orchestrator (Simplified)"]
direction LR
%% Router Table
routeSelector("Route</br>selector")
routerTable
end
%% Routes
eventA@{ shape: doc, label: "Incoming</br>event" }
eventA:::EventBoxClass
eventA-->routeSelector
routeSelector-.->policyA & policyB & policyC
To understand how the engine is structured, it's important to identify the key components involved in this process. When a new event arrives, the engine directs it to different policies for processing. The orchestrator manages these policies at runtime.
The orchestrator routes events to a policy and is composed of the following elements:
A policy defines the processing pipeline of the events and is composed of:
Each policy can be tailored to specific use cases.
The purpose of the Engine is to convert unstructured or semi-structured logs into normalized and enriched events. The agent transmits logs within a JSON payload, which includes additional metadata such as OS information, log source, and other relevant details. The Engine processes these logs and generates a structured JSON event, incorporating all relevant information in accordance with the defined schema.
Input event example:
{
"@timestamp": "2025-01-23T17:40:37Z",
"agent": {
"groups": [
"group1",
"group2"
],
"host": {
"architecture": "x86_64",
"hostname": "wazuh-endpoint-linux",
"ip": [
"192.168.1.2"
],
"os": {
"name": "Amazon Linux 2",
"platform": "Linux"
}
},
"id": "2887e1cf-9bf2-431a-b066-a46860080f56",
"name": "wazuh-agent-name",
"type": "endpoint",
"version": "5.0.0"
},
"event": {
"collector": "file",
"module": "logcollector",
"original": "Dec 13 11:35:28 a-mac-with-esc-key GoogleSoftwareUpdateAgent[21412]: 2016-12-13 11:35:28.421 GoogleSoftwareUpdateAgent[21412/0x700007399000] [lvl=2] -[KSUpdateEngine updateAllExceptProduct:] KSUpdateEngine updating all installed products, except:'com.google.Keystone'."
},
"log": {
"file": {
"path": "/var/log/syslog.log"
}
},
}
Processed event:
{
"@timestamp": "2025-01-23T17:40:37Z",
"agent": {
"groups": [
"group1",
"group2"
],
"host": {
"architecture": "x86_64",
"hostname": "wazuh-endpoint-linux",
"ip": [
"192.168.1.2"
],
"os": {
"name": "Amazon Linux 2",
"platform": "Linux"
}
},
"id": "2887e1cf-9bf2-431a-b066-a46860080f56",
"name": "wazuh-agent-name",
"type": "endpoint",
"version": "5.0.0"
},
"event": {
"collector": "file",
"created": "2024-11-22T02:00:00Z",
"kind": "event",
"module": "logcollector",
"original": "Dec 13 11:35:28 a-mac-with-esc-key GoogleSoftwareUpdateAgent[21412]: 2016-12-13 11:35:28.421 GoogleSoftwareUpdateAgent[21412/0x700007399000] [lvl=2] -[KSUpdateEngine updateAllExceptProduct:] KSUpdateEngine updating all installed products, except:'com.google.Keystone'.",
"start": "2025-12-13T11:35:28.000Z"
},
"host": {
"hostname": "a-mac-with-esc-key"
},
"log": {
"file": {
"path": "/var/log/syslog.log"
}
},
"message": "2016-12-13 11:35:28.421 GoogleSoftwareUpdateAgent[21412/0x700007399000] [lvl=2] -[KSUpdateEngine updateAllExceptProduct:] KSUpdateEngine updating all installed products, except:'com.google.Keystone'.",
"process": {
"name": "GoogleSoftwareUpdateAgent",
"pid": 21412
},
"related": {
"hosts": [
"a-mac-with-esc-key"
]
},
"tags": [
"production-server"
],
"wazuh": {
"decoders": [
"syslog"
]
}
}
The policy is the operational graph applied to each event, structured into decoders, rules, and outputs, each related to normalizing, enriching, and delivery respectively.
---
title: Security policy dataflow
---
flowchart LR
classDef EventBoxClass font-size: 15px,stroke-width:2px, color:#fff, fill:#3f51b5
classDef TreeBoxClass font-size: 15px,stroke-width:2px,stroke-dasharray: 5 5
subgraph decoTree["Decoders"]
direction TB
deco01(" ")
deco02(" ")
deco03(" ")
deco04(" ")
deco05(" ")
deco06(" ")
deco07(" ")
deco08(" ")
deco01 --> deco02 & deco03 & deco04
deco02 --> deco05
deco03 --> deco06 & deco07
deco04 --> deco08
end
subgraph ruleTree["Rules"]
direction TB
rule01(" ")
rule02(" ")
rule03(" ")
rule04(" ")
rule05(" ")
rule06(" ")
rule07(" ")
rule08(" ")
rule01 --> rule02 & rule03 & rule04
rule02 --> rule05
rule03 --> rule06 & rule07
rule04 --> rule08
end
subgraph outputTree["Outputs"]
direction TB
output01(" ")
output02(" ")
output03(" ")
output04(" ")
output05(" ")
output06(" ")
output07(" ")
output08(" ")
output01 --> output02 & output03 & output04
output02 --> output05
output03 --> output06 & output07
output04 --> output08
end
decoTree:::TreeBoxClass
ruleTree:::TreeBoxClass
outputTree:::TreeBoxClass
eventInput:::EventBoxClass
eventOutput:::EventBoxClass
%% Pipeline
eventInput@{shape: doc, label: "Event</br>Input"}==>decoTree==>ruleTree==>outputTree==>eventOutput@{shape: doc, label: "Enriched</br>Event"}
Wazuh comes with a predefined policy that enables all its components to work properly and it is structured on top of Wazuh-supported log sources.
Each source does have a particular way to format and send logs to the engine. The default policy takes care of that, allowing the users to focus on their integrations and not on the nuances of the logs transports for each source.
graph LR;
subgraph Endpoint
Service["Service"]
WazuhAgent["Wazuh agent"]
Service --- WazuhAgent
end
WazuhAgent -.-> Orchestrator["Orchestrator: Router"]
subgraph WazuhServer["Wazuh server"]
subgraph Engine
Orchestrator --> Route["Route"]
Route --> Decoding["Decoding Stage"]
subgraph SecurityPolicy["Security Policy"]
Decoding --> Rule["Rule Stage"]
Rule --> OutputStage["Output Stage"]
end
end
end
The decoding process converts unstructured data received by the engine into schema-based JSON events.
All events enter the pipeline through the root decoder, which determines the appropriate decoder for processing. Each subsequent decoder processes the event as much as possible before passing it to the next suitable decoder. This continues until no further processing can be performed.
A closer examination of the predefined decoders reveals the following structure:
---
title: Decoder tree
---
flowchart TD
%% Style
classDef AssetSuccessClass fill:#2196f3,stroke-width:2px,fill-opacity:0.8
classDef AssetFailClass fill:#f50057,stroke-width:2px,fill-opacity:0.8
classDef AssetNotExecutedClass fill:#90a4ae,stroke-width:2px,fill-opacity:0.8
%% First Level
decoderR("root decoder"):::AssetSuccessClass
decoderR --x decoder1
decoderR --> decoder2 --> decoder21
decoderR -.-> decoder3 -.-> decoder31
decoder1("decoder 1"):::AssetFailClass
decoder2("decoder 2"):::AssetSuccessClass
decoder1 -.-> decoder11 & decoder12
decoder2 -.-> decoder22
decoder11("decoder 1-1"):::AssetNotExecutedClass
decoder12("decoder 1-2"):::AssetNotExecutedClass
decoder21("decoder 2-1"):::AssetSuccessClass
decoder22("decoder 2-2"):::AssetNotExecutedClass
decoder3("decoder 3"):::AssetNotExecutedClass
decoder31("decoder 3-1"):::AssetNotExecutedClass
linkStyle 0 stroke:#f50057,stroke-width:2px
The event is evaluated by a decoder to determine if it matches the conditions defined within the decoder. If the decoder rejects the event, it is passed to the next sibling decoder within the same hierarchy for evaluation. This process continues until a decoder accepts the event or no more sibling decoders are available.
When a decoder accepts an event, it may modify the event by normalizing or enriching its data. After this, the event is passed to the child decoders of the accepted decoder for further processing. Each child decoder evaluates the event using the same logic, ensuring a hierarchical and iterative approach to event processing.
This hierarchical evaluation ensures that events are processed efficiently and routed through the appropriate decoders based on their structure and content.
The following diagram illustrates the event flow on the decoder tree of default policy:
---
title: Event flow on decoder tree
---
flowchart LR
classDef EventBoxClass font-size: 15px,stroke-width:2px, color:#fff, fill:#3f51b5
classDef TreeBoxClass font-size: 15px,stroke-width:2px,stroke-dasharray: 5 5
subgraph decoTree["First layer - Internal decoders"]
direction TB
deco01(" ")
deco02(" ")
deco03(" ")
deco04("Integration Decoder")
deco05(" ")
deco06(" ")
deco07(" ")
deco01 --> deco02 & deco03 & deco04
deco02 --> deco05
deco03 --> deco06 & deco07
end
deco04 -..-> decoIntegration["Integration Decoder"]:::TreeBoxClass
eventInput@{shape: doc, label: "Event</br>Input"} ==> decoTree
decoTree:::TreeBoxClass
subgraph userDecoTree["Integrations & User decoders"]
direction TB
userDeco01(" ")
userDeco02(" ")
userDeco03(" ")
userDeco04(" ")
userDeco05(" ")
userDeco06(" ")
userDeco07(" ")
userDeco08(" ")
userDeco01 --> userDeco02 & userDeco03 & userDeco04
userDeco02 --> userDeco05
userDeco03 --> userDeco06 & userDeco07
userDeco04 --> userDeco08
end
%% decoIntegration --> userDecoTree
decoIntegration --> userDeco01
userDecoTree ----> eventOutput@{shape: doc, label: "Normalized</br>event"}
userDecoTree:::TreeBoxClass
eventInput:::EventBoxClass
eventOutput:::EventBoxClass
In the default policy, the first layer is for internal decoders, which are responsible for normalizing events. The second layer is for integrations and user-defined decoders, which are used to process events from specific sources or applications.
The analysis process evaluates all event fields to identify potential security concerns, which are represented as threat indicators within the common schema. These indicators are later stored in the Wazuh Indexer, where they can be used for threat hunting and detecting security issues.
All decoded events pass through the analysis pipeline, starting with the root rule. The root rule determines the next appropriate rule for processing the event. If a rule matches, it triggers all its child rules for evaluation in a broadcast manner. Each child rule is independently evaluated, contributing additional threat indicators to the event's analysis. If a rule does not match, its child rules are not evaluated, ensuring efficient processing.
This hierarchical and broadcast-based evaluation allows the analysis pipeline to enrich events with relevant security context while maintaining performance and scalability.
---
title: Rules tree
---
flowchart TD
%% Style
classDef AssetSuccessClass fill:#2196f3,stroke-width:2px,fill-opacity:0.8
classDef AssetFailClass fill:#f50057,stroke-width:2px,fill-opacity:0.8
classDef AssetNotExecutedClass fill:#90a4ae,stroke-width:2px,fill-opacity:0.8
ruleR("root rule") --x rule1("rule 1")
rule1 -.-> rule11("rule 1-1") & rule12("rule 1-2")
ruleR --> rule2("rule 2")
rule2 --> rule21("rule 2-1")
rule2 --x rule22("rule 2-2")
rule2 --> rule23("rule 2-3")
ruleR --> rule3("rule 3")
rule3 --> rule31("rule 3-1")
ruleR:::AssetSuccessClass
rule1:::AssetFailClass
rule11:::AssetNotExecutedClass
rule12:::AssetNotExecutedClass
rule2:::AssetSuccessClass
rule21:::AssetSuccessClass
rule22:::AssetFailClass
rule23:::AssetSuccessClass
rule3:::AssetSuccessClass
rule31:::AssetSuccessClass
linkStyle 0,5 stroke:#f50057,stroke-width:2px
The following diagram illustrates the event flow on the rules tree of the default policy:
---
title: Event flow on rules
---
flowchart LR
classDef EventBoxClass font-size: 15px,stroke-width:2px, color:#fff, fill:#3f51b5
classDef TreeBoxClass font-size: 15px,stroke-width:2px,stroke-dasharray: 5 5
subgraph wazuhRulesTree["Wazuh Rules"]
direction TB
wazuhRules01(" ")
wazuhRules02(" ")
wazuhRules03(" ")
wazuhRules04(" ")
wazuhRules05(" ")
wazuhRules06(" ")
wazuhRules07(" ")
wazuhRules08(" ")
wazuhRules01 --> wazuhRules02 & wazuhRules03 & wazuhRules04
wazuhRules02 --> wazuhRules05
wazuhRules03 --> wazuhRules06 & wazuhRules07
wazuhRules04 --> wazuhRules08
end
subgraph userRulesTree["User rules"]
direction TB
userRules01(" ")
userRules02(" ")
userRules03(" ")
userRules04(" ")
userRules05(" ")
userRules06(" ")
userRules07(" ")
userRules08(" ")
userRules01 --> userRules02 & userRules03 & userRules04
userRules02 --> userRules05
userRules03 --> userRules06 & userRules07
userRules04 --> userRules08
end
wazuhRulesTree:::TreeBoxClass
userRulesTree:::TreeBoxClass
eventInput:::EventBoxClass
eventOutput:::EventBoxClass
%% Pipeline
eventInput@{shape: doc, label: "Normalized</br>Event"}==>wazuhRulesTree & userRulesTree-.->eventOutput@{shape: doc, label: "Security</br>event"}
The analysis pipeline is divided into two layers:
Then both the Wazuh and user rules are applied to the event.
Once an event has completed processing through the decoder and rule pipelines, it enters the output pipeline. Similar to previous stages, the event first passes through the root output, which determines the appropriate output(s) for further processing. Multiple outputs can be selected, enabling flexible storage and distribution policies.
The output process in Wazuh is designed to efficiently distribute alerts through broadcasting, with each output capable of filtering alerts to support customized distribution:
---
title: Event flow on outputs
---
flowchart TD
outputR --> output1("Indexer alert output") & output2("File alerts output")
outputR("Broadcaster output") --x output3("File archive output")
outputR("Broadcaster output") --x output4("Other output")
outputR:::AssetSuccessClass
output1:::AssetSuccessClass
output2:::AssetSuccessClass
output3:::AssetFailClass
output4:::AssetSuccessClass
classDef AssetSuccessClass fill:#3f51b5,stroke-width:2px,fill-opacity:0.5
classDef AssetFailClass fill:#f44336,stroke-width:2px,fill-opacity:0.5
classDef AssetNotExecutedClass fill:#9e9e9e,stroke-width:2px,fill-opacity:0.5
linkStyle 2 stroke:#D50000,fill:none
The following diagram illustrates the full pipeline of the default policy, including the decoding, rule, and output stages:
flowchart TD
classDef EventBoxClass font-size: 15px,stroke-width:2px, color:#fff, fill:#3f51b5
classDef TreeBoxClass font-size: 15px,stroke-width:2px,stroke-dasharray: 5 5
classDef ModuleArchClass fill:#673ab7,stroke-width:2px,fill-opacity:0.5, font-size: 20px
classDef SubModuleArchClass fill:#673ab7,stroke-width:2px,fill-opacity:0.5, font-size: 15px
%% --------------------------------------
%% Decoding Stage
%% --------------------------------------
subgraph decoTree["First layer - Internal decoders"]
direction TB
decoInputRoot(" ")
deco02(" ")
deco03(" ")
integrationDecoder("Integration Decoder")
deco05(" ")
deco06(" ")
deco07(" ")
decoInputRoot --> deco02 & deco03 & integrationDecoder
deco02 --> deco05
deco03 --> deco06 & deco07
end
integrationDecoder -..-> userDecoRoot:::TreeBoxClass
subgraph userDecoTree["Integrations & User decoders"]
direction TB
userDecoRoot(" ")
userDeco02(" ")
userDeco03(" ")
userDeco04(" ")
userDeco05(" ")
userDeco06(" ")
userDeco07(" ")
userDeco08(" ")
userDecoRoot --> userDeco02 & userDeco03 & userDeco04
userDeco02 --> userDeco05
userDeco03 --> userDeco06 & userDeco07
userDeco04 --> userDeco08
end
%% Stage block
subgraph decoderStage["Decoding Stage"]
decoTree:::TreeBoxClass
userDecoTree:::TreeBoxClass
end
%% Output decoder stage
eventNormalized@{shape: doc, label: "Normalized</br>event"}
eventNormalized:::EventBoxClass
%% Pipieline
routeSelector ==> decoInputRoot
userDecoTree ====> eventNormalized
%% --------------------------------------
%% Rules Stage
%% --------------------------------------
subgraph wazuhRulesTree["Wazuh Rules"]
direction TB
wazuhRules01(" ")
wazuhRules02(" ")
wazuhRules03(" ")
wazuhRules04(" ")
wazuhRules05(" ")
wazuhRules06(" ")
wazuhRules07(" ")
wazuhRules08(" ")
wazuhRules01 --> wazuhRules02 & wazuhRules03 & wazuhRules04
wazuhRules02 --> wazuhRules05
wazuhRules03 --> wazuhRules06 & wazuhRules07
wazuhRules04 --> wazuhRules08
end
subgraph userRulesTree["User rules"]
direction TB
userRules01(" ")
userRules02(" ")
userRules03(" ")
userRules04(" ")
userRules05(" ")
userRules06(" ")
userRules07(" ")
userRules08(" ")
userRules01 --> userRules02 & userRules03 & userRules04
userRules02 --> userRules05
userRules03 --> userRules06 & userRules07
userRules04 --> userRules08
end
subgraph ruleStage["Rules Stage"]
wazuhRulesTree:::TreeBoxClass
userRulesTree:::TreeBoxClass
end
%% Output stage rules
securityEvent@{shape: doc, label: "Security</br>event"}
securityEvent:::EventBoxClass
%% Pipieline
eventNormalized==>wazuhRulesTree & userRulesTree-.->securityEvent
%% --------------------------------------
%% Output Stage
%% --------------------------------------
subgraph outputTree["Outputs"]
direction TB
output01(" ")
output02(" ")
output03(" ")
output04(" ")
output05(" ")
output06(" ")
output07(" ")
output08(" ")
output01 --> output02 & output03 & output04
output02 --> output05
output03 --> output06 & output07
output04 --> output08
end
outputTree:::TreeBoxClass
%% Pipieline output
securityEvent ==> outputTree
%% --------------------------------------
%% Default Policy
%% --------------------------------------
subgraph defaultPolicy["Default policy"]
decoderStage
eventNormalized
ruleStage
securityEvent
outputTree
end
defaultPolicy:::SubModuleArchClass
%% --------------------------------------
%% Engine
%% --------------------------------------
%% Input Decodeing Stage
eventInput@{shape: doc, label: "Incoming event</br>from endpoint"}
eventInput:::EventBoxClass
subgraph engine["engine"]
defaultPolicy
routeSelector(["Orchestrator: Router (Route selector)"])
end
engine:::ModuleArchClass
eventInput ===> routeSelector
A schema defines a structured format for data, ensuring consistency, interoperability, and efficient querying. It establishes a common set of field names, data types, and relationships that standardize log and event data across different sources.
The Engine ensures that all operations—parsing, normalization, and enrichment—aim to transform unstructured data into structured data that adheres to the schema. This structured approach enhances data integrity, improves search performance, and enables seamless correlation across multiple data sources.
For example, a network event log structured according to the schema might look like this:
{
"event": {
"category": "network",
"type": "connection",
"action": "network_connection"
},
"source": {
"ip": "192.168.1.10",
"port": 443
},
"destination": {
"ip": "10.0.0.5",
"port": 8080
},
"user": {
"name": "admin"
}
}
The schema configuration for the engine follows a structured format where each field is defined with specific attributes. The schema consists of a JSON object with the following key elements:
type: Specifies the OpenSearch field type, such as date, keyword, text, integer, etc.array: A boolean value (true or false) indicating whether the field can store multiple values (i.e., an array) or just a single value.{
"name": "schema/engine-schema/0",
"fields": {
"@timestamp": {
"type": "date",
"array": false
},
"agent.build.original": {
"type": "keyword",
"array": false
},
"agent.ephemeral_id": {
"type": "keyword",
"array": false
},
"agent.id": {
"type": "keyword",
"array": false
}
}
}
Now that we've explored what the Engine does and how its processing works, we’ll introduce key elements involved in managing and defining the operational graph—specifically, routes, policies, and assets.
All management is performed through the API (refer to the API documentation for a complete list of available calls). Before defining the operational graph, all policies and assets must first be loaded into the Engine’s catalog. This ensures that all assets are validated and ready for use before they are referenced in processing.
To organize assets efficiently, the Engine categorizes them into namespaces. Internally, assets are stored directly under a specific namespace, allowing for structured management and role-based segregation of policies.
The default policy asset namespaces in the Engine are:
system – Core assets responsible for handling internal event processing and ensuring basic event normalization.wazuh – Default integrations developed and maintained by Wazuh.user – A default namespace for end-user-defined assets.While these are the predefined namespaces, the Engine allows creating as many namespaces as needed, enabling flexibility in asset management.
The Catalog is responsible for managing the Engine’s assets, organizing them under namespaces. Each asset is uniquely identified by its name, following the convention:
<type>/<name>/<version>
This naming structure ensures clear versioning and categorization of assets. The following asset types are defined:
All API calls to the Catalog support name-path operations, allowing users to manage specific assets or entire groups efficiently. (Refer to the API documentation for a full list of available catalog operations.)
With all assets defined and stored in the Catalog, the next step is to define policies, specifying exactly what functionality we want to apply. A policy organizes assets hierarchically, defining how events are processed.
The API allows users to configure all decoders, rules, and outputs for a policy, along with default management settings—such as defining default asset parents for specific namespaces. (For a complete list of API calls, refer to the documentation.)
Each policy contains references to asset names, and during the building process, the Engine retrieves these assets from the Catalog. The graph is then built following the parent relationships defined in the assets, ensuring a valid structure. If the relationships are invalid or incomplete, the build process will fail, preventing misconfigurations.
Following the same Catalog-first approach, policies are stored before they are actually used. Policies are only referenced when defining routes, ensuring that all assets and relationships are pre-validated before execution.
The Orchestrator is responsible for managing routes that pair namespaces with policies, ensuring that events are processed by the appropriate policies. It also manages loaded policies, routing priority and some event processing configuration.
The Engine also introduces the concept of testing sessions, which are specialized policies designed for processing test events via the API. These sessions allow users to validate how their policies will behave before deploying them in production, ensuring correctness and expected functionality.
For a complete list of API calls related to routing and policy management, refer to the documentation.
The Engine is composed of distinct modules, each responsible for managing a specific aspect of event processing:
All modules follow the same naming convention, ensuring that every item—whether an asset, policy, or route—can be stored and identified homogeneously by the Store module.
For more information on the Engine’s architecture and how the modules interact, refer to architecture documentation.
In the Wazuh Engine, assets represent the fundamental components of security policies and are the smallest unit within such a policy.
Each asset is organized into various stages that dictate operational procedures when processing an event. These stages provide a structured and semantically meaningful sequence of operations, enhancing the engine's capability to execute these operations efficiently based on predefined execution strategies.
Do not confuse stages with attributes, which are configuration details and metadata about the asset.
---
config:
nodeSpacing: 30
rankSpacing: 25
flowchart:
subGraphTitleMargin:
top: 20
bottom: 20
---
flowchart TB
classDef AssetsClass fill:#3f51b5,stroke-width:2px,fill-opacity:0.5, font-size: 20px, min-width:350px
classDef partsAssetsClass font-size:15px,stroke-width:2px,stroke-dasharray:10px,rx:15,ry:15, min-width:250px
classDef AttributesClass min-width: 200px
subgraph assetSchema["Asset Schema"]
direction TB
attributes:::partsAssetsClass ~~~ stages:::partsAssetsClass
end
assetSchema:::AssetsClass
subgraph attributes["Attributes"]
direction LR
attr_1(["     ...     "]):::AttributesClass
attr_2(["     ...     "]):::AttributesClass
attr_3(["     ...     "]):::AttributesClass
attr_4(["     ...     "]):::AttributesClass
end
subgraph stages["Stages"]
direction LR
stage_1(["     ...     "]):::AttributesClass
stage_2(["     ...     "]):::AttributesClass
stage_3(["     ...     "]):::AttributesClass
stage_4(["     ...     "]):::AttributesClass
end
Attributes are configuration details. Although the order of definition does not matter, we follow the convention of defining them in the order of name, metadata, and parents.
<asset_type>/<name>/<version>.The stages define the operation chain and flow the asset performs on events. Each stage is executed in the order of definition:
---
config:
nodeSpacing: 30
rankSpacing: 25
flowchart:
subGraphTitleMargin:
top: 20
bottom: 20
---
flowchart TB
classDef partsAssetsClass font-size:15px,stroke-width:2px,stroke-dasharray:10px,rx:15,ry:15
classDef EventBoxClass font-size: 15px,stroke-width:2px, color:#fff, fill:#3f51b5
classDef stateFailClass fill:#f50057,stroke-width:2px,fill-opacity:0.5
classDef stateSuccessClass fill:#2196f3,stroke-width:2px,fill-opacity:0.8
eventInput@{ shape: doc, label: "Event input"}
eventInput:::EventBoxClass
eventInput ---> stage_1
subgraph stages["Stages"]
direction TB
stage_1(["Process stage 1"])
stage_2(["Process stage 2"])
stage_3(["Process stage ..."])
stage_4(["Process stage n"])
stage_1 --->|success| stage_2
stage_2 --->|success| stage_3
stage_3 --->|success| stage_4
end
stages:::partsAssetsClass
failedState@{ shape: dbl-circ, label: "Asset fail" }
failedState:::stateFailClass
stage_1 --->|failed|failedState
stage_2 --->|failed|failedState
stage_3 --->|failed|failedState
successState@{ shape: dbl-circ, label: "Asset success" }
successState:::stateSuccessClass
stage_4 --->|success| successState
When a stage is executed, it can either fail or succeed, depending on the logic of the stage and the operations performed. Each stage is sequentially executed only if the previous stage succeeds.
Stages:
The type of asset is an allowed combination of certain stages. The following table outlines the stages available for each type of asset:
| Asset Type | Allowed Stages |
|---|---|
| Decoders | check, parse, map, normalize |
| Rules | check, map, normalize_rule |
| Outputs | check, output, first_of |
| Filters | allow |
Each asset has a name and metadata, with custom metadata suited to its specific needs. Additionally, each asset can have parents.
Operations are the fundamental units within the operation graph. Each operation can succeed or fail, forming the basis for defining the graph by combining operations based on their execution results.
Operations are always defined as:
target_field: operation
Where the operation can be:
When building an asset, the process can fail if there is any operation that contains a:
With a basic understanding of a policy and its components, we can look at the global picture of how the operation graph is defined. Within the policy, we have subgraphs, each corresponding to an asset type:
<workflow_placeholder>
Every event traverses each subgraph independently, forming a chain of subgraphs.
Each graph is composed of assets defined by parent relationships. An event moves down in the graph based on the asset's logical output. If an asset operation succeeds, the event is sent to its child assets.
The traversal of each subgraph follows these rules:
An asset is considered successful if it accepts the event, meaning all conditional stages have succeeded, regardless of the outcome of transformational stages. This ensures that events are processed through the appropriate path based on successful acceptance checks.
A stage succeeds if the logical combination of its operations succeeds. The exact combination logic is determined by the stage itself. This ensures that each stage can apply its own logic to decide whether it has successfully processed an event.
<workflow_placeholder>
Implement all the high level operations available to the user when developing the ruleset. Each function defines its signature, its mode of operation and its error management. Users cannot change the behavior of a helper function, and cannot combine two functions into a single expression.
There are two intrinsic operations which do not require additional syntax:
All other operations are accessed through the helper functions.
The syntax for calling a helper function is helper_name(args…). They can be used in both check operations and map operations. Helper functions are classified into three categories:
Check the helper standard library for a complete list of available helper functions.
When using conditional helpers, the syntax is:
target_field: condition_helper(args)
The helper will always test a condition on target_field. If the condition passes, the operation succeeds; otherwise, it fails.
When using mapping helpers, the syntax is:
target_field: map_helper(args)
The helper will always map the target_field if the operation succeeds. If the operation fails, the target_field remains unchanged.
When using transformational helpers, the syntax is:
target_field: transform_helper(args)
The helper will transform the target_field if the operation succeeds. If the operation fails, the target_field remains unchanged.
A transformational helper may fail due to implicit conditions, such as expecting a specific type for the target field, missing a reference passed as an argument, etc.
It is important to understand that every operation can succeed or fail, and this is the foundation for defining the operation graph.
When using a helper function in a map or check operation:
target.field: helper_name(args…)
The arguments for helper_name can be references to other fields, or JSON values (string, number, boolean, array, or object). Multiple arguments should be separated by commas.
When the helper function is built, arguments are parsed in the following order:
'value' starts and ends with a single quote. \ and ' can be escaped.$reference starts with $, followed by alphanumerics plus #, _, @, and -, separated by dots.$, ', ,, ), \, and whitespace.Invalid escape sequences will always fail.
For example:
target.field: helper_name('string', $reference, 123, {"key": "value"})
This call applies helper_name to the target.field with arguments:
'string': A string value.$reference: A reference to another field.123: A numeric value (valid JSON).{"key": "value"}: A JSON object.[!NOTE]
123is a valid json, not only the objects are valid jsons according to the JSON standard, but also the numbers, strings, booleans and null values are valid JSON values.
When parsing a helper function inside a logical check expression the same rules apply adding that at least one argument is expected for the helper, specifying the target field:
check: helper_name($target.field, args…)
Added we can specify comparison helpers as operators:
check: $target.field <op> <value>
Where value is parsed as a single helper argument following the same escaping rules and order.
Where op is any of the following:
==!=<<=>>=When using any operator that is not the equality operator only string or integer values are allowed.
When using the default map or filter functions for string operations, values are parsed according to standard YAML
(or JSON) types. If a value is a string and begins with the reference symbol $, it is treated as a potential
reference. If the reference is invalid the operation building fails.
check:
- target.field: <yaml_type>|<$ref>
map:
- target.field: <yaml_type>|<$ref>
Below are some usage examples:
# Example 1: Simple equality check
check: http.method == "GET"
# Example 2: Comparison with an integer value
check: $event.severity > 3
# Example 3: Using a helper function in check
check: cidr_match($source.ip, "192.168.0.0", 24)
To facilitate the reuse of constructors when building large assets—such as parsing code for events with common headers or repeated constructs like IP/port definitions—definitions can be introduced at the specification level. For example:
definitions:
header: <timestamp> <host.hostname> <daemon>:
source: <source.ip>:<source.port>
destination: <destination.ip>:<destination.port>
a-list:
- item1: value1
- item2: value2
These definitions can then be referenced elsewhere:
parse|field:
- <$header> from <$source> to <$destination> deny
This approach enables text reuse within an asset. Definitions are applied at build time through interpolation and do not function as runtime variables, i.e., cannot be modified once declared.
[!NOTE] The definitions are typed, respecting JSON/YAML data types. For example, you can define a YAML object or a numeric value and use it as an argument for helper functions, not just strings for parsing.
Variables are temporary fields scoped to the current asset that is processing an event. They are identified by prefixing their name with an underscore _, following the standard field naming convention and supporting any operation just like fields.
_field.name
Key characteristics:
Log parsing transforms raw log entries into structured data using parser expressions. These expressions serve as an alternative to Grok, eliminating the need for explicit type declarations by leveraging predefined schema-based parsing. Instead of regular expressions, they use specialized parsers for improved accuracy and efficiency.
Key Components:
Example:
This expression captures an IP or hostname into client.ip or client.address and, if present, captures a port into server.port:
parse|event.original:
- "<client.ip>?<client.address> connected to <server.ip>(?:<server.port>)"
For a log entry:
192.168.1.10 connected to 10.0.0.5:443
It extracts:
{
"client.ip": "192.168.1.10",
"server.ip": "10.0.0.5",
"server.port": "443"
}
Parsers are also available as helper functions for use in map and check operations. For a detailed explanation, see the Parser Stage and Parser Helper Functions sections.
The engine allows the definition of JSON key-value databases, making them available in the assets through helper functions. These databases can be used to map large serializable data and check Indicators of Compromise (IoCs) or extensive key databases that require constant management. This approach separates the maintenance of such data from the decoder, streamlining the process and ensuring efficient data handling.
For more details on managing kvdbs, refer to the How to Manage kvdb section and the kvdb Library helper functions.
Normalizing Large Serializable Data: Maps event IDs to predefined categorization fields using a kvdb, simplifying event classification. Example:
normalize:
- map:
- event: kvdb_get_merge(windows_security_eventid_to_category_type_action, $event.code)
This retrieves event categories, types, and actions based on the event ID.
Checking Indicators of Compromise (IoCs): Compares log entries against a kvdb of known malicious IPs to identify potential threats. Example:
normalize:
- check:
- source.ip: kvdb_match(known_malicious_ips)
If a match is found, security-related data is mapped to the event.
Assets are capable of handling dates in various formats and time zones. This flexibility is achieved through configurable parsers (refer to the date parser documentation for more details).
Once a date is parsed, the Engine normalizes it to UTC. This ensures that all timestamps are stored and processed homogeneously, maintaining consistency across event processing and dashboard visualization.
Assets are capable of enriching events with geolocation information, enhancing event data with location-based context. This is achieved by using Maxmind - GeoLite databases, which provide location data based on IP addresses. For more details, see the geo location helper documentation.
The GeoLite databases are configured through the API, allowing you to specify the relevant databases to be used for geolocation enrichment. For more information on how to configure these databases, refer to the API documentation.
Decoders are the first layer of assets that pass through the event when it is processed by a security policy. They are responsible for normalizing the event, transforming it into a structured event.
All events enter the pipeline through the root decoder, which selects the appropriate decoder to process the event. Each subsequent decoder processes the event as much as it can and then passes it to the next suitable decoder. This continues until no more decoders can process the event. A decoder can only select one next decoder from the available ones.
---
title: Decoder schema
---
kanban
Decoder[Decoder schema]
assetName["name"]@{ priority: 'Very Low'}
assetMetadata["metadata"]@{ priority: 'Very Low'}
assetParents["parents"]
assetChecks["checks"]
decoParsers["parser"]
decoNormalize["normalize"]
Name: Identifies the decoder and follows the pattern <asset_type>/<name>/<version>. The name is unique and cannot
be repeated. The naming convention for components is <type>/<name>/<version>. The component type is decoder, and
the version must be 0, since versioning is not implemented:
Metadata: Each decoder has metadata that provides information about the decoder, such as the supported products, versions, and formats. This metadata does not affect the processing stages. The metadata fields are:
module (string): The module that the decoder is associated with. I.e., syslog, windows, apache, etc.title (string): The title of the decoder. I.e., Windows Event Log Decoder, Linux audit system log decoder, etc.description (string): A brief description of the decoder.compatibility (string): A description of the compatibility of the decoder with different products, versions, and formats.
i.e The Apache datasets were tested with Apache 2.4.12 and 2.4.46 and are expected to work with all versions >= 2.2.31 and >= 2.4.16 (independent from operating system)version (array): A list of versions for which the logs have been tested and supported. I.e., 2.2.x, 3.x, etc.author (object): The author of the decoder, ie:
name: Wazuh, Inc.
email: [email protected]
url: https://wazuh.com
date: 2022-11-15
reference (array): A list of references to the documentation, i.e.:
- https://httpd.apache.org/docs/2.2/logs.html
- https://httpd.apache.org/docs/2.4/logs.html
Parents: Defines the order in the decoder graph, establishing the parent-child relationship between decoders. A decoder can have multiple parents, when an event is successfully processed in a decoder, it will evaluate the children, one by one, until it finds a decoder that successfully processes the event.
[!IMPORTANT] There is no order of priority when evaluating the children, and it cannot be assumed that a sibling decoder will be evaluated before another one.
Rules are the second layer of assets that process events in a security policy. They are responsible for analyzing the normalized event, when the decoding stage is finished, to add context, security indicators, and threat intelligence. Unlike decoders, the rule cannot modify the decoded event, but it can add new certain fields to enrich the event, this prevents the rules from being used to decode events.
---
title: Rule schema
---
kanban
Rule[Rule schema]
assetName["name"]@{ priority: 'Very Low'}
assetMetadata["metadata"]@{ priority: 'Very Low'}
assetParents["parents"]
assetChecks["checks"]
ruleNormalize["rule_enrichment"]
Name: Identifies the rule and follows the pattern <asset_type>/<name>/<version>. The name is unique and cannot
be repeated. The naming convention for components is <type>/<name>/<version>. The component type is rule, and
the version must be 0, since versioning is not implemented:
Metadata: Each rule has metadata that provides information about the rule, such as the supported products, versions, and formats. This metadata does not affect the processing stages. The metadata fields are:
description (string): A brief description of the rule.TODO: Add more fields when the metadata is defined.Parents: Defines the order in the rule graph, establishing the parent-child relationship between rules, a rule can have multiple parents, when an event is successfully processed in a rule (rule matches), it will evaluate all the children. Unlike decoders, and all children will be evaluated.
Checks: The checks stage is a preliminary stage in the asset processing sequence, designed to assess whether an event meets specific conditions. On the rules, the checks stage is used to evaluate the conditions that the event must meet to be considered a security event. More information on the checks stage can be found in the Check section.
Rule Enrichment: The rule enrichment stage is used to add context, security indicators, and threat intelligence to
the normalized event. This stage is used to add new fields to the event, but it cannot modify the normalized event, it
like the map stage, but with the restriction that it cannot modify the normalized event, only rule fields can be added.
Outputs are the last layer of assets that process events in a security policy. They are responsible for storing the security events in a storage system, sending them to a wazuh-indexer, a file, or sending them to a third-party system.
---
title: Output schema
---
kanban
Output[Output schema]
assetName["name"]@{ priority: 'Very Low'}
assetMetadata["metadata"]@{ priority: 'Very Low'}
assetParents["parents"]
assetChecks["checks"]
OutputNormalize["output stage"]
Name: Identifies the output and follows the pattern <asset_type>/<name>/<version>. The name is unique and cannot
be repeated. The naming convention for components is <type>/<name>/<version>. The component type is output, and
the version must be 0, since versioning is not implemented:
Metadata: Each output has metadata that provides information about the output, such as the destination, version, and format. This metadata does not affect the processing stages. The metadata fields are:
description: A brief description of the output.Parents: Defines the order in the output graph, establishing the parent-child relationship between outputs. An output can have multiple parents, when an event is successfully processed in an output, it will evaluate all the children. Usually, the outputs are the last assets in the policy, so they do not have children.
Checks: The checks stage is a stage in the output asset used to evaluate the conditions that the event must meet to be sent to the output. More information on the checks stage can be found in the Check section.
Filters are assets with a single stage (check) used to evaluate conditions without modifying the event. They do not support parse|* or normalize stages.
The filters are used to:
---
title: Filter schema
---
kanban
Output[Filter schema]
assetName["name"]@{ priority: 'Very Low'}
assetId["id"]@{ priority: 'Very Low'}
assetEnabled["enabled"]@{ priority: 'Very Low'}
assetType["type"]@{ priority: 'Very Low'}
assetMetadata["metadata"]@{ priority: 'Very Low'}
%% assetParents["parents"]
assetChecks["check"]
Name: Identifies the filter and follows the pattern <asset_type>/<name>/<version>. The name is unique and cannot
be repeated. The naming convention for components is <type>/<name>/<version>. The component type is filter, and
the version must be 0, since versioning is not implemented:
ID: Unique identifier for the filter in UUIDv4 format.
Enabled: Boolean flag to enable or disable the filter.
Type: Defines when the filter is executed in the pipeline:
pre-filter: evaluated before decoders.post-filter: evaluated after decoders.Metadata: Provides descriptive information about the filter (module, title, description, compatibility, versions, author, references). This metadata does not affect processing stages.
Check: The check stage is a stage in the filter asset used to evaluate the conditions that the event must meet to pass the filter. More information on the checks stage can be found in the Check/allow section.
Parents (optional): Defines parent filters evaluated before this one.
Definitions (optional): Defines symbols that will be replaced throughout the document in its occurrences.
name: filter/prefilter/0
id: fef71314-00c6-41f5-ab26-15e271e9f913
enabled: true
type: pre-filter
metadata:
module: wazuh
title: Platform filter
description: Filter events by platform
compatibility: Wazuh 5.*
versions:
- Wazuh 5.*
author:
name: Wazuh, Inc.
url: https://wazuh.com
date: 2024-01-31
references:
- https://documentation.wazuh.com/
check: $host.os.platform == 'ubuntu'
[!NOTE] When filter assets are used in the orchestrator, they don't have parents; they are a check stage that is evaluated before or after decoders depending on the
typefield.
The check stage is a preliminary stage in the asset processing sequence, designed to assess whether an event meets specific conditions without modifying the event itself. Filters events based on predefined criteria, ensuring that only relevant events trigger the subsequent stages like parse or normalize.
There are two ways to define conditions in a stage check: through a condition list or a conditional expression string.
In a condition list, each condition is described with a pair property:value. Here, property is the name of any field, and value is the condition that the field must meet.
The event is filtered through all listed conditions, and only events that satisfy all conditions in order are processed further.
Depending on the value, the condition to test is:
$field.name.helper_name(args…).Example checklist:
check:
- event.format: text
- user.name: $root_user
- event.original: exists()
- event.id: 1234
All conditions must be met for the event to pass through the check stage. If any condition fails, the event is not processed further.
[!NOTE]
event.id: 1234is not the same asevent.id: "1234"because the first one is a number and the second one is a string.
For scenarios requiring complex conditions, especially in rules, a conditional expression allows for more nuanced logic. This string uses a subset of first-order logic language, including logical connectives and support for grouping through parentheses.
Logical Connectives:
NOT)AND)OR)These connectives facilitate writing conditions between terms where a term can be:
<$field><op><value>.<helper_name>(<field>, args...), except for the “exists” helper, which can be denoted by the field name alone.Supported Operators:
!= and == are applicable to all data types.<=, <, >=, > are supported for numbers and strings (lexicographically).Examples of conditional expressions:
check: $event.category=="registry" AND $event.type=="change" AND ($registry.path=="/some/path" OR $registry.path=="/some/other/path")
check: int_less($http.response.status_code, 400)
check: $wazuh.origin == /var/log/apache2/access.log OR $wazuh.origin == /var/log/httpd/access_log
Executes a series of parsing expressions that transform the event's original message into clearly defined data fields. The parsing operations are processed in sequence, with each operation attempted until one succeeds. If an operation succeeds, subsequent operations in the list are skipped.
If all operations fail, the execution of the stage is marked as failed, the processing of the event could continue with the next substage only if it is within a normalize.
For a complete list of parsers check the Parsers reference.
Parser expressions facilitate the transformation of log entries into structured objects, offering an alternative to Grok by eliminating the need for explicit type declarations as these are predefined in the schema. Instead of regular expressions, these expressions utilize tailored parsers, enhancing precision.
The parser expressions are composed of various tokens or expressions, where these expressions can be any one of the following:
Literals, Direct characters that match input text exactly. with certain reserved characters that require escaping (used in other tokens), with the character \ , precisely: <>?\(
E.g.: The following expression will match exactly that in the log line
[Fri Sep 09 10:42:29.902022 2011] [core:error]
Fields, are expressions of the form <[?]field_name[/param1…]>, where we can identify 4 different types of field expressions:
Schema fields: Directly correspond to fields defined in the schema, with the engine automatically applying the appropriate parser and can have parameters depending on the parser.
E.g.: <source.ip> will match any IPv4 or IPv6 and map it to the field source.ip.
Custom fields: Are those that are not in the schema, in contrast to schema fields, custom fields use the text parser unless specified otherwise by the parameters. These are intended for temporary/auxiliary use.
Custom fields are indexed as text by default in open search, depending on the configuration open search may try to detect and index as other types.
E.g.: <custom_field/long> will match any number and map it to the field custom_field.
Optional fields: we indicate that a field parse expression is optional writing the interrogation symbol ? at the beginning of the expression. If the parsing fails it will continue with the next expressions.
E.g.: <field1>|<?field2>|<field3> will match anything between | symbols three times, and the second may be empty or not.
Field choice: Expresses a choice between two field expressions, meaning one of the fields must match. We write two field expressions splitted by the interrogation symbol ?. As the first choice we can only use parsers that do not require end token, if we use one that does the expression will fail to compile because the end tokens are not sent to the parser.
E.g.: <source.ip>?<~/literal/-> will match an IP and map it to source ip or a hyphen, skipping it.
Note: ? only needs to be scaped when it appears right after a field expression: <source.ip>\?...
Wildcards, follows the same syntax and behaves the same as a custom field but has no name and does not map to any field. It is used to parse some pattern without extracting. Can be a optional or in a field choice also.
E.g.:<~/byte> will parse a byte value and continue.
E.g.:Error Code: <~/long> Description: <message> Here, <~/long> uses a wildcard to parse an integer error code that isn’t mapped to any field, essentially ignoring it while capturing the subsequent error description into message.
Optional groups, make a logpar subexpression optional. The optional expression is denoted with (?sub_expression). Used to make some more complex patterns optional where a simple optional field won’t suffice. It can contains any valid logpar expression with the exception of another optional group. An optional group can not contains another group, and two optional groups may not appear in a row.
E.g.:[<source.ip>(?:<source.port>)] will match any ip address optionally followed by a port separated by double dots, and being between brackets.
Examples:
This expression will capture an IP address or hostname into client.ip or client.address, and optionally capture a port into server.port if it is present.
<client.ip>?<client.address> connected to <server.ip>(?:<server.port>)
Apache error parser:
# [Mon Dec 26 16:15:55.103522 2016] [mpm_prefork:notice] [pid 11379] AH00163: Apache/2.4.23 (Unix) configured -- resuming normal operations
# [Mon Dec 26 16:15:55.103786 2016] [core:notice] [pid 11379] AH00094: Command line: '/usr/local/Cellar/httpd24/2.4.23_2/bin/httpd'
# [Wed Oct 20 19:20:59.121211 2021] [rewrite:trace3] [pid 121591:tid 140413273032448] mod_rewrite.c(470): [client 10.121.192.8:38350] 10.121.192.8 - - [dev.elastic.co/sid#55a374e851c8][rid#7fb438083ac0/initial] applying pattern '^/import/?(.*)$' to uri '/'
# [Wed Oct 20 19:20:59.121211 2021] [rewrite:trace3] [pid 121591:tid 140413273032448] mod_rewrite.c(470): [client milo.dom.com:513] 10.121.192.8 - - [dev.elastic.co/sid#55a374e851c8][rid#7fb438083ac0/initial] applying pattern '^/import/?(.*)$' to uri '/'
# [Mon Dec 26 16:22:08 2016] [error] [client 192.168.33.1] File does not exist: /var/www/favicon.ico
# [Fri Sep 09 10:42:29.902022 2011] [core:error] [pid 35708:tid 4328636416] [client 89.160.20.112] File does not exist: /usr/local/apache2/htdocs/favicon.ico
# [Thu Jun 27 06:58:09.169510 2019] [include:warn] [pid 15934] [client 67.43.156.12:12345] AH01374: mod_include: Options +Includes (or IncludesNoExec) wasn't set, INCLUDES filter removed: /test.html
# [Mon Dec 26 16:17:53 2016] [notice] Apache/2.2.22 (Ubuntu) configured -- resuming normal operations
# [Mon Dec 26 16:22:00 2016] [error] [client 192.168.33.1] File does not exist: /var/www/favicon.ico, referer: http://192.168.33.72/
# [Mon Dec 26 16:22:08 2016] [error] [client 192.168.33.1] File does not exist: /var/www/favicon.ico
parse|event.original:
- "[<@timestamp/Mon Dec 26 16:22:00 2016>] [<log.level>] [client <source.address>(?:<source.port>)] <message>"
- "[<@timestamp/%a %b %d %T %Y/en_US.UTF-8>] [<~apache.error.module>:<log.level>] [pid <process.pid>(?:tid <process.thread.id>)] [client <source.address>(?:<source.port>)] <message>"
- "[<@timestamp/%a %b %d %T %Y/en_US.UTF-8>] [<~apache.error.module>:<log.level>] [pid <process.pid>(?:tid <process.thread.id>)] <message>"
Fields within the schema are bound to specific parsers that ensure the data matches the expected format. For example, date fields may require a specific time format, which is denoted using parameters following the field name in the format <field_name/param_0/param_n>. This allows for flexible data validation against predefined types or custom formats specified by additional parameters.
For standard fields defined in the schema, each field type has an associated parser. For instance, a field of type long (like event.severity) will utilize a numeric parser.
Custom fields not defined in the schema default to using the text parser, which continues parsing until a specified delimiter or the end of the log entry. Fields at the end of a parser expression are interpreted as catch-all, capturing all remaining text in the log entry. This approach facilitates partial parsing where full matching is not required, ensuring flexibility in log analysis.
For example, the case of <tmp_field>c this will parse everything until character c is found. It is possible to specify the parser name as the first argument, for example <tmp_field/ip>c will use the ip parser instead of the text parser. It is also possible to pass parameters to the parser, for example <tmp_field/date/RFC822>c parses the timestamp using the RFC822 format.
Some parsers need an anchor token to stop parsing, i.e. the text parser used by default in all custom fields will parse any character until the end token is found. The end token is the literal following the field expression, or if the field expression is at the end it will parse the remaining string.
E.g.:<header>:<message> will parse and capture any text up to a double dots in the header field, and the remaining of the log will be captured in the message field.
This implicates that two field expressions must be splited by a literal unless the first one does not need an end token, while <custom/long><~> is valid, as the long parsers does not need end token, <text><~> will be invalid.
In choice expressions the end token is the same for both field expressions, it is the literal right after the second field expression. E.g, <custom_ip>?<~>| will be valid, as both parsers require an end token, in this case the literal |. This implies the same as before, if one of the choices needs an end token, it must be followed by a literal.
In optional group expressions, i.e. when a field is followed by an optional group, there are multiple end tokens. The literal at the beginning of the optional group and the literal right after the group, meaning if a field needs an end token and is followed by an optional group, the group must start with a literal and a literal must appear right after the group.
E.g.: <custom_text>(?|<opt/long>):, in this case the text parser will try to search for a |, so the optional group can be parsed, and if the optional group fails, then the text parser will use the symbol : as end token.
Executes each operation of the list in order until the last operation. If any operation fails, it continues with the next one.
If all operations fail the stage is not marked as failed and continues to the next stage.
Type Validation:
source.ip → IP, event.start → timestamp).
Example:
- map:
- event.kind: event
- event.dataset: apache.access
- event.category: +array_append/web
- event.module: apache
- service.type: apache
- event.outcome: success
The normalize stage is where the event undergoes transformations and adjustments after passing through the check and parse stages successfully. Composed of a list of sub-stages that are executed in the specified order. These sub-stages can include operations such as check, map, and parse.
Each set of sub-stages is processed sequentially. If a check and parse within a normalization block is successful, the corresponding map is executed, replicating the check, parse, and normalize stages of the asset.
Example:
normalize:
- map:
- wazuh.decoders: array_append(windows-sysmon)
- event.dataset: sysmon
- event.kind: event
# Only maps network.protocol if event.code is 22
- check: $event.code == '22'
map:
- network.protocol: dns
# Only maps resources if the check and parse stages succeeded
- check: $event.outcome == failure
parse|message:
- "[<error.code/int>]<details>"
map:
- resources: split($details, ",")
The Output Stage allows you to specify and configure the different outputs, defining the final destinations for events before they leave the Engine.
outputs:
- output_name: configuration
For more details on the available output types and configurations, refer to output documentation.
These parsers are used automatically when a field of its type is used in a logpar expression.
For example, if you use the field <event.start> which is of type date, it will be parsed automatically by the date parser.
These parsers will generate fields which are type-compatible with Wazuh Indexer.
| Type | Parser | Description |
|---|---|---|
| null | - | A null field can’t be indexed or searched. When a field is set to null, OpenSearch behaves as if that field has no values. |
| boolean | bool | OpenSearch accepts true and false as boolean values. An empty string is equal to false. |
| float | float | Codified as decimal representation in string format. A single-precision 32-bit IEEE 754 floating point number, restricted to finite values. |
| scaled_float | scaled_float | Codified as decimal representation in string format. The scaling factor is defined in the schema. |
| double | double | Codified as decimal representation in string format. A double-precision 64-bit IEEE 754 floating point number, restricted to finite values. |
| long | long | Codified as decimal representation in string format. A signed 64-bit integer with a minimum value of -2^63 and a maximum value of 2^63-1. |
| byte | byte | Codified as a decimal representation in string format. A signed 8-bit integer with a minimum value of -128 and a maximum value of 127. |
| object | - | - |
| array | - | - |
| nested | - | - |
| text | text | A string sequence of characters that represent full-text values. |
| keyword | text | A string sequence of characters that represent full-text values. |
| ip | ip | A string with IPv4 or IPv6 address. |
| date | date | Date codified as string. All dates are converted to a unified date in UTC timezone. |
| geo_point | - | - |
| binary | binary | A codified base64 string. |
Aditionally we define some types for the purpose to use specific parsers, normally used to parse objects or structured types from an input text. This is the case for url field for example.
| Type | Parser | Description |
|---|---|---|
| url | uri | Parses URI text and generates the URL object with all the parsed parts. |
| useragent | useragent | Parses a user agent string. It does not build the user agent object; this can be done with the OpenSearch plugin. |
By default, the Engine's log information is recorded in journald when launching the wazuh-manager service.
You can retrieve logs specifically for the Engine using journald’s _COMM field:
journalctl _COMM=wazuh-engine
For real-time monitoring of errors:
journalctl -f _COMM=wazuh-engine
To refine logs based on severity levels you can combine grep:
journalctl _COMM=wazuh-engine | grep info
Dec 18 14:59:22 WazPc env[12974]: 2024-12-18 14:59:22.663 12974:12974 info: Logging initialized.
Dec 18 14:59:22 WazPc env[12974]: 2024-12-18 14:59:22.668 12974:12974 fileDriver.cpp:231 at readCol(): debug: FileDriver readCol name: 'namespaces/system/decoder/core-hostinfo'.
Dec 18 14:59:22 WazPc env[12974]: 2024-12-18 14:59:22.669 12974:12974 main.cpp:166 at main(): info: Store initialized.
Dec 18 14:59:22 WazPc env[12974]: 2024-12-18 14:59:22.669 12974:12974 main.cpp:172 at main(): info: RBAC initialized.
Available severity levels:
Traces allow you to inspect the operational graph behavior, providing insights into how events are processed within the Engine. By using the tester endpoint (refer to the API documentation for details), you can specify several options to debug event processing effectively.
Available trace options:
Here is a test example showing the graph history:
traces:
[🔴] decoder/zeek-x509/0 -> failed
[🔴] decoder/zeek-weird/0 -> failed
[🔴] decoder/zeek-traceroute/0 -> failed
[🔴] decoder/zeek-stats/0 -> failed
[🔴] decoder/zeek-software/0 -> failed
[🔴] decoder/zeek-socks/0 -> failed
[🔴] decoder/zeek-snmp/0 -> failed
[🔴] decoder/zeek-smb_mapping/0 -> failed
[🔴] decoder/zeek-smb_files/0 -> failed
[🔴] decoder/apache-error/0 -> failed
[🔴] decoder/zeek-smb_cmd/0 -> failed
[🔴] decoder/zeek-ssl/0 -> failed
[🔴] decoder/snort-json/0 -> failed
[🔴] decoder/squid-access/0 -> failed
[🔴] decoder/zeek-known_certs/0 -> failed
[🔴] decoder/suricata/0 -> failed
[🔴] decoder/zeek-irc/0 -> failed
[🔴] decoder/microsoft-exchange-server-smtp/0 -> failed
[🔴] decoder/snort-plaintext/0 -> failed
[🔴] decoder/pfsense-firewall/0 -> failed
[🔴] decoder/pfsense-dhcp/0 -> failed
[🔴] decoder/apache-access/0 -> failed
[🔴] decoder/snort-plaintext-csv/0 -> failed
[🔴] decoder/zeek-sip/0 -> failed
[🔴] decoder/pfsense-unbound/0 -> failed
[🔴] decoder/iis/0 -> failed
[🔴] decoder/zeek-signature/0 -> failed
[🔴] decoder/modsecurity-nginx/0 -> failed
[🔴] decoder/microsoft-dhcpv6/0 -> failed
[🔴] decoder/zeek-conn/0 -> failed
[🔴] decoder/zeek-modbus/0 -> failed
[🔴] decoder/microsoft-exchange-server-imap4-pop3/0 -> failed
[🔴] decoder/pfsense-php-fpm/0 -> failed
[🔴] decoder/microsoft-exchange-server-messagetracking/0 -> failed
[🔴] decoder/microsoft-exchange-server-httpproxy/0 -> failed
[🔴] decoder/zeek-kerberos/0 -> failed
[🔴] decoder/modsecurity-apache/0 -> failed
[🔴] decoder/microsoft-dhcp/0 -> failed
[🔴] decoder/zeek-pe/0 -> failed
[🔴] decoder/windows-event/0 -> failed
[🔴] decoder/zeek-capture_loss/0 -> failed
[🔴] decoder/zeek-dhcp/0 -> failed
[🔴] decoder/zeek-dnp3/0 -> failed
[🔴] decoder/zeek-dns/0 -> failed
[🔴] decoder/zeek-smtp/0 -> failed
[🔴] decoder/zeek-http/0 -> failed
[🔴] decoder/zeek-rfb/0 -> failed
[🔴] decoder/zeek-files/0 -> failed
[🔴] decoder/zeek-ftp/0 -> failed
[🔴] decoder/zeek-ssh/0 -> failed
[🔴] decoder/zeek-ocsp/0 -> failed
[🔴] decoder/zeek-dce_rpc/0 -> failed
[🔴] decoder/zeek-intel/0 -> failed
[🔴] decoder/zeek-syslog/0 -> failed
[🔴] decoder/zeek-known_hosts/0 -> failed
[🔴] decoder/zeek-dpd/0 -> failed
[🔴] decoder/zeek-known_services/0 -> failed
[🔴] decoder/zeek-mysql/0 -> failed
[🔴] decoder/zeek-ntlm/0 -> failed
[🔴] decoder/zeek-tunnel/0 -> failed
[🔴] decoder/zeek-notice/0 -> failed
[🔴] decoder/zeek-ntp/0 -> failed
[🔴] decoder/zeek-radius/0 -> failed
[🟢] decoder/syslog/0 -> success
[🔴] decoder/sysmon-linux/0 -> failed
[🔴] decoder/system-auth/0 -> failed
[🔴] decoder/snort-plaintext-syslog/0 -> failed
[🔴] decoder/wazuh-dashboard/0 -> failed
Showing full traces:
traces:
[🟢] decoder/syslog/0 -> success
↳ [/event/original: <event.start/Jun 14 15:16:01> <host.hostname> <TAG/alphanumeric/->[<process.pid>]:<~/ignore/ ><message>] -> Failure: Parse operation failed: Parser <event.start/Jun 14 15:16:01> failed at: 2018-08-14T14:30:02.203151+02:00 linux-sqrz systemd[4179]: Stopped target Basic System.
↳ [/event/original: <event.start/Jun 14 15:16:01> <host.hostname> <TAG/alphanumeric/->:<~/ignore/ ><message>] -> Failure: Parse operation failed: Parser <event.start/Jun 14 15:16:01> failed at: 2018-08-14T14:30:02.203151+02:00 linux-sqrz systemd[4179]: Stopped target Basic System.
↳ [/event/original: <event.start/2018-08-14T14:30:02.203151+02:00> <host.hostname> <TAG/alphanumeric/->[<process.pid>]: <message>] -> Success
↳ event.kind: map("event") -> Success
↳ wazuh.decoders: array_append("syslog") -> Success
↳ related.hosts: array_append($host.hostname) -> Success
↳ process.name: rename($TAG) -> Success
↳ host.ip: array_append($tmp.host_ip) -> Failure: 'tmp.host_ip' not found