docs/code/GraphRunner.md
The entry point graph in ML.NET is an array of nodes. More information about the definition of entry points and classes that help construct entry point graphs can be found in the EntryPoint.md document.
Each node is an object with the following fields:
The following types are supported in JSON graphs:
- string. Represented as a JSON string, maps to a C# string.
- float. Represented as a JSON float, maps to a C# float or double.
- bool. Represented as a JSON bool, maps to a C# bool.
- enum. Represented as a JSON string, maps to a C# enum. The allowed values are those of the C# enum (they are also listed in the manifest).
- int. Represented as a JSON integer, maps to a C# int or long.
- array of the above. Represented as a JSON array, maps to a C# array.
- dictionary. Currently not implemented. Represented as a JSON object, maps to a C# Dictionary<string,T>.
- component. Represented as a JSON object with 2 fields: name:string and settings:object.

The following input/output types cannot be represented as a JSON value:

- IDataView
- IFileHandle
- ITransformModel
- IPredictorModel

These must be passed as variables. A variable is represented as a JSON string that begins with $.
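As a rough illustration of the distinction between literal values and variables, the following Python sketch classifies a parsed JSON value. It is not ML.NET code: the function names `is_variable` and `describe_value` are hypothetical, and the only rules taken from the text are that a variable is a JSON string starting with `$` and that a component is an object with the two fields `name` and `settings`.

```python
import json

def is_variable(value):
    """Return True if a parsed JSON value is a variable reference such as "$data1"."""
    return isinstance(value, str) and value.startswith("$")

def describe_value(value):
    """Name the type category of a parsed JSON value (illustrative only)."""
    if is_variable(value):
        return "variable"
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        # A JSON string may map to a C# string or a C# enum; JSON alone cannot tell.
        return "string or enum"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        # An object with exactly the fields name and settings is treated as a component.
        return "component" if set(value) == {"name", "settings"} else "dictionary"
    return "other"

inputs = json.loads('{"Data": "$data1", "NumFolds": 5, "StratificationColumn": "Label"}')
kinds = {name: describe_value(value) for name, value in inputs.items()}
print(kinds)
```

Here `Data` is reported as a variable, while `NumFolds` and `StratificationColumn` are reported as literals. Deciding whether a string is really an enum would need the manifest, which this sketch does not consult.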
Note the following rules:
It is allowed to define variables for arrays and dictionaries, as long as the item types are valid variable types (the four types listed above). They are treated the same way as regular 'scalar' variables.
If we want to reference an item of the collection, we can use the [] syntax:
- $var[5] denotes the 5th element of an array variable.
- $var[foo] and $var['foo'] both denote the element with key 'foo' of a dictionary variable.

This is not yet implemented.

Conversely, if we want to build a collection (array or dictionary) of variables, we can do it using JSON arrays and objects:

- ["$v1", "$v2", "$v3"] denotes an array containing 3 variables.
- {"foo": "$v1", "bar": "$v2"} denotes a collection containing 2 key-value pairs.

This is also not yet implemented.

Let's consider the following manifest snippet, describing an entry point 'CVSplit.Split':
    {
      "name": "CVSplit.Split",
      "desc": "Split the dataset into the specified number of cross-validation folds (train and test sets)",
      "inputs": [
        {
          "name": "Data",
          "type": "DataView",
          "desc": "Input dataset",
          "required": true
        },
        {
          "name": "NumFolds",
          "type": "Int",
          "desc": "Number of folds to split into",
          "required": false,
          "default": 2
        },
        {
          "name": "StratificationColumn",
          "type": "String",
          "desc": "Stratification column",
          "aliases": [
            "strat"
          ],
          "required": false,
          "default": null
        }
      ],
      "outputs": [
        {
          "name": "TrainData",
          "type": {
            "kind": "Array",
            "itemType": "DataView"
          },
          "desc": "Training data (one dataset per fold)"
        },
        {
          "name": "TestData",
          "type": {
            "kind": "Array",
            "itemType": "DataView"
          },
          "desc": "Testing data (one dataset per fold)"
        }
      ]
    }
As we can see, the entry point has 3 inputs (one of them required) and 2 outputs. The following is a correct graph containing a call to this entry point:
    {
      "nodes": [
        {
          "name": "CVSplit.Split",
          "inputs": {
            "Data": "$data1"
          },
          "outputs": {
            "TrainData": "$cv"
          }
        }
      ]
    }
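
To make the relationship between the manifest and the graph concrete, here is a minimal Python sketch that checks a node against a manifest entry: every required input must be supplied, and every input or output name used by the node must be declared in the manifest. The function `check_node` and the trimmed manifest are hypothetical illustrations, not part of ML.NET, and the actual graph runner may apply further checks (for example on types).

```python
import json

# Trimmed copy of the 'CVSplit.Split' manifest entry: only the fields used here.
MANIFEST = json.loads("""
{
  "name": "CVSplit.Split",
  "inputs": [
    {"name": "Data", "type": "DataView", "required": true},
    {"name": "NumFolds", "type": "Int", "required": false, "default": 2},
    {"name": "StratificationColumn", "type": "String", "required": false, "default": null}
  ],
  "outputs": [
    {"name": "TrainData", "type": {"kind": "Array", "itemType": "DataView"}},
    {"name": "TestData", "type": {"kind": "Array", "itemType": "DataView"}}
  ]
}
""")

def check_node(node, manifest):
    """Return a list of problems found in `node`; an empty list means none were found."""
    if node.get("name") != manifest["name"]:
        return [f"unknown entry point: {node.get('name')}"]

    problems = []
    given_inputs = node.get("inputs", {})
    declared_inputs = {spec["name"]: spec for spec in manifest["inputs"]}
    declared_outputs = {spec["name"] for spec in manifest["outputs"]}

    # Required inputs must be supplied; optional ones fall back to their defaults.
    for name, spec in declared_inputs.items():
        if spec["required"] and name not in given_inputs:
            problems.append(f"missing required input: {name}")

    # Names used by the node must exist in the manifest.
    for name in given_inputs:
        if name not in declared_inputs:
            problems.append(f"undeclared input: {name}")
    for name in node.get("outputs", {}):
        if name not in declared_outputs:
            problems.append(f"undeclared output: {name}")
    return problems

node = {
    "name": "CVSplit.Split",
    "inputs": {"Data": "$data1"},
    "outputs": {"TrainData": "$cv"},
}
print(check_node(node, MANIFEST))                        # []
print(check_node({"name": "CVSplit.Split"}, MANIFEST))   # ['missing required input: Data']
```

The example graph passes because it supplies the single required input, `Data`, and leaves `NumFolds` and `StratificationColumn` at their defaults. It also shows that a node does not have to bind every output: `TestData` is simply not assigned to a variable.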