catboost/docs/en/concepts/cli-usages-examples.md
Train a classification model with default parameters in silent mode and then calculate model predictions on a custom dataset. The output contains the evaluated class1 probability:
```
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss
catboost calc -m model.bin --input-path custom_data --cd train.cd -o custom_data.eval -T 4 --prediction-type Probability
```
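The `train.cd` column description file referenced above is a tab-separated mapping from column indices to column types. A minimal sketch for such a pool, assuming the label is in column 0 and column 5 holds a categorical feature (all other columns are treated as numerical features by default), could look like this:

```
0	Label
5	Categ
```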
Train a model with 100 trees on a comma-separated pool with header:
```
catboost fit --learn-set train.csv --test-set test.csv --column-description train.cd --loss-function RMSE --iterations 100 --delimiter=',' --has-header
```
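For reference, the first lines of such a comma-separated pool with a header row might look like the following (the column names and values are purely illustrative):

```
target,feature1,feature2,feature3
1.5,0.2,red,37
0.7,1.1,blue,12
```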
The <q>Verbose</q> logging level makes the training output additional information on each iteration, such as the current learn error and the current and best errors on the test set. The remaining and elapsed time are also displayed.
The --custom-metric parameter allows logging additional metrics on the learn and test sets for each iteration.
```
catboost fit --learn-set train --test-set test --column-description train.cd --loss-function Logloss --custom-loss="AUC,Precision,Recall" -i 4 --logging-level Verbose
```
Example test_error.tsv result:
```
iter	Logloss	AUC	Precision	Recall
0	0.6638384193	0.8759125663	0.8537374221	0.9592193809
1	0.6350880554	0.8840660536	0.8565563873	0.9547779273
2	0.6098460477	0.8914710667	0.8609022556	0.9554508748
3	0.5834954183	0.8954216255	0.8608579414	0.9534320323
```
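Since this file is tab-separated, it can be inspected directly from the shell. For example, the following command (assuming the `column` utility is available on the system) aligns the metric columns for readability:

```
column -t -s $'\t' test_error.tsv
```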
CTR computation on large pools can lead to <q>out of memory</q> problems. In this case, it is possible to give CatBoost a hint about the available memory:
```
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --used-ram-limit 4GB
```
Train a classification model on GPU:
```
catboost fit --learn-set ../pytest/data/adult/train_small --column-description ../pytest/data/adult/train.cd --task-type GPU
```
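If the machine has several GPUs, training can be restricted to specific devices with the --devices option. The device indices below are only an example of a possible local setup:

```
catboost fit --learn-set ../pytest/data/adult/train_small --column-description ../pytest/data/adult/train.cd --task-type GPU --devices 0:1
```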
To enable the random subspace method for feature bagging, use the --rsm parameter:
```
catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --rsm 0.5
```
To calculate the object importances:
1. Train the model:

   ```
   catboost fit --loss-function Logloss -f train.tsv -t test.tsv --column-description train.cd
   ```

1. Calculate the object importances using the trained model:

   ```
   catboost ostr -f train.tsv -t test.tsv --column-description train.cd -o object_importances.tsv
   ```