# eval_metric
{% include utils-eval_metric__desc %}
## Method call format

```python
eval_metric(label,
            approx,
            metric,
            weight=None,
            group_id=None,
            subgroup_id=None,
            pairs=None,
            thread_count=-1)
```
## Parameters

### label

A list of target variables (in other words, the label values of the objects).

**Possible types**

**Default value**

{{ python--required }}
### approx

A list of approximate values (predictions) for all input objects.

**Possible types**

**Default value**

{{ python--required }}
### metric

The evaluation metric to calculate.

{% cut "Supported metrics" %}

{% include reusage-all-objectives-and-metrics %}

{% endcut %}

**Possible types**

{{ python-type--string }}

**Default value**

{{ python--required }}
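Some metrics accept their own hyperparameters, passed inside the metric string after a colon. A minimal sketch, assuming the Quantile metric and its `alpha` parameter (the data values here are purely illustrative):

```python
from catboost.utils import eval_metric

labels = [0.2, -1, 0.4]
predictions = [0.4, 0.1, 0.9]

# Metric parameters are appended to the metric name after a colon,
# e.g. the target quantile level of the Quantile metric.
quantile_loss = eval_metric(labels, predictions, 'Quantile:alpha=0.9')
```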
### weight

The weights of the objects.

**Possible types**

**Default value**

None
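A minimal sketch of weighted evaluation, assuming a metric that supports per-object weights (the weight values are illustrative only):

```python
from catboost.utils import eval_metric

labels = [0.2, -1, 0.4]
predictions = [0.4, 0.1, 0.9]

# Per-object weights: the last object contributes twice as much to the metric.
weights = [1.0, 1.0, 2.0]
weighted_rmse = eval_metric(labels, predictions, 'RMSE', weight=weights)
```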
### group_id

Group identifiers for all input objects.

**Possible types**

**Default value**

None
### subgroup_id

Subgroup identifiers for all input objects.

**Possible types**

**Default value**

None
### pairs

The description is different for each group of possible types.

**Possible types**

{% cut "{{ python-type--list }}, {{ python-type--numpy-ndarray }}, {{ python-type--pandasDataFrame }}, polars.DataFrame" %}

The pairs description in the form of a two-dimensional matrix of shape N by 2, where N is the number of pairs.

{% include reusage-learn_pairs__where_is_used %}

{% endcut %}

{% cut "{{ python-type--string }}" %}

The path to the input file that contains the pairs description.

{% include reusage-learn_pairs__where_is_used %}

{% endcut %}

**Default value**

None
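For pairwise metrics the pairs can be passed directly. A minimal sketch, assuming the PairAccuracy metric and the usual `[winner_index, loser_index]` pair convention (both the metric choice and the pair semantics are assumptions here, not taken from this page):

```python
from catboost.utils import eval_metric

# Four objects split into two groups; each pair is assumed to be given as
# [winner_index, loser_index] with zero-based indices into the dataset.
group_ids = [1, 1, 2, 2]
labels = [1, 0, 1, 0]
approxes = [0.8, 0.3, 0.6, 0.2]
pairs = [[0, 1], [2, 3]]

pair_accuracy = eval_metric(labels, approxes, 'PairAccuracy',
                            group_id=group_ids, pairs=pairs)
```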
### thread_count

The number of threads to use.

{% include reusage-thread_count__cpu_cores__optimizes-the-speed-of-execution %}

**Possible types**

{{ python-type--int }}

**Default value**

{{ fit__thread_count__wrappers }}
## Type of return value

{{ python-type--list }} with metric values.
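Because the result is a list, a single scalar metric is read from its first element. A minimal sketch:

```python
from catboost.utils import eval_metric

labels = [0.2, -1, 0.4]
predictions = [0.4, 0.1, 0.9]

# eval_metric returns a list of metric values; for a single scalar metric
# the value of interest is the first (and only) element.
rmse = eval_metric(labels, predictions, 'RMSE')[0]
```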
## Usage examples

The following is an example of usage with a regression metric:
```python
from catboost.utils import eval_metric

labels = [0.2, -1, 0.4]
predictions = [0.4, 0.1, 0.9]

rmse = eval_metric(labels, predictions, 'RMSE')
```
The following is an example of usage with a classification metric:
```python
from catboost.utils import eval_metric
from math import log

labels = [1, 0, 1]
probabilities = [0.4, 0.1, 0.9]

# In binary classification it is necessary to apply the logit function
# to the probabilities to get approxes.
logit = lambda x: log(x / (1 - x))
approxes = list(map(logit, probabilities))

accuracy = eval_metric(labels, approxes, 'Accuracy')
```
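Rank-based metrics such as AUC depend only on the ordering of the predictions, so the monotone logit transformation does not change their value. A sketch illustrating this point (not taken from the original page):

```python
from catboost.utils import eval_metric

labels = [1, 0, 1]
probabilities = [0.4, 0.1, 0.9]

# AUC depends only on the relative order of the predictions, so applying
# the logit transformation beforehand would not change its value.
auc = eval_metric(labels, probabilities, 'AUC')
```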
The following is an example of usage with a ranking metric:
```python
from catboost.utils import eval_metric

# The dataset consists of five objects. The first two belong to one group
# and the other three to another.
group_ids = [1, 1, 2, 2, 2]
labels = [0.9, 0.1, 0.5, 0.4, 0.8]

# In ranking tasks it is not necessary to reproduce the label values themselves;
# it is only important to predict the right relative order of the objects.
good_predictions = [0.5, 0.4, 0.2, 0.1, 0.3]
bad_predictions = [0.4, 0.5, 0.2, 0.3, 0.1]

good_ndcg = eval_metric(labels, good_predictions, 'NDCG', group_id=group_ids)
bad_ndcg = eval_metric(labels, bad_predictions, 'NDCG', group_id=group_ids)
```
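With these inputs, `good_predictions` preserve the within-group ordering of the labels while `bad_predictions` do not, so the NDCG value computed for `good_predictions` is expected to be higher than the one for `bad_predictions`.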