Condition Validator UDFs

python/examples/experimental/Condition_Validator_UDF.ipynb



This example shows how to create condition validators with the condition_validator decorator, which turns a user-defined function (UDF) into a condition validator.

Example

Let's say you are logging a numerical column col1, and you want to trigger an action whenever a row value for this column is not less than 4. To do so, we'll define two functions: an action and a condition. The condition returns True for values that pass, and the action runs for each value that fails. We then decorate the condition function with the condition_validator decorator and pass the action function through the actions argument.

python
# Note: you may need to restart the kernel to use updated packages.
%pip install whylogs

python
import pandas as pd
from typing import Any
from whylogs.experimental.core.validators import condition_validator
from whylogs.experimental.core.udf_schema import udf_schema
import whylogs as why

data = pd.DataFrame({"col1": [1, 3, 7]})

# Action: called once for each value that fails the condition.
def do_something_important(validator_name, condition_name: str, value: Any, column_id=None):
    print("Validator: {}\n    Condition name {} failed for value {}".format(validator_name, condition_name, value))

# Condition: returns True when the value passes; a False result triggers the actions.
@condition_validator(["col1"], condition_name="less_than_four", actions=[do_something_important])
def lt_4(x):
    return x < 4

schema = udf_schema()
why.log(data, schema=schema).view()

You can see that the action was triggered once for the value 7.
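The trigger semantics can be illustrated without whylogs. The following is a minimal plain-Python sketch, not the library's implementation: `run_condition` and `report` are hypothetical helpers, and `"sketch_validator"` is a placeholder name. It evaluates the condition for each value and calls the action whenever the condition returns False.

```python
from typing import Any, Callable, List


def run_condition(
    values: List[Any],
    condition: Callable[[Any], bool],
    action: Callable[..., None],
    condition_name: str,
) -> List[Any]:
    """Hypothetical helper: call `action` for every value that fails `condition`."""
    failed = []
    for value in values:
        if not condition(value):
            # "sketch_validator" is a placeholder, not a name produced by whylogs.
            action("sketch_validator", condition_name, value)
            failed.append(value)
    return failed


def lt_4(x):
    return x < 4


def report(validator_name, condition_name, value, column_id=None):
    print("Validator: {}\n    Condition name {} failed for value {}".format(validator_name, condition_name, value))


# Same values as col1 in the example: only 7 fails the condition.
failed_col1 = run_condition([1, 3, 7], lt_4, report, "less_than_four")
```

Note that a value of exactly 4 would also trigger the action, since `4 < 4` is False.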

Condition Validators are compatible with Dataset UDFs. Through Dataset UDFs, you can create new columns based on the values of other columns. In this example, we will create a new column add5 that is equal to col1 + 5. We will then assign a condition validator to the newly created column:

python
from typing import Dict, List, Union
from whylogs.experimental.core.udf_schema import register_dataset_udf


@register_dataset_udf(["col1"])
def add5(x: Union[Dict[str, List], pd.DataFrame]) -> Union[List, pd.Series]:
    return [xx + 5 for xx in x["col1"]]

@condition_validator(["add5"], condition_name="less_than_four", actions=[do_something_important])
def lt_4(x):
    return x < 4

schema = udf_schema()
why.log(data, schema=schema).view()

Now the action is triggered 4 times: once for col1's value 7, since the validator registered earlier for col1 is still active, and 3 times for add5's values 6, 8, and 12, none of which is less than 4.
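The count can be checked by hand with plain Python. This sketch only reproduces the arithmetic, assuming the same `col1` values and the same `x < 4` condition on both columns. It does not call whylogs.

```python
col1 = [1, 3, 7]
# The dataset UDF adds 5 to every col1 value.
add5 = [x + 5 for x in col1]


def lt_4(x):
    return x < 4


# Values that fail the condition, grouped by column.
failures = {
    "col1": [v for v in col1 if not lt_4(v)],
    "add5": [v for v in add5 if not lt_4(v)],
}
total_failures = sum(len(v) for v in failures.values())
print(failures, total_failures)
```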

You can access the assigned condition validators through the schema object. In the following code snippet, we can see that there's one condition validator assigned to col1 and one to add5, both being named less_than_four:

python
schema.validators

We can get a sample of the data that failed the condition. Let's do that for the first (and only) condition validator for the add5 column:

python
schema.validators["add5"][0].get_samples()
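If you want to keep every failing value rather than a sample, one option is an action that appends to a list. The sketch below is an assumption-laden illustration: `collect_failure` and `failed_values` are hypothetical names, the action uses the same signature as do_something_important above, and the loop only simulates the calls a validator would make for add5's values. With whylogs, you would pass the function through the actions argument of condition_validator instead.

```python
from typing import Any, List, Tuple

# Hypothetical in-memory store of (condition_name, value) pairs.
failed_values: List[Tuple[str, Any]] = []


def collect_failure(validator_name, condition_name: str, value: Any, column_id=None):
    """Record the failing value instead of printing it."""
    failed_values.append((condition_name, value))


# Simulated validator calls for add5's values; "sketch_validator" is a placeholder.
for value in [6, 8, 12]:
    if not value < 4:
        collect_failure("sketch_validator", "less_than_four", value)

print(failed_values)
```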