
.. ################################################################################
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements.  See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to you under the Apache License, Version
    2.0 (the "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
   ################################################################################

=========
Debugging
=========

This page describes how to debug in PyFlink.

Logging Info
============

Client Side Logging
-------------------

You can log contextual and debug information via print or standard Python logging modules in PyFlink jobs in places outside Python UDFs. The logging messages will be printed in the log files of the client during job submission.

.. code-block:: python

    from pyflink.table import EnvironmentSettings, TableEnvironment

    # create a TableEnvironment
    env_settings = EnvironmentSettings.in_streaming_mode()
    table_env = TableEnvironment.create(env_settings)

    table = table_env.from_elements([(1, 'Hi'), (2, 'Hello')])

    # use the logging module
    import logging
    logging.warning(table.get_schema())

    # use the print function
    print(table.get_schema())

.. note:: The default logging level on the client side is WARNING, so only messages with level WARNING or above will appear in the log files of the client.

Server Side Logging
-------------------

You can log contextual and debug information via print or standard Python logging modules in Python UDFs. The logging messages will be printed in the log files of the TaskManagers during job execution.

.. code-block:: python

    import logging

    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    @udf(result_type=DataTypes.BIGINT())
    def add(i, j):
        # use the logging module
        logging.info("debug")
        # use the print function
        print('debug')
        return i + j

.. note:: The default logging level on the server side is INFO, so only messages with level INFO or above will appear in the log files of the TaskManagers.
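Because the body of a UDF is ordinary Python, its logging calls can also be exercised locally without a cluster. A sketch of the same function without the ``@udf`` decorator, using a named logger (the name ``my_udfs`` is a hypothetical choice, not a PyFlink convention) so the output is easy to filter in the TaskManager log files:

```python
import logging

# Hypothetical named logger; the name appears in each log record and
# makes UDF output easy to grep in the TaskManager log files.
logger = logging.getLogger("my_udfs")

def add(i, j):
    # logs at INFO, which is visible by default on the server side
    logger.info("add called with i=%s, j=%s", i, j)
    return i + j
```

The same function body can then be wrapped with ``@udf(result_type=DataTypes.BIGINT())`` when registering it in the job.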

Accessing Logs
==============

If the environment variable ``FLINK_HOME`` is set, logs will be written to the ``log`` directory under ``FLINK_HOME``. Otherwise, logs will be placed in the installation directory of the PyFlink module. You can run the following command to find the log directory of the PyFlink module:

.. code-block:: bash

    $ python -c "import pyflink;import os;print(os.path.dirname(os.path.abspath(pyflink.__file__))+'/log')"

Debugging Python UDFs
=====================

Local Debug
-----------

You can debug your Python functions directly in IDEs such as PyCharm.

Remote Debug
------------

You can make use of the `pydevd_pycharm <https://pypi.org/project/pydevd-pycharm/>`_ tool of PyCharm to debug Python UDFs.

1. Create a Python Remote Debug configuration in PyCharm:

   Run -> Python Remote Debug -> + -> choose a port (e.g. 6789)

2. Install the pydevd-pycharm tool:

   .. code-block:: bash

       $ pip install pydevd-pycharm

3. Add the following code to your Python UDF:

   .. code-block:: python

       import pydevd_pycharm
       pydevd_pycharm.settrace('localhost', port=6789, stdoutToServer=True, stderrToServer=True)

4. Start the previously created Python Remote Debug Server.

5. Run your Python code.
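The ``settrace`` call fails when no debug server is listening, which makes it awkward to leave in shared code. One way to keep the job runnable either way is to guard the call, as in this sketch (``enable_remote_debug`` is a hypothetical helper, not part of PyFlink or pydevd):

```python
def enable_remote_debug(port=6789):
    """Attach to a PyCharm debug server if possible; never crash the job."""
    try:
        import pydevd_pycharm
        pydevd_pycharm.settrace('localhost', port=port,
                                stdoutToServer=True, stderrToServer=True)
        return True
    except Exception:
        # pydevd-pycharm is not installed, or no debug server is listening
        return False
```

Calling ``enable_remote_debug()`` at the top of a UDF then attaches the debugger when you are running a debug session and is a no-op otherwise.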

Profiling Python UDFs
=====================

You can enable profiling to analyze performance bottlenecks.

.. code-block:: python

    t_env.get_config().set("python.profile.enabled", "true")

Then you can find the profiling results in the logs (see `Accessing Logs`_).
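``python.profile.enabled`` profiles the UDFs as they run on the cluster. For quick local iteration, the same function body can also be profiled before submission with the stdlib ``cProfile``; this is a sketch of that workflow, not a PyFlink feature:

```python
import cProfile
import io
import pstats

# plain-Python version of the UDF body under test
def add(i, j):
    return i + j

profiler = cProfile.Profile()
profiler.enable()
for n in range(10000):
    add(n, n + 1)
profiler.disable()

# print the five most expensive entries by cumulative time
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The printed table shows per-function call counts and timings, which is often enough to spot a hot loop before re-running the job on the cluster.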