docs/en/06-advanced/06-tdgpt/01-introduction.md
Numerous algorithms have been proposed to perform time-series forecasting, anomaly detection, imputation, and classification, with varying technical characteristics suited for different scenarios.
Typically, these analysis algorithms are packaged as toolkits in high-level programming languages (such as Python or R) and are widely distributed and used through open-source channels. This model helps software developers integrate complex analysis algorithms into their systems and greatly lowers the barrier to using advanced algorithms.
Database system developers have also attempted to integrate data analysis algorithm models directly into database systems. By building machine learning libraries (e.g., Spark's MLlib), they aim to leverage mature analytical techniques to enhance the advanced data analysis capabilities of databases or analytical computing engines.
The rapid development of artificial intelligence (AI) has brought new opportunities to time-series data analysis. Efficiently applying AI capabilities to this field also presents new challenges for databases. To this end, TDengine has introduced TDgpt, an intelligent agent for time-series analytics. With TDgpt, you can use statistical analysis algorithms, machine learning models, deep learning models, foundational models for time-series data, and large language models via SQL statements. TDgpt exposes the analytical capabilities of these algorithms and models through SQL and applies them to your time-series data using new windows and functions.
TDgpt is an external agent that integrates seamlessly with TDengine's main process taosd. It allows time-series analysis services to be embedded directly into TDengine's query execution flow.
TDgpt is a stateless platform that includes the classic statsmodels library of statistical analysis models as well as embedded frameworks such as PyTorch and Keras for machine and deep learning. In addition, it can directly invoke TDengine's proprietary foundation model TDtsfm through request forwarding and adaptation.
As an analytics agent, TDgpt will also support integration with third-party time-series model-as-a-service (MaaS) platforms in the future. By modifying just a single parameter (algo), you will be able to access cutting-edge time-series model services.
TDgpt is an open system to which you can easily add your own algorithms for forecasting, anomaly detection, imputation, and classification. Once added, the new algorithms can be used simply by changing the corresponding parameters in the SQL statement, with no need to modify a single line of application code.
TDgpt is composed of one or more stateless analysis nodes, called AI nodes (anodes). These anodes can be deployed as needed across the TDengine cluster in appropriate hardware environments (for example, on compute nodes equipped with GPUs) depending on the requirements of the algorithms being used.
TDgpt provides a unified interface and invocation method for different types of analysis algorithms. Based on user-specified parameters, it calls advanced algorithm packages and other analytical tools, then returns the results to TDengine's main process (taosd) in a predefined format.
TDgpt consists of four main components:
During query execution, the vnode in TDengine forwards any elements involving advanced time-series data analytics directly to the anode. Once the analysis is completed, the results are assembled and embedded back into the query execution process.
The services provided by TDgpt are described as follows:
TDgpt is an extensible platform to which you can add your own algorithms and models using the process described in Algorithm Developer's Guide. After adding an algorithm, you can access it through SQL statements just like the built-in algorithms. It is not necessary to make updates to your applications.
Custom algorithms must be developed in Python. The anode adds algorithms dynamically. When the anode is started, it scans specified directories for files that meet its requirements and adds those files to the platform. To add an algorithm to your TDgpt, perform the following steps:
You can then use your new algorithm in SQL statements.
TDengine Enterprise includes a tool that evaluates the effectiveness of different algorithms and models. You can use this tool on any algorithm or model in TDgpt, including built-in and custom forecasting and anomaly detection algorithms and models. The tool uses quantitative metrics to evaluate the accuracy and performance of each algorithm with a given dataset in TDengine.
Trained models for machine learning frameworks such as PyTorch, TensorFlow, and Keras must be placed in the designated directory on the anode. The anode automatically detects and loads models from this directory.
TDengine Enterprise includes a model manager that integrates seamlessly with open-source end-to-end ML frameworks for time-series data such as Merlion and Kats.
Time-series analytics is a CPU-intensive workflow. Using a more powerful CPU or GPU can improve performance.
Machine and deep learning models in TDgpt are run through PyTorch, and you can use standard methods of improving performance, for example deploying TDgpt on a machine with more RAM and uing a torch model that can use GPUs.
You can add different algorithms and models to different anodes to enable concurrent processing.
With TDengine OSS, permissions and resource management are not provided for TDgpt. TDgpt is deployed as a Flask service through uWSGI. You can monitor its status by opening the port in uWSGI.