Text Summarization with MindsDB and OpenAI using MQL

Introduction

In this blog post, we show how to create OpenAI models within MindsDB. In this example, we ask a model to summarize a text. The input data comes from our sample MongoDB database.

Prerequisites

To follow along, install MindsDB locally via Docker or Docker Desktop.

How to Connect MindsDB to a Database

We use a collection from our MongoDB public demo database, so let’s start by connecting MindsDB to it.

You can use Mongo Compass or Mongo Shell to connect to our sample database like this:

bash
test> use mindsdb
mindsdb> db.databases.insertOne({
            'name': 'mongo_demo_db',
            'engine': 'mongodb',
            'connection_args': {
                "host": "mongodb+srv://user:[email protected]/",
                "database": "public"
            }
        })

Tutorial

In this tutorial, we create a predictive model to summarize an article.

Now that we've connected our database to MindsDB, let’s query the data to be used in the example:

bash
mindsdb> use mongo_demo_db
mongo_demo_db> db.articles.find({}).limit(3)

Here is the output:

bash
{
  _id: '63d01398bbca62e9c7774ab8',
  article: "Video footage has emerged of a law enforcement officer…",
  highlights: 'The 53-second video features…'
}
{
  _id: '63d01398bbca62e9c7774ab9',
  article: "A new restaurant is offering a five-course…",
  highlights: "The Curious Canine Kitchen is…"
}
{
  _id: '63d01398bbca62e9c7774aba',
  article: 'Mother-of-two Anna Tilley survived after spending four days…',
  highlights: 'Experts have warned hospitals not using standard treatment…'
}

Let's create a model collection to summarize all articles from the input dataset:

<Note> You need to create an OpenAI engine before deploying the OpenAI model within MindsDB.

Here is how to create this engine:

bash
mongo_demo_db> use mindsdb
mindsdb> db.ml_engines.insertOne(
          {
              "name": "openai_engine",
              "handler": "openai",
              "params": {
                  "openai_api_key": "your-openai-api-key"
                  }
          })
</Note>
bash
mongo_demo_db> use mindsdb
mindsdb> db.models.insertOne({
            name: 'text_summarization',
            predict: 'highlights',
            training_options: {
                        engine: 'openai_engine',
                        prompt_template: 'provide an informative summary of the text text:{{article}} using full sentences'
                }
        })

In practice, the insertOne method triggers MindsDB to generate an AI collection called text_summarization that uses the OpenAI integration to predict a field named highlights. The model is created inside the default mindsdb project. In MindsDB, projects are a natural way to keep artifacts, such as models or views, separate according to what predictive task they solve. You can learn more about MindsDB projects here.

The training_options key specifies the parameters that this handler requires.

  • The engine parameter specifies which ML engine to use, here the openai_engine created earlier.
  • The prompt_template parameter conveys the structure of a message that is to be completed with additional text generated by the model.
<Note> Follow [this instruction](/integrations/ai-engines/openai#setup) to set up the OpenAI integration in MindsDB. </Note>
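
To make the role of prompt_template more concrete, here is a minimal sketch of how a `{{article}}` placeholder could be filled with a field value before the text is sent to the model. It only illustrates the idea and is not MindsDB's actual implementation; the `renderPrompt` helper and its error handling are assumptions made for this example.

```javascript
// Illustrative only: fills {{field}} placeholders from a document.
// `renderPrompt` is a hypothetical helper, not a MindsDB function.
function renderPrompt(template, doc) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, field) => {
    if (!(field in doc)) {
      throw new Error(`Missing field for placeholder: ${field}`);
    }
    return String(doc[field]);
  });
}

// The template used when creating the text_summarization model.
const template =
  'provide an informative summary of the text text:{{article}} using full sentences';

const prompt = renderPrompt(template, {
  article: 'A new restaurant is offering a five-course menu for dogs.',
});
console.log(prompt);
```

Each input document yields one prompt, with the article field substituted into the template.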

Once the insertOne method has started execution, we can check the status of the creation process with the following query:

bash
mindsdb> db.models.find({
            'name': 'text_summarization'
        })
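
The status check can also be scripted. The sketch below polls until the model document reports completion. It is a minimal illustration and not part of MindsDB: the `waitForModel` helper, the `fetchModel` callback, and the `status` field with the values 'complete' and 'error' are assumptions, so adjust them to match the document that db.models.find returns in your setup.

```javascript
// Hypothetical helper: poll until the model document reports completion.
// `fetchModel` is any async function that returns the model document, for
// example a wrapper around db.models.find({ name: 'text_summarization' }).
// The `status` field name and its values are assumptions for this sketch.
async function waitForModel(fetchModel, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const model = await fetchModel();
    const status = model && model.status;
    if (status === 'complete') {
      return model;
    }
    if (status === 'error') {
      throw new Error(`Model creation failed: ${JSON.stringify(model)}`);
    }
    // Not ready yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Model not ready after ${maxAttempts} attempts`);
}
```

Passing the lookup as a callback keeps the helper independent of any particular driver or shell.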

Model creation may take a while to register as complete, depending on the internet connection. Once it is complete, the model behaves like any other AI collection. You can query it either by specifying synthetic data in the query itself:

bash
mindsdb> db.text_summarization.find({
            article: "Apple's Watch hits stores this Friday when customers and employees alike will be able to pre-order the timepiece. And boss Tim Cook is rewarding his staff by offering them a 50 per cent discount on the device."
        })

Here is the output data:

bash
{
  highlights: "Apple's Watch hits stores this Friday, and employees will be able to pre-order the",
  article: "Apple's Watch hits stores this Friday when customers and employees alike will be able to pre-order the timepiece. And boss Tim Cook is rewarding his staff by offering them a 50 per cent discount on the device."
}

Or by joining with a collection for batch predictions:

bash
mindsdb> db.text_summarization.find(
            {
                'collection': 'mongo_demo_db.articles'
            },
            {
                'text_summarization.highlights': 'highlights',
                'articles.article': 'article'
            }
        ).limit(3)

Here is the output data:

bash
{
  highlights: 'A video has emerged of a law enforcement officer grabbing a cell phone from a woman who was',
  article: "Video footage has emerged of a law enforcement officer..."
}
{
  highlights: 'A new restaurant in London is offering a five-course drink-paired menu for dogs',
  article: "A new restaurant is offering a five-course..."
}
{
  highlights: "Sepsis is a potentially life-threatening condition that occurs when the body's response to an",
  article: 'Mother-of-two Anna Tilley survived after spending four days...'
}

The articles collection is used to make batch predictions. When the text_summarization model is joined with the articles collection, the model generates a summary for every value in the article field.
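
Conceptually, the batch query applies the model once per input document and returns the prediction alongside the original field. The sketch below mimics that output shape in plain JavaScript. It is not how MindsDB executes the join; `batchPredict` and the `firstSentence` stand-in model are hypothetical names used only for illustration.

```javascript
// Conceptual sketch only: mimics what the batch join returns,
// not how MindsDB runs it.
// `summarize` is a hypothetical stand-in for the text_summarization model.
function batchPredict(documents, summarize) {
  return documents.map((doc) => ({
    highlights: summarize(doc.article), // one prediction per article value
    article: doc.article,
  }));
}

// Stand-in model: keeps only the first sentence of the text.
const firstSentence = (text) => text.split('. ')[0];

const rows = batchPredict(
  [
    { _id: 1, article: 'First sentence. Second sentence.' },
    { _id: 2, article: 'Only one sentence' },
  ],
  firstSentence
);
console.log(rows);
```

As in the projection of the batch query, each result carries only the highlights and article fields.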

<Tip> Check out [this blog post on time series forecasting with Nixtla and MindsDB using MongoDB-QL](https://mindsdb.com/blog/time-series-forecasting-with-nixtla-and-mindsdb-using-mongodb-query-language). </Tip>