Back to Mindsdb

Extract JSON from Text

docs/use-cases/data_enrichment/json-from-text.mdx

26.1.03.8 KB
Original Source

Extract JSON Values from JSON Data

The json_extract() function extracts values from the JSON data passed as its argument.

To show how it works, we use the home rentals example. The model returns the predicted value and the explanation of the prediction in the form of JSON data.

sql
SELECT rental_price, rental_price_explain
FROM mindsdb.home_rentals_model
WHERE sqft = 823
AND location='good'
AND neighborhood='downtown'
AND days_on_market=10;

On execution, we get:

sql
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| rental_price | rental_price_explain                                                                                                                          |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| 1580         | {"predicted_value": 1580, "confidence": 0.99, "anomaly": null, "truth": null, "confidence_lower_bound": 1490, "confidence_upper_bound": 1670} |
+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------+

Now, if we want to see the confidence value only, we can use the json_extract() function as below.

sql
SELECT rental_price, json_extract(rental_price_explain, '$.confidence') AS confidence
FROM mindsdb.home_rentals_model
WHERE sqft = 823
AND location='good'
AND neighborhood='downtown'
AND days_on_market=10;

On execution, we get:

sql
+--------------+------------+
| rental_price | confidence |
+--------------+------------+
| 1580         | 0.99       |
+--------------+------------+

Extract JSON from Text Data

In this example, we use the OpenAI model to extract data in a predefined JSON format from the input text data.

<Info> **Default Model**

When you create an OpenAI model in MindsDB, it uses the gpt-3.5-turbo model by default. But you can use the gpt-4 model as well by passing it to the model-name parameter. </Info>

Let's create an OpenAI model.

<Tabs> <Tab title="SQL"> Before creating an OpenAI model, please create an engine, providing your OpenAI API key:
```sql
CREATE ML_ENGINE openai_engine
FROM openai
USING
    openai_api_key = 'your-openai-api-key';
```
```sql
CREATE MODEL mindsdb.nlp_model
PREDICT json
USING
    engine = 'openai_engine',
    json_struct = {
        'rental_price': 'rental price',
        'location': 'location',
        'nob': 'number of bathrooms'
    },
    prompt_template = '{{sentence}}';
```
</Tab>
</Tabs>

We pass three parameters as follows:

  1. The engine parameter ensures we use the OpenAI engine.
  2. The json_struct parameter stores a predefined JSON structure used for the output.
  3. The prompt_template parameter contains the instruction passed to the model that may include variables such as {{sentence}}.

Now we can query the model, passing the input text stored in the sentence column.

<Tabs> <Tab title="SQL"> ```sql SELECT json FROM mindsdb.nlp_model WHERE sentence = 'Amazing 3 bedroom apartment located at the heart of Manhattan, has one full bathrooms and one toilet room for just 3000 a month.'; ```
On execution, we get:

```sql
+----------------------------------------------------------+
| json                                                     |
+----------------------------------------------------------+
| {"location":"Manhattan","nob":"1","rental_price":"3000"} |
+----------------------------------------------------------+
```
</Tab>
</Tabs>