AI_EXTRACT

The AI_EXTRACT function uses a large language model (LLM) to extract structured information from unstructured text. Unlike regex or rule-based approaches, it interprets context, so it can identify relevant information even when phrasing varies or data appears in non-standard formats.

The function sends a user-specified extraction request (e.g., “client names”, “financial metrics”, “action items”) along with the source text to an LLM via an OpenAI-compatible Chat Completions API. By default, it calls Mistral AI’s endpoint with the codestral-2508 model, though a different model or any other OpenAI-compatible endpoint can be configured through the model, api_url, and api_key parameters.
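As a rough illustration of the request described above, the sketch below assembles the headers and JSON body for a Chat Completions call without sending anything. The helper name build_request and the sample key "sk-example" are hypothetical, and the prompt wording is abbreviated; the full function is listed under Python Code.

```python
import json

# Default endpoint from the parameter list; any OpenAI-compatible URL could be used instead.
DEFAULT_API_URL = "https://api.mistral.ai/v1/chat/completions"


def build_request(text, extract_type, api_key, model="codestral-2508",
                  temperature=0, max_tokens=1000):
    """Return (headers, payload) for an extraction call. Hypothetical helper; sends nothing."""
    prompt = (
        f"Extract the following from the text: {extract_type}\n\n"
        f"Text: {text}\n\n"
        'Return ONLY a JSON object such as {"items": ["item1", "item2"]}'
    )
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        # JSON mode: ask the endpoint to reply with a JSON object.
        "response_format": {"type": "json_object"},
    }
    return headers, payload


headers, payload = build_request(
    "Met with Acme Corporation.", "client names", "sk-example"
)
print(json.dumps(payload, indent=2))
```

Pointing the same payload at another provider should only require a different URL, key, and model ID, provided that provider accepts the response_format field.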

The request enables JSON mode (response_format: {"type": "json_object"}) so that the reply is parseable JSON. The prompt asks the LLM for a JSON object whose "items" key holds an array of the extracted values. The function converts that array into a 2D list in which each row contains one extracted item, a shape that maps directly onto Excel’s grid.
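The conversion from the model's JSON reply to a one-column grid can be sketched as follows. The helper name items_to_rows is hypothetical, and the sample reply is hand-written rather than real model output.

```python
import json


def items_to_rows(content):
    """Convert a JSON-mode reply into a 2D list with one item per row."""
    data = json.loads(content)
    # The prompt asks for {"items": [...]}; unwrap the array if it is there.
    if isinstance(data, dict):
        data = data.get("items")
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of items")
    return [[item] for item in data]


rows = items_to_rows('{"items": ["Acme Corporation", "Global Enterprises"]}')
print(rows)  # [['Acme Corporation'], ['Global Enterprises']]
```

Each inner list becomes one spreadsheet row, so the result spills down a single column.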

The temperature parameter (default 0, range 0.0–2.0) controls response randomness. A value of 0 makes the output as repeatable as the model allows, which suits extraction tasks. Higher values introduce variability and are generally not recommended for data extraction. The max_tokens parameter (default 1000, range 5–5000) caps response length to control cost. If the cap is too low for the number of items in the text, the JSON reply may be cut off and fail to parse.
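A minimal sketch of range checks matching the limits stated above. The helper validate_params and its error wording are assumptions for illustration, not part of the published function.

```python
def validate_params(temperature=0, max_tokens=1000):
    """Return an error string, or None when both values are in range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    temp_ok = (
        not isinstance(temperature, bool)
        and isinstance(temperature, (int, float))
        and 0.0 <= temperature <= 2.0
    )
    if not temp_ok:
        return "Error: temperature must be a number between 0.0 and 2.0."

    tokens_ok = (
        not isinstance(max_tokens, bool)
        and isinstance(max_tokens, int)
        and 5 <= max_tokens <= 5000
    )
    if not tokens_ok:
        return "Error: max_tokens must be an integer between 5 and 5000."

    return None


print(validate_params())                 # None
print(validate_params(temperature=2.5))  # temperature error
print(validate_params(max_tokens=3))     # max_tokens error
```

Returning a message instead of raising keeps the behavior spreadsheet-friendly, since the string can be shown directly in the cell.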

Common use cases include extracting named entities (people, companies, locations) from business documents, pulling financial figures from reports, identifying action items from meeting notes, extracting contact information from correspondence, and harvesting dates and deadlines from project documentation.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=AI_EXTRACT(text, extract_type, api_key, temperature, model, max_tokens, api_url)
  • text (str, required): The unstructured text to extract information from
  • extract_type (str, required): Type of information to extract (e.g., “names”, “dates”, “emails”)
  • api_key (str, required): API key for authentication
  • temperature (float, optional, default: 0): Controls randomness in AI response (0.0 = deterministic, 2.0 = highly random)
  • model (str, optional, default: “codestral-2508”): Model ID to use
  • max_tokens (int, optional, default: 1000): Maximum tokens in the AI response (5 to 5000)
  • api_url (str, optional, default: “https://api.mistral.ai/v1/chat/completions”): OpenAI-compatible API endpoint URL

Returns (list[list]): 2D list of extracted items, or error message string.
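Because the return value is either a 2D list or an error string, Python callers may want to separate the two cases before using the result. The sketch below shows one way to do that; unpack_result is a hypothetical helper and the sample values are hand-written.

```python
def unpack_result(result):
    """Return (items, error): a flat list of items, or an error message."""
    if isinstance(result, str):
        # Error messages come back as plain strings.
        return [], result
    # Each row of the 2D list holds exactly one extracted item.
    return [row[0] for row in result if row], None


items, error = unpack_result([["Acme Corporation"], ["Global Enterprises"]])
print(items, error)  # ['Acme Corporation', 'Global Enterprises'] None

items, error = unpack_result("Error: Empty input text")
print(items, error)  # [] Error: Empty input text
```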

Example 1: Demo case 1

Inputs:

  • text: During today’s annual review, we discussed progress with Acme Corporation, Global Enterprises, and TechSolutions Inc. All three clients reported satisfaction with our services.
  • extract_type: client names
  • api_key: YOUR_API_KEY (placeholder; substitute your own key)
  • temperature: 0
  • model: codestral-2508
  • max_tokens: 1000

Excel formula:

=AI_EXTRACT("During today's annual review, we discussed progress with Acme Corporation, Global Enterprises, and TechSolutions Inc. All three clients reported satisfaction with our services.", "client names", "YOUR_API_KEY", 0, "codestral-2508", 1000)

Expected output:

Acme Corporation
Global Enterprises
TechSolutions Inc

Example 2: Demo case 2

Inputs:

  • text: Q1 results exceeded expectations with revenue of $2.4M, an EBITDA margin of 18.5%, and customer acquisition costs decreasing by 12%. Cash reserves stand at $5.2M and our runway extends to 24 months.
  • extract_type: financial metrics
  • api_key: YOUR_API_KEY (placeholder; substitute your own key)
  • temperature: 0
  • model: codestral-2508
  • max_tokens: 1000

Excel formula:

=AI_EXTRACT("Q1 results exceeded expectations with revenue of $2.4M, an EBITDA margin of 18.5%, and customer acquisition costs decreasing by 12%. Cash reserves stand at $5.2M and our runway extends to 24 months.", "financial metrics", "YOUR_API_KEY", 0, "codestral-2508", 1000)

Expected output:

$2.4M
18.5%
12%
$5.2M
24 months

Example 3: Demo case 3

Inputs:

  • text: Hi team, Following our strategic planning session: 1) Mark needs to finalize the budget by Friday, 2) Sarah will contact vendors for new quotes, 3) Development team must provide timeline estimates by next Wednesday, and 4) Everyone should review the new marketing materials.
  • extract_type: action items
  • api_key: YOUR_API_KEY (placeholder; substitute your own key)
  • temperature: 0
  • model: codestral-2508
  • max_tokens: 1000

Excel formula:

=AI_EXTRACT("Hi team, Following our strategic planning session: 1) Mark needs to finalize the budget by Friday, 2) Sarah will contact vendors for new quotes, 3) Development team must provide timeline estimates by next Wednesday, and 4) Everyone should review the new marketing materials.", "action items", "YOUR_API_KEY", 0, "codestral-2508", 1000)

Expected output:

Mark needs to finalize the budget by Friday
Sarah will contact vendors for new quotes
Development team must provide timeline estimates by next Wednesday
Everyone should review the new marketing materials

Example 4: Demo case 4

Inputs:

  • text: John Smith, Senior Project Manager, Innovative Solutions Inc., jsmith@innovativesolutions.com, +1 (555) 123-4567, 123 Business Avenue, Suite 400, San Francisco, CA 94107
  • extract_type: contact information
  • api_key: YOUR_API_KEY (placeholder; substitute your own key)
  • temperature: 0
  • model: codestral-2508
  • max_tokens: 1000

Excel formula:

=AI_EXTRACT("John Smith, Senior Project Manager, Innovative Solutions Inc., jsmith@innovativesolutions.com, +1 (555) 123-4567, 123 Business Avenue, Suite 400, San Francisco, CA 94107", "contact information", "YOUR_API_KEY", 0, "codestral-2508", 1000)

Expected output:

jsmith@innovativesolutions.com
+1 (555) 123-4567
123 Business Avenue, Suite 400, San Francisco, CA 94107

Python Code

import requests
import json

def ai_extract(text, extract_type, api_key, temperature=0, model='codestral-2508', max_tokens=1000, api_url='https://api.mistral.ai/v1/chat/completions'):
    """
    Uses an AI model to extract specific types of information from unstructured text.

    This example function is provided as-is without any representation of accuracy.

    Args:
        text (str): The unstructured text to extract information from
        extract_type (str): Type of information to extract (e.g., "names", "dates", "emails")
        api_key (str): API key for authentication.
        temperature (float, optional): Controls randomness in AI response (0.0 = deterministic, 2.0 = highly random). Default is 0.
        model (str, optional): Model ID to use. Default is 'codestral-2508'.
        max_tokens (int, optional): Maximum tokens in the AI response (5 to 5000). Default is 1000.
        api_url (str, optional): OpenAI-compatible API endpoint URL. Default is 'https://api.mistral.ai/v1/chat/completions'.

    Returns:
        list[list]: 2D list of extracted items, or error message string.
    """
    if not api_key:
        return "You must include an API key to use this function. Sign up for a free API key at https://aistudio.google.com/, https://console.mistral.ai/, or other providers and add your own api_key. You may use any OpenAI-compatible API; just update the api_url parameter."

    # Handle 2D list input (e.g., an Excel range): only the first cell is used as the text
    if isinstance(text, list):
        if len(text) > 0 and len(text[0]) > 0:
            text = str(text[0][0])
        else:
            return "Error: Empty input text"

    # Validate temperature
    if not (isinstance(temperature, (int, float)) and 0.0 <= float(temperature) <= 2.0):
        return "Error: temperature must be a float between 0.0 and 2.0."

    # Validate max_tokens against the documented range (5 to 5000)
    if not (isinstance(max_tokens, int) and 5 <= max_tokens <= 5000):
        return "Error: max_tokens must be an integer between 5 and 5000."

    # Construct a specific prompt for data extraction
    extract_prompt = f"Extract the following from the text: {extract_type}\n\nText: {text}"
    extract_prompt += "\n\nReturn ONLY a JSON object with a key 'items' whose value is a JSON array of the items you extracted. "
    extract_prompt += "Each item should be a single value representing one extracted piece of information. "
    extract_prompt += "Do not include any explanatory text, just the JSON object. "
    extract_prompt += 'For example: {"items": ["item1", "item2", "item3"]}'

    payload = {
        "messages": [{"role": "user", "content": extract_prompt}],
        "temperature": temperature,
        "model": model,
        "max_tokens": max_tokens,
        "response_format": {"type": "json_object"}
    }

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "Accept": "application/json"
    }

    try:
        response = requests.post(api_url, headers=headers, json=payload)
        response.raise_for_status()
        response_data = response.json()
        content = response_data["choices"][0]["message"]["content"]
        try:
            extracted_data = json.loads(content)
            if isinstance(extracted_data, dict) and "items" in extracted_data:
                extracted_data = extracted_data["items"]
            elif isinstance(extracted_data, dict):
                if "extracted" in extracted_data:
                    extracted_data = extracted_data["extracted"]
                elif "results" in extracted_data:
                    extracted_data = extracted_data["results"]
            if isinstance(extracted_data, list):
                return [[item] for item in extracted_data]
            else:
                return "Error: Unable to parse response. Expected a list."
        except (json.JSONDecodeError, ValueError):
            return "Error: Unable to extract data. The AI response wasn't in the expected format."
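    except requests.exceptions.RequestException as e:
        return f"Error: API request failed. {str(e)}"
    except (KeyError, IndexError, TypeError):
        # The reply lacked the expected choices[0].message.content structure
        return "Error: Unexpected API response structure."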
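The listing above reads only the first cell when text arrives as a 2D list. If a multi-cell range should instead be treated as one document, the cells could be joined before the call. The sketch below is one possible approach; join_cells is a hypothetical helper and is not part of the function.

```python
def join_cells(value, sep=" "):
    """Join every non-empty cell of a 2D list into one string."""
    if not isinstance(value, list):
        return str(value)
    cells = []
    for row in value:
        # Tolerate a flat list by treating each element as a one-cell row.
        row = row if isinstance(row, list) else [row]
        for cell in row:
            if cell is not None and str(cell).strip():
                cells.append(str(cell).strip())
    return sep.join(cells)


print(join_cells([["Mark owes the budget.", None], ["Sarah contacts vendors."]]))
# Mark owes the budget. Sarah contacts vendors.
```

The joined string could then be passed as the text argument in place of the raw range.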
