Retrieval#

Search your data using semantic similarity.

The Retrieval API allows you to perform semantic search over your data: a technique that surfaces semantically similar results even when they match few or no keywords. Retrieval is useful on its own, but it is especially powerful when combined with our models to synthesize responses.

The Retrieval API is powered by vector stores, which serve as indices for your data. This guide will cover how to perform semantic search, and go into the details of vector stores.

Quickstart#

  1. Create vector store and upload files.

from openai import OpenAI
client = OpenAI()

vector_store = client.vector_stores.create(        # Create vector store
    name="Support FAQ",
)

client.vector_stores.files.upload_and_poll(        # Upload file
    vector_store_id=vector_store.id,
    file=open("customer_policies.txt", "rb")
)
  2. Send a search query to get relevant results.

user_query = "What is the return policy?"

results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=user_query,
)

To learn how to use the results with our models, check out the synthesizing responses section.

Vector stores#

Vector stores are the containers that power semantic search for the Retrieval API and the Assistants API file search tool. When you add a file to a vector store it will be automatically chunked, embedded, and indexed.

Vector stores contain vector_store.file objects, which are backed by a file object.

| Object type | Description |
| --- | --- |
| file | Represents content uploaded through the Files API. Often used with vector stores, but also for fine-tuning and other use cases. |
| vector_store | Container for searchable files. |
| vector_store.file | Wrapper type representing a file that has been chunked and embedded, and has been associated with a vector_store. Contains an attributes map used for filtering. |

Pricing#

You will be charged based on the total storage used across all your vector stores, determined by the size of parsed chunks and their corresponding embeddings.

| Storage | Cost |
| --- | --- |
| Up to 1 GB (across all stores) | Free |
| Beyond 1 GB | $0.10/GB/day |

See expiration policies for options to minimize costs.
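As a quick sanity check, the daily cost under this schedule is simple to estimate. The helper name below is illustrative; the rates come from the pricing table above.

```python
def daily_storage_cost(total_gb: float, free_gb: float = 1.0,
                       rate_per_gb_day: float = 0.10) -> float:
    """Estimate daily vector store storage cost in dollars.

    The first gigabyte (across all stores) is free; each
    additional gigabyte costs $0.10 per day.
    """
    billable_gb = max(0.0, total_gb - free_gb)
    return billable_gb * rate_per_gb_day

daily_storage_cost(0.5)   # within the free tier, so 0.0
daily_storage_cost(5.0)   # 4 billable GB at $0.10/GB/day
```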

Vector store operations#

Create vector store

client.vector_stores.create(
    name="Support FAQ",
    file_ids=["file_123"]
)

Retrieve vector store

client.vector_stores.retrieve(
    vector_store_id="vs_123"
)

Update vector store

client.vector_stores.update(
    vector_store_id="vs_123",
    name="Support FAQ Updated"
)

Delete vector store

client.vector_stores.delete(
    vector_store_id="vs_123"
)

List vector stores

client.vector_stores.list()

Vector store file operations#

Some operations, such as create for vector_store.file objects, are asynchronous and may take time to complete. Use helper functions like create_and_poll to block until the operation finishes, or retrieve the object and check its status yourself.
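If you prefer to poll manually instead of using the *_and_poll helpers, the loop is straightforward. This is a minimal sketch: the helper name wait_until_done is illustrative and not part of the SDK, and it assumes a zero-argument fetch callable returning an object with a status attribute, such as a lambda wrapping client.vector_stores.files.retrieve.

```python
import time

def wait_until_done(fetch, poll_interval=1.0, timeout=300.0):
    """Poll fetch() until the returned object leaves the
    'in_progress' state, then return it; raise on timeout.

    fetch: zero-argument callable returning an object with a
    `status` attribute, e.g.
        lambda: client.vector_stores.files.retrieve(
            vector_store_id="vs_123", file_id="file_123")
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        obj = fetch()
        if obj.status != "in_progress":
            return obj  # e.g. "completed", "failed", or "cancelled"
        time.sleep(poll_interval)
    raise TimeoutError("operation did not finish within the timeout")
```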

Create vector store file

client.vector_stores.files.create_and_poll(
    vector_store_id="vs_123",
    file_id="file_123"
)

Upload vector store file

client.vector_stores.files.upload_and_poll(
    vector_store_id="vs_123",
    file=open("customer_policies.txt", "rb")
)

Retrieve vector store file

client.vector_stores.files.retrieve(
    vector_store_id="vs_123",
    file_id="file_123"
)

Update vector store file

client.vector_stores.files.update(
    vector_store_id="vs_123",
    file_id="file_123",
    attributes={"key": "value"}
)

Delete vector store file

client.vector_stores.files.delete(
    vector_store_id="vs_123",
    file_id="file_123"
)

List vector store files

client.vector_stores.files.list(
    vector_store_id="vs_123"
)

Batch operations#

Batch create operation

client.vector_stores.file_batches.create_and_poll(
    vector_store_id="vs_123",
    file_ids=["file_123", "file_456"]
)

Batch retrieve operation

client.vector_stores.file_batches.retrieve(
    vector_store_id="vs_123",
    batch_id="vsfb_123"
)

Batch cancel operation

client.vector_stores.file_batches.cancel(
    vector_store_id="vs_123",
    batch_id="vsfb_123"
)

Batch list operation

client.vector_stores.file_batches.list(
    vector_store_id="vs_123"
)

Attributes#

Each vector_store.file can have associated attributes, a dictionary of values that can be referenced when performing semantic search with attribute filtering. The dictionary can have at most 16 keys, with a limit of 256 characters each.

Create vector store file with attributes

client.vector_stores.files.create(
    vector_store_id="<vector_store_id>",
    file_id="file_123",
    attributes={
        "region": "US",
        "category": "Marketing",
        "date": 1672531200      # Jan 1, 2023
    }
)
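Once attributes are set, they can be referenced at query time. The sketch below wraps the search call in a function so the filter shape is visible; the function name is illustrative, and the filters payload (eq comparisons combined under and) follows the Retrieval API's attribute-filtering format. Check the API reference for the full set of operators.

```python
def search_with_attribute_filter(client, vector_store_id, query):
    """Search only files whose attributes match
    region == "US" AND category == "Marketing"."""
    return client.vector_stores.search(
        vector_store_id=vector_store_id,
        query=query,
        filters={
            "type": "and",
            "filters": [
                {"type": "eq", "key": "region", "value": "US"},
                {"type": "eq", "key": "category", "value": "Marketing"},
            ],
        },
    )
```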

Expiration policies#

You can set an expiration policy on vector_store objects with expires_after. Once a vector store expires, all associated vector_store.file objects will be deleted and you’ll no longer be charged for them.

Set expiration policy for vector store

client.vector_stores.update(
    vector_store_id="vs_123",
    expires_after={
        "anchor": "last_active_at",
        "days": 7
    }
)

Limits#

The maximum file size is 512 MB. Each file may contain at most 5,000,000 tokens (computed automatically when you attach the file).

Chunking#

By default, max_chunk_size_tokens is set to 800 and chunk_overlap_tokens is set to 400, meaning every file is indexed by being split up into 800-token chunks, with 400-token overlap between consecutive chunks.

You can adjust this by setting chunking_strategy when adding files to the vector store. There are certain limitations to chunking_strategy:

  • max_chunk_size_tokens must be between 100 and 4096 inclusive.

  • chunk_overlap_tokens must be non-negative and should not exceed max_chunk_size_tokens / 2.
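For example, a custom static strategy can be passed when attaching a file. This sketch builds the chunking_strategy payload and enforces the limits above; the helper name is illustrative, and the payload shape ({"type": "static", ...}) matches the chunking_strategy parameter.

```python
def make_static_chunking(max_chunk_size_tokens=800, chunk_overlap_tokens=400):
    """Build a static chunking_strategy payload, enforcing the
    documented limits before sending anything to the API."""
    if not 100 <= max_chunk_size_tokens <= 4096:
        raise ValueError("max_chunk_size_tokens must be in [100, 4096]")
    if not 0 <= chunk_overlap_tokens <= max_chunk_size_tokens // 2:
        raise ValueError("chunk_overlap_tokens must be in [0, max/2]")
    return {
        "type": "static",
        "static": {
            "max_chunk_size_tokens": max_chunk_size_tokens,
            "chunk_overlap_tokens": chunk_overlap_tokens,
        },
    }

# Attach a file with larger chunks and no overlap:
# client.vector_stores.files.create(
#     vector_store_id="vs_123",
#     file_id="file_123",
#     chunking_strategy=make_static_chunking(1600, 0),
# )
```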

Supported file types

For text/ MIME types, the encoding must be one of utf-8, utf-16, or ascii.

| File format | MIME type |
| --- | --- |
| .c | text/x-c |
| .cpp | text/x-c++ |
| .cs | text/x-csharp |
| .css | text/css |
| .doc | application/msword |
| .docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| .go | text/x-golang |
| .html | text/html |
| .java | text/x-java |
| .js | text/javascript |
| .json | application/json |
| .md | text/markdown |
| .pdf | application/pdf |
| .php | text/x-php |
| .pptx | application/vnd.openxmlformats-officedocument.presentationml.presentation |
| .py | text/x-python |
| .py | text/x-script.python |
| .rb | text/x-ruby |
| .sh | application/x-sh |
| .tex | text/x-tex |
| .ts | application/typescript |
| .txt | text/plain |
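A small client-side check against this table can catch unsupported files before uploading. The set below mirrors the extensions above; note that it does not verify text encodings, and the function name is illustrative.

```python
from pathlib import Path

# Extensions from the supported file types table above.
SUPPORTED_EXTENSIONS = {
    ".c", ".cpp", ".cs", ".css", ".doc", ".docx", ".go", ".html",
    ".java", ".js", ".json", ".md", ".pdf", ".php", ".pptx", ".py",
    ".rb", ".sh", ".tex", ".ts", ".txt",
}

def is_supported(filename: str) -> bool:
    """Check a filename's extension against the table above."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```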

Synthesizing responses#

After performing a query, you may want to synthesize a response based on the results. You can do this by supplying the results and the original query to one of our models, which returns a grounded response.

Perform search query to get results

from openai import OpenAI

client = OpenAI()

user_query = "What is the return policy?"

results = client.vector_stores.search(
    vector_store_id=vector_store.id,
    query=user_query,
)

Synthesize a response based on results

formatted_results = format_results(results.data)

completion = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {
            "role": "developer",
            "content": "Produce a concise answer to the query based on the provided sources."
        },
        {
            "role": "user",
            "content": f"Sources: {formatted_results}\n\nQuery: '{user_query}'"
        }
    ],
)

print(completion.choices[0].message.content)

Output:

"Our return policy allows returns within 30 days of purchase."

This uses a sample format_results function, which could be implemented like so:

Sample result formatting function

def format_results(results):
    formatted_results = ''
    for result in results:
        formatted_result = f"<result file_id='{result.file_id}' file_name='{result.file_name}'>"
        for part in result.content:
            formatted_result += f"<content>{part.text}</content>"
        formatted_results += formatted_result + "</result>"
    return f"<sources>{formatted_results}</sources>"