OpenAIEmbeddingBatch

langbatch.openai.OpenAIEmbeddingBatch

Bases: OpenAIBatch, EmbeddingBatch

OpenAIEmbeddingBatch is a class for OpenAI embedding batches. It can be used for batch processing with the text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 models.

Usage:

batch = OpenAIEmbeddingBatch("path/to/file.jsonl")
batch.start()
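
A slightly fuller sketch using the inherited create classmethod to build the batch file from raw texts (the model name and texts are illustrative):

```python
# Sketch: create an embedding batch from plain texts, start it,
# and collect results once the platform reports completion.
from langbatch.openai import OpenAIEmbeddingBatch

batch = OpenAIEmbeddingBatch.create(
    ["Hello world", "Hello LangBatch"],
    request_kwargs={"model": "text-embedding-3-small"}
)
batch.start()

# Later, when batch.get_status() == "completed":
successful, unsuccessful = batch.get_results()
```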

Source code in langbatch\openai.py
class OpenAIEmbeddingBatch(OpenAIBatch, EmbeddingBatch):
    """
    OpenAIEmbeddingBatch is a class for OpenAI embedding batches.
    Can be used for batch processing with the text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 models.

    Usage:
    ```python
    batch = OpenAIEmbeddingBatch("path/to/file.jsonl")
    batch.start()
    ```
    """
    _url: str = "/v1/embeddings"

    def _validate_request(self, request):
        OpenAIEmbeddingRequest(**request)

platform_batch_id class-attribute instance-attribute

platform_batch_id: str | None = None

id instance-attribute

id = str(uuid4())

__init__

__init__(file: str, client: Optional[OpenAI | AzureOpenAI] = None) -> None

Initialize the OpenAIBatch class.

Parameters:

  • file (str) –

    The path to the JSONL file in OpenAI batch format.

  • client (OpenAI | AzureOpenAI, default: None ) –

    The OpenAI or Azure OpenAI client to use. Defaults to OpenAI().

Usage:

batch = OpenAIChatCompletionBatch("path/to/file.jsonl")

# With custom OpenAI client
client = OpenAI(
    api_key="sk-proj-...",
    base_url="https://api.provider.com/v1"
)
batch = OpenAIBatch("path/to/file.jsonl", client=client)
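
Since client also accepts an AzureOpenAI instance, the same pattern works against an Azure deployment. A minimal sketch, assuming the standard AzureOpenAI constructor from the openai package (the key, API version, and endpoint are placeholders):

```python
# Sketch only: api_key, api_version, and azure_endpoint are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="...",
    api_version="2024-06-01",
    azure_endpoint="https://my-resource.openai.azure.com"
)
batch = OpenAIEmbeddingBatch("path/to/file.jsonl", client=client)
```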

Source code in langbatch\openai.py
def __init__(self, file: str, client: Optional[OpenAI | AzureOpenAI] = None) -> None:
    """
    Initialize the OpenAIBatch class.

    Args:
        file (str): The path to the JSONL file in OpenAI batch format.
        client (OpenAI | AzureOpenAI, optional): The OpenAI client to use. Defaults to OpenAI().

    Usage:
    ```python
    batch = OpenAIChatCompletionBatch("path/to/file.jsonl")

    # With custom OpenAI client
    client = OpenAI(
        api_key="sk-proj-...",
        base_url="https://api.provider.com/v1"
    )
    batch = OpenAIBatch("path/to/file.jsonl", client=client)
    ```
    """
    super().__init__(file)
    self._client = client or OpenAI()

create_from_requests classmethod

create_from_requests(requests, batch_kwargs: Dict = {})

Creates a batch from a list of requests. The requests must be in the correct Batch API request format for the batch type, e.g. for OpenAIChatCompletionBatch, each request should be a Chat Completion request with a custom_id.

Parameters:

  • requests –

    A list of requests.

  • batch_kwargs (Dict, default: {} ) –

    Additional keyword arguments for the batch class. Ex. gcp_project, etc. for VertexAIChatCompletionBatch.

Returns:

  • –

    An instance of the Batch class.

Raises:

  • BatchInitializationError –

    If the input data is invalid.

Usage:

batch = OpenAIChatCompletionBatch.create_from_requests([
    {   "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Biryani Receipe, pls."}],
            "max_tokens": 1000
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Write a short story about AI"}],
            "max_tokens": 1000
        }
    }
])

Source code in langbatch\Batch.py
@classmethod
def create_from_requests(cls, requests, batch_kwargs: Dict = {}):
    """
    Creates a batch from a list of requests.
    These requests must be in the correct Batch API request format for the batch type.
    Ex. for OpenAIChatCompletionBatch, each request should be a Chat Completion request with a custom_id.

    Args:
        requests: A list of requests.
        batch_kwargs (Dict, optional): Additional keyword arguments for the batch class. Ex. gcp_project, etc. for VertexAIChatCompletionBatch.

    Returns:
        An instance of the Batch class.

    Raises:
        BatchInitializationError: If the input data is invalid.

    Usage:
    ```python
    batch = OpenAIChatCompletionBatch.create_from_requests([
        {   "custom_id": "request-1",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "Biryani Receipe, pls."}],
                "max_tokens": 1000
            }
        },
        {
            "custom_id": "request-2",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": "Write a short story about AI"}],
                "max_tokens": 1000
            }
        }
    ])
    ``` 
    """

    file_path = cls._create_batch_file_from_requests(requests)

    if file_path is None:
        raise BatchInitializationError("Failed to create batch. Check the input data.")

    return cls(file_path, **batch_kwargs)
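
For this embedding batch class, each request targets /v1/embeddings rather than /v1/chat/completions, so the request bodies are embedding requests. A minimal sketch (model name and inputs are illustrative):

```python
# Sketch: embedding-format requests in OpenAI Batch API shape.
batch = OpenAIEmbeddingBatch.create_from_requests([
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": "text-embedding-3-small", "input": "Hello world"}
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": "text-embedding-3-small", "input": "Hello LangBatch"}
    }
])
batch.start()
```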

load classmethod

load(id: str, storage: BatchStorage = FileBatchStorage(), batch_kwargs: Dict = {})

Load a batch from the storage and return a Batch object.

Parameters:

  • id (str) –

    The id of the batch.

  • storage (BatchStorage, default: FileBatchStorage() ) –

    The storage to load the batch from. Defaults to FileBatchStorage().

  • batch_kwargs (Dict, default: {} ) –

    Additional keyword arguments for the batch class. Ex. gcp_project, etc. for VertexAIChatCompletionBatch.

Returns:

  • Batch –

    The batch object.

Usage:

batch = OpenAIChatCompletionBatch.load("123", storage=FileBatchStorage("./data"))

Source code in langbatch\Batch.py
@classmethod
def load(cls, id: str, storage: BatchStorage = FileBatchStorage(), batch_kwargs: Dict = {}):
    """
    Load a batch from the storage and return a Batch object.

    Args:
        id (str): The id of the batch.
        storage (BatchStorage, optional): The storage to load the batch from. Defaults to FileBatchStorage().
        batch_kwargs (Dict, optional): Additional keyword arguments for the batch class. Ex. gcp_project, etc. for VertexAIChatCompletionBatch.

    Returns:
        Batch: The batch object.

    Usage:
    ```python
    batch = OpenAIChatCompletionBatch.load("123", storage=FileBatchStorage("./data"))
    ```
    """
    data_file, meta_file = storage.load(id)

    # Load metadata based on file extension
    if meta_file.suffix == '.json':
        with open(meta_file, 'r') as f:
            meta_data = json.load(f)
    else:  # .pkl
        with open(meta_file, 'rb') as f:
            meta_data = pickle.load(f)

    init_args = cls._get_init_args(meta_data)

    for key, value in batch_kwargs.items():
        if key not in init_args:
            init_args[key] = value

    batch = cls(str(data_file), **init_args)
    batch.platform_batch_id = meta_data['platform_batch_id']
    batch.id = id

    return batch

save

save(storage: BatchStorage = FileBatchStorage())

Save the batch to the storage.

Parameters:

  • storage (BatchStorage, default: FileBatchStorage() ) –

    The storage to save the batch to. Defaults to FileBatchStorage().

Usage:

batch = OpenAIChatCompletionBatch(file)
batch.save()

# save the batch to file storage
batch.save(storage=FileBatchStorage("./data"))

Source code in langbatch\Batch.py
def save(self, storage: BatchStorage = FileBatchStorage()):
    """
    Save the batch to the storage.

    Args:
        storage (BatchStorage, optional): The storage to save the batch to. Defaults to FileBatchStorage().

    Usage:
    ```python
    batch = OpenAIChatCompletionBatch(file)
    batch.save()

    # save the batch to file storage
    batch.save(storage=FileBatchStorage("./data"))
    ```
    """
    meta_data = self._create_meta_data()
    meta_data["platform_batch_id"] = self.platform_batch_id

    storage.save(self.id, Path(self._file), meta_data)

start

start()
Source code in langbatch\openai.py
def start(self):
    if self.platform_batch_id is not None:
        raise BatchStateError("Batch already started")

    # Upload the JSONL input file, then create the batch job on the platform.
    batch_input_file_id = self._upload_batch_file()
    self._create_batch(batch_input_file_id)

get_status

get_status()
Source code in langbatch\openai.py
def get_status(self):
    if self.platform_batch_id is None:
        raise BatchStateError("Batch not started")

    batch = self._client.batches.retrieve(self.platform_batch_id)
    return batch.status
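
A polling sketch, assuming the returned strings follow the OpenAI Batch API status lifecycle (e.g. "validating", "in_progress", "completed", "failed", "expired", "cancelled"):

```python
import time

# Wait until the batch reaches a terminal state, checking once a minute.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}

while batch.get_status() not in TERMINAL_STATUSES:
    time.sleep(60)
```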

get_results_file

get_results_file()

Usage:

import jsonlines

# create a batch and start batch process
batch = OpenAIChatCompletionBatch(file)
batch.start()

if batch.get_status() == "completed":
    # get the results file
    results_file = batch.get_results_file()

    with jsonlines.open(results_file) as reader:
        for obj in reader:
            print(obj)

Source code in langbatch\Batch.py
def get_results_file(self):
    """
    Usage:
    ```python
    import jsonlines

    # create a batch and start batch process
    batch = OpenAIChatCompletionBatch(file)
    batch.start()

    if batch.get_status() == "completed":
        # get the results file
        results_file = batch.get_results_file()

        with jsonlines.open(results_file) as reader:
            for obj in reader:
                print(obj)
    ```
    """
    file_path = self._download_results_file()
    return file_path

get_results

get_results() -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]] | Tuple[None, None]

Retrieve the results of the embedding batch.

Returns:

  • Tuple[List[Dict[str, Any]], List[Dict[str, Any]]] | Tuple[None, None] –

    A tuple containing successful and unsuccessful results. Successful results are a list of dictionaries with "embedding" and "custom_id" keys; unsuccessful results are a list of dictionaries with "error" and "custom_id" keys.

Usage:

successful_results, unsuccessful_results = batch.get_results()
for result in successful_results:
    print(result["embedding"])

Source code in langbatch\EmbeddingBatch.py
def get_results(self) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]] | Tuple[None, None]:
    """
    Retrieve the results of the embedding batch.

    Returns:
        A tuple containing successful and unsuccessful results. Successful results: A list of dictionaries with "embedding" and "custom_id" keys. Unsuccessful results: A list of dictionaries with "error" and "custom_id" keys.

    Usage:
    ```python
    successful_results, unsuccessful_results = batch.get_results()
    for result in successful_results:
        print(result["embedding"])
    ```
    """
    process_func = lambda result: {"embedding": result['response']['body']['data'][0]['embedding']}
    return self._prepare_results(process_func)

is_retryable_failure

is_retryable_failure() -> bool
Source code in langbatch\openai.py
def is_retryable_failure(self) -> bool:
    # Only a token limit error is treated as retryable.
    errors = self._get_errors()
    if errors:
        return errors.data[0]['code'] == "token_limit_exceeded"

    return False

retry

retry()
Source code in langbatch\openai.py
def retry(self):
    if self.platform_batch_id is None:
        raise BatchStateError("Batch not started")

    # Re-create the platform batch using the original input file.
    batch = self._client.batches.retrieve(self.platform_batch_id)
    self._create_batch(batch.input_file_id)
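
A sketch tying is_retryable_failure and retry together, assuming the batch has already reached the "failed" status:

```python
# Resubmit the same input file when the failure is known to be retryable
# (currently only "token_limit_exceeded").
if batch.get_status() == "failed" and batch.is_retryable_failure():
    batch.retry()
```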

get_unsuccessful_requests

get_unsuccessful_requests() -> List[Dict[str, Any]]

Retrieve the unsuccessful requests from the batch.

Returns:

  • List[Dict[str, Any]] –

    A list of requests that failed.

Usage:

batch = OpenAIChatCompletionBatch(file)
batch.start()

if batch.get_status() == "completed":
    # get the unsuccessful requests
    unsuccessful_requests = batch.get_unsuccessful_requests()

    for request in unsuccessful_requests:
        print(request["custom_id"])

Source code in langbatch\Batch.py
def get_unsuccessful_requests(self) -> List[Dict[str, Any]]:
    """
    Retrieve the unsuccessful requests from the batch.

    Returns:
        A list of requests that failed.

    Usage:
    ```python
    batch = OpenAIChatCompletionBatch(file)
    batch.start()

    if batch.get_status() == "completed":
        # get the unsuccessful requests
        unsuccessful_requests = batch.get_unsuccessful_requests()

        for request in unsuccessful_requests:
            print(request["custom_id"])
    ```
    """
    custom_ids = []
    _, unsuccessful_results = self.get_results()
    for result in unsuccessful_results:
        custom_ids.append(result["custom_id"])

    return self.get_requests_by_custom_ids(custom_ids)
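
A possible follow-up is to resubmit only the failed requests as a fresh batch via create_from_requests. A sketch, assuming the original batch completed with some failures:

```python
# Build a new batch containing only the requests that failed.
failed_requests = batch.get_unsuccessful_requests()
if failed_requests:
    retry_batch = OpenAIEmbeddingBatch.create_from_requests(failed_requests)
    retry_batch.start()
```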

get_requests_by_custom_ids

get_requests_by_custom_ids(custom_ids: List[str]) -> List[Dict[str, Any]]

Retrieve the requests from the batch file by custom ids.

Parameters:

  • custom_ids (List[str]) –

    A list of custom ids.

Returns:

  • List[Dict[str, Any]] –

    A list of requests.

Usage:

batch = OpenAIChatCompletionBatch(file)
batch.start()

if batch.get_status() == "completed":
    # get the requests by custom ids
    requests = batch.get_requests_by_custom_ids(["custom_id1", "custom_id2"])

    for request in requests:
        print(request["custom_id"])

Source code in langbatch\Batch.py
def get_requests_by_custom_ids(self, custom_ids: List[str]) -> List[Dict[str, Any]]:
    """
    Retrieve the requests from the batch file by custom ids.

    Args:
        custom_ids (List[str]): A list of custom ids.

    Returns:
        A list of requests.

    Usage:
    ```python
    batch = OpenAIChatCompletionBatch(file)
    batch.start()

    if batch.get_status() == "completed":
        # get the requests by custom ids
        requests = batch.get_requests_by_custom_ids(["custom_id1", "custom_id2"])

        for request in requests:
            print(request["custom_id"])
    ```
    """
    requests = []
    with jsonlines.open(self._file) as reader:
        for request in reader:
            if request["custom_id"] in custom_ids:
                requests.append(request)
    return requests

create classmethod

create(data: List[str], request_kwargs: Dict = {}, batch_kwargs: Dict = {}) -> EmbeddingBatch

Create an embedding batch when given a list of texts.

Parameters:

  • data (List[str]) –

    A list of texts to be embedded.

  • request_kwargs (Dict, default: {} ) –

    Additional keyword arguments for the API call. Ex. model, encoding_format, etc.

  • batch_kwargs (Dict, default: {} ) –

    Additional keyword arguments for the batch class.

Returns:

  • EmbeddingBatch –

    An instance of the EmbeddingBatch class.

Raises:

  • BatchInitializationError –

    If the input data is invalid.

Usage:

batch = OpenAIEmbeddingBatch.create(
    ["Hello world", "Hello LangBatch"],
    request_kwargs={"model": "text-embedding-3-small"}
)

Source code in langbatch\EmbeddingBatch.py
@classmethod
def create(cls, data: List[str], request_kwargs: Dict = {}, batch_kwargs: Dict = {}) -> "EmbeddingBatch":
    """
    Create an embedding batch when given a list of texts.

    Args:
        data (List[str]): A list of texts to be embedded.
        request_kwargs (Dict): Additional keyword arguments for the API call. Ex. model, encoding_format, etc.
        batch_kwargs (Dict): Additional keyword arguments for the batch class.

    Returns:
        An instance of the EmbeddingBatch class.

    Raises:
        BatchInitializationError: If the input data is invalid.

    Usage:
    ```python
    batch = OpenAIEmbeddingBatch.create(
        ["Hello world", "Hello LangBatch"],
        request_kwargs={"model": "text-embedding-3-small"}
    )
    ```
    """
    return cls._create_batch_file("input", data, request_kwargs, batch_kwargs)

cancel

cancel()

Usage:

# create a batch and start batch process
batch = OpenAIChatCompletionBatch(file)
batch.start()

# cancel the batch process
batch.cancel()

Source code in langbatch\openai.py
def cancel(self):
    """
    Usage:
    ```python
    # create a batch and start batch process
    batch = OpenAIChatCompletionBatch(file)
    batch.start()

    # cancel the batch process
    batch.cancel()
    ```
    """
    if self.platform_batch_id is None:
        raise BatchStateError("Batch not started")

    batch = self._client.batches.cancel(self.platform_batch_id)
    return batch.status in ("cancelling", "cancelled")