OpenAIEmbeddingBatch
langbatch.openai.OpenAIEmbeddingBatch
Bases: OpenAIBatch, EmbeddingBatch
OpenAIEmbeddingBatch is a class for OpenAI embedding batches. It can be used for batch processing with the text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 models.
Usage:
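The Usage block is empty in the rendered page; a minimal sketch, assuming a jsonl file already prepared in the OpenAI Batch API embedding-request format (the file path below is hypothetical):

```python
from langbatch.openai import OpenAIEmbeddingBatch

# "embeddings.jsonl" is a hypothetical path to a file of
# /v1/embeddings requests in OpenAI batch format
batch = OpenAIEmbeddingBatch("embeddings.jsonl")
batch.start()
```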
Source code in langbatch\openai.py
__init__
Initialize the OpenAIBatch class.
Parameters:
- file (str) – The path to the jsonl file in OpenAI batch format.
- client (OpenAI, default: None) – The OpenAI client to use. Defaults to OpenAI().
Usage:
batch = ChatCompletionBatch("path/to/file.jsonl")
# With custom OpenAI client
client = OpenAI(
api_key="sk-proj-...",
base_url="https://api.provider.com/v1"
)
batch = OpenAIBatch("path/to/file.jsonl", client=client)
Source code in langbatch\openai.py
create_from_requests
classmethod
Creates a batch from a list of requests. The requests must be in the correct Batch API request format for the batch type. For example, for OpenAIChatCompletionBatch, each request should be a Chat Completion request with a custom_id.
Parameters:
- requests – A list of requests.
- batch_kwargs (Dict, default: {}) – Additional keyword arguments for the batch class. Ex. gcp_project, etc. for VertexAIChatCompletionBatch.
Returns:
- An instance of the Batch class.
Raises:
- BatchInitializationError – If the input data is invalid.
Usage:
batch = OpenAIChatCompletionBatch.create_from_requests([
    {
        "custom_id": "request-1",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Biryani recipe, pls."}],
            "max_tokens": 1000
        }
    },
    {
        "custom_id": "request-2",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Write a short story about AI"}],
            "max_tokens": 1000
        }
    }
])
Source code in langbatch\Batch.py
load
classmethod
load(id: str, storage: BatchStorage = FileBatchStorage(), batch_kwargs: Dict = {})
Load a batch from the storage and return a Batch object.
Parameters:
- id (str) – The id of the batch.
- storage (BatchStorage, default: FileBatchStorage()) – The storage to load the batch from. Defaults to FileBatchStorage().
- batch_kwargs (Dict, default: {}) – Additional keyword arguments for the batch class. Ex. gcp_project, etc. for VertexAIChatCompletionBatch.
Returns:
- Batch – The batch object.
Usage:
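The Usage block is empty in the rendered page; a minimal sketch of loading a previously saved batch (the batch id, storage path, and the FileBatchStorage import path are assumptions for illustration):

```python
from langbatch.openai import OpenAIChatCompletionBatch

# hypothetical id of a batch previously persisted with save()
batch = OpenAIChatCompletionBatch.load("batch-id-123")

# or from a custom file storage location (import path assumed):
# from langbatch.Batch import FileBatchStorage
# batch = OpenAIChatCompletionBatch.load(
#     "batch-id-123", storage=FileBatchStorage("./data")
# )
```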
Source code in langbatch\Batch.py
save
save(storage: BatchStorage = FileBatchStorage())
Save the batch to the storage.
Parameters:
- storage (BatchStorage, default: FileBatchStorage()) – The storage to save the batch to. Defaults to FileBatchStorage().
Usage:
batch = OpenAIChatCompletionBatch(file)
# save the batch to the default file storage
batch.save()
# save the batch to a custom file storage path
batch.save(storage=FileBatchStorage("./data"))
Source code in langbatch\Batch.py
start
get_status
get_results_file
Usage:
import jsonlines
# create a batch and start batch process
batch = OpenAIChatCompletionBatch(file)
batch.start()
if batch.get_status() == "completed":
# get the results file
results_file = batch.get_results_file()
with jsonlines.open(results_file) as reader:
for obj in reader:
print(obj)
Source code in langbatch\Batch.py
get_results
Retrieve the results of the embedding batch.
Returns:
- Tuple[List[Dict[str, Any]], List[Dict[str, Any]]] | Tuple[None, None] – A tuple containing successful and unsuccessful results.
  Successful results: a list of dictionaries with "embedding" and "custom_id" keys.
  Unsuccessful results: a list of dictionaries with "error" and "custom_id" keys.
Usage:
successful_results, unsuccessful_results = batch.get_results()
for result in successful_results:
print(result["embedding"])
Source code in langbatch\EmbeddingBatch.py
is_retryable_failure
retry
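These two methods have no rendered description or usage block; a sketch of how they presumably combine, with the semantics assumed from the method names (check whether a failed batch failed in a retryable way, then re-submit it):

```python
batch = OpenAIChatCompletionBatch(file)
batch.start()

# assumed semantics: is_retryable_failure() reports whether the
# failure is transient, and retry() re-submits the batch
if batch.get_status() == "failed" and batch.is_retryable_failure():
    batch.retry()
```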
get_unsuccessful_requests
Retrieve the unsuccessful requests from the batch.
Returns:
- List[Dict[str, Any]] – A list of requests that failed.
Usage:
batch = OpenAIChatCompletionBatch(file)
batch.start()
if batch.get_status() == "completed":
# get the unsuccessful requests
unsuccessful_requests = batch.get_unsuccessful_requests()
for request in unsuccessful_requests:
print(request["custom_id"])
Source code in langbatch\Batch.py
get_requests_by_custom_ids
Retrieve the requests from the batch file by custom ids.
Parameters:
- custom_ids (List[str]) – A list of custom ids.
Returns:
- List[Dict[str, Any]] – A list of requests.
Usage:
batch = OpenAIChatCompletionBatch(file)
batch.start()
if batch.get_status() == "completed":
# get the requests by custom ids
requests = batch.get_requests_by_custom_ids(["custom_id1", "custom_id2"])
for request in requests:
print(request["custom_id"])
Source code in langbatch\Batch.py
create
classmethod
create(data: List[str], request_kwargs: Dict = {}, batch_kwargs: Dict = {}) -> EmbeddingBatch
Create an embedding batch when given a list of texts.
Parameters:
- data (List[str]) – A list of texts to be embedded.
- request_kwargs (Dict, default: {}) – Additional keyword arguments for the API call. Ex. model, encoding_format, etc.
- batch_kwargs (Dict, default: {}) – Additional keyword arguments for the batch class.
Returns:
- EmbeddingBatch – An instance of the EmbeddingBatch class.
Raises:
- BatchInitializationError – If the input data is invalid.
Usage:
batch = OpenAIEmbeddingBatch.create(
    ["Hello world", "Hello LangBatch"],
    request_kwargs={"model": "text-embedding-3-small"}
)
Source code in langbatch\EmbeddingBatch.py
cancel
Usage:
# create a batch and start batch process
batch = OpenAIChatCompletionBatch(file)
batch.start()
# cancel the batch process
batch.cancel()