OpenAI launches real-time API and other features for developers

OpenAI didn’t announce any new models at its Dev Day event, but the new API features will excite developers who want to use its models to build powerful applications.
It’s been a rough few weeks for OpenAI, with CTO Mira Murati and other senior researchers joining a growing list of departures. The company is also facing mounting pressure from rival flagship models, including open-source options that give developers cheaper and increasingly capable alternatives.
New features introduced by OpenAI include a real-time API (in beta), vision fine-tuning, and efficiency tools such as prompt caching and model distillation.
Real-time API
The real-time API is the most exciting new feature, although it is still in beta. It enables developers to build low-latency speech-to-speech experiences in their applications without having to use separate models for speech recognition and text-to-speech conversion.
With this API, developers can build applications that hold real-time conversations with AI, such as voice assistants or language-learning tools, all through a single API call. It isn’t quite as seamless as ChatGPT’s Advanced Voice Mode, but it’s close.
It’s not cheap, though, at about $0.06 per minute of audio input and $0.24 per minute of audio output.
The new Realtime API from OpenAI is incredible…
Watch it use Twilio to call a store and order 400 strawberries. Sound on. 🍓🎤 pic.twitter.com/J2BBoL9yFv
— Ty (@FieroTy) October 1, 2024
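For a sense of the mechanics: the API is exposed over a WebSocket rather than a plain HTTP endpoint. The sketch below, based on the beta documentation at launch, opens a session and requests a spoken response. The model snapshot name, header, and event types are as documented in the beta and may change, so treat the details as assumptions rather than a stable recipe.

```python
import asyncio
import json
import os

import websockets  # pip install websockets

# Model snapshot and beta opt-in header as documented at launch (assumptions).
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # On newer versions of the websockets library the argument is
    # `additional_headers` instead of `extra_headers`.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask for a spoken response; a real client would instead stream
        # microphone audio in via input_audio_buffer.append events.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Greet the user briefly.",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.audio.delta":
                audio_b64 = event["delta"]  # base64-encoded audio to play back
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```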
Vision fine-tuning
Vision fine-tuning in the API allows developers to enhance a model’s ability to understand and interact with images. By fine-tuning GPT-4o with images, developers can create applications that perform well in tasks such as visual search or object detection.
Companies like Grab are already taking advantage of this capability, improving the accuracy of their mapping services by fine-tuning their models to recognize traffic signs in street imagery.
OpenAI also provides an example of how GPT-4o can generate additional content for a website after being fine-tuned to stylistically match the site’s existing content.
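In practice, vision fine-tuning uses the same JSONL chat format as text fine-tuning, with images supplied as image_url content parts. Here is a hedged sketch of preparing one training example and starting a job; the file names, prompt, and traffic-sign label are illustrative, not taken from OpenAI’s or Grab’s actual data.

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()

# One training example per JSONL line: chat messages where the user turn
# carries an image_url content part and the assistant turn is the label.
example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "What traffic sign is shown?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/sign_0001.jpg"}},
        ]},
        {"role": "assistant", "content": "No left turn"},
    ]
}
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# Upload the dataset and start the job against a GPT-4o snapshot
# that supports vision fine-tuning.
training_file = client.files.create(file=open("train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id)
```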
Prompt caching
To improve cost efficiency, OpenAI introduced prompt caching, which reduces the cost and latency of frequently repeated API calls. By automatically reusing recently processed input, developers can save up to 50% on input token costs and get faster responses. This is particularly useful for applications with long conversations or repetitive context, such as chatbots and customer service tools.
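Caching applies to the longest previously seen prefix of a prompt (for prompts above a minimum length, around 1,024 tokens per the launch docs), so the main lever developers have is ordering: put stable content first and per-request content last. A minimal sketch, assuming the usage field names from the current Python SDK:

```python
from openai import OpenAI

client = OpenAI()

STATIC_CONTEXT = "..."  # long, unchanging instructions or reference text

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": STATIC_CONTEXT},  # stable prefix: cacheable
            {"role": "user", "content": question},          # varying suffix
        ],
    )
    # On a cache hit, part of the input is billed at the discounted rate.
    details = resp.usage.prompt_tokens_details
    if details is not None:
        print("cached input tokens:", details.cached_tokens)
    return resp.choices[0].message.content
```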
Model distillation
Model distillation allows developers to use the output of a larger, more powerful model to fine-tune smaller, more cost-effective models. This is a game changer because previously, distillation required multiple disconnected steps and tools, making it a time-consuming and error-prone process.
Before OpenAI integrated model distillation capabilities, developers had to manually coordinate different parts of the process, such as generating data from larger models, preparing fine-tuning datasets, and measuring performance using various tools.
Developers can now automatically store the input-output pairs generated by larger models such as GPT-4o and use them to fine-tune smaller models such as GPT-4o-mini. The entire pipeline of dataset creation, fine-tuning, and evaluation becomes more structured, automated, and efficient.
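A hedged sketch of that workflow using the Python SDK: the store flag and metadata tagging were introduced alongside distillation, while the file ID and metadata key below are placeholders for illustration.

```python
from openai import OpenAI

client = OpenAI()

# Step 1: generate and persist teacher outputs from the larger model.
# `store=True` keeps the input/output pair; metadata tags it for filtering.
resp = client.chat.completions.create(
    model="gpt-4o",
    store=True,
    metadata={"task": "support-replies"},  # illustrative key
    messages=[{"role": "user", "content": "Summarize our refund policy."}],
)

# Step 2: once the stored completions have been exported as a fine-tuning
# dataset (e.g. from the dashboard), distill them into the smaller model.
job = client.fine_tuning.jobs.create(
    training_file="file-abc123",        # placeholder file ID
    model="gpt-4o-mini-2024-07-18",     # student model snapshot
)
```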
A streamlined development process, lower latency, and lower costs will make OpenAI’s GPT-4o model attractive to developers looking to quickly deploy powerful applications. It will be interesting to see what applications multi-modal functionality enables.