This AI Paper from Microsoft Introduces DiskANN Integration: Cost-Effective and Low-Latency Vector Search Using Azure Cosmos DB

The ability to search over high-dimensional vector representations has become a core requirement of modern data systems. These vector representations, generated by deep learning models, capture the semantics and contextual meaning of the data, enabling a system to retrieve results based not on exact matches but on similarity in meaning. This semantic capability is crucial in large-scale applications such as web search, AI-driven assistants, and content recommendation, where both users and agents need to access information by meaning rather than through structured queries alone.
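As a toy illustration of this idea (not part of the paper), the snippet below ranks documents by the cosine similarity of their embeddings to a query embedding. The three-dimensional vectors are made up for readability; real embeddings typically have hundreds of dimensions:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity of direction between two embedding vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.2, 0.8, 0.1])
docs = {
    "doc_about_cats": np.array([0.25, 0.75, 0.05]),   # close to the query
    "doc_about_cars": np.array([0.90, 0.10, 0.30]),   # far from the query
}

# Rank documents by semantic closeness rather than exact keyword match.
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # the semantically closer document ranks first
```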
One of the main problems in vector-based retrieval is the cost and complexity of operating separate systems for transactional data and vector indexes. Traditionally, dedicated vector databases are used to obtain good semantic search performance, but they require users to copy data out of their primary database, introducing latency, storage overhead, and consistency risks. Developers also bear the burden of keeping two different systems in sync, which limits scalability, flexibility, and data integrity when updates arrive quickly.
Popular vector search tools such as Zilliz and Pinecone are standalone services that provide effective similarity search. However, these platforms rely on segmented or fully in-memory architectures. They often need to rebuild indexes repeatedly and can suffer from latency spikes and heavy memory consumption, which makes them inefficient for large-scale or constantly changing data. The problem worsens when handling updates, filtered queries, or multiple tenants, because these systems lack deep integration with transactional operations and structured indexes.
Microsoft researchers have proposed integrating vector indexing directly into Azure Cosmos DB's NoSQL engine. They took DiskANN, a graph-based indexing library known for its performance in large-scale semantic search, and redesigned it to work inside Cosmos DB's infrastructure. This design eliminates the need for a separate vector database. Built-in features of Cosmos DB, such as high availability, elasticity, multi-tenancy, and automatic partitioning, are fully leveraged to make the solution both cost-effective and scalable. Each partition of each collection maintains a vector index that is kept in sync with the main document data through the existing Bw-Tree index structure.
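From a developer's perspective, the integrated index is exposed through ordinary container configuration. The sketch below uses the azure-cosmos Python SDK (version 4.7 or later, with vector search enabled on the account) to show roughly how a container with a DiskANN vector index might be declared; the account URL, key, and the /embedding and /tenantId paths are illustrative placeholders, not values from the paper:

```python
# pip install "azure-cosmos>=4.7.0"
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(url="https://<your-account>.documents.azure.com:443/",
                      credential="<your-key>")
database = client.create_database_if_not_exists("semantic_search_db")

# Declare which document path holds embeddings, their dimensionality,
# and the distance function used for similarity scoring.
vector_embedding_policy = {
    "vectorEmbeddings": [{
        "path": "/embedding",
        "dataType": "float32",
        "dimensions": 768,              # matches the 768-d vectors in the experiments
        "distanceFunction": "cosine",
    }]
}

# Request a DiskANN index on that path; Cosmos DB maintains it per
# partition, alongside the regular property indexes.
indexing_policy = {
    "vectorIndexes": [{"path": "/embedding", "type": "diskANN"}]
}

container = database.create_container_if_not_exists(
    id="documents",
    partition_key=PartitionKey(path="/tenantId"),
    indexing_policy=indexing_policy,
    vector_embedding_policy=vector_embedding_policy,
)
```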
The rewritten DiskANN library is implemented in Rust and introduces asynchronous operations for compatibility with the database environment. It lets the database fetch or update only the vector components it needs, such as quantized vectors or neighbor lists, thereby reducing memory usage. Vector insertion and querying are managed with a hybrid approach in which most distance computations are performed in quantized space. The design supports paged search and filter-aware traversal, meaning queries can efficiently evaluate complex predicates and scale to billions of vectors. The method also includes a sharded indexing mode that builds separate indexes keyed on a user-defined field (such as a tenant ID or a time period); a filtered-query sketch follows below.
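At query time, this filter-aware traversal surfaces as plain SQL: VectorDistance is Cosmos DB's built-in similarity function, and structured predicates can be combined with vector ranking in a single statement. The snippet below is a hedged illustration reusing the container from the previous sketch; the c.category field and the query_embedding variable are assumptions made for the example:

```python
# Combine a structured predicate with vector ranking in one query.
query = """
SELECT TOP 10 c.id, c.title,
       VectorDistance(c.embedding, @queryVector) AS score
FROM c
WHERE c.category = @category
ORDER BY VectorDistance(c.embedding, @queryVector)
"""

results = container.query_items(
    query=query,
    parameters=[
        {"name": "@queryVector", "value": query_embedding},  # a 768-float list
        {"name": "@category", "value": "research"},          # illustrative predicate
    ],
    enable_cross_partition_query=True,
)
for item in results:
    print(item["id"], item["score"])
```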
In experiments, the system showed strong performance. For a dataset of 10 million 768-dimensional vectors, query latency stays below 20 milliseconds (P50) while the system achieves 94.64% recall. Azure Cosmos DB's query cost is up to 41 times lower than that of enterprise-tier alternatives. Even as the index grows from 100,000 to 10 million vectors, latency and Request Unit (RU) consumption remain cost-efficient. For ingestion, Cosmos DB charges about $162.50 for 10 million vector inserts, which is lower than Pinecone and DataStax, though higher than Zilliz. Furthermore, recall remains stable even during heavy update cycles, and sharded indexing can significantly improve accuracy for skewed data distributions.
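As a back-of-envelope check (not a figure from the paper), the reported ingestion cost can be converted into an approximate per-insert Request Unit charge, assuming Cosmos DB's commonly cited serverless rate of about $0.25 per million RUs, which varies by region:

```python
total_cost_usd = 162.50            # reported cost for 10M vector inserts
num_inserts = 10_000_000
usd_per_million_rus = 0.25         # assumed serverless rate, region-dependent

cost_per_insert = total_cost_usd / num_inserts               # ~= $0.0000163
rus_per_insert = cost_per_insert * 1_000_000 / usd_per_million_rus
print(f"~{rus_per_insert:.0f} RUs per 768-d vector insert")  # ~= 65 RUs
```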
This study presents a compelling case for unifying vector search with a transactional database. Microsoft's research team designed a system that simplifies operations while achieving strong performance in cost, latency, and scalability. By embedding vector search into Cosmos DB, they provide a practical template for integrating semantic functionality directly into operational workloads.
Check out the paper. All credit for this research goes to the researchers of this project.

Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields such as biomaterials and biomedical science. With a strong background in materials science, he is exploring new advancements and creating opportunities to contribute.