OceanBase releases seekdb: an open source, AI-native hybrid search database for multi-model RAG and AI agents

AI applications rarely deal with one clean table. They mix user profiles, chat logs, JSON metadata, embeddings, and sometimes spatial data. Most teams handle this with a patchwork of an OLTP database, a vector store, and a search engine. OceanBase has released seekdb, an open source, AI-focused database under the Apache 2.0 license. seekdb is described as an AI-native search database that unifies relational data, vector data, text, JSON, and GIS in a single engine and exposes hybrid search and in-database AI workflows.

What is seekdb?

seekdb is positioned as a lightweight, embedded version of the OceanBase engine, targeted at AI applications rather than general distributed deployments. It runs as a single-node database, supports both embedded mode and client/server mode, and remains compatible with MySQL drivers and SQL syntax.

In the capability matrix, seekdb is marked as:

  • Embedded database: supported
  • Standalone database: supported
  • Distributed database: not supported

The full OceanBase product covers the distributed case.

From a data model perspective, seekdb supports:

  • Relational data using standard SQL
  • Vector search
  • Full-text search
  • JSON data
  • Spatial (GIS) data

All within one storage and indexing layer.

Hybrid search as the core feature

The main feature OceanBase is promoting is hybrid search: search that combines vector-based semantic retrieval, full-text keyword retrieval, and scalar filters in a single query and a single ranking step.

seekdb implements hybrid search through a system package named DBMS_HYBRID_SEARCH, which has two entry points:

  • DBMS_HYBRID_SEARCH.SEARCH returns results in JSON format, sorted by relevance
  • DBMS_HYBRID_SEARCH.GET_SQL returns the specific SQL string used for execution

The hybrid search path can run:

  • Pure vector search
  • Pure full text search
  • Combined hybrid search

Relational filters and joins can be pushed down to storage. seekdb also supports reranking strategies such as weighted scores and reciprocal rank fusion, and can plug in rerankers based on large language models.
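
To make the two fusion strategies concrete, here is a minimal Python sketch of reciprocal rank fusion and weighted score fusion over a vector result list and a full-text result list. It is a generic illustration of the techniques, not seekdb's implementation: the function names, the constant k=60, the min-max normalization, and the sample document ids are all assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_score_fusion(vector_scores, text_scores, vector_weight=0.5):
    """Blend two {doc_id: score} dicts after min-max normalizing each one."""

    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        # If every score is identical, treat them all as a full match.
        return {d: ((s - lo) / span if span else 1.0) for d, s in scores.items()}

    v, t = normalize(vector_scores), normalize(text_scores)
    fused = {
        d: vector_weight * v.get(d, 0.0) + (1.0 - vector_weight) * t.get(d, 0.0)
        for d in set(v) | set(t)
    }
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical result lists from a vector search and a full-text search.
vector_hits = ["doc_a", "doc_b", "doc_c"]
text_hits = ["doc_b", "doc_d", "doc_a"]
print(reciprocal_rank_fusion([vector_hits, text_hits]))
```

Rank fusion only needs positions, so it works even when vector distances and text relevance scores are on incomparable scales. Weighted fusion needs the scores normalized first, which is why the sketch rescales each list before blending.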

For retrieval-augmented generation (RAG) and agent memory, this means you can write a single SQL query that performs semantic matching on embeddings, exact matching on product codes or proper nouns, and relational filtering on user or tenant scope.

Vector and full-text engine details

At its core, seekdb exposes a modern vector and full-text stack.

For vectors, seekdb:

  • Supports dense vectors and sparse vectors
  • Supports Manhattan, Euclidean, inner product, and cosine distance metrics
  • Provides in-memory index types such as HNSW, HNSW SQ, and HNSW BQ
  • Provides disk-based index types including IVF and IVF PQ
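
For readers less familiar with the four distance metrics named above, the following Python sketch computes each one for plain lists of floats. It only illustrates the standard definitions. The function names are chosen for this example, and it says nothing about how seekdb evaluates the metrics internally.

```python
import math


def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: straight-line distance between the two points."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product; larger values mean more similar, unlike a distance."""
    _check_same_length(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus cosine similarity; ignores vector length, compares direction."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

Which metric fits depends on the embedding model: models that emit normalized vectors are usually compared by cosine distance or inner product, while Euclidean and Manhattan distance also take vector magnitude into account.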

The hybrid vector index lets you store raw text, have seekdb call the embedding model automatically, and have the system maintain the corresponding vector index, with no separate preprocessing pipeline.

For text, seekdb provides full-text search with:

  • Keyword, phrase, and Boolean queries
  • BM25 relevance ranking
  • Multiple tokenizer modes
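
As background on BM25 relevance ranking, here is a small Python sketch of the Okapi BM25 scoring formula over pre-tokenized documents. It is a textbook illustration and not seekdb's scorer: the parameters k1=1.2 and b=0.75, the smoothed IDF variant, and the toy documents are assumptions for this example, and tokenization is left to the caller.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document, in the same order as docs.

    docs is a list of token lists. k1 controls term-frequency saturation
    and b controls document-length normalization.
    """
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Toy corpus, already tokenized and lowercased.
docs = [
    ["red", "running", "shoes"],
    ["blue", "running", "jacket"],
    ["red", "red", "wine"],
]
print(bm25_scores(["red", "shoes"], docs))
```

Rare terms contribute more than common ones through the IDF factor, and repeating a term gives diminishing returns, which is why a document matching both query terms can outrank one that repeats a single term.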

The key is that full-text and vector indexes are first-class and integrated in the same query planner as scalar and GIS indexes, so hybrid searches require no external orchestration.

AI capabilities within the database

seekdb includes built-in AI function expressions that let you call models directly from SQL, so no separate application service has to mediate each call. The main functions are:

  • AI_EMBED converts text to embeddings
  • AI_COMPLETE generates text using a chat or completion model
  • AI_RERANK reranks a list of candidates
  • AI_PROMPT assembles a prompt template and dynamic values into a JSON object for AI_COMPLETE

Model metadata and endpoints are managed by the DBMS_AI_SERVICE package, which allows you to register external providers, set URLs, and configure keys, all on the database side.

Multimodal data and workloads

seekdb is designed to handle multiple data models in a single node. It features a multi-model data and index layer covering vectors, text, JSON, and GIS, as well as a multi-model compute layer for mixed workloads across vector, full-text, and scalar conditions.

It also provides JSON indexes for metadata queries and GIS indexes for spatial conditions. This allows queries like:

  • Find semantically similar documents
  • Filter by JSON metadata such as tenant, region, or category
  • Constrain by spatial range or polygon

All without leaving the same engine.
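
The shape of such a query can be mimicked in plain Python to show what the engine has to combine: a metadata filter, a spatial filter, and a semantic ranking. This is only a conceptual sketch over in-memory dictionaries. The record layout, the function names, and the use of a bounding box in place of a polygon are assumptions for illustration, and a real engine would use indexes rather than scanning every record.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_bbox(point, bbox):
    """point is (lon, lat); bbox is (min_lon, min_lat, max_lon, max_lat)."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filtered_semantic_search(docs, query_vec, metadata_filter, bbox, top_k=3):
    """Filter by metadata and location, then rank survivors by similarity."""
    candidates = [
        d for d in docs
        if all(d["meta"].get(k) == v for k, v in metadata_filter.items())
        and in_bbox(d["location"], bbox)
    ]
    candidates.sort(
        key=lambda d: cosine_similarity(d["embedding"], query_vec),
        reverse=True,
    )
    return [d["id"] for d in candidates[:top_k]]


# Hypothetical records: id, embedding, JSON-like metadata, (lon, lat) point.
docs = [
    {"id": 1, "embedding": [1.0, 0.0], "meta": {"tenant": "acme"}, "location": (13.4, 52.5)},
    {"id": 2, "embedding": [0.9, 0.1], "meta": {"tenant": "acme"}, "location": (2.35, 48.85)},
    {"id": 3, "embedding": [1.0, 0.0], "meta": {"tenant": "other"}, "location": (13.4, 52.5)},
    {"id": 4, "embedding": [0.0, 1.0], "meta": {"tenant": "acme"}, "location": (13.0, 52.0)},
]
print(filtered_semantic_search(docs, [1.0, 0.0], {"tenant": "acme"}, (12.0, 51.0, 14.0, 53.0)))
```

With separate systems, each of these three steps typically lives in a different service and the application has to intersect the results itself. Running them in one engine lets the planner decide the order in which filters and ranking are applied.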

Because seekdb is derived from the OceanBase engine, it inherits ACID transactions, hybrid row and column storage, and vectorized execution, although large-scale distributed deployment remains the job of the full OceanBase database.


Main points

  1. AI-native hybrid search: seekdb unifies vector search, full-text search, and relational filtering behind a single SQL and DBMS_HYBRID_SEARCH interface, so RAG and agent workloads can run multi-signal retrieval in a single query rather than stitching multiple engines together.
  2. Multi-model data in one engine: seekdb stores and indexes relational data, vectors, text, JSON, and GIS in the same engine, which lets AI applications keep documents, embeddings, and metadata consistent without maintaining separate databases.
  3. In-database AI functions for RAG: With AI_EMBED, AI_COMPLETE, AI_RERANK, and AI_PROMPT, seekdb can call embedding models, LLMs, and rerankers directly from SQL, simplifying the RAG pipeline and moving more orchestration logic into the database layer.
  4. Single-node, embedded-friendly design: seekdb is a single-node, MySQL-compatible engine that supports embedded and standalone modes, while distributed, large-scale deployments remain the role of the full OceanBase product. This makes seekdb suitable for local, edge, and service-embedded AI workloads.
  5. Open source and tool ecosystem: seekdb is open sourced under Apache 2.0 and integrates with a growing ecosystem of AI tools and frameworks, with Python support through pyseekdb and MCP-based integration for code assistants and agents, so it can serve as a unified data plane for AI applications.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easy to understand for a broad audience. The platform has more than 2 million monthly views, which shows that it is very popular among viewers.
