Building AI Agents Is 5% AI and 100% Software Engineering

Production-grade agents live or die on data plumbing, controls, and observability, not on model choice. The doc-to-chat pipeline below maps the concrete layers and why they matter.

What is a “doc-to-chat” pipeline?

A doc-to-chat pipeline ingests enterprise documents, standardizes them, enforces governance, indexes embeddings alongside relational features, and serves retrieval + generation behind authenticated APIs with human-in-the-loop (HITL) checkpoints. It is the reference architecture for agentic Q&A, copilots, and workflow automation where answers must respect permissions and be ready for audit. Production implementations are variations of RAG (retrieval-augmented generation) hardened with LLM guardrails, governance, and OpenTelemetry-backed tracing.

How do you integrate cleanly with your existing stack?

Use standard service boundaries (REST/JSON, gRPC) over a storage layer that your organization already trusts. For tables, Iceberg provides ACID transactions, schema evolution, partition evolution, and snapshots, which are essential for reproducible retrieval and backfills. For vectors, use a system that coexists with SQL filters: pgvector keeps embeddings in PostgreSQL next to business keys and ACL tags; a dedicated engine such as Milvus handles high-QPS ANN with disaggregated storage/compute. In practice, many teams run both: SQL + pgvector for transactional joins and Milvus for heavy retrieval.

Key attributes

  • Iceberg tables: ACID, hidden partitioning, snapshot isolation; vendor support across warehouses.
  • pgvector: SQL + vector similarity in a single query plan, which allows precise joins and policy enforcement.
  • Milvus: A layered, horizontally scalable architecture for large-scale similarity search.
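
As a rough illustration of the pgvector point, the sketch below builds one parameterized statement that applies tenant and ACL filters in the same query plan as the vector ordering. The table `doc_chunks`, its columns, and the helper names are hypothetical. `<->` is pgvector's L2 distance operator and `&&` is PostgreSQL's array-overlap operator; the overlap rule ("a row is visible if it shares at least one tag with the caller") and the psycopg-style `%(name)s` placeholders are assumptions, not something the pipeline prescribes.

```python
from typing import Any, Sequence


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats as a pgvector text literal such as '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def build_acl_vector_query(
    tenant_id: str,
    acl_tags: Sequence[str],
    query_embedding: Sequence[float],
    k: int = 5,
) -> tuple[str, dict[str, Any]]:
    """Return (sql, params) for a tenant- and ACL-filtered nearest-neighbor query."""
    if k <= 0:
        raise ValueError("k must be positive")
    sql = (
        "SELECT id, content "
        "FROM doc_chunks "                        # hypothetical table name
        "WHERE tenant_id = %(tenant_id)s "
        "AND acl_tags && %(acl_tags)s::text[] "   # assumed policy: any shared tag
        "ORDER BY embedding <-> %(q)s::vector "   # pgvector L2 distance
        "LIMIT %(k)s"
    )
    params = {
        "tenant_id": tenant_id,
        "acl_tags": list(acl_tags),
        "q": to_vector_literal(query_embedding),
        "k": k,
    }
    return sql, params


sql, params = build_acl_vector_query("tenant-42", ["hr", "finance"], [0.1, 0.2, 0.3], k=3)
```

Passing `sql` and `params` to a driver's `execute` call keeps user input out of the SQL string while the database enforces the filters before ranking.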

How do agents, humans and workflows coordinate a “knowledge structure”?

Production agents require explicit coordination points where humans approve, correct, or escalate. AWS A2I provides managed HITL loops (private workforces, flow definitions) and is a concrete blueprint for gating low-confidence outputs. Frameworks like LangGraph model these human checkpoints inside the agent graph, so approvals are first-class steps in the DAG rather than ad hoc callbacks. Use them to gate actions such as publishing summaries, filing tickets, or committing code.

Pattern: LLM → confidence/guardrail check → HITL gate → side effects. Persist every artifact (prompt, retrieval set, decision) for auditability and future re-runs.
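
The pattern above can be sketched in plain Python. This is a minimal illustration, not the A2I or LangGraph API: the `Draft` and `Decision` types, the `ask_human` callback, the confidence threshold, and the in-memory audit log are all hypothetical stand-ins for whatever scorer, review queue, and durable store a real deployment uses.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    prompt: str
    retrieved_ids: list[str]
    answer: str
    confidence: float  # assumed to come from an upstream scorer or guardrail


@dataclass
class Decision:
    route: str  # "auto" or "human"
    approved: bool
    reviewer: Optional[str] = None


def gate(
    draft: Draft,
    threshold: float,
    ask_human: Callable[[Draft], tuple[bool, str]],
    audit_log: list[str],
) -> Decision:
    """Auto-approve confident drafts; route the rest to a human reviewer."""
    if draft.confidence >= threshold:
        decision = Decision(route="auto", approved=True)
    else:
        approved, reviewer = ask_human(draft)
        decision = Decision(route="human", approved=approved, reviewer=reviewer)
    # Persist prompt, retrieval set, and decision together for later audits.
    audit_log.append(
        json.dumps({"ts": time.time(), "draft": asdict(draft), "decision": asdict(decision)})
    )
    return decision


def run_with_gate(
    draft: Draft,
    threshold: float,
    ask_human: Callable[[Draft], tuple[bool, str]],
    side_effect: Callable[[str], None],
    audit_log: list[str],
) -> bool:
    """Execute the side effect only after the gate approves the draft."""
    decision = gate(draft, threshold, ask_human, audit_log)
    if decision.approved:
        side_effect(draft.answer)
    return decision.approved
```

Keeping the side effect behind `run_with_gate` means a rejected draft still leaves an audit record but never publishes, files, or commits anything.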

How do you enforce reliability before anything reaches the model?

Think of reliability as a layered defense:

  1. Language + content guardrails: Pre-validate inputs/outputs for safety and policy. Options span managed services (Bedrock Guardrails) and OSS (NeMo Guardrails, Guardrails AI, Llama Guard). Independent comparisons and a position paper catalog the trade-offs.
  2. PII detection/redaction: Run analyzers on both source documents and model I/O. Microsoft Presidio provides recognizers and masking, with explicit caveats that it should be combined with other controls.
  3. Access control and lineage: Enforce row-/column-level ACLs and auditing across catalogs (Unity Catalog) so that retrieval respects permissions; unify lineage and access policies across workspaces.
  4. Retrieval quality gates: Evaluate RAG with Ragas or related tools using reference-free metrics (faithfulness, context precision/recall); block or down-rank poor contexts.
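
To make layer 2 concrete, here is a deliberately small masking sketch. It is a regex stand-in, not Presidio's API: the two patterns, the placeholder format, and the `mask_pii` helper are assumptions chosen for illustration, and a real deployment would rely on a maintained recognizer set plus the other layers listed above.

```python
import re

# Hypothetical, intentionally narrow patterns; real recognizers cover far more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_pii(text: str) -> tuple[str, dict[str, int]]:
    """Replace matches with <LABEL> placeholders and count hits per label."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, hits = pattern.subn(f"<{label}>", text)
        if hits:
            counts[label] = hits
    return text, counts


masked, found = mask_pii("Mail jane.doe@example.com, SSN 123-45-6789.")
# masked == "Mail <EMAIL>, SSN <US_SSN>."
```

Returning the hit counts alongside the masked text lets the same function feed both the redaction step and an audit metric.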

How do you scale indexing and retrieval under real traffic?

Two axes matter: ingest throughput and query concurrency.

  • Ingest: Normalize at the edge of the lakehouse; write to Iceberg to get versioned snapshots, then embed asynchronously. This allows deterministic rebuilds and point-in-time re-indexing.
  • Vector serving: Milvus's shared-storage, disaggregated-compute architecture supports horizontal scaling with independent failure domains; use HNSW/IVF/Flat hybrids and replica sets to balance recall and latency.
  • SQL + vector: Keep business joins on the server side (pgvector), e.g. WHERE tenant_id = ? AND acl_tag @> ... ORDER BY embedding <-> :q LIMIT k. This avoids N+1 round trips and respects policy.
  • Chunking/embedding strategy: Tune chunk size/overlap and semantic boundaries; bad chunking is the silent killer of recall.
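
The chunk size/overlap trade-off from the last bullet can be explored with a few lines of code. The sketch below splits on whitespace only, which is an assumption made for brevity: it ignores semantic boundaries such as headings or sentences, and the default sizes are illustrative rather than recommended values.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
windows = chunk_text(sample, chunk_size=4, overlap=1)
# windows[0] ends with "w3" and windows[1] starts with "w3"
```

Re-running retrieval evaluations while varying `chunk_size` and `overlap` is a cheap way to see how sensitive recall is to these two parameters.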

For structured + unstructured fusion, prefer hybrid retrieval (BM25 + ANN + reranker) and store structured features next to the vectors to support filtering and re-ranking at query time.
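
One common way to merge a lexical (BM25) ranking with a vector (ANN) ranking is reciprocal rank fusion (RRF). The text above does not name a fusion method, so RRF is a swapped-in choice here, the constant `k = 60` is a conventional default rather than a tuned value, and the reranker stage is left out of this sketch.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document ids; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["a", "b", "c"]  # hypothetical lexical results
ann_hits = ["b", "d", "a"]   # hypothetical vector results
fused = reciprocal_rank_fusion([bm25_hits, ann_hits])
# fused order of ids: b, a, d, c
```

Documents that appear in both lists rise to the top, and the fused list can then be handed to a reranker or filtered on structured features.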

How do you monitor beyond logs?

You need traces, metrics, and evaluations stitched together:

  • Distributed tracing: Emit OpenTelemetry spans across ingestion, retrieval, model calls, and tools; LangSmith natively ingests OTel traces and interoperates with external APMs (Jaeger, Datadog, Elastic). This gives end-to-end timing, prompts, contexts, and cost per request.
  • LLM observability platforms: Compare options on tracing, evals, cost tracking, and enterprise readiness (LangSmith, Arize Phoenix, Langfuse, Datadog). Independent roundups and comparison matrices are available.
  • Continuous evaluation: Schedule RAG evals (Ragas/DeepEval/MLflow) on canary sets and live traffic replays; track faithfulness and grounding drift over time.
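
Tracking drift can start very simply. The sketch below compares the mean faithfulness of a recent window against a baseline and flags a drop beyond a tolerance. The scores are assumed to come from whichever eval tool is in use; the function name, the `max_drop` default, and the sample numbers are hypothetical.

```python
from statistics import mean
from typing import Sequence


def drift_alert(
    baseline_scores: Sequence[float],
    recent_scores: Sequence[float],
    max_drop: float = 0.05,
) -> bool:
    """Return True when the mean score fell by more than max_drop."""
    if not baseline_scores or not recent_scores:
        raise ValueError("both score windows must be non-empty")
    return mean(baseline_scores) - mean(recent_scores) > max_drop


baseline = [0.90, 0.92, 0.88]  # hypothetical canary-set faithfulness scores
this_week = [0.80, 0.82, 0.78]
should_page = drift_alert(baseline, this_week)
```

A mean comparison is crude; it is meant as a starting point that can later be replaced with per-segment thresholds or a statistical test.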

Add schema profiling/mapping during ingestion to keep observability tied to changes in data shape (e.g., new templates, table evolution) and to explain retrieval regressions when upstream sources shift.
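
A minimal form of schema profiling is to record which fields and value types each ingested batch contains and to diff that profile against the previous one. The sketch below assumes records arrive as flat dictionaries; the function names and the sample fields are hypothetical.

```python
from collections import Counter


def profile_schema(records: list[dict]) -> dict[str, Counter]:
    """Map each field name to a count of the Python type names seen for it."""
    profile: dict[str, Counter] = {}
    for record in records:
        for key, value in record.items():
            profile.setdefault(key, Counter())[type(value).__name__] += 1
    return profile


def diff_profiles(baseline: dict[str, Counter], current: dict[str, Counter]) -> dict[str, list[str]]:
    """List fields that were added, removed, or changed their set of types."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "retyped": sorted(f for f in shared if set(baseline[f]) != set(current[f])),
    }


last_batch = [{"title": "Q1 report", "pages": 3}]
new_batch = [{"title": "Q2 report", "pages": "3", "template": "v2"}]
changes = diff_profiles(profile_schema(last_batch), profile_schema(new_batch))
# changes == {"added": ["template"], "removed": [], "retyped": ["pages"]}
```

Attaching the resulting diff to the ingestion trace gives a direct pointer when a retrieval regression lines up with an upstream format change.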

Example: doc-to-chat reference flow (signals and gates)

  1. Ingest: Connectors → text extraction → normalization → Iceberg write (ACID, snapshots).
  2. Govern: PII scan (Presidio) → redact/mask → catalog registration with ACL policies.
  3. Index: Embedding jobs → pgvector (policy-aware joins) and Milvus (high-QPS ANN).
  4. Serve: REST/gRPC → hybrid retrieval → guardrails → LLM → tool use.
  5. HITL: Low-confidence paths route to A2I/LangGraph approval steps.
  6. Observe: OTel traces to LangSmith/APM + scheduled RAG evaluations.

Why is “5% AI, 100% Software Engineering” accurate in practice?

Most outages and trust failures in agent systems are not model regressions. They are failures of data quality, permissions, retrieval decay, or missing telemetry. The controls above (ACID tables, ACL catalogs, PII guardrails, hybrid retrieval, OTel traces, and human gates) determine whether the same base model is safe, fast, and reliable for your users. Invest in these first; swap models later if necessary.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that offers in-depth coverage of machine learning and deep learning news that is both technically sound and understandable by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.
