How to Use Open-Source AI Models to Build a Full-Featured Enterprise AI Assistant with Retrieval Augmentation and Policy Guardrails
In this tutorial, we explore how to build a compact yet capable enterprise AI assistant that runs comfortably on Colab. We start with Retrieval-Augmented Generation (RAG), using FAISS for document retrieval and FLAN-T5 for text generation, both of which are open source and free to use. As we progress, we embed enterprise policies such as data redaction, access control, and PII protection directly into the workflow, so that the system is both useful and compliant.
!pip -q install faiss-cpu transformers==4.44.2 accelerate sentence-transformers==3.0.1
from typing import List, Dict, Tuple
import re, textwrap, numpy as np, torch
from sentence_transformers import SentenceTransformer
import faiss
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
GEN_MODEL = "google/flan-t5-base"
EMB_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
gen_tok = AutoTokenizer.from_pretrained(GEN_MODEL)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL, device_map="auto")
generate = pipeline("text2text-generation", model=gen_model, tokenizer=gen_tok)
emb_device = "cuda" if torch.cuda.is_available() else "cpu"
emb_model = SentenceTransformer(EMB_MODEL, device=emb_device)

We first set up the environment and load the required models. We initialize FLAN-T5 for text generation and MiniLM for embeddings. Both models are configured to use the GPU automatically when one is available, so the pipeline runs efficiently.
DOCS = [
 {"id":"policy_sec_001","title":"Data Security Policy",
  "text":"All customer data must be encrypted at rest (AES-256) and in transit (TLS 1.2+). Access is role-based (RBAC). Secrets are stored in a managed vault. Backups run nightly with 35-day retention. PII includes name, email, phone, address, PAN/Aadhaar."},
 {"id":"policy_ai_002","title":"Responsible AI Guidelines",
  "text":"Use internal models for confidential data. Retrieval sources must be logged. No customer decisioning without human-in-the-loop. Redact PII in prompts and outputs. All model prompts and outputs are stored for audit for 180 days."},
 {"id":"runbook_inc_003","title":"Incident Response Runbook",
  "text":"If a suspected breach occurs, page on-call SecOps. Rotate keys, isolate affected services, perform forensic capture, notify DPO within regulatory SLA. Communicate via the incident room only."},
 {"id":"sop_sales_004","title":"Sales SOP - Enterprise Deals",
  "text":"For RFPs, use the approved security questionnaire responses. Claims must match policy_sec_001. Custom clauses need Legal sign-off. Keep records in CRM with deal room links."}
]
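def chunk(text:str, chunk_size=600, overlap=80):
   # Split text into word windows of chunk_size that overlap by `overlap` words.
   w = text.split()
   if len(w) <= chunk_size: return [text]
   out, i = [], 0
   while i < len(w):
       j = min(i + chunk_size, len(w)); out.append(" ".join(w[i:j]))
       if j == len(w): break
       i = j - overlap
   return out
# Flatten every document into chunk records; build_prompt and the evaluation loop read these keys.
CORPUS = []
for d in DOCS:
   for i, c in enumerate(chunk(d["text"])):
       CORPUS.append({"doc_id": d["id"], "title": d["title"], "chunk_id": i, "text": c})

We create a small set of enterprise-style documents to model internal policies and procedures. We then break long texts into manageable chunks so that they can be embedded and retrieved efficiently. Chunking lets the assistant retrieve only the passages relevant to a question instead of whole documents.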
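To make the overlap behaviour concrete, here is a minimal, self-contained sketch of word-level chunking with overlap. The helper name `chunk_words` and the tiny window sizes are assumptions chosen for illustration; the tutorial's own `chunk` uses 600-word windows with an 80-word overlap.

```python
from typing import List

def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> List[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if len(words) <= chunk_size:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        end = min(start + chunk_size, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # step back so consecutive chunks share context
    return chunks

sample = "a b c d e f g h i j"
pieces = chunk_words(sample, chunk_size=5, overlap=2)
for p in pieces:
    print(p)
```

Each window repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. The sample policies in this tutorial are far shorter than 600 words, so each one ends up as a single chunk.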
def build_index(chunks:List[Dict]) -> Tuple[faiss.IndexFlatIP, np.ndarray]:
   vecs = emb_model.encode([c["text"] for c in chunks], normalize_embeddings=True, convert_to_numpy=True)
   index = faiss.IndexFlatIP(vecs.shape[1]); index.add(vecs); return index, vecs
INDEX, VECS = build_index(CORPUS)
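# Each pattern maps to a placeholder token; the token names here are illustrative.
PII_PATTERNS = [
   (re.compile(r"\b\d{10}\b"), "[REDACTED_PHONE]"),  # 10-digit phone number
   (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "[REDACTED_EMAIL]"),  # email address
   (re.compile(r"\b\d{12}\b"), "[REDACTED_ID]"),  # 12-digit ID such as Aadhaar
   (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), "[REDACTED_PAN]")  # PAN: 5 letters, 4 digits, 1 letter
]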
def redact(t:str)->str:
   for p,r in PII_PATTERNS: t = p.sub(r, t)
   return t
POLICY_DISALLOWED = [
   re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
   re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]
def policy_check(q:str):
   for r in POLICY_DISALLOWED:
       if r.search(q): return False, "Request violates security policy (data exfiltration/encryption tampering)."
   return True, ""

We embed all chunks with a sentence-transformer model and store them in a FAISS index for fast retrieval. We then introduce PII redaction rules and policy checks to prevent data misuse, so that the assistant adheres to corporate security and compliance guidelines.
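As a quick sanity check of the guardrails, the following standalone sketch applies the same kind of regex redaction and policy screening to sample strings. The names `DEMO_PII`, `DEMO_DISALLOWED`, `demo_redact`, and `demo_policy_ok`, the bracketed placeholder labels, and the sample inputs are all assumptions made for illustration.

```python
import re

# Hypothetical placeholder labels; the patterns assume 10-digit phone numbers,
# 12-digit national IDs, and PAN-style codes (5 letters, 4 digits, 1 letter).
DEMO_PII = [
    (re.compile(r"\b\d{10}\b"), "[REDACTED_PHONE]"),
    (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "[REDACTED_EMAIL]"),
    (re.compile(r"\b\d{12}\b"), "[REDACTED_ID]"),
    (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), "[REDACTED_PAN]"),
]
DEMO_DISALLOWED = [
    re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
    re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]

def demo_redact(text: str) -> str:
    """Replace every PII match with its placeholder label."""
    for pattern, label in DEMO_PII:
        text = pattern.sub(label, text)
    return text

def demo_policy_ok(query: str) -> bool:
    """Return False when the query matches any disallowed pattern."""
    return not any(p.search(query) for p in DEMO_DISALLOWED)

redacted = demo_redact("Contact jane.doe@example.com or 9876543210, PAN ABCDE1234F.")
print(redacted)
print(demo_policy_ok("Is it allowed to share all raw customer data externally?"))
print(demo_policy_ok("What encryption do we use at rest?"))
```

Regex guardrails like these are easy to audit but also easy to bypass with rephrasing, so they are best treated as a first filter rather than a complete control.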
def retrieve(query:str, k=4)->List[Dict]:
   qv = emb_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)
   scores, idxs = INDEX.search(qv, k)
   return [{**CORPUS[i], "score": float(s)} for s,i in zip(scores[0], idxs[0])]
SYSTEM = ("You are an enterprise AI assistant.\n"
         "- Answer strictly from the provided CONTEXT.\n"
         "- If missing info, say what is unknown and suggest the correct policy/runbook.\n"
         "- Keep it concise and cite titles + doc_ids inline like [Title (doc_id:chunk)].")
def build_prompt(user_q:str, ctx_blocks:List[Dict])->str:
   ctx = "\n\n".join(f"[{i+1}] {b['title']} (doc:{b['doc_id']}:{b['chunk_id']})\n{b['text']}" for i,b in enumerate(ctx_blocks))
   uq = redact(user_q)
   return f"SYSTEM:\n{SYSTEM}\n\nCONTEXT:\n{ctx}\n\nUSER QUESTION:\n{uq}\n\nINSTRUCTIONS:\n- Cite sources inline.\n- Keep to 5-8 sentences.\n- Preserve redactions."
def answer(user_q:str, k=4, max_new_tokens=220)->Dict:
   ok,msg = policy_check(user_q)
   if not ok: return {"answer": f"❌ {msg}", "ctx":[]}
   ctx = retrieve(user_q, k=k); prompt = build_prompt(user_q, ctx)
   out = generate(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"].strip()
   return {"answer": out, "ctx": ctx}

We design the retrieval function to fetch the most relevant chunks for each user query. We then build a structured prompt that combines the context and the question so FLAN-T5 can generate a precise answer. This step keeps the assistant's responses grounded in the retrieved documents and compliant with policy.
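Because the embeddings are normalized before they go into `faiss.IndexFlatIP`, the inner-product score it returns equals cosine similarity. The sketch below reproduces that ranking with a brute-force search in plain Python. The helper names `normalize` and `top_k` and the 3-dimensional toy vectors are assumptions for illustration, not real MiniLM embeddings.

```python
import math
from typing import List, Tuple

def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query: List[float], vectors: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Brute-force inner-product search; returns (index, score) pairs, best first."""
    q = normalize(query)
    scored = [(i, sum(a * b for a, b in zip(q, normalize(v)))) for i, v in enumerate(vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

toy_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
results = top_k([2.0, 0.1, 0.0], toy_vectors, k=2)
for idx, score in results:
    print(idx, round(score, 3))
```

The query points almost exactly along the first axis, so vector 0 ranks first and the diagonal vector 2 second. A flat index does the same exhaustive comparison, which is fine for a handful of chunks but scales linearly with corpus size.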
def eval_query(user_q:str, ctx:List[Dict])->Dict:
   terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", user_q)]
   ctx_text = " ".join(c["text"].lower() for c in ctx)
   hits = sum(t in ctx_text for t in terms)
   return {"terms": len(terms), "hits": hits, "hit_rate": round(hits/max(1,len(terms)), 2)}
QUERIES = [
   "What encryption and backup rules do we follow for customer data?",
   "Can we auto-answer RFP security questionnaires? What should we cite?",
   "If there is a suspected breach, what are the first three steps?",
   "Is it allowed to share all raw customer data externally for testing?"
]
for q in QUERIES:
   res = answer(q, k=3)
   print("\n" + "="*100); print("Q:", q); print("\nA:", res["answer"])
   if res["ctx"]:
       ev = eval_query(q, res["ctx"]); print("\nRetrieved Context (top 3):")
       for r in res["ctx"]: print(f"- {r['title']} [{r['doc_id']}:{r['chunk_id']}] score={r['score']:.3f}")
       print("Eval:", ev)

We evaluate the system with sample enterprise queries covering encryption, RFPs, and incident procedures, plus one request that the policy check should refuse. We display the answers, the retrieved documents, and a simple term hit rate as a rough relevance check. In this demonstration, the assistant performs retrieval-augmented generation while respecting the guardrails.
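The hit rate used above is a plain substring check: it counts how many query words of four or more letters appear anywhere in the retrieved text. A small worked example, using a hypothetical helper `term_hit_rate` and a made-up context string, shows both how the number is computed and where it falls short.

```python
import re
from typing import Dict, List

def term_hit_rate(question: str, context_texts: List[str]) -> Dict[str, float]:
    """Share of 4+ letter query words that occur as substrings of the context."""
    terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", question)]
    ctx = " ".join(t.lower() for t in context_texts)
    hits = sum(t in ctx for t in terms)
    return {"terms": len(terms), "hits": hits, "hit_rate": round(hits / max(1, len(terms)), 2)}

score = term_hit_rate(
    "What encryption and backup rules apply?",
    ["All customer data must be encrypted at rest. Backups run nightly."],
)
print(score)
```

Here only "backup" matches (inside "Backups"), while "encryption" misses "encrypted" even though the context clearly answers the question. The metric is therefore a cheap smoke test for retrieval, not a measure of answer quality.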
In summary, we built a self-contained enterprise AI assistant that retrieves, analyzes, and answers business queries while enforcing guardrails. FAISS for retrieval, Sentence Transformers for embeddings, and FLAN-T5 for generation combine cleanly to simulate an internal enterprise knowledge engine. This simple Colab-based implementation can serve as a starting blueprint for scalable, auditable, and compliant enterprise deployments.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an AI media platform that stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easy to understand for a broad audience. The platform has more than 2 million monthly views, which shows that it is very popular among viewers.