Coding implementation to build a unified tool orchestration framework from documentation to automation pipelines
In this tutorial, we build a compact, efficient framework that demonstrates how to convert tool documentation into standardized, callable interfaces, register those tools in a central system, and execute them as part of an automation pipeline. Along the way, we write a simple doc-to-spec converter, design simulated bioinformatics tools, organize them into a registry, and benchmark both single-tool and multi-step pipeline executions. Through this process, we explore how structured tool interfaces and automation can simplify and modularize data workflows.
import re, json, time, random
from dataclasses import dataclass
from typing import Callable, Dict, Any, List, Tuple

@dataclass
class ToolSpec:
    name: str
    description: str
    inputs: Dict[str, str]
    outputs: Dict[str, str]

def parse_doc_to_spec(name: str, doc: str) -> ToolSpec:
    desc = doc.strip().splitlines()[0].strip() if doc.strip() else name
    arg_block = "\n".join([l for l in doc.splitlines() if "--" in l or ":" in l])
    inputs = {}
    for line in arg_block.splitlines():
        m = re.findall(r"(--?\w[\w-]*|\b\w+\b)\s*[:=]?\s*(\w+)?", line)
        for key, typ in m:
            k = key.lstrip("-")
            if k and k not in inputs and k not in ["Returns", "Output", "Outputs"]:
                inputs[k] = (typ or "str")
    if not inputs: inputs = {"in": "str"}
    return ToolSpec(name=name, description=desc, inputs=inputs, outputs={"out": "json"})
We first define the structure of a tool and write a simple parser that converts ordinary documentation text into a standardized tool specification. This lets us automatically extract parameter names and types from free-form descriptions.
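To see the extraction idea in isolation, here is a minimal sketch of the same regex-based flag parsing applied to a made-up doc line (the simplified pattern below is illustrative, not the exact one used in `parse_doc_to_spec`):

```python
import re

# Hypothetical doc line in the "--flag: type" style the parser expects
line = "--seq_fasta: str --min_len: int"

# Capture each flag name and its declared type
pairs = re.findall(r"(--?\w[\w-]*)\s*:\s*(\w+)", line)
inputs = {key.lstrip("-"): typ for key, typ in pairs}
print(inputs)  # {'seq_fasta': 'str', 'min_len': 'int'}
```

The leading dashes are stripped so the extracted names can be passed straight to a Python function as keyword arguments.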
def tool_fastqc(seq_fasta: str, min_len: int = 30) -> Dict[str, Any]:
    seqs = [s for s in re.split(r">[^\n]*\n", seq_fasta)[1:]]
    lens = [len(re.sub(r"\s+", "", s)) for s in seqs]
    q30 = sum(l >= min_len for l in lens) / max(1, len(lens))
    gc = sum(c in "GCgc" for s in seqs for c in s) / max(1, sum(lens))
    return {"n_seqs": len(lens), "len_mean": (sum(lens) / max(1, len(lens))), "pct_q30": q30, "gc": gc}

def tool_bowtie2_like(ref: str, reads: str, mode: str = "end-to-end") -> Dict[str, Any]:
    def revcomp(s):
        t = str.maketrans("ACGTacgt", "TGCAtgca"); return s.translate(t)
    reads_list = [r for r in re.split(r">[^\n]*\n", reads)[1:]]
    ref_seq = "".join(ref.splitlines()[1:])
    hits = []
    for i, r in enumerate(reads_list):
        rseq = "".join(r.split())
        aligned = (rseq in ref_seq) or (revcomp(rseq) in ref_seq)
        hits.append({"read_id": i, "aligned": bool(aligned), "pos": ref_seq.find(rseq)})
    return {"n": len(hits), "aligned": sum(h["aligned"] for h in hits), "mode": mode, "hits": hits}

def tool_bcftools_like(ref: str, alt: str, win: int = 15) -> Dict[str, Any]:
    ref_seq = "".join(ref.splitlines()[1:]); alt_seq = "".join(alt.splitlines()[1:])
    n = min(len(ref_seq), len(alt_seq)); vars = []
    for i in range(n):
        if ref_seq[i] != alt_seq[i]: vars.append({"pos": i, "ref": ref_seq[i], "alt": alt_seq[i]})
    return {"n_sites": n, "n_var": len(vars), "variants": vars[:win]}
FASTQC_DOC = """FastQC-like quality control for FASTA
--seq_fasta: str --min_len: int Outputs: json"""
BOWTIE_DOC = """Bowtie2-like aligner
--ref: str --reads: str --mode: str Outputs: json"""
BCF_DOC = """bcftools-like variant caller
--ref: str --alt: str --win: int Outputs: json"""
We create simulated implementations of bioinformatics tools such as FastQC, Bowtie2, and bcftools, defining their expected inputs and outputs so they can be executed consistently through a unified interface.
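All three simulated tools rely on the same FASTA-splitting idiom: split on header lines, drop the leading empty chunk, and strip whitespace when measuring sequences. A standalone sketch with a made-up two-record FASTA string:

```python
import re

# Hypothetical two-record FASTA (not from the tutorial's generated data)
fasta = ">r1\nACGTACGT\n>r2\nGGCC\n"

# Split on ">header\n" lines; index [1:] drops the empty prefix before the first record
seqs = re.split(r">[^\n]*\n", fasta)[1:]
lens = [len(re.sub(r"\s+", "", s)) for s in seqs]
gc = sum(c in "GCgc" for s in seqs for c in s) / max(1, sum(lens))
print(lens, gc)  # [8, 4] and GC fraction 8/12
```

The `max(1, ...)` guard mirrors the tools above: it avoids a `ZeroDivisionError` on an empty FASTA input.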
@dataclass
class MCPTool:
    spec: ToolSpec
    fn: Callable[..., Dict[str, Any]]

class MCPServer:
    def __init__(self): self.tools: Dict[str, MCPTool] = {}
    def register(self, name: str, doc: str, fn: Callable[..., Dict[str, Any]]):
        spec = parse_doc_to_spec(name, doc); self.tools[name] = MCPTool(spec, fn)
    def list_tools(self) -> List[Dict[str, Any]]:
        return [dict(name=t.spec.name, description=t.spec.description, inputs=t.spec.inputs, outputs=t.spec.outputs) for t in self.tools.values()]
    def call_tool(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self.tools: raise KeyError(f"tool {name} not found")
        spec = self.tools[name].spec
        kwargs = {k: args.get(k) for k in spec.inputs.keys()}
        return self.tools[name].fn(**kwargs)

server = MCPServer()
server.register("fastqc", FASTQC_DOC, tool_fastqc)
server.register("bowtie2", BOWTIE_DOC, tool_bowtie2_like)
server.register("bcftools", BCF_DOC, tool_bcftools_like)
Task = Tuple[str, Dict[str, Any]]
PIPELINES = {
    "rnaseq_qc_align_call": [
        ("fastqc", {"seq_fasta": "{reads}", "min_len": 30}),
        ("bowtie2", {"ref": "{ref}", "reads": "{reads}", "mode": "end-to-end"}),
        ("bcftools", {"ref": "{ref}", "alt": "{alt}", "win": 15}),
    ]
}

def compile_pipeline(nl_request: str) -> List[Task]:
    key = "rnaseq_qc_align_call" if re.search(r"rna|qc|align|variant|call", nl_request, re.I) else "rnaseq_qc_align_call"
    return PIPELINES[key]
We build a lightweight server that registers tools, lists their specifications, and lets us call them programmatically. We also define a basic pipeline structure that fixes the order in which the tools run.
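The registry pattern at the heart of `MCPServer` can be reduced to a dictionary mapping names to callables. A minimal standalone sketch (the `double` tool and helper names here are made up for illustration, not part of the tutorial's code):

```python
from typing import Callable, Dict, Any

# Bare-bones tool registry: name -> callable returning a JSON-like dict
registry: Dict[str, Callable[..., Dict[str, Any]]] = {}

def register(name: str, fn: Callable[..., Dict[str, Any]]) -> None:
    registry[name] = fn

def call_tool(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
    # Unknown names fail loudly, as in MCPServer.call_tool
    if name not in registry:
        raise KeyError(f"tool {name} not found")
    return registry[name](**args)

register("double", lambda x: {"out": x * 2})
print(call_tool("double", {"x": 21}))  # {'out': 42}
```

Keeping every tool behind the same dict-in, dict-out signature is what lets the pipeline runner chain arbitrary tools without special-casing any of them.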
def mk_fasta(header: str, seq: str) -> str: return f">{header}\n{seq}\n"

random.seed(0)
REF_SEQ = "".join(random.choice("ACGT") for _ in range(300))
REF = mk_fasta("ref", REF_SEQ)
READS = mk_fasta("r1", REF_SEQ[50:130]) + mk_fasta("r2", "ACGT" * 15) + mk_fasta("r3", REF_SEQ[180:240])
ALT = mk_fasta("alt", REF_SEQ[:150] + "T" + REF_SEQ[151:])

def run_pipeline(nl: str, ctx: Dict[str, str]) -> Dict[str, Any]:
    plan = compile_pipeline(nl); results = []; t0 = time.time()
    for name, arg_tpl in plan:
        args = {k: (v.format(**ctx) if isinstance(v, str) else v) for k, v in arg_tpl.items()}
        out = server.call_tool(name, args)
        results.append({"tool": name, "args": args, "output": out})
    return {"request": nl, "elapsed_s": round(time.time() - t0, 4), "results": results}
We prepare small synthetic FASTA data for testing and implement a function that runs the entire pipeline, substituting tool parameters dynamically from a context dictionary and executing each step sequentially.
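The dynamic substitution step uses plain `str.format` templating: string arguments like `"{reads}"` are filled from the context dictionary, while non-string arguments pass through unchanged. A quick standalone sketch (the `ctx` values below are made up):

```python
# Hypothetical context and argument template, mirroring run_pipeline's substitution
ctx = {"ref": ">ref\nACGT\n", "reads": ">r1\nACGT\n"}
arg_tpl = {"ref": "{ref}", "reads": "{reads}", "min_len": 30}

# Strings are formatted against ctx; the int min_len is passed through untouched
args = {k: (v.format(**ctx) if isinstance(v, str) else v) for k, v in arg_tpl.items()}
print(args["min_len"], repr(args["ref"]))
```

This is why pipeline definitions can stay static: only the context changes between runs.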
def bench_individual() -> List[Dict[str, Any]]:
    cases = [
        ("fastqc", {"seq_fasta": READS, "min_len": 25}),
        ("bowtie2", {"ref": REF, "reads": READS, "mode": "end-to-end"}),
        ("bcftools", {"ref": REF, "alt": ALT, "win": 10}),
    ]
    rows = []
    for name, args in cases:
        t0 = time.time(); ok = True; err = None; out = None
        try: out = server.call_tool(name, args)
        except Exception as e: ok = False; err = str(e)
        rows.append({"tool": name, "ok": ok, "ms": int((time.time() - t0) * 1000), "out_keys": list(out.keys()) if ok else [], "err": err})
    return rows

def bench_pipeline() -> Dict[str, Any]:
    t0 = time.time()
    res = run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref": REF, "reads": READS, "alt": ALT})
    ok = all(step["output"] for step in res["results"])
    return {"pipeline": "rnaseq_qc_align_call", "ok": ok, "ms": int((time.time() - t0) * 1000), "n_steps": len(res["results"])}

print("== TOOLS =="); print(json.dumps(server.list_tools(), indent=2))
print("\n== INDIVIDUAL BENCH =="); print(json.dumps(bench_individual(), indent=2))
print("\n== PIPELINE BENCH =="); print(json.dumps(bench_pipeline(), indent=2))
print("\n== PIPELINE RUN =="); print(json.dumps(run_pipeline("Run RNA-seq QC, align, and variant call.", {"ref": REF, "reads": READS, "alt": ALT}), indent=2))
We benchmark individual tools and entire pipelines, capturing their output and performance metrics. Finally, we print the results to verify that each stage of the workflow ran successfully and integrated smoothly.
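The benchmarking rows follow a simple pattern: wrap each call in a timer, record wall-clock milliseconds, and serialize the row as JSON. A standalone sketch of that pattern (the `timed` helper and the `sum` example are illustrative, not from the tutorial):

```python
import time, json

def timed(fn, *args):
    # Return the call's result together with elapsed wall-clock milliseconds
    t0 = time.time()
    out = fn(*args)
    return out, int((time.time() - t0) * 1000)

out, ms = timed(sum, [1, 2, 3])
row = {"tool": "sum", "ok": True, "ms": ms, "out": out}
print(json.dumps(row))
```

`time.time()` is coarse but adequate for millisecond-scale benchmarking; `time.perf_counter()` would be the higher-resolution alternative.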
In summary, we now have a clear understanding of how lightweight tool conversion, registration, and orchestration work together in a single environment. A unified interface allows us to connect multiple tools seamlessly, run them sequentially, and measure their performance. This hands-on exercise shows how simple design principles, such as standardization, automation, and modularity, can improve the repeatability and efficiency of computational workflows in any domain.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for the benefit of society. His most recent endeavor is the launch of Marktechpost, an AI media platform that stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easy to understand for a broad audience. The platform draws more than 2 million monthly views, a testament to its popularity among readers.