Using pathway reasoning to build multi-agent systems for integrated transcriptomics, proteomics, and metabolomics data interpretation

In this tutorial, we build an advanced multi-agent pipeline that interprets integrated omics data (transcriptomics, proteomics, and metabolomics) to reveal key biological insights. We start by generating coherent synthetic datasets that mimic real-world biological trends, then progress through agents for statistical analysis, network inference, pathway enrichment, and drug repurposing. Each component contributes to a cumulative interpretation process that lets us identify important genes, infer candidate causal relationships, and generate biologically plausible hypotheses supported by patterns in the data.

import numpy as np
import pandas as pd
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')


PATHWAY_DB = {
   'Glycolysis': {'genes': ['HK2', 'PFKM', 'PKM', 'LDHA', 'GAPDH', 'ENO1'],
                  'metabolites': ['Glucose', 'G6P', 'F16BP', 'Pyruvate', 'Lactate'], 'score': 0},
   'TCA_Cycle': {'genes': ['CS', 'IDH1', 'IDH2', 'OGDH', 'SDHA', 'MDH2'],
                 'metabolites': ['Citrate', 'Isocitrate', 'α-KG', 'Succinate', 'Malate'], 'score': 0},
   'Oxidative_Phosphorylation': {'genes': ['NDUFA1', 'NDUFB5', 'COX5A', 'COX7A1', 'ATP5A1', 'ATP5B'],
                                  'metabolites': ['ATP', 'ADP', 'NAD+', 'NADH'], 'score': 0},
   'Fatty_Acid_Synthesis': {'genes': ['ACACA', 'FASN', 'SCD1', 'ACLY'],
                            'metabolites': ['Malonyl-CoA', 'Palmitate', 'Oleate'], 'score': 0},
   'Fatty_Acid_Oxidation': {'genes': ['CPT1A', 'ACOX1', 'HADHA', 'ACADM'],
                            'metabolites': ['Acyl-CoA', 'Acetyl-CoA'], 'score': 0},
   'Amino_Acid_Metabolism': {'genes': ['GOT1', 'GOT2', 'GLUD1', 'BCAT1', 'BCAT2'],
                             'metabolites': ['Glutamate', 'Glutamine', 'Alanine', 'Aspartate'], 'score': 0},
   'Pentose_Phosphate': {'genes': ['G6PD', 'PGD', 'TKTL1'],
                         'metabolites': ['R5P', 'NADPH'], 'score': 0},
   'Cell_Cycle_G1S': {'genes': ['CCND1', 'CDK4', 'CDK6', 'RB1', 'E2F1'], 'metabolites': [], 'score': 0},
   'Cell_Cycle_G2M': {'genes': ['CCNB1', 'CDK1', 'CDC25C', 'WEE1'], 'metabolites': [], 'score': 0},
   'Apoptosis': {'genes': ['BCL2', 'BAX', 'BID', 'CASP3', 'CASP8', 'CASP9'], 'metabolites': [], 'score': 0},
   'mTOR_Signaling': {'genes': ['MTOR', 'RPTOR', 'RICTOR', 'AKT1', 'TSC1', 'TSC2'],
                      'metabolites': ['Leucine', 'ATP'], 'score': 0},
   'HIF1_Signaling': {'genes': ['HIF1A', 'EPAS1', 'VEGFA', 'SLC2A1'], 'metabolites': ['Lactate'], 'score': 0},
   'p53_Signaling': {'genes': ['TP53', 'MDM2', 'CDKN1A', 'BAX'], 'metabolites': [], 'score': 0},
   'PI3K_AKT': {'genes': ['PIK3CA', 'AKT1', 'AKT2', 'PTEN', 'PDK1'], 'metabolites': [], 'score': 0},
}


GENE_INTERACTIONS = {
   'HK2': ['PFKM', 'HIF1A', 'MTOR'], 'PFKM': ['PKM', 'HK2'], 'PKM': ['LDHA', 'HIF1A'],
   'MTOR': ['AKT1', 'HIF1A', 'TSC2'], 'HIF1A': ['VEGFA', 'SLC2A1', 'PKM', 'LDHA'],
   'TP53': ['MDM2', 'CDKN1A', 'BAX', 'CASP3'], 'AKT1': ['MTOR', 'TSC2', 'MDM2'],
   'CCND1': ['CDK4', 'RB1'], 'CDK4': ['RB1'], 'RB1': ['E2F1'],
}


DRUG_TARGETS = {
   'Metformin': ['NDUFA1'], 'Rapamycin': ['MTOR'], '2-DG': ['HK2'],
   'Bevacizumab': ['VEGFA'], 'Palbociclib': ['CDK4', 'CDK6'], 'Nutlin-3': ['MDM2']
}


@dataclass
class OmicsProfile:
   transcriptomics: pd.DataFrame
   proteomics: pd.DataFrame
   metabolomics: pd.DataFrame
   metadata: Dict

We lay the biological foundation for our system. We define the pathway database, gene-gene interactions, and drug-target mappings that serve as reference networks for all downstream analyses. We also import the necessary libraries and create a data class that stores the multi-omics dataset in an organized format.
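
Because several genes appear in more than one pathway (for example, `BAX` is listed under both `Apoptosis` and `p53_Signaling`), it can help to look genes up in the reverse direction. The sketch below is a minimal illustration on a hypothetical two-pathway excerpt shaped like `PATHWAY_DB`; the `build_gene_index` helper is an assumption for illustration and is not part of the tutorial's code.

```python
from collections import defaultdict

# Hypothetical two-pathway excerpt in the same shape as PATHWAY_DB
pathway_db = {
    'Apoptosis': {'genes': ['BCL2', 'BAX', 'CASP3'], 'metabolites': [], 'score': 0},
    'p53_Signaling': {'genes': ['TP53', 'MDM2', 'BAX'], 'metabolites': [], 'score': 0},
}


def build_gene_index(db):
    """Map each gene to the sorted list of pathways that contain it."""
    index = defaultdict(list)
    for pathway, content in db.items():
        for gene in content['genes']:
            index[gene].append(pathway)
    return {gene: sorted(paths) for gene, paths in index.items()}


gene_index = build_gene_index(pathway_db)
print(gene_index['BAX'])    # BAX belongs to both pathways
print(gene_index['TP53'])   # TP53 belongs to one
```

An index like this makes shared membership explicit, which matters whenever a gene's pathway assignment decides how it is simulated or scored.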

class AdvancedOmicsGenerator:
   @staticmethod
   def generate_coherent_omics(n_samples=30, n_timepoints=4, noise=0.2):
       genes = list(set(g for p in PATHWAY_DB.values() for g in p['genes']))
       metabolites = list(set(m for p in PATHWAY_DB.values() for m in p['metabolites'] if m))
       proteins = [f"P_{g}" for g in genes]
       n_control = n_samples // 2
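       # Split only the disease samples across timepoints; ceiling division ensures every disease column is covered
       samples_per_tp = -(-(n_samples - n_control) // n_timepoints)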
       trans = np.random.randn(len(genes), n_samples) * noise + 10
       metab = np.random.randn(len(metabolites), n_samples) * noise + 5
       for tp in range(n_timepoints):
           start_idx = n_control + tp * samples_per_tp
           end_idx = start_idx + samples_per_tp
           progression = (tp + 1) / n_timepoints
           for i, gene in enumerate(genes):
               if gene in PATHWAY_DB['Glycolysis']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(1.5, 3.5) * progression
               elif gene in PATHWAY_DB['Oxidative_Phosphorylation']['genes']:
                   trans[i, start_idx:end_idx] -= np.random.uniform(1, 2.5) * progression
               elif gene in PATHWAY_DB['Cell_Cycle_G1S']['genes'] + PATHWAY_DB['Cell_Cycle_G2M']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(1, 2) * progression
               elif gene in PATHWAY_DB['HIF1_Signaling']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(2, 4) * progression
               elif gene in PATHWAY_DB['p53_Signaling']['genes']:
                   trans[i, start_idx:end_idx] -= np.random.uniform(0.5, 1.5) * progression
           for i, met in enumerate(metabolites):
               if met in ['Lactate', 'Pyruvate', 'G6P']:
                   metab[i, start_idx:end_idx] += np.random.uniform(1.5, 3) * progression
               elif met in ['ATP', 'Citrate', 'Malate']:
                   metab[i, start_idx:end_idx] -= np.random.uniform(1, 2) * progression
               elif met in ['NADPH']:
                   metab[i, start_idx:end_idx] += np.random.uniform(1, 2) * progression
       prot = trans * 0.8 + np.random.randn(*trans.shape) * (noise * 2)
       conditions = ['Control'] * n_control + [f'Disease_T{i//samples_per_tp}' for i in range(n_samples - n_control)]
       trans_df = pd.DataFrame(trans, index=genes, columns=[f"S{i}_{c}" for i, c in enumerate(conditions)])
       prot_df = pd.DataFrame(prot, index=proteins, columns=trans_df.columns)
       metab_df = pd.DataFrame(metab, index=metabolites, columns=trans_df.columns)
       metadata = {'conditions': conditions, 'n_timepoints': n_timepoints}
       return OmicsProfile(trans_df, prot_df, metab_df, metadata)


class StatisticalAgent:
   @staticmethod
       def differential_analysis(data_df, control_samples, disease_samples):
       from scipy import stats  # local import keeps this method self-contained
       control = data_df[control_samples]
       disease = data_df[disease_samples]
       # Values are treated as log-scale, so the difference of group means is the log fold change
       fc = (disease.mean(axis=1) - control.mean(axis=1))
       # Welch's t-test per feature (no equal-variance assumption between groups)
       t_stat, p_values = stats.ttest_ind(disease.to_numpy(), control.to_numpy(), axis=1, equal_var=False)
       # Benjamini-Hochberg FDR: p * n / rank, made monotone from the largest p-value downward
       n = len(p_values)
       order = np.argsort(p_values)
       ranked = p_values[order] * n / np.arange(1, n + 1)
       ranked = np.minimum.accumulate(ranked[::-1])[::-1]
       fdr = np.empty(n)
       fdr[order] = np.minimum(ranked, 1.0)
       return pd.DataFrame({'log2FC': fc, 't_stat': t_stat, 'p_value': p_values,
           'FDR': fdr, 'significant': (np.abs(fc) > 1.0) & (fdr < 0.05)}).sort_values('log2FC', ascending=False)


   @staticmethod
   def temporal_analysis(data_df, metadata):
       conditions = metadata['conditions']
       # Group column positions by the recorded disease timepoint label instead of assuming a fixed layout
       tp_cols = [[i for i, c in enumerate(conditions) if c == f'Disease_T{tp}']
                  for tp in range(metadata['n_timepoints'])]
       tp_cols = [cols for cols in tp_cols if cols]
       trends = {}
       for gene in data_df.index:
           means = [data_df.iloc[:, cols].loc[gene].mean() for cols in tp_cols]
           if len(means) > 1:
               x = np.arange(len(means))
               # A degree-1 fit makes coeffs[0] the linear slope across timepoints
               coeffs = np.polyfit(x, means, deg=1)
               trends[gene] = {'slope': coeffs[0], 'trajectory': means}
       return trends

We focus on generating synthetic yet biologically consistent multi-omics data and performing initial statistical analyses. We simulate disease progression across several time points and calculate fold changes, p-values, and FDR-corrected significance levels for genes, proteins, and metabolites. We also examine temporal trends to capture how expression values change over time.
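
The FDR column rests on the Benjamini-Hochberg idea of scaling each p-value by `n / rank`. As a minimal standalone sketch of that correction (the `benjamini_hochberg` helper and the four p-values are hypothetical and chosen only for illustration), the arithmetic looks like this:

```python
import numpy as np


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)                          # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)    # p * n / rank
    # Enforce monotonicity: each value becomes the minimum over itself and all larger ranks
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)      # cap at 1 and restore input order
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)  # approximately [0.02, 0.04, 0.04, 0.02]
```

The monotonicity step keeps a feature with a smaller raw p-value from ending up with a larger adjusted value than a feature ranked after it.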

class NetworkAnalysisAgent:
   def __init__(self, interactions):
       self.graph = interactions
   def find_master_regulators(self, diff_genes):
       sig_genes = diff_genes[diff_genes['significant']].index.tolist()
       impact_scores = {}
       for gene in sig_genes:
           if gene in self.graph:
               downstream = self._bfs_downstream(gene, max_depth=2)
               sig_downstream = [g for g in downstream if g in sig_genes]
               impact_scores[gene] = {
                   'downstream_count': len(downstream),
                   'sig_downstream': len(sig_downstream),
                   'score': len(sig_downstream) / (len(downstream) + 1),
                   'fc': diff_genes.loc[gene, 'log2FC']
               }
       return sorted(impact_scores.items(), key=lambda x: x[1]['score'], reverse=True)
   def _bfs_downstream(self, start, max_depth=2):
       # Breadth-first walk that records each downstream gene once, up to max_depth hops from start
       visited, queue = {start}, deque([(start, 0)])
       downstream = []
       while queue:
           node, depth = queue.popleft()
           if depth >= max_depth:
               continue
           for neighbor in self.graph.get(node, []):
               if neighbor not in visited:
                   visited.add(neighbor)
                   downstream.append(neighbor)
                   queue.append((neighbor, depth + 1))
       return downstream
   def causal_inference(self, diff_trans, diff_prot, diff_metab):
       causal_links = []
       for gene in diff_trans[diff_trans['significant']].index:
           gene_fc = diff_trans.loc[gene, 'log2FC']
           protein = f"P_{gene}"
           if protein in diff_prot.index:
               prot_fc = diff_prot.loc[protein, 'log2FC']
               correlation = np.sign(gene_fc) == np.sign(prot_fc)
               if correlation and abs(prot_fc) > 0.5:
                   causal_links.append(('transcription', gene, protein, gene_fc, prot_fc))
           for pathway, content in PATHWAY_DB.items():
               if gene in content['genes']:
                   for metab in content['metabolites']:
                       if metab in diff_metab.index and diff_metab.loc[metab, 'significant']:
                           metab_fc = diff_metab.loc[metab, 'log2FC']
                           causal_links.append(('enzymatic', gene, metab, gene_fc, metab_fc))
       return causal_links

We implement a network analysis agent that identifies master regulators and infers candidate causal relationships. We use graph traversal to assess each gene's impact on other genes and to identify connections between the transcriptomic, proteomic, and metabolic layers. This helps us understand which nodes have the greatest downstream impact on biological processes.
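
To make the master-regulator score concrete, here is a small worked sketch. It assumes the score is the number of significant downstream genes divided by the number of downstream genes plus one, as in `find_master_regulators`. The toy graph, the significance set, and both helper functions are hypothetical; in particular, the `MDM2 → BAX` edge is invented for the example.

```python
# Hypothetical toy interaction graph (regulator -> direct targets)
graph = {'TP53': ['MDM2', 'CDKN1A'], 'MDM2': ['BAX'], 'CDKN1A': []}
significant = {'TP53', 'MDM2', 'BAX'}   # genes assumed to pass the significance filter


def downstream_within(graph, start, max_depth=2):
    """Return the set of genes reachable from start in at most max_depth hops."""
    frontier, seen = {start}, {start}
    for _ in range(max_depth):
        # Expand one hop and drop anything already reached
        frontier = {n for node in frontier for n in graph.get(node, [])} - seen
        seen |= frontier
    return seen - {start}


def impact_score(graph, gene, significant, max_depth=2):
    """Fraction of downstream genes that are significant, damped by +1 in the denominator."""
    downstream = downstream_within(graph, gene, max_depth)
    hits = downstream & significant
    return len(hits) / (len(downstream) + 1)


print(sorted(downstream_within(graph, 'TP53')))   # ['BAX', 'CDKN1A', 'MDM2']
print(impact_score(graph, 'TP53', significant))   # 2 significant of 3 downstream -> 2 / 4 = 0.5
```

The `+ 1` in the denominator keeps genes with a single downstream target from reaching a perfect score on one hit alone.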

class PathwayEnrichmentAgent:
   def __init__(self, pathway_db, interactions):
       self.pathway_db = pathway_db
       self.interactions = interactions
   def topology_weighted_enrichment(self, diff_genes, diff_metab, network_agent):
       enriched = {}
       for pathway, content in self.pathway_db.items():
           sig_genes = [g for g in content['genes'] if g in diff_genes.index and diff_genes.loc[g, 'significant']]
           weighted_score = 0
           for gene in sig_genes:
               base_score = abs(diff_genes.loc[gene, 'log2FC'])
               downstream = network_agent._bfs_downstream(gene, max_depth=1)
               centrality = len(downstream) / 10
               weighted_score += base_score * (1 + centrality)
           sig_metabs = [m for m in content['metabolites'] if m in diff_metab.index and diff_metab.loc[m, 'significant']]
           metab_score = sum(abs(diff_metab.loc[m, 'log2FC']) for m in sig_metabs)
           total_score = (weighted_score + metab_score * 2) / max(len(content['genes']) + len(content['metabolites']), 1)
           if total_score > 0.5:
               enriched[pathway] = {'score': total_score, 'genes': sig_genes, 'metabolites': sig_metabs,
                   'gene_fc': {g: diff_genes.loc[g, 'log2FC'] for g in sig_genes},
                   'metab_fc': {m: diff_metab.loc[m, 'log2FC'] for m in sig_metabs},
                   'coherence': self._pathway_coherence(sig_genes, diff_genes)}
       return enriched
   def _pathway_coherence(self, genes, diff_genes):
       if len(genes) < 2:
           return 0
       fcs = [diff_genes.loc[g, 'log2FC'] for g in genes]
       same_direction = sum(1 for fc in fcs if np.sign(fc) == np.sign(fcs[0]))
       return same_direction / len(genes)

We add pathway-level inference through topology-weighted enrichment analysis. We evaluate which biological pathways show significant activation or inhibition and weight them by network centrality to reflect their broader impact. The agent also evaluates pathway coherence, which indicates whether the significant genes in a pathway move in a consistent direction.
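
The enrichment score is easier to follow with numbers plugged in. The sketch below restates the arithmetic of `topology_weighted_enrichment` as standalone functions: each significant gene contributes `|log2FC| * (1 + out_degree / 10)`, metabolites count double, and the sum is normalized by pathway size. The fold changes, out-degrees, and helper names are hypothetical values chosen for illustration.

```python
def topology_weighted_score(gene_fc, out_degree, metab_fc, n_genes, n_metabolites):
    """Combine gene and metabolite fold changes into one size-normalized pathway score."""
    gene_term = sum(abs(fc) * (1 + out_degree.get(g, 0) / 10) for g, fc in gene_fc.items())
    metab_term = 2 * sum(abs(fc) for fc in metab_fc.values())
    return (gene_term + metab_term) / max(n_genes + n_metabolites, 1)


def direction_coherence(fold_changes):
    """Fraction of fold changes that share the direction of the first one."""
    if len(fold_changes) < 2:
        return 0.0
    first_up = fold_changes[0] > 0
    return sum((fc > 0) == first_up for fc in fold_changes) / len(fold_changes)


# Hypothetical significant hits in a pathway with 6 genes and 5 metabolites
score = topology_weighted_score(
    gene_fc={'HK2': 2.0, 'PKM': 1.5},
    out_degree={'HK2': 3, 'PKM': 2},
    metab_fc={'Lactate': 2.0},
    n_genes=6,
    n_metabolites=5,
)
print(round(score, 3))                          # (2.6 + 1.8 + 4.0) / 11
print(direction_coherence([2.0, 1.5, -1.2]))    # two of three share the first sign
```

With these inputs the pathway clears the 0.5 reporting threshold used in the agent, mostly because the single metabolite hit is weighted twice.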

class DrugRepurposingAgent:
   def __init__(self, drug_db):
       self.drug_db = drug_db


   def predict_drug_response(self, diff_genes, master_regulators):
       predictions = []
       for drug, targets in self.drug_db.items():
           score = 0
           affected_targets = []
           for target in targets:
               if target in diff_genes.index:
                   fc = diff_genes.loc[target, 'log2FC']
                   is_sig = diff_genes.loc[target, 'significant']
                   if is_sig:
                       # Inhibiting an upregulated target counts as a benefit, so the contribution is positive
                       drug_benefit = fc if fc > 0 else 0
                       score += drug_benefit
                       affected_targets.append((target, fc))
                   if target in [mr[0] for mr in master_regulators[:5]]:
                       score += 2
           if score > 0:
               predictions.append({
                   'drug': drug,
                   'score': score,
                   'targets': affected_targets,
                   'mechanism': 'Inhibition of upregulated pathway'
               })
       return sorted(predictions, key=lambda x: x['score'], reverse=True)


class AIHypothesisEngine:
   def generate_comprehensive_report(self, omics_data, analysis_results):
       report = ["="*80, "ADVANCED MULTI-OMICS INTERPRETATION REPORT", "="*80, ""]
       trends = analysis_results['temporal']
       top_trends = sorted(trends.items(), key=lambda x: abs(x[1]['slope']), reverse=True)[:5]
       report.append("  TEMPORAL DYNAMICS ANALYSIS:")
       for gene, data in top_trends:
           direction = "↑ Increasing" if data['slope'] > 0 else "↓ Decreasing"
           report.append(f"  {gene}: {direction} (slope: {data['slope']:.3f})")
       report.append("\n🕸 MASTER REGULATORS (Top 5):")
       for gene, data in analysis_results['master_regs'][:5]:
           report.append(f"  • {gene}: controls {data['sig_downstream']} dysregulated genes (FC: {data['fc']:+.2f}, Impact: {data['score']:.3f})")
       report.append("\n🧬 ENRICHED PATHWAYS:")
       for pathway, data in sorted(analysis_results['pathways'].items(), key=lambda x: x[1]['score'], reverse=True):
           report.append(f"\n  ► {pathway} (Score: {data['score']:.3f}, Coherence: {data['coherence']:.2f})")
           report.append(f"    Genes: {', '.join(data['genes'][:6])}")
           if data['metabolites']:
               report.append(f"    Metabolites: {', '.join(data['metabolites'][:4])}")
       report.append("\n🔗 CAUSAL RELATIONSHIPS (Top 10):")
       for link_type, source, target, fc1, fc2 in analysis_results['causal'][:10]:
           report.append(f"  {source} →[{link_type}]→ {target} (FC: {fc1:+.2f} → {fc2:+.2f})")
       report.append("\n💊 DRUG REPURPOSING PREDICTIONS:")
       for pred in analysis_results['drugs'][:5]:
           report.append(f"  • {pred['drug']} (Score: {pred['score']:.2f})")
           report.append(f"    Targets: {', '.join([f'{t[0]}({t[1]:+.1f})' for t in pred['targets']])}")
       report.append("\n🤖 AI-GENERATED BIOLOGICAL HYPOTHESES:\n")
       for i, hyp in enumerate(self._generate_advanced_hypotheses(analysis_results), 1):
           report.append(f"{i}. {hyp}\n")
       report.append("="*80)
       return "\n".join(report)

   def _generate_advanced_hypotheses(self, results):
       hypotheses = []
       pathways = results['pathways']
       if 'Glycolysis' in pathways and 'Oxidative_Phosphorylation' in pathways:
           gly = pathways['Glycolysis']['score']
           oxphos = pathways['Oxidative_Phosphorylation']['score']
           if gly > oxphos * 1.5:
               hypotheses.append("WARBURG EFFECT DETECTED: Upregulation of aerobic glycolysis with oxidative phosphorylation suppression suggests HIF1A-driven metabolic reprogramming.")
       if 'Cell_Cycle_G1S' in pathways and 'mTOR_Signaling' in pathways:
           hypotheses.append("PROLIFERATIVE SIGNALING: Cell cycle activation with mTOR signaling suggests anabolic reprogramming; dual CDK4/6 and mTOR inhibition may be effective.")
       if results['master_regs']:
           top_mr = results['master_regs'][0]
           hypotheses.append(f"UPSTREAM REGULATOR: {top_mr[0]} controls {top_mr[1]['sig_downstream']} dysregulated genes; targeting this node may propagate network-wide corrections.")
       trends = results['temporal']
       progressive = [g for g, d in trends.items() if abs(d['slope']) > 0.5]
       if len(progressive) > 5:
           hypotheses.append(f"PROGRESSIVE DYSREGULATION: {len(progressive)} genes exhibit strong temporal changes, indicating evolving pathology that may benefit from early pathway intervention.")
       if 'HIF1_Signaling' in pathways:
           hypotheses.append("HYPOXIA RESPONSE: HIF1 signaling suggests a hypoxic microenvironment; anti-angiogenic strategies may normalize perfusion.")
       if 'p53_Signaling' in pathways:
           hypotheses.append("TUMOR SUPPRESSOR LOSS: p53 pathway suppression suggests a possible benefit from MDM2 inhibition if TP53 is wild-type.")
       return hypotheses if hypotheses else ["Complex multi-factorial dysregulation detected."]

We introduce the drug repurposing and hypothesis generation agents. We score candidate drugs based on the dysregulation of their targets and the network importance of the affected genes, then compile explanatory hypotheses that link pathway activity to possible interventions. The report generation engine summarizes these findings in a structured, readable format.
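
As a small worked sketch of the scoring idea described above, assume a drug gains credit for each significantly upregulated target and a fixed bonus of 2 when a target is among the top master regulators. The `score_drug` helper and all fold-change values below are hypothetical and only illustrate the arithmetic.

```python
def score_drug(targets, log2fc, significant, master_regulators, bonus=2.0):
    """Sum the upregulation of significant targets, plus a bonus per master-regulator target."""
    score = 0.0
    for target in targets:
        if target in significant and log2fc.get(target, 0.0) > 0:
            score += log2fc[target]     # inhibiting an upregulated target counts as benefit
        if target in master_regulators:
            score += bonus              # extra weight for network-level influence
    return score


# Hypothetical analysis results
log2fc = {'CDK4': 1.8, 'CDK6': 1.2, 'MTOR': 0.3}
significant = {'CDK4', 'CDK6'}
master_regulators = {'CDK4'}

palbociclib = score_drug(['CDK4', 'CDK6'], log2fc, significant, master_regulators)
rapamycin = score_drug(['MTOR'], log2fc, significant, master_regulators)
print(round(palbociclib, 2), rapamycin)   # 1.8 + 2.0 + 1.2, and no credit for MTOR
```

Under these assumptions a drug whose target is neither significant nor a master regulator scores zero and would be dropped from the ranked list.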

def run_advanced_omics_interpretation():
   print("🧬 Initializing Advanced Multi-Agent Omics System...\n")
   omics = AdvancedOmicsGenerator.generate_coherent_omics()
   print("📊 Generated multi-omics dataset")
   stat_agent = StatisticalAgent()
   control_samples = [c for c in omics.transcriptomics.columns if 'Control' in c]
   disease_samples = [c for c in omics.transcriptomics.columns if 'Disease' in c]
   diff_trans = stat_agent.differential_analysis(omics.transcriptomics, control_samples, disease_samples)
   diff_prot = stat_agent.differential_analysis(omics.proteomics, control_samples, disease_samples)
   diff_metab = stat_agent.differential_analysis(omics.metabolomics, control_samples, disease_samples)
   temporal = stat_agent.temporal_analysis(omics.transcriptomics, omics.metadata)
   network_agent = NetworkAnalysisAgent(GENE_INTERACTIONS)
   master_regs = network_agent.find_master_regulators(diff_trans)
   causal_links = network_agent.causal_inference(diff_trans, diff_prot, diff_metab)
   pathway_agent = PathwayEnrichmentAgent(PATHWAY_DB, GENE_INTERACTIONS)
   enriched = pathway_agent.topology_weighted_enrichment(diff_trans, diff_metab, network_agent)
   drug_agent = DrugRepurposingAgent(DRUG_TARGETS)
   drug_predictions = drug_agent.predict_drug_response(diff_trans, master_regs)
   results = {
       'temporal': temporal,
       'master_regs': master_regs,
       'causal': causal_links,
       'pathways': enriched,
       'drugs': drug_predictions
   }
   hypothesis_engine = AIHypothesisEngine()
   report = hypothesis_engine.generate_comprehensive_report(omics, results)
   print(report)
   return omics, results


if __name__ == "__main__":
   omics_data, analysis = run_advanced_omics_interpretation()

We orchestrate the entire workflow, running all agents in sequence and compiling their results into a comprehensive report. We execute the pipeline end to end, from data generation to insight generation, so we can check that each component contributes to the overall interpretation. The final output provides an integrated multi-omics view with actionable insights.
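
Since the synthetic data is drawn from NumPy's global random state and the gene list is built from a `set`, two runs of the pipeline will generally not produce identical reports. One way to make a run repeatable, shown here as a minimal sketch on a hypothetical two-pathway excerpt (the `simulate` helper is an assumption, not part of the tutorial's code), is to fix the seed and sort the gene names before drawing values:

```python
import numpy as np

# Hypothetical excerpt shaped like PATHWAY_DB
pathway_db = {
    'Cell_Cycle_G1S': {'genes': ['CCND1', 'CDK4', 'RB1']},
    'p53_Signaling': {'genes': ['TP53', 'MDM2', 'BAX']},
}


def simulate(seed, noise=0.2):
    """Draw a small expression matrix with a fixed seed and a stable gene order."""
    np.random.seed(seed)  # seeds NumPy's global RNG, which np.random.randn draws from
    # sorted() removes the run-to-run variation in set iteration order for strings
    genes = sorted({g for p in pathway_db.values() for g in p['genes']})
    values = np.random.randn(len(genes), 4) * noise + 10
    return genes, values


genes_a, values_a = simulate(seed=42)
genes_b, values_b = simulate(seed=42)
print(genes_a == genes_b, np.array_equal(values_a, values_b))   # True True
```

Seeding alone is not enough when row order comes from a set of strings, because that order can change between Python processes; sorting the names closes that gap.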

In summary, this tutorial demonstrates how a structured, modular workflow can connect different layers of omics data into an interpretable analysis framework. By combining statistical inference, network topology, and biological context, we derive a comprehensive summary that highlights potential regulatory mechanisms and candidate therapeutic directions. The approach remains clear, data-driven, and applicable to both simulated and real multi-omics data sets.



The article “Leveraging pathway reasoning to build multi-agent systems for integrated transcriptomics, proteomics, and metabolomics data interpretation” appeared first on MarkTechPost.
