A Coding Implementation of Active Learning Annotation with Adala and Google Gemini

by admin · May 11, 2025

In this tutorial, we will learn how to leverage the Adala framework to build a modular active learning pipeline for medical symptom classification. We begin by installing and verifying Adala alongside the required dependencies, then integrate Google Gemini as a custom annotator to classify symptoms into predefined medical domains. Through a simple three-iteration active learning loop that prioritizes critical symptoms such as chest pain, we will see how to select, annotate, and visualize classification confidence, gaining practical insights into model behavior and Adala’s extensible architecture.

!pip install -q git+
!pip list | grep adala

We install the latest Adala version directly from its GitHub repository. The subsequent pip list | grep adala command then scans the environment’s package list for any entry containing “adala”, giving a quick confirmation that the library was installed successfully.
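
The same check can also be made from Python rather than the shell. The following is a minimal sketch using the standard library’s importlib.metadata; the helper name installed_version is hypothetical, and “adala” is only assumed to be the distribution name that pip reports.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "adala" is an assumed distribution name; adjust it if pip lists the package differently.
version = installed_version("adala")
print(f"adala version: {version}" if version else "adala is not installed")
```

Returning None instead of raising keeps the check easy to use in a notebook cell that should continue running either way.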

import sys
import os
print("Python path:", sys.path)
print("Checking if adala is in installed packages...")
!find /usr/local -name "*adala*" -type d | grep -v "__pycache__"




!git clone 
!ls -la Adala

We print the current Python module search path and then search the /usr/local directory for any installed “adala” folders (excluding __pycache__) to verify that the package is available. Next, we clone the Adala GitHub repository into the working directory and list its contents to confirm that all source files were fetched correctly.

import sys
sys.path.append('/content/Adala')

By appending the cloned Adala folder to sys.path, we add /content/Adala to Python’s module search path. This lets subsequent imports resolve against the local clone in addition to any installed version. Note that because the folder is appended rather than prepended, an installed copy found earlier on sys.path would take precedence.
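
To see which copy of a module Python would actually load after sys.path has been modified, one option is to ask the import system directly. This is a small sketch using importlib.util.find_spec from the standard library; the helper name resolve_origin is hypothetical, and “adala” is assumed to be the top-level package name inside the clone.

```python
import importlib.util
from typing import Optional


def resolve_origin(module_name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# A standard-library package always resolves; "adala" resolves only if it is importable.
print("json  ->", resolve_origin("json"))
print("adala ->", resolve_origin("adala"))
```

If the printed path for adala points into site-packages rather than the cloned folder, the installed version is the one being used.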

!pip install -q google-generativeai pandas matplotlib


import google.generativeai as genai
import pandas as pd
import json
import re
import numpy as np
import matplotlib.pyplot as plt
from getpass import getpass

We install the Google Generative AI SDK alongside data-analysis and plotting libraries (pandas and matplotlib), then import the key modules: genai for interacting with Gemini, pandas for tabular data, json and re for parsing, numpy for numerical operations, matplotlib.pyplot for visualization, and getpass to prompt the user for their API key securely.

try:
    from Adala.adala.annotators.base import BaseAnnotator
    from Adala.adala.strategies.random_strategy import RandomStrategy
    from Adala.adala.utils.custom_types import TextSample, LabeledSample
    print("Successfully imported Adala components")
except Exception as e:
    print(f"Error importing: {e}")
    print("Falling back to simplified implementation...")

This try/except block attempts to load Adala’s core classes, BaseAnnotator, RandomStrategy, TextSample, and LabeledSample, so that we can take advantage of its built-in annotators and sampling strategies. On success, it confirms that the Adala components are available; if any import fails, it catches the error, prints the exception message, and gracefully falls back to a simpler implementation.
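
As an illustration of what such a simplified fallback could look like, the sketch below defines two small dataclasses that expose the same attributes the rest of this tutorial reads (text, labels, metadata). The names SimpleTextSample and SimpleLabeledSample are hypothetical stand-ins, not Adala classes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SimpleTextSample:
    """Hypothetical stand-in for an unlabeled text sample."""
    text: str


@dataclass
class SimpleLabeledSample:
    """Hypothetical stand-in for a labeled sample with free-form metadata."""
    text: str
    labels: str
    metadata: Dict[str, Any] = field(default_factory=dict)


sample = SimpleTextSample("Chest pain radiating to left arm during exercise")
labeled = SimpleLabeledSample(text=sample.text, labels="Cardiovascular",
                              metadata={"confidence": 0.9})
print(labeled)
```

Using default_factory gives every labeled sample its own metadata dictionary, so one result cannot accidentally overwrite another’s fields.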

GEMINI_API_KEY = getpass("Enter your Gemini API Key: ")
genai.configure(api_key=GEMINI_API_KEY)

We securely prompt for the Gemini API key without echoing it to the screen. We then use this key to configure the Google Generative AI client (genai) so that all subsequent calls are authenticated.
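
For non-interactive runs, a common variation is to read the key from an environment variable first and prompt only when it is missing. The sketch below assumes a variable named GEMINI_API_KEY; both that name and the helper load_api_key are illustrative choices, not part of the Gemini SDK.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "GEMINI_API_KEY",
                 prompt: str = "Enter your Gemini API Key: ") -> str:
    """Return the key from the environment, falling back to a hidden prompt."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed characters, as in the cell above.
        key = getpass(prompt).strip()
    if not key:
        raise ValueError("No API key provided")
    return key

# Usage (not executed here): genai.configure(api_key=load_api_key())
```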

CATEGORIES = ["Cardiovascular", "Respiratory", "Gastrointestinal", "Neurological"]


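class GeminiAnnotator:
    """Wraps a Gemini model to classify symptom text into one of `categories`."""
    def __init__(self, model_name="models/gemini-2.0-flash-lite", categories=None):
        # A low temperature reduces sampling randomness in the classification output.
        self.model = genai.GenerativeModel(model_name=model_name,
                                          generation_config={"temperature": 0.1})
        # Fall back to the module-level CATEGORIES so the prompt never joins over None.
        self.categories = categories or CATEGORIES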
       
    def annotate(self, samples):
        results = []
        for sample in samples:
            prompt = f"""Classify this medical symptom into one of these categories:
            {', '.join(self.categories)}.
            Return JSON format: {{"category": "selected_category",
            "confidence": 0.XX, "explanation": "brief_reason"}}
           
            SYMPTOM: {sample.text}"""
           
            try:
                response = self.model.generate_content(prompt).text
                json_match = re.search(r'({.*})', response, re.DOTALL)
                result = json.loads(json_match.group(1) if json_match else response)
               
                labeled_sample = type('LabeledSample', (), {
                    'text': sample.text,
                    'labels': result["category"],
                    'metadata': {
                        "confidence": result["confidence"],
                        "explanation": result["explanation"]
                    }
                })
            except Exception as e:
                labeled_sample = type('LabeledSample', (), {
                    'text': sample.text,
                    'labels': "unknown",
                    'metadata': {"error": str(e)}
                })
            results.append(labeled_sample)
        return results

We define the list of medical categories and implement the GeminiAnnotator class, which wraps Google Gemini’s generative model for symptom classification. In its annotate method, it builds a prompt that requests a JSON reply for each text sample, parses the model’s response into a structured label, confidence score, and explanation, and wraps these in a lightweight LabeledSample object, falling back to the “unknown” label if any error occurs.
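
The parsing step can be isolated and hardened a little. The following sketch, under the assumption that the model replies with a single JSON object somewhere in its text, extracts that object, rejects categories outside the allowed list, and clamps the confidence to the range 0 to 1. The helper name parse_annotation is hypothetical.

```python
import json
import math
import re

ALLOWED = ["Cardiovascular", "Respiratory", "Gastrointestinal", "Neurological"]


def parse_annotation(raw: str, allowed=ALLOWED) -> dict:
    """Extract category, confidence, and explanation from a model reply."""
    # Grab the span from the first "{" to the last "}" in the reply.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    try:
        data = json.loads(match.group(0)) if match else {}
    except json.JSONDecodeError:
        data = {}

    category = data.get("category")
    if category not in allowed:
        category = "unknown"

    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    if math.isnan(confidence):
        confidence = 0.0
    confidence = min(max(confidence, 0.0), 1.0)  # clamp to [0, 1]

    return {"category": category,
            "confidence": confidence,
            "explanation": str(data.get("explanation", ""))}


reply = 'Sure! {"category": "Respiratory", "confidence": 0.92, "explanation": "cough"}'
print(parse_annotation(reply))
```

Validating the category against the allowed list guards against the model inventing a label that the later bar chart and any downstream logic would not expect.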

sample_data = [
    "Chest pain radiating to left arm during exercise",
    "Persistent dry cough with occasional wheezing",
    "Severe headache with sensitivity to light",
    "Stomach cramps and nausea after eating",
    "Numbness in fingers of right hand",
    "Shortness of breath when climbing stairs"
]


text_samples = [type('TextSample', (), {'text': text}) for text in sample_data]


annotator = GeminiAnnotator(categories=CATEGORIES)
labeled_samples = []

We define a list of raw symptom strings and wrap each one in a lightweight TextSample object so it can be passed to the annotator. We then instantiate the GeminiAnnotator with the predefined category set and prepare an empty labeled_samples list to store the results of the upcoming annotation iterations.

print("\nRunning Active Learning Loop:")
for i in range(3):
    print(f"\n--- Iteration {i+1} ---")
   
    remaining = [s for s in text_samples if s not in [getattr(l, '_sample', l) for l in labeled_samples]]
    if not remaining:
        break
       
    scores = np.zeros(len(remaining))
    for j, sample in enumerate(remaining):
        scores[j] = 0.1
        if any(term in sample.text.lower() for term in ["chest", "heart", "pain"]):
            scores[j] += 0.5  
   
    selected_idx = np.argmax(scores)
    selected = [remaining[selected_idx]]
   
    newly_labeled = annotator.annotate(selected)
    for sample in newly_labeled:
        sample._sample = selected[0]  
    labeled_samples.extend(newly_labeled)
   
    latest = labeled_samples[-1]
    print(f"Text: {latest.text}")
    print(f"Category: {latest.labels}")
    print(f"Confidence: {latest.metadata.get('confidence', 0)}")
    print(f"Explanation: {latest.metadata.get('explanation', '')[:100]}...")

This active learning loop runs for three iterations, each time filtering out already-labeled samples and assigning every remaining sample a base score of 0.1, boosted by 0.5 when its text contains a keyword such as “chest”, “heart”, or “pain”, so that critical symptoms are prioritized. It then selects the highest-scoring sample, calls the GeminiAnnotator to generate a category, confidence, and explanation, and prints these details for review.
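
The selection rule described above can be written as a standalone function, which makes it easy to test without calling the model. This is a minimal sketch that works on plain strings; the names priority_score and select_next are hypothetical, and the default weights mirror the 0.1 base and 0.5 boost used in the loop.

```python
from typing import Iterable, List, Optional

PRIORITY_TERMS = ("chest", "heart", "pain")


def priority_score(text: str, terms: Iterable[str] = PRIORITY_TERMS,
                   base: float = 0.1, boost: float = 0.5) -> float:
    """Base score, plus a fixed boost if any priority term appears in the text."""
    lowered = text.lower()
    return base + (boost if any(term in lowered for term in terms) else 0.0)


def select_next(texts: List[str], already_labeled: List[str]) -> Optional[str]:
    """Pick the highest-scoring unlabeled text; ties go to the earliest one."""
    remaining = [t for t in texts if t not in already_labeled]
    if not remaining:
        return None
    return max(remaining, key=priority_score)


symptoms = [
    "Chest pain radiating to left arm during exercise",
    "Persistent dry cough with occasional wheezing",
    "Severe headache with sensitivity to light",
]
done: List[str] = []
for _ in range(3):
    picked = select_next(symptoms, done)
    done.append(picked)
    print(picked)
```

Because the score depends only on keywords and not on the model’s output, this is a fixed prioritization heuristic; an uncertainty-based strategy would instead rank samples using the confidence values returned by the annotator.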

categories = [s.labels for s in labeled_samples]
confidence = [s.metadata.get("confidence", 0) for s in labeled_samples]


plt.figure(figsize=(10, 5))
plt.bar(range(len(categories)), confidence, color="skyblue")
plt.xticks(range(len(categories)), categories, rotation=45)
plt.title('Classification Confidence by Category')
plt.tight_layout()
plt.show()

Finally, we extract the predicted category labels and their confidence scores and use matplotlib to plot a vertical bar chart with one bar per labeled sample, where the height of each bar reflects the model’s confidence in that prediction. The category names on the x-axis are rotated for readability, a title is added, and tight_layout() ensures that the chart elements are arranged neatly before display.
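
When several samples share a category, the chart above shows that category more than once. One possible refinement, sketched here with only the standard library, is to average the confidence per category before plotting. The function name mean_confidence_by_category and the sample values are illustrative, not results from the model.

```python
from collections import defaultdict
from typing import Dict, List


def mean_confidence_by_category(labels: List[str],
                                confidences: List[float]) -> Dict[str, float]:
    """Average the confidence scores of all samples that share a label."""
    buckets = defaultdict(list)
    for label, conf in zip(labels, confidences):
        buckets[label].append(float(conf))
    return {label: sum(values) / len(values) for label, values in buckets.items()}


# Illustrative values only.
demo_labels = ["Cardiovascular", "Respiratory", "Cardiovascular"]
demo_confidences = [0.75, 0.5, 0.25]
print(mean_confidence_by_category(demo_labels, demo_confidences))
```

The resulting dictionary’s keys and values can be passed to plt.bar in place of the per-sample lists.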

In conclusion, by combining Adala’s plug-and-play annotators and sampling strategies with the generative power of Google Gemini, we have built a streamlined workflow that iteratively improves annotation quality on medical text. This tutorial walked through installation, setup, and a custom GeminiAnnotator, and demonstrated how to implement priority-based sampling and confidence visualization. With this foundation, you can easily swap in other models, expand the category set, or integrate more advanced active learning strategies to tackle larger, more complex annotation tasks.

Check out the Colab notebook here. All credit for this research goes to the researchers of this project.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of Marktechpost, an artificial intelligence media platform that offers in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million views per month, demonstrating its popularity among its audience.