
LangGraph Tutorial: A Step-by-Step Guide to Creating a Text Analysis Pipeline

Estimated reading time: 5 minutes

Introduction to LangGraph

LangGraph is a powerful framework built on top of LangChain, designed for creating stateful, multi-actor applications with LLMs. It provides the structure and tools needed to build sophisticated AI agents through a graph-based approach.

Think of LangGraph as an architect's drafting table: it gives us the tools to design how our agents will think and act. Just as an architect draws blueprints showing how different rooms connect and how people flow through a building, LangGraph lets us design how different capabilities connect and how information flows through our agent.

Key Features:

  • State Management: Maintain persistent state across interactions
  • Flexible Routing: Define complex flows between components
  • Persistence: Save and resume workflows
  • Visualization: See and understand your agent's structure
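
To make the graph-based approach concrete before we build the full pipeline, here is a minimal sketch of a one-node LangGraph workflow. The HelloState type and greet node below are illustrative only and are not part of the pipeline we build in this tutorial:

from typing import TypedDict
from langgraph.graph import StateGraph, END

# A trivial state with a single field
class HelloState(TypedDict):
    message: str

def greet(state: HelloState):
    # Each node receives the current state and returns a partial state update
    return {"message": state["message"] + ", world"}

# Nodes and edges define how information flows through the graph
hello_graph = StateGraph(HelloState)
hello_graph.add_node("greet", greet)
hello_graph.set_entry_point("greet")
hello_graph.add_edge("greet", END)

hello_app = hello_graph.compile()
print(hello_app.invoke({"message": "Hello"}))  # {'message': 'Hello, world'}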

In this tutorial, we will demonstrate LangGraph by building a multi-step text analysis pipeline that processes text in three stages:

  1. Text classification: Classify input text into predefined categories
  2. Entity Extraction: Identify key entities from text
  3. Summarization: Generate a concise summary of the input text

This pipeline demonstrates how LangGraph can be used to create modular, scalable workflows for natural language processing tasks.

Setting Up Our Environment

Before we dive into the code, let's set up our development environment.

Installation

# Install required packages
!pip install langgraph langchain langchain-openai python-dotenv

Set API keys

We need an OpenAI API key to use their models. If you don't have one already, you can get one from the OpenAI platform.

Check out the complete code here.

import os
from dotenv import load_dotenv

# Load environment variables from .env file (create this with your API key)
load_dotenv()

# Set OpenAI API key
os.environ["OPENAI_API_KEY"] = os.getenv('OPENAI_API_KEY')

Testing Our Setup

Let's make sure our environment is working properly by running a simple test with the OpenAI model:

from langchain_openai import ChatOpenAI

# Initialize the ChatOpenAI instance
llm = ChatOpenAI(model="gpt-4o-mini")

# Test the setup
response = llm.invoke("Hello! Are you working?")
print(response.content)

Building Our Text Analysis Pipeline

Now let's import the necessary packages for our LangGraph text analysis pipeline:

import os
from typing import TypedDict, List, Annotated
from langgraph.graph import StateGraph, END
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain.schema import HumanMessage
from langchain_core.runnables.graph import MermaidDrawMethod
from IPython.display import display, Image

Designing Our Agent's Memory

Just as human intelligence requires memory, our agent needs a way to keep track of information. Check out the complete code here. We use a TypedDict to define our state structure:

class State(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str

# Initialize our language model with temperature=0 for more deterministic outputs
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

Creating Our Agent's Core Capabilities

Now we will create the actual capabilities our agent will use. Each of these is a node function that performs a specific type of analysis. Check out the complete code here.

1. Classification Node

def classification_node(state: State):
    '''Classify the text into one of the categories: News, Blog, Research, or Other'''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Classify the following text into one of the categories: News, Blog, Research, or Other.nnText:{text}nnCategory:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    classification = llm.invoke([message]).content.strip()
    return {"classification": classification}

2. Entity Extraction Node

def entity_extraction_node(state: State):
    '''Extract all the entities (Person, Organization, Location) from the text'''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Extract all the entities (Person, Organization, Location) from the following text. Provide the result as a comma-separated list.nnText:{text}nnEntities:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    entities = llm.invoke([message]).content.strip().split(", ")
    return {"entities": entities}

3. Summarization Node

def summarization_node(state: State):
    '''Summarize the text in one short sentence'''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Summarize the following text in one short sentence.nnText:{text}nnSummary:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    summary = llm.invoke([message]).content.strip()
    return {"summary": summary}

Putting It All Together

Now comes the most exciting part: connecting these capabilities into a coordinated system using LangGraph.

Check out the complete code here.

# Create our StateGraph
workflow = StateGraph(State)

# Add nodes to the graph
workflow.add_node("classification_node", classification_node)
workflow.add_node("entity_extraction", entity_extraction_node)
workflow.add_node("summarization", summarization_node)

# Add edges to the graph
workflow.set_entry_point("classification_node")  # Set the entry point of the graph
workflow.add_edge("classification_node", "entity_extraction")
workflow.add_edge("entity_extraction", "summarization")
workflow.add_edge("summarization", END)

# Compile the graph
app = workflow.compile()

Workflow structure: Our pipeline follows this path:
classification_node → entity_extraction → summarization → END
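
Since we imported MermaidDrawMethod and IPython's display helpers earlier, we can also render the compiled graph to verify this structure visually. This is a minimal sketch and assumes you are running in a Jupyter notebook; the available draw methods may vary with your langchain-core version:

# Visualize the compiled graph as a Mermaid diagram (Jupyter only)
display(
    Image(
        app.get_graph().draw_mermaid_png(
            draw_method=MermaidDrawMethod.API
        )
    )
)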

Testing Our Agent

Now that we have built our agent, let's see how it performs on a real-world text example:

Check out the complete code here.

sample_text = """ OpenAI has announced the GPT-4 model, which is a large multimodal model that exhibits human-level performance on various professional benchmarks. It is developed to improve the alignment and safety of AI systems. Additionally, the model is designed to be more efficient and scalable than its predecessor, GPT-3. The GPT-4 model is expected to be released in the coming months and will be available to the public for research and development purposes. """ 
state_input = {"text": sample_text} 
result = app.invoke(state_input) 
print("Classification:", result["classification"]) 
print("nEntities:", result["entities"]) 
print("nSummary:", result["summary"])
Classification: News
Entities: ['OpenAI', 'GPT-4', 'GPT-3']
Summary: OpenAI's upcoming GPT-4 model is a multimodal AI that aims for human-level performance and improved safety, efficiency, and scalability compared to GPT-3.

Understanding the Power of Coordinated Processing

What makes this result particularly impressive is not any single output, but how each step builds on the others to form a complete understanding of the text.

  • The classification provides context that frames our understanding of the text type
  • The entity extraction identifies the important names and concepts
  • The summarization distills the essence of the document

This mirrors human reading comprehension: we naturally form a sense of what kind of text we are reading, note the important names and concepts, and build a mental summary, all while maintaining the relationships between these different aspects of understanding.

Trying Your Own Text

Now let’s try our pipeline with another text example:

Check out the complete code here.

# Replace this with your own text to analyze
your_text = """
The recent advancements in quantum computing have opened new possibilities for cryptography and data security. Researchers at MIT and Google have demonstrated quantum algorithms that could potentially break current encryption methods. However, they are also developing new quantum-resistant encryption techniques to protect data in the future.
"""

# Process the text through our pipeline
your_result = app.invoke({"text": your_text})

print("Classification:", your_result["classification"])
print("\nEntities:", your_result["entities"])
print("\nSummary:", your_result["summary"])

Classification: Research
Entities: ['MIT', 'Google']
Summary: Recent advancements in quantum computing may threaten current encryption methods while also prompting the development of new quantum-resistant techniques.

Adding More Capabilities (Advanced)

One of the powerful aspects of LangGraph is how easily we can extend our agent with new capabilities. Let's add a sentiment analysis node to our pipeline:

Check out the complete code here.

# First, let's update our State to include sentiment
class EnhancedState(TypedDict):
    text: str
    classification: str
    entities: List[str]
    summary: str
    sentiment: str

# Create our sentiment analysis node
def sentiment_node(state: EnhancedState):
    '''Analyze the sentiment of the text: Positive, Negative, or Neutral'''
    prompt = PromptTemplate(
        input_variables=["text"],
        template="Analyze the sentiment of the following text. Is it Positive, Negative, or Neutral?nnText:{text}nnSentiment:"
    )
    message = HumanMessage(content=prompt.format(text=state["text"]))
    sentiment = llm.invoke([message]).content.strip()
    return {"sentiment": sentiment}

# Create a new workflow with the enhanced state
enhanced_workflow = StateGraph(EnhancedState)

# Add the existing nodes
enhanced_workflow.add_node("classification_node", classification_node)
enhanced_workflow.add_node("entity_extraction", entity_extraction_node)
enhanced_workflow.add_node("summarization", summarization_node)

# Add our new sentiment node
enhanced_workflow.add_node("sentiment_analysis", sentiment_node)

# Create a more complex workflow with branches
enhanced_workflow.set_entry_point("classification_node")
enhanced_workflow.add_edge("classification_node", "entity_extraction")
enhanced_workflow.add_edge("entity_extraction", "summarization")
enhanced_workflow.add_edge("summarization", "sentiment_analysis")
enhanced_workflow.add_edge("sentiment_analysis", END)

# Compile the enhanced graph
enhanced_app = enhanced_workflow.compile()

Testing the Enhanced Agent

# Try the enhanced pipeline with the same text
enhanced_result = enhanced_app.invoke({"text": sample_text})

print("Classification:", enhanced_result["classification"])
print("nEntities:", enhanced_result["entities"])
print("nSummary:", enhanced_result["summary"])
print("nSentiment:", enhanced_result["sentiment"])
Classification: News

Entities: ['OpenAI', 'GPT-4', 'GPT-3']

Summary: OpenAI's upcoming GPT-4 model is a multimodal AI that aims for human-level performance and improved safety, efficiency, and scalability compared to GPT-3.

Sentiment: The sentiment of the text is Positive. It highlights the advancements and improvements of the GPT-4 model, emphasizing its human-level performance, efficiency, scalability, and the positive implications for AI alignment and safety. The anticipation of its release for public use further contributes to the positive tone.

Adding Conditional Edges (Advanced Logic)

Why Conditional Edges?

So far, our graph has followed a fixed linear path: classification_node → entity_extraction → summarization → sentiment_analysis

But in the real world, we often want to run certain steps only when they are needed. For example:

  • Extract entities only when the text is a news or research article
  • Skip the summary if the text is short
  • Add custom processing to blog posts

LangGraph makes this easy with conditional edges: routing functions that decide, based on data in the current state, which node to execute next.

Check out the complete code here.

Creating the Routing Function

# Route after classification
def route_after_classification(state: EnhancedState) -> bool:
    # The classifier returns one of: "News", "Blog", "Research", "Other"
    category = state["classification"].lower()
    # Route only news and research articles through entity extraction
    return category in ["news", "research"]

Defining the Conditional Graph

from langgraph.graph import StateGraph, END

conditional_workflow = StateGraph(EnhancedState)

# Add nodes
conditional_workflow.add_node("classification_node", classification_node)
conditional_workflow.add_node("entity_extraction", entity_extraction_node)
conditional_workflow.add_node("summarization", summarization_node)
conditional_workflow.add_node("sentiment_analysis", sentiment_node)

# Set entry point
conditional_workflow.set_entry_point("classification_node")

# Add conditional edge
conditional_workflow.add_conditional_edges("classification_node", route_after_classification, path_map={
    True: "entity_extraction",
    False: "summarization"
})

# Add remaining static edges
conditional_workflow.add_edge("entity_extraction", "summarization")
conditional_workflow.add_edge("summarization", "sentiment_analysis")
conditional_workflow.add_edge("sentiment_analysis", END)

# Compile
conditional_app = conditional_workflow.compile()

Testing the Conditional Pipeline

test_text = """
OpenAI released the GPT-4 model with enhanced performance on academic and professional tasks. It's seen as a major breakthrough in alignment and reasoning capabilities.
"""

result = conditional_app.invoke({"text": test_text})

print("Classification:", result["classification"])
print("Entities:", result.get("entities", "Skipped"))
print("Summary:", result["summary"])
print("Sentiment:", result["sentiment"])
Classification: News
Entities: ['OpenAI', 'GPT-4']
Summary: OpenAI's GPT-4 model significantly improves performance in academic and professional tasks, marking a breakthrough in alignment and reasoning.
Sentiment: The sentiment of the text is Positive. It highlights the release of the GPT-4 model as a significant advancement, emphasizing its enhanced performance and breakthrough capabilities.

Check out the complete code here.

Now let's try it with a blog post:

blog_text = """
Here's what I learned from a week of meditating in silence. No phones, no talking—just me, my breath, and some deep realizations.
"""

result = conditional_app.invoke({"text": blog_text})

print("Classification:", result["classification"])
print("Entities:", result.get("entities", "Skipped (not applicable)"))
print("Summary:", result["summary"])
print("Sentiment:", result["sentiment"])
Classification: Blog
Entities: Skipped (not applicable)
Summary: A week of silent meditation led to profound personal insights.
Sentiment: The sentiment of the text is Positive. The mention of "deep realizations" and the overall reflective nature of the experience suggests a beneficial and enlightening outcome from the meditation practice.

With conditional edges, our agent can now:

  • Make decisions based on context
  • Skip unnecessary steps
  • Run faster
  • Be smarter

Conclusion

In this tutorial, we have:

  1. Explored LangGraph concepts and its graph-based approach
  2. Built a text processing pipeline with classification, entity extraction, and summarization
  3. Enhanced our pipeline with an additional capability (sentiment analysis)
  4. Introduced conditional edges to dynamically control the flow based on classification results
  5. Visualized our workflow
  6. Tested our agent with real-world text examples

LangGraph provides a powerful framework for creating AI agents by modeling them as graphs of functions. This approach makes it easy to design, modify, and scale complex AI systems.

Next Steps

  • Add more nodes to extend the capabilities of your agent
  • Try different LLMs and parameters
  • Explore LangGraph's state persistence features for ongoing conversations (see the sketch below)
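
As a starting point for that last item, here is a hedged sketch of LangGraph's checkpointing: compiling the graph with an in-memory checkpointer so that state is saved per conversation thread. The MemorySaver import path and the thread_id config key follow the langgraph checkpoint API, but exact details may differ between versions:

from langgraph.checkpoint.memory import MemorySaver

# Compile the workflow with a checkpointer so state persists across invocations
checkpointer = MemorySaver()
persistent_app = workflow.compile(checkpointer=checkpointer)

# Each thread_id keeps its own saved state, so later calls can pick up where they left off
config = {"configurable": {"thread_id": "analysis-1"}}
persistent_result = persistent_app.invoke({"text": sample_text}, config=config)
print(persistent_result["summary"])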

Check out the complete code here. All credit for this work goes to the researchers of this project.



Nir Diamant is an AI researcher, algorithm developer, and GenAI expert with more than a decade of experience in AI research and algorithms. His open-source projects have received millions of views, with over 500,000 views per month and over 500,000 stars on GitHub, making him a leading voice in the AI community.

Through his work on GitHub and the DiamantAI newsletter, Nir has helped millions improve their AI skills with practical guides and tutorials.