
DSPy vs Normal Prompting: A Practical Comparison

Written by Kiruthika
Feb 23, 2026
18 Min Read

When you build an AI agent that books flights, calls tools, or handles multi-step workflows, one question comes up quickly: how should you control the model?

Most developers use prompt engineering. You write detailed instructions, add examples, adjust wording, and test until it works. Sometimes it works well. Sometimes changing a single sentence breaks the entire workflow.

DSPy offers a different approach. Instead of manually crafting prompts, you define what the system should do, and the framework handles prompt generation and optimization for you.

To see the difference in practice, I built the same flight booking agent twice, once using traditional prompting and once using DSPy, with the same tools and model. The results revealed two very different ways of building LLM-powered agents.

Before diving into the experiment, let’s clarify what we’re comparing.

What is Normal Prompting?

Normal prompting is the traditional form of prompt engineering used with Large Language Models (LLMs). It relies on writing explicit instructions that define the task, expected behavior, and output format.

In this approach, model performance depends heavily on prompt clarity and structure. Even small wording changes can affect reasoning, consistency, or tool usage in LLM agents.

Common Prompting Techniques in Traditional Prompt Engineering

Most prompt engineering falls into a few common patterns. The difference is mainly how much guidance you give the model before it responds.

Zero-Shot Prompting: Giving the model a task without any examples.

You are an airline customer service agent. Help users book flights.

Few-Shot Prompting: You include a couple of examples so the model copies the pattern, which is useful for consistent formatting and tool use.

You are an airline customer service agent.
Example 1:
User: "Book a flight to NYC"
Assistant: [calls fetch_flights tool] [calls book_flight tool]
Example 2:
User: "Find cheapest flight to LA"
Assistant: [calls fetch_flights tool] [compares prices] [calls book_flight tool]
Now respond to the next user request.

Chain-of-Thought (CoT): You ask the model to reason through the problem before answering. This often improves multi-step accuracy, but may be unnecessary for simple requests.

Think step by step and explain your reasoning before providing the final answer.
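In API terms, these techniques differ only in how the message list sent to the model is assembled. Here's a minimal sketch; build_messages is a hypothetical helper, not part of any library:

```python
# Illustrative sketch: assembling zero-shot vs few-shot message lists
# for a chat-completion API. The helper name is hypothetical.

def build_messages(system_prompt, user_request, examples=None):
    """Build a chat message list; passing `examples` makes it few-shot."""
    messages = [{"role": "system", "content": system_prompt}]
    for ex_user, ex_assistant in (examples or []):
        messages.append({"role": "user", "content": ex_user})
        messages.append({"role": "assistant", "content": ex_assistant})
    messages.append({"role": "user", "content": user_request})
    return messages

# Zero-shot: instructions only
zero_shot = build_messages(
    "You are an airline customer service agent. Help users book flights.",
    "Book a flight to NYC",
)

# Few-shot: the same instructions plus a worked example to imitate
few_shot = build_messages(
    "You are an airline customer service agent.",
    "Find cheapest flight to LA",
    examples=[("Book a flight to NYC",
               "[calls fetch_flights tool] [calls book_flight tool]")],
)
```

Chain-of-Thought is just another variation: the "think step by step" instruction is appended to the system prompt rather than changing the message structure.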

The Challenge with Normal Prompting

Normal prompting works well for simple, single-step tasks. However, its limitations become clear when building LLM agents that require multi-step reasoning, state management, or tool orchestration.


As workflows grow more complex, prompt engineering often introduces the following challenges:

  • Brittle behavior – Small wording changes can significantly alter outputs, reasoning paths, or tool usage, making systems unpredictable.
  • High manual effort – Multi-step workflows require carefully crafted instructions, examples, and repeated refinement to maintain consistency.
  • Difficult to optimize – Performance improvements rely on iterative prompt tweaks rather than structured optimization methods.
  • Limited structural clarity – There is no formal way to declare inputs, outputs, or intermediate reasoning steps.
  • Manual tool orchestration – Passing outputs from one tool to another requires explicit workflow instructions and careful parameter handling.

In short, as task complexity increases, prompt engineering shifts from simple instruction-writing to fragile workflow management.

What is DSPy?

DSPy (Declarative Self-improving Python) is a framework for building LLM-powered systems by treating prompting as a structured programming task rather than manual prompt writing.

Instead of crafting detailed instructions step by step, you define the task’s input–output behavior using structured signatures. DSPy then automatically generates, optimizes, and adapts the underlying prompts for the chosen language model.

Developed by researchers at Stanford, DSPy shifts the focus from prompt engineering to declarative design, making multi-step reasoning, tool orchestration, and agent workflows easier to manage and scale.

Core Concepts of DSPy

DSPy is built around four core abstractions that replace manual prompt engineering with structured, declarative design.

1. Signatures

Signatures define the input–output specification of a task.

Instead of writing detailed prompts, you declare what goes in and what comes out.
class BookFlight(dspy.Signature):
    """You are an airline customer service agent that helps users book and manage flights."""

    question = dspy.InputField(desc="User's flight booking request")
    answer = dspy.OutputField(desc="Message summarizing the process, result, and information")

This shifts the focus from writing instructions to defining an interface. DSPy handles the underlying prompt construction.
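To build intuition for what that prompt construction involves, here is a rough illustration of how declared fields could be rendered into a prompt skeleton. This is a simplified sketch, not DSPy's actual template; render_prompt is a hypothetical helper:

```python
# Simplified illustration (NOT DSPy internals): rendering an
# input/output field specification into a prompt skeleton.

def render_prompt(instructions, input_fields, output_fields):
    """Turn a signature-like spec into a prompt string."""
    lines = [instructions, "", "Given the fields:"]
    for name, desc in input_fields.items():
        lines.append(f"- {name} (input): {desc}")
    for name, desc in output_fields.items():
        lines.append(f"- {name} (output): {desc}")
    lines.append("")
    lines.append("Produce every output field from the given inputs.")
    return "\n".join(lines)

prompt = render_prompt(
    "You are an airline customer service agent that helps users book and manage flights.",
    {"question": "User's flight booking request"},
    {"answer": "Message summarizing the process, result, and information"},
)
```

The point is that the developer only supplies the field declarations; the framework owns the surrounding instruction text and can rewrite it freely during optimization.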

2. Modules

DSPy provides reusable modules that implement common reasoning patterns:

  • Predict – Direct input-to-output generation
  • ChainOfThought – Adds structured reasoning steps
  • ReAct – Combines reasoning and tool usage
  • ProgramOfThought – Supports mathematical and symbolic reasoning

Modules make multi-step workflows composable and easier to manage.

3. Optimizers

One of DSPy’s defining features is automatic prompt optimization.

Instead of manually refining prompts, DSPy can:

  • Generate effective prompts from your signatures
  • Improve prompts using training examples
  • Adapt prompts across different language models
  • Compile your program into an optimized prompt configuration

This replaces trial-and-error prompt tuning with systematic optimization.
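As a toy illustration of the "prompts as optimizable parameters" idea (this is not DSPy's actual optimizer, and fake_score stands in for a real LLM-graded metric):

```python
# Toy illustration (NOT DSPy's optimizer): select the candidate prompt
# that scores best over a small training set.

def best_prompt(candidates, examples, score):
    """Return the candidate with the highest total score over examples."""
    return max(candidates, key=lambda p: sum(score(p, ex) for ex in examples))

# Stand-in scorer: in practice this would run the LLM on the example
# and grade its output against a labeled answer.
def fake_score(prompt, example):
    return 1.0 if "step by step" in prompt else 0.5

candidates = [
    "Answer the question.",
    "Think step by step, then answer the question.",
]
chosen = best_prompt(candidates, examples=["q1", "q2"], score=fake_score)
```

Real DSPy optimizers are far more sophisticated (they can generate candidate instructions and select few-shot demonstrations), but the core loop is the same: propose, score against examples, keep the best.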

4. Automatic Prompt Generation

DSPy generates the actual prompts behind the scenes. When you combine signatures with modules like ReAct, DSPy automatically:

  • Constructs system-level instructions
  • Adds reasoning scaffolding
  • Manages tool-calling conventions
  • Preserves context across multiple steps

The result is a structured LLM program rather than a fragile prompt.

Why Is DSPy Different?

Traditional prompting follows an instruction-driven mindset:

“I need to tell the model exactly how to do this.”

Every step, tool call, and reasoning path must be explicitly described in the prompt.

DSPy takes a declarative approach:

“I need to define what I want, and let the system determine how to achieve it.”

Instead of treating prompts as static text, DSPy treats them as optimizable program parameters, similar to weights in a neural network. This shift from manual instruction-writing to structured, programmable design makes complex, multi-step agent workflows more stable, scalable, and easier to maintain.

The Experiment: Flight Booking Agent

To compare DSPy vs normal prompting in a controlled setting, I implemented the same flight booking agent using both approaches, with identical tools and the same language model.

The agent was required to:

  • Retrieve available flights
  • Select the optimal option (based on duration and price)
  • Book the selected flight for the user

Test scenario: “Please help me book a flight from SFO to JFK on September 1st, 2025. My name is Adam.”

Available tools:

  • fetch_flight_info – Retrieve matching flights
  • pick_flight – Select the best option
  • book_flight – Complete the booking
  • get_user_info – Retrieve user information

Both implementations used GPT-4o mini, ensuring that any differences in behavior were due to orchestration strategy rather than model capability.
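One practical detail: the user writes a free-text date ("09/01/2025"), while the fetch tool expects structured year/month/day fields, so the model must produce that structure when calling the tool. A minimal sketch of the conversion, using a hypothetical parse_date helper:

```python
from datetime import datetime

# Hypothetical helper: convert the user's "MM/DD/YYYY" date string into
# the structured fields the flight-fetching tool expects.
def parse_date(text, hour=0):
    dt = datetime.strptime(text, "%m/%d/%Y")
    return {"year": dt.year, "month": dt.month, "day": dt.day, "hour": hour}

parse_date("09/01/2025")  # → {"year": 2025, "month": 9, "day": 1, "hour": 0}
```

In both implementations this mapping is done by the model itself when it fills in the tool-call arguments; getting it consistently right is part of what the orchestration layer must ensure.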

DSPy Implementation

The implementation below uses DSPy’s ReAct module to orchestrate tool calls for flight search and booking. We first define the data models used by the tools (date, user profile, flight, itinerary), then register the tools, and finally run the agent on the test prompt.

Setup and Data Models

Here's the actual implementation using DSPy:

from pydantic import BaseModel
import random
import string
import dspy
import os
from dotenv import load_dotenv
load_dotenv()
class Date(BaseModel):
    year: int
    month: int
    day: int
    hour: int
class UserProfile(BaseModel):
    user_id: str
    name: str
    email: str
class Flight(BaseModel):
    flight_id: str
    date_time: Date
    origin: str
    destination: str
    duration: float
    price: float
class Itinerary(BaseModel):
    confirmation_number: str
    user_profile: UserProfile
    flight: Flight
class Ticket(BaseModel):
    user_request: str
    user_profile: UserProfile
user_database = {
    "Adam": UserProfile(user_id="1", name="Adam", email="adam@gmail.com"),
    "Bob": UserProfile(user_id="2", name="Bob", email="bob@gmail.com"),
    "Chelsie": UserProfile(user_id="3", name="Chelsie", email="chelsie@gmail.com"),
    "David": UserProfile(user_id="4", name="David", email="david@gmail.com"),
}
flight_database = {
    "DA123": Flight(
        flight_id="DA123",  
        origin="SFO",
        destination="JFK",
        date_time=Date(year=2025, month=9, day=1, hour=1),
        duration=3,
        price=200,
    ),
    "DA125": Flight(
        flight_id="DA125",
        origin="SFO",
        destination="JFK",
        date_time=Date(year=2025, month=9, day=1, hour=7),
        duration=9,
        price=500,
    ),
    "DA456": Flight(
        flight_id="DA456",
        origin="SFO",
        destination="SNA",
        date_time=Date(year=2025, month=10, day=1, hour=1),
        duration=2,
        price=100,
    ),
    "DA460": Flight(
        flight_id="DA460",
        origin="SFO",
        destination="SNA",
        date_time=Date(year=2025, month=10, day=1, hour=9),
        duration=2,
        price=120,
    ),
}
itinerary_database = {}
ticket_database = {}
def fetch_flight_info(date: Date, origin: str, destination: str):
    """Fetch flight information from origin to destination on the given date."""
    flights = []
    for flight_id, flight in flight_database.items():
        if (
            flight.date_time.year == date.year
            and flight.date_time.month == date.month
            and flight.date_time.day == date.day
            and flight.origin == origin
            and flight.destination == destination
        ):
            flights.append(flight)
    if len(flights) == 0:
        raise ValueError("No matching flight found!")
    return flights
def fetch_itinerary(confirmation_number: str):
    """Fetch a booked itinerary from the database."""
    return itinerary_database.get(confirmation_number)
def pick_flight(flights: list[Flight]):
    """Pick the best flight for the user's request: the shortest one, and the cheaper one on ties."""
    sorted_flights = sorted(
        flights,
        key=lambda x: (
            x.get("duration") if isinstance(x, dict) else x.duration,
            x.get("price") if isinstance(x, dict) else x.price,
        ),
    )
    return sorted_flights[0]
def _generate_id(length=8):
    chars = string.ascii_lowercase + string.digits
    return "".join(random.choices(chars, k=length))
def book_flight(flight: Flight, user_profile: UserProfile):
    """Book a flight on behalf of the user."""
    confirmation_number = _generate_id()
    while confirmation_number in itinerary_database:
        confirmation_number = _generate_id()
    itinerary_database[confirmation_number] = Itinerary(
        confirmation_number=confirmation_number,
        user_profile=user_profile,
        flight=flight,
    )
    return confirmation_number, itinerary_database[confirmation_number]
def cancel_itinerary(confirmation_number: str, user_profile: UserProfile):
    """Cancel an itinerary on behalf of the user."""
    if confirmation_number in itinerary_database:
        del itinerary_database[confirmation_number]
        return
    raise ValueError("Cannot find the itinerary, please check your confirmation number.")
def get_user_info(name: str):
    """Fetch the user profile from database with given name."""
    return user_database.get(name)
def file_ticket(user_request: str, user_profile: UserProfile):
    """File a customer support ticket if this is something the agent cannot handle."""
    ticket_id = _generate_id(length=6)
    ticket_database[ticket_id] = Ticket(
        user_request=user_request,
        user_profile=user_profile,
    )
    return ticket_id
def add_numbers(a: int, b: int):
    """Adds two numbers"""
    addition = a + b
    return addition
class DSPyAirlineCustomerService(dspy.Signature):
    """You are an airline customer service agent that helps users book and manage flights.
    You are given a list of tools to handle user requests, and you should decide the right tool to use
    to fulfill the user's request."""
    user_request: str = dspy.InputField()
    process_result: str = dspy.OutputField(
        desc=(
                "Message that summarizes the process result, and the information users need, e.g., the "
                "confirmation_number if a new flight is booked."
            )
        )
    
agent = dspy.ReAct(
    DSPyAirlineCustomerService,
    tools = [
        fetch_flight_info,
        fetch_itinerary,
        pick_flight,
        add_numbers,
        book_flight,
        cancel_itinerary,
        get_user_info,
        file_ticket,
    ]
)
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))
result = agent(user_request="please help me book a flight from SFO to JFK on 09/01/2025, my name is Adam")
print(result)

This is the complete DSPy implementation.

Notice what’s missing:

  • The signature only declares structured input and output, with no detailed prompt instructions.
  • There are no explicit workflow steps describing how tools should be chained.
  • No few-shot examples are required to guide tool orchestration.
  • Even an irrelevant tool (add_numbers) is safely ignored.

The ReAct module handles reasoning, tool selection, parameter passing, and execution flow automatically.

Instead of manually scripting the workflow, the developer defines the interface and available tools. DSPy determines how to coordinate them.
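To make the selection rule concrete: pick_flight sorts by (duration, price), so the shortest flight wins and the cheaper one breaks ties. A standalone sketch with plain dicts standing in for the Flight models:

```python
# Standalone sketch of pick_flight's selection rule: shortest flight
# first, cheaper flight on ties. Plain dicts stand in for Flight models.

def pick_best(flights):
    return sorted(flights, key=lambda f: (f["duration"], f["price"]))[0]

flights = [
    {"flight_id": "DA125", "duration": 9, "price": 500},
    {"flight_id": "DA123", "duration": 3, "price": 200},
]
best = pick_best(flights)  # DA123: the shorter duration wins
```

Keeping this logic in a deterministic tool, rather than asking the model to compare flights in free text, is what makes the selection step reliable in both implementations.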

Results

The DSPy agent successfully completed the full booking workflow without requiring explicit step-by-step prompt instructions.

Thought 0: "I need to fetch available flights from SFO to JFK"

  • Called fetch_flight_info
  • Retrieved two flights: DA123 (3 hours, $200) and DA125 (9 hours, $500)

Thought 1: "I need to book the selected flight for Adam, so I require Adam's profile"

  • Called user profile retrieval
  • Analyzed flight options

Thought 2: "The duration of flight DA123 is shorter, so I will choose that option"

  • Selected DA123
  • Completed booking

Final Output:

Process Result: Your flight has been booked successfully, Adam. Your confirmation number is iu8b8np1.
Reasoning: I have successfully booked a flight for Adam from SFO to JFK on 09/01/2025. The available options were evaluated, and the cheaper flight DA123, which costs $200 and has a duration of 3 hours, was selected. Adam's user profile information was retrieved to finalize the booking process, resulting in the successful generation of a confirmation number.

The agent provided both reasoning and results, all with minimal prompting.

Normal Prompting Implementation

First Attempt: Basic Zero-Shot Prompt

To mirror the DSPy setup, I implemented the same flight booking agent using traditional prompt engineering with OpenAI’s function-calling interface.

The goal was to keep the setup equivalent:

  • Same tools
  • Same model (GPT-4o mini)
  • Same test scenario

The only difference was orchestration strategy.

Instead of declaring structured signatures, I wrote a simple system prompt instructing the model to act as an airline customer service agent and use the available tools.

from openai import OpenAI
import os
import json
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Define tools in OpenAI function calling format
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_user_info",
            "description": "Fetch the user profile from database with given name.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "The user's name"}
                },
                "required": ["name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_flight_info",
            "description": "Fetch flight information from origin to destination on the given date",
            "parameters": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "object",
                        "properties": {
                            "year": {"type": "integer"},
                            "month": {"type": "integer"},
                            "day": {"type": "integer"},
                            "hour": {"type": "integer"},
                        },
                        "required": ["year", "month", "day", "hour"],
                    },
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                },
                "required": ["date", "origin", "destination"],
            },
        },
    },
    # ... pick_flight, book_flight, etc.
]
# Simple system prompt (similar to DSPy)
system_prompt = "You are an airline customer service agent that helps users book and manage flights. Use the required tools."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "please help me book a flight from SFO to JFK on 09/01/2025, my name is Adam"},
]
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=TOOLS,
)

Result: Failed

The agent retrieved Adam's profile and the flight list correctly, but never passed the flights array into pick_flight or the required objects into book_flight. Instead of booking, it returned: "It seems like there was an issue while trying to select and book the flight. However, I can still help you with the available flights."

Second Attempt: Enhanced Prompt with Explicit Instructions

After the failure, I had to create a much more detailed system prompt with explicit step-by-step instructions:

system_prompt = """You are an airline customer service agent that helps users book and manage flights.
CRITICAL: When calling functions, you MUST pass the output from previous functions as arguments to subsequent functions.
Follow this exact workflow for booking:
1. Call get_user_info(name="<user's name>") to get the user profile
2. Call fetch_flight_info(date={...}, origin="...", destination="...") to get available flights
3. Call pick_flight(flights=<the array returned from step 2>) - IMPORTANT: Pass the FULL flights array from step 2
4. Call book_flight(flight=<the flight object from step 3>, user_profile=<the user profile from step 1>)
5. Provide the confirmation number to the user
IMPORTANT RULES:
- When you call pick_flight, you MUST pass the "flights" parameter with the complete array of flights you received from fetch_flight_info
- When you call book_flight, you MUST pass BOTH the "flight" parameter (from pick_flight) AND the "user_profile" parameter (from get_user_info)
- Do NOT call functions with empty parameters {}
- Use the exact objects returned from previous function calls
Example correct flow:
1. get_user_info returns: {"user_id": "1", "name": "Adam", "email": "adam@gmail.com"}
2. fetch_flight_info returns: [{"flight_id": "DA123", "date_time": {...}, ...}, {...}]
3. pick_flight(flights=[...]) - pass the FULL array from step 2
4. book_flight(flight={...}, user_profile={...}) - pass both objects from previous calls
"""

The difference is stark:

  • Explicit workflow numbered steps (1-5)
  • Multiple "CRITICAL" and "IMPORTANT" warnings
  • Detailed parameter passing instructions
  • Concrete examples showing exact data flow
  • Manual iteration loop to handle multi-step tool calling
  • Significantly more code to handle the orchestration
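The manual iteration loop mentioned above is the orchestration code you write yourself with raw function calling: while the model keeps requesting tools, dispatch each call to a Python function and append the result to the conversation. A simplified sketch of the dispatch step (the registry contents and message shapes are illustrative):

```python
import json

# Simplified sketch of the dispatch step in a manual function-calling
# loop: map the model's requested tool name to a Python function,
# decode its JSON arguments, and run it.

def dispatch_tool_call(tool_call, registry):
    """Execute one model-requested tool call and return a tool message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = registry[name](**args)
    return {"role": "tool", "name": name, "content": json.dumps(result)}

# Illustrative registry and tool-call payload
registry = {"get_user_info": lambda name: {"user_id": "1", "name": name}}
call = {"function": {"name": "get_user_info", "arguments": '{"name": "Adam"}'}}
tool_msg = dispatch_tool_call(call, registry)
```

In the real loop this runs for every entry in the response's tool_calls, each returned tool message is appended to the conversation, and the API is called again until the model stops requesting tools. DSPy's ReAct module performs this bookkeeping for you.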

Results

Zero-Shot Attempt: Failed

Under zero-shot prompting, the agent partially executed the workflow but failed to complete the booking process.

The agent:

  • Successfully called get_user_info and retrieved Adam's profile
  • Successfully called fetch_flight_info and got two flights (DA123 and DA125)
  • Failed to call pick_flight - didn't pass the flights array as an argument
  • Failed to call book_flight - missing required parameters
  • Returned: "It seems like there was an issue while trying to select and book the flight. However, I can still help you with the available flights."

Enhanced Prompt Attempt: Success


With the detailed instructions, the agent properly executed:

  1. Called get_user_info("Adam") → Got user profile
  2. Called fetch_flight_info(...) → Retrieved flights DA123 and DA125
  3. Called pick_flight(flights=[...]) → Selected DA123 (3 hours, $200)
  4. Called book_flight(flight=DA123, user_profile=Adam) → Generated confirmation number
  5. Provided clear summary to user with booking details

Key insight: The same model required 10x more prompt engineering to achieve what DSPy did automatically.

Key Findings from the DSPy vs Normal Prompting Comparison

The comparison reveals a clear pattern in how each approach performs under different levels of complexity.

When DSPy Excels

DSPy performs best in structured, multi-step agent workflows:

  • Multi-step processes requiring reliable tool chaining
  • Complex workflows with multiple decision points
  • Minimal prompt engineering, works effectively out of the box
  • Built-in reasoning support through modules like ReAct and ChainOfThought
  • Automatic coordination of tools and parameter passing

When Normal Prompting Works Best

  • Single-shot tasks (simple Q&A)
  • When you need fine-grained control over prompt structure
  • Simpler use cases that don't require tool orchestration

However, normal prompting requires significantly more effort when:

  • Multiple tools must be coordinated
  • Outputs from one step feed into another
  • You need transparent reasoning traces

The comparison can be summarized side by side:

| Aspect | DSPy | Normal Prompting |
| --- | --- | --- |
| Initial complexity | Low | Low |
| Multi-step tasks | Excellent (minimal setup) | Requires extensive prompt engineering |
| Prompt length | Very short | Much longer for complex tasks |
| Reasoning transparency | Built-in trajectory | Manual implementation needed |
| Tool coordination | Automatic | Manual instruction required |

Frequently Asked Questions (FAQ)

1. What is the main difference between DSPy and normal prompting?

The main difference lies in orchestration strategy.
Normal prompting relies on manually written instructions to guide the model step by step. DSPy uses declarative signatures and structured modules to automatically generate and optimize prompts for multi-step workflows.

2. Is DSPy better than traditional prompt engineering?

DSPy performs better in complex, multi-step LLM agent workflows that require reliable tool coordination and reasoning chains.
Traditional prompting works well for simple, single-step tasks where detailed manual control is sufficient.

3. When should I use normal prompting instead of DSPy?

Use normal prompting when:

  • The task is single-shot (Q&A, summarization, classification)
  • No tool chaining is required
  • The workflow is simple and predictable
  • You need tight wording control

For scalable agent systems, DSPy is typically more stable.

4. How does DSPy handle tool orchestration differently?

In traditional prompting, developers must explicitly instruct how tool outputs are passed between steps.
DSPy modules like ReAct automatically manage tool selection, parameter passing, and reasoning flow without requiring detailed workflow instructions.

5. Why does normal prompting fail in multi-step LLM agents?

Normal prompting becomes brittle because:

  • Tool outputs are not reliably passed forward
  • Small wording changes alter behavior
  • There is no formal structure for the reasoning stages
  • Manual orchestration increases complexity

As workflow depth increases, instability increases.

6. Does DSPy eliminate prompt engineering completely?

No. DSPy reduces manual prompt writing but shifts the effort toward defining structured input–output signatures and selecting reasoning modules.
It replaces fragile prompt tweaking with structured program design.

7. Is DSPy suitable for production-grade AI agents?

Yes. DSPy is well-suited for production environments that require:

  • Multi-step reasoning
  • Reliable tool chaining
  • Declarative workflow design
  • Prompt optimization across models
  • Maintainable LLM systems

It reduces brittle behavior in complex orchestration scenarios.

Conclusion

DSPy and traditional prompting represent two different ways of building with LLMs.

Prompt engineering gives you direct control. It works well for simple tasks and single-shot interactions. When you understand exactly how the model should behave, a well-written prompt is often enough.

DSPy takes a different approach. Instead of manually crafting instructions, you define the task structure and let the framework handle reasoning, tool orchestration, and optimization. In multi-step workflows, this reduces brittle prompt tuning and makes complex agents easier to manage.

In the flight booking experiment, the difference was clear. What required detailed, step-by-step instructions in traditional prompting worked with minimal setup in DSPy. As workflows grow more complex, that gap becomes more noticeable.

The choice depends on the problem. Use traditional prompting for simple, well-defined tasks. Use DSPy when coordinating multiple tools, handling multi-step reasoning, or scaling agent behavior across models.

As LLM systems mature, the shift from manual prompt crafting toward structured, declarative design is likely to accelerate. DSPy reflects that direction.

Kiruthika

I'm an AI/ML engineer passionate about developing cutting-edge solutions. I specialize in machine learning techniques to solve complex problems and drive innovation through data-driven insights.

