
When you build an AI agent that books flights, calls tools, or handles multi-step workflows, one question comes up quickly: how should you control the model?
Most developers use prompt engineering. You write detailed instructions, add examples, adjust wording, and test until it works. Sometimes it works well. Sometimes changing a single sentence breaks the entire workflow.
DSPy offers a different approach. Instead of manually crafting prompts, you define what the system should do, and the framework handles prompt generation and optimization for you.
To see the difference in practice, I built the same flight booking agent twice, once using traditional prompting and once using DSPy, with the same tools and model. The results revealed two very different ways of building LLM-powered agents.
Before diving into the experiment, let’s clarify what we’re comparing.
Normal prompting is the traditional form of prompt engineering used with Large Language Models (LLMs). It relies on writing explicit instructions that define the task, expected behaviour, and output format.
In this approach, model performance depends heavily on prompt clarity and structure. Even small wording changes can affect reasoning, consistency, or tool usage in LLM agents.
Most prompt engineering falls into a few common patterns. The difference is mainly how much guidance you give the model before it responds.
Zero-Shot Prompting: Giving the model a task without any examples.
```
You are an airline customer service agent. Help users book flights.
```

Few-Shot Prompting: You include a couple of examples so the model copies the pattern, which is useful for consistent formatting and tool use.

```
You are an airline customer service agent.

Example 1:
User: "Book a flight to NYC"
Assistant: [calls fetch_flights tool] [calls book_flight tool]

Example 2:
User: "Find cheapest flight to LA"
Assistant: [calls fetch_flights tool] [compares prices] [calls book_flight tool]

Now respond to the next user request.
```

Chain-of-Thought (CoT): You ask the model to reason through the problem before answering. This often improves multi-step accuracy, but may be unnecessary for simple requests.

```
Think step by step and explain your reasoning before providing the final answer.
```
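In the OpenAI-style chat format, the few-shot pattern amounts to prepending example turns to the message list before the real request. A minimal sketch (`build_few_shot_messages` and the example texts are illustrative, not part of any library):

```python
def build_few_shot_messages(system_prompt, examples, user_request):
    """Assemble a chat message list with few-shot example turns before the real request."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": user_request})
    return messages

messages = build_few_shot_messages(
    "You are an airline customer service agent.",
    [
        ('Book a flight to NYC', '[calls fetch_flights tool] [calls book_flight tool]'),
        ('Find cheapest flight to LA', '[calls fetch_flights tool] [compares prices] [calls book_flight tool]'),
    ],
    "Book a flight to SFO",
)
```

The resulting list can be passed directly as the `messages` argument of a chat completion call; the model sees the examples as prior turns and tends to imitate their format.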
Normal prompting works well for simple, single-step tasks. However, its limitations become clear when building LLM agents that require multi-step reasoning, state management, or tool orchestration.

As workflows grow more complex, prompt engineering introduces challenges: prompts become long and brittle, tool outputs must be threaded between steps by hand, and small wording changes can alter the model's behavior.
In short, as task complexity increases, prompt engineering shifts from simple instruction-writing to fragile workflow management.
DSPy (Declarative Self-improving Language Programs) is a framework for building LLM-powered systems by treating prompting as a structured programming task rather than manual prompt writing.
Instead of crafting detailed instructions step by step, you define the task’s input–output behavior using structured signatures. DSPy then automatically generates, optimizes, and adapts the underlying prompts for the chosen language model.
Developed by researchers at Stanford, DSPy shifts the focus from prompt engineering to declarative design, making multi-step reasoning, tool orchestration, and agent workflows easier to manage and scale.
DSPy is built around four core abstractions that replace manual prompt engineering with structured, declarative design.
Signatures define the input–output specification of a task.
Instead of writing detailed prompts, you declare what goes in and what comes out.
```python
class BookFlight(dspy.Signature):
    """You are an airline customer service agent that helps users book and manage flights."""

    question = dspy.InputField(desc="User's flight booking request")
    answer = dspy.OutputField(desc="Message summarizing the process, result, and information")
```

This shifts the focus from writing instructions to defining an interface. DSPy handles the underlying prompt construction.
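To make the interface idea concrete, here is a rough, hypothetical sketch of how a signature-like spec might expand into a prompt scaffold. `render_signature` is an illustration of the concept only, not DSPy's actual prompt format:

```python
def render_signature(instructions, inputs, outputs):
    """Expand a signature-like spec (instructions plus field descriptions) into a prompt scaffold."""
    lines = [instructions, "", "Input fields:"]
    for name, desc in inputs.items():
        lines.append(f"- {name}: {desc}")
    lines.append("Output fields:")
    for name, desc in outputs.items():
        lines.append(f"- {name}: {desc}")
    return "\n".join(lines)

prompt = render_signature(
    "You are an airline customer service agent that helps users book and manage flights.",
    {"question": "User's flight booking request"},
    {"answer": "Message summarizing the process, result, and information"},
)
```

The point is the division of labor: the developer declares fields and descriptions, and the framework owns how they are serialized into the prompt, so the serialization can change or be optimized without touching the program.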
DSPy provides reusable modules that implement common reasoning patterns, such as dspy.Predict, dspy.ChainOfThought, and dspy.ReAct.
Modules make multi-step workflows composable and easier to manage.
One of DSPy’s defining features is automatic prompt optimization.
Instead of manually refining prompts, DSPy optimizers can bootstrap few-shot demonstrations from data, propose revised instructions, and keep the variants that score best against a metric. This replaces trial-and-error prompt tuning with systematic optimization.
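The core idea can be illustrated with a toy selection loop: score each candidate prompt on a small labeled set and keep the best one. Everything here (`pick_best_prompt`, `fake_model`, the metric) is a hypothetical stand-in, deliberately far simpler than DSPy's real optimizers:

```python
def pick_best_prompt(candidates, dataset, run_model, metric):
    """Score each candidate prompt on a labeled dataset and return the best-scoring one."""
    scored = []
    for prompt in candidates:
        score = sum(metric(run_model(prompt, x), y) for x, y in dataset) / len(dataset)
        scored.append((score, prompt))
    return max(scored)[1]

# A fake "model" for illustration: it only uppercases the input if the prompt asks for it.
def fake_model(prompt, text):
    return text.upper() if "UPPERCASE" in prompt else text

dataset = [("sfo", "SFO"), ("jfk", "JFK")]
best = pick_best_prompt(
    ["Repeat the input.", "Repeat the input in UPPERCASE."],
    dataset,
    fake_model,
    lambda pred, gold: float(pred == gold),
)
# → "Repeat the input in UPPERCASE."
```

The metric-driven loop is what makes the process systematic: prompt changes are judged by measured scores on data rather than by eyeballing individual outputs.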
DSPy generates the actual prompts behind the scenes. When you combine signatures with modules like ReAct, DSPy automatically constructs the system prompt, formats the tool descriptions for the model, and parses the structured output fields.
The result is a structured LLM program rather than a fragile prompt.
Traditional prompting follows an instruction-driven mindset:
“I need to tell the model exactly how to do this.”
Every step, tool call, and reasoning path must be explicitly described in the prompt.
DSPy takes a declarative approach:
“I need to define what I want, and let the system determine how to achieve it.”
Instead of treating prompts as static text, DSPy treats them as optimizable program parameters, similar to weights in a neural network. This shift from manual instruction-writing to structured, programmable design makes complex, multi-step agent workflows more stable, scalable, and easier to maintain.
To compare DSPy vs normal prompting in a controlled setting, I implemented the same flight booking agent using both approaches, with identical tools and the same language model.
The agent was required to look up the user's profile, search for matching flights, pick the best option, book it, and return a confirmation number.
Test scenario: "Please help me book a flight from SFO to JFK on September 1st. My name is Adam."
Available tools: get_user_info, fetch_flight_info, pick_flight, book_flight, fetch_itinerary, cancel_itinerary, file_ticket, and add_numbers.
Both implementations used GPT-4o mini, ensuring that any differences in behavior were due to orchestration strategy rather than model capability.
The implementation below uses DSPy’s ReAct module to orchestrate tool calls for flight search and booking. We first define the data models used by the tools (date, user profile, flight, itinerary), then register the tools, and finally run the agent on the test prompt.
Here's the actual implementation using DSPy:
```python
from pydantic import BaseModel
import random
import string
import dspy
import os
from dotenv import load_dotenv

load_dotenv()


class Date(BaseModel):
    year: int
    month: int
    day: int
    hour: int


class UserProfile(BaseModel):
    user_id: str
    name: str
    email: str


class Flight(BaseModel):
    flight_id: str
    date_time: Date
    origin: str
    destination: str
    duration: float
    price: float


class Itinerary(BaseModel):
    confirmation_number: str
    user_profile: UserProfile
    flight: Flight


class Ticket(BaseModel):
    user_request: str
    user_profile: UserProfile


user_database = {
    "Adam": UserProfile(user_id="1", name="Adam", email="adam@gmail.com"),
    "Bob": UserProfile(user_id="2", name="Bob", email="bob@gmail.com"),
    "Chelsie": UserProfile(user_id="3", name="Chelsie", email="chelsie@gmail.com"),
    "David": UserProfile(user_id="4", name="David", email="david@gmail.com"),
}

flight_database = {
    "DA123": Flight(
        flight_id="DA123",
        origin="SFO",
        destination="JFK",
        date_time=Date(year=2025, month=9, day=1, hour=1),
        duration=3,
        price=200,
    ),
    "DA125": Flight(
        flight_id="DA125",
        origin="SFO",
        destination="JFK",
        date_time=Date(year=2025, month=9, day=1, hour=7),
        duration=9,
        price=500,
    ),
    "DA456": Flight(
        flight_id="DA456",
        origin="SFO",
        destination="SNA",
        date_time=Date(year=2025, month=10, day=1, hour=1),
        duration=2,
        price=100,
    ),
    "DA460": Flight(
        flight_id="DA460",
        origin="SFO",
        destination="SNA",
        date_time=Date(year=2025, month=10, day=1, hour=9),
        duration=2,
        price=120,
    ),
}

itinerary_database = {}
ticket_database = {}


def fetch_flight_info(date: Date, origin: str, destination: str):
    """Fetch flight information from origin to destination on the given date."""
    flights = []
    for flight_id, flight in flight_database.items():
        if (
            flight.date_time.year == date.year
            and flight.date_time.month == date.month
            and flight.date_time.day == date.day
            and flight.origin == origin
            and flight.destination == destination
        ):
            flights.append(flight)
    if len(flights) == 0:
        raise ValueError("No matching flight found!")
    return flights


def fetch_itinerary(confirmation_number: str):
    """Fetch a booked itinerary from the database."""
    return itinerary_database.get(confirmation_number)


def pick_flight(flights: list[Flight]):
    """Pick the best flight for the user's request: the shortest one, and the cheaper one on ties."""
    sorted_flights = sorted(
        flights,
        key=lambda x: (
            x.get("duration") if isinstance(x, dict) else x.duration,
            x.get("price") if isinstance(x, dict) else x.price,
        ),
    )
    return sorted_flights[0]


def _generate_id(length=8):
    chars = string.ascii_lowercase + string.digits
    return "".join(random.choices(chars, k=length))


def book_flight(flight: Flight, user_profile: UserProfile):
    """Book a flight on behalf of the user."""
    confirmation_number = _generate_id()
    while confirmation_number in itinerary_database:
        confirmation_number = _generate_id()
    itinerary_database[confirmation_number] = Itinerary(
        confirmation_number=confirmation_number,
        user_profile=user_profile,
        flight=flight,
    )
    return confirmation_number, itinerary_database[confirmation_number]


def cancel_itinerary(confirmation_number: str, user_profile: UserProfile):
    """Cancel an itinerary on behalf of the user."""
    if confirmation_number in itinerary_database:
        del itinerary_database[confirmation_number]
        return
    raise ValueError("Cannot find the itinerary, please check your confirmation number.")


def get_user_info(name: str):
    """Fetch the user profile from the database with the given name."""
    return user_database.get(name)


def file_ticket(user_request: str, user_profile: UserProfile):
    """File a customer support ticket if this is something the agent cannot handle."""
    ticket_id = _generate_id(length=6)
    ticket_database[ticket_id] = Ticket(
        user_request=user_request,
        user_profile=user_profile,
    )
    return ticket_id


def add_numbers(a: int, b: int):
    """Add two numbers."""
    return a + b


class DSPyAirlineCustomerService(dspy.Signature):
    """You are an airline customer service agent that helps users book and manage flights.
    You are given a list of tools to handle user requests, and you should decide the right tool
    to use in order to fulfill the user's request."""

    user_request: str = dspy.InputField()
    process_result: str = dspy.OutputField(
        desc=(
            "Message that summarizes the process result, and the information users need, e.g., the "
            "confirmation_number if a new flight is booked."
        )
    )


agent = dspy.ReAct(
    DSPyAirlineCustomerService,
    tools=[
        fetch_flight_info,
        fetch_itinerary,
        pick_flight,
        add_numbers,
        book_flight,
        cancel_itinerary,
        get_user_info,
        file_ticket,
    ],
)

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

result = agent(user_request="please help me book a flight from SFO to JFK on 09/01/2025, my name is Adam")
print(result)
```

This is the complete DSPy implementation.
Notice what's missing: there are no step-by-step instructions, no few-shot examples, and no explicit workflow logic.
The ReAct module handles reasoning, tool selection, parameter passing, and execution flow automatically.
Instead of manually scripting the workflow, the developer defines the interface and available tools. DSPy determines how to coordinate them.
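The control flow ReAct runs is a loop of thought, tool call, and observation, repeated until the model decides it is done. Here is a stripped-down sketch of that loop with a scripted policy function standing in for the LLM; `react_loop` and `scripted_policy` are illustrative, not DSPy internals:

```python
def react_loop(policy, tools, request, max_steps=10):
    """ReAct-style loop: the policy picks a tool and arguments from the trajectory so far."""
    trajectory = []
    for _ in range(max_steps):
        action, args = policy(request, trajectory)
        if action == "finish":
            return args, trajectory
        # Execute the chosen tool and record the observation for the next step.
        observation = tools[action](**args)
        trajectory.append((action, args, observation))
    raise RuntimeError("max steps exceeded")

# Scripted stand-in for the LLM: look up the user, then finish with a greeting.
def scripted_policy(request, trajectory):
    if not trajectory:
        return "get_user_info", {"name": "Adam"}
    return "finish", f"Hello {trajectory[-1][2]}"

tools = {"get_user_info": lambda name: name.upper()}
answer, trajectory = react_loop(scripted_policy, tools, "book a flight")
# → answer == "Hello ADAM"
```

In DSPy, the policy role is played by the model conditioned on the signature and tool descriptions, and the trajectory is what surfaces as the reasoning trace shown below.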
The DSPy agent successfully completed the full booking workflow without requiring explicit step-by-step prompt instructions.
```
Thought 0: "I need to fetch available flights from SFO to JFK"
Thought 1: "I need to book the selected flight for Adam, so I require Adam's profile"
Thought 2: "The duration of flight DA123 is shorter, so I will choose that option"

Final Output:
Process Result: Your flight has been booked successfully, Adam. Your confirmation number is iu8b8np1.
Reasoning: I have successfully booked a flight for Adam from SFO to JFK on 09/01/2025. The available options were evaluated, and the cheaper flight DA123, which costs $200 and has a duration of 3 hours, was selected. Adam's user profile information was retrieved to finalize the booking process, resulting in the successful generation of a confirmation number.
```
The agent provided both reasoning and results - all with minimal prompting.
To mirror the DSPy setup, I implemented the same flight booking agent using traditional prompt engineering with OpenAI’s function-calling interface.
The goal was to keep the setup equivalent: the same tools, the same model, and the same test request.
The only difference was orchestration strategy.
Instead of declaring structured signatures, I wrote a simple system prompt instructing the model to act as an airline customer service agent and use the available tools.
```python
from openai import OpenAI
import json
import os

client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Define tools in OpenAI function calling format
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_user_info",
            "description": "Fetch the user profile from database with given name.",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "The user's name"}
                },
                "required": ["name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "fetch_flight_info",
            "description": "Fetch flight information from origin to destination on the given date",
            "parameters": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "object",
                        "properties": {
                            "year": {"type": "integer"},
                            "month": {"type": "integer"},
                            "day": {"type": "integer"},
                            "hour": {"type": "integer"},
                        },
                        "required": ["year", "month", "day", "hour"],
                    },
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                },
                "required": ["date", "origin", "destination"],
            },
        },
    },
    # ... pick_flight, book_flight, etc.
]

# Simple system prompt (similar to DSPy)
system_prompt = "You are an airline customer service agent that helps users book and manage flights. Use the required tools."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "please help me book a flight from SFO to JFK on 09/01/2025, my name is Adam"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=TOOLS,
)
```

Result: Failed
The agent failed to chain the tool calls, dropping the outputs of earlier functions instead of passing them as arguments to later ones.
After the failure, I had to create a much more detailed system prompt with explicit step-by-step instructions:
```python
system_prompt = """You are an airline customer service agent that helps users book and manage flights.

CRITICAL: When calling functions, you MUST pass the output from previous functions as arguments to subsequent functions.

Follow this exact workflow for booking:
1. Call get_user_info(name="<user's name>") to get the user profile
2. Call fetch_flight_info(date={...}, origin="...", destination="...") to get available flights
3. Call pick_flight(flights=<the array returned from step 2>) - IMPORTANT: Pass the FULL flights array from step 2
4. Call book_flight(flight=<the flight object from step 3>, user_profile=<the user profile from step 1>)
5. Provide the confirmation number to the user

IMPORTANT RULES:
- When you call pick_flight, you MUST pass the "flights" parameter with the complete array of flights you received from fetch_flight_info
- When you call book_flight, you MUST pass BOTH the "flight" parameter (from pick_flight) AND the "user_profile" parameter (from get_user_info)
- Do NOT call functions with empty parameters {}
- Use the exact objects returned from previous function calls

Example correct flow:
1. get_user_info returns: {"user_id": "1", "name": "Adam", "email": "adam@gmail.com"}
2. fetch_flight_info returns: [{"flight_id": "DA123", "date_time": {...}, ...}, {...}]
3. pick_flight(flights=[...]) - pass the FULL array from step 2
4. book_flight(flight={...}, user_profile={...}) - pass both objects from previous calls
"""
```

The difference is stark:
Zero-Shot Attempt: Failed
Under zero-shot prompting, the agent partially executed the workflow but failed to complete the booking process.
Enhanced Prompt Attempt: Success
With the detailed instructions, the agent properly executed the full workflow, from profile lookup through flight selection to booking, and returned a confirmation number.
Key insight: The same model required 10x more prompt engineering to achieve what DSPy did automatically.
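Part of that extra engineering never appears in the prompt at all: with raw function calling, the developer also owns the dispatch loop that executes each tool call the model requests and feeds the result back as a message. A minimal sketch with a stubbed tool (in the real loop, each iteration would call `client.chat.completions.create` again with the updated messages):

```python
import json

# Stub standing in for the real tool functions defined earlier.
def get_user_info(name):
    return {"user_id": "1", "name": name}

LOCAL_TOOLS = {"get_user_info": get_user_info}

def execute_tool_call(name, arguments_json):
    """Dispatch one model-requested tool call to a local function and serialize the result."""
    args = json.loads(arguments_json)
    result = LOCAL_TOOLS[name](**args)
    return json.dumps(result)

def tool_result_message(call_id, name, arguments_json):
    """Package a tool result as a chat message, keyed by the call id the model supplied."""
    return {
        "role": "tool",
        "tool_call_id": call_id,
        "content": execute_tool_call(name, arguments_json),
    }

msg = tool_result_message("call_1", "get_user_info", '{"name": "Adam"}')
```

This plumbing (argument parsing, dispatch, result serialization, message threading) is exactly what DSPy's ReAct module absorbs, which is why the hand-rolled version needs so much more instruction.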
The comparison reveals a clear pattern in how each approach performs under different levels of complexity.
DSPy performs best in structured, multi-step agent workflows, while normal prompting demands significantly more effort once tasks involve tool chaining, state passing between steps, or longer reasoning chains. The table below summarizes the comparison:
| Aspect | DSPy | Normal Prompting |
| --- | --- | --- |
| Initial complexity | Low | Low |
| Multi-step tasks | Excellent (minimal setup) | Requires extensive prompt engineering |
| Prompt length | Very short | Much longer for complex tasks |
| Reasoning transparency | Built-in trajectory | Manual implementation needed |
| Tool coordination | Automatic | Manual instruction required |
The main difference lies in orchestration strategy.
Normal prompting relies on manually written instructions to guide the model step by step. DSPy uses declarative signatures and structured modules to automatically generate and optimize prompts for multi-step workflows.
DSPy performs better in complex, multi-step LLM agent workflows that require reliable tool coordination and reasoning chains.
Traditional prompting works well for simple, single-step tasks where detailed manual control is sufficient.
Use normal prompting when the task is simple and single-step, when you need direct control over exact model behavior, or when the overhead of a framework is not justified.
For scalable agent systems, DSPy is typically more stable.
In traditional prompting, developers must explicitly instruct how tool outputs are passed between steps.
DSPy modules like ReAct automatically manage tool selection, parameter passing, and reasoning flow without requiring detailed workflow instructions.
Normal prompting becomes brittle because every tool call, parameter, and data handoff must be spelled out explicitly, and small wording changes can alter the model's behavior.
As workflow depth increases, instability increases.
No. DSPy reduces manual prompt writing but shifts the effort toward defining structured input–output signatures and selecting reasoning modules.
It replaces fragile prompt tweaking with structured program design.
Yes. DSPy is well-suited for production environments that require reliable multi-step tool coordination, consistent structured outputs, and maintainable agent workflows.
It reduces brittle behavior in complex orchestration scenarios.
DSPy and traditional prompting represent two different ways of building with LLMs.
Prompt engineering gives you direct control. It works well for simple tasks and single-shot interactions. When you understand exactly how the model should behave, a well-written prompt is often enough.
DSPy takes a different approach. Instead of manually crafting instructions, you define the task structure and let the framework handle reasoning, tool orchestration, and optimization. In multi-step workflows, this reduces brittle prompt tuning and makes complex agents easier to manage.
In the flight booking experiment, the difference was clear. What required detailed, step-by-step instructions in traditional prompting worked with minimal setup in DSPy. As workflows grow more complex, that gap becomes more noticeable.
The choice depends on the problem. Use traditional prompting for simple, well-defined tasks. Use DSPy when coordinating multiple tools, handling multi-step reasoning, or scaling agent behavior across models.
As LLM systems mature, the shift from manual prompt crafting toward structured, declarative design is likely to accelerate. DSPy reflects that direction.