Prompt Engineering book, Chapter 11: DSPy — Programming, Not Prompting
Introduction: The “Compiler” for AI
In the previous parts, we’ve focused on how to write better prompts and build systems around them. But what if we didn’t have to write the prompts at all? What if we could treat the LLM like a piece of hardware and write a program that “compiles” the best prompt for us?
This is the promise of DSPy (Declarative Self-improving Python). Developed by the Stanford NLP group, it is arguably the most significant paradigm shift so far in the history of prompt engineering. By 2026, a growing number of advanced AI systems are built not with hand-written prompts but with DSPy Signatures and Modules.
Deep Technical Analysis: The DSPy Compiler
The shift from “Prompt Crafting” to “Language Model Programming” is built on three technical pillars:
1. The Separation of Logic from Implementation
In traditional prompting, the "Prompt" is both the Logic (the task) and the Implementation (the specific wording). If you change the model, you often have to rewrite the implementation. In DSPy, you define only the Logic, in a Signature (e.g., question -> answer). The Implementation (the actual prompt text) is produced by DSPy: the framework formats the Signature into a prompt, and an optimizer can then tune the instructions and examples for the specific LLM you are using.
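To make the split between Logic and Implementation concrete, here is a toy sketch in plain Python (it is not DSPy's internal code). `ToySignature`, `parse_signature`, and `render_prompt` are hypothetical names: the signature only records field names and a task description, and `render_prompt` is one possible "implementation" of it. A different model could get a different renderer without the signature changing.

```python
from dataclasses import dataclass


@dataclass
class ToySignature:
    """Hypothetical stand-in for a signature: task description plus field names."""
    inputs: list[str]
    outputs: list[str]
    instructions: str = ""


def parse_signature(spec: str, instructions: str = "") -> ToySignature:
    """Parse a DSPy-style shorthand such as 'context, question -> answer'."""
    left, right = spec.split("->")
    inputs = [name.strip() for name in left.split(",") if name.strip()]
    outputs = [name.strip() for name in right.split(",") if name.strip()]
    return ToySignature(inputs, outputs, instructions)


def render_prompt(sig: ToySignature, **values: str) -> str:
    """One possible 'implementation' of the logic: a plain-text prompt."""
    lines = [sig.instructions] if sig.instructions else []
    for name in sig.inputs:
        lines.append(f"{name.capitalize()}: {values[name]}")
    # Leave the output slots empty for the model to fill in
    for name in sig.outputs:
        lines.append(f"{name.capitalize()}:")
    return "\n".join(lines)


sig = parse_signature("question -> answer", "Answer the question concisely.")
print(render_prompt(sig, question="What is DSPy?"))
```

This prints the instruction line, `Question: What is DSPy?`, and an empty `Answer:` slot.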
2. Modules as Reasoning Scaffolds
DSPy provides Modules (like ChainOfThought, ReAct, and ProgramOfThought) that act as "Reasoning Templates." You don't have to tell the model to "think step-by-step." You just wrap your Signature in a dspy.ChainOfThought module, and the framework adds the intermediate reasoning step and handles its formatting and parsing.
3. Teleprompters (The Prompt Optimizers)
This is the "Secret Sauce." A Teleprompter (called an optimizer in current DSPy releases) is an algorithm that takes your DSPy program, some training data, and a metric (for example, accuracy), and then automatically searches for better instructions and few-shot examples. It is essentially machine learning applied to prompts.
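The core loop of such an optimizer can be sketched as "score each candidate prompt with the metric, keep the best." The toy version below is a brute-force search over a fixed list of hand-written instructions, assuming a fake model (`fake_program`) and an exact-match metric; all names are hypothetical. Real DSPy optimizers are more sophisticated (they bootstrap demonstrations, and some use an LLM to propose new instructions), but the objective is the same: maximize the metric on training data.

```python
from typing import Callable


def optimize_instruction(
    candidates: list[str],
    program: Callable[[str, dict], str],
    trainset: list[dict],
    metric: Callable[[dict, str], bool],
) -> tuple[str, float]:
    """Return the candidate instruction with the highest average metric score."""
    best, best_score = candidates[0], -1.0
    for instruction in candidates:
        scores = [metric(ex, program(instruction, ex)) for ex in trainset]
        score = sum(scores) / len(trainset)
        if score > best_score:
            best, best_score = instruction, score
    return best, best_score


# Fake "LLM": verbose unless the instruction asks for a one-word answer
FACTS = {"Capital of France?": "Paris", "Capital of Japan?": "Tokyo"}


def fake_program(instruction: str, ex: dict) -> str:
    fact = FACTS[ex["question"]]
    return fact if "one word" in instruction else f"The answer is {fact}."


def exact_match(ex: dict, pred: str) -> bool:
    return ex["answer"].lower() == pred.strip().lower()


trainset = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "Capital of Japan?", "answer": "Tokyo"},
]
candidates = ["Answer the question.", "Answer the question in one word."]
print(optimize_instruction(candidates, fake_program, trainset, exact_match))
```

The exact-match metric rewards concise answers, so the search picks the one-word instruction with a score of 1.0. Nobody had to guess which wording works; the metric decided.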
Why DSPy Solves Real-World Problems
In practice, DSPy solves several critical production issues:
Model Brittleness: Write your program once, then re-compile it for GPT-4, Claude 3.5, or Llama 3; the optimizer searches for the prompts that work best for each model.
Systematic Improvement: Instead of guessing why a prompt is failing, you provide labeled examples and a metric that scores outputs, and the optimizer searches for a prompt that scores better.
Scalability: For an enterprise with 500 different AI tasks, manually engineering each prompt is impractical. DSPy allows for an "AI Factory" approach where tasks are compiled and optimized automatically.
Practical Implementation: 8 Python Examples
These examples demonstrate how to build your first DSPy programs and move from “Prompting” to “Programming.”
Example 1: Defining a Declarative Signature
Problem: You want a model to perform a specific task (e.g., sentiment analysis) without writing a 100-word prompt that might be biased. Solution: Use a "Signature" to define the task as a Python class.
```python
import dspy

# Configure a language model once, globally (the model name is only an example):
# dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# 1. Define the Signature (the logic contract)
class SentimentAnalysis(dspy.Signature):
    """Analyze the sentiment and tone of the given customer feedback."""

    # Input fields define what the model receives
    feedback = dspy.InputField(desc="Raw text from a user review")
    # Output fields define what the model must produce
    sentiment = dspy.OutputField(desc="Positive, Negative, or Neutral")
    tone = dspy.OutputField(desc="Professional, Frustrated, or Happy")

# 2. Set up the predictor
# Predict is a module that takes a signature and generates a prompt
def run_sentiment_task(text: str):
    predictor = dspy.Predict(SentimentAnalysis)
    # DSPy generates the instructions from the docstring, field names, and descriptions
    try:
        response = predictor(feedback=text)
        return response.sentiment, response.tone
    except Exception as e:
        # Keep the same (sentiment, tone) shape as the success path
        return None, f"Error: {e}"

# Execution example (requires a configured LM)
if __name__ == "__main__":
    print(run_sentiment_task("The app is slow and I hate it."))
```

Why this is preferred: It is declarative. You've told the system what you want (sentiment and tone), and DSPy handles the how (the prompt instructions) for you.
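The other half of what a signature does is parsing: `response.sentiment` and `response.tone` exist because DSPy parses the raw completion back into the declared output fields. A toy version of that step might look like the sketch below. It assumes a simple `Field: value` line format (DSPy's real adapters use their own delimiters), and `parse_completion` is a hypothetical helper, not a DSPy function.

```python
def parse_completion(text: str, output_fields: list[str]) -> dict[str, str]:
    """Toy parser: map 'Field: value' lines back onto the declared output fields."""
    wanted = {name.lower(): name for name in output_fields}
    result: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        name = wanted.get(key.strip().lower())
        if sep and name is not None:
            result[name] = value.strip()
    # A missing field means the model ignored the format, so fail loudly
    missing = [name for name in output_fields if name not in result]
    if missing:
        raise ValueError(f"completion is missing fields: {missing}")
    return result


raw = "Sentiment: Negative\nTone: Frustrated"
fields = parse_completion(raw, ["sentiment", "tone"])
print(fields["sentiment"], fields["tone"])  # Negative Frustrated
```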
Example 2: Using the “ChainOfThought” Module
Problem: You want the model to reason before answering, which can help reduce errors and hallucinations. Solution: Wrap your Signature in the dspy.ChainOfThought module.
```python
import dspy

# Reuses the SentimentAnalysis signature from Example 1
def run_reasoning_task(query: str):
    """
    Uses ChainOfThought to raise the reasoning ceiling.
    Logic:
    1. Model generates 'reasoning' (an extra output field added by the module)
    2. Model generates 'sentiment' and 'tone'
    """
    # Simply swap Predict for ChainOfThought
    cot_predictor = dspy.ChainOfThought(SentimentAnalysis)
    response = cot_predictor(feedback=query)
    # Recent DSPy releases name this field 'reasoning' (older ones used 'rationale')
    print(f"Reasoning: {response.reasoning}")
    print(f"Result: {response.sentiment}")
    return response
```

Why this is preferred: You don't have to manually write a "Reasoning:" header or a "think step-by-step" instruction. The built-in module adds the reasoning step and formats it consistently across different models.
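Conceptually, ChainOfThought is a signature transformation: it inserts a reasoning output field before the real outputs, so the model writes its reasoning before committing to an answer. The sketch below shows only that reordering on plain field lists. `with_reasoning` and `render_output_slots` are hypothetical helpers; the field name defaults to `reasoning`, which is what recent DSPy releases use (older ones used `rationale`).

```python
def with_reasoning(output_fields: list[str], reasoning_field: str = "reasoning") -> list[str]:
    """Toy ChainOfThought transform: put a reasoning slot before the real outputs."""
    if reasoning_field in output_fields:
        return list(output_fields)
    return [reasoning_field, *output_fields]


def render_output_slots(output_fields: list[str]) -> str:
    """Render empty 'Field:' slots in the order the model should fill them."""
    return "\n".join(f"{name.capitalize()}:" for name in output_fields)


plain = ["sentiment", "tone"]
print(render_output_slots(with_reasoning(plain)))
# Reasoning:
# Sentiment:
# Tone:
```

Because generation is left to right, putting the reasoning slot first is what makes the model "think" before it fills in `sentiment` and `tone`.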
Example 3: Defining Multi-Input Signatures (RAG)
Problem: You need a model to answer a question based on a provided context, but you don’t know the best way to word the “Context” block. Solution: Define a signature with multiple InputFields and let the compiler handle the formatting.
```python
import dspy

class ContextAnswer(dspy.Signature):
    """Answer the question accurately using ONLY the provided context."""

    context = dspy.InputField(desc="Retrieved facts from the knowledge base")
    question = dspy.InputField()
    answer = dspy.OutputField()

# predictor = dspy.Predict(ContextAnswer)
# response = predictor(context="...", question="...")
```

Why this is preferred: It defines a clean Data Interface for your AI task. You can easily swap the source of the context (e.g., from a vector DB or a local file) without touching the AI logic.
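Because the signature only says that `context` is a text input, where the passages come from is the caller's business. The toy sketch below uses a hypothetical keyword-overlap retriever in place of a vector DB, then numbers the passages into a single context string. `tokenize`, `keyword_retrieve`, and `format_context` are illustrative names, not DSPy APIs. You would pass the resulting string as `context=` to the predictor.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Hypothetical stand-in for a vector DB: rank docs by words shared with the query."""
    q = tokenize(query)
    # sorted() is stable, so ties keep their original document order
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def format_context(passages: list[str]) -> str:
    """Number each passage so the model (and a reader) can refer to it."""
    return "\n".join(f"[{i}] {p.strip()}" for i, p in enumerate(passages, start=1))


docs = [
    "Paris is the capital of France.",
    "Tokyo is the largest city in Japan.",
    "France borders Spain.",
]
context = format_context(keyword_retrieve("capital of France", docs))
print(context)
```

Swapping `keyword_retrieve` for a real vector search, or for lines read from a file, changes nothing about `ContextAnswer`.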
Example 4: Creating a Custom Module (Multi-Hop Agent)
Problem: You want to build a more complex reasoning loop (e.g., “Search for a query, then answer”). Solution: Subclass dspy.Module to define a custom flow of multiple signatures.
```python
import dspy

class MultiHopSearch(dspy.Module):
    def __init__(self):
        super().__init__()
        # Define internal sub-modules
        self.generate_query = dspy.Predict("question -> search_query")
        self.generate_answer = dspy.ChainOfThought(ContextAnswer)  # from Example 3

    def forward(self, question: str):
        # 1. Generate search terms
        query = self.generate_query(question=question).search_query
        # 2. (Mock) Fetch context using the query
        context = f"Internal search results for {query}..."
        # 3. Generate final answer
        return self.generate_answer(context=context, question=question)

# agent = MultiHopSearch()
# Call the module directly; DSPy dispatches the call to forward()
# result = agent(question="Who is the CEO?")
```

Why this is preferred: It treats the AI workflow like a standard Python class. This makes it easy to test each step individually and version the entire "Agent" as a single artifact. (As written, the module performs one search hop; a true multi-hop version repeats steps 1 and 2, accumulating context each time.)
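The "test each step individually" point is easiest to see when the steps are injectable. Below is a hypothetical plain-Python analogue of the module above (no DSPy involved) that runs a configurable number of hops, feeding earlier results into each new query. Deterministic stubs stand in for the LLM calls and the search backend, so the control flow can be unit-tested without any model. `ToyMultiHop` and the `stub_*` functions are illustrative names.

```python
from typing import Callable


class ToyMultiHop:
    """Hypothetical plain-Python analogue of a multi-hop module with injectable steps."""

    def __init__(
        self,
        generate_query: Callable[[str, list[str]], str],
        retrieve: Callable[[str], str],
        answer: Callable[[str, str], str],
        max_hops: int = 2,
    ):
        self.generate_query = generate_query
        self.retrieve = retrieve
        self.answer = answer
        self.max_hops = max_hops

    def forward(self, question: str) -> str:
        context: list[str] = []
        for _ in range(self.max_hops):
            # Each hop can see what earlier hops retrieved
            query = self.generate_query(question, context)
            context.append(self.retrieve(query))
        return self.answer("\n".join(context), question)


# Deterministic stubs make each step testable without calling an LLM
def stub_query(question: str, context: list[str]) -> str:
    return f"{question} (hop {len(context) + 1})"


def stub_retrieve(query: str) -> str:
    return f"result for {query}"


def stub_answer(context: str, question: str) -> str:
    return f"{question} -> {len(context.splitlines())} passages"


pipeline = ToyMultiHop(stub_query, stub_retrieve, stub_answer)
print(pipeline.forward("Who is the CEO?"))  # Who is the CEO? -> 2 passages
```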
Example 5: “Assertions” for Quality Control
Problem: You want the model to never return an answer longer than 50 words. Solution: Use dspy.Suggest or dspy.Assert to enforce constraints in code.
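```python
# Inside a Module's forward method:
# res = self.generate_answer(context=ctx, question=q)
# dspy.Assert(len(res.answer.split()) <= 50,
#             "Answer too long! Please summarize more concisely.")
```

Why this is preferred: When assertions are activated on the module (e.g., with program.activate_assertions()), a failed constraint makes DSPy backtrack and re-query the LLM with the feedback message added to the prompt. dspy.Suggest continues even if the constraint still fails after retrying, while dspy.Assert raises an error. Note that newer DSPy releases deprecate these assertions in favor of modules such as dspy.Refine and dspy.BestOfN.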
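The backtracking idea can be sketched without DSPy: call the generator, run a check that returns feedback text on failure, and retry with that feedback as an extra instruction. In the toy version below, `generate_with_feedback`, `max_words`, and `fake_generate` are hypothetical names, and the fake model simply complies once it receives feedback. Raising after the last retry mirrors hard-assertion behavior; a soft "suggest" variant would return the last output instead.

```python
from typing import Callable, Optional


def max_words(limit: int) -> Callable[[str], Optional[str]]:
    """Build a check that returns feedback text on failure, or None on success."""
    def check(text: str) -> Optional[str]:
        n = len(text.split())
        if n <= limit:
            return None
        return f"Answer has {n} words; summarize it in {limit} words or fewer."
    return check


def generate_with_feedback(
    generate: Callable[[Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_retries: int = 2,
) -> str:
    """Call generate(); on a failed check, retry with the feedback as an extra instruction."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        output = generate(feedback)
        feedback = check(output)
        if feedback is None:
            return output
    # Hard-constraint behavior: give up loudly once retries are exhausted
    raise ValueError(f"constraint still failing: {feedback}")


# Fake LLM: rambles on the first try, complies once it receives feedback
def fake_generate(feedback: Optional[str]) -> str:
    if feedback is None:
        return "word " * 60
    return "A concise answer."


print(generate_with_feedback(fake_generate, max_words(50)))  # A concise answer.
```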
Example 6: Compiling with a “Teleprompter” (BootstrapFewShot)
Problem: You have 10 examples of “Good” answers, and you want the model to follow that pattern. Solution: Use an optimizer to find the best way to include those examples in the prompt.
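```python
import dspy
from dspy.teleprompt import BootstrapFewShot

# 1. Define a simple metric (True/False or a 0-1 score)
def my_metric(example, pred, trace=None):
    return example.answer.lower() == pred.answer.lower()

# 2. Initialize the optimizer
optimizer = BootstrapFewShot(metric=my_metric, max_bootstrapped_demos=4)

# 3. 'Compile' the module into an optimized program
# train_data is a list of dspy.Example objects, for instance:
# train_data = [dspy.Example(question="...", answer="...").with_inputs("question")]
# compiled_bot = optimizer.compile(MultiHopSearch(), trainset=train_data)
```

Why this is preferred: It turns prompt engineering into a machine-learning-style optimization. BootstrapFewShot runs your program on the training examples, keeps the traces whose outputs pass your metric, and inserts them into the prompt as few-shot demonstrations; other optimizers, such as MIPROv2, also search over the instruction text.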
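The bootstrapping step itself is simple enough to sketch. The toy version below runs a program over the training set, keeps only the input/output pairs that pass the metric, and prepends them to a new question as few-shot demonstrations. It assumes a single predictor and a fake model that only knows some facts. `bootstrap_demos`, `render_few_shot`, and `fake_program` are hypothetical names, and the real BootstrapFewShot records full multi-step traces rather than one pair per example.

```python
from typing import Callable


def bootstrap_demos(
    program: Callable[[str, list[dict]], str],
    trainset: list[dict],
    metric: Callable[[dict, str], bool],
    max_demos: int = 4,
) -> list[dict]:
    """Run the program on training examples; keep passing input/output pairs as demos."""
    demos: list[dict] = []
    for ex in trainset:
        if len(demos) >= max_demos:
            break
        pred = program(ex["question"], demos)
        if metric(ex, pred):
            demos.append({"question": ex["question"], "answer": pred})
    return demos


def render_few_shot(demos: list[dict], question: str) -> str:
    """Prepend the bootstrapped demos to a new question."""
    blocks = [f"Question: {d['question']}\nAnswer: {d['answer']}" for d in demos]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Fake "LLM" that only knows some facts (it ignores the demos argument)
FACTS = {"2+2?": "4", "Capital of France?": "Paris"}


def fake_program(question: str, demos: list[dict]) -> str:
    return FACTS.get(question, "I don't know")


def exact_match(ex: dict, pred: str) -> bool:
    return ex["answer"].lower() == pred.lower()


trainset = [
    {"question": "2+2?", "answer": "4"},
    {"question": "Capital of Peru?", "answer": "Lima"},
    {"question": "Capital of France?", "answer": "Paris"},
]
demos = bootstrap_demos(fake_program, trainset, exact_match)
print(render_few_shot(demos, "Capital of Japan?"))
```

The Peru example fails the metric, so only the two correct runs become demonstrations.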
Example 7: Handling Structured Output with Descriptors
Problem: You need the answer in a specific structural format (e.g., a list of task objects). Solution: Use the desc parameter in OutputField to guide the compiler’s formatting logic.
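```python
import dspy

class TaskExtractor(dspy.Signature):
    """Extract tasks from a chat log."""

    chat_log = dspy.InputField()
    tasks = dspy.OutputField(
        desc="A JSON list of objects with 'owner' and 'action' keys"
    )
```

Why this is preferred: DSPy includes the description in the formatting instructions it generates, so you describe the structure once instead of hand-writing format rules for each model. For stricter structure, recent DSPy versions also accept type annotations on fields (for example, tasks: list[dict] = dspy.OutputField()), which the framework uses when formatting and parsing the output.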
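Even with a good description, it is prudent to validate the model's output on your side before passing it downstream. A minimal sketch using only the standard-library `json` module; `parse_tasks` is a hypothetical helper, not part of DSPy, and it raises `ValueError` (which `json.JSONDecodeError` subclasses) for any malformed result.

```python
import json


def parse_tasks(raw: str) -> list[dict[str, str]]:
    """Validate a 'tasks' output: a JSON list of objects with 'owner' and 'action' keys."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    tasks = []
    for i, item in enumerate(data):
        if not isinstance(item, dict) or not {"owner", "action"}.issubset(item):
            raise ValueError(f"item {i} needs 'owner' and 'action' keys")
        tasks.append({"owner": str(item["owner"]), "action": str(item["action"])})
    return tasks


raw = '[{"owner": "Ana", "action": "Send the report"}]'
print(parse_tasks(raw))  # [{'owner': 'Ana', 'action': 'Send the report'}]
```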
Example 8: Zero-Effort Model Migration
Problem: You want to switch from OpenAI to Llama 3 to save money. Solution: Just swap the global “Language Model” (LM) configuration in your Python script.
```python
# Switch to Llama 3 served locally by Ollama
# (older DSPy releases used dspy.OllamaLocal; current ones use dspy.LM)
# llama = dspy.LM("ollama_chat/llama3:8b", api_base="http://localhost:11434", api_key="")
# with dspy.context(lm=llama):
#     # The EXACT same program code now runs on Llama 3.
#     agent = MultiHopSearch()
#     result = agent(question="What is the capital of France?")
```

Why this is preferred: It provides strong future-proofing. Your business logic (the Signature and Module) is decoupled from the specific API or model version. DSPy re-formats the prompt for the new model, and re-running your optimizer against it lets DSPy re-tune the instructions and examples for that model.
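The migration works because the program code never names a model; it asks a shared setting for the current LM. The sketch below shows that pattern with Python's `contextvars`: a global default plus a scoped override that is undone on exit. `configure`, `use_lm`, `summarize`, and the fake backends are hypothetical names that are only loosely analogous to `dspy.configure` and `dspy.context`; DSPy's own settings mechanism may be implemented differently.

```python
import contextvars
from contextlib import contextmanager
from typing import Callable, Iterator

# Hypothetical "LM" type: a function from prompt text to completion text
LM = Callable[[str], str]

_current_lm = contextvars.ContextVar("current_lm")


def configure(lm: LM) -> None:
    """Set the default LM for the current context."""
    _current_lm.set(lm)


@contextmanager
def use_lm(lm: LM) -> Iterator[None]:
    """Temporarily override the LM, restoring the previous one on exit."""
    token = _current_lm.set(lm)
    try:
        yield
    finally:
        _current_lm.reset(token)


def summarize(text: str) -> str:
    """Program logic: never mentions a specific model."""
    return _current_lm.get()(f"Summarize: {text}")


# Fake backends standing in for a hosted model and a local one
def hosted_lm(prompt: str) -> str:
    return f"[hosted] {prompt}"


def local_lm(prompt: str) -> str:
    return f"[local] {prompt}"


configure(hosted_lm)
print(summarize("Q3 report"))      # [hosted] Summarize: Q3 report
with use_lm(local_lm):
    print(summarize("Q3 report"))  # [local] Summarize: Q3 report
print(summarize("Q3 report"))      # [hosted] Summarize: Q3 report
```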
Conclusion: The Death of the String
DSPy is the “End of Prompt Engineering” as we once knew it. By moving away from manual string manipulation and towards declarative, compiled programs, we gain consistency, portability, and the ability to scale our AI systems far beyond what a human “Prompt Whisperer” could ever manage.
In the next chapter, we will dive deeper into Why DSPy Matters for the enterprise and how it solves the “Brittleness” problem of manual prompts.
References & Further Reading
Khattab et al. (2023): DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines.
Stanford NLP: Official DSPy Documentation and Tutorials.
Medium (Balaji Rajan): DSPy: Programming, Not Prompting — Why .compile() Feels Like Home.
Plain English (2026): DSPy vs Prompt Engineering: A New Paradigm.

