Simple Control Flow for Automatically Steering Agents

You can use validation functions to steer coding agents and data-processing agents automatically, instead of manually copy-pasting outputs or re-prompting. This lets you parallelize common tasks in agentic programming and eliminate bottlenecks in your agentic coding workflows.

The Standard Agent Loop

The industry consensus on an agentic loop looks something like this:

msg = []
while True:
    msg.append(input())  # wait for the next user message
    while True:
        output, tool_calls = prompt_llm(msg)
        msg.append(output)
        print("Agent: ", output)
        if tool_calls:
            msg.extend([
                handle_tool_call(tc)
                for tc in tool_calls
            ])
        else:
            break  # no tool calls: hand control back to the user

The Problem: Manual Verification

Agents operating on this premise trap you in a tedious cycle:

  1. The agent makes a change
  2. You manually run the tests
  3. You paste the output back into the agent
  4. Repeat

While this workflow might feel productive, you will start to feel bottlenecked by your ability to copy-paste test output from terminal to agent, time and time again. To automate this, have the agent run the tests for you and feed the results back into its context window. With that in place, you can free up your attention by running the agent in the background, confident that the code will reach a stable state.

Naive Solution: Bro, Just Prompt Harder

fix my failing test and verify your fix by running pytest

Steering the agent via a prompt is an okay way to implement this control flow. But you are handing control flow off to token generation, which makes it probabilistic in nature and subject to all the fun consequences of that property. Usually that's fine, especially for cheap one-off tasks.

Control for Hallucination with Deterministic Validators

Introducing validation directly into the agent loop feeds validation context back to the agent, so it does not exit its loop until the goals are met.

 msg = []
 while True:
     msg.append(input())  # wait for the next user message
     while True:
         output, tool_calls = prompt_llm(msg)
         msg.append(output)
         print("Agent: ", output)
         if tool_calls:
             msg.extend([
                 handle_tool_call(tc)
                 for tc in tool_calls
             ])
         else:
-            break  # no tool calls: hand control back to the user
+            # the agent thinks it is done: validate before exiting
+            complete, validation = check_completed()
+            msg.append(validation)
+            if complete:
+                break

When the agent is ready to submit a completed task, the check_completed() function runs a verification command (like npm run test) or any other validator that determines whether the task is complete. If the task remains incomplete, the validation output enters the message history, creating an automatic feedback loop.
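As a concrete example, here is a minimal sketch of such a validator, assuming completion is signalled by the exit code of a shell command such as npm run test:

import subprocess

def check_completed(cmd="npm run test"):
    # Run the validation command; an exit code of 0 means the task is done.
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    output = result.stdout + result.stderr
    return result.returncode == 0, output

Any deterministic check works here: a test suite, a linter, or a custom script.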

flowchart LR
    Agent -->|mutation with tools| Environment
    Environment -->|observation with validators| Agent

Why This Works

This approach gives the agent grounded, controlled access to environment state. Agents already have tools to inspect the environment, but prompt-based steering is unreliable: agents get lazy, get confused, or prematurely declare success.

Embedding environment-state validation directly into the control flow ensures the agent continues until either:

  • The task is genuinely complete, or
  • The token budget is exhausted

Critically, check_completed() validates the actual state resulting from the agent’s actions on the environment, not merely what the agent thinks it accomplished.

Implementation: Mini Agent Action

Mini Agent Action adapts Mini SWE Agent to implement this control flow. Install it with pip and run an agent with a bash tool that loops until task completion:

pip install mini-agent-action
mini-agent-action --exec "python test.py" \
                  --task "fix this test, you can see it fail with python test.py. you will be validated against that cmd" \
                  --debug

This sort of agent is very useful to have listening on your PR branches. When you push a commit with broken tests, instead of intervening you can task-switch and wait for the agent to come up with a solution to the test failure. Mini SWE Agent claims a 70% score on SWE-bench without this level of control over its outputs, and even when the agent doesn't complete the task correctly, it is often close enough that I can use part of its output in the final implementation. The agent creates a pull request on the feature branch, so I can choose to merge it, or close the pull request if I'm not happy with the results.

A system like this has agents proactively fixing your code, rather than a programmer reactively prompting agents to intervene, while maintaining human ownership over the results.

Implementation: Data Transformation

Consider this common scenario: you receive JSON data that must conform to a specific schema. You could manually write transformation logic, but this approach fails at scale. Each new JSON structure requires custom transformation code—an unsustainable workload when handling diverse inputs.

Agents solve this by generating valid transformation code at runtime. Add a validation function to your agent loop that checks whether the transformed output matches your target schema. When validation fails, the agent reruns with debug information until it produces valid output.

flowchart TD
    A[Input JSON] --> B[Generate Python code<br/>to transform data]
    B --> C{Output matches<br/>target schema?}
    C -->|No| D[Rerun with<br/>debug information]
    D --> B
    C -->|Yes| E[Return valid JSON]
    
    classDef default fill:#fff,stroke:#333,stroke-width:1px,color:#000
    classDef decision fill:#f5f5f5,stroke:#333,stroke-width:1px,color:#000
    class C decision
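Below is a minimal sketch of this loop, assuming a hypothetical prompt_llm helper that returns Python source as a string and using the jsonschema package as the deterministic validator:

import json
import jsonschema  # third-party validator: pip install jsonschema

def transform_until_valid(input_json, target_schema, max_attempts=5):
    # Ask the model for transformation code, run it, validate the result,
    # and feed any failure back as debug information.
    msg = [
        "Write a Python function transform(data) that converts this JSON "
        f"to match the target schema.\nInput: {json.dumps(input_json)}\n"
        f"Schema: {json.dumps(target_schema)}"
    ]
    for _ in range(max_attempts):
        code = prompt_llm(msg)  # placeholder for your model call
        namespace = {}
        try:
            exec(code, namespace)                        # define transform()
            result = namespace["transform"](input_json)  # run generated code
            jsonschema.validate(result, target_schema)   # deterministic check
            return result
        except Exception as err:
            msg.append(code)
            msg.append(f"That attempt failed: {err}. Fix the code and try again.")
    raise RuntimeError("No valid transformation produced within max_attempts")

The function names, prompt wording, and retry budget here are illustrative; the point is that a deterministic schema check, not the model, decides when the loop ends.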

This approach has limitations. Agents sometimes satisfy validation by emitting minimal or near-empty JSON documents, and you may need multiple runs to get acceptable results. Still, this requires far less effort than writing custom transformations for each incoming schema.

Conclusion

Integrating automated validation into your agent’s control flow eliminates manual intervention and creates an autonomous system.