We're seeing some seriously capable AI coding assistants out there, right? Tools that can chew through boilerplate, suggest fixes, and even draft entire websites. They're changing how we work, no doubt. But I got to thinking: how quickly could I, with an AI partner, bootstrap a custom agent, one tailored specifically for my CLI workflow, focusing on direct codebase interaction and local model friendliness? Could we go from zero to a functional, tool-using agent, like, really fast?
The answer? Less than two hours.
That's how long it took to get Terminaut up and running – a CLI-based coding agent, co-developed with another AI. This isn't about claiming to have invented the next big thing in AI agents. It's a story from the trenches about the sheer speed and leverage you get when you pair human experience with AI capabilities to build your own tools.
Terminaut is that experiment: a lean, mean, CLI-native agent designed to understand requests, interact with a codebase, and execute tasks. It's about embedding a reasoning, acting AI right into the terminal to help ship code faster. And it's a testament to how AI can help us build... well, more AI.
Beyond Chatbots: The Quest for True Agency (and a Meta Twist) #
Generic LLMs are great for a quick brainstorm or spitting out boilerplate, no argument there. But what about the real day-to-day developer grind? I'm not just talking about writing new features – that's often the fun part. I mean the stuff like upgrading Spring Boot across 34 microservices, refactoring legacy code to meet new standards, or meticulously updating dependencies and configurations. That's where we often need more than a suggestion box; we need a tireless partner to handle the grunt work. The existing landscape of AI coding assistants is already tackling parts of this, and they were definite inspirations.
My goals for this experiment, though, were a bit more specific, almost a meta-challenge:
- Keep It Simple & Transparent: Could I build an effective agent where the core logic wasn't buried under layers of complex frameworks? Something you could actually understand by reading the code.
- Local First, FTW: Prioritize compatibility with local models (via Ollama, etc.). Powerful agency shouldn't require a constant cloud tether and a credit card.
- The Meta-Dream: Could an LLM be instrumental in creating an agent that might, one day, work on its own codebase? Talk about eating your own dog food!
So, while nodding to the giants, Terminaut's mission was to prove a potent, locally-runnable coding agent could be rapidly co-created with an LLM. Because an agent, in my book, still needs to:
- Understand Complex Intent: Go beyond simple prompts to grasp multi-faceted coding goals.
- Comprehend Code Context: "Read" and reason about the existing codebase it's working on.
- Formulate and Execute Plans: Break down tasks and use available tools to achieve them.
- Modify Code Intelligently: Not just suggest changes, but apply them (safely!).
Terminaut aimed to embody these principles, serving as a lean proof-of-concept for this vision of a more integrated, active, and ultimately self-aware (in a code sense!) AI coding partner. This isn't just wishful thinking. Philip Zeyliger's killer post, "The Unreasonable Effectiveness of an LLM Agent Loop with Tool Use," absolutely nails why this iterative, tool-using loop—the very heart of Terminaut—is so damn potent.
Crafting Terminaut: Engineering an AI Agent for Your Terminal #
Building an agent like Terminaut isn't just about wiring an LLM to a CLI. That's the easy part. Real agency comes from thoughtful design. Here are the key ingredients that went into Terminaut:
1. Defining the Agent's "Mind" - The System Prompt: This is ground zero, the constitution of your agent. It's not a fluffy welcome message; it's a detailed spec sheet. It meticulously defines:
- Its purpose.
- Core capabilities (what it can do).
- Operational mandates (the hard rules).
- Interaction style.
- Crucially, how it must use its tools and adhere to specific coding guidelines.
For Terminaut, this meant a comprehensive guide covering things like:
- Core Capabilities: What can the agent do? (e.g., process inputs, manage code, use a sandboxed environment).
- Agent Behavior: How should it behave? (e.g., persist until task completion, gather information before acting, resolve tasks by editing and testing).
- Operational Mandates: What are the hard rules? (e.g., repository access permissions, security analysis allowances, transparency).
- Tool Usage: Precise instructions for each tool. For instance, how to format a `bash` command request (strictly JSON) or how to use `apply_patch` with a specific diff format. This includes preferred tools for common tasks, like `ripgrep` for searching or `sed` for reading specific line ranges.
- Coding Guidelines: If the agent writes or modifies code, what standards must it follow? (e.g., fix root causes, maintain simplicity, update documentation, adhere to existing codebase style, and a crucial "NEVER add copyright or license headers unless explicitly requested").
- Communication Style: How should it interact with the user? (e.g., knowledgeable, friendly, substance over flattery, critical engagement).
This level of detail is paramount. It's what separates a generic chatbot from a specialized agent that can reliably get shit done. For Terminaut, the system prompt is its DNA, instructing it to be helpful, think step-by-step, use tools effectively, and always prioritize safety and user approval. (Factor 2 from my 12-Factor Agents post: Own Your Prompts! And make 'em thorough.)
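To make this concrete, here's a minimal sketch of what a prompt in this spirit could contain. The wording below is my illustration, not Terminaut's actual system prompt:

```python
# Illustrative system prompt in the Terminaut style. The exact text is
# a sketch of the categories described above, not the real prompt.
SYSTEM_PROMPT = """\
You are Terminaut, a CLI coding agent.

Core capabilities:
- Run shell commands via the `bash` tool (every command requires user approval).
- Modify files by emitting unified diffs for the `apply_patch` tool.

Operational mandates:
- Persist until the task is complete; gather context before acting.
- NEVER add copyright or license headers unless explicitly requested.

Tool usage:
- Request `bash` commands as strict JSON tool calls.
- Prefer `ripgrep` for searching and `sed` for reading specific line ranges.

Communication style:
- Knowledgeable and friendly; substance over flattery.
"""
```

Every one of those bullets maps to a behavior you can actually observe (or enforce) in the agent loop, which is what makes the prompt a spec sheet rather than decoration.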
2. Giving the Agent "Hands" - Tool Integration:
An agent needs to act. Terminaut is equipped with tools:
- Bash Command Execution: The ability to run shell commands is crucial for interacting with a developer's environment (compiling, running scripts, git operations, etc.). Critically, every command requires explicit user approval. Safety first.
- Patch Application: To "write" or "modify" code, Terminaut has the LLM generate a standard `diff -u` style patch. This patch is then applied using a simple `apply_patch` bash script I wrote (it's just a wrapper around `patch --batch --forward`—no magic). Again, user confirmation is mandatory. Why reinvent the wheel when robust, existing tools can do the heavy lifting for a core agentic function? This diff-and-patch approach, by the way, was heavily inspired by early OpenAI Codex work; credit where it's due.
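For illustration, here's roughly how that patch step could look if you did it from Python instead of bash. The real `apply_patch` is a shell script; this sketch just shows the same `patch --batch --forward` idea, and the function name and `-p0` strip level are my assumptions:

```python
import subprocess
import tempfile

def apply_patch(diff_text: str, repo_dir: str = ".") -> bool:
    """Apply an LLM-generated unified diff using the system `patch` tool.

    A sketch of the wrap-existing-tools approach described above; `-p0`
    assumes the diff uses paths relative to repo_dir with no prefix.
    """
    # Write the diff to a temp file so `patch` can read it.
    with tempfile.NamedTemporaryFile("w", suffix=".diff", delete=False) as f:
        f.write(diff_text)
        diff_path = f.name
    # --batch: never prompt; --forward: skip already-applied hunks.
    result = subprocess.run(
        ["patch", "--batch", "--forward", "-p0", "--input", diff_path],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

The point of delegating to `patch` is exactly the one made above: hunk matching, fuzz, and rejects are solved problems, so the agent only has to produce a well-formed diff.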
3. Enabling Thought - The Agentic Loop: This is where the magic happens. When Terminaut uses a tool, the output of that tool isn't just displayed; it's fed back into the LLM as new context. This allows the agent to:
- See the results of its actions.
- Reason about the next steps.
- Chain multiple tool uses together to accomplish more complex tasks.
- Correct course if a previous action didn't yield the expected outcome.
This feedback loop is fundamental to agentic behavior.
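Concretely, in the OpenAI Chat Completions format Terminaut targets, feeding a tool result back is just appending a `tool`-role message to the conversation. The helper below is a sketch: the `role` and `tool_call_id` fields match that API, but the payload layout is my illustration, not Terminaut's exact code:

```python
def tool_result_message(tool_call_id: str, stdout: str,
                        stderr: str, exit_code: int) -> dict:
    """Package a bash tool's output as a `tool`-role message for the next LLM turn."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,  # ties the result to the model's tool call
        "content": f"STDOUT:\n{stdout}\nSTDERR:\n{stderr}\nEXIT CODE: {exit_code}",
    }
```

Because the exit code and stderr ride along with stdout, the model can notice a failed command and try a different approach on its next turn, which is the course-correction behavior listed above.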
4. Responsible Agency - User Oversight: While we want an autonomous agent, we also want a safe one. User approval for any action that modifies the system or codebase (like running bash commands or applying patches) is a non-negotiable safety net. The agent proposes, the human disposes.
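A gate like that can be tiny. Here's a sketch (illustrative names, not Terminaut's actual implementation; the `ask` parameter is injectable so the gate is testable):

```python
def approve(command: str, ask=input) -> bool:
    """Ask the user before running a shell command; anything but 'y' is a no.

    Defaulting to rejection means a stray Enter keypress never executes
    anything, which is the safe direction for the default to point.
    """
    answer = ask(f"Approve execution of: {command}? [y/N]: ")
    return answer.strip().lower() == "y"
```

The agent only ever proposes a command string; nothing reaches the shell until this returns True.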
5. Building for Evolution - Modularity: Terminaut was refactored from a simple script into a modular Python package. This makes it easier for humans to maintain and extend. But there's a meta-level benefit: a well-structured agent is also easier for other AIs (or future, more advanced versions of Terminaut) to understand, analyze, and even help improve.
My AI coding partner was invaluable here, rapidly scaffolding modules, implementing tool interfaces, and ensuring the core agentic loop was sound. The development process itself was a demonstration of human-AI teaming.
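For reference, the module split is visible in the demo session further down; reconstructed from the imports in `cli.py`, the package looks roughly like this (comments are my inference):

```
terminaut/
├── cli.py     # entry point and the agent loop
├── llm.py     # LLM wrapper (OpenAI Chat Completions)
├── tools.py   # tool dispatch: bash execution, approvals
├── input.py   # user input handling
└── output.py  # tagged console output
```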
How an AI Co-Developer Helped Me Build This Beast in Under Two Hours #
That "less than two hours" claim from idea to working, documented tool? It wasn't magic; it was methodical human-AI collaboration. The process looked something like this:
- Human (Me): Sets a high-level goal. "Okay, LLM, let's port the initial script from Anthropic's API to OpenAI's. We need a configurable base URL and the bash tool."
- AI Sidekick: Drafts the initial Python code, suggests the necessary library changes, and implements the `LLM` class structure for OpenAI.
- Human: Reviews, tests. "Good start. Now, let's abstract all `print` statements into a central `output()` function with contextual tags."
- AI Sidekick: Implements the `output()` function and refactors all existing calls.
- Human: "Next, user approval for bash commands. The agent should prompt, wait for 'y/N', and log the decision."
- AI Sidekick: Adds the approval loop within `handle_tool_call`.
That's how a feature-rich agent, an `apply_patch` script, and basic docs came together in one coffee-fueled session. For rapid prototyping and just getting things done, this human-AI teaming is a straight-up game-changer.
Terminaut in Action: An Agent That Understands and Codes #
Even in its current form, Terminaut demonstrates key agentic capabilities:
- Instruction Following: It can take a complex request, like "Refactor this function to use a try-except block and log errors, then show me the diff," and formulate a plan.
- Code Comprehension (via Context): By including relevant code snippets in the prompt (either manually or through future context-aware mechanisms), Terminaut can "understand" the code it needs to work with.
- Code Generation & Modification: It doesn't just output code blocks; it generates patches and applies them, directly impacting the codebase.
It's a tangible step towards an AI that doesn't just talk about code, but actively participates in its creation and maintenance.
Terminaut in Action: Reading and Understanding Code #
Alright, enough talk. Let's see this thing in action. Here's a quick example of Terminaut (running a local `qwen2:14b-instruct-q8_0` model via Ollama, in this case) tackling a common dev task: figuring out what a piece of code does.

Scenario: The user wants Terminaut to read and explain the `terminaut/cli.py` file.
````text
➜ ./terminaut.py   # Or however you launch your agent
ℹ️ [info] === Terminaut: LLM Agent Loop with OpenAI Chat Completions API and Bash Tool ===
ℹ️ [info] Type '/exit' to end the conversation.
👤 [user]: Read and understand the file terminaut/cli.py and explain it to me in simple terms

🤖 [agent] <think>
  Okay, the user wants me to read and explain the terminaut/cli.py file in simple terms.
  I should first locate the file, then read its contents using available tools.
  I'll use `rg --files | rg cli.py` to find it, then `cat terminaut/cli.py` to read it.
  Once I have the content, I'll break down its structure (imports, functions, main loop)
  and explain its purpose in simple, non-jargon terms.
  </think>
  I'll read the file and explain it in simple terms. Let me first check the file's contents...

  ```json
  {
    "id": "file_read_1",
    "type": "function",
    "function": {
      "name": "bash",
      "arguments": "{\"command\": \"rg --files | rg cli.py\"}"
    }
  }
  ```
💻 [bash_command] Approval required for bash command: rg --files | rg cli.py
❓ [approval_prompt] Approve execution of: rg --files | rg cli.py? [y/N]: y
➡️ [approval_response] y
💻 [bash_command] User approved execution of: rg --files | rg cli.py
📤 [bash_output] STDOUT:
  terminaut/cli.py

  STDERR:

  EXIT CODE: 0

🤖 [agent] <think>
  Great, the file `terminaut/cli.py` exists. Now I need to read its content.
  I'll use the `cat` command for that.
  </think>
  ```json
  {
    "id": "read_cli_file",
    "type": "function",
    "function": {
      "name": "bash",
      "arguments": "{\"command\": \"cat terminaut/cli.py\"}"
    }
  }
  ```
💻 [bash_command] Approval required for bash command: cat terminaut/cli.py
❓ [approval_prompt] Approve execution of: cat terminaut/cli.py? [y/N]: y
➡️ [approval_response] y
💻 [bash_command] User approved execution of: cat terminaut/cli.py
📤 [bash_output] STDOUT:
  import os
  from .llm import LLM
  from .tools import handle_tool_call
  from .input import user_input
  from .output import output

  def loop(llm):
      msg = user_input()
      while True:
          output_text, tool_calls = llm(msg)
          if output_text.strip():
              output("agent", output_text)
          # If there are tool calls, handle them and immediately call LLM again with the tool result(s)
          while tool_calls:
              msg = [handle_tool_call(tc) for tc in tool_calls]
              output_text, tool_calls = llm(msg)
              if output_text.strip():
                  output("agent", output_text)
          # Only prompt user when there are no tool calls left
          msg = user_input()

  def main():
      try:
          output("info", "=== Terminaut: LLM Agent Loop with OpenAI Chat Completions API and Bash Tool ===")
          output("info", "Type '/exit' to end the conversation.")
          loop(LLM(
              model=os.environ.get("OPENAI_MODEL", "gpt-4o"),
              base_url=os.environ.get("OPENAI_BASE_URL")
          ))
      except KeyboardInterrupt:
          output("info", "Exiting. Goodbye!")
      except Exception as e:
          output("error", f"An error occurred: {str(e)}")

  STDERR:

  EXIT CODE: 0
````
What just happened?
- User Request: The user asked Terminaut to read and explain `terminaut/cli.py`.
- Agent Planning (Internal `<think>` block): Terminaut's LLM brain decided on a two-step plan: first, verify the file exists using `rg` (ripgrep), then read its content using `cat`. This internal monologue is crucial for understanding the agent's reasoning.
- Tool Call 1 (Find File): Terminaut requested to run `rg --files | rg cli.py` by outputting a structured JSON call.
- User Approval: The system (Terminaut's harness) intercepted this, recognized it as a `bash` command, and prompted the user for explicit approval. Safety first! The user typed 'y'.
- Tool Execution & Result: The command ran, confirming `terminaut/cli.py` exists. The STDOUT, STDERR, and exit code were captured and fed back to Terminaut as context for its next reasoning step.
- Agent Reasoning (Internal `<think>` block): Seeing the successful file location, Terminaut decided its next step was to read the file.
- Tool Call 2 (Read File): Terminaut requested to run `cat terminaut/cli.py` via another JSON tool call.
- User Approval: Again, the user approved the `cat` command.
- Tool Execution & Result: The `cat` command ran, and its STDOUT (the full content of `cli.py`) was captured and sent back to the LLM.
So now the entire content of `cli.py` is in the LLM's context. In its next turn (not shown here, but trust me), Terminaut would digest this code and spit out the simple explanation the user wanted. This little dance—understanding intent, planning, using tools (with my explicit OK), and feeding results back in—is the agentic loop in action. It's a world away from your average code-completion gimmick.
The Future is Agentic: What's Next for Coding Assistants? #
Terminaut is just scratching the surface, a proof of concept. The real fun begins when we think about where this tech is heading. What's next for AI coding agents? I've got a few ideas:
- Deep Codebase Understanding: Imagine an agent that has ingested and indexed your entire repository, understanding dependencies, architectural patterns, and historical context. "Terminaut, where else is this deprecated function used, and what's the impact of removing it?"
- Leveraging Developer Tools & Standards: Integration with Language Server Protocols (LSPs) for deeper code intelligence, using linters, formatters, and even build systems as part of their toolkit.
- Access to Advanced Research Capabilities: Future agents might utilize tools that are themselves deep thinking models, capable of performing research, analyzing documentation, or understanding complex algorithms to inform their coding tasks.
- Sophisticated Planning & Reasoning: Agents that can break down truly large-scale tasks (e.g., "Migrate this entire module from framework X to framework Y") into manageable sub-goals, executing them step-by-step.
- Autonomous Learning & Adaptation: An agent that learns from your feedback, your coding style, and the outcomes of its actions to improve its performance over time.
- Advanced Toolsets as Agent Capabilities:
- Static Analysis & Linters: "Terminaut, run the project linter, identify all style violations in my recent commits, and propose fixes."
- Refactoring Engines: "Apply the 'extract method' refactoring to this complex function, ensuring all tests still pass."
- Documentation Generation: "Generate comprehensive documentation for this new class, including examples based on its usage in the test suite."
- Multi-Agent Collaboration: Terminaut could be one of a team of specialized AI agents (a testing agent, a deployment agent, a security agent) coordinating via protocols like A2A.
This isn't sci-fi; it's the trajectory. I'm betting heavily on a future where AI agents are integral, proactive members of our dev teams. The productivity gains are too massive to ignore.
Your Turn: Embrace Your Agentic Future #
Building Terminaut hammered it home for me: AI agents that genuinely help with complex coding aren't some far-off fantasy. They're here, now. The tools are ready, the LLMs are surprisingly capable, and the main bottleneck is our own imagination in designing how these digital partners should operate.
Consider Terminaut an open invitation to dive into this space. What would your ideal AI coding partner look like? How would it plug into your daily grind? The CLI is just one playground; these principles of agency are far broader.
The shift from passive AI tools to proactive, agentic assistants is happening fast. It's time to stop waiting and start building. Or at least, start thinking about how you'll leverage them.
So, what are your takes on agentic coding? Got any war stories, grand visions, or healthy skepticism? Drop 'em in the comments below. Let's figure this out together.