Anatomy of an AI agent

The concept

Calmoth agent in action

The concept of AI agents has fascinated me ever since I first interacted with coding agents, and I’ve been thinking about how I could deploy them to be useful in my daily life.

As a multitasker, I switch context a lot across different domains: from things I need to do at work, to the shopping list I need to remember for tomorrow’s breakfast, to planning bill payments for my family’s summer trip, to the medication I need to buy for my father.

To keep track of everything, I use Google Keep. Each morning, I run through the day in my head and ask myself: "What do I need to work on? What do I need to do?"

I write it all down in a note. During the day, if more thoughts come up, I add them to the list, reorder things, or deprioritize.

The general idea sounds like this: I check that list to bring myself up to date on what I haven’t done yet, and to make sure I’m not falling behind on more important tasks.

This is what made it feel like a perfect use case for a personal AI agent. An agent that could send me daily briefs and do quick check-ins throughout the day about my tasks.

The caveats

I’m still a bit too paranoid/cautious to delegate truly autonomous work to an agent without a human in the loop. I’ve seen this with coding agents. When there’s no one verifying what’s being done, guiding it, and shaping how it’s used, letting an agent run free feels like a recipe for disaster. I’d be a lot more comfortable if it had clear constraints and boundaries it could operate within.

The idea of letting an agent buy medication on my behalf, book flights, or send emails for me also raises the risk of costly mistakes. Booking a flight with extra luggage space when it isn’t needed, buying more medication than necessary, or sending the right person the right message in the wrong tone. I don’t have the luxury of tolerating those kinds of errors.

With code, these mistakes are easier to catch and manage. You can write automated specs to cover test cases, or even use another agent for reviews. But how do you run test cases for real-world actions? Things that don’t live inside your system like a program does?

That’s probably also one of the reasons I opted against running/installing OpenClaw, beyond the potential security liability the tool itself might introduce.

So I figured this might be a good opportunity to build an AI agent myself, just to understand the nitty gritty of how it all comes together under the hood.

The definitions

AI agents are jagged intelligence (borrowing from Karpathy) packaged into programs that can self-execute on your behalf, powered, of course, by LLMs.

Before agents became a thing, you still had programs (software or hardware), but humans were the ones operating them.

Take a CRM, for example. If you wanted to manage a customer pipeline:

Before AI agents: you’d call the customer yourself, categorize them, place them into buckets, and manually save that data in your CRM.

With an AI agent: the agent can do that for you. It can speak with the customer and operate the CRM on your behalf.

The hows

So, how do we actually build an AI agent? The good news, at least from where I stand, is you don’t need to be a machine learning expert to build one. Like any other software program, an AI agent is still just software. If you’re comfortable working with APIs, you already have enough to get started.

In this case, we’d be building an AI task assistant. Its main job would be to brief me daily on my tasks and todos, and check in throughout the day when necessary. To make that happen, we need a few things:

  • The medium: LLMs are mostly non-deterministic systems, which is why rigid, structured input forms don’t work that well. They limit what the system can interpret and do. That’s why chat has become the de facto interface for interacting with these models and collecting input. We could build our own chat UI as a web app or native app, but to keep things simple, we’ll use Telegram. Other channels work too (WhatsApp, Slack, Discord, even email), as long as they support free-form, multimedia-friendly interaction and let us programmatically send and receive messages.
  • The model: Of course, there’s no agent without a model. There are plenty of foundational models we could use, including self-hosted open-source options. But for this, we’ll use OpenAI’s GPT-5 Nano, mainly because I’m already a ChatGPT Plus subscriber.
  • The program: Finally, the agent needs a program it can actually operate on, something that exposes tools the model can call to take (in)appropriate actions. That program can be an existing tool or something you build from scratch. For this scenario, we’ll build a simple todo list app with reminder scheduling. I originally considered Google Calendar, but I realized not all my tasks have a clear due date. Sometimes I just want something on a list, and I want occasional check-ins about it.

The Medium

I'm going to skip the process of setting up a Telegram bot. It's pretty easy, and Telegram has a good guide on how to set one up. To summarize the steps:

  • We need a way to receive messages from our Telegram bot and a way to respond to those messages. We can do this with webhooks. A typical webhook handler would look like this:
class TelegramWebhookController
{
    public function __construct(
        protected TelegramAdapter $adapter,
    ) {}

    public function __invoke(Request $request, string $secret): Response
    {
        $expectedSecret = (string) config('services.telegram.webhook_secret');

        if ($secret !== $expectedSecret) {
            return response('Forbidden', 403);
        }

        $message = $this->adapter->toNormalizedMessage($request->all());

        if ($message->externalUserId === '' || $message->text === '') {
            return response('OK', 200);
        }

        ProcessInboundMessage::dispatch($message->toArray());

        return response('OK', 200);
    }
}
Telegram Webhook Handler
  • When we receive a request from our Telegram bot, we quickly respond back with a 200, and then process the message in the background as a queued job.
  • After our job processes the message and we are ready to reply to the user, we send back our message by hitting Telegram’s API:
class TelegramMessenger implements ChannelMessengerInterface
{
    /**
     * @param  array<string, mixed>  $options
     */
    public function sendMessage(User $user, string $text, array $options = []): void
    {
        $token = config('services.telegram.bot_token');

        if (! $token) {
            Log::warning('Telegram bot token missing.');

            return;
        }

        $identity = $user->identities()
            ->where('provider', 'telegram')
            ->latest('id')
            ->first();

        if (! $identity?->chat_id) {
            Log::warning('Telegram chat id missing for user.', ['user_id' => $user->id]);

            return;
        }

        $payload = array_merge([
            'chat_id' => $identity->chat_id,
            'text' => $text,
        ], $options);

        Http::post("https://api.telegram.org/bot{$token}/sendMessage", $payload);
    }
}
Responding to a message
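The normalization step from the webhook handler (`toNormalizedMessage`) can be sketched framework-free like this. The `NormalizedMessage` class here is a simplified stand-in for the one referenced above, and the array shape follows Telegram's Update object (`message.chat.id`, `message.text`):

```php
<?php

// Simplified stand-in for the NormalizedMessage used by the webhook handler.
final class NormalizedMessage
{
    public function __construct(
        public readonly string $channel,
        public readonly string $externalUserId,
        public readonly string $text,
    ) {}
}

// Pull the chat id and text out of a raw Telegram Update payload.
function toNormalizedMessage(array $update): NormalizedMessage
{
    $message = $update['message'] ?? [];

    return new NormalizedMessage(
        channel: 'telegram',
        externalUserId: (string) ($message['chat']['id'] ?? ''),
        text: trim((string) ($message['text'] ?? '')),
    );
}

$normalized = toNormalizedMessage([
    'update_id' => 1,
    'message' => ['chat' => ['id' => 42], 'text' => '  remind me to pay bills  '],
]);
// $normalized->externalUserId is '42'; $normalized->text is 'remind me to pay bills'
```

The handler's empty-string checks then act as the guard: an update without a chat id or text short-circuits with a 200 and never reaches the queue.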

As we can see, we obviously need a datastore for users interacting with our agent. We can store this anywhere: Postgres, MongoDB, MySQL, even Redis. We just need to be able to reference user data somehow. In my case, I used a MySQL datastore (yes, I know. I didn't use Postgres, Supabase, Convex, PlanetScale. I'm an old-school guy, running self-hosted MySQL. Ok I'm kidding. I already pay for managed MySQL for my blog, so I used it here).

I have a User model that stores basic user info like name and timezone. I also have a UserIdentities model that stores metadata about the channel in use (Telegram in this case, but it can also be used for WhatsApp or other channels).

Awesome, we now have a functioning medium (Telegram). We can receive messages and respond to messages. Let's move on to the program.

The program

AI agents need a program to run against. Something they can actually execute on. In our case, that program is a todo list system.

At the task level, I need to be able to create, list, update, and delete tasks. Tasks should support optional due dates, and I should be able to assign priorities when needed.

On top of that, I need reminders. I should be able to set, pause, snooze, delete, or dismiss a reminder for a task. Reminders can be one-off or recurring (up to a defined period). And for check-ins, I can treat those as recurring reminders, but with random intervals.

Also, the program doesn’t have to live inside the agent. It can be external, or even multiple programs, like an agent that operates your WhatsApp or email on your behalf.

For our todo list program, to keep it simple, we’ll stick to two tables/models: tasks and reminders, plus a service class that performs the actions.

<?php

class TaskService
{
    public function create(User $user, string $title, ?string $notes = null, ?CarbonInterface $dueAt = null, ?string $createdVia = null): Task
    {
        return $user->tasks()->create([
            'title' => $title,
            'notes' => $notes,
            'due_at' => $dueAt,
            'status' => 'open',
            'created_via' => $createdVia,
        ]);
    }

    /**
     * @param  array<string, mixed>  $attributes
     */
    public function update(User $user, int $taskId, array $attributes): ?Task
    {
        $task = $user->tasks()->whereKey($taskId)->first();

        if (! $task) {
            return null;
        }

        $task->fill($attributes);
        $task->save();

        return $task;
    }

    /**
     * @return Collection<int, Task>
     */
    public function listOpen(User $user, int $limit = 5): Collection
    {
        return $user->tasks()
            ->where('status', 'open')
            ->latest('id')
            ->limit($limit)
            ->get();
    }

    public function complete(User $user, int $taskId): ?Task
    {
        $task = $user->tasks()->whereKey($taskId)->first();

        if (! $task) {
            return null;
        }

        $task->update([
            'status' => 'done',
            'completed_at' => now(),
        ]);

        return $task;
    }
}
Our Todo list Program for Tasks

Now we finally have our todo list program. Let's move on to the hard part.

The Model

The model is what actually brings life to the whole agentic journey. Think of it like a conductor sitting between two moving parts (the medium and the program) and orchestrating how they work together. The model lives in the middle, working “effortlessly” (hard?) to carry out its goal. Its life’s work.

A simple agentic LLM integration usually looks something like this:

Simple LLM Integration journey

But there are three things we need to think through:

  1. How do we get the LLM to reliably interact with our program?
  2. How do we manage context, memory, and tokens, especially given model constraints and the degradation that happens as you get close to token limits?
  3. And how do we make it actually feel like an agent, one that follows up and breaks out of the boring request/response loop?

Tool Calling

With tool calling, we give the LLM a direct way to interact with our program. In simple terms, we’re telling the model: these are the tools my program exposes, this is what each one does, and this is how you call them.

<?php


class ListTasksHandler implements ToolHandlerInterface
{
    public function __construct(protected TaskService $taskService) {}

    public function name(): string
    {
        return 'list_tasks';
    }

    public function schema(): array
    {
        return [
            'type' => 'function',
            'function' => [
                'name' => $this->name(),
                'description' => 'List open tasks for the user.',
                'parameters' => [
                    'type' => 'object',
                    'properties' => [
                        'limit' => ['type' => 'integer'],
                    ],
                ],
            ],
        ];
    }

    public function handle(User $user, NormalizedMessage $message, array $args, string $timezone): ToolHandlerResult
    {
        $limit = (int) ($args['limit'] ?? 10);
        $tasks = $this->taskService->listOpen($user, max(1, min($limit, 25)));

        $items = $tasks->map(function ($task) use ($timezone) {
            return [
                'id' => $task->id,
                'title' => $task->title,
                'due_at' => $task->due_at?->timezone($timezone)->toIso8601String(),
            ];
        })->values()->all();

        return new ToolHandlerResult(true, [
            'tasks' => $items,
        ], $items === [] ? 'You have no open tasks.' : 'Here are your open tasks.');
    }
}
List Tasks Tool

The anatomy of a tool:

Every tool needs three things:

  1. A name: so the model has something to reference
  2. A definition/schema: so the model understands what the tool does and what inputs it expects
  3. A handler: the actual code that runs when the model calls the tool, passing the arguments into our program

For example, our “list tasks” tool (seen above) has its own schema that explains exactly how the model can invoke it. You can think of it like an OpenAPI spec, but for models.

class OpenAiClient implements ModelClientInterface
{
    public function generateToolCalls(array $messages, array $tools, array $settings = []): array
    {
        $apiKey = config('services.openai.api_key');
        $model = $settings['model'] ?? config('services.openai.model', 'gpt-4o-mini');
        $baseUri = rtrim((string) config('services.openai.base_uri', 'https://api.openai.com/v1'), '/');

        if (! $apiKey) {
            return [
                'error' => 'missing_api_key',
            ];
        }

        $payload = array_merge([
            'model' => $model,
            'messages' => $messages,
            'tools' => $tools,
            'tool_choice' => 'auto',
        ], $settings['payload'] ?? []);

        /** @var \Illuminate\Http\Client\Response $response */
        $response = Http::withToken($apiKey)
            ->post("{$baseUri}/chat/completions", $payload);

        return $response->json();
    }
}
We can make our model aware of our tools by sending them along with our messages
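The other half of tool calling is the dispatch side: the model's response comes back with `tool_calls`, each carrying a function name and JSON-encoded arguments, and we have to route each one to the matching handler. A framework-free sketch (the closure-based registry here is illustrative, not the actual `ToolHandlerInterface` classes):

```php
<?php

// Route each tool call from the model's response to a handler in a registry.
// Each call carries a function name and JSON-encoded arguments, matching the
// Chat Completions tool-call shape.
function dispatchToolCalls(array $toolCalls, array $registry): array
{
    $results = [];

    foreach ($toolCalls as $call) {
        $name = $call['function']['name'] ?? '';
        $args = json_decode($call['function']['arguments'] ?? '{}', true) ?? [];

        $results[] = [
            'tool_call_id' => $call['id'] ?? '',
            'output' => isset($registry[$name])
                ? $registry[$name]($args)          // run the matching handler
                : ['error' => "unknown tool: {$name}"],
        ];
    }

    return $results;
}

// Illustrative registry with a stubbed list_tasks handler.
$registry = [
    'list_tasks' => fn (array $args): array => ['tasks' => [], 'limit' => $args['limit'] ?? 10],
];

$results = dispatchToolCalls([
    ['id' => 'call_1', 'function' => ['name' => 'list_tasks', 'arguments' => '{"limit":5}']],
], $registry);
// $results[0]['output']['limit'] is 5
```

Each entry in `$results` then goes back to the model as a `tool` role message, keyed by `tool_call_id`, which is exactly what the conversation roles in the next section are for.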

Conversation and Memory Management

I also needed a way to preserve context, so I stored the full conversation between the user and the agent. When sending a message to the LLM, I include the relevant chat history along with it. That way, the model has the context it needs at the right time.

Conversation roles:

To keep things structured, I grouped messages into three roles:

  • User: messages sent by the user
  • Assistant: messages sent by the model/agent
  • Tool: messages generated as a result of tool calls (i.e., tool responses)

This makes it easier for the LLM to follow the thread: it can see what the user asked, what it responded with, what tool it triggered (and what came back), and where the conversation left off.

<?php

class ConversationService
{

    /**
     * @return array<string, mixed>|null
     */
    protected function toModelMessage(ConversationMessage $message): ?array
    {
        if ($message->role === 'user') {
            return [
                'role' => 'user',
                'content' => $message->content ?? '',
            ];
        }

        if ($message->role === 'assistant') {
            $payload = [
                'role' => 'assistant',
                'content' => $message->content,
            ];

            if ($message->tool_calls) {
                $payload['tool_calls'] = $message->tool_calls;
            }

            return $payload;
        }

        if ($message->role === 'tool') {
            return [
                'role' => 'tool',
                'tool_call_id' => $message->tool_call_id,
                'content' => $this->encodeToolResult($message->tool_result ?? []),
            ];
        }

        return null;
    }
}
Building conversation history for the LLM

Conversation rotations:

Rotating Conversations

To manage token limits and avoid sending an extra-large context window to the LLM, I implemented conversation rotation. The idea is that only the active conversation contributes to the history we send to the model, and only one conversation can be active at a time.

conversation (active/inactive) -> conversation messages

<?php


class ConversationService
{

    public function getOrCreate(User $user, NormalizedMessage $message): Conversation
    {
        $conversation = Conversation::query()
            ->where('user_id', $user->id)
            ->where('channel', $message->channel)
            ->where('external_user_id', $message->externalUserId)
            ->where('status', 'active')
            ->latest('last_message_at')
            ->first();

        if ($conversation && $this->shouldRotateConversation($conversation)) {
            $conversation->update(['status' => 'inactive']);
            $conversation = null;
        }

        if ($conversation) {
            return $conversation;
        }

        return Conversation::query()->create([
            'user_id' => $user->id,
            'channel' => $message->channel,
            'external_user_id' => $message->externalUserId,
            'status' => 'active',
            'last_message_at' => now(),
            'metadata' => null,
        ]);
    }
 }
Auto-generating a new active conversation to avoid oversized context and token-limit issues

Now, how do we decide when to rotate a conversation? There are a few approaches I think we could take:

  • Idle time: rotate if the last interaction is older than x minutes/hours/days
  • Message count: rotate if the conversation has more than x messages
  • Token-based: if we assume 1 token ~= 4 characters, we can estimate token usage and rotate once we hit a cutoff
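The token-based cutoff from that list can be sketched in plain PHP. The 1 token ~= 4 characters heuristic is only a ballpark (real tokenizers vary per model), and the 6000-token cutoff here is an arbitrary example, not a recommendation:

```php
<?php

// Rough token estimate using the 1 token ~= 4 characters heuristic.
function estimateTokens(array $messages): int
{
    $chars = 0;
    foreach ($messages as $message) {
        $chars += strlen($message['content'] ?? '');
    }

    return (int) ceil($chars / 4);
}

// Rotate the conversation once the estimated token count hits a cutoff.
function shouldRotateByTokens(array $messages, int $cutoff = 6000): bool
{
    return estimateTokens($messages) >= $cutoff;
}
```

This slots in next to the idle-time and message-count checks in shouldRotateConversation as a third OR-ed condition.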

To keep things simple, I went with a blend of the first two. There are definitely better ways to do this. In fact, I can already see a potential issue, especially once we introduce an agentic loop:

<?php

namespace App\Services\Agent;

use App\Messaging\NormalizedMessage;
use App\Models\Conversation;
use App\Models\ConversationMessage;
use App\Models\User;

class ConversationService
{
    protected int $maxIdleMinutes = 1440;

    protected int $maxMessages = 50;

    protected function shouldRotateConversation(Conversation $conversation): bool
    {
        if (! $conversation->last_message_at) {
            return true;
        }

        $idleMinutes = $conversation->last_message_at->diffInMinutes(now());
        if ($idleMinutes >= $this->maxIdleMinutes) {
            return true;
        }

        $messageCount = $conversation->messages()->count();

        return $messageCount >= $this->maxMessages;
    }
}
Conversation rotation based on last message and message count

The upside is that we cap how much context we send at once. And if we want to go a step further, we can give the LLM a tool to search older conversations when it needs missing context, assuming it has enough of a clue about what to search for.

For example, a user might say: “I've completed the task we talked about last week Friday.” At that point, the agent probably won’t have last week’s messages in its current context, but it can still place a tool call to fetch conversation history from last week Friday and follow up from there.
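The core of such a history-search tool's handler could look like this. The storage here is a plain array for illustration; in the real app this would be a query against the conversation tables, scoped to the user:

```php
<?php

// Hypothetical sketch of a history-search tool: given stored messages and a
// date range (e.g. resolved from "last week Friday"), return the matching slice.
function searchHistory(array $messages, DateTimeImmutable $from, DateTimeImmutable $to): array
{
    return array_values(array_filter(
        $messages,
        fn (array $m): bool => $m['sent_at'] >= $from && $m['sent_at'] <= $to,
    ));
}

$messages = [
    ['content' => 'Add: renew passport', 'sent_at' => new DateTimeImmutable('2025-01-03 10:00')],
    ['content' => 'Morning brief',       'sent_at' => new DateTimeImmutable('2025-01-06 08:00')],
];

$fridays = searchHistory(
    $messages,
    new DateTimeImmutable('2025-01-03 00:00'),
    new DateTimeImmutable('2025-01-03 23:59'),
);
// $fridays contains only the 'Add: renew passport' message
```

The hard part isn't the query; it's trusting the model to turn "last week Friday" into the right date range before calling the tool.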

The agentic loop

What really makes an agent an agent is its loop. If we want to break out of the request/response cycle, we need the system to be able to act, observe, and act again without requiring the user to keep nudging it.

To see why this matters, let’s reuse the earlier scenario. The user says: “I've completed all the tasks I listed last week Friday. We can mark them as done.”

If we decompose this message, we can see that it involves multiple tool calls:

  • call list_tasks, filtered by date, to find the tasks from last week Friday
  • call update_task to mark each task as completed

And if we assume we don’t have bulk updates, and there are 3 tasks, that’s already three update calls.

The agentic loop

We can already start seeing the issue with request/response flow. It would be terrible UX to expect the user to always come back with the next instruction. Ideally, the agent’s 4 tool calls should happen in one go, without the user having to babysit the process. While those tool calls are running, it would be nice if the user stays informed (and relaxed) instead of wondering what’s happening behind the scenes. That’s where the agentic loop comes in. It basically translates to:

While this action is not yet complete, continue doing x

You might be wondering: how can we tell whether the agent's outcome is complete? That's where the system prompt comes in. The brainchild/soul of the agent.

You are a task and reminders assistant. Use the provided tools to take actions. Stay within tasks, reminders, and summaries. When you create a task, the system will automatically attach a reminder (check-in by default, or scheduled if due_at is provided). When the user wants to change a task, use update_task (find tasks first if needed to resolve references). Check-ins are random by default; use fixed intervals only when explicitly requested. All reminders require acknowledgement. If the user confirms or says done, dismiss the reminder. If they ask to delay, snooze it. When you need to send multiple user messages, respond with JSON: {"messages":["first","second"]}. When you want a follow-up turn without waiting for the user, respond with JSON: {"messages":[...],"next":{"type":"follow_up","reason":"..."}}. If you need to call tools, call them first and leave the message content empty; you will get another turn to respond after tool results are returned.
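The structured reply format that prompt asks for can be parsed with a small helper. This is a hedged sketch (the function name is mine); it handles both the `{"messages":[...]}` shape and a plain-text fallback for when the model ignores the instruction:

```php
<?php

// Parse the reply format the system prompt requests:
//   {"messages":[...]} and, optionally, {"next":{"type":"follow_up",...}}.
// Plain text that isn't valid JSON is treated as a single message.
function parseAgentReply(string $content): array
{
    $decoded = json_decode($content, true);

    if (is_array($decoded) && isset($decoded['messages'])) {
        return [
            'messages' => $decoded['messages'],
            'followUp' => ($decoded['next']['type'] ?? null) === 'follow_up',
        ];
    }

    return ['messages' => [$content], 'followUp' => false];
}

$reply = parseAgentReply('{"messages":["Marked 3 tasks done"],"next":{"type":"follow_up","reason":"verify"}}');
// $reply['followUp'] is true; $reply['messages'] holds one user-facing message
```

When `followUp` is true, we queue another model turn instead of waiting for the user.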

In that prompt, we’re explicitly telling the agent to follow up when it needs a follow-up. But because I have agent trust issues, I also added a breaker to cut the loop just in case the agent overlord decides it wants to loop forever.
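Stripped of framework details, the loop plus breaker comes down to something like this. `$callModel` stands in for the real LLM call: it takes the message history and returns either tool calls to execute or a final text reply. The names and the turn cap are illustrative:

```php
<?php

// Minimal agentic loop with a breaker: keep calling the model, executing any
// tool calls it asks for and feeding the results back, until it returns a
// plain reply or we hit the turn cap.
function runAgentLoop(callable $callModel, callable $runTool, int $maxTurns = 8): string
{
    $messages = [];

    for ($turn = 0; $turn < $maxTurns; $turn++) {
        $response = $callModel($messages);

        if (empty($response['tool_calls'])) {
            return $response['content'] ?? ''; // model is done; reply to the user
        }

        foreach ($response['tool_calls'] as $call) {
            // Feed each tool result back so the next turn can observe it.
            $messages[] = ['role' => 'tool', 'content' => $runTool($call)];
        }
    }

    return 'Stopped: too many turns.'; // the breaker
}

// A fake model that asks for one tool call, then answers.
$fakeModel = function (array $messages): array {
    return $messages === []
        ? ['tool_calls' => [['name' => 'list_tasks']]]
        : ['content' => 'All 3 tasks marked as done.'];
};

$reply = runAgentLoop($fakeModel, fn (array $call): string => 'ok');
// $reply === 'All 3 tasks marked as done.'
```

With the "last week Friday" example, this is what lets the list_tasks call and the three update_task calls happen in one go, with the breaker as the safety net.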

The Ending

Putting it all together, we’ve walked through a rough anatomy of what makes an agent an agent: tool calling, memory and context management, the underlying program, the LLM, and the agentic loop. It's not an anatomy grounded in deep expertise, and given the way the AI space evolves, it might become obsolete in a few months. Heck, there are already AI agents building AI agents. It is just what it is. It’s no longer a matter of what an agent is, but what it’s about to do next.

Check out the project: here