TSM - From Zero to Agent: Designing and Deploying a Context-Aware Assistant

Timea Fodor - FullStack Developer @ msg systems Romania


In recent years, the field of autonomous agents has gained significant momentum, driven by rapid advances in natural language processing, machine learning, and API integration. Agents, software entities capable of perceiving their environment, reasoning about it, and acting upon it, are increasingly embedded into digital workflows to enhance productivity, streamline decision-making, and enable higher levels of automation.

In this case study, we examine how such an agent can be implemented, using a Google-integrated assistant as an example. Through this implementation, we explore how various components such as language models, external APIs, memory, and reasoning can be orchestrated into a functional, intelligent assistant. The assistant is capable of retrieving and modifying emails and calendar events, as well as gathering real-time information from the web, and it can respond to both text and voice input.

Understanding Agents

At a high level, an agent is an entity that perceives its environment through sensors and acts upon it using actuators. In software, sensors map to inputs such as user prompts or API data, while actuators map to outputs such as API calls or generated responses. An agent processes these inputs, applies reasoning logic, and executes actions that influence its digital context.

Agents are typically defined by a set of core properties:

Interaction style describes the fundamental model of agent behavior. We differentiate between two styles: reactive and proactive. Reactive agents respond to changes in their environment as they occur, providing real-time solutions. Proactive agents, on the other hand, operate based on internal goals: they can initiate actions independently, anticipate future needs, and take steps before being explicitly prompted.

Autonomy is the agent's ability to act independently, making decisions and taking actions without direct human intervention. An autonomous agent is trusted to perform tasks on behalf of a user or system while adhering to a set of constraints or objectives.

Agents should also be able to make decisions on their own. Beyond simply following a script, intelligent agents can decide how to achieve a goal. This includes constructing or adjusting their workflow in real time, based on current context, available tools, and expected outcomes.

Finally, an agent should be adaptable: it should be able to learn from experience, whether through explicit user feedback or by observing the outcomes of its own actions. Over time, this allows it to improve performance, personalize behavior, and avoid repeating past mistakes.

These characteristics form the foundation of agent-based systems, and together enable agents to function with a degree of independence and intelligence that sets them apart from traditional rule-based automation.

System Design and Implementation of the Google Assistant App

The idea behind the application was to build an "assistant" that can retrieve emails and calendar events based on time intervals (today, this week, all time), contacts, or subject; write and send new emails; and create or move calendar events. Besides these functionalities, it was important that the assistant can also search the web when needed.

The application was designed to process not only text as input but also images. Hands-off usability was also taken into consideration: besides text, the input can be an audio recording, for cases when the user is not able to type their question. In such cases, it is assumed that the user does not have the possibility to read a wall of text either, so if the input is spoken, the response is returned as a voice recording.
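In code, this modality routing might look like the following TypeScript sketch, where speechToText, textToSpeech, and runAgent are placeholders for real services rather than the application's actual implementation:

```typescript
// Placeholders standing in for real speech services and the agent pipeline.
declare function speechToText(audio: Blob): Promise<string>;
declare function textToSpeech(text: string): Promise<Blob>;
declare function runAgent(question: string): Promise<string>;

// If the question arrives as audio, the answer is returned as audio as well.
async function respond(input: { text?: string; audio?: Blob }): Promise<string | Blob> {
  const question = input.audio ? await speechToText(input.audio) : input.text ?? "";
  const answer = await runAgent(question);
  return input.audio ? textToSpeech(answer) : answer;
}
```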

The workflow of the agent application looks like this:

Figure 1: Workflow of the Google agent application

The workflow of the agent application begins with user input. When input is received, the system performs an initial content moderation step: a guardrail module evaluates the input for obscene language or restricted instructions. If a violation is detected, the user is notified and the process is halted. This ensures that the agent operates within safe and responsible boundaries.
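A minimal sketch of what such a guardrail could look like using the Gemini SDK (@google/generative-ai); the prompt wording and the moderateInput helper are illustrative assumptions, not the application's actual implementation:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });

// Ask the model to classify the input before any planning happens.
async function moderateInput(userInput: string): Promise<boolean> {
  const prompt =
    "You are a content moderator. Reply with exactly ALLOW or BLOCK.\n" +
    "BLOCK if the message contains obscene language or instructs the " +
    "assistant to act outside its email/calendar/web-search scope.\n\n" +
    `Message: ${userInput}`;
  const result = await model.generateContent(prompt);
  // If the input is blocked, the caller notifies the user and halts the flow.
  return result.response.text().trim().toUpperCase().startsWith("ALLOW");
}
```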

If the input passes validation, the language model is prompted to generate a plan: a high-level sequence of steps the agent should take to achieve the user's goal. The first phase of this planning process involves tool selection: the agent assesses its available tools (e.g., email handler, calendar manager, web retriever) and determines whether it has sufficient information to proceed. If additional context is needed, it will request clarification from the user.
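A minimal sketch of such a planning prompt, reusing the model instance from the previous snippet; the tool catalogue and the JSON contract are illustrative assumptions:

```typescript
// Tool catalogue shown to the planner; names and descriptions are illustrative.
const TOOLS = [
  { name: "gmail_search",   description: "Find emails by sender, subject, or time range" },
  { name: "gmail_send",     description: "Compose and send an email" },
  { name: "calendar_read",  description: "List calendar events for a time interval" },
  { name: "calendar_write", description: "Create or move a calendar event" },
  { name: "web_search",     description: "Retrieve real-time information from the web" },
];

interface PlanStep { tool: string; args: Record<string, unknown>; }
interface PlannerOutput { steps?: PlanStep[]; clarification?: string; }

// The planner returns either a step list or a clarification question.
async function generatePlan(userInput: string): Promise<PlannerOutput> {
  const prompt =
    `Available tools:\n${TOOLS.map((t) => `- ${t.name}: ${t.description}`).join("\n")}\n\n` +
    'Return a JSON object with either a "steps" array (tool and args per step) ' +
    'or a "clarification" question if required information is missing.\n\n' +
    `User request: ${userInput}`;
  const result = await model.generateContent(prompt);
  return JSON.parse(result.response.text()) as PlannerOutput;
}
```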

If the agent has the required information, it proceeds to a semantic retrieval phase, querying a PostgreSQL database with embeddings for records of similar past actions. These records include prior inputs, the tools used, and evaluations of those outcomes. By retrieving this historical data, the agent can reflect on past behavior and use that reflection to refine its current plan before execution.
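A sketch of what this retrieval could look like with Supabase and Gemini embeddings; match_past_actions stands in for a Postgres function implementing the standard pgvector similarity pattern and is an assumption rather than the application's actual schema:

```typescript
import { createClient } from "@supabase/supabase-js";
import { GoogleGenerativeAI } from "@google/generative-ai";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);
const embedder = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({ model: "text-embedding-004" });

// Retrieve similar past actions. `match_past_actions` is an assumed Postgres
// function following the usual pgvector cosine-distance pattern.
async function findSimilarActions(userInput: string) {
  const { embedding } = await embedder.embedContent(userInput);
  const { data, error } = await supabase.rpc("match_past_actions", {
    query_embedding: embedding.values, // float array compared against stored vectors
    match_count: 5,
  });
  if (error) throw error;
  return data; // prior inputs, tools used, and evaluations of those outcomes
}
```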

Once the final plan is confirmed, the appropriate tool is executed, and the result is passed back to the LLM. The LLM then formats the output into a user-friendly message, delivered in either text or audio, depending on the original input modality.
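A simplified sketch of the execution step: a registry maps tool names from the plan to implementations (stubbed out here), and raw results are collected for the LLM to reformat. All names are illustrative:

```typescript
interface PlanStep { tool: string; args: Record<string, unknown>; }
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// Stubs standing in for the real Google API and web-search wrappers.
const searchEmails: ToolFn = async (args) => ({ emails: [], args });
const listEvents: ToolFn = async (args) => ({ events: [], args });
const searchWeb: ToolFn = async (args) => ({ hits: [], args });

const toolRegistry: Record<string, ToolFn> = {
  gmail_search: searchEmails,
  calendar_read: listEvents,
  web_search: searchWeb,
};

// Execute each step and collect raw results for the LLM to format
// into a text or audio reply, depending on the input modality.
async function executePlan(steps: PlanStep[]): Promise<unknown[]> {
  const results: unknown[] = [];
  for (const step of steps) {
    const tool = toolRegistry[step.tool];
    if (!tool) throw new Error(`Unknown tool: ${step.tool}`);
    results.push(await tool(step.args));
  }
  return results;
}
```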

The final phase of the workflow, although optional, is critical for building adaptive and intelligent agents: the feedback loop. If the agent was uncertain about which plan to pursue, or if the user chooses to provide input about the interaction, a feedback prompt is triggered. This feedback is stored in the database and linked to the associated plan and outcome. Over time, this mechanism enables the agent to learn from past interactions and improve the quality and relevance of its future plans.
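A minimal sketch of how such feedback could be persisted with the Supabase client; the feedback table and its columns are assumptions for illustration, not the article's actual schema:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Store user feedback linked to the plan and outcome that produced it.
async function recordFeedback(planId: string, rating: number, comment?: string) {
  const { error } = await supabase.from("feedback").insert({
    plan_id: planId,      // foreign key to the stored plan and outcome
    rating,               // e.g. thumbs up/down mapped to 1/0
    comment: comment ?? null,
  });
  if (error) throw error;
}
```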

Architectural Overview

Figure 2: Architectural Overview

The agent application was developed using Next.js and deployed on Vercel, providing a scalable and performant frontend environment. For data storage, Supabase was used as the backend platform, leveraging its integrated PostgreSQL database and pgvector extension to support semantic search over vector embeddings.

Given that the core use cases revolve around Google-specific tasks, such as reading and modifying Gmail messages, managing Google Calendar events, and retrieving information from Google Contacts, the system heavily relies on Google APIs for functionality. To support secure access to user data, OAuth2 is employed for authentication. For the agent's language capabilities, the system integrates Gemini 2.0 Flash as the underlying LLM, chosen for its speed and efficiency in tool-augmented use cases.
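The sketch below shows how such an OAuth2-authenticated Gmail client could be set up with the googleapis Node.js library; the redirect URI and the use of a stored refresh token are assumptions for illustration:

```typescript
import { google } from "googleapis";

// OAuth2 client; in this setup the redirect URI would point to a Next.js
// API route (the URL below is a placeholder).
const oauth2Client = new google.auth.OAuth2(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET,
  "https://example.com/api/auth/callback"
);
oauth2Client.setCredentials({ refresh_token: process.env.GOOGLE_REFRESH_TOKEN });

const gmail = google.gmail({ version: "v1", auth: oauth2Client });

// List today's messages; `q` uses the same query syntax as the Gmail web UI.
async function listTodaysEmails() {
  const res = await gmail.users.messages.list({
    userId: "me",
    q: "newer_than:1d",
  });
  return res.data.messages ?? [];
}
```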

At a high level, an agentic application must include three core components: Planning, Action, and Memory. These components enable the agent to reason, execute, and adapt over time.

Planning

The Planning module is responsible for interpreting user input, reasoning through possible approaches, and generating a structured plan of action.

Action

Once a final plan is formed, the Action module carries out the steps using available tools. This module is usually implemented either by direct function calling or by exposing tool functions through the Model Context Protocol (MCP), a standardized interface that governs how LLMs interact with external services. In this application, the implemented tools include a web search retriever for real-time information retrieval and specific Google API integrations for email (Gmail) and calendar (Google Calendar) operations.

The Action module serves as the bridge between the LLM's reasoning and real-world execution, enabling the agent to generate real impact through API calls and content retrieval.
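As an illustration of the direct function-calling approach, the sketch below declares a single calendar tool using the Gemini SDK's function-calling support; the tool name and parameter schema are assumptions for this example, not the app's exact definitions:

```typescript
import { GoogleGenerativeAI, SchemaType } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Expose a tool to the model through native function calling.
const model = genAI.getGenerativeModel({
  model: "gemini-2.0-flash",
  tools: [{
    functionDeclarations: [{
      name: "create_calendar_event",
      description: "Create a Google Calendar event",
      parameters: {
        type: SchemaType.OBJECT,
        properties: {
          title: { type: SchemaType.STRING, description: "Event title" },
          start: { type: SchemaType.STRING, description: "ISO 8601 start time" },
          end: { type: SchemaType.STRING, description: "ISO 8601 end time" },
        },
        required: ["title", "start", "end"],
      },
    }],
  }],
});

// When the model decides a tool is needed, it replies with a function call
// instead of free text; the application executes it and returns the result.
async function planCalendarAction(request: string) {
  const result = await model.generateContent(request);
  return result.response.functionCalls()?.[0]; // e.g. { name, args }
}
```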

Figure 3: Modules

In more complex agents, an Orchestration layer coordinates the Planning and Action modules according to a specific pattern. One of the most prominent orchestration paradigms is called ReAct.

The ReAct (Reasoning and Acting) paradigm enables the agent to iteratively think and act toward solving a task. Its workflow consists of four steps, repeated until the agent decides that the answer is acceptable.

These four steps are:

1. Thought: the model reasons about the current state of the task and decides what to do next.
2. Action: it invokes one of the available tools with concrete arguments.
3. Observation: the tool's output is fed back into the model's context.
4. Decision: based on the observation, the agent either starts a new thought-action cycle or returns the final answer.

In contrast, this application uses another paradigm, ReWOO (Reasoning WithOut Observation), which eliminates the dependence on tool outputs for action planning. Instead, the agent plans upfront: redundant tool usage is avoided by anticipating which tools to use upon receiving the initial prompt from the user. ReWOO consists of three modules:

1. Planner: decomposes the user's request into a complete sequence of steps before any tool is executed, using placeholders for results that are not yet known.
2. Worker: executes each planned tool call and collects the resulting evidence.
3. Solver: combines the plan and the collected evidence into the final answer.

As stated before, ReWOO was chosen because the tasks are simple enough that continuous iterations of planning, acting, and reflecting are not needed. The model decomposes high-level goals into intermediate steps and consults a reflection mechanism that draws on previously recorded interactions stored in the database.

Using the results of a semantic query, the planner incorporates similar past actions and the feedback associated with them to improve the current decision. This reflective reasoning loop allows the planner to update or refine its course of action before proceeding.
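To make the contrast with ReAct concrete, here is a minimal TypeScript sketch of a ReWOO-style plan; the #E placeholder convention follows the ReWOO paper, while the tool names, plan contents, and runTool executor are illustrative assumptions:

```typescript
// A ReWOO-style plan lays out every step before any tool runs, wiring one
// step's output into the next through #E placeholders.
interface RewooStep {
  thought: string;    // planner's rationale for the step
  tool: string;       // worker to invoke
  input: string;      // may reference earlier evidence, e.g. "#E1"
  evidenceId: string; // "#E1", "#E2", ...
}

const plan: RewooStep[] = [
  { thought: "Find the meeting the user wants to move",
    tool: "calendar_read", input: "today 3pm meeting", evidenceId: "#E1" },
  { thought: "Reschedule it using the event found above",
    tool: "calendar_write", input: "move #E1 to 5pm", evidenceId: "#E2" },
];

// The worker runs the fixed plan with no re-planning between steps; a solver
// LLM then composes the final answer from the collected evidence.
async function runPlan(
  steps: RewooStep[],
  runTool: (tool: string, input: string) => Promise<string>
): Promise<Record<string, string>> {
  const evidence: Record<string, string> = {};
  for (const step of steps) {
    const input = step.input.replace(/#E\d+/g, (id) => evidence[id] ?? id);
    evidence[step.evidenceId] = await runTool(step.tool, input);
  }
  return evidence;
}
```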

Memory

The Memory module enables the agent to adapt and personalize over time. It should include short-term memory, long-term memory, or a combination of the two.

Short-term memory provides context awareness during a session. This typically includes recent user messages, agent responses, and active goals. It helps the LLM maintain coherence and reference prior turns within the same interaction.

Long-term memory stores data such as past plans, reasoning traces, user preferences, and feedback.

In this system, short-term memory is implemented using session storage, which feeds the conversation history into the LLM's context window. For long-term memory, Supabase with pgvector is used to store and retrieve vectorized embeddings of past interactions. These embeddings enable semantic search, allowing the system to surface contextually relevant examples based on the current query.
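A sketch of how an interaction could be written to this long-term store; the interactions table and its columns are assumptions for illustration:

```typescript
import { createClient } from "@supabase/supabase-js";
import { GoogleGenerativeAI } from "@google/generative-ai";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);
const embedder = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({ model: "text-embedding-004" });

// Persist an interaction so later sessions can surface it semantically.
// `embedding` is a pgvector column (768 dimensions for text-embedding-004).
async function rememberInteraction(input: string, plan: object, outcome: string) {
  const { embedding } = await embedder.embedContent(input);
  const { error } = await supabase.from("interactions").insert({
    input,
    plan,                        // stored as jsonb
    outcome,
    embedding: embedding.values,
  });
  if (error) throw error;
}
```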

This feedback-enhanced memory architecture allows the agent not just to recall past decisions, but to learn from them and adapt its behavior. By evaluating the outcomes and user feedback of previous interactions, the system can better inform future planning, bringing the agent closer to adaptive, experience-based decision-making.

Conclusion

As this case study has demonstrated, building agents is not solely about leveraging large language models. It is about orchestrating memory, reasoning, and tools into a unified, adaptive system capable of complex, autonomous decision-making. These components, when integrated effectively, allow agents to transcend the limitations of static automation and respond dynamically to user needs.

Modern agents excel at automating intricate workflows, operating across multiple modalities, and navigating ambiguity with contextual intelligence. However, these capabilities introduce new design challenges. Agentic systems are often resource-intensive, both computationally and architecturally. They demand thoughtful permissioning and robust security frameworks, particularly when handling sensitive data or interacting with external services. Moreover, their open-ended reasoning introduces the risk of unpredictable behaviors, especially in loosely defined or edge-case scenarios.

These tradeoffs prompt a key design consideration: Do we need an agentic workflow, or a full-fledged agent? The answer lies in the complexity and variability of the task. For straightforward, repetitive functions, simpler tools remain more efficient and easier to control. But in contexts requiring initiative, context retention, and system integration, agents provide a compelling and scalable solution.

As agentic technologies continue to evolve, the boundary between tool and collaborator will become increasingly fluid. The agent introduced in this work serves as a prototype of what such systems can achieve today, and a foundation for what they may become in the future. Through the integration of memory, reasoning, and action, agents are poised to redefine how we interact with software, moving from static interfaces to context-aware, goal-oriented partners.

References

[1] "ReAct Prompting" Prompting Guide,

[2] Liu, Y., et al. "Agent Design Pattern Catalouge: A collection of Architectural Patterns for FoundationModel Based Agents" arXiv, 2024

[3] Shengran, H., et al. "Automated Design of Agentic Systems" arXiv,

[4] "AI Agent Workflow Design Patterns: An Overview." Craig Li, Medium,

[5] Abramson, J., et al. " Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback" arXiv,

[6] "What Are AI Agents?" Geekflare,

[7] "Intelligent Agent." Wikipedia,

[8] Sapkota, R., et al. "AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges" arXiv,

[9] Wang, L., et al. "A Survey on Large Language Model based Autonomous Agents" arXiv,

[10] "What Are Intelligent Agents in the Context of AI?" Milvus,

[11] "[What are AI Agents.](https://www.ibm.com/think/topics/ai-agents )" IBM,

[12] "Intelligent Agent - Glossary Definition." Sapien,

[13] "Building AI Agents from Scratch." Swirl AI Newsletter,

[14] "Intelligent Agent Characteristics." Smythos,

[15] "Components of AI Agents." IBM,

[16] Xu, B., et al. "ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models" arXiv,

[17] Cash Macanaya, Unsplash,