
From Zero to Agent: Designing and Deploying a Context-Aware Assistant

Timea Fodor
FullStack Developer @ msg systems Romania



PROGRAMMING


In recent years, the field of autonomous agents has gained significant momentum, driven by rapid advances in natural language processing, machine learning, and API integration. Agents, software entities capable of perceiving their environment, reasoning about it, and acting upon it, are increasingly embedded into digital workflows to enhance productivity, streamline decision-making, and enable higher levels of automation.

In this case study, we examine how such an agent can be implemented, using a Google-integrated assistant as the example. Through this implementation, we explore how components such as language models, external APIs, memory, and reasoning can be orchestrated into a functional, intelligent assistant. The assistant can retrieve and modify emails and calendar events, gather real-time information from the web, and respond to both text and voice input.

Understanding Agents

At a high level, an agent is an entity that perceives its environment through sensors and acts upon it using actuators. In software, these concepts map to inputs, such as user prompts or API data, and outputs, such as API calls or generated messages. An agent processes these inputs, applies reasoning logic, and executes actions that influence its digital context.

Agents are typically defined by a set of core properties:

Interaction style describes the fundamental model of agent behavior, which is either reactive or proactive. Reactive agents respond to changes in their environment and provide real-time solutions as those changes occur. Proactive agents, on the other hand, operate based on internal goals: they can initiate actions independently, anticipate future needs, and take steps before being explicitly prompted.

Autonomy is the agent's ability to act independently, making decisions and taking actions without direct human intervention. An autonomous agent is trusted to perform tasks on behalf of a user or system while adhering to a set of constraints or objectives.

Agents should also be able to make decisions on their own. Beyond simply following a script, intelligent agents can decide how to achieve a goal. This includes constructing or adjusting their workflow in real time, based on the current context, available tools, and expected outcomes.

Finally, an agent should be adaptable: it should learn from experience, whether through explicit user feedback or by observing the outcomes of its own actions. Over time, this allows it to improve performance, personalize behavior, and avoid repeating past mistakes.

These characteristics form the foundation of agent-based systems, and together enable agents to function with a degree of independence and intelligence that sets them apart from traditional rule-based automation.

System Design and Implementation of the Google Assistant App

The idea behind the application was to build an "assistant" that can retrieve emails and calendar events by time interval (today, this week, all time), contact, or subject; write and send new emails; and create or move calendar events. Besides these functionalities, it was important that the assistant could also search the web when needed.
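
To make this concrete, the sketch below shows how such time-, contact-, and subject-based filters could be translated into Gmail's search syntax using the official googleapis client. The `MailFilter` shape and helper names are illustrative, not the application's actual code.

```typescript
import { google } from "googleapis";
import type { OAuth2Client } from "google-auth-library";

// Illustrative filter shape for the assistant's email queries.
interface MailFilter {
  interval?: "today" | "week" | "all";
  from?: string;
  subject?: string;
}

// Gmail's `q` parameter understands operators such as `newer_than:`,
// `from:` and `subject:`, so the filters map onto a query string.
function buildGmailQuery(filter: MailFilter): string {
  const parts: string[] = [];
  if (filter.interval === "today") parts.push("newer_than:1d");
  if (filter.interval === "week") parts.push("newer_than:7d");
  if (filter.from) parts.push(`from:${filter.from}`);
  if (filter.subject) parts.push(`subject:${filter.subject}`);
  return parts.join(" ");
}

// List message IDs matching the filter, given an authorized client.
async function listMatchingEmails(auth: OAuth2Client, filter: MailFilter) {
  const gmail = google.gmail({ version: "v1", auth });
  const res = await gmail.users.messages.list({
    userId: "me",
    q: buildGmailQuery(filter),
  });
  return res.data.messages ?? [];
}
```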

The application was designed to process not only text input but also images. Hands-off usability was also taken into consideration: besides text, the input can be an audio recording, for cases when the user is unable to type their question. In these cases, it is assumed that the user cannot read a wall of text either, so if the input is spoken, the response is returned as a voice recording.

The workflow of the agent application looks like this:

Figure 1: Workflow of the Google agent application

The workflow of the agent application begins with user input. When input is received, the system performs an initial content moderation step: a guardrail module evaluates the input for obscene language or restricted instructions. If any violations are detected, the user is notified and the process is halted. This ensures that the agent operates within safe and responsible boundaries.
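
A guardrail like this can be implemented in several ways; one minimal approach is to ask the LLM itself to classify the input before any planning happens. The sketch below assumes the @google/generative-ai SDK; the prompt wording and the SAFE/UNSAFE protocol are assumptions for illustration.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });

// Ask the model to classify the input before any planning happens.
// Returns true when the input is safe to process further.
async function passesGuardrail(userInput: string): Promise<boolean> {
  const prompt =
    "Classify the following user input. Reply with exactly SAFE or UNSAFE. " +
    "Mark it UNSAFE if it contains obscene language or requests a " +
    `restricted action.\n\nInput: """${userInput}"""`;
  const result = await model.generateContent(prompt);
  return result.response.text().trim().toUpperCase().startsWith("SAFE");
}

// In the request handler: notify the user and halt on a violation.
// if (!(await passesGuardrail(input))) {
//   return { error: "Your request could not be processed." };
// }
```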

If the input passes validation, the language model is prompted to generate a plan: a high-level sequence of steps the agent should take to achieve the user's goal. The first phase of this planning process involves tool selection: the agent assesses its available tools (e.g., email handler, calendar manager, web retriever) and determines whether it has sufficient information to proceed. If additional context is needed, it will request clarification from the user.
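
A minimal sketch of such a planning step might look like the following, with Gemini configured to reply in JSON. The tool names match the ones described above, but the plan schema and prompt wording are assumptions.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Planner model configured to answer in JSON.
const planner = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({
    model: "gemini-2.0-flash",
    generationConfig: { responseMimeType: "application/json" },
  });

const PLANNER_PROMPT = `You are a planner for a Google-integrated assistant.
Available tools: emailHandler, calendarManager, webRetriever.
Given the user's request, return either
  {"plan": [{"tool": "...", "input": "..."}]}
or, if required information is missing,
  {"clarify": "<question to ask the user>"}.`;

async function makePlan(userInput: string) {
  const result = await planner.generateContent(
    `${PLANNER_PROMPT}\n\nUser request: ${userInput}`
  );
  const parsed = JSON.parse(result.response.text());
  return parsed.clarify
    ? { needsClarification: true as const, question: parsed.clarify }
    : { needsClarification: false as const, plan: parsed.plan };
}
```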

If the agent has the required information, it proceeds to a semantic retrieval phase, querying a PostgreSQL database with embeddings for records of similar past actions. These records include prior inputs, the tools used, and evaluations of those outcomes. By retrieving this historical data, the agent can reflect on past behavior and use that reflection to refine its current plan before execution.
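
With Supabase and pgvector, this retrieval step could look roughly like the sketch below. The `match_past_actions` SQL function and the 0.75 similarity threshold are assumptions, following Supabase's documented pattern of exposing a similarity search as an RPC.

```typescript
import { createClient } from "@supabase/supabase-js";
import { GoogleGenerativeAI } from "@google/generative-ai";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Embed the current input and ask Postgres for similar past actions.
// `match_past_actions` is an assumed SQL function defined next to the
// pgvector index; it returns prior inputs, tools used, and evaluations.
async function findSimilarActions(userInput: string) {
  const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });
  const { embedding } = await embedder.embedContent(userInput);

  const { data, error } = await supabase.rpc("match_past_actions", {
    query_embedding: embedding.values,
    match_threshold: 0.75, // assumed similarity cutoff
    match_count: 5,
  });
  if (error) throw error;
  return data;
}
```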

Once the final plan is confirmed, the appropriate tool is executed, and the result is passed back to the LLM. The LLM then formats the output into a user-friendly message, delivered in either text or audio, depending on the original input modality.
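
The execution and formatting steps could be sketched as follows; the tool registry, the prompt wording, and the `textToSpeech` helper are hypothetical stand-ins for the application's actual implementations.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const llm = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({ model: "gemini-2.0-flash" });

// Hypothetical text-to-speech helper; any TTS service could back it.
declare function textToSpeech(text: string): Promise<Buffer>;

type Tool = (input: string) => Promise<string>;

// Run each planned step through its tool, then let the LLM format the
// raw results. The registry keys match the planner's tool names.
async function executePlan(
  plan: { tool: string; input: string }[],
  tools: Record<string, Tool>,
  wasVoiceInput: boolean
) {
  const results: string[] = [];
  for (const step of plan) {
    const tool = tools[step.tool];
    if (!tool) throw new Error(`Unknown tool: ${step.tool}`);
    results.push(await tool(step.input));
  }

  const answer = await llm.generateContent(
    "Rewrite these tool results as a concise, friendly reply:\n" +
      results.join("\n---\n")
  );
  const text = answer.response.text();

  // Mirror the input modality: spoken input gets a spoken response.
  return wasVoiceInput ? await textToSpeech(text) : text;
}
```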

The final phase of the workflow, although optional, is critical for building adaptive and intelligent agents: the feedback loop. If the agent was uncertain about which plan to pursue, or if the user chooses to provide input about the interaction, a feedback prompt is triggered. This feedback is stored in the database and linked to the associated plan and outcome. Over time, this mechanism enables the agent to learn from past interactions and improve the quality and relevance of its future plans.
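
Persisting such feedback can be as simple as a single insert that links the feedback row to its plan; the table and column names below are assumptions about the schema.

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

// Persist feedback linked to the plan it evaluates, so future semantic
// retrieval can weigh outcomes alongside the plans themselves.
async function recordFeedback(
  planId: string,
  rating: "positive" | "negative",
  comment?: string
) {
  const { error } = await supabase
    .from("feedback")
    .insert({ plan_id: planId, rating, comment });
  if (error) throw error;
}
```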

Architectural Overview

Figure 2: Architectural Overview

The agent application was developed using Next.js and deployed on Vercel, providing a scalable and performant frontend environment. For data storage, Supabase was used as the backend platform, leveraging its integrated PostgreSQL database and pgvector extension to support semantic search over vector embeddings.

Given that the core use cases revolve around Google-specific tasks, such as reading and modifying Gmail messages, managing Google Calendar events, and retrieving information from Google Contacts, the system heavily relies on Google APIs for functionality. To support secure access to user data, OAuth2 is employed for authentication. For the agent's language capabilities, the system integrates Gemini 2.0 Flash as the underlying LLM, chosen for its speed and efficiency in tool-augmented use cases.
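
The OAuth2 setup follows the standard googleapis flow; the redirect URI and the exact scopes below are assumptions based on the features described.

```typescript
import { google } from "googleapis";

// Standard OAuth2 flow with the googleapis client.
const oauth2Client = new google.auth.OAuth2(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET,
  "https://example.com/api/auth/callback" // assumed redirect URI
);

const authUrl = oauth2Client.generateAuthUrl({
  access_type: "offline", // request a refresh token for background use
  scope: [
    "https://www.googleapis.com/auth/gmail.modify",
    "https://www.googleapis.com/auth/calendar",
    "https://www.googleapis.com/auth/contacts.readonly",
  ],
});

// After the user consents, exchange the callback code for tokens and
// build an authorized client for the Google APIs.
async function handleCallback(code: string) {
  const { tokens } = await oauth2Client.getToken(code);
  oauth2Client.setCredentials(tokens);
  return google.calendar({ version: "v3", auth: oauth2Client });
}
```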

At a high level, an agentic application must include three core components: Planning, Action, and Memory. These components enable the agent to reason, execute, and adapt over time.

Planning

The Planning module is responsible for interpreting user input, reasoning through possible approaches, and generating a structured plan of action.

Action

Once a final plan is formed, the Action module carries out the steps using the available tools. This module is usually implemented either by direct function calling or by exposing tool functions through the Model Context Protocol (MCP), a standardized interface that governs how LLMs interact with external services. In this application, the implemented tools include a web search retriever for real-time information retrieval and specific Google API integrations for email (Gmail) and calendar (Google Calendar) operations.
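
As an illustration of the function-calling route, the sketch below declares a single calendar tool to Gemini; the declaration's name and parameter schema are invented for the example.

```typescript
import {
  GoogleGenerativeAI,
  SchemaType,
  type FunctionDeclaration,
} from "@google/generative-ai";

// One illustrative tool declaration exposed to the model.
const createEventTool: FunctionDeclaration = {
  name: "create_calendar_event",
  description: "Create a Google Calendar event.",
  parameters: {
    type: SchemaType.OBJECT,
    properties: {
      summary: { type: SchemaType.STRING, description: "Event title" },
      start: { type: SchemaType.STRING, description: "ISO 8601 start time" },
      end: { type: SchemaType.STRING, description: "ISO 8601 end time" },
    },
    required: ["summary", "start", "end"],
  },
};

const model = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({
    model: "gemini-2.0-flash",
    tools: [{ functionDeclarations: [createEventTool] }],
  });

// When the model decides to use the tool, the response carries a
// functionCall part that the Action module maps to a real API call.
async function run(prompt: string) {
  const result = await model.generateContent(prompt);
  const call = result.response.functionCalls()?.[0];
  if (call?.name === "create_calendar_event") {
    // Forward call.args to the Google Calendar events.insert endpoint.
    console.log("Tool requested:", call.args);
  }
}
```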

The Action module serves as the bridge between the LLM's reasoning and real-world execution, enabling the agent to generate real impact through API calls and content retrieval.

Figure 3: Modules

In more complex agents, an Orchestration layer coordinates the Planning and Action modules according to a specific pattern. One of the most prominent orchestration paradigms is called ReAct.

The ReAct (Reasoning and Acting) paradigm enables the agent to iteratively think and act toward solving a task. Its workflow consists of four steps, repeated until the agent decides that the answer is acceptable.

These four steps are:

1. Thought: the model reasons about the current state and decides what to do next.
2. Action: the agent invokes one of its available tools.
3. Observation: the tool's output is fed back into the model's context.
4. Evaluation: the model judges whether it can produce a final answer or must loop back to another thought.
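
A compact sketch of this loop, assuming a hypothetical `runTool` dispatcher and a plain-text Thought/Action/Observation protocol, might look like this:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const model = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({ model: "gemini-2.0-flash" });

// Hypothetical dispatcher that runs a named tool and returns its output.
declare function runTool(name: string, input: string): Promise<string>;

// Alternate Thought, Action and Observation until the model emits a
// final Answer (or the step budget runs out).
async function reactLoop(question: string, maxSteps = 5): Promise<string> {
  let transcript = `Question: ${question}\n`;
  for (let i = 0; i < maxSteps; i++) {
    const result = await model.generateContent(
      transcript +
        "Continue with 'Thought: ...' followed by either " +
        "'Action: <tool> | <input>' or 'Answer: <final answer>'."
    );
    const step = result.response.text();
    transcript += step + "\n";

    const answer = step.match(/Answer:\s*([\s\S]*)/);
    if (answer) return answer[1].trim(); // the agent accepted the result

    const action = step.match(/Action:\s*(\w+)\s*\|\s*(.*)/);
    if (action) {
      transcript += `Observation: ${await runTool(action[1], action[2])}\n`;
    }
  }
  return "No answer found within the step budget.";
}
```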

In contrast, this application uses a paradigm named ReWOO (Reasoning WithOut Observation), which eliminates the dependence on tool outputs during action planning. Instead, the agent plans upfront: redundant tool usage is avoided by anticipating which tools to use as soon as the initial prompt is received. ReWOO consists of three modules:

1. Planner: decomposes the task into a complete plan in a single pass, leaving placeholders for tool outputs.
2. Worker: executes the planned tool calls and fills the placeholders with the retrieved evidence.
3. Solver: combines the plan and the collected evidence into the final answer.
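
The Planner's upfront pass could be prompted roughly as follows; the plan format with #E placeholders mirrors the ReWOO paper, while the prompt wording is an assumption.

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

const model = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!)
  .getGenerativeModel({ model: "gemini-2.0-flash" });

// The Planner writes the whole plan in one pass, using #E placeholders
// instead of waiting for tool outputs between steps.
const REWOO_PLANNER_PROMPT = `For the task below, write a plan as numbered
steps of the form:
  Plan: <reasoning>  #E<n> = <tool>[<input, may reference earlier #E>]
Available tools: emailHandler, calendarManager, webRetriever.
Do not execute anything; produce the complete plan in one pass.`;

async function planUpfront(task: string): Promise<string> {
  const result = await model.generateContent(
    `${REWOO_PLANNER_PROMPT}\n\nTask: ${task}`
  );
  // The Worker later substitutes each #E<n> with the tool's real output,
  // and the Solver composes the final answer from the task and evidence.
  return result.response.text();
}
```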

As stated before, ReWOO was chosen because the tasks are simple enough that continuous iterations of planning, acting, and reflecting are not needed. The model decomposes high-level goals into intermediate steps and consults a reflection mechanism that draws on previously recorded interactions stored in the database.

Using the results of a semantic query, the planner incorporates similar past actions and the feedback associated with them to improve the current decision. This reflective reasoning loop allows the planner to update or refine its course of action before proceeding.

Memory

The Memory module enables the agent to adapt and personalize over time. It should include at least one of short-term and long-term memory; often the two are combined.

Short-term memory provides context awareness during a session. This typically includes recent user messages, agent responses, and active goals. It helps the LLM maintain coherence and reference prior turns within the same interaction.

Long-term memory stores data such as past plans, reasoning traces, user preferences, and feedback.

In this system, short-term memory is implemented using session storage, which feeds the conversation history into the LLM's context window. For long-term memory, Supabase with pgvector is used to store and retrieve vectorized embeddings of past interactions. These embeddings enable semantic search, allowing the system to surface contextually relevant examples based on the current query.
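
Writing an interaction into this long-term memory then amounts to embedding it and inserting the vector next to the raw record, as sketched below; the `past_actions` table shape is an assumption about the schema.

```typescript
import { createClient } from "@supabase/supabase-js";
import { GoogleGenerativeAI } from "@google/generative-ai";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);

// Write one interaction into long-term memory: embed it, then store
// the vector alongside the raw record in a pgvector column.
async function rememberInteraction(
  input: string,
  plan: string,
  outcome: string
) {
  const embedder = genAI.getGenerativeModel({ model: "text-embedding-004" });
  const { embedding } = await embedder.embedContent(
    `${input}\n${plan}\n${outcome}`
  );

  const { error } = await supabase.from("past_actions").insert({
    input,
    plan,
    outcome,
    embedding: embedding.values,
  });
  if (error) throw error;
}
```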

This feedback-enhanced memory architecture allows the agent not just to recall past decisions but to learn from them and adapt its behavior. By evaluating the outcomes and user feedback of previous interactions, the system can better inform future planning, bringing the agent closer to adaptive, experience-based decision-making.

Conclusion

As this case study has demonstrated, building agents is not solely about leveraging large language models. It is about orchestrating memory, reasoning, and tools into a unified, adaptive system capable of complex, autonomous decision-making. These components, when integrated effectively, allow agents to transcend the limitations of static automation and respond dynamically to user needs.

Modern agents excel at automating intricate workflows, operating across multiple modalities, and navigating ambiguity with contextual intelligence. However, these capabilities introduce new design challenges. Agentic systems are often resource-intensive, both computationally and architecturally. They demand thoughtful permissioning and robust security frameworks, particularly when handling sensitive data or interacting with external services. Moreover, their open-ended reasoning introduces the risk of unpredictable behaviors, especially in loosely defined or edge-case scenarios.

These tradeoffs prompt a key design consideration: Do we need an agentic workflow, or a full-fledged agent? The answer lies in the complexity and variability of the task. For straightforward, repetitive functions, simpler tools remain more efficient and easier to control. But in contexts requiring initiative, context retention, and system integration, agents provide a compelling and scalable solution.

As agentic technologies continue to evolve, the boundary between tool and collaborator will become increasingly fluid. The agent introduced in this work serves as a prototype of what such systems can achieve today, and as a foundation for what they may become in the future. Through the integration of memory, reasoning, and action, agents are poised to redefine how we interact with software, moving from static interfaces to context-aware, goal-oriented partners.

