FoodFight Agent

4/2/2025

Problem Statement

Internal operations like reading active contests, creating test bets, and querying platform state all require navigating the dev panel. This is slow for our team and confusing for teammates who are less familiar with the panel. There is currently no conversational interface for interacting with the FoodFight platform programmatically.

To address this, we can introduce an agentic workflow that understands the platform domain and executes actions on behalf of a user through natural language. The two immediate surfaces are the dev panel and a Mattermost bot for the broader team. The initial scope is contest reading and creation, with a clear path to expanding to other actions (querying user state, triggering notifications, user analytics, etc.) as new tools are added. Longer term, letting players trigger their own contests directly from the FoodFight platform could expand the agent beyond internal operations into a player-facing feature.

Proposed Solution

Deploy a FoodFight Agent as an async Lambda handler backed by the OpenAI Agents SDK. The agent can be given a set of Python function tools that call our existing microservices (e.g. bet_service, user_service) and utilize our libs. Conversation history is persisted per-session in Postgres as a JSONB column so the Lambda remains stateless and each invocation picks up exactly where the last left off.

Two clients invoke the same Lambda endpoint:

  • Dev Panel — a chat UI embedded in the existing internal dashboard, calling the Lambda via API Gateway
  • Mattermost Bot — a webhook integration that forwards messages to the same endpoint and posts replies back to the channel

Both surfaces pass a session_id and message and receive a response. The agent logic, tool definitions, and history management are identical for both. For simplicity and reliability, this follows a request/response pattern: the agent processes the message, calls tools as needed, and returns a final response in a single invocation. If processing regularly exceeds API Gateway's 29-second limit, we can switch to an async invocation model with WebSockets or polling; that seems excessive for the expected complexity of the agent's tasks and its anticipated usage, so we start synchronous.
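The shared contract between both surfaces is small enough to sketch directly. The field names (session_id, message, response) come from the design above; the dataclass wrappers themselves are illustrative, not a committed API:

```python
from dataclasses import dataclass

@dataclass
class AgentRequest:
    session_id: str   # Cognito user ID (dev panel) or channel/user ID (Mattermost)
    message: str      # raw natural-language input from the user

@dataclass
class AgentResponse:
    response: str     # the agent's final output for this turn

# Example: a dev-panel request (the "dev-panel:42" ID format is hypothetical)
req = AgentRequest(session_id="dev-panel:42", message="List active promo fights")
resp = AgentResponse(response="There are 3 active promo fights.")
```

Keeping the contract this narrow is what lets both clients share one Lambda with no surface-specific branching in the agent logic.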

Architectural & Technical Details

  Dev Panel  ──REST──▶  API Gateway (Cognito Auth)
  Mattermost ──Webhook─▶        │
                                ▼
                          Agent Lambda
                          ├─ load history (Postgres / RDS Proxy)
                          ├─ Runner.run(agent)
                          │    ├─ get_promo_fights   ──▶ backend/libs (SQLAlchemy)
                          │    ├─ create_promo_fight ──▶ backend/libs (SQLAlchemy)
                          │    └─ accept_promo_fight ──▶ backend/libs (SQLAlchemy)
                          └─ save history (Postgres / RDS Proxy)

Request lifecycle: API Gateway authenticates the request and forwards { session_id, message } to the Lambda. The Lambda loads the session’s JSONB history, appends the user message, and calls Runner.run(). The SDK handles the internal loop — it calls the model, invokes any requested tools, and feeds results back until a final response is produced. The updated history (result.to_input_list()) is upserted back to Postgres and the response returned to the caller.

Tool implementation — Tools are @function_tool decorated Python functions. The agent treats them as black boxes — it only sees the name, docstring, and parameters. The implementation detail of how a tool fetches or writes data is irrelevant to the agent.

Since this is a monorepo and bet_service already runs as a Lambda backed by backend/libs, the agent Lambda can import the same shared libs directly — using the SQLAlchemy models and service layer from libs/db and libs/schemas to query and write to the DB without any inter-service call. This is the simplest approach: no HTTP overhead, no Lambda-to-Lambda invocation, no API Gateway cost per tool call.

Promo fights are the FoodFight concept for contests. The relevant schemas are LiveBetBase (creation, with bet_type="promotion") and PromoBase (accept/delete).

  • get_promo_fights(restaurant_id: int | None) — queries DB via libs/db models, filtered by restaurant or all venues
  • create_promo_fight(restaurant_id: int, maker_outcome: int, restaurant_items: list[MenuItemOrder], takeout_type: int, maker_address: str, maker_payment_intent: str) — writes to DB using LiveBetBase with bet_type="promotion"
  • accept_promo_fight(restaurant_id: int, user_preferred_outcome: int, bet_id: int | None, menu_item_ids: list[int] | None) — writes to DB using PromoBase

An alternative is to have tools invoke bet_service directly over HTTP rather than importing libs. This keeps a cleaner service boundary but comes with meaningful downsides in an agentic context: a single agent turn can trigger multiple tool calls, so each HTTP round-trip compounds — adding latency on top of the LLM call latency already present. Beyond latency, it also means managing internal service auth (the agent Lambda would need a valid token or IAM-based service-to-service auth), handling service availability/retries as a separate failure mode, and paying API Gateway invocation costs per tool call. Since the agent treats the tool implementation as a black box regardless, there’s no benefit to the extra indirection at this stage.

Session storage

CREATE TABLE agent_sessions (
    session_id  UUID PRIMARY KEY,
    history     JSONB NOT NULL DEFAULT '[]',
    created_at  TIMESTAMP DEFAULT NOW(),
    updated_at  TIMESTAMP DEFAULT NOW()
);

Use SELECT ... FOR UPDATE inside a transaction when loading history to prevent race conditions on concurrent messages in the same session (the row lock is only held for the duration of the transaction). History grows with each turn; we can introduce truncation or summarization if it becomes unwieldy.
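One hypothetical truncation strategy: keep the first item (e.g. system context) plus the most recent N items and drop the middle. Note that naive truncation can orphan a tool result from its tool call, so a real implementation would trim on turn boundaries:

```python
def truncate_history(history: list[dict], max_items: int = 50) -> list[dict]:
    """Keep the first item plus the newest (max_items - 1) items."""
    if len(history) <= max_items:
        return history
    return history[:1] + history[-(max_items - 1):]

history = [{"role": "system", "content": "ctx"}] + [
    {"role": "user", "content": f"msg {i}"} for i in range(100)
]
trimmed = truncate_history(history, max_items=10)
# First item survives, the oldest user messages are dropped,
# and the newest messages are intact.
```

Summarization (replacing the dropped middle with a model-written summary) preserves more context but adds an extra LLM call per compaction, so truncation is the cheaper starting point.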

Infrastructure notes

  • New Lambda endpoint with a 30–60s timeout to allow for multiple LLM round-trips in a single turn. API Gateway enforces a 29s integration timeout by default; if turns regularly exceed that, we may need async invocation with WebSockets or polling, or can request an integration-timeout quota increase from AWS.
  • Mattermost auth via shared secret in Secrets Manager instead of Cognito

    Surface          Auth            Session ID
    Dev Panel        Cognito JWT     User ID
    Mattermost Bot   Shared secret   Channel or user ID
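Mattermost outgoing webhooks include a token field in each payload, so the shared-secret check reduces to a constant-time comparison. A minimal sketch (fetching the expected token from Secrets Manager is omitted):

```python
import hmac

def verify_mattermost_token(received_token: str, expected_token: str) -> bool:
    """Constant-time comparison of the webhook token against our secret."""
    # compare_digest avoids leaking match length via timing side channels
    return hmac.compare_digest(received_token, expected_token)

ok = verify_mattermost_token("s3cret", "s3cret")
bad = verify_mattermost_token("wrong", "s3cret")
```

Requests failing this check should be rejected before any history is loaded or the agent is run.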

Code Snippet

The agent runs inside an async Lambda handler. Conversation history is persisted as a JSONB column in Postgres so each invocation is stateless — the Lambda loads history, runs the agent, and writes the updated history back.

# pseudocode — not production ready

import json

async def handler(event, context):
    session_id = event["session_id"]
    user_message = event["message"]

    # FOR UPDATE only holds the row lock while a transaction is open,
    # so the load/run/save cycle is wrapped in one. Note: FOR UPDATE
    # cannot lock a row that doesn't exist yet; the upsert's ON CONFLICT
    # handles the first-message race.
    async with db.transaction():
        raw = await db.fetchval(
            "SELECT history FROM agent_sessions WHERE session_id = $1 FOR UPDATE",
            session_id,
        )
        history = json.loads(raw) if raw else []

        history.append({"role": "user", "content": user_message})

        result = await Runner.run(agent, input=history)

        await db.execute(
            """
            INSERT INTO agent_sessions (session_id, history)
            VALUES ($1, $2)
            ON CONFLICT (session_id) DO UPDATE
                SET history = $2, updated_at = NOW()
            """,
            session_id,
            json.dumps(result.to_input_list()),
        )

    return {"response": result.final_output}

result.to_input_list() returns the full updated history including tool call requests and results — this is what gives the model complete context on prior tool invocations in subsequent turns.
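The exact item schema is the SDK's concern, but the persisted history is roughly a flat list where tool requests and results sit inline between messages. An illustrative shape after one turn with one tool call (field names approximate):

```python
# Illustrative only: the precise fields are defined by the Agents SDK.
history = [
    {"role": "user", "content": "Any promo fights at restaurant 7?"},
    {"type": "function_call", "call_id": "call_1",
     "name": "get_promo_fights", "arguments": '{"restaurant_id": 7}'},
    {"type": "function_call_output", "call_id": "call_1",
     "output": "[]"},
    {"role": "assistant", "content": "No active promo fights at restaurant 7."},
]
```

Because the call and its output are stored inline, a follow-up like "create one then" lets the model see exactly which restaurant was already queried and what came back.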

Alternatives

OpenAI Agents SDK vs. Anthropic Claude Agent SDK

We are using the OpenAI Agents SDK and plan to stay there. We have free credits and the cost profile is significantly cheaper at our current scale. The SDK surface is also the right fit for this use case: @function_tool, RunHooks, Runner.run(), and to_input_list() cover everything we need with minimal boilerplate.

Switching SDKs is a non-trivial migration. Tools are not portable — the two SDKs have meaningfully different APIs at every layer:

  • Tool definition: OpenAI uses @function_tool with typed params and auto-schema from type hints; Anthropic uses @tool(name, desc, schema_dict) with explicit schemas
  • Tool arguments: typed function parameters (def get_promo_fights(restaurant_id: int)) vs. dict-based (def get_promo_fights(args: dict))
  • Tool return: plain string or object vs. wrapped in {"content": [{"type": "text", "text": ...}]}
  • Run loop: await Runner.run(agent, input=history) vs. async for msg in query(prompt=..., options=...)
  • Conversation history: manual via result.to_input_list() or Session backends vs. automatic session resumption (resume=session_id)
  • Hooks: subclass RunHooks and override on_tool_start / on_tool_end vs. callback functions registered with HookMatcher regex patterns

A migration would require rewriting every @function_tool decorator and function signature, all history management, the run loop call sites, and any hooks. Realistically a few days of work for this agent, with risk of subtle behavioural differences.

The Anthropic SDK would be worth revisiting if we need its built-in tools (Read, Write, Bash, Grep, etc.) or better performance on complex multi-tool tasks at higher volume, but it is not a drop-in swap.

Why MCP Is Not Necessary Here

MCP (Model Context Protocol) is useful when tools live in separate processes or external servers that need to be discovered over a network boundary — e.g. a third-party SaaS vendor or a shared tool server across many agents.

For the FoodFight Agent, tools are in-process Python functions in the same Lambda package calling our own services. MCP would add a server/client protocol layer with no benefit. Worth revisiting only if we want to expose FoodFight tools to external agents or third-party systems.

Why not the raw Chat Completions API (client.chat.completions.create())?

While we could implement a simple turn-based agent loop ourselves using the standard ChatCompletion API, the OpenAI Agents SDK provides a lot of value out of the box:

  • Automatic tool schema generation from Python function signatures and docstrings
  • Built-in support for multi-turn conversations with tool calls and results included in the context
  • Hooks for logging, analytics, or custom behavior on tool calls
  • A clean abstraction layer that keeps the agent logic focused on defining tools and handling results, rather than managing the conversation loop and context formatting manually
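To make the trade-off concrete, here is a sketch of the manual tool-call loop the SDK replaces. call_model stands in for a raw chat-completions call; a real loop would also translate tool schemas and parse the provider's response format:

```python
def run_manual_loop(call_model, tools: dict, history: list) -> str:
    """Repeatedly call the model, executing requested tools, until it finishes."""
    while True:
        reply = call_model(history)
        if reply["type"] == "final":
            history.append({"role": "assistant", "content": reply["content"]})
            return reply["content"]
        # Model asked for a tool: invoke it and feed the result back.
        result = tools[reply["name"]](**reply["arguments"])
        history.append({"type": "function_call", "name": reply["name"]})
        history.append({"type": "function_call_output", "output": result})

# Stubbed model: first requests a tool, then produces a final answer.
replies = iter([
    {"type": "tool", "name": "get_promo_fights",
     "arguments": {"restaurant_id": 7}},
    {"type": "final", "content": "No promo fights found."},
])
answer = run_manual_loop(
    lambda history: next(replies),
    {"get_promo_fights": lambda restaurant_id: "[]"},
    [{"role": "user", "content": "Any promo fights?"}],
)
```

Even this toy version has to track call/result pairing and loop termination; Runner.run() absorbs all of it, which is the main argument for the SDK over a hand-rolled loop.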

Next Steps

  • Finalize tool signatures for get_promo_fights, create_promo_fight, and accept_promo_fight
  • Wire up dev panel chat UI to the Lambda endpoint
  • Set up Mattermost webhook integration

Open Questions

  • Should session history be scoped per-user or per-channel for the Mattermost bot?
  • Do we want the agent to have write access to bet_service from day one, or start read-only?
  • History will grow unbounded — what is the truncation or summarization strategy for long sessions?

Approvals

Architectural approval is required from Trace Carrasco, and product approval from Filip Pacyna / Troy Lenihan.

  • Architecture: Trace Carrasco
  • Product: Filip Pacyna / Troy Lenihan