Everything is in the README below but optionally you can checkout the slidedeck.
Youβll get free Google Cloud access during the eventβno credit card needed.
-
Make sure youβre signed into a Gmail/Google account.
-
Open our eventβs unique link: https://trygcp.dev/claim/gdg-other-ai-accelerate-hack
-
Follow the prompts to activate your account and access the Google Cloud Console.
After completing the steps to access the Google Cloud Console, you can generate your Gemini API Keys for free. You can find the keys be going to Google AI Studio (https://aistudio.google.com/api-keys) with the same google account used in the steps above.

Google Cloud Platform offers all the tools of the google cloud development environment to build your applications. It also offers free Gemini API Keys so you can leverage the full potential of generative AI during today's hacking!
These are some of the services the GCP offers ranging from backend & storage to scheduling and ML workloads. Some of them might be interesting to leverage during this case, especially the services enablign AI services!
- Vertex AI : Train/host custom models, embeddings, batch prediction, Model Garden.
- Vision API : Labels, OCR, object detection.
- Speech-to-Text / Text-to-Speech : Voice features.
- Translation API : Multilingual apps fast.
- Vertex AI Workbench : Managed notebooks; GPUs/TPUs if available.
GCP offers a broad range of other services to allow for cloud deployment of applications. Feel free to check out the possibilities of GCP for your hacking today or future projects!
Welcome! This repository contains everything you need to build and evaluate an AI agent using Google's Agent Development Kit (ADK).
"In the era of ChatGPT, clients have ever growing expectations of AI systems. They expect every question to be answered β instantly, flawlessly, and magically. From "Which came first β the chicken or the egg?" to "ChatGPT, can you solve my project?" β they want answers, now.
Unfortunately, although modern AI models can already generate impressive outputs, they still lack the tools and capabilities to solve the most complex questions some of our clients have.
Throughout the years, we have compiled these difficult questions into a dataset to evaluate our internal AI systems. But after years of development, we still have not been able to create a system that satisfies our most critical customers. So today, we are asking for your help!
Your challenge is to leverage an AI agent, augmenting their capabilities through the right design, tool use, prompting, and coordination β to find the correct answers to a wide range of complex questions that require multi-step reasoning and digging through files. From history to science, from video interpretation to riddles β your system should think, not just predict."
An AI agent that can interpret questions (and possibly files) and return accurate answers.
The questions require your agent to:
- Autonomously extract information from files
- Search the web for relevant information
- Reason about information step-by-step
- Handle multiple topics and modalities (text, images, PDFs, etc.)
What is an Agent?
An agent is an LLM in a loop with the ability to invoke tools (Python functions - which in turn can invoke APIs) and use the output of these tools in the next loop iteration.
This challenge will help you get hands-on experience with building agents and custom tools. Throughout the day, questions of increasing difficulty will be released sequentially - covering different topics, modalities, and durations to test your agent's accuracy and speed.
1. Technical Deliverable
- Working AI agent code in this repository
- Code must be readable and well-structured
- The agent code MUST be the same as the one used in your demo
2. Presentation (Short & Functional)
Your presentation should have two sections:
Section One: Demo & Insights
- Show your agent in action
- Share key learnings and insights from development
- Discuss interesting technical challenges you solved
Section Two: Real-World Application
- Demonstrate a potential business application
- Showcase ROI and measurable business value
- Present how the agent delivers impact in a realistic context
| Criteria | Weight | Guiding Questions |
|---|---|---|
| Agent Answer Accuracy | 40% | Was the answer given by your agent the same as the ground truth answer in our hidden test set? |
| Agent Answer Speed | 10% | What was the average time required to generate correct answers for all questions? |
| Presentation Section One | 20% | Did we clearly see the agent in action? Did we receive interesting insights and learnings? |
| Presentation Section Two | 30% | Did we clearly see the potential and value of the agent for an original use case? |
We will clone your repository and also evaluate the authenticity of your agent. To allow a smooth evaluation, make sure to properly update the python requirements if you are adding novel packages.
benchmark/train.json) to develop your agent, but avoid overfitting to it!
Scoring System:
We use a 100-point system divided across these criteria and teams.
The weight represents the total points available for that criterion. Points are distributed among teams based on performance:
- Highest-performing team: 25% of the category's points
- Second team: 15%
- Third team: 10%
- Fourth team: 8%
- Fifth team: 6%
- All other teams: Equally share the remaining points
The team with the most total points wins!
Code Submission:
- Copy our repository and recreate it on your own repository by using a git fork (clone)
- You can push your code to this newly created GitHub repository copy as desired (to your own repository). Pushing allows you to share your code with your teammates and allows us to see your progress to better help you along the way.
- Give all of the ML6 evaluators access to this repository by (settings > collaborators): usernames are "clemvg", "MatthiasCami8", "TheoDepr", "cas-ML6".
- Keep code readable and well-documented
- Deadline: Last commit before the specified deadline will be evaluated!
Presentation Submission:
- Share a link to your Google Slides presentation
- Presentation must include: Intro, Section One (demo), and Section Two (business case)
- Grant access to
cas.coopman@ml6.eufor the slide deck AND any referenced files (videos, etc.)
Your main workspace: my_agent/ folder
This is where you'll spend most of your time:
my_agent/agent.py- Define your agent's configuration and capabilitiesmy_agent/tools/- Add custom tools/functions for your agent to use
Other folders (scaffolding - do not modify):
utils/- Infrastructure code for running and evaluating agentsbenchmark/- Train benchmarkevaluate.py- Evaluation script (feel free to read and understand it!)
You'll need:
- Python 3.9 or higher
- A Google API key (We will provide one)
macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | shWindows:
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"Or via Homebrew (macOS):
brew install uvStep 1: Copy the environment file
# From the project root, navigate to my_agent folder
cd my_agent
# Copy the example environment file
# macOS/Linux
cp .local_env .env
# Windows
copy .local_env .envStep 2: Add your API key
Open my_agent/.env and replace the placeholder with your actual Google API key:
GOOGLE_API_KEY="your_actual_api_key_here"
Step 3: Install dependencies
# Go back to project root
cd ..
# Install all dependencies
uv syncThat's it! You're ready to start developing.
Use the ADK web interface to test and debug your agent interactively:
uv run adk webThen open http://127.0.0.1:8000 in your browser. This gives you:
- Live chat interface to test your agent
- Session history to review conversations
- Real-time debugging
Note: Session history is lost when you stop the server! To view previous evaluation sessions, see the Advanced: Viewing Evaluations in the Web UI section below.
Test your agent against the train dataset:
Evaluate all questions:
uv run python evaluate.pyEvaluate a specific question:
uv run python evaluate.py --question 0(Replace 0 with any question index)
Save results to a custom file:
uv run python evaluate.py --output my_results.jsonResults include:
- Total accuracy percentage
- Detailed breakdown per question
- Agent responses and expected answers
- Evaluation method used (string match or LLM judge)
- Timing metrics - Response time for each question and overall averages
The evaluation tracks response times and provides:
- Average Response Time (All): Average time across all questions
- Average Response Time (Correct Only): Average time for correctly answered questions only
π‘ Tip: This evaluation runs on the training set (benchmark/train.json). Your final score will be based on a hidden test set with similar questions, so focus on building a robust, generalizable agent rather than memorizing answers!
Open my_agent/agent.py:
from google.adk.agents import llm_agent
from my_agent.tools import web_search
root_agent = llm_agent.Agent(
model='gemini-2.5-flash-lite', # or other options such as 'gemini-2.0-flash'
name='agent',
description="A helpful assistant that can answer questions.",
instruction="You are a helpful assistant...", # Customize this!
tools=[web_search.web_search], # Add your tools here
)Key things to customize:
instruction: This is your agent's system prompt - be specific about how it should behavetools: Add custom tools to extend your agent's capabilitiesmodel: Choose the best model for your use case
Create new tools in my_agent/tools/:
Example: my_agent/tools/calculator.py
def calculator(operation: str, a: float, b: float) -> float:
"""
Performs basic arithmetic operations.
Args:
operation: The operation to perform (add, subtract, multiply, divide)
a: First number
b: Second number
Returns:
The result of the operation
"""
if operation == "add":
return a + b
elif operation == "subtract":
return a - b
# ... etcThen import and add it to your agent in agent.py:
from google.adk.agents import llm_agent
from my_agent.tools import web_search
from my_agent.tools import calculator
root_agent = llm_agent.Agent(
# ...
tools=[web_search.web_search, calculator.calculator],
)- Start simple: Get a basic agent working first, then add complexity
- Test frequently: Use
uv run adk webto interactively test changes - Read the docs: Everything you need to know about using the ADK can be found in the official ADK documentation
- Check out examples: Browse the ADK samples repository for inspiration and working examples
- Understand the dataset: Look at questions in
benchmark/train.jsonto understand what your agent needs to handle (some have attachments) - Iterate: Run evaluations, analyze failures, improve prompts/tools, repeat!
If you want to see your evaluation runs in the web interface:
-
First, start the web UI:
uv run adk web
-
Then run evaluations in a separate terminal:
uv run python evaluate.py
or
uv run python evaluate.py --question 0
(Replace
0with any question index)
This way, all evaluation sessions will appear in the web UI's history (using the same dev_user ID).
gdg-hackathon-prep/
βββ my_agent/ # β YOUR WORKSPACE
β βββ agent.py # β Define your agent here
β βββ tools/ # β Add custom tools here
β β βββ web_search.py # Example tool
β βββ .local_env # Example environment file
β βββ .env # Your API key (create this!)
βββ scripts/ # Scaffolding (don't modify)
β βββ server.py # Agent runner infrastructure
βββ benchmark/ # Train dataset (read-only)
β βββ attachments/ # Files to answer some questions
βββ evaluate.py # Evaluation script
βββ pyproject.toml # Project dependencies
βββ README.md # This file
"Module not found" errors:
uv syncAPI key issues:
- Make sure you copied
.local_envto.envin themy_agent/folder - Verify the API key variable has been set
Port already in use:
# Kill the process using port 8000
lsof -ti:8000 | xargs kill -9- ADK Documentation: https://google.github.io/adk-docs/ - Complete guide on how to use the ADK
- ADK Samples: https://github.com/google/adk-samples - Working examples for inspiration
- Gemini API Docs: https://ai.google.dev/docs - Reference for Gemini models
- Building Effective Agents: https://www.anthropic.com/engineering/building-effective-agents - Anthropic's guide on agent design patterns and workflows
- Writing Tools for Agents: https://www.anthropic.com/engineering/writing-tools-for-agents - Best practices for creating effective agent tools
- Documentation: Almost everything you need can be found in the official ADK docs linked above
- Other Issues: Feel free to reach out to one of the ML6 ML Engineers walking around
Good luck building your agent! π
ML6 is a frontier, international AI engineering company, constantly pushing the boundaries of what's possible with AI. We partner with bold leaders to turn cutting-edge AI into lasting business impact. With over a decade of proven expertise, we deliver AI that reshapes business models. AI that is reliable and secure, ensuring a lasting impact. From strategy to delivery, we don't just follow the hypeβwe build the future.
