Skip to content

GDG-KUL/ML6xGDGKUL_AIAccelerate2025

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

10 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Project Logo

Everything is in the README below but optionally you can checkout the slidedeck.

Google Cloud Platform (GCP) recources by GDG

You’ll get free Google Cloud access during the eventβ€”no credit card needed.

How to redeem:

Generating Gemini API Keys

After completing the steps to access the Google Cloud Console, you can generate your Gemini API Keys for free. You can find the keys be going to Google AI Studio (https://aistudio.google.com/api-keys) with the same google account used in the steps above. image

What is GCP?

Google Cloud Platform offers all the tools of the google cloud development environment to build your applications. It also offers free Gemini API Keys so you can leverage the full potential of generative AI during today's hacking!

These are some of the services the GCP offers ranging from backend & storage to scheduling and ML workloads. Some of them might be interesting to leverage during this case, especially the services enablign AI services!

  • Vertex AI : Train/host custom models, embeddings, batch prediction, Model Garden.
  • Vision API : Labels, OCR, object detection.
  • Speech-to-Text / Text-to-Speech : Voice features.
  • Translation API : Multilingual apps fast.
  • Vertex AI Workbench : Managed notebooks; GPUs/TPUs if available.

GCP offers a broad range of other services to allow for cloud deployment of applications. Feel free to check out the possibilities of GCP for your hacking today or future projects!

GDG Hackathon - ML6 Agent Challenge

Welcome! This repository contains everything you need to build and evaluate an AI agent using Google's Agent Development Kit (ADK).

Agent Illustration

🎯 The Challenge

"In the era of ChatGPT, clients have ever growing expectations of AI systems. They expect every question to be answered β€” instantly, flawlessly, and magically. From "Which came first β€” the chicken or the egg?" to "ChatGPT, can you solve my project?" β€” they want answers, now.

Unfortunately, although modern AI models can already generate impressive outputs, they still lack the tools and capabilities to solve the most complex questions some of our clients have.

Throughout the years, we have compiled these difficult questions into a dataset to evaluate our internal AI systems. But after years of development, we still have not been able to create a system that satisfies our most critical customers. So today, we are asking for your help!

Your challenge is to leverage an AI agent, augmenting their capabilities through the right design, tool use, prompting, and coordination β€” to find the correct answers to a wide range of complex questions that require multi-step reasoning and digging through files. From history to science, from video interpretation to riddles β€” your system should think, not just predict."

What You Need to Build

An AI agent that can interpret questions (and possibly files) and return accurate answers.

The questions require your agent to:

  • Autonomously extract information from files
  • Search the web for relevant information
  • Reason about information step-by-step
  • Handle multiple topics and modalities (text, images, PDFs, etc.)

What is an Agent?

An agent is an LLM in a loop with the ability to invoke tools (Python functions - which in turn can invoke APIs) and use the output of these tools in the next loop iteration.

This challenge will help you get hands-on experience with building agents and custom tools. Throughout the day, questions of increasing difficulty will be released sequentially - covering different topics, modalities, and durations to test your agent's accuracy and speed.

Deliverables

1. Technical Deliverable

  • Working AI agent code in this repository
  • Code must be readable and well-structured
  • The agent code MUST be the same as the one used in your demo

2. Presentation (Short & Functional)

Your presentation should have two sections:

Section One: Demo & Insights

  • Show your agent in action
  • Share key learnings and insights from development
  • Discuss interesting technical challenges you solved

Section Two: Real-World Application

  • Demonstrate a potential business application
  • Showcase ROI and measurable business value
  • Present how the agent delivers impact in a realistic context

⚠️ Important: Keep the presentation clear and accessible. Points will be deducted if it is too technical.

Judging Criteria

Criteria Weight Guiding Questions
Agent Answer Accuracy 40% Was the answer given by your agent the same as the ground truth answer in our hidden test set?
Agent Answer Speed 10% What was the average time required to generate correct answers for all questions?
Presentation Section One 20% Did we clearly see the agent in action? Did we receive interesting insights and learnings?
Presentation Section Two 30% Did we clearly see the potential and value of the agent for an original use case?

We will clone your repository and also evaluate the authenticity of your agent. To allow a smooth evaluation, make sure to properly update the python requirements if you are adding novel packages.

⚠️ Important Note on Accuracy: Your final accuracy score will be evaluated on a hidden test set that you don't have access to. This test set contains the same types of questions as your training set, but ensures fair evaluation. Use the training set (benchmark/train.json) to develop your agent, but avoid overfitting to it!

Scoring System:

We use a 100-point system divided across these criteria and teams.

The weight represents the total points available for that criterion. Points are distributed among teams based on performance:

  • Highest-performing team: 25% of the category's points
  • Second team: 15%
  • Third team: 10%
  • Fourth team: 8%
  • Fifth team: 6%
  • All other teams: Equally share the remaining points

The team with the most total points wins!

Submission Format

Code Submission:

  • Copy our repository and recreate it on your own repository by using a git fork (clone)
  • You can push your code to this newly created GitHub repository copy as desired (to your own repository). Pushing allows you to share your code with your teammates and allows us to see your progress to better help you along the way.
  • Give all of the ML6 evaluators access to this repository by (settings > collaborators): usernames are "clemvg", "MatthiasCami8", "TheoDepr", "cas-ML6".
  • Keep code readable and well-documented
  • Deadline: Last commit before the specified deadline will be evaluated!

Presentation Submission:

  • Share a link to your Google Slides presentation
  • Presentation must include: Intro, Section One (demo), and Section Two (business case)
  • Grant access to cas.coopman@ml6.eu for the slide deck AND any referenced files (videos, etc.)

What You'll Be Working On

Your main workspace: my_agent/ folder

This is where you'll spend most of your time:

  • my_agent/agent.py - Define your agent's configuration and capabilities
  • my_agent/tools/ - Add custom tools/functions for your agent to use

Other folders (scaffolding - do not modify):

  • utils/ - Infrastructure code for running and evaluating agents
  • benchmark/ - Train benchmark
  • evaluate.py - Evaluation script (feel free to read and understand it!)

Quick Start

1. Prerequisites

You'll need:

  • Python 3.9 or higher
  • A Google API key (We will provide one)

2. Install uv (Python package manager)

macOS/Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Or via Homebrew (macOS):

brew install uv

3. Setup the Project

Step 1: Copy the environment file

# From the project root, navigate to my_agent folder
cd my_agent

# Copy the example environment file
# macOS/Linux
cp .local_env .env

# Windows
copy .local_env .env

Step 2: Add your API key

Open my_agent/.env and replace the placeholder with your actual Google API key:

GOOGLE_API_KEY="your_actual_api_key_here"

Step 3: Install dependencies

# Go back to project root
cd ..

# Install all dependencies
uv sync

That's it! You're ready to start developing.

Development Workflow

Option 1: Interactive Development (Recommended)

Use the ADK web interface to test and debug your agent interactively:

uv run adk web

Then open http://127.0.0.1:8000 in your browser. This gives you:

  • Live chat interface to test your agent
  • Session history to review conversations
  • Real-time debugging

Note: Session history is lost when you stop the server! To view previous evaluation sessions, see the Advanced: Viewing Evaluations in the Web UI section below.

Option 2: Run Evaluations

Test your agent against the train dataset:

Evaluate all questions:

uv run python evaluate.py

Evaluate a specific question:

uv run python evaluate.py --question 0

(Replace 0 with any question index)

Save results to a custom file:

uv run python evaluate.py --output my_results.json

Results include:

  • Total accuracy percentage
  • Detailed breakdown per question
  • Agent responses and expected answers
  • Evaluation method used (string match or LLM judge)
  • Timing metrics - Response time for each question and overall averages

The evaluation tracks response times and provides:

  • Average Response Time (All): Average time across all questions
  • Average Response Time (Correct Only): Average time for correctly answered questions only

πŸ’‘ Tip: This evaluation runs on the training set (benchmark/train.json). Your final score will be based on a hidden test set with similar questions, so focus on building a robust, generalizable agent rather than memorizing answers!

How to Build Your Agent

1. Basic Agent Configuration

Open my_agent/agent.py:

from google.adk.agents import llm_agent
from my_agent.tools import web_search

root_agent = llm_agent.Agent(
    model='gemini-2.5-flash-lite',  # or other options such as 'gemini-2.0-flash'
    name='agent',
    description="A helpful assistant that can answer questions.",
    instruction="You are a helpful assistant...",  # Customize this!
    tools=[web_search.web_search],  # Add your tools here
)

Key things to customize:

  • instruction: This is your agent's system prompt - be specific about how it should behave
  • tools: Add custom tools to extend your agent's capabilities
  • model: Choose the best model for your use case

2. Adding Custom Tools

Create new tools in my_agent/tools/:

Example: my_agent/tools/calculator.py

def calculator(operation: str, a: float, b: float) -> float:
    """
    Performs basic arithmetic operations.

    Args:
        operation: The operation to perform (add, subtract, multiply, divide)
        a: First number
        b: Second number

    Returns:
        The result of the operation
    """
    if operation == "add":
        return a + b
    elif operation == "subtract":
        return a - b
    # ... etc

Then import and add it to your agent in agent.py:

from google.adk.agents import llm_agent
from my_agent.tools import web_search
from my_agent.tools import calculator

root_agent = llm_agent.Agent(
    # ...
    tools=[web_search.web_search, calculator.calculator],
)

3. Tips for Success

  • Start simple: Get a basic agent working first, then add complexity
  • Test frequently: Use uv run adk web to interactively test changes
  • Read the docs: Everything you need to know about using the ADK can be found in the official ADK documentation
  • Check out examples: Browse the ADK samples repository for inspiration and working examples
  • Understand the dataset: Look at questions in benchmark/train.json to understand what your agent needs to handle (some have attachments)
  • Iterate: Run evaluations, analyze failures, improve prompts/tools, repeat!

Advanced: Viewing Evaluations in the Web UI

If you want to see your evaluation runs in the web interface:

  1. First, start the web UI:

    uv run adk web
  2. Then run evaluations in a separate terminal:

    uv run python evaluate.py

    or

    uv run python evaluate.py --question 0

    (Replace 0 with any question index)

This way, all evaluation sessions will appear in the web UI's history (using the same dev_user ID).

Project Structure

gdg-hackathon-prep/
β”œβ”€β”€ my_agent/              # ← YOUR WORKSPACE
β”‚   β”œβ”€β”€ agent.py          # ← Define your agent here
β”‚   β”œβ”€β”€ tools/            # ← Add custom tools here
β”‚   β”‚   β”œβ”€β”€ web_search.py # Example tool
β”‚   β”œβ”€β”€ .local_env        # Example environment file
β”‚   └── .env              # Your API key (create this!)
β”œβ”€β”€ scripts/              # Scaffolding (don't modify)
β”‚   └── server.py         # Agent runner infrastructure
β”œβ”€β”€ benchmark/            # Train dataset (read-only)
β”‚   β”œβ”€β”€ attachments/      # Files to answer some questions
β”œβ”€β”€ evaluate.py           # Evaluation script
β”œβ”€β”€ pyproject.toml        # Project dependencies
└── README.md             # This file

Troubleshooting

"Module not found" errors:

uv sync

API key issues:

  • Make sure you copied .local_env to .env in the my_agent/ folder
  • Verify the API key variable has been set

Port already in use:

# Kill the process using port 8000
lsof -ti:8000 | xargs kill -9

Resources

Official Documentation & Examples

Agent Design & Best Practices

Getting Help

  • Documentation: Almost everything you need can be found in the official ADK docs linked above
  • Other Issues: Feel free to reach out to one of the ML6 ML Engineers walking around

Good luck building your agent! πŸš€

About ML6

ML6 is a frontier, international AI engineering company, constantly pushing the boundaries of what's possible with AI. We partner with bold leaders to turn cutting-edge AI into lasting business impact. With over a decade of proven expertise, we deliver AI that reshapes business models. AI that is reliable and secure, ensuring a lasting impact. From strategy to delivery, we don't just follow the hypeβ€”we build the future.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages