Deployment in a High-code Manner #908

DavdGao (Maintainer) · 2025-11-07T02:38:30Z

Hi, community! Thanks for your interest in and support for AgentScope.

We'd like to use this category "Design Book of AgentScope" as a space to share and discuss ideas about the design choices behind AgentScope. In this first post, we'll demonstrate how to deploy agents using a high-code approach, with a focus on:

Multi-agent systems,
Custom agent classes, and
State management in agent systems.

We'll use a routing agent as our example: a typical multi-agent system that routes user requests to different sub-agents based on request content. It illustrates multi-agent deployment in a high-code manner.

AgentScope-Runtime provides a one-stop solution for single-agent deployment and secure tool sandboxing. Therefore, this post focuses primarily on multi-agent and custom agent deployment.

Key Questions to Ask

Before diving into deployment, it's helpful to think through a few key questions about your system:

Q: How will you handle incoming requests? (e.g., by thread, by process, by async task, etc.)

This will affect the design of your agent system. For example, in Flask, each request is handled in a new thread, with both the server and requests running in a single Python process. In such cases, thread-safety becomes critical—you need to be careful about using global variables, as different requests may interfere with each other.
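As a rough illustration of the thread-safety concern, the sketch below contrasts module-level state with state created inside each request. The handler names and the `results` collector are hypothetical and only serve the demo; no web framework is involved.

```python
import threading

# Anti-pattern: module-level state shared by every request thread.
shared_history: list[str] = []


def handle_with_global(user: str, text: str) -> None:
    # Every request appends to the same list, so users end up in one history.
    shared_history.append(f"{user}: {text}")


def handle_isolated(user: str, text: str, results: dict[str, list[str]]) -> None:
    # The history is created inside the request, so nothing leaks between users.
    history = [f"{user}: {text}"]
    # `results` is only a collector for this demo, keyed by user.
    results[user] = history


results: dict[str, list[str]] = {}
threads = []
for user in ("alice", "bob", "carol"):
    threads.append(threading.Thread(target=handle_with_global, args=(user, "hi")))
    threads.append(
        threading.Thread(target=handle_isolated, args=(user, "hi", results)),
    )
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the threads finish, `shared_history` mixes the turns of all three users, while each entry in `results` contains only that user's own turn.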

Q: Is your application multi-tenant or single-tenant?

This determines how you'll manage and isolate state across different users. Don't worry though—AgentScope provides application-level state management, which we'll discuss later.

Tip: Application-level state management can cover multiple agents, including custom agent classes that inherit from AgentBase or ReActAgentBase, not just the official ReActAgent class provided by AgentScope.

Q: How will you send messages to the frontend?

This decides what your endpoint function should return. If messages need to be streamed back through the request endpoint, your endpoint should return a generator.

Alternatively, you can send messages directly to the frontend or a pub/sub system (like Redis). In this case, the endpoint function can simply return a basic response, like "200 OK".
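The two return styles can be sketched without any web framework. In this minimal example, `stream_endpoint`, `publish_endpoint`, and the in-process `outbox` queue (standing in for a pub/sub system such as Redis) are all hypothetical names, and the server-sent-events framing is just one possible wire format.

```python
import json
import queue
from typing import Iterator


def stream_endpoint(chunks: list[str]) -> Iterator[str]:
    # Streaming style: the endpoint is a generator, and the framework
    # sends each yielded chunk to the client as it becomes available.
    for chunk in chunks:
        yield f"data: {json.dumps({'text': chunk})}\n\n"


# Stand-in for a pub/sub channel; a real deployment might use Redis.
outbox: "queue.Queue[str]" = queue.Queue()


def publish_endpoint(chunks: list[str]) -> tuple[str, int]:
    # Pub/sub style: messages go to the channel, and the HTTP response
    # itself only acknowledges the request.
    for chunk in chunks:
        outbox.put(chunk)
    return "OK", 200


streamed = list(stream_endpoint(["Hello", "world"]))
ack = publish_endpoint(["Hello", "world"])
```

In the first style the frontend reads from the HTTP response; in the second it subscribes to the channel and the response body carries no agent output.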

Q: How will you expose agents to users?

This depends on your user interaction design. Here are two approaches with their trade-offs:

Approach	Advantages	Disadvantages
Expose sub-agents to users	Clearly shows the multi-agent system at work and how sub-agents complete subtasks	May confuse users. For example: "Can I interact with sub-agents directly, or only through the main agent?"
Only expose the main agent	Focused, clear, and easier to understand.	Still need to show sub-agents' progress to avoid users feeling like nothing is happening

This choice will determine how you handle sub-agent messages:

By exposing the sub-agent directly to the user (frontend), or
As tool results from a function like create_worker.

Note: When using sub-agents as tools, remember to compress their execution logs in the tool results. Otherwise, lengthy results will bloat the main agent's context, undermining the context isolation benefit of multi-agent architecture.

Putting It into Practice

Once you've thought through these questions, let's focus on the agent application itself rather than infrastructure concerns like concurrency or latency.

We'll cover two key aspects:

State management
Frontend display

For illustration purposes, we'll use Quart as our web framework in the examples. The streaming approach (where endpoints return generators) will be covered in the Frontend Display section.

State Management

Understanding State in AgentScope

A common misconception is that only memory constitutes an agent's state. That's not necessarily true. In AgentScope, a PlanNotebook instance, a Toolkit instance, or even custom attributes can all be part of the state. Here are some examples:

Example 1: Planning state
AgentScope's planning capability, supported by PlanNotebook, allows agents to request additional information from users during execution. This means plan execution spans multiple user requests, requiring us to maintain the execution state across them.

Example 2: Toolkit state
When using group-wise management or meta tools in AgentScope, the toolkit tracks activated tool groups, which must also persist across requests.

Example 3: Custom attributes
Imagine you want to surprise users on their 100th conversation. You'd add a counter to track conversations and trigger the surprise at 100:

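from typing import Any

from agentscope.agent import AgentBase
from agentscope.message import Msg


class MyAgent(AgentBase):
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__()
        ...
        self.counter = 0  # Conversation counter
        # Register the counter so it is saved and loaded with the agent state
        self.register_state("counter")

    async def reply(self, *args: Any, **kwargs: Any) -> Msg:
        self.counter += 1

        if self.counter == 100:
            # Trigger the surprise
            ...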

This counter is clearly part of the state that needs to persist.

These examples illustrate why AgentScope provides the StateModule class for state management. It handles state in two ways:

Nested StateModules: If an attribute is itself a StateModule instance, it's automatically included in the parent's state. For example, if a ReActAgent instance has a plan_notebook attribute that is a PlanNotebook instance (both are StateModule subclasses), the notebook's state is automatically saved/loaded with the agent.
Primitive types: For regular attributes like integers or floats, use register_state() to include them in the state, optionally with custom save/load logic.
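To make the two rules concrete, here is a toy re-implementation of the idea. It is not AgentScope's actual StateModule code: the `Toy*` classes and the `state_dict()` / `load_state_dict()` method names are assumptions chosen for illustration, and custom save/load hooks are left out.

```python
from typing import Any


class ToyStateModule:
    """Toy stand-in for the StateModule idea (not AgentScope's code)."""

    def __init__(self) -> None:
        self._registered: list[str] = []

    def register_state(self, name: str) -> None:
        # Rule 2: plain attributes must be registered explicitly.
        self._registered.append(name)

    def state_dict(self) -> dict[str, Any]:
        state: dict[str, Any] = {}
        # Rule 1: nested modules are picked up automatically.
        for name, value in vars(self).items():
            if isinstance(value, ToyStateModule):
                state[name] = value.state_dict()
        for name in self._registered:
            state[name] = getattr(self, name)
        return state

    def load_state_dict(self, state: dict[str, Any]) -> None:
        for name, value in state.items():
            current = getattr(self, name, None)
            if isinstance(current, ToyStateModule):
                current.load_state_dict(value)
            else:
                setattr(self, name, value)


class ToyNotebook(ToyStateModule):
    def __init__(self) -> None:
        super().__init__()
        self.current_plan = None
        self.register_state("current_plan")


class ToyAgent(ToyStateModule):
    def __init__(self) -> None:
        super().__init__()
        self.plan_notebook = ToyNotebook()  # nested module, no registration
        self.counter = 0
        self.register_state("counter")


agent = ToyAgent()
agent.counter = 99
agent.plan_notebook.current_plan = "book flights"
snapshot = agent.state_dict()

restored = ToyAgent()
restored.load_state_dict(snapshot)
```

The snapshot nests the notebook's state under the `plan_notebook` key, which is why saving the agent is enough to capture everything it owns.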

This mechanism enables AgentScope to seamlessly support custom agent implementations. As long as your custom agent inherits from AgentBase or ReActAgentBase, its state will be automatically managed without extra effort from developers.

Using Session Modules

Here's how to manage state with session modules:

from agentscope.agent import ReActAgent
from agentscope.session import JSONSession


@app.route("/chat", methods=["POST"])
async def chat_endpoint():
    session_id = request.json.get("session_id")

    # Initialize the session management module
    session = JSONSession(save_dir="....")

    # Create the agent objects
    agent1 = ReActAgent(...)
    agent2 = ReActAgent(...)

    # Load the state before executing the application
    await session.load_session_state(
        session_id=session_id,
        agent1=agent1,
        agent2=agent2,
    )

    # Your application logic here
    ...

    # Save the state after handling the request
    await session.save_session_state(
        session_id=session_id,
        agent1=agent1,
        agent2=agent2,
    )

In practice, developers can create their own session management class by inheriting from SessionBase to customize how session states are saved and loaded, for example to load and save states from a cloud database.
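A database-backed session might look roughly like the sketch below. It uses SQLite from the standard library as a stand-in for a cloud database and deliberately does not import AgentScope: `SQLiteSession` and `CounterModule` are hypothetical, the method names simply mirror the snippet above, and the modules are assumed to expose `state_dict()` and `load_state_dict()`. In a real application the class would inherit from SessionBase and follow its actual abstract interface.

```python
import asyncio
import json
import sqlite3
from typing import Any


class SQLiteSession:
    """Hypothetical session store; real code would inherit from SessionBase."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, state TEXT)",
        )

    async def save_session_state(self, session_id: str, **modules: Any) -> None:
        # Serialize every module's state under its keyword name.
        state = {name: module.state_dict() for name, module in modules.items()}
        self._conn.execute(
            "INSERT OR REPLACE INTO sessions (id, state) VALUES (?, ?)",
            (session_id, json.dumps(state)),
        )
        self._conn.commit()

    async def load_session_state(self, session_id: str, **modules: Any) -> None:
        row = self._conn.execute(
            "SELECT state FROM sessions WHERE id = ?",
            (session_id,),
        ).fetchone()
        if row is None:
            return  # New session: keep the modules' initial state
        state = json.loads(row[0])
        for name, module in modules.items():
            if name in state:
                module.load_state_dict(state[name])


class CounterModule:
    """Stand-in for an agent with one piece of state."""

    def __init__(self) -> None:
        self.counter = 0

    def state_dict(self) -> dict[str, int]:
        return {"counter": self.counter}

    def load_state_dict(self, state: dict[str, int]) -> None:
        self.counter = state["counter"]


async def demo() -> int:
    session = SQLiteSession()
    first = CounterModule()
    first.counter = 41
    await session.save_session_state("user-1", agent=first)

    second = CounterModule()  # simulates the next request
    await session.load_session_state("user-1", agent=second)
    return second.counter


restored_counter = asyncio.run(demo())
```

The SQLite calls here block the event loop; with a remote database you would normally use an async driver or run the calls in a worker thread.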

Frontend Display

As mentioned earlier, there are two approaches to displaying sub-agent messages.

Approach 1: Expose sub-agents to users

When you want users to see sub-agents in action, you can stream their messages directly to the frontend. This approach makes the multi-agent workflow transparent to users.

If you need to stream all messages from your endpoint function, AgentScope provides the stream_printing_messages pipeline to handle this. It collects printing messages from multiple agents and yields them in order, making it easy to implement streaming responses.

Hint: Printing messages are messages generated by calling an agent's self.print() function.

The key idea is to use a shared asyncio.Queue to collect messages from both the main agent and any dynamically created sub-agents. Here's how it works:

Step 1: Create sub-agents with message streaming enabled

In your tool function, configure sub-agents to send their messages to the shared queue:

from agentscope.tool import ToolResponse
from agentscope.agent import ReActAgent

import asyncio

# The tool function to create a worker agent
def create_worker(task_description: str, queue: asyncio.Queue) -> ToolResponse:
    # Create the sub-agent
    sub_agent = ReActAgent(...)
    
    # Disable the default printing message
    sub_agent.set_console_output_enabled(False)
    
    # Use the same queue with the main agent to stream the sub-agent messages
    sub_agent.set_msg_queue_enabled(True, queue=queue)
    ...

Step 2: Set up the endpoint with streaming

In your endpoint function, create the shared queue and use stream_printing_messages to stream all messages:

@app.route("/chat", methods=["POST"])
async def chat_endpoint():
    ...

    # Create a shared queue for message streaming
    queue = asyncio.Queue()

    # Register the tool with the queue pre-configured
    toolkit = Toolkit()
    toolkit.register_tool_function(
        create_worker,
        preset_kwargs={
            "queue": queue,  # Pass the queue to tool function
        }
    )

    # Create the main agent with the toolkit
    agent = ReActAgent(
        ...,
        toolkit=toolkit,
    )

    # Stream messages from both main agent and sub-agents
    async for msg, last in stream_printing_messages(
        agents=[agent],  # Only the main agent is specified here
        coroutine_task=agent(Msg("user", "...", "user")),
        queue=queue,  # The same queue collects messages from all agents
    ):
        yield msg  # Stream each message to the frontend

The beauty of this approach is that even though only the main agent is passed to stream_printing_messages, messages from all sub-agents are automatically captured through the shared queue.
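The shared-queue mechanism can be illustrated with plain asyncio, independent of AgentScope. In this minimal sketch, `main_agent`, `sub_agent`, `stream`, and the `END` sentinel are hypothetical stand-ins, not AgentScope APIs: the consumer only knows about the main task, yet it also receives whatever the sub-agent puts on the queue.

```python
import asyncio
from typing import AsyncIterator, Coroutine

END = object()  # Sentinel that marks the end of the stream


async def sub_agent(queue: asyncio.Queue, name: str) -> None:
    # The sub-agent writes to the queue it was handed at creation time.
    for step in ("searching", "done"):
        await queue.put(f"{name}: {step}")


async def main_agent(queue: asyncio.Queue) -> None:
    await queue.put("main: routing request")
    # The sub-agent is created mid-run and shares the same queue.
    await sub_agent(queue, "worker")
    await queue.put("main: final answer")
    await queue.put(END)


async def stream(queue: asyncio.Queue, task_coro: Coroutine) -> AsyncIterator[str]:
    # Run the main task in the background and drain the queue as it fills.
    task = asyncio.create_task(task_coro)
    while True:
        item = await queue.get()
        if item is END:
            break
        yield item
    await task


async def collect() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    return [msg async for msg in stream(queue, main_agent(queue))]


messages = asyncio.run(collect())
```

Because every producer writes to one queue, the consumer sees the messages in the order they were produced without needing a reference to each sub-agent.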

Approach 2: Hide Sub-Agents from Users

Another way is to hide sub-agents from users and treat their printing messages as tool results. This keeps the user interface clean and focused on the main agent.

In this case, we first prepare a function that converts and compresses sub-agent messages into text blocks in the main agent's tool result.

Note that tool use and tool result messages need to be shortened to avoid bloating the main agent's context.

from agentscope.message import Msg, TextBlock


def _convert_to_text_block(msgs: list[Msg]) -> list[TextBlock]:
    # Collect all the content blocks
    blocks: list[TextBlock] = []
    for msg in msgs:
        for block in msg.get_content_blocks():
            if block["type"] == "text":
                blocks.append(block)

            elif block["type"] == "tool_use":
                # Omit the tool input details and only expose the tool name
                blocks.append(
                    TextBlock(
                        type="text",
                        text=f"Calling tool {block['name']} ...",
                    ),
                )

            elif block["type"] == "tool_result":
                # Keep only the first 50 characters of the tool output;
                # str() also covers outputs that are not plain strings
                output = str(block["output"])[:50]
                blocks.append(
                    TextBlock(
                        type="text",
                        text=f"Tool {block['name']} returned result: {output} ...",
                    ),
                )
    return blocks

After that, we can implement the tool function to create a worker agent and yield streaming tool responses:

from collections import OrderedDict
from typing import AsyncGenerator

from agentscope.agent import ReActAgent
from agentscope.message import Msg
from agentscope.pipeline import stream_printing_messages
from agentscope.tool import ToolResponse


async def create_worker(task_description: str) -> AsyncGenerator[ToolResponse, None]:
    """..."""

    # Create the sub-agent
    sub_agent = ReActAgent(...)

    # Disable the console output of the sub-agent
    sub_agent.set_console_output_enabled(False)

    # Collect the printing messages by ID, so a later message with the
    # same ID replaces the earlier one
    msgs = OrderedDict()

    # Execute the sub-agent and stream results
    async for msg, _ in stream_printing_messages(
        agents=[sub_agent],
        coroutine_task=sub_agent(Msg("user", task_description, "user")),
    ):
        msgs[msg.id] = msg

        # Yield the compressed execution process collected so far
        yield ToolResponse(
            content=_convert_to_text_block(
                list(msgs.values()),
            ),
            stream=True,
            is_last=False,
        )

With this approach, users only interact with the main agent, while sub-agent activities appear as concise tool execution updates.
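The ID-keyed dictionary in the tool function can be understood with a small standalone sketch. It assumes, hypothetically, that a streamed message is re-emitted under the same ID as its content grows; the `updates` list below is made-up data, not real AgentScope output.

```python
from collections import OrderedDict

# Hypothetical stream of (message ID, content so far) pairs.
updates = [
    ("m1", "Calling"),
    ("m1", "Calling tool search ..."),
    ("m2", "Found"),
    ("m2", "Found 3 results"),
]

latest: "OrderedDict[str, str]" = OrderedDict()
snapshots: list[list[str]] = []
for msg_id, text in updates:
    # A repeated ID overwrites the earlier partial content in place,
    # while a new ID is appended at the end.
    latest[msg_id] = text
    snapshots.append(list(latest.values()))
```

Each snapshot holds one entry per message rather than one per update, so the tool result shown to the main agent stays short even when a message is streamed in many small chunks.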

Trade-off: This method sacrifices visibility—users cannot see the full details of sub-agent execution. However, this may be exactly what you want. Similar to how GitHub Copilot works, simply informing users that "the agent is working on a subtask" is often sufficient. The choice depends on your application's needs and user expectations.

Wrapping Up

We've walked through the key considerations for deploying multi-agent systems with AgentScope: from clarifying your requirements upfront, to managing state across custom agents, to choosing how to present agent activities to users.

These design choices don't have universal "right" answers—they depend on your specific use case and user needs. AgentScope aims to provide the flexibility to support different approaches while handling the complex parts (like state management and message streaming) for you.

We'd love to hear about your experiences deploying agent systems. What challenges have you encountered? What design decisions worked well for your use case? Feel free to share your thoughts and questions in the comments below.

Happy building! 🚀

DavdGao (Maintainer, Author) · 2025-11-07T02:47:37Z

We're working on practical deployment examples covering multi-agent systems and custom agents. They'll be available soon!
