1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,2 +1,3 @@
include VERSION
recursive-include configs/ *
recursive-include embodichain/gen_sim/action_agent_pipeline/generation/templates *.json
197 changes: 45 additions & 152 deletions docs/source/features/generative_sim/agents.md
@@ -1,175 +1,68 @@
# EmbodiAgent(aborted)
# Action Agent Pipeline

EmbodiAgent is a hierarchical multi-agent system that enables robots to perform complex manipulation tasks through closed-loop planning, code generation, and validation. The system combines vision-language models (VLMs) and large language models (LLMs) to translate high-level goals into executable robot actions.
The action-agent pipeline is the supported agent workflow for generated tabletop
manipulation tasks. It converts an image or an existing generated gym project
into a task-specific simulation config, asks the task model for a JSON task
graph, compiles that graph into atomic-action specs, and executes it through the
`AtomicActionsAgent-v3` environment.

## Quick Start
The legacy Python-code generation agent stack has been removed. New demos and
task generation should use the modules under
`embodichain.gen_sim.action_agent_pipeline`.

### Prerequisites
Ensure you have access to Azure OpenAI or a compatible LLM endpoint.
## End-to-end Pipeline

```bash
# Set environment variables
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
```

### Using Different LLM/VLM APIs
Run image-to-scene, config generation, and agent execution in one command:

The system uses LangChain's `AzureChatOpenAI` by default. To use different LLM/VLM providers, you can modify the `create_llm` function in `embodichain/agents/hierarchy/llm.py`.

#### Azure OpenAI
```bash
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
export OPENAI_API_VERSION="2024-10-21" # Optional, defaults to "2024-10-21"
python -m embodichain.gen_sim.action_agent_pipeline.cli.run_agent_pipeline \
--use-image2scene \
--server "http://127.0.0.1:4523" \
--image-name "demo1" \
--task_description "Pick up the target object and place it in the basket." \
--config-output-dir "gym_project/action_agent_pipeline/configs/demo1_text" \
--task_name "Demo1_Text" \
--target_body_scale 0.8 \
--regenerate
```

#### OpenAI
To use OpenAI directly instead of Azure, modify `llm.py`:
```python
from langchain_openai import ChatOpenAI
## Generate Config Only

def create_llm(*, temperature=0.0, model="gpt-4o"):
return ChatOpenAI(
temperature=temperature,
model=model,
api_key=os.getenv("OPENAI_API_KEY"),
)
```
Use an existing gym project to generate the task config and agent config:

Then set:
```bash
export OPENAI_API_KEY="your-api-key"
```

#### Other Providers
You can use other LangChain-compatible providers by modifying the `create_llm` function, for example:

**Anthropic Claude:**
```python
from langchain_anthropic import ChatAnthropic

def create_llm(*, temperature=0.0, model="claude-3-opus-20240229"):
return ChatAnthropic(
temperature=temperature,
model=model,
anthropic_api_key=os.getenv("ANTHROPIC_API_KEY"),
)
python -m embodichain.gen_sim.action_agent_pipeline.cli.generate_action_agent_config \
--gym_project "gym_project/environment/image2tabletop/downloads/example_gym_project" \
--output_dir "gym_project/action_agent_pipeline/configs/demo_text" \
--task_name "Demo_Text" \
--task_description "Pick up the target object and place it in the basket." \
--target_body_scale 0.8 \
--overwrite
```

**Google Gemini:**
```python
from langchain_google_genai import ChatGoogleGenerativeAI
## Run Generated Config

def create_llm(*, temperature=0.0, model="gemini-pro"):
return ChatGoogleGenerativeAI(
temperature=temperature,
model=model,
google_api_key=os.getenv("GOOGLE_API_KEY"),
)
```

### Run the System

Run the agent system with the following command:
Run a previously generated config with the action-agent environment:

```bash
python embodichain/lab/scripts/run_agent.py \
--task_name YourTask \
--gym_config configs/gym/your_task/gym_config.yaml \
--agent_config configs/gym/agent/your_agent/agent_config.json \
--regenerate False
python -m embodichain.gen_sim.action_agent_pipeline.cli.run_agent \
--task_name "Demo_Text" \
--gym_config "gym_project/action_agent_pipeline/configs/demo_text/fast_gym_config.json" \
--agent_config "gym_project/action_agent_pipeline/configs/demo_text/agent_config.json" \
--regenerate
```

**Parameters:**
- `--task_name`: Name identifier for the task
- `--gym_config`: Path to the gym environment configuration file (``.json``, ``.yaml``, or ``.yml``)
- `--agent_config`: Path to the agent configuration file (defines prompts and agent behavior)
- `--regenerate`: If `True`, forces regeneration of plans/code even if cached

## System Architecture

The system operates on a closed-loop control cycle:

- **Observe**: The `TaskAgent` perceives the environment via multi-view camera inputs.
- **Plan**: It decomposes the goal into natural language steps.
- **Code**: The `CodeAgent` translates steps into executable Python code using atomic actions.
- **Execute**: The code runs in the environment; runtime errors are caught immediately.
- **Validate**: The `ValidationAgent` analyzes the result images, selects the best camera angle, and judges success.
- **Refine**: If validation fails, feedback is sent back to the agents to regenerate the plan or code.

---

## Core Components

### TaskAgent
*Located in:* `embodichain/agents/hierarchy/task_agent.py`

Responsible for high-level reasoning. It parses visual observations and outputs a structured plan.

* For every step, it generates a specific condition (e.g., "The cup must be held by the gripper") which is used later by the ValidationAgent.
* Prompt Strategies:
* `one_stage_prompt`: Direct VLM-to-Plan generation.
* `two_stage_prompt`: Separates visual analysis from planning logic.

### CodeAgent
*Located in:* `embodichain/agents/hierarchy/code_agent.py`

Translates natural language plans into executable Python code using atomic actions from the action bank.

* Generates Python code that follows strict coding guidelines (no loops, only provided APIs)
* Executes code in a sandboxed environment with immediate error detection
* Uses Abstract Syntax Tree (AST) parsing to ensure code safety and correctness
* Supports few-shot learning through code examples in the configuration


### ValidationAgent
*Located in:* `embodichain/agents/hierarchy/validation_agent.py`

Closes the loop by verifying if the robot actually achieved what it planned.

* Uses a specialized LLM call (`select_best_view_dir`) to analyze images from all cameras and pick the single best angle that proves the action's outcome, ignoring irrelevant views.
* If an error occurs (runtime or logic), it generates a detailed explanation which is fed back to the `TaskAgent` or `CodeAgent` for the next attempt.

---

## Configuration Guide

The `Agent` configuration block controls the context provided to the LLMs. Prompt files are resolved in the following order:

1. **Config directory**: Task-specific prompt files in the same directory as the agent configuration file (e.g., `configs/gym/agent/pour_water_agent/`)
2. **Default prompts directory**: Reusable prompt templates in `embodichain/agents/prompts/`

| Parameter | Description | Typical Use |
| :--- | :--- | :--- |
| `task_prompt` | Task-specific goal description | "Pour water from the red cup to the blue cup." |
| `basic_background` | Physical rules & constraints | World coordinate system definitions, safety rules. |
| `atom_actions` | API Documentation | List of available functions (e.g., `drive(action='pick', ...)`). |
| `code_prompt` | Coding guidelines | "Use provided APIs only. Do not use loops." |
| `code_example` | Few-shot examples | Previous successful code snippets to guide style. |

---

## File Structure

```text
embodichain/agents/
├── hierarchy/
│ ├── agent_base.py # Abstract base handling prompts & images
│ ├── task_agent.py # Plan generation logic
│ ├── code_agent.py # Code generation & AST execution engine
│ ├── validation_agent.py # Visual analysis & view selection
│ └── llm.py # LLM configuration and instances
├── mllm/
│ └── prompt/ # Prompt templates (LangChain)
└── prompts/ # Agent prompt templates
```
## Runtime Shape

---
- `TaskAgent` produces a deterministic JSON graph.
- `CompileAgent` caches and validates the graph artifact.
- `AgenticGenSimEnv` registers `AtomicActionsAgent-v3` and exposes
`create_demo_action_list()`.
- Runtime graph execution calls atomic actions from
`embodichain.gen_sim.action_agent_pipeline.runtime`.
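
The graph-to-action flow listed above can be illustrated with a minimal sketch. The JSON schema (`nodes`, `id`, `action`, `target`, `after`) and the `compile_graph` helper are assumptions made for illustration only; this PR excerpt does not show the real task-graph schema or the `CompileAgent` interface.

```python
import json
from graphlib import TopologicalSorter

# Hypothetical task graph. The real schema produced by TaskAgent may differ.
TASK_GRAPH_JSON = """
{
  "nodes": [
    {"id": "grasp", "action": "pick", "target": "target_object", "after": []},
    {"id": "move", "action": "move_to", "target": "basket", "after": ["grasp"]},
    {"id": "release", "action": "place", "target": "basket", "after": ["move"]}
  ]
}
"""


def compile_graph(graph_json: str) -> list[dict]:
    """Validate a task graph and flatten it into ordered action specs.

    Illustrative stand-in only, not the CompileAgent from this PR.
    """
    graph = json.loads(graph_json)
    nodes = {node["id"]: node for node in graph["nodes"]}
    if len(nodes) != len(graph["nodes"]):
        raise ValueError("Duplicate node ids in task graph")

    # Every dependency must refer to a node that exists in the graph.
    for node in nodes.values():
        for dep in node["after"]:
            if dep not in nodes:
                raise ValueError(f"Unknown dependency {dep!r} in node {node['id']!r}")

    # TopologicalSorter takes a node -> predecessors mapping and raises
    # graphlib.CycleError (a ValueError subclass) if the graph has a cycle.
    sorter = TopologicalSorter({nid: n["after"] for nid, n in nodes.items()})
    return [
        {"action": nodes[nid]["action"], "target": nodes[nid]["target"]}
        for nid in sorter.static_order()
    ]


if __name__ == "__main__":
    for spec in compile_graph(TASK_GRAPH_JSON):
        print(spec)
```

Validating the graph before execution means a malformed model response fails at compile time rather than partway through a simulated episode.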

## See Also

- [Online Data Streaming](../online_data.md) — Streaming live simulation data for training
- [RL Architecture](../../overview/rl/index.rst) — RL training pipeline and algorithms
- [Atomic Actions Tutorial](../../tutorial/atomic_actions.rst) — Action primitives used by the CodeAgent
- [SimReady Asset Pipeline](simready_pipeline.md) — Generating simulation-ready assets
- [Atomic Actions Tutorial](../../tutorial/atomic_actions.rst) — Atomic action primitives
- [Supported Tasks](../../resources/task/index.rst) — Available task environments
1 change: 1 addition & 0 deletions docs/source/features/generative_sim/index.rst
@@ -6,4 +6,5 @@ Generative Simulation collects EmbodiChain features for generating simulation-re
.. toctree::
:maxdepth: 2

Action Agent Pipeline <agents.md>
SimReady Asset Pipeline <simready_pipeline.md>
21 changes: 21 additions & 0 deletions embodichain/gen_sim/action_agent_pipeline/__init__.py
@@ -0,0 +1,21 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2021-2026 DexForce Technology Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

"""Action-agent graph compilation and atomic-action runtime."""

from __future__ import annotations

__all__: list[str] = []
24 changes: 24 additions & 0 deletions embodichain/gen_sim/action_agent_pipeline/agents/__init__.py
@@ -0,0 +1,24 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2021-2026 DexForce Technology Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

from __future__ import annotations

__all__ = [
"agent_base",
"compile_agent",
"llm",
"task_agent",
]
96 changes: 96 additions & 0 deletions embodichain/gen_sim/action_agent_pipeline/agents/agent_base.py
@@ -0,0 +1,96 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2021-2026 DexForce Technology Co., Ltd.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ----------------------------------------------------------------------------

from __future__ import annotations

from abc import ABCMeta
import os

from embodichain.utils.utility import load_txt

__all__ = ["AgentBase"]


def _resolve_prompt_path(file_name: str, config_dir: str | None = None) -> str:
    # If absolute path, use directly
    if os.path.isabs(file_name):
        if os.path.exists(file_name):
            return file_name
        raise FileNotFoundError(f"Prompt file not found: {file_name}")

    # Try config directory first (for task-specific prompts)
    if config_dir:
        config_path = os.path.join(config_dir, file_name)
        if os.path.exists(config_path):
            return config_path

    # Try action_agent_pipeline/prompts directory for reusable prompts.
    agents_prompts_dir = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))), "prompts"
    )
    agents_path = os.path.join(agents_prompts_dir, file_name)
    if os.path.exists(agents_path):
        return agents_path

    # If still not found, raise error with search paths
    searched_paths = []
    if config_dir:
        searched_paths.append(f" - {config_dir}/{file_name}")
    searched_paths.append(f" - {agents_prompts_dir}/{file_name}")

    raise FileNotFoundError(
        f"Prompt file not found: {file_name}\n"
        f"Searched in:\n" + "\n".join(searched_paths)
    )


class AgentBase(metaclass=ABCMeta):
    def __init__(self, **kwargs) -> None:

        assert (
            "prompt_kwargs" in kwargs.keys()
        ), "Key prompt_kwargs must exist in config."

        for key, value in kwargs.items():
            setattr(self, key, value)

        # Get config directory if provided
        config_dir = kwargs.get("config_dir", None)
        if config_dir:
            config_dir = os.path.dirname(os.path.abspath(config_dir))

        # Preload and store prompt contents inside self.prompt_kwargs
        for key, val in self.prompt_kwargs.items():
            if val["type"] == "text":
                file_path = _resolve_prompt_path(val["name"], config_dir)
                val["content"] = load_txt(file_path)
            else:
                raise ValueError(
                    f"Now only support `text` type but {val['type']} is given."
                )

    def generate(self, *args, **kwargs):
        pass

    def act(self, *args, **kwargs):
        pass

    def get_composed_observations(self, **kwargs):
        ret = {}
        for key, val in self.prompt_kwargs.items():
            ret[key] = val["content"]
        ret.update(kwargs)
        return ret
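
The lookup order in `_resolve_prompt_path` (absolute path, then the config directory, then the packaged `prompts` directory) can be exercised without the package. The sketch below is an assumption-laden stand-in: `resolve_prompt`, its explicit `default_dir` argument, and the file names are hypothetical, and only the `prompt_kwargs` shape (`type`, `name`, `content`) is taken from the code above. Note that `AgentBase.__init__` applies `os.path.dirname` to the `config_dir` keyword, so it appears to expect the path of the config file rather than a directory.

```python
import os
import tempfile


def resolve_prompt(file_name, config_dir, default_dir):
    """Mirror the lookup order: absolute path, config dir, then default dir."""
    if os.path.isabs(file_name):
        if os.path.exists(file_name):
            return file_name
        raise FileNotFoundError(f"Prompt file not found: {file_name}")
    for directory in (config_dir, default_dir):
        if directory:
            candidate = os.path.join(directory, file_name)
            if os.path.exists(candidate):
                return candidate
    raise FileNotFoundError(f"Prompt file not found: {file_name}")


def write_text(path, text):
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def demo():
    """Load two prompts; the task-specific copy shadows the shared one."""
    with tempfile.TemporaryDirectory() as root:
        config_dir = os.path.join(root, "configs")
        default_dir = os.path.join(root, "prompts")
        os.makedirs(config_dir)
        os.makedirs(default_dir)

        # Hypothetical prompt files: one overridden per task, one shared.
        write_text(os.path.join(config_dir, "task_prompt.txt"), "task-specific")
        write_text(os.path.join(default_dir, "task_prompt.txt"), "shared default")
        write_text(os.path.join(default_dir, "basic_background.txt"), "shared")

        prompt_kwargs = {
            "task_prompt": {"type": "text", "name": "task_prompt.txt"},
            "basic_background": {"type": "text", "name": "basic_background.txt"},
        }
        for val in prompt_kwargs.values():
            path = resolve_prompt(val["name"], config_dir, default_dir)
            with open(path, encoding="utf-8") as handle:
                val["content"] = handle.read()

        # Same flattening as get_composed_observations, without extra kwargs.
        return {key: val["content"] for key, val in prompt_kwargs.items()}


if __name__ == "__main__":
    print(demo())
```

Checking the config directory first lets a task override a single prompt while still inheriting the remaining shared templates.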