Under active development.
This repository contains a communications platform with TTS, TTI, TT3D, and a master agent.
Interop: Unreal Engine, TouchDesigner, Ollama.
Contains: Diffusers, xFormers, Triton, Instructor.
- TTS model Supertonic 3
- TTI model SDXL-Base-1
- TT3D model Hunyuan3D-2.1 (text → SDXL → shape → PBR texture → GLB)
Development Guidelines:
- A master agent controls and is accessed by the platform.
- Coordination is mandatory for critical environments.
- Expose API and execution timings.
- Local and field-first architecture
src/comms_platform/
├── main.py # entry point
├── config.py
├── constants.py # shared env defaults and paths
├── agent/ # master agent + perception engine
├── transport/ # EventBus, OSC gateway, thread manager
├── integrations/ # Ollama, TouchDesigner, Unreal orchestration
├── inference/ # TTS, TTI, and TT3D engines
├── utils/
├── mcp/ # MCP server (Streamable HTTP)
└── web/
    ├── app.py # FastAPI factory and lifespan
    ├── routes/ # HTTP route modules by domain
    ├── schemas.py
    └── static/ # dashboard UI (HTTP client)
The platform exposes a Model Context Protocol server alongside the existing REST API. MCP clients (Cursor, Claude Code, MCP Inspector) can start/stop the master agent, send natural-language messages, and read runtime state.
flowchart LR
Browser["Browser UI\n(main.js)"]
MCPClient["MCP Clients\n(Cursor, CLI)"]
Platform["comms-platform\n(FastAPI + uvicorn)"]
Agent["MasterAgent\n(in-process thread)"]
Perception["PerceptionEngine\n(Instructor)"]
Ollama["Ollama\n(separate LLM server)"]
Browser -->|"HTTP /api/*"| Platform
MCPClient -->|"Streamable HTTP /mcp"| Platform
Platform --> Agent
Agent --> Perception
Perception -->|"Instructor → /v1"| Ollama
Platform -->|"chat: /api/generate"| Ollama
| Tool | Description |
|---|---|
| agent_start | Start the master agent heartbeat loop |
| agent_stop | Stop the master agent heartbeat loop |
| agent_status | Return current agent runtime status |
| agent_message | Natural-language input via perception routing and optional Ollama chat |
| URI | Description |
|---|---|
| platform://agent/state | JSON snapshot of agent and connection runtime state |
| platform://agent/intent | JSON snapshot of the latest perception routing decision |
With the platform running (default http://127.0.0.1:8000):
{
  "mcpServers": {
    "communications-platform": {
      "url": "http://127.0.0.1:8000/mcp"
    }
  }
}

Environment variables:
| Variable | Default | Description |
|---|---|---|
| MCP_ENABLED | true | Enable MCP Streamable HTTP mount |
| MCP_MOUNT_PATH | /mcp | HTTP mount path for the MCP endpoint |
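
As a minimal sketch of how the two variables above relate to the client configuration, the following derives the MCP endpoint URL and emits a matching mcpServers entry. The helper names (mcp_endpoint_url, mcp_client_config), the accepted truthy spellings for MCP_ENABLED, and the host and port arguments are assumptions for illustration, not the platform's actual code.

```python
import json
import os

# Assumed spellings for a "true" flag; the platform's real parser may differ.
TRUTHY = {"1", "true", "yes", "on"}


def mcp_endpoint_url(env=None, host="127.0.0.1", port=8000):
    """Return the MCP URL, or None when MCP_ENABLED is not truthy."""
    env = os.environ if env is None else env
    enabled = env.get("MCP_ENABLED", "true").strip().lower() in TRUTHY
    if not enabled:
        return None
    # Normalize the mount path so it has exactly one leading slash.
    mount = "/" + env.get("MCP_MOUNT_PATH", "/mcp").strip("/")
    return f"http://{host}:{port}{mount}"


def mcp_client_config(url, name="communications-platform"):
    """Build the mcpServers JSON snippet for an MCP client."""
    return json.dumps({"mcpServers": {name: {"url": url}}}, indent=2)
```

Calling mcp_client_config(mcp_endpoint_url({})) reproduces the default configuration for a platform listening on 127.0.0.1:8000.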
Requires Python 3.12 on Windows for the CUDA PyTorch wheel set used by SDXL.
# First time
uv venv
uv pip install -r requirements.txt
uv pip install -e .
.\run_platform.bat

run_platform.bat installs CUDA PyTorch, xFormers, and triton-windows, applies the Hunyuan3D vendor patches, and starts the platform. A one-time Hunyuan3D vendor clone is still required:

.\scripts\setup_hunyuan3d.ps1

TT3D is optional and heavier than TTI/TTS. It chains the existing SDXL pipeline with Tencent's Hunyuan3D-2.1 shape and PBR paint stages to produce a textured GLB from a text prompt.
| Stage | VRAM (approx.) |
|---|---|
| SDXL preflight (TTI) | 8–12 GB |
| Shape generation | 10 GB |
| PBR texture synthesis | 21 GB |
| Full pipeline | ~29 GB |
Use TT3D_LOW_VRAM=true (default) to unload each stage before loading the next. TTI and TT3D are mutually exclusive on the GPU by default (TT3D_EXCLUSIVE_GPU=true).
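
The effect of TT3D_LOW_VRAM can be estimated with a rough sketch. It assumes that, with sequential unloading, peak usage is about the largest single stage, and that without it the peak is the ~29 GB full-pipeline figure from the table. The stage keys, the choice of the upper SDXL bound, the headroom value, and the function names are illustrative assumptions; real usage depends on resolution, drivers, and allocator behavior.

```python
# Approximate per-stage VRAM in GB, taken from the table above
# (upper bound of the SDXL range is used).
STAGE_VRAM_GB = {"sdxl_preflight": 12.0, "shape": 10.0, "pbr_texture": 21.0}
FULL_PIPELINE_GB = 29.0


def estimated_peak_gb(low_vram=True):
    """Estimate peak VRAM under the stated assumptions."""
    if low_vram:
        # Each stage is unloaded before the next loads, so only the
        # largest single stage determines the peak.
        return max(STAGE_VRAM_GB.values())
    # Stages stay resident: fall back to the full-pipeline figure.
    return FULL_PIPELINE_GB


def fits_on_gpu(total_vram_gb, low_vram=True, headroom_gb=1.0):
    """Check whether the estimate plus some headroom fits on the card."""
    return estimated_peak_gb(low_vram) + headroom_gb <= total_vram_gb
```

Under these assumptions a 24 GB card fits the low-VRAM path but not the all-resident pipeline.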
From the repository root on Windows:
.\scripts\setup_hunyuan3d.ps1

This script:
- Clones Tencent-Hunyuan/Hunyuan3D-2.1 into vendor/Hunyuan3D-2.1
- Installs platform dependencies (including TT3D packages such as trimesh, rembg, etc.)
- Builds the custom_rasterizer CUDA extension
- Downloads Real-ESRGAN weights for the paint pipeline
If texture generation fails after setup, compile the DifferentiableRenderer manually following the upstream README in vendor/Hunyuan3D-2.1.
Install dependencies manually:
uv pip install -e .

| Variable | Default | Description |
|---|---|---|
| HUNYUAN3D_ROOT | vendor/Hunyuan3D-2.1 | Path to the cloned Hunyuan3D repo |
| TT3D_MODEL_ID | tencent/Hunyuan3D-2.1 | Hugging Face model ID |
| TT3D_SHAPE_SUBFOLDER | hunyuan3d-dit-v2-1 | Shape model subfolder |
| TT3D_DEFAULT_GUIDANCE | 7.5 | Classifier-free guidance for shape |
| TT3D_DEFAULT_STEPS | 30 | Diffusion steps for shape |
| TT3D_DEFAULT_OCTREE_RESOLUTION | 256 | Mesh detail level |
| TT3D_ENABLE_TEXTURE | true | Run PBR paint stage (disable for shape-only) |
| TT3D_LOW_VRAM | true | Unload pipelines between stages |
| TT3D_USE_INTERNAL_TTI | true | Generate reference image via SDXL before shape |
| TT3D_EXCLUSIVE_GPU | true | Unload TTI when TT3D loads (and vice versa) |
| TT3D_TEST_PROMPT | wooden chair prompt | Default prompt before a global prompt: is set |
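
One way to read these variables into typed settings is sketched below. The TT3DSettings class, the env_bool helper, and the accepted truthy spellings are hypothetical; only the variable names and defaults come from the table above, and the platform's own config.py may load them differently.

```python
import os
from dataclasses import dataclass

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings


def env_bool(env, key, default):
    """Parse a boolean flag, falling back to the documented default."""
    raw = env.get(key)
    return default if raw is None else raw.strip().lower() in TRUTHY


@dataclass(frozen=True)
class TT3DSettings:
    """Hypothetical typed view of the TT3D variables in the table above."""
    hunyuan3d_root: str
    model_id: str
    shape_subfolder: str
    guidance: float
    steps: int
    octree_resolution: int
    enable_texture: bool
    low_vram: bool
    use_internal_tti: bool
    exclusive_gpu: bool

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        return cls(
            hunyuan3d_root=env.get("HUNYUAN3D_ROOT", "vendor/Hunyuan3D-2.1"),
            model_id=env.get("TT3D_MODEL_ID", "tencent/Hunyuan3D-2.1"),
            shape_subfolder=env.get("TT3D_SHAPE_SUBFOLDER", "hunyuan3d-dit-v2-1"),
            guidance=float(env.get("TT3D_DEFAULT_GUIDANCE", "7.5")),
            steps=int(env.get("TT3D_DEFAULT_STEPS", "30")),
            octree_resolution=int(env.get("TT3D_DEFAULT_OCTREE_RESOLUTION", "256")),
            enable_texture=env_bool(env, "TT3D_ENABLE_TEXTURE", True),
            low_vram=env_bool(env, "TT3D_LOW_VRAM", True),
            use_internal_tti=env_bool(env, "TT3D_USE_INTERNAL_TTI", True),
            exclusive_gpu=env_bool(env, "TT3D_EXCLUSIVE_GPU", True),
        )
```

Passing a plain dict instead of os.environ keeps the parsing easy to test, for example TT3DSettings.from_env({"TT3D_ENABLE_TEXTURE": "false"}) for a shape-only configuration.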
Send prompt: your text here in Block 08 or via MCP agent_message to set the shared prompt used by Gen TTS, Gen TTI, and Gen TT3D. Example:
prompt: a neon cyberpunk city at night
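
The prefix convention above can be sketched as a small parser. This is an illustration only: the function name extract_prompt, the case-insensitive match, and the treatment of an empty prompt as "no prompt" are assumptions, and the platform's actual routing may behave differently.

```python
PROMPT_PREFIX = "prompt:"


def extract_prompt(message):
    """Return the prompt text if the message uses the prefix, else None."""
    text = message.strip()
    # Assumption: the prefix is matched case-insensitively.
    if text.lower().startswith(PROMPT_PREFIX):
        prompt = text[len(PROMPT_PREFIX):].strip()
        # Assumption: an empty prompt is treated as no prompt at all.
        return prompt or None
    return None
```

For the example message above, extract_prompt returns the text after the prefix with surrounding whitespace removed; any message without the prefix yields None and would be handled as ordinary chat input.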
| Message | Severity | Meaning / fix |
|---|---|---|
| No module named 'triton' (from xformers) | Fixable | Official triton has no Windows wheel. Install triton-windows (included in run_platform.bat and pyproject.toml for Windows). Use version <3.3 with PyTorch 2.6. This does not conflict with PyTorch; it provides the triton module that xFormers probes for. |
| No module named 'bpy' | Python version gap | bpy cannot be pip-installed on Python 3.12. PyPI wheels exist only for Python 3.11 (bpy==5.0.1) and Python 3.13 (bpy==5.1.2). This project uses 3.12 for CUDA PyTorch wheels. |
| Bpy IO CAN NOT BE Imported | Usually harmless | Upstream optional import; patched automatically by the platform so the PBR paint pipeline can load without bpy. |
| InPaint Function CAN NOT BE Imported | Usually harmless | Optional inpaint helper missing; core paint path still runs. |
| custom_rasterizer has no attribute 'rasterize' or No module named 'custom_rasterizer_kernel' | Must fix for textured output | The Hunyuan paint CUDA extension was not compiled. Run .\scripts\setup_hunyuan3d.ps1 with Visual Studio Build Tools and CUDA 12.4 installed (must match PyTorch cu124). Until then TT3D can still export shape-only GLB. |
Triton (recommended on Windows):
uv pip install "triton-windows>=3.2.0.post21,<3.3"

bpy (not available on Python 3.12 via pip):
# Will FAIL on Python 3.12:
uv pip install bpy
# Works only on matching Python versions:
# Python 3.11 → uv pip install bpy==5.0.1
# Python 3.13 → uv pip install bpy==5.1.2

Without bpy, the platform patches Hunyuan3D's vendor code so textured OBJ generation still works; only Blender-native OBJ→GLB conversion is skipped (trimesh is used instead). Restart the platform after setup so the patch is applied before loading TT3D.
To skip the texture stage entirely, set TT3D_ENABLE_TEXTURE=false.
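
The Python-version constraint on bpy described above can be checked before attempting an install. The sketch below is hypothetical (the names BPY_WHEELS, bpy_pin_for, and bpy_install_hint are not part of the platform) and only encodes the two version pins listed in this section.

```python
import sys

# Python (major, minor) -> bpy pin, as listed in the notes above.
BPY_WHEELS = {(3, 11): "5.0.1", (3, 13): "5.1.2"}


def bpy_pin_for(version=None):
    """Return the bpy version pin for a Python version, or None."""
    major, minor = (version or sys.version_info)[:2]
    return BPY_WHEELS.get((major, minor))


def bpy_install_hint(version=None):
    """Return an install command, or a note when no wheel matches."""
    pin = bpy_pin_for(version)
    if pin is None:
        return "no matching bpy wheel; trimesh is used for GLB conversion"
    return f"uv pip install bpy=={pin}"
```

On the Python 3.12 interpreter this project targets, bpy_pin_for returns None, which matches the fallback behavior described above.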
flowchart LR
Prompt["Text prompt"] --> TTI["SDXL TTI\n(reference PNG)"]
TTI --> RemBG["Background removal"]
RemBG --> Shape["Hunyuan3D shape\n(DiT flow matching)"]
Shape --> Paint["Hunyuan3D paint\n(PBR textures)"]
Paint --> GLB["output/tt3d_latest.glb"]
Outputs are written to output/:
- tt3d_latest.glb - latest textured (or shape-only) model
- tt3d_ref_latest.png - SDXL reference image used for conditioning
Block 01 - Agent
- Starts and stops the master agent.
- Shows current agent state.
- Uses the top-left control block for core runtime control.
Block 02 - Terminal
- Shows backend logs, stream events, and agent replies.
- Acts as the main realtime output surface.
- Useful for tracing platform activity and request flow.
Block 03 - Agent State
- Displays a JSON snapshot of the current runtime state.
- Can be scoped to agent (includes stream, connections, inference), third party, or timers.
- Includes refresh and copy controls for debugging.
Block 04 - Engines
- Launches TouchDesigner example workflows.
- Checks TouchDesigner process state.
- Sends test data and UE5 bridge messages.
- Checks whether Ollama is reachable on the host.
- Opens Ollama from the installed Windows executable when available.
- Lets you pick an available Ollama model for agent chat.
Block 05 - Media Viewer
- Shows latest generated media artifacts.
- Image card: TTI thumbnail preview, image path, and Open Image action.
- Audio card: TTS audio player, audio path, and Open Audio action.
- Model card: TT3D reference PNG preview (same style as TTI), path, and Open Model action (opens GLB in a new tab).
- Includes Refresh to reload latest media from backend endpoints.
Block 06 - Inference
- SuperTonic 3, SDXL Base 1, and Hunyuan3D 2.1: load/unload each engine and run Gen TTS, Gen TTI, or Gen TT3D using the current global inference prompt.
Block 07 - Timers
- Interval timers for TTS, TTI, and TT3D test renders.
- TTS/TTI: every 10 seconds or every 20 seconds.
- TT3D: every 60 seconds or every 120 seconds (generation is slower).
- Timer state is tracked in the agent state timers section.
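
The interval choices listed for Block 07 can be expressed as a small validation sketch. The engine keys and the validate_interval function are illustrative assumptions; only the interval values come from the list above.

```python
# Allowed intervals in seconds, as listed for Block 07.
ALLOWED_INTERVALS = {"tts": (10, 20), "tti": (10, 20), "tt3d": (60, 120)}


def validate_interval(engine, seconds):
    """Return seconds if it is an allowed interval for the engine."""
    allowed = ALLOWED_INTERVALS.get(engine.lower())
    if allowed is None:
        raise ValueError(f"unknown engine: {engine!r}")
    if seconds not in allowed:
        raise ValueError(f"{engine} interval must be one of {allowed}")
    return seconds
```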
Block 08 - User Input
- Sends text payloads to the backend agent or MCP.
- Use prompt: your text to set the shared inference prompt for Gen TTS, Gen TTI, and Gen TT3D.
- Appends the user message and agent reply into the terminal view.
Current API endpoints and capabilities:
- GET / - serves the web UI
- GET /health - liveness endpoint
- GET /events - SSE stream for frontend realtime events/logs
- GET /api/status - runtime status (SSE clients, OSC in/out, agent state)
- POST /api/signals/publish - publishes a stream signal to frontend/event bus
- POST /api/signals/send - sends signal (OSC when protocol=osc, otherwise stream)
- POST /api/agent/start - starts agent coordinator
- POST /api/agent/stop - stops agent coordinator
- POST /api/agent/message - sends human text to the agent, appends to history, and returns the current reply plus routing/LLM metadata
- MCP /mcp - Streamable HTTP MCP endpoint (tools: agent_start, agent_stop, agent_status, agent_message; resources: platform://agent/state, platform://agent/intent)
- POST /api/unreal/event - ingests Unreal events and toggles agent start/stop based on current state
- POST /api/platform/send-to-unreal - sends a message to Unreal /notify
- GET /api/ollama/status - checks Ollama availability and lists models
- POST /api/ollama/open - starts Ollama when installed locally
- GET /api/tts/status - reports whether SuperTonic 3 is loaded
- POST /api/tts/engine/on - loads SuperTonic 3 into memory for fast inference
- POST /api/tts/engine/off - unloads SuperTonic 3 from memory
- POST /api/tts/synthesize - synthesizes TTS audio using SuperTonic 3 and returns WAV audio
- POST /api/tts/test - runs a quick TTS render and stores latest audio artifact
- GET /api/tti/status - reports whether SDXL Base 1 (TTI) is loaded
- POST /api/tti/engine/on - loads SDXL Base 1 pipeline into memory
- POST /api/tti/engine/off - unloads SDXL Base 1 pipeline from memory
- POST /api/tti/generate - generates an image from prompt and returns preview payload + output file metadata
- POST /api/tti/test - runs a quick TTI render and stores latest image artifact
- GET /api/tt3d/status - reports whether Hunyuan3D 2.1 is loaded and prerequisite checks
- POST /api/tt3d/engine/on - loads Hunyuan3D shape (and paint, when enabled) pipelines
- POST /api/tt3d/engine/off - unloads TT3D pipelines and clears GPU cache
- POST /api/tt3d/generate - one-shot text-to-3D: SDXL reference → shape → optional PBR → GLB
- POST /api/tt3d/test - runs a quick TT3D render with the default test prompt
- GET /api/media/tti/latest - serves output/tti_latest.png for UI/media viewer
- GET /api/media/tts/latest - serves output/tts_latest.wav for UI/media viewer
- GET /api/media/tt3d/latest - serves output/tt3d_latest.glb for UI/media viewer
- POST /api/touchdesigner/run-example - launches touchdesigner/example1.toe
- POST /api/touchdesigner/send-test-data - sends JSON payload to TouchDesigner web server (TD_WEB_HOST:TD_WEB_PORT)
- GET /api/touchdesigner/processes - lists running TouchDesigner processes on this machine
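
A minimal client sketch for the endpoints listed above, using only the Python standard library. The helper names build_request and call are hypothetical, and the request and response schemas are not documented in this README, so the sketch assumes JSON responses and only sends a body when a payload is supplied.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address mentioned earlier


def build_request(method, path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a request for one of the listed endpoints."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: endpoints that take a body expect JSON.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


def call(request, timeout=10.0):
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (requires a running platform):
#   call(build_request("POST", "/api/agent/start"))
#   call(build_request("GET", "/api/status"))
```

Keeping request construction separate from sending makes the URL and method easy to inspect without a running server.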
Run tests with uv from the project root:
# New Unreal trigger HTTP tests (without external Unreal/TD software)
uv run pytest -q tests/test_api_unreal_start_audio.py
uv run pytest -q tests/test_api_unreal_start_image.py
# Live HTTP tests (send real POST requests to running API, watch backend console logs)
# Terminal 1: start platform
.\run_platform.bat
# Terminal 2: send live trigger requests via pytest
uv run pytest -q -s tests/test_http_unreal_live.py
# Optional: use a non-default API host/port
# (POSIX shell syntax; in PowerShell, set $env:LIVE_API_BASE_URL before running pytest)
LIVE_API_BASE_URL=http://127.0.0.1:8000 uv run pytest -q -s tests/test_http_unreal_live.py
# Optional: run all API tests
uv run pytest -q tests/test_api_*.py

Licensed under the MIT License.