Stage 2 Submission (Top 5 Teams)
The top 5 teams from Stage 1 will be invited to Stage 2 via a private link provided by the organizers. Each team may submit exactly once. The submission must be a runnable model, modified agent framework, or combined model-plus-agent system for EmbodiedBench. Three submission formats are supported:
Note: The deadline for submitting the Stage 2 model is May 26, 2026 (AoE), to allow sufficient time for online evaluation and to address any engineering issues related to customization from participating teams.
Held-out tasks: Stage 2 uses held-out tasks from EB-ALFRED and EB-Navigation, which will be released after Stage 2 concludes.
Option A — vLLM-Compatible Model Server
Submit a fine-tuned or adapted model that can be served with vLLM. The organizers will launch it as:
vllm serve <your-model> --host 0.0.0.0 --port 8000
Provide the model path or Hugging Face model ID and any required vllm serve flags.
Option B — Modified Agent Framework
Submit a modified EmbodiedBench agent framework. Teams may customize files under embodiedbench/planner/ (for example prompt construction, reasoning, memory, replanning, and action parsing in planner modules such as vlm_planner.py and nav_planner.py).
Constraints: Teams may only modify files under embodiedbench/planner/ and add new supporting modules. Evaluator code, environment code, and metric computation must remain unchanged.
Submit a code zip together with a README that explains how to run your modified framework for Stage 2 evaluation on the held-out EB-ALFRED and EB-Navigation tasks.
Option A + B — Fine-tuned Model with Custom Agent Framework
Teams may combine both options by submitting a fine-tuned model served via vLLM together with a custom planner under embodiedbench/planner/ that calls it. This supports end-to-end optimization of both model weights and agent strategy.
For the planner portion, the same constraints as Option B apply: teams may only modify files under embodiedbench/planner/ and add new supporting modules, while evaluator code, environment code, and metric computation must remain unchanged.
Submission Package
- Model weights or code repository (Hugging Face model ID, GitHub link, or compressed archive)
- For Option B, a code zip of the modified agent framework
- For Option A + B, both the model package or model ID and a code zip of the custom planner framework
- A
README with step-by-step startup instructions and any customization notes needed for reproduction
- Complete dependency list (
requirements.txt or equivalent)
- Any required
vllm serve flags or other startup arguments
- For Option B and Option A + B, the submitted package must preserve the original evaluator, environment, and metric code unchanged