[Feature] Trajectory Replay #6049

li-boxuan · 2025-01-05T06:59:37Z

What problem or use case are you trying to solve?

Wouldn't it be cool if we could replay trajectories? This would mean:

Experiment results are easily reproducible (since evaluation harness usually provides clean initial states)
Easier to record demo videos
Integration testing
Make debugging scenarios easier to reproduce

Describe the UX of the solution you'd like

I don't know what it would look like in the web GUI, but I think backend support would be a good first step.

Do you have thoughts on the technical implementation?

At this stage, I am not 100% confident that the information in a trajectory is enough to replay it.
Dealing with non-determinism (what if the environment changes?) would be tricky, but for simplicity (and I doubt we need to handle it), let's ignore whether the real observation matches the observation from the trajectory. Let's simply execute the actions from the trajectory and assume everything is deterministic.
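
To make the "just execute the actions" idea concrete, here is a minimal sketch. It is not the actual implementation: the `Event` shape, the `kind`/`payload` field names, and the `execute` callback are all hypothetical, and the trajectory is assumed to be a JSON array of events. Recorded observations are skipped rather than compared, which matches the determinism assumption above.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Event:
    """One trajectory entry; the field names here are hypothetical."""
    kind: str       # "action" or "observation"
    payload: dict


def load_trajectory(text: str) -> list[Event]:
    """Parse a JSON array of {"kind": ..., "payload": ...} objects."""
    return [Event(e["kind"], e.get("payload", {})) for e in json.loads(text)]


def replay(events: Iterable[Event], execute: Callable[[dict], dict]) -> list[dict]:
    """Re-execute recorded actions in order and return the fresh observations.

    Recorded observations are skipped, not compared: the run is assumed
    to be deterministic.
    """
    fresh = []
    for event in events:
        if event.kind != "action":
            continue
        fresh.append(execute(event.payload))
    return fresh


if __name__ == "__main__":
    recorded = json.dumps([
        {"kind": "action", "payload": {"command": "echo hello"}},
        {"kind": "observation", "payload": {"output": "hello"}},
        {"kind": "action", "payload": {"command": "ls"}},
    ])
    # Stand-in executor: a real one would hand the action to the runtime.
    fake_execute = lambda action: {"ran": action["command"]}
    print(replay(load_trajectory(recorded), fake_execute))
```

Passing the executor in as a callback keeps the replay loop independent of any particular runtime, so the same loop could be driven by a fake executor in tests.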

Describe alternatives you've considered

Additional context

SWE-agent supports trajectory replay: https://github.com/SWE-agent/SWE-agent/blob/main/sweagent/run/run_replay.py

Roadmap:

Support replay in headless mode
Support trajectory dump in GUI mode
Support trajectory replay in GUI mode
Add E2E tests for replay in headless mode

enyst · 2025-01-10T01:16:58Z

This is so cool! My first thought was that it's the same as going back in time to an arbitrary point, but it's not, right? Full replay should be possible and easier!

li-boxuan · 2025-01-10T05:09:42Z

> This is so cool! My first thought was that it's the same as going back in time to an arbitrary point, but it's not, right? Full replay should be possible and easier!

Yes! Time travel is even cooler but significantly harder. Full replay is easier to achieve, and it would do what I need: replaying benchmark results and enabling end-to-end testing.

li-boxuan added the enhancement New feature or request label Jan 5, 2025

li-boxuan changed the title ~~Trajectory Replay~~ [Feature] Trajectory Replay Jan 5, 2025

li-boxuan self-assigned this Jan 6, 2025

li-boxuan mentioned this issue Jan 9, 2025

[Bug]: Regression in AgentController broke AgentDelegationAction #6162

Closed

1 task

This was referenced Jan 13, 2025

(feat) Add trajectory replay for headless mode #6215

Merged

(WIP) trajectory replay on web GUI #6348

Draft

This was referenced Jan 21, 2025

(feat) Add button to export trajectory on chat panel #6378

Merged

Trajectory replay: Fix a few corner cases #6380

Open
