Testing StatefulReplayStrategy #327
base: main
Wrote a function to verify that, given a previously seen sequence of action and window events, the model generates the exact same action event sequence when it is passed the old window event sequence. Current TODOs:
Added methods that create event and action dicts for synthetic input generation. Current TODO:
tests/openadapt/test_stateful.py (outdated)
mouse_button_name: str = None,
mouse_pressed: bool = None,
key_name: str = None,
element_state: dict[Any, Any] = None,
Should this be dict[str, Any]?
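A minimal sketch of a synthetic action-dict factory using the annotation the reviewer suggests. The name create_action_dict, the "name" field, the REF_X/REF_Y values and the exact key set are hypothetical, modeled on the parameters quoted above; the real helper in the test file may differ. Because every optional parameter defaults to None, its annotation is Optional[...], and element_state is typed dict[str, Any] on the assumption that its keys are always strings.

```python
from typing import Any, Optional


def create_action_dict(
    name: str,
    mouse_x: Optional[float] = None,
    mouse_y: Optional[float] = None,
    mouse_button_name: Optional[str] = None,
    mouse_pressed: Optional[bool] = None,
    key_name: Optional[str] = None,
    element_state: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build a synthetic action event dict (hypothetical helper).

    Only parameters that were actually supplied are included, so callers can
    simply omit arguments such as element_state instead of passing {}.
    """
    candidates = {
        "name": name,
        "mouse_x": mouse_x,
        "mouse_y": mouse_y,
        "mouse_button_name": mouse_button_name,
        "mouse_pressed": mouse_pressed,
        "key_name": key_name,
        "element_state": element_state,
    }
    # Drop keys that were left at their None default.
    return {key: value for key, value in candidates.items() if value is not None}


# Hypothetical reference coordinates standing in for the test's REF_X / REF_Y.
REF_X, REF_Y = 25, 55

# A press followed by a short drag, built like the quoted test's
# synthetic events (mouse_y=REF_Y + i).
drag = [create_action_dict("click", REF_X, REF_Y, "left", True)] + [
    create_action_dict("move", REF_X, REF_Y + i, "left", True) for i in range(1, 3)
]
```

Filtering out None values keeps the generated dicts minimal, which also makes them easier to compare against model output later.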
tests/openadapt/test_stateful.py (outdated)
width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
What do you think about removing meta={}?
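Dropping explicit meta={} arguments works cleanly if the factory supplies the empty dict itself. A sketch, assuming a hypothetical create_window_dict helper with width, height, window_id and meta fields (the 1920/1080/1 values are placeholders for WIN_WIDTH, WIN_HEIGHT and WINDOW_ID): the default has to be None rather than {}, because a {} default is created once, when the function is defined, and is then shared by every call.

```python
from typing import Any, Optional


def create_window_dict_shared(meta: dict[str, Any] = {}) -> dict[str, Any]:
    # Anti-pattern: this {} is evaluated once, at definition time,
    # so every call that omits meta receives the same dict object.
    return {"meta": meta}


def create_window_dict(
    width: int,
    height: int,
    window_id: int,
    meta: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    # A None sentinel gives each call its own fresh dict.
    return {
        "width": width,
        "height": height,
        "window_id": window_id,
        "meta": {} if meta is None else meta,
    }


first = create_window_dict_shared()
first["meta"]["dirty"] = True
# The mutation leaks into the next call's default.
second = create_window_dict_shared()

a = create_window_dict(1920, 1080, 1)  # no meta={} needed at the call site
b = create_window_dict(1920, 1080, 1)
a["meta"]["dirty"] = True  # does not affect b
```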
tests/openadapt/test_stateful.py (outdated)
width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
What do you think about removing meta={}?
tests/openadapt/test_stateful.py (outdated)
width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
What do you think about removing meta={}?
tests/openadapt/test_stateful.py (outdated)
mouse_y=REF_Y,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
What do you think about removing element_state={}?
tests/openadapt/test_stateful.py (outdated)
mouse_y=REF_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
What do you think about removing element_state={}?
tests/openadapt/test_stateful.py (outdated)
mouse_y=REF_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
What do you think about removing element_state={}?
tests/openadapt/test_stateful.py (outdated)
width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
What do you think about removing meta={}?
tests/openadapt/test_stateful.py (outdated)
mouse_y=NEW_Y,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
What do you think about removing element_state={}?
tests/openadapt/test_stateful.py (outdated)
mouse_y=NEW_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
What do you think about removing element_state={}?
tests/openadapt/test_stateful.py (outdated)
mouse_y=NEW_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
What do you think about removing element_state={}?
tests/openadapt/test_stateful.py (outdated)
MULTI_ACTION_WIN_WIDTH,
MULTI_ACTION_WIN_HEIGHT,
MULTI_ACTION_WINDOW_ID,
{},
What do you think about removing {}?
tests/openadapt/test_stateful.py (outdated)
MULTI_ACTION_WIN_WIDTH,
MULTI_ACTION_WIN_HEIGHT,
MULTI_ACTION_WINDOW_ID,
{},
What do you think about removing {}?
"refuse. Copy the given format exactly. Your response should be "
"valid Python3 code. Do not respond with any other text. "
)
Thanks for the idea of using a single action for evaluation. This is truly great.
@FFFiend what do you think about moving this to a .j2 file?
test_action_dict = gpt_completion(
    reference_window_dict, reference_action_dicts, active_window_dict
)
test_dict = eval(
What do you think about using json.loads instead of eval?
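A sketch of parsing the completion without eval. json.loads accepts only JSON, while the prompt asks the model for valid Python3 code, so a reply may contain Python literals such as True, None or single-quoted strings that JSON rejects. ast.literal_eval accepts those literals but, unlike eval, refuses to evaluate names or function calls. The helper name parse_action_dict and the JSON-first fallback order are assumptions for illustration, not code from this PR.

```python
import ast
import json
from typing import Any


def parse_action_dict(completion: str) -> dict[str, Any]:
    """Parse a model reply into a dict without executing it (hypothetical helper)."""
    text = completion.strip()
    try:
        # Strict JSON: double quotes, true/false/null.
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Python literal syntax (True, None, 'quotes'); names and calls raise ValueError.
        parsed = ast.literal_eval(text)
    if not isinstance(parsed, dict):
        raise ValueError(f"expected a dict, got {type(parsed).__name__}")
    return parsed


json_reply = '{"name": "click", "mouse_pressed": true, "element_state": null}'
python_reply = "{'name': 'click', 'mouse_pressed': True, 'element_state': None}"
```

Either reply style parses to the same dict, and a reply that tries to run code fails with ValueError instead of executing.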
So one of the tests is now failing, although it previously passed. The model is outputting more events than necessary: test_dict has 3 more actions than expected_dict.
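When the generated sequence is longer than the expected one, an assertion message that names the first divergence and the surplus actions is easier to act on than a bare inequality. A sketch, assuming test_dict and expected_dict can be reduced to lists of action dicts (their real structure in the PR may differ); first_divergence and describe_mismatch are hypothetical helpers.

```python
from typing import Any, Optional

Action = dict[str, Any]


def first_divergence(expected: list[Action], actual: list[Action]) -> Optional[int]:
    """Index of the first position where the sequences differ, or None if equal."""
    for index, (exp, act) in enumerate(zip(expected, actual)):
        if exp != act:
            return index
    if len(expected) != len(actual):
        # One sequence is a prefix of the other.
        return min(len(expected), len(actual))
    return None


def describe_mismatch(expected: list[Action], actual: list[Action]) -> str:
    """Human-readable summary for an assertion message (hypothetical helper)."""
    index = first_divergence(expected, actual)
    if index is None:
        return "sequences match"
    # Anything the model produced beyond the expected length.
    surplus = actual[len(expected):]
    return (
        f"expected {len(expected)} actions, got {len(actual)}; "
        f"first divergence at index {index}; surplus: {surplus}"
    )


expected = [{"name": "click"}]
actual = [{"name": "click"}, {"name": "move"}, {"name": "move"}, {"name": "move"}]
message = describe_mismatch(expected, actual)
```

In a test this would be used as the message argument, e.g. assert actual == expected, describe_mismatch(expected, actual).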
What kind of change does this PR introduce?
Adds a test file to examine the quality and accuracy of action event generation by GPT-4.
Summary
A simple pytest function that compares the expected and actual values of action event keys. The code is currently at the boilerplate stage.
Responds to #242; we would like to both enhance and extend this strategy.
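A minimal sketch of the replay check this PR describes: feed previously recorded window events back to the model and compare selected keys of each generated action with the recorded one. The generate_actions callable, the KEYS_TO_COMPARE tuple and the fake model are hypothetical stand-ins; in the real test the callable would wrap the GPT-4 completion.

```python
from typing import Any, Callable

Event = dict[str, Any]

# Hypothetical subset of action-event keys to compare.
KEYS_TO_COMPARE = ("name", "mouse_x", "mouse_y", "mouse_button_name", "key_name")


def assert_replay_matches(
    generate_actions: Callable[[list[Event]], list[Event]],
    window_events: list[Event],
    recorded_actions: list[Event],
) -> None:
    """Fail if the model does not reproduce the recorded actions (sketch)."""
    generated = generate_actions(window_events)
    assert len(generated) == len(recorded_actions), (
        f"expected {len(recorded_actions)} actions, got {len(generated)}"
    )
    for index, (want, got) in enumerate(zip(recorded_actions, generated)):
        for key in KEYS_TO_COMPARE:
            assert got.get(key) == want.get(key), (
                f"action {index}: {key}={got.get(key)!r}, expected {want.get(key)!r}"
            )


def test_replay_with_fake_model() -> None:
    # A fake "model" that returns a canned answer, so the test runs offline.
    recorded = [{"name": "click", "mouse_x": 25, "mouse_y": 55, "mouse_button_name": "left"}]
    windows = [{"title": "Calculator", "width": 1920, "height": 1080}]
    assert_replay_matches(lambda _: [dict(recorded[0])], windows, recorded)
```

Comparing a fixed set of keys, rather than whole dicts, keeps the test from failing on fields such as timestamps that the model is not expected to reproduce.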
Checklist
How can your code be run and tested?
Not yet meant to be run; this section will be updated over time.