Testing StatefulReplayStrategy #327

Open
wants to merge 45 commits into
base: main

FFFiend
Collaborator

@FFFiend FFFiend commented Jun 28, 2023

What kind of change does this PR introduce?
Adds a test file to examine the quality and accuracy of action event generation by GPT-4.

Summary
Simple pytest function for comparing expected vs actual values of action event keys. Code is currently in boilerplate stage.

Responds to #242; we would like to enhance and extend this strategy.

Checklist

  • My code follows the style guidelines of OpenAdapt
  • I have performed a self-review of my code
  • If applicable, I have added tests to prove my fix is functional/effective
  • I have linted my code locally prior to submission
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
  • New and existing unit tests pass locally with my changes

How can your code be run and tested?
Not yet meant to be run. Will update with time.

@FFFiend
Collaborator Author

FFFiend commented Jun 28, 2023

Wrote a function to verify that, given a previously seen action and window event sequence, the model generates the exact same action event sequence when it is given the old window event sequence as a parameter.

Current TODOs:

Create minimal versions of each (Recording, WindowEvent, etc.) and obtain diffs.
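
As a rough illustration of the exact-match check described above, here is a minimal sketch. The helper name `find_action_mismatches` is hypothetical and not taken from the PR; it only assumes that action events are represented as plain dicts.

```python
from typing import Any

ActionDict = dict[str, Any]


def find_action_mismatches(
    expected: list[ActionDict], actual: list[ActionDict]
) -> list[str]:
    """Return human-readable differences between two action sequences.

    Hypothetical helper: an empty list means the sequences match exactly.
    """
    mismatches = []
    if len(expected) != len(actual):
        mismatches.append(
            f"length: expected {len(expected)}, got {len(actual)}"
        )
    # Compare key by key over the part both sequences have in common.
    for index, (exp, act) in enumerate(zip(expected, actual)):
        for key in sorted(exp.keys() | act.keys()):
            if exp.get(key) != act.get(key):
                mismatches.append(
                    f"action {index}, key {key!r}: "
                    f"expected {exp.get(key)!r}, got {act.get(key)!r}"
                )
    return mismatches
```

In a pytest function this could be used as `assert not find_action_mismatches(expected, actual)`, so that a failure reports which keys differ rather than only that the lists are unequal.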

@FFFiend FFFiend marked this pull request as ready for review June 30, 2023 23:21
@FFFiend
Collaborator Author

FFFiend commented Jun 30, 2023

Added create event and action dict methods for synthetic input generation. Current TODO:

Write a couple of tests to evaluate generation quality from GPT-4. Move to a GGML open-source LLM when results are satisfactory.
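
For readers unfamiliar with synthetic input generation, the idea can be sketched roughly as follows. The names `make_click` and `make_click_sequence` and the values of `REF_X` and `REF_Y` are placeholders chosen for illustration; the PR's real helpers and constants are not reproduced here.

```python
from typing import Any

# Placeholder reference coordinates (assumed values, not the PR's constants).
REF_X = 25
REF_Y = 55


def make_click(mouse_x: int, mouse_y: int) -> dict[str, Any]:
    """Build one synthetic left-click action dict."""
    return {
        "name": "click",
        "mouse_x": mouse_x,
        "mouse_y": mouse_y,
        "mouse_button_name": "left",
        "mouse_pressed": True,
        "element_state": {},
    }


def make_click_sequence(
    count: int, start_x: int = REF_X, start_y: int = REF_Y
) -> list[dict[str, Any]]:
    """Build `count` clicks, each offset by one pixel from the previous one."""
    return [make_click(start_x + i, start_y + i) for i in range(count)]
```

A sequence built this way gives the test a known reference to compare the model's output against.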

@OpenAdaptAI OpenAdaptAI deleted a comment from cr-gpt bot Jul 25, 2023
@OpenAdaptAI OpenAdaptAI deleted a comment from cr-gpt bot Jul 25, 2023
@OpenAdaptAI OpenAdaptAI deleted a comment from cr-gpt bot Jul 25, 2023
@OpenAdaptAI OpenAdaptAI deleted a comment from cr-gpt bot Jul 25, 2023
mouse_button_name: str = None,
mouse_pressed: bool = None,
key_name: str = None,
element_state: dict[Any, Any] = None,
Member

Should this be dict[str, Any]?
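
One way the suggested annotation could look is sketched below. This is an assumption about the signature's shape, not the PR's actual code: the function name `create_action_dict`, the `int` types for the coordinates, and the choice to drop unset fields are all illustrative. Spelling the defaults as `Optional[...]` also avoids relying on implicit `Optional` for `= None` parameters.

```python
from typing import Any, Optional


def create_action_dict(
    name: str,
    mouse_x: Optional[int] = None,
    mouse_y: Optional[int] = None,
    mouse_button_name: Optional[str] = None,
    mouse_pressed: Optional[bool] = None,
    key_name: Optional[str] = None,
    element_state: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Build an action dict, leaving out fields that were not supplied.

    Illustrative only: dropping unset fields is an assumption.
    """
    fields = {
        "name": name,
        "mouse_x": mouse_x,
        "mouse_y": mouse_y,
        "mouse_button_name": mouse_button_name,
        "mouse_pressed": mouse_pressed,
        "key_name": key_name,
    }
    action: dict[str, Any] = {k: v for k, v in fields.items() if v is not None}
    # Default to a fresh empty dict so callers need not pass element_state={}.
    action["element_state"] = element_state if element_state is not None else {}
    return action
```

With a `None` default resolved inside the function, call sites can omit `element_state={}` entirely without sharing one mutable default across calls.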

width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
Member

What do you think about removing meta={},?

width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
Member

What do you think about removing meta={},?

width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
Member

What do you think about removing meta={},?

mouse_y=REF_Y,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
Member

What do you think about removing element_state={},?

mouse_y=REF_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
Member

What do you think about removing element_state={},?

mouse_y=REF_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
Member

What do you think about removing element_state={},?

width=WIN_WIDTH,
height=WIN_HEIGHT,
window_id=WINDOW_ID,
meta={},
Member

What do you think about removing meta={},?

mouse_y=NEW_Y,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
Member

What do you think about removing element_state={},?

mouse_y=NEW_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
Member

What do you think about removing element_state={},?

mouse_y=NEW_Y + i,
mouse_button_name="left",
mouse_pressed=True,
element_state={},
Member

What do you think about removing element_state={},?

MULTI_ACTION_WIN_WIDTH,
MULTI_ACTION_WIN_HEIGHT,
MULTI_ACTION_WINDOW_ID,
{},
Member

What do you think about removing {},?

MULTI_ACTION_WIN_WIDTH,
MULTI_ACTION_WIN_HEIGHT,
MULTI_ACTION_WINDOW_ID,
{},
Member

What do you think about removing {},?

"refuse. Copy the given format exactly. Your response should be "
"valid Python3 code. Do not respond with any other text. "
)

Contributor

@FFFiend To better organize the code and enforce strict validation on actions and windows, and because I need these pieces of code to generate generic/simple actions for model evaluation, I picked out these parts and moved them to a dedicated model, as in my PR #444.

Contributor

@LaPetiteSouris LaPetiteSouris Jul 29, 2023

Thanks for the idea to use single action for evaluation. This is truly great.

Member

@FFFiend what do you think about moving this to a .j2 file?

test_action_dict = gpt_completion(
reference_window_dict, reference_action_dicts, active_window_dict
)
test_dict = eval(
Member

What do you think about using json.loads instead of eval?
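
A caveat worth sketching: the prompt asks the model for valid Python3 code, so the completion may contain Python literals such as `True` and single-quoted strings, which `json.loads` rejects. One possible approach, shown below under that assumption, is to try `json.loads` first and fall back to `ast.literal_eval`, which only accepts literals and does not execute code. The function name `parse_action_dicts` is hypothetical.

```python
import ast
import json
from typing import Any


def parse_action_dicts(completion: str) -> list[dict[str, Any]]:
    """Parse a model completion into a list of action dicts without eval()."""
    try:
        parsed = json.loads(completion)
    except json.JSONDecodeError:
        # Python-literal output (True/False/None, single quotes) lands here.
        # literal_eval raises ValueError or SyntaxError on anything that is
        # not a plain literal, so arbitrary expressions are never executed.
        parsed = ast.literal_eval(completion)
    if not isinstance(parsed, list) or not all(
        isinstance(item, dict) for item in parsed
    ):
        raise ValueError("completion is not a list of dicts")
    return parsed
```

If the prompt were changed to request JSON explicitly, the fallback branch could be dropped and `json.loads` used on its own.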

@FFFiend FFFiend mentioned this pull request Aug 3, 2023
7 tasks
@FFFiend
Collaborator Author

FFFiend commented Aug 7, 2023

One of the tests is now failing, although it previously passed. The model is outputting more events than expected; see here:

expected_action_dict=[{'name': 'click', 'mouse_x': 138, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 139, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 90, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 139, 'mouse_y': 90, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 140, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 91, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 140, 'mouse_y': 91, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 141, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 92, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 141, 'mouse_y': 92, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 142, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 93, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 142, 'mouse_y': 93, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 143, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 
'click', 'mouse_x': 138, 'mouse_y': 94, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 143, 'mouse_y': 94, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 144, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 95, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 144, 'mouse_y': 95, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 145, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 96, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 145, 'mouse_y': 96, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 146, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 97, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 146, 'mouse_y': 97, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 147, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 98, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 147, 'mouse_y': 98, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 148, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 99, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 148, 
'mouse_y': 99, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 149, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 100, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 149, 'mouse_y': 100, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}], len(expected_action_dict)=36

and

test_dict=[{'name': 'click', 'mouse_x': 138, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 139, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 90, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 139, 'mouse_y': 90, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 140, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 91, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 140, 'mouse_y': 91, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 141, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 92, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 141, 'mouse_y': 92, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 142, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 93, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 142, 'mouse_y': 93, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 143, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 
'mouse_x': 138, 'mouse_y': 94, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 143, 'mouse_y': 94, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 144, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 95, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 144, 'mouse_y': 95, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 145, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 96, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 145, 'mouse_y': 96, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 146, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 97, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 146, 'mouse_y': 97, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 147, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 98, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 147, 'mouse_y': 98, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 148, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 99, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 148, 'mouse_y': 99, 
'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 149, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 100, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 149, 'mouse_y': 100, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 150, 'mouse_y': 89, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 138, 'mouse_y': 101, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}, {'name': 'click', 'mouse_x': 150, 'mouse_y': 101, 'mouse_button_name': 'left', 'mouse_pressed': True, 'element_state': {}}], len(test_dict)=39

Clearly, test_dict has 3 more actions than expected_action_dict: the trailing clicks at (150, 89), (138, 101), and (150, 101).
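
To make this kind of overshoot easier to spot than by reading the raw dumps, a small helper along these lines could report what the model produced beyond the shared prefix. The names `actions_after_common_prefix` and `click` are hypothetical, and the lists below are short stand-ins that reuse only the final coordinates from the output above.

```python
from typing import Any


def actions_after_common_prefix(
    expected: list[dict[str, Any]], actual: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Return the actions in `actual` that follow its shared prefix with `expected`."""
    shared = 0
    for exp, act in zip(expected, actual):
        if exp != act:
            break
        shared += 1
    return actual[shared:]


def click(mouse_x: int, mouse_y: int) -> dict[str, Any]:
    """Stand-in for one of the click dicts shown in the output above."""
    return {
        "name": "click",
        "mouse_x": mouse_x,
        "mouse_y": mouse_y,
        "mouse_button_name": "left",
        "mouse_pressed": True,
        "element_state": {},
    }


expected_tail = [click(149, 89), click(138, 100), click(149, 100)]
actual_tail = expected_tail + [click(150, 89), click(138, 101), click(150, 101)]
extra = actions_after_common_prefix(expected_tail, actual_tail)
print([(a["mouse_x"], a["mouse_y"]) for a in extra])
# prints [(150, 89), (138, 101), (150, 101)]
```

Here the model appears to have continued the same coordinate pattern for one more step rather than stopping where the reference sequence ends.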

3 participants