
Upgrade PMAT and ThinkThoroughly agent #104

Merged: kongzii merged 7 commits into main from peter/upgrades-to-thinker on Apr 25, 2024

Conversation

kongzii (Contributor) commented Apr 24, 2024:

Summary by CodeRabbit

  • New Features

    • Enhanced security by introducing PrivateCredentials for handling private data across various agent functions.
    • Added new prompts and refined existing ones in the think_thoroughly_agent to improve market prediction interactions.
    • Enhanced the agent_app.py script to include cost tracking and display time and cost information during operations.
  • Bug Fixes

    • Fixed formatting issues with dollar signs in logging messages.
  • Refactor

    • Updated various agents to use renamed attributes and improved handling of market data.
    • Refactored methods related to scenario generation and market predictions in think_thoroughly_agent.

coderabbitai bot (Contributor) commented Apr 24, 2024:


Walkthrough

The recent updates focus on enhancing security by integrating PrivateCredentials for handling sensitive data, refining probability management by renaming attributes, and improving user interaction through better logging and prompt structuring. The changes also include method refinements for better handling of scenarios and predictions in agents, and adjustments in utility functions to better reflect market conditions.

Changes

  • .../known_outcome_agent/benchmark.py, .../think_thoroughly_agent/benchmark.py, .../microchain/test_agent.py - Renamed p_yes to current_p_yes and updated related logic.
  • .../microchain_agent/functions.py, .../microchain_agent/utils.py, .../replicate_to_omen_agent/deploy.py, tests/agents/microchain/test_functions.py - Integrated PrivateCredentials for secure handling of private data and credentials.
  • .../think_thoroughly_agent/deploy.py, .../think_thoroughly_agent/prompts.py, .../think_thoroughly_agent/think_thoroughly_agent.py - Enhanced logic and structure for scenario generation, probability determination, and market interaction.
  • .../utils.py, .../tools/streamlit_utils.py - Updated utility functions for better market analysis and fixed formatting issues in logging.
  • scripts/agent_app.py - Enhanced cost tracking and display during agent interactions with markets.

@@ -32,7 +32,7 @@ def to_market(self) -> AgentMarket:
     url=self.url if self.url else "",
     id=self.question,
     question=self.question,
-    p_yes=Probability(
+    current_p_yes=Probability(
kongzii (Contributor Author): PR includes changes required because of the newest PMAT version.

coderabbitai bot (Contributor) left a comment:
Actionable comments posted: 2

logger.info(f"Created conditional scenarios: {scenarios.scenarios}")
return scenarios

def get_hypohetical_scenarios(self, question: str) -> Scenarios:
coderabbitai bot:
Correct the typo in the method name get_hypohetical_scenarios to get_hypothetical_scenarios.

-    def get_hypohetical_scenarios(self, question: str) -> Scenarios:
+    def get_hypothetical_scenarios(self, question: str) -> Scenarios:


Comment on lines 343 to 345
self.user_address = PrivateCredentials.from_api_keys(
MicrochainAPIKeys()
).public_key
coderabbitai bot:
Remove the redundant use of PrivateCredentials in the GetPositions function, as the user's public key is already fetched earlier in the function.

-        self.user_address = PrivateCredentials.from_api_keys(
-            MicrochainAPIKeys()
-        ).public_key


Example 1:
- SCENARIO: Will Gnosis Pay reach 100,000 users by 2025?
- Possible other SCENARIOs:
- Will Gnosis Pay have less than 25,000 users by 2025?
kongzii (Contributor Author): Instead of generating outcomes (i.e., "Gnosis Pay reaches less than 25,000 users by 2025"), I changed it to generate questions like this.

Agents accept questions as input, so this keeps it consistent.

Contributor: So you are basically generating additional scenarios beyond the ones generated before? Can you give examples of which scenarios were generated before and now, to justify this change?

kongzii (Contributor Author): The kind of questions generated before can be seen in the linked issue, for example:

[screenshot]

And I screenshotted the kind of questions it generates now into Slack:

[screenshot]

provided SCENARIO. The produced outcomes should be mutually exclusive, i.e. only one of them should be true whereas
the remaining ones should be false.
"""
CREATE_REQUIRED_CONDITIONS_PROMPT = """
kongzii (Contributor Author): Another subquestion generation, for things that need to happen in order for the question to be True.

kongzii marked this pull request as draft April 24, 2024 11:50
kongzii self-assigned this Apr 24, 2024
kongzii marked this pull request as ready for review April 24, 2024 12:11
hypothetical_scenarios = self.get_hypohetical_scenarios(question)
conditional_scenarios = self.get_required_conditions(question)

sub_predictions = par_generator(
gabrielfior (Contributor) commented Apr 24, 2024:

I saw your comment regarding parallel execution. I think this goes in the right direction, but I would advise using the standard parallelism from crewai (https://docs.crewai.com/core-concepts/Tasks/#asynchronous-execution) instead of managing futures ourselves; see the sketch below.

Another advantage of delegating that to crewai is that the same agent instance can be used, saving RAM.
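As a rough illustration of this suggestion (not code from this PR), delegating the per-scenario work to crewai's asynchronous task execution might look like the sketch below. The role/goal strings and scenario list are made up, and the Task parameters (notably async_execution) follow crewai's documented API of the time, so treat them as assumptions to verify against the installed version:

```python
# Hypothetical sketch of crewai-managed parallelism, per the suggestion above.
# All strings and scenarios are illustrative; verify the Task API (especially
# async_execution) against the crewai version in use.
from crewai import Agent, Crew, Task

researcher = Agent(
    role="Prediction market researcher",
    goal="Estimate the probability of a given scenario",
    backstory="You research scenarios and output calibrated probabilities.",
)

scenarios = [
    "Will Gnosis Pay have less than 25,000 users by 2025?",
    "Will Gnosis Pay reach 100,000 users by 2025?",
]

# One task per scenario; async_execution=True asks crewai to run them
# concurrently while reusing the single agent instance (the RAM saving
# mentioned above). Depending on the crewai version, the final task may
# need to run synchronously.
tasks = [
    Task(
        description=f"Estimate the probability that: {s}",
        expected_output="A probability between 0 and 1, with reasoning.",
        agent=researcher,
        async_execution=True,
    )
    for s in scenarios
]

result = Crew(agents=[researcher], tasks=tasks).kickoff()
print(result)
```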

kongzii (Contributor Author): Cool, I'll look into that; I didn't notice (or forgot) that they have this! 🙈

Contributor: Keep in mind that their parallelism can be tricky - I had cases where just one task was resolved but it had already called subsequent parts of the code. So I would test it manually first and check which tasks can indeed be made parallel. Let me know if I can be of help.

kongzii (Contributor Author): Hmm, then maybe it's not worth it? The current implementation seems straightforward to me.

Unless there will be memory problems.

gabrielfior (Contributor) commented Apr 24, 2024:

The only problem I see with the current implementation is the high memory usage (1 agent per outcome and, if I'm not mistaken, 5-8 outcomes per question). With crewai, each agent acts as a worker, so it's less memory-intensive. The downside is that it takes longer; hence the tradeoff here is RAM vs. duration of execution.

I would also mention that the bottleneck here is agent execution time, so both approaches (crewai vs. par_map) should yield similar results - if both have proper error handling, I'm happy with either.

kongzii (Contributor Author): Errors are fine. But should the other agents take any significant amount of memory? If they are working in parallel in both cases, they need to store the scraped web content, generations, etc.; otherwise, I hope there is no extra memory overhead. Let's try it out.

If I set max_workers=1, then:

[screenshot]

And max_workers=5, then:

[screenshot]

So I don't see any significant difference, but memory profiling can be tricky; we will see if Kube goes OOM 🙈

Btw, the question was "Will the sun rise tomorrow?". Disappointing that there is a 0.01% chance that it will not! 😄

evangriffiths (Contributor) left a comment:
I like the idea of adding conditional_scenarios to help the agent reason about the final p_yes. But I think it changes the reasoning required when making the final decision, and this should at least be documented, and maybe also expressed in the prompt/logic.

Before, the hypothetical_scenarios were supposed to cover the whole probability space for some event, e.g. outcomes a1, a2 and a3 for event A, where a1 is the 'Yes' outcome to the original question, and a2 and a3 are the extra scenarios generated by the agent. Then

p(A=a1) + p(A=a2) + p(A=a3) = 1

I can't see this explicitly stated anywhere, but I thought that's what @gabrielfior was intending with the design! So the reasoning the agent might do in generate_final_decision, when it adjusts its estimate of p(A=a1) (i.e. the final p_yes), would be to normalize like p(A=a1) / (p(A=a1) + p(A=a2) + p(A=a3)).

But now you're adding in conditional_scenarios, which are like outcomes of different random variables, e.g. p(B=b1), p(C=c1). And we're implicitly saying here that because A is conditional on B and C, p(A=a1 | B=b1) < p(B=b1) and p(A=a1 | C=c1) < p(C=c1). This requires different reasoning than the hypothetical scenarios do.

All that said, if the experiments show that the results are good, and so the agent gets this, then I guess it's fine, and it might just be helpful to add this in the comments.

Another point: by allowing generate_prediction_for_one_outcome to return None for a hypothetical scenario, there could be a bad knock-on effect when it comes to generate_final_decision: instead of erroring out because one scenario fails, the subset of scenarios that remain could no longer cover the whole probability space, so the p_yes normalization reasoning the agent does in this step could be screwed up. However, this point isn't so clear when the hypothetical scenarios are combined with conditional_scenarios.
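To make the normalization argument concrete, here is a minimal sketch of the reasoning described above; the scenario names and numbers are purely illustrative, not the repo's code:

```python
# Illustrative sketch of the p_yes normalization over mutually exclusive
# hypothetical scenarios of one event A; numbers are made up.
scenario_probs = {"a1 (Yes)": 0.50, "a2": 0.30, "a3": 0.35}  # need not sum to 1

total = sum(scenario_probs.values())
p_yes = scenario_probs["a1 (Yes)"] / total  # p(A=a1) / (p(A=a1)+p(A=a2)+p(A=a3))
print(f"normalized p_yes = {p_yes:.3f}")  # -> 0.435

# This only makes sense if the scenarios cover the whole probability space;
# it does not apply to conditional scenarios, which are outcomes of different
# random variables (B, C) that A depends on.
```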

@@ -173,7 +178,7 @@ def __init__(self, market_type: MarketType, outcome: str):
     self.outcome_bool = get_boolean_outcome(
         outcome=self.outcome, market_type=market_type
     )
-    self.user_address = APIKeys().bet_from_address
+    self.user_address = PrivateCredentials.from_api_keys(APIKeys()).public_key
Contributor: Regretting this suggestion. I feel like we could have done without a separate PrivateCredentials class, and this could have been hidden inside APIKeys.

kongzii (Contributor Author): Yeah, I agree, it's cumbersome. But let's do that in another PR?

kongzii (Contributor Author): @coderabbitai Please create an issue in the prediction-market-agent-tooling repository to remove PrivateCredentials and make its methods just part of APIKeys.

coderabbitai bot (Contributor):
@kongzii, the issue has been successfully created in the prediction-market-agent-tooling repository to address the integration of PrivateCredentials methods into APIKeys. You can track the progress and further discussion here: GitHub Issue #218.

Scripts executed for the analysis:

#!/bin/bash
# Create a GitHub issue in the prediction-market-agent-tooling repository
gh issue create --repo gnosis/prediction-market-agent-tooling --title "Integrate \`PrivateCredentials\` methods into \`APIKeys\`" --body "This issue tracks the enhancement to simplify the handling of credentials by removing the separate \`PrivateCredentials\` class and integrating its functionality into \`APIKeys\`. This change aims to reduce complexity and improve code maintainability. Related discussion: https://github.com/gnosis/prediction-market-agent/pull/104#discussion_r1578065452"

Length of output: 555


sub_predictions = par_generator(
hypothetical_scenarios.scenarios + conditional_scenarios.scenarios,
lambda x: (x, self.generate_prediction_for_one_outcome(x)),
Contributor: [comment body not captured in this export]
kongzii (Contributor Author): That is for ProcessPoolExecutor, but the default executor is DEFAULT_THREADPOOL_EXECUTOR = ThreadPoolExecutor(max_workers=5) (no need for pickling in this case).
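For reference, a hedged sketch of what such a thread-pool-backed par_generator can look like; the real helper lives in the tooling library, so the exact signature here is an assumption:

```python
# Hypothetical sketch of a thread-pool-backed par_generator; the actual
# implementation lives in PMAT, so treat this signature as an assumption.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")

DEFAULT_THREADPOOL_EXECUTOR = ThreadPoolExecutor(max_workers=5)

def par_generator(
    items: Iterable[A],
    func: Callable[[A], B],
    executor: ThreadPoolExecutor = DEFAULT_THREADPOOL_EXECUTOR,
) -> Iterator[B]:
    # Threads share memory, so unlike ProcessPoolExecutor there is no need
    # to pickle func or its arguments (lambdas and bound methods are fine).
    futures = [executor.submit(func, item) for item in items]
    for future in futures:
        yield future.result()

# Usage mirroring the quoted call site:
# sub_predictions = par_generator(
#     hypothetical_scenarios.scenarios + conditional_scenarios.scenarios,
#     lambda x: (x, self.generate_prediction_for_one_outcome(x)),
# )
```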

Contributor: Ah okay.

kongzii (Contributor Author) commented Apr 24, 2024:

@evangriffiths

> Before, the hypothetical_scenarios were supposed to cover the whole probability space for some event, e.g. outcomes a1, a2 and a3 for event A, where a1 is the 'Yes' outcome to the original question, and a2 and a3 are the extra scenarios generated by the agent. Then p(A=a1) + p(A=a2) + p(A=a3) = 1

This is an interesting idea to have, but I don't think it was the original intention in this agent, was it @gabrielfior?

Previously, one of the examples in the prompt was

Example 1:
        - SCENARIO: Will Gnosis Pay reach 100,000 users by 2025?
        - Answer: '''
            - Gnosis Pay reaches less than 25,000 users by 2025
            - Gnosis Pay reaches more than 25,000 users but less than 50,000 users by 2025
            - Gnosis Pay reaches more than 50,000 users but less than 75,000 users by 2025
            - Gnosis Pay reaches 100,000 users or more by 2025

And these don't add up to 1 either, or?

This agent is based on Martin's ideas that he had some time ago:

[screenshot]

> So then the reasoning the agent might do in generate_final_decision, when it adjusts its estimate of p(A=a1) (i.e. the final p_yes) would be to normalize them like p(A=a1) / (p(A=a1) + p(A=a2) + p(A=a3)).

Based on the prompt for the final decision (and this was modified just a little), giving the agent many different scenarios with their probabilities is fine, and the agent should make sense of it:

[screenshot]

> Another point: just taking into account hypothetical_scenarios by allowing generate_prediction_for_one_outcome to have a None return value, it could have a bad knock-on effect

Agreed that it won't be optimal, but I thought that if the agent does ~8 predictions and some of them fail, it's a shame to not give the final result.

For example, from the screenshot in Slack:

[screenshot]

If the question about hitting $600 fails, it shouldn't be so bad for the final decision.

gabrielfior (Contributor) commented:

@kongzii regarding your comment (I couldn't reply directly there):

  • The main point is the sum of the probabilities of each outcome not summing to 1. That should be the case in theory; however, since we are processing outcomes in parallel (and not passing the other probabilities to the agent), it's natural to expect it to deliver probabilities that together don't add up to 1.
  • The way I implemented it back then (and it is still the case now) is to have the final agent look at all (outcome, probability) tuples and make a final choice. A minimal illustration of that step follows below.
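As a minimal illustration of that final step (the outcomes, numbers, and prompt wording are made up, not the repo's API), the final agent's input can be assembled from the (outcome, probability) tuples like this:

```python
# Illustrative only: assembling (outcome, probability) tuples into the
# final-decision prompt; the real prompt lives in think_thoroughly_agent.
sub_predictions = [
    ("Will Gnosis Pay have less than 25,000 users by 2025?", 0.30),
    ("Will Gnosis Pay reach 100,000 users by 2025?", 0.15),
]
summary = "\n".join(f"- {outcome}: p={p:.2f}" for outcome, p in sub_predictions)
final_prompt = (
    "Given these scenario probabilities (they need not sum to 1), "
    "estimate p_yes for the original question:\n" + summary
)
print(final_prompt)
```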

evangriffiths (Contributor) commented:

> This is an interesting idea to have, but I don't think it was the original intention in this agent, was it @gabrielfior?

I can't find this in writing on the original PR, except for this comment from me, so maybe I'm just continuing the dream I was having 😆 #47 (review)

> And these don't add up to 1 either, or?

I would say they do! What other option is there?

> I thought that if the agent does ~8 predictions, and let's say that some of them fail, it's a shame to not give the final result.

That's fair enough.

I do think there is something to the point about how an agent reasons about probabilities of 'different outcomes of one random variable' vs. probabilities of 'different outcomes of conditional random variables', but I think overall this is a great improvement, and I haven't got a good idea of what to do about it, so I'm happy to approve as-is.

evangriffiths self-requested a review April 24, 2024 20:08
kongzii (Contributor Author) commented Apr 25, 2024:

> I would say they do! What other option is there?

I see now what you meant! There are the options:

  • Gnosis Pay reaches less than 25,000 users by 2025
  • Gnosis Pay reaches more than 25,000 users but less than 50,000 users by 2025
  • Gnosis Pay reaches more than 50,000 users but less than 75,000 users by 2025
  • Gnosis Pay reaches 100,000 users or more by 2025

However, the option "more than 75,000 but less than 100,000 users" is missing. That said, I see the idea that if it were included, the probabilities should sum up to 1.

The original GitHub issue was also about playing with different dates, etc., so that wouldn't sum up to 1.

But @coderabbitai, please create an issue to upgrade the Think Thoroughly agent, such that it will generate subquestions in batches, and inside these batches, probabilities should sum up to 1. Also mention that it's needed to check the performance of the current version of this agent, and maybe deploy the new one as another agent, instead of updating the existing one. Thanks!

coderabbitai bot (Contributor) commented Apr 25, 2024:

@kongzii, I've created the GitHub issue as requested. You can view and track the progress of the upgrade for the Think Thoroughly agent here. This issue includes the requirements for generating subquestions in batches where probabilities sum up to 1, checking the performance of the current version, and considering deploying the new version as a separate agent.

Scripts executed for the analysis:

#!/bin/bash
# Create a GitHub issue for upgrading the Think Thoroughly agent
gh issue create --title "Upgrade Think Thoroughly Agent" --body "The Think Thoroughly agent needs an upgrade to ensure that it generates subquestions in batches where the probabilities sum up to 1. Additionally, it is necessary to check the performance of the current version of this agent. Consider deploying the new version as a separate agent, instead of updating the existing one. This issue was requested by @kongzii in PR https://github.com/gnosis/prediction-market-agent/pull/104." --repo gnosis/prediction-market-agent

Length of output: 589

kongzii merged commit 2e2636e into main Apr 25, 2024 (6 checks passed)
kongzii deleted the peter/upgrades-to-thinker branch April 25, 2024 06:51