Upgrade PMAT and ThinkThoroughly agent #104
Conversation
Walkthrough: The recent updates focus on enhancing security by integrating …
@@ -32,7 +32,7 @@ def to_market(self) -> AgentMarket:
             url=self.url if self.url else "",
             id=self.question,
             question=self.question,
-            p_yes=Probability(
+            current_p_yes=Probability(
PR includes changes required because of the newest PMAT version.
prediction_market_agent/agents/think_thoroughly_agent/think_thoroughly_agent.py (resolved review thread)
Actionable comments posted: 2
        logger.info(f"Created conditional scenarios: {scenarios.scenarios}")
        return scenarios

    def get_hypohetical_scenarios(self, question: str) -> Scenarios:
Correct the typo in the method name get_hypohetical_scenarios to get_hypothetical_scenarios.
- def get_hypohetical_scenarios(self, question: str) -> Scenarios:
+ def get_hypothetical_scenarios(self, question: str) -> Scenarios:
        self.user_address = PrivateCredentials.from_api_keys(
            MicrochainAPIKeys()
        ).public_key
Remove the redundant use of PrivateCredentials in the GetPositions function, as the user's public key is already fetched earlier in the function.
- self.user_address = PrivateCredentials.from_api_keys(
- MicrochainAPIKeys()
- ).public_key
Example 1:
- SCENARIO: Will Gnosis Pay reach 100,000 users by 2025?
- Possible other SCENARIOs:
  - Will Gnosis Pay have less than 25,000 users by 2025?
Instead of generating outcomes (i.e. "Gnosis Pay reaches less than 25,000 users by 2025"), I changed it to generate questions like this one. Agents accept questions as input, so this keeps it consistent.
So you are basically generating additional scenarios beyond the ones generated before? Can you give examples of which scenarios were generated before and now, to justify this change?
provided SCENARIO. The produced outcomes should be mutually exclusive, i.e. only one of them should be true whereas
the remaining ones should be false.
"""

CREATE_REQUIRED_CONDITIONS_PROMPT = """
Another subquestion generation, for things that need to happen in order for the question to be True.
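For illustration, a sketch of what such a prompt could look like (hypothetical wording and constant name, not the actual contents of prompts.py):

# Hypothetical sketch only -- the real prompt lives in
# prediction_market_agent/agents/think_thoroughly_agent/prompts.py.
CREATE_REQUIRED_CONDITIONS_PROMPT_SKETCH = """
You are given a SCENARIO phrased as a question: {scenario}

List the conditions that must hold for the SCENARIO to resolve as True,
each phrased as a standalone question that an agent can research on its own.
For example, for "Will Gnosis Pay reach 100,000 users by 2025?", a required
condition could be "Will Gnosis Pay still be operating in 2025?".
"""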
prediction_market_agent/agents/think_thoroughly_agent/prompts.py (outdated, resolved review thread)
        hypothetical_scenarios = self.get_hypohetical_scenarios(question)
        conditional_scenarios = self.get_required_conditions(question)

        sub_predictions = par_generator(
I saw your comment regarding parallel execution.
I think this goes in the right direction, but I would advise using the standard parallelism from crewai (https://docs.crewai.com/core-concepts/Tasks/#asynchronous-execution) instead of managing futures ourselves.
Another advantage of delegating that to crewai is that the same agent instance can be used, saving RAM.
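For reference, a minimal sketch of what that could look like (assuming crewai's Agent/Task/Crew API with the async_execution flag from the linked docs; exact arguments and constraints vary by crewai version):

from crewai import Agent, Task, Crew

# One shared agent instance, reused across all scenario tasks (saves RAM).
researcher = Agent(
    role="Scenario researcher",
    goal="Estimate the probability of each scenario",
    backstory="Researches prediction-market questions.",
)

scenarios = ["Will Gnosis Pay reach 100,000 users by 2025?"]  # illustrative input

tasks = [
    Task(
        description=f"Estimate the probability of: {s}",
        expected_output="A probability between 0 and 1, with reasoning.",
        agent=researcher,
        async_execution=True,  # let crewai schedule these tasks concurrently
    )
    for s in scenarios
]

# Depending on the crewai version, a final synchronous task may be needed
# to gather the async tasks' outputs via its context.
final = Task(
    description="Combine the probability estimates into a final p_yes.",
    expected_output="A single probability between 0 and 1.",
    agent=researcher,
    context=tasks,
)

result = Crew(agents=[researcher], tasks=tasks + [final]).kickoff()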
Cool, I'll look into that; I didn't notice (or forgot) about them having this! 🙈
Keep in mind that their parallelism can be tricky - I had cases where just one task was resolved but it had already called subsequent parts of the code.
So I would test it manually first and check which tasks can indeed be made parallel. Let me know if I can be of help.
Hmm, then maybe it's not worth it? The current implementation seems straightforward to me.
Unless there will be memory problems.
The only problem I see with the current implementation is the high memory usage (having 1 agent per outcome, and if I'm not mistaken, 5-8 outcomes per question).
With crewai each agent acts as a worker, hence it's less memory intensive. The downside is it takes longer, hence the tradeoff here is RAM vs. duration of execution.
I would also mention that the bottleneck here is the agent execution time, hence both approaches (crewai vs. par_map) should yield similar results - if both have proper error handling, I'm happy with either of them.
Errors are fine. But should the other agent take any significant amount of memory? If they are working in parallel in both cases, they need to store the scraped web content, generations, etc., and otherwise I hope there is no extra memory overhead. Let's try it out:

If I set max_workers=1, then: [memory profile screenshot]

And with max_workers=5, then: [memory profile screenshot]

So I don't see any significant difference, but memory profiling can be tricky; we will see if Kube goes OOM 🙈
Btw, the question was "Will the sun rise tomorrow?". Disappointing that there is a 0.01% chance that it will not! 😄
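For reference, on the memory check above: a rough stdlib-only way to compare peak RSS between the two settings (a sketch with a stand-in task; resource is Unix-only, and ru_maxrss is kB on Linux but bytes on macOS):

import resource
from concurrent.futures import ThreadPoolExecutor

def run(max_workers: int) -> None:
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Stand-in for generate_prediction_for_one_outcome.
        list(executor.map(lambda s: s.upper(), ["scenario"] * 8))
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f"max_workers={max_workers}: peak RSS ~ {peak}")

# NOTE: ru_maxrss is a process-lifetime high-water mark, so for a fair
# comparison run each setting in a separate process.
run(1)
run(5)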
I like the idea of adding conditional_scenarios to help the agent reason about the final p_yes. But I think it changes the reasoning required when making the final decision, and I think this should be at least documented, and maybe also expressed in the prompt / logic.

Before, the hypothetical_scenarios were supposed to cover the whole probability space for some event, e.g. outcomes a1, a2 and a3 for event A, where a1 is the 'Yes' outcome to the original question, and a2 and a3 are the extra scenarios generated by the agent. Then

p(A=a1) + p(A=a2) + p(A=a3) = 1

I can't see this explicitly stated anywhere, but I thought that's what @gabrielfior was intending with the design! So then the reasoning the agent might do in generate_final_decision, when it adjusts its estimate of p(A=a1) (i.e. the final p_yes), would be to normalize them like p(A=a1) / (p(A=a1) + p(A=a2) + p(A=a3)).
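As a toy example (numbers made up):

# Agent's raw estimates for the three mutually exclusive outcomes of A.
raw = {"a1": 0.5, "a2": 0.3, "a3": 0.4}  # sums to 1.2, so normalize

p_yes = raw["a1"] / sum(raw.values())  # 0.5 / 1.2 ≈ 0.417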
But now you're adding in conditional_scenarios, which are like outcomes of different random variables, e.g. p(B=b1), p(C=c1). And we're implicitly saying here that because A is conditional on B and C, p(A=a1 | B=b1) < p(B=b1) and p(A=a1 | C=c1) < p(C=c1). This requires different reasoning than it does for the hypothetical scenarios.
All that said, if the experiments show that the results are good, and so the agent gets this, then I guess it's fine, and it might just be helpful to add this in the comments.
Another point: just considering the hypothetical_scenarios, allowing generate_prediction_for_one_outcome to have a None return value could have a bad knock-on effect when it comes to generate_final_decision. Instead of erroring out because one scenario fails, the subset of scenarios that are left could no longer cover the whole of the probability space, so the p_yes normalization reasoning that the agent does in this step could be screwed up. However, this point isn't so clear when the hypothetical scenarios are combined with conditional_scenarios.
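To make the knock-on effect concrete, reusing the made-up numbers from above:

raw = {"a1": 0.5, "a2": 0.3, "a3": 0.4}

# If a2's prediction fails and is silently dropped, the surviving scenarios
# no longer cover the probability space and p_yes is biased upward:
kept = {k: v for k, v in raw.items() if k != "a2"}
print(raw["a1"] / sum(raw.values()))    # ≈ 0.417 with full coverage
print(kept["a1"] / sum(kept.values()))  # ≈ 0.556 after dropping a2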
@@ -173,7 +178,7 @@ def __init__(self, market_type: MarketType, outcome: str):
         self.outcome_bool = get_boolean_outcome(
             outcome=self.outcome, market_type=market_type
         )
-        self.user_address = APIKeys().bet_from_address
+        self.user_address = PrivateCredentials.from_api_keys(APIKeys()).public_key
Regretting this suggestion. I feel like we could have done without a separate PrivateCredentials class, and this could have been hidden inside APIKeys.
Yeah, I agree, it's cumbersome. But let's do that in another PR?
@coderabbitai Please create an issue in the prediction-market-agent-tooling repository to remove PrivateCredentials and make its methods just part of APIKeys.
@kongzii, the issue has been successfully created in the prediction-market-agent-tooling repository to address the integration of PrivateCredentials methods into APIKeys. You can track the progress and further discussion here: GitHub Issue #218.
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Create a GitHub issue in the prediction-market-agent-tooling repository
gh issue create --repo gnosis/prediction-market-agent-tooling --title "Integrate \`PrivateCredentials\` methods into \`APIKeys\`" --body "This issue tracks the enhancement to simplify the handling of credentials by removing the separate \`PrivateCredentials\` class and integrating its functionality into \`APIKeys\`. This change aims to reduce complexity and improve code maintainability. Related discussion: https://github.com/gnosis/prediction-market-agent/pull/104#discussion_r1578065452"
        sub_predictions = par_generator(
            hypothetical_scenarios.scenarios + conditional_scenarios.scenarios,
            lambda x: (x, self.generate_prediction_for_one_outcome(x)),
That is for ProcessPoolExecutor, but the default executor is DEFAULT_THREADPOOL_EXECUTOR = ThreadPoolExecutor(max_workers=5) (no need for pickling in this case).
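For context, a par_generator along these lines can be as simple as the sketch below (an assumption about its shape, not PMAT's exact implementation):

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")

DEFAULT_THREADPOOL_EXECUTOR = ThreadPoolExecutor(max_workers=5)

def par_generator(
    items: Iterable[A],
    f: Callable[[A], B],
    executor: ThreadPoolExecutor = DEFAULT_THREADPOOL_EXECUTOR,
) -> Iterator[B]:
    # Threads share the interpreter's memory, so `f` can be a lambda --
    # unlike with ProcessPoolExecutor, nothing has to be pickled.
    yield from executor.map(f, items)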
ah okay
This is an interesting idea to have, but I don't think it was the original intention in this agent, was it @gabrielfior? Previously, one of the examples in the prompt was … and these don't add up to 1.

This agent is based on Martin's ideas that he had some time ago:
Based on the prompt for the final decision (and this was modified just a little), giving the agent many different scenarios with their probabilities is fine, and the agent should make sense of it:
Agreed that it won't be optimal, but I thought that if the agent does ~8 predictions and, let's say, some of them fail, it's a shame not to give the final result. For example, from the screenshot in Slack: if the question about hitting $600 fails, it shouldn't be so bad for the final decision.
@kongzii regarding your comment (I couldn't reply directly there)
I can't find this in writing on the original PR, except for this comment from me, so maybe I'm just continuing the dream I was having 😆
I would say they do! What other option is there?
That's fair enough. I do think there is something to the point about how an agent reasons about probabilities of 'different outcomes of one random variable' vs probabilities of 'different outcomes of conditional random variables', but I think overall this is a great improvement and I haven't got a good idea on what to do about it, so am happy to approve as-is.
I see now what you meant! There are options …

However, the option … The original GitHub issue was also about playing with different dates, etc., so that wouldn't sum up to 1. But @coderabbitai, please create an issue to upgrade the Think Thoroughly agent, such that it will generate subquestions in batches, and inside these batches, probabilities should sum up to 1. Also mention that it's needed to check the performance of the current version of this agent, and maybe deploy the new one as another agent, instead of updating the existing one. Thanks!
@kongzii, I've created the GitHub issue as requested. You can view and track the progress of the upgrade for the Think Thoroughly agent here. This issue includes the requirements for generating subquestions in batches where probabilities sum up to 1, checking the performance of the current version, and considering deploying the new version as a separate agent.

The following script was executed for the analysis:

Script:

#!/bin/bash
# Create a GitHub issue for upgrading the Think Thoroughly agent
gh issue create --title "Upgrade Think Thoroughly Agent" --body "The Think Thoroughly agent needs an upgrade to ensure that it generates subquestions in batches where the probabilities sum up to 1. Additionally, it is necessary to check the performance of the current version of this agent. Consider deploying the new version as a separate agent, instead of updating the existing one. This issue was requested by @kongzii in PR https://github.com/gnosis/prediction-market-agent/pull/104." --repo gnosis/prediction-market-agent
Summary by CodeRabbit

New Features
- PrivateCredentials for handling private data across various agent functions.
- think_thoroughly_agent to improve market prediction interactions.
- agent_app.py script to include cost tracking and display time and cost information during operations.

Bug Fixes

Refactor
- think_thoroughly_agent.