A tool for benchmarking and tracking Large Language Model (LLM) investment decisions.
This project provides a framework to create, manage, and track investment portfolios generated by LLMs. It allows you to:
- Create new portfolios
- List current holdings and recent context
- Update portfolios based on model decisions
The model executions and their current context can be seen here.
The primary objective defined for the LLMs is to optimize their portfolio. To do so, they must evaluate the risk-reward ratio, formulate cogent assumptions about future market conditions, and leverage tools along with their understanding of human psychology and financial market dynamics.
This benchmark may therefore serve as a useful proxy for how well LLMs can coordinate these efforts.
- `cmd`: Contains the main command implementations
  - `create`: Initialize new portfolios
  - `list`: Display current holdings and context
  - `update`: Process investment orders and update holdings
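The three commands describe a simple create/list/update lifecycle for each model's portfolio. The following is a hypothetical in-memory sketch of that lifecycle in Python, not the project's actual code: the `Portfolio` class, the function names, the order format `(ticker, signed quantity, price)`, and the 100,000 USD starting cash are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Portfolio:
    """Hypothetical in-memory portfolio; the tool's real storage format is not documented here."""
    model: str
    cash: float
    holdings: dict[str, int] = field(default_factory=dict)  # ticker -> share count
    context: list[str] = field(default_factory=list)        # recent notes from the model


def create(model: str, starting_cash: float = 100_000.0) -> Portfolio:
    """Counterpart of `create`: start a model with cash only (starting amount assumed)."""
    return Portfolio(model=model, cash=starting_cash)


def list_portfolio(p: Portfolio) -> str:
    """Counterpart of `list`: render holdings plus the latest context note.

    Named list_portfolio to avoid shadowing Python's built-in list.
    """
    lines = [f"{p.model}: cash={p.cash:.2f}"]
    lines += [f"  {ticker}: {qty}" for ticker, qty in sorted(p.holdings.items())]
    if p.context:
        lines.append(f"  last note: {p.context[-1]}")
    return "\n".join(lines)


def update(p: Portfolio, orders: list[tuple[str, int, float]], note: str = "") -> None:
    """Counterpart of `update`: apply (ticker, signed quantity, price) orders.

    Positive quantities buy, negative quantities sell. Orders are checked in
    sequence against a working copy, and the portfolio is only modified if
    every order is valid, so a rejected batch leaves it untouched.
    """
    cash, holdings = p.cash, dict(p.holdings)
    for ticker, qty, price in orders:
        held = holdings.get(ticker, 0)
        cost = qty * price  # negative for sales, i.e. cash comes back in
        if qty > 0 and cost > cash:
            raise ValueError(f"insufficient cash to buy {qty} {ticker}")
        if qty < 0 and -qty > held:
            raise ValueError(f"cannot sell {-qty} {ticker}; only {held} held")
        cash -= cost
        if held + qty:
            holdings[ticker] = held + qty
        else:
            holdings.pop(ticker, None)  # position fully closed
    p.cash, p.holdings = cash, holdings
    if note:
        p.context.append(note)


# Example: rebuild claude3.5's holdings using per-share prices implied by
# Sum / Quantity in the holdings table (800, 400 and 400 USD).
p = create("claude3.5")
update(p, [("NVDA", 25, 800.0), ("MSFT", 50, 400.0), ("VOO", 150, 400.0)], note="initial allocation")
print(list_portfolio(p))
```

Validating a whole batch before committing it keeps a partially invalid set of model orders from leaving the portfolio in a half-updated state; whether the real `update` command behaves this way is not stated here.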
The most recent prompt, containing the guidelines given to the models, can be seen here.
Model | Ticker | Sum (USD) | Quantity |
---|---|---|---|
claude3.5 | NVDA | 20000 | 25 |
claude3.5 | MSFT | 20000 | 50 |
claude3.5 | VOO | 60000 | 150 |
deepseek-r1 | NVDA | 100000 | 125 |
gemini2.0-flash | AAPL | 99960 | 588 |
grok3 | BRK.B | 20000 | 50 |
grok3 | IWM | 15000 | 75 |
grok3 | METL | 10000 | 100 |
grok3 | BTCETF | 10000 | 200 |
grok3 | BSV | 24960 | 312 |
grok3 | INTC | 20000 | 400 |
o3-mini | TSLA | 10134 | 30 |
o3-mini | GOOGL | 9881 | 55 |
o3-mini | MSFT | 29799 | 73 |
o3-mini | AMZN | 19925 | 92 |
o3-mini | AAPL | 29957 | 122 |
o3-mini | USD | 303 | 303 |
Model | Total Sum (USD) | Change |
---|---|---|
deepseek-r1 | 100000 | — |
claude3.5 | 100000 | — |
o3-mini | 99999 | — |
grok3 | 99960 | — |
gemini2.0-flash | 99960 | — |
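The Total Sum column appears to be each model's Sum entries from the holdings table added together, with o3-mini's USD row counted as uninvested cash. A small consistency check in Python, assuming that relationship (the row data is transcribed from the two tables; the function name is illustrative):

```python
from collections import defaultdict

# Rows transcribed from the holdings table: (model, ticker, sum_usd, quantity).
HOLDINGS = [
    ("claude3.5", "NVDA", 20000, 25),
    ("claude3.5", "MSFT", 20000, 50),
    ("claude3.5", "VOO", 60000, 150),
    ("deepseek-r1", "NVDA", 100000, 125),
    ("gemini2.0-flash", "AAPL", 99960, 588),
    ("grok3", "BRK.B", 20000, 50),
    ("grok3", "IWM", 15000, 75),
    ("grok3", "METL", 10000, 100),
    ("grok3", "BTCETF", 10000, 200),
    ("grok3", "BSV", 24960, 312),
    ("grok3", "INTC", 20000, 400),
    ("o3-mini", "TSLA", 10134, 30),
    ("o3-mini", "GOOGL", 9881, 55),
    ("o3-mini", "MSFT", 29799, 73),
    ("o3-mini", "AMZN", 19925, 92),
    ("o3-mini", "AAPL", 29957, 122),
    ("o3-mini", "USD", 303, 303),  # cash position
]

# Totals transcribed from the summary table.
REPORTED_TOTALS = {
    "deepseek-r1": 100000,
    "claude3.5": 100000,
    "o3-mini": 99999,
    "grok3": 99960,
    "gemini2.0-flash": 99960,
}


def totals_by_model(rows):
    """Sum the USD value of every position (cash included) per model."""
    totals = defaultdict(int)
    for model, _ticker, sum_usd, _quantity in rows:
        totals[model] += sum_usd
    return dict(totals)


computed = totals_by_model(HOLDINGS)
for model, reported in REPORTED_TOTALS.items():
    status = "ok" if computed.get(model) == reported else "MISMATCH"
    print(f"{model}: computed={computed.get(model)} reported={reported} {status}")
```

Run against the figures above, every model reports `ok`, which supports reading Total Sum as the sum of each model's positions.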