
[BUG] helium vision_web_browser.py NoneType error after saving image #570

Closed
PrideIsLife opened this issue Feb 9, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@PrideIsLife

PrideIsLife commented Feb 9, 2025

Describe the bug
When running vision_web_browser.py on Windows 11 with helium, using the following model:

model = LiteLLMModel(
        model_id="ollama_chat/mistral-small",
        api_base="http://localhost:11434",
        num_ctx=8192 
    )

I get the error below.

Code to reproduce the error
src/smolagents/vision_web_browser.py with the above model on Windows 11 with helium.

Error logs (if any)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Output message of the LLM: ─────────────────────────────────────────────────────────────────────────────
Thought: I need to navigate to https://en.wikipedia.org/wiki/Chicago and find a sentence containing the
word "1992" that mentions a construction accident. I will use the `web_search` tool to search for
relevant information about Chicago in 1992, then navigate to Wikipedia using helium.
Code:
results = web_search(query="Chicago 1992 construction accident")
print(results)

 ─ Executing parsed code: ─────────────────────────────────────────────────────────────────────────────
  results = web_search(query="Chicago 1992 construction accident")
  print(results)
 ──────────────────────────────────────────────────────────────────────────────────────────────────────
Execution logs:
## Search Results

[Chicago flood - Wikipedia](https://en.wikipedia.org/wiki/Chicago_flood)
The Chicago flood occurred on April 13, 1992, ... Insurance battles lasted for years, the central point
being the definition of the accident, i.e., whether it was a "flood" or a "leak". Leaks were covered by
insurance, while floods were not. ... (at the time of the tunnel construction) was a pivoting bridge
with a central pivot in the middle of ...

[A Comedy of Errors: How a Small Leak Became the Great Loop Flood of
1992](https://www.wttw.com/chicago-stories/downtown-disasters/a-comedy-of-errors-how-a-small-leak-became
-the-great-loop-flood-of-1992)
On the morning of April 13, 1992 as commuters were heading to their offices in Chicago's Loop, fish were
swimming in the basement of the Merchandise Mart. A strange flood was rising. But at street level, no
one could actually see the flood. What began as a leak in a unique but mostly forgotten underground
freight tunnel system became a two-week combination of comedy-of-errors and soap opera ...

[Bridge accident baffles Chicago engineers - UPI
Archives](https://www.upi.com/Archives/1992/09/21/Bridge-accident-baffles-Chicago-engineers/396971704800
0/)
Sept. 21, 1992 Bridge accident baffles Chicago engineers. CHICAGO ... construction worker Jesus Lopez,
who was not injured. 'The ball is right inside my back seat,' he said.

[A Freak Michigan Avenue Bridge Accident Occurred in Chicago on
...](https://drloihjournal.blogspot.com/2021/03/freak-michigan-avenue-bridge-accident-occurred-in-chicag
o-on-9-20-1992.html)
A Freak Michigan Avenue Bridge Accident Occurred in Chicago on September 20, 1992. ... sending a
construction crane plummeting to the street and slightly injuring six people. The crane crashed through
Michigan Avenue to Lower Michigan. ... 1992, Chicago Transportation Commissioner J.F. Boyle Jr. asserted
the man was "absolutely blameless." The ...

[Remembering Great Chicago Flood 30 years
later](https://abc7chicago.com/chicago-flood-1992-great-loop-history-news/11744121/)
It was the Great Chicago Flood of 1992. By the time Myron Maurer arrived at work on April 13, 1992, the
entire boiler room under the massive Merchandise Mart was under 30 feet of water.

[30 years ago today: Great Chicago Flood paralyzes Loop
businesses](https://www.cbsnews.com/chicago/news/great-chicago-flood-30th-anniversary-loop-chicago-river
/)
CBS 2 Vault: Coverage of the 1992 Chicago Flood 15:46. CHICAGO (CBS) --It has been a wet morning in the
Loop, but nothing like it was 30 years ago today, when several downtown buildings flooded ...

[Why The 1992 Loop Flood Is The Most Chicago Story Ever -
WBEZ](https://www.wbez.org/curious-city/2016/08/21/why-the-1992-loop-flood-is-the-most-chicago-story-eve
r)
How clout, corruption, and construction without permits led to half the Loop being evacuated. ... On
April 13th, 1992, Chicago was struck by a man-made natural disaster. The Great Chicago Flood of ...

[Chicago's Great Flood: How a leak led to billions of dollars in
damage](https://www.nbcchicago.com/news/local/chicagos-great-flood-how-a-leak-led-to-billions-of-dollars
-in-damage/3514437/)
It was April of 1992 when downtown Chicago suffered major flooding. By Lexi Sutter • Published August 6,
2024 • Updated on August 6, 2024 at 5:40 pm NBC Universal, Inc.

[25 years on, the soggy story of the Loop Flood lingers - Chicago
Sun-Times](https://chicago.suntimes.com/2017/4/14/18333721/steinberg-25-years-on-the-soggy-story-of-the-
loop-flood-lingers)
25 years on, the soggy story of the Loop Flood lingers A crack became a hole the size of an automobile,
and the trickle turned into a torrent as the Chicago River began pouring into the 47 miles ...

[Remembering the 'Great Chicago Flood' 30 years
later](https://www.fox32chicago.com/news/remembering-the-great-chicago-flood-30-years-later)
CHICAGO - Wednesday marked 30 years since the "Great Chicago Flood" of 1992.. On April 13, 1992, an
underground tunnel wall failed, causing dozens of office building basements to flood and forcing ...

Out: None
Captured a browser screenshot: (984, 1150) pixels
[Step 0: Duration 6.95 seconds| Input tokens: 2,880 | Output tokens: 93]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Error in generating model output:
Cannot use images with flatten_messages_as_text=True
Captured a browser screenshot: (984, 1150) pixels
[Step 1: Duration 0.00 seconds| Input tokens: 5,760 | Output tokens: 186]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Captured a browser screenshot: (984, 1150) pixels
[Step 2: Duration 0.00 seconds| Input tokens: 8,640 | Output tokens: 279]
Traceback (most recent call last):
  File "C:\Users\pride\Documents\Projects\smolagents\vision_web_browser.py", line 213, in <module>
    main()
  File "C:\Users\pride\Documents\Projects\smolagents\vision_web_browser.py", line 209, in main
    agent.run(args.prompt + helium_instructions)
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 376, in run
    return deque(self._run(task=self.task, images=images), maxlen=1)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 405, in _run
    final_answer = self.step(memory_step)
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 839, in step
    memory_messages = self.write_memory_to_messages()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 188, in write_memory_to_messages
    messages.extend(memory_step.to_messages(summary_mode=summary_mode))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\memory.py", line 109, in to_messages
    "text": f"Call id: {self.tool_calls[0].id}\nObservation:\n{self.observations}",
                        ~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable
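For reference, here is a minimal self-contained sketch of the failing pattern (ToolCall and MemoryStep below are hypothetical stand-ins, not smolagents' actual classes): when a step records no tool calls, tool_calls stays None, and subscripting it raises exactly this TypeError. A None check avoids the crash.

```python
# Hypothetical stand-in classes illustrating the traceback above;
# not smolagents' real implementation.
class ToolCall:
    def __init__(self, id):
        self.id = id

class MemoryStep:
    def __init__(self, tool_calls=None, observations=None):
        self.tool_calls = tool_calls
        self.observations = observations

    def to_messages(self):
        # Unguarded access `self.tool_calls[0].id` raises
        # TypeError when tool_calls is None; guard against it.
        if self.tool_calls is None:
            return [{"role": "tool-response",
                     "text": f"Observation:\n{self.observations}"}]
        return [{"role": "tool-response",
                 "text": f"Call id: {self.tool_calls[0].id}\n"
                         f"Observation:\n{self.observations}"}]
```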

Expected behavior
No error

Packages version:

helium==5.1.0
selenium==4.28.1
smolagents==1.8.0
@PrideIsLife PrideIsLife added the bug Something isn't working label Feb 9, 2025
@PrideIsLife
Author

Same error with a vision model:

model = LiteLLMModel(
    model_id="ollama_chat/llama3.2-vision:11b",
    api_base="http://localhost:11434",
    num_ctx=8192
)

@sysradium
Contributor

sysradium commented Feb 9, 2025

llama3.2-vision:11b is not yet supported by the underlying litellm library: BerriAI/litellm#6683

And in general it won't work unless this gets merged: #553

But the only currently supported vision model, llava, performs very poorly. You can try using it with my PR:

python src/smolagents/vision_web_browser.py  --model-type LiteLLMModel --model-id=ollama/llava

But it produces very poor results. In some cases the model does not return any Python code on the first attempt, and that kills execution. I created a PR that tries to fix that.
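A hedged sketch of one way to tolerate a reply with no code block, instead of crashing the run (this is purely illustrative, not the actual PR's code):

```python
import re

# Triple backtick built dynamically so this snippet is easy to paste anywhere.
FENCE = "`" * 3
CODE_RE = re.compile(FENCE + r"(?:python)?\n(.*?)" + FENCE, re.DOTALL)

def extract_code(reply):
    """Return the first fenced Python block in a model reply, or None if absent."""
    match = CODE_RE.search(reply)
    return match.group(1) if match else None
```

A caller could then retry or re-prompt when `extract_code` returns None rather than letting the agent loop die.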

@PrideIsLife
Author

PrideIsLife commented Feb 15, 2025

Shouldn't this have been fixed by now, @sysradium?

I tried again today with smolagents==1.9.2 and the following model:

# Initialize the model based on the provided arguments
model = LiteLLMModel(
    model_id="ollama_chat/llava:13b",  # model_id="ollama_chat/mistral-small",
    api_base="http://localhost:11434",
    num_ctx=8192
)

But I still encounter the following error:

Out: Found 4 matches for '1992'.Focused on element 1 of 4
Captured a browser screenshot: (984, 1150) pixels
[Step 0: Duration 19.18 seconds| Input tokens: 3,368 | Output tokens: 193]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Error in generating model output:
Cannot use images with flatten_messages_as_text=True
Captured a browser screenshot: (984, 1150) pixels
[Step 1: Duration 0.00 seconds| Input tokens: 6,736 | Output tokens: 386]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Captured a browser screenshot: (984, 1150) pixels
[Step 2: Duration 0.00 seconds| Input tokens: 10,104 | Output tokens: 579]
Traceback (most recent call last):
  File "C:\Users\pride\Documents\Projects\smolagents\vision_web_browser.py", line 213, in <module>
    main()
  File "C:\Users\pride\Documents\Projects\smolagents\vision_web_browser.py", line 209, in main
    agent.run(args.prompt + helium_instructions)
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 458, in run
    return deque(self._run(task=self.task, images=images), maxlen=1)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 487, in _run
    final_answer = self.step(memory_step)
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 1239, in step
    memory_messages = self.write_memory_to_messages()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 283, in write_memory_to_messages
    messages.extend(memory_step.to_messages(summary_mode=summary_mode))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\memory.py", line 109, in to_messages
    "text": f"Call id: {self.tool_calls[0].id}\nObservation:\n{self.observations}",
                        ~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

@sysradium
Contributor

sysradium commented Feb 15, 2025

@PrideIsLife yeah, as you can see they haven't merged it yet.

But once it is merged, it will still fail due to d34e0c8, which we are discussing in #655.

Still, you can work around this by disabling message flattening:

model = LiteLLMModel(
    model_id="ollama_chat/mistral-small",
    api_base="http://localhost:11434",
    num_ctx=8192,
    flatten_messages_as_text=False,
)

@PrideIsLife
Author

PrideIsLife commented Feb 15, 2025

OK, thank you @sysradium. BTW, I just tested with flatten_messages_as_text=False:

model = LiteLLMModel(
    # model_id="ollama_chat/llava:13b",  # model_id="ollama_chat/mistral-small",
    model_id="ollama_chat/mistral-small",
    api_base="http://localhost:11434",
    num_ctx=8192,
    flatten_messages_as_text=False,
)

and got the following error:

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

16:25:35 - LiteLLM:DEBUG: exception_mapping_utils.py:2230 - Logging Details: logger_fn - None | callable(logger_fn) - False
16:25:35 - LiteLLM:DEBUG: litellm_logging.py:1841 - Logging Details LiteLLM-Failure Call: []
Error in generating model output:
litellm.APIConnectionError: Ollama_chatException - Client error '400 Bad Request' for url 'http://localhost:11434/api/chat'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400
Captured a browser screenshot: (984, 1150) pixels
[Step 0: Duration 2.08 seconds]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Captured a browser screenshot: (984, 1150) pixels
[Step 1: Duration 0.00 seconds]
Traceback (most recent call last):
  File "C:\Users\pride\Documents\Projects\smolagents\vision_web_browser.py", line 218, in <module>
    main()
  File "C:\Users\pride\Documents\Projects\smolagents\vision_web_browser.py", line 214, in main
    agent.run(args.prompt + helium_instructions)
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 458, in run
    return deque(self._run(task=self.task, images=images), maxlen=1)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 487, in _run
    final_answer = self.step(memory_step)
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 1239, in step
    memory_messages = self.write_memory_to_messages()
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\agents.py", line 283, in write_memory_to_messages
    messages.extend(memory_step.to_messages(summary_mode=summary_mode))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pride\AppData\Local\Programs\Python\Python312\Lib\site-packages\smolagents\memory.py", line 109, in to_messages
    "text": f"Call id: {self.tool_calls[0].id}\nObservation:\n{self.observations}",
                        ~~~~~~~~~~~~~~~^^^
TypeError: 'NoneType' object is not subscriptable

Same with llava:13b.

EDIT: found some logs in ollama:

VRAM usage didn't recover within timeout" seconds=5.386267103 model=/usr/share/ollama/.ollama/models/blobs/sha256-102a747c137683e81d431dab05d8f2158df4ab6f162f8f9019425a43d51e0e9f

I guess disabling flattening increases VRAM usage?

@sysradium
Contributor

sysradium commented Feb 15, 2025

@PrideIsLife the problem will go away only when my change gets merged :)
I have rebased my change to incorporate the flatten_messages_as_text flag. Feel free to use my branch for now.

@albertvillanova
Member

To run vision_web_browser, you need to use a VLM (vision language model) that supports image inputs.

You got this error message:

Cannot use images with flatten_messages_as_text=True

which means that your model does not support images.

That is why, when you forced the image input format (flatten_messages_as_text=False), your model raised an error: it does not support image inputs:

litellm.APIConnectionError: Ollama_chatException - Client error '400 Bad Request' for url 'http://localhost:11434/api/chat'

Please try to use a VLM that supports images.
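For example, a configuration along these lines should avoid both errors (a sketch, based on the snippets earlier in this thread; whether a given Ollama model actually accepts images depends on the model and your litellm version, and it requires a local Ollama server):

```python
from smolagents import LiteLLMModel

# Assumes a vision-capable model (e.g. llava) has been pulled in Ollama.
# flatten_messages_as_text=False keeps image content parts intact
# instead of flattening messages to plain text.
model = LiteLLMModel(
    model_id="ollama_chat/llava",       # must be a VLM that accepts images
    api_base="http://localhost:11434",
    num_ctx=8192,
    flatten_messages_as_text=False,
)
```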
