
feat: working instrumentation with smolagents #1184

Open · wants to merge 6 commits into main

Conversation

aymeric-roucher

Fixes #1182
cc @harrisonchu I've copied the work you did for crewAI, and it ended up working really well!

I've put a test.py file at the root to let you try out the instrumentation.

Beware that I've not adapted the tests yet.

aymeric-roucher requested a review from a team as a code owner on January 9, 2025 at 20:20
dosubot added the size:XL label (This PR changes 500-999 lines, ignoring generated files) on Jan 9, 2025
github-actions bot commented Jan 9, 2025

CLA Assistant Lite bot: All contributors have signed the CLA ✍️ ✅

axiomofjoy self-requested a review on January 9, 2025 at 21:02
@axiomofjoy (Contributor)

Excited to dig into this @aymeric-roucher!

Review comments on this part of the diff:

            agent.model.last_input_token_count + agent.model.last_output_token_count
        )
        span.set_attribute("Observations", step_log.observations)
        # if step_log.error is not None:
aymeric-roucher (Author) commented Jan 9, 2025

@axiomofjoy As you can see, I've commented out these 3 lines.
This is because their visualization in a platform like Arize Phoenix is currently unsatisfactory: an error in one step marks the whole run as failing, even though the multi-step agent can hit an error at one step and then recover in later steps to successfully solve the task. Is there a way to display an error without propagating it upwards to the whole run?

[screenshot]
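
For context, a hypothetical sketch (not the PR's code) of what restoring those three lines might look like with plain OpenTelemetry, recording the error only on the step's own span; the StepLog class is a made-up stand-in for smolagents' step log object:

```python
# Hypothetical sketch: record a step error on the step span only.
# StepLog is a made-up stand-in for smolagents' step log object.
from dataclasses import dataclass
from typing import Optional

from opentelemetry import trace
from opentelemetry.trace import Status, StatusCode


@dataclass
class StepLog:
    observations: str
    error: Optional[Exception] = None


def record_step(step_log: StepLog) -> None:
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span("Step") as span:
        span.set_attribute("Observations", step_log.observations)
        if step_log.error is not None:
            # Mark only this step's span as errored; the parent run span's
            # status is set independently, so a recovered run can still succeed.
            span.record_exception(step_log.error)
            span.set_status(Status(StatusCode.ERROR, str(step_log.error)))
```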

axiomofjoy (Contributor)

Interesting, can you give me a code snippet to reproduce this behavior?

aymeric-roucher (Author)

This is the code in the test.py script.

axiomofjoy (Contributor)

Will take a look and get back to you!

axiomofjoy (Contributor)

I think this might be a bug in Phoenix. I have filed an issue here.

@aymeric-roucher (Author) commented Jan 9, 2025

Nice to meet you @axiomofjoy! 🤗 This is ready for review. The final things to do are:

  • figure out how not to propagate errors upwards (see above)
  • remove the test.py file that's here only for testing
  • make the tests work

@axiomofjoy (Contributor)

Great to meet you @aymeric-roucher and thanks so much for this contribution! I'm digging into the PR now. Let me know how I can help you get it over the line, e.g., if you need help writing tests. Our team would also love to dogfood the instrumentation once it's in a good state!

@aymeric-roucher (Author)

@axiomofjoy sorry for the issue above, I had been working on a specific branch of smolagents to make it compatible: now installing from main should work!

@aymeric-roucher (Author) commented Jan 9, 2025

@axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM: HfApiModel inherits from Model but since it overrides its parent's get_tool_call method, maybe the wrapper targeting Model.get_tool_call() does not work for HfApiModel.

Is there a way to create a wrapper that works for Subclass.get_tool_call for any subclass of Model?

Method __call__ works for standard LLM calls (LLM calls that are just "please generate some text", not "please generate a tool call") because it's defined in Model and not overridden. But it would seem like a hack to build such a wrapper method for Model.get_tool_call, so I'd prefer the universal wrapper described above.

[screenshot]

@aymeric-roucher (Author)

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Jan 9, 2025
@aymeric-roucher (Author)

I'm also wondering how to get the nice LLM message format, with system/user roles, shown in this screenshot:
[screenshot]

Commit: …s/pyproject.toml (Co-authored-by: Xander Song <axiomofjoy@gmail.com>)
@axiomofjoy (Contributor)

> @axiomofjoy sorry for the issue above, I had been working on a specific branch of smolagents to make it compatible: now installing from main should work!

Got it, no worries! I just got it running from dev 😄

@axiomofjoy (Contributor)

> I'm also wondering how to get the nice LLM message format, with system/user roles, shown in this screenshot: [screenshot]

You need to use our LLM message semantic conventions. These can be a bit tricky due to constraints on OTel attribute value types. You can see an example here.
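
For illustration, a minimal sketch of what those flattened message attributes can look like; the constant names assume the openinference-semantic-conventions package, and set_input_messages is a hypothetical helper, not part of this PR:

```python
# Rough sketch of OpenInference's flattened LLM message attributes.
# `set_input_messages` is a hypothetical helper for illustration only.
from opentelemetry import trace
from openinference.semconv.trace import MessageAttributes, SpanAttributes


def set_input_messages(span: trace.Span, messages: list[dict]) -> None:
    # OTel attribute values must be primitives (or sequences of primitives),
    # so each message is flattened into indexed keys such as
    # "llm.input_messages.0.message.role".
    for i, message in enumerate(messages):
        prefix = f"{SpanAttributes.LLM_INPUT_MESSAGES}.{i}."
        span.set_attribute(prefix + MessageAttributes.MESSAGE_ROLE, message["role"])
        span.set_attribute(prefix + MessageAttributes.MESSAGE_CONTENT, message["content"])


# e.g. set_input_messages(span, [{"role": "system", "content": "You are..."},
#                                {"role": "user", "content": "Hi!"}])
```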

@axiomofjoy (Contributor)

> @axiomofjoy one more problem that I just detected: detecting calls to Model.get_tool_call from within ToolCallingAgent does not work. I think this is because I actually use HfApiModel as my LLM: HfApiModel inherits from Model but since it overrides its parent's get_tool_call method, maybe the wrapper targeting Model.get_tool_call() does not work for HfApiModel.
>
> Is there a way to create a wrapper that works for Subclass.get_tool_call for any subclass of Model?
>
> Method __call__ works for standard LLM calls (LLM calls that are just "please generate some text", not "please generate a tool call") because it's defined in Model and not overridden. But it would seem like a hack to build such a wrapper method for Model.get_tool_call, so I'd prefer the universal wrapper described above.
>
> [screenshot]

Good catch! My first thought is that you might try instrumenting each subclass individually in addition to the base class. This can be accomplished by iterating over subclasses. We do something similar in DSPy here.
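
Along those lines, a rough sketch of what iterating over subclasses might look like; the smolagents import path is an assumption and the wrapper body is a no-op placeholder, not the actual instrumentor code:

```python
# Hypothetical sketch: wrap get_tool_call on Model and on every subclass
# that overrides it (e.g. HfApiModel).
from wrapt import wrap_function_wrapper
from smolagents import models as smolagents_models


def _get_tool_call_wrapper(wrapped, instance, args, kwargs):
    # A real wrapper would start a span and record attributes here;
    # this placeholder simply delegates to the wrapped method.
    return wrapped(*args, **kwargs)


def _all_subclasses(cls):
    # __subclasses__ only returns direct subclasses, so recurse.
    for subclass in cls.__subclasses__():
        yield subclass
        yield from _all_subclasses(subclass)


def instrument_get_tool_call() -> None:
    base = smolagents_models.Model
    for cls in (base, *_all_subclasses(base)):
        # Only patch classes that define their own get_tool_call, so overrides
        # are wrapped directly rather than through the base class.
        if "get_tool_call" in cls.__dict__:
            wrap_function_wrapper(cls, "get_tool_call", _get_tool_call_wrapper)
```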

axiomofjoy changed the title from "Working instrumentation with smolagents" to "feat: working instrumentation with smolagents" on Jan 9, 2025
@axiomofjoy (Contributor) commented Jan 10, 2025

@aymeric-roucher It's looking really promising so far! A few findings from my initial testing.

  1. A few span kinds are missing inputs and outputs (agent spans are missing input, LLM spans are missing both). We typically add these values for all OpenInference span kinds so they show up in the spans and traces tables (a rough sketch of these attributes follows below this comment).
     [screenshot]
  2. Tool attributes are missing. These are the tool conventions in this table.
     [screenshot]
  3. Retriever tools might be better instrumented with the retriever span kind rather than as tools, to give us a rich UI for retrieved documents, scores, etc.
If you don't mind, I'll open a PR against your branch including some of the examples I used!
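
For reference, a rough, hypothetical sketch of how input and output values can be set on an agent span so they appear in the spans and traces tables; the constant names assume the openinference-semantic-conventions package, and the span name and values are made up for illustration:

```python
# Hypothetical sketch, not code from this PR: attach OpenInference span kind,
# input, and output attributes to an agent span so Phoenix can display them.
import json

from opentelemetry import trace
from openinference.semconv.trace import OpenInferenceSpanKindValues, SpanAttributes

tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("Agent.run") as span:
    span.set_attribute(
        SpanAttributes.OPENINFERENCE_SPAN_KIND,
        OpenInferenceSpanKindValues.AGENT.value,
    )
    # The task given to the agent becomes the span's input value.
    span.set_attribute(SpanAttributes.INPUT_VALUE, "What is the weather in Paris?")
    # ... run the agent here; a placeholder result is used for illustration.
    result = {"answer": "Sunny, 18°C"}
    span.set_attribute(SpanAttributes.OUTPUT_VALUE, json.dumps(result))
    span.set_attribute(SpanAttributes.OUTPUT_MIME_TYPE, "application/json")
```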

@aymeric-roucher (Author) commented Jan 10, 2025

[screenshot]

I've applied your suggestions @axiomofjoy and implemented a wrapper over all subclasses of Model, and made many other changes to mimic the DSPy implementation! Here's what my dashboard now looks like.
Addressing your points:

  1. Should be solved (cf. the screenshot above).
  2. I think tool attributes are now input correctly. Is there any attribute you still see missing?
  3. We have no specific retriever tool (the community might build some, but we can't know in advance how they'll work; they could return only a big string report instead of individual documents), so this does not apply here!

@axiomofjoy (Contributor)

Hey @aymeric-roucher, awesome progress! I'm excited to test it out.

I opened a PR to your fork that sets up CI and adds examples here. This enables the following commands:

  • tox run -e mypy-smolagents (run type checks)
  • tox run -e ruff-smolagents (run formatters and linters)
  • tox run -e test-smolagents (run tests, my PR nixes the CrewAI tests since they are out of date with our current patterns)
  • tox run -e smolagents (run everything)

You can read about how to get set up with tox here.

I'll start dogfooding your changes and add some tests in a subsequent PR if you don't mind!

Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
Status: 📘 Todo
Development

Successfully merging this pull request may close these issues.

[support new package] Hugging Face smolagents
2 participants