Adding types for evals in Langfuse #4380
-
I'm tagging @marcklingen as I know you've probably got an opinion on this.
-
Has any consideration been given to instruction-following frameworks such as IFEval or Multi-IF, which focus on evaluating an LLM's ability to follow "verifiable instructions", i.e. instructions whose compliance can be checked objectively? Examples of such instructions are "the response should be in three paragraphs" or "the response should be more than 300 words". IFEval dataset: https://huggingface.co/datasets/google/IFEval Multi-IF dataset: https://huggingface.co/datasets/facebook/Multi-IF
-
Describe the feature or potential improvement
Hi everyone,
I’ve been thinking about how we manage evaluations (evals) in Langfuse, especially when it comes to maintaining compatibility across structural changes in metadata fields. Currently, whenever we make a structural update to a type, we often need to manually update every eval that depends on that field. This can be time-consuming and error-prone, especially as the number of evals grows.
Proposal: Mapping Layer with Typed Objects for Evals
Instead of directly binding evals to raw metadata fields, I propose introducing a mapping layer with types. Here’s the idea in detail:
Introduce a Typed Object Layer:
Create strongly-typed objects that represent the metadata structure we expect to work with in evals. These objects act as a translation layer, parsing the metadata and exposing only the fields that are relevant for evals.
Dynamic Parsing and Mapping:
Each type would define how to parse the raw metadata into its structured form. This ensures that even if we change the underlying structure of a metadata field, we only need to update the type definition, not every eval that uses it.
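To make the idea concrete, here is a minimal sketch of what such a typed layer could look like. All names here (`UserInfo`, `parseUserInfo`, the `user_info` field shape) are hypothetical and just for illustration; they are not part of Langfuse's API.

```typescript
// Hypothetical typed view over raw metadata; not part of Langfuse's API.
interface UserInfo {
  userId: string;
  email: string;
}

type RawMetadata = Record<string, unknown>;

// Each type owns its parsing logic. If the raw shape of user_info changes,
// only this function needs updating, not every eval that consumes UserInfo.
function parseUserInfo(metadata: RawMetadata): UserInfo {
  const raw = metadata["user_info"] as Record<string, unknown> | undefined;
  if (!raw || typeof raw["user_id"] !== "string" || typeof raw["email"] !== "string") {
    throw new Error("metadata.user_info is missing or malformed");
  }
  return { userId: raw["user_id"], email: raw["email"] };
}
```

A schema-validation library could replace the hand-written checks, but the principle is the same: one place that knows the raw shape.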
Eval Integration:
Evals can then bind to these typed objects rather than directly to raw metadata fields. This adds a layer of abstraction, decoupling eval logic from raw metadata structure.
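A sketch of the eval side of this contract, again with hypothetical names: the eval's signature mentions only the typed object, so it is structurally unable to depend on the raw metadata layout.

```typescript
// Hypothetical typed object the eval binds to (illustrative only).
interface UserInfo {
  userId: string;
  email: string;
}

// The eval consumes UserInfo, never raw metadata, so restructuring the
// underlying metadata field cannot break it.
function emailDomainEval(info: UserInfo): boolean {
  return info.email.endsWith("@example.com");
}
```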
Example Use Case:
Let’s say we have a metadata field user_info that currently includes user_id and email. If we later restructure this field to add profile_id or group related attributes under a nested object, we’d only need to update the parsing logic in the corresponding type. The evals using this type would automatically adapt to the changes.
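The migration described above could be sketched like this (hypothetical shapes and names): after `user_id` and `email` move under a nested `profile` object, only the parser changes, and it can even accept both shapes during a transition period.

```typescript
interface UserInfo {
  userId: string;
  email: string;
}

// Hypothetical: user_info was restructured so its fields now live under a
// nested "profile" object. Only the parser is updated; evals consuming
// UserInfo keep working unchanged.
function parseUserInfo(metadata: Record<string, unknown>): UserInfo {
  const raw = metadata["user_info"] as Record<string, any>;
  // Accept both the old flat shape and the new nested shape.
  const src = raw["profile"] ?? raw;
  return { userId: src["user_id"], email: src["email"] };
}
```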
Benefits
Decoupling and Maintainability:
Evals remain insulated from metadata structure changes, reducing the risk of breaking functionality.
Reusability:
Typed objects can be reused across multiple evals, encouraging consistency and reducing duplication.
Simplified Updates:
Updates to metadata structures become localized to the type definitions.
Challenges and Considerations
Implementation Complexity:
Introducing a mapping layer adds some upfront development effort, but I believe the long-term maintainability gains outweigh this cost.
Performance Impact:
Parsing metadata dynamically might have a slight performance overhead, but this can be mitigated with efficient parsing mechanisms.
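One such mitigation, sketched with the same hypothetical names as above: cache the typed result keyed on the raw metadata object's identity, so each trace's metadata is parsed at most once no matter how many evals read it.

```typescript
interface UserInfo {
  userId: string;
  email: string;
}

// Hypothetical parser from the earlier sketch.
function parseUserInfo(metadata: Record<string, unknown>): UserInfo {
  const raw = metadata["user_info"] as Record<string, any>;
  return { userId: raw["user_id"], email: raw["email"] };
}

// WeakMap keeps cached results from leaking: entries disappear once the
// raw metadata object is garbage-collected.
const parsedCache = new WeakMap<object, UserInfo>();

function cachedParseUserInfo(metadata: Record<string, unknown>): UserInfo {
  const hit = parsedCache.get(metadata);
  if (hit) return hit;
  const parsed = parseUserInfo(metadata);
  parsedCache.set(metadata, parsed);
  return parsed;
}
```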
Adoption and Transition:
Existing evals would need to be refactored to use the new typed layer. This could be phased in to minimize disruption.
Next Steps
Gather feedback on the idea from the community.
Prototype a simple version of the typed object layer to evaluate feasibility.
Identify common metadata patterns that would benefit from this abstraction.
I’d love to hear your thoughts on this approach. Do you see this improving our eval system’s maintainability and scalability? Are there potential challenges I might have overlooked?
Looking forward to the discussion!
Additional information
No response