Commit
Merge pull request #33 from wesen/task/add-embeddings
Add embeddings support and profile settings layer
wesen authored Feb 15, 2025
2 parents 3cc7a3e + 3801440 commit 8c9851e
Showing 18 changed files with 569 additions and 42 deletions.
65 changes: 63 additions & 2 deletions .vscode/launch.json
@@ -4,7 +4,6 @@
// For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
"version": "0.2.0",
"configurations": [
-
{
"name": "Launch Package",
"type": "go",
@@ -18,7 +17,69 @@
"request": "launch",
"mode": "auto",
"program": "${workspaceFolder}/cmd/escuse-me",
-"args": ["mento", "index-stats", "--output", "yaml"],
+"args": [
+"mento",
+"index-stats",
+"--output",
+"yaml"
+],
+"envFile": "${workspaceFolder}/.envrc"
+},
+{
+"name": "Search Summaries Embeddings",
+"type": "go",
+"request": "launch",
+"mode": "auto",
+"program": "${workspaceFolder}/cmd/escuse-me",
+"args": [
+"examples",
+"search-summaries-embeddings",
+"--query",
+"test",
+"--print-query"
+],
+"envFile": "${workspaceFolder}/.envrc"
+},
+{
+"name": "Search Summaries Embeddings (Alt)",
+"type": "go",
+"request": "launch",
+"mode": "auto",
+"program": "${workspaceFolder}/cmd/escuse-me",
+"args": [
+"examples",
+"search-summaries-embeddings",
+"--query",
+"test",
+"--print-query"
+],
+"envFile": "${workspaceFolder}/.envrc"
+},
+{
+"name": "Process test-data/concat.yml",
+"type": "go",
+"request": "launch",
+"mode": "auto",
+"program": "${workspaceFolder}/../go-emrichen/cmd/emrichen",
+"cwd": "${workspaceFolder}/../go-emrichen",
+"args": [
+"process",
+"test-data/defaults-var-format.yml"
+]
+},
+{
+"name": "Search Summaries Embeddings (Alt)",
+"type": "go",
+"request": "launch",
+"mode": "auto",
+"program": "${workspaceFolder}/cmd/escuse-me",
+"args": [
+"examples",
+"search-summaries-embeddings",
+"--query",
+"test",
+"--print-query"
+],
"envFile": "${workspaceFolder}/.envrc"
}
]
67 changes: 67 additions & 0 deletions changelog.md
@@ -0,0 +1,67 @@
## Enhanced Error Handling with Raw Results

Added support for printing raw error responses when the --raw-results flag is enabled. This helps with debugging by showing the complete error response from Elasticsearch.

- Print complete error response to stderr when raw-results is enabled
- Print error reason and root cause to stderr when raw-results is disabled

# Refactor Embeddings Settings Factory

Simplified the embeddings settings factory to use a minimal configuration struct instead of depending on the full StepSettings. Added backwards compatibility method.

- Created new EmbeddingsConfig struct for minimal configuration
- Modified SettingsFactory to use EmbeddingsConfig instead of StepSettings
- Added NewSettingsFactoryFromStepSettings for backwards compatibility

# Fix Embeddings Settings Type Handling

Fixed type handling in embeddings settings to properly handle pointer types in StepSettings and non-pointer types in EmbeddingsConfig.

- Updated CreateEmbeddingsConfig to properly dereference pointer types
- Modified NewProvider to handle non-pointer types in EmbeddingsConfig
- Fixed error checks to use empty string checks instead of nil checks
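
As an illustration of the pointer versus non-pointer distinction, here is a minimal sketch. The struct fields, the `deref` helper, and the `validate` function are assumptions made for the example; only the names `StepSettings`, `EmbeddingsConfig`, and `CreateEmbeddingsConfig` come from this changelog, and their real definitions may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// StepSettings is a trimmed-down, hypothetical stand-in with pointer fields.
type StepSettings struct {
	Type       *string
	Engine     *string
	Dimensions *int
}

// EmbeddingsConfig is a hypothetical stand-in that holds plain values.
type EmbeddingsConfig struct {
	Type       string
	Engine     string
	Dimensions int
}

// deref returns the pointed-to value, or the zero value when p is nil.
func deref[T any](p *T) T {
	var zero T
	if p == nil {
		return zero
	}
	return *p
}

// CreateEmbeddingsConfig copies pointer fields into value fields without
// panicking on nil.
func CreateEmbeddingsConfig(s StepSettings) EmbeddingsConfig {
	return EmbeddingsConfig{
		Type:       deref(s.Type),
		Engine:     deref(s.Engine),
		Dimensions: deref(s.Dimensions),
	}
}

// validate uses empty-string checks, since value fields can never be nil.
func validate(cfg EmbeddingsConfig) error {
	if cfg.Type == "" {
		return errors.New("no embeddings type specified")
	}
	if cfg.Engine == "" {
		return errors.New("no embeddings engine specified")
	}
	return nil
}

func main() {
	engine := "text-embedding-3-small"
	cfg := CreateEmbeddingsConfig(StepSettings{Engine: &engine})
	fmt.Printf("%+v\n", cfg)
	fmt.Println("validation:", validate(cfg))
}
```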

# Add Provider Options for Embeddings Factory

Added functional options pattern to the embeddings provider factory for more flexible configuration.

- Added WithType, WithEngine, WithBaseURL, WithAPIKey, and WithDimensions option functions
- Modified NewProvider to accept variadic options
- Improved configuration handling with options overriding defaults
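
For readers unfamiliar with the functional options pattern, a minimal sketch follows. The option names match the list above, but every signature, the `ProviderOption` type, and the config fields are assumptions; in particular, this `NewProvider` returns the resolved config instead of a real provider so that the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// EmbeddingsConfig is a hypothetical stand-in; the field names are assumed.
type EmbeddingsConfig struct {
	Type       string
	Engine     string
	BaseURL    string
	APIKey     string
	Dimensions int
}

// ProviderOption mutates the configuration before the provider is built.
type ProviderOption func(*EmbeddingsConfig)

func WithType(t string) ProviderOption      { return func(c *EmbeddingsConfig) { c.Type = t } }
func WithEngine(e string) ProviderOption    { return func(c *EmbeddingsConfig) { c.Engine = e } }
func WithBaseURL(u string) ProviderOption   { return func(c *EmbeddingsConfig) { c.BaseURL = u } }
func WithAPIKey(k string) ProviderOption    { return func(c *EmbeddingsConfig) { c.APIKey = k } }
func WithDimensions(d int) ProviderOption   { return func(c *EmbeddingsConfig) { c.Dimensions = d } }

// NewProvider starts from the defaults and applies the options in order,
// so later options override both the defaults and earlier options.
func NewProvider(defaults EmbeddingsConfig, opts ...ProviderOption) (EmbeddingsConfig, error) {
	cfg := defaults
	for _, opt := range opts {
		opt(&cfg)
	}
	if cfg.Type == "" {
		return EmbeddingsConfig{}, errors.New("embeddings type is required")
	}
	return cfg, nil
}

func main() {
	cfg, err := NewProvider(
		EmbeddingsConfig{Type: "openai", Engine: "text-embedding-3-small"},
		WithDimensions(1536),
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Because the options are variadic, existing callers that pass none keep working unchanged.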

# Add Custom Tags Documentation

Added comprehensive documentation for implementing custom tags in go-emrichen, including:
- Basic tag implementation patterns
- Argument handling and validation
- Environment interaction
- Node processing utilities
- Testing guidelines and best practices
- Conceptual explanations and rationale for design patterns
- Detailed best practices and common patterns
- In-depth discussion of error handling and type safety

# Fix Custom Tags Documentation Signature

Updated custom tags documentation to reflect correct function signature:
- Changed tag handler signature to include interpreter parameter
- Clarified pure function nature of tag handlers
- Updated all code examples to use correct signature
- Added explanation of interpreter parameter usage

# Enhance Custom Tags Documentation with ParseArgs Guidelines

Added detailed documentation about argument handling and recursive processing:
- Comprehensive guide to using ParseArgs
- Core principles for implementing tag handlers
- Examples of proper argument validation and processing
- Guidelines for recursive processing of nested structures
- Detailed error handling patterns for arguments

# Update Custom Tags Documentation with Proper Namespace

Updated custom tags documentation to use proper import paths and namespaces:
- Added proper import statements for all examples
- Updated all type references to use emrichen namespace
- Fixed function signatures to use emrichen.Interpreter
- Updated utility function calls to use emrichen namespace
11 changes: 6 additions & 5 deletions cmd/escuse-me/cmds/serve.go
@@ -2,6 +2,10 @@ package cmds

import (
"context"
+"os"
+"os/signal"
+"path/filepath"
+
es_cmds "github.com/go-go-golems/escuse-me/pkg/cmds"
es_layers "github.com/go-go-golems/escuse-me/pkg/cmds/layers"
"github.com/go-go-golems/glazed/pkg/cmds"
@@ -17,9 +21,6 @@ import (
"github.com/go-go-golems/parka/pkg/server"
"github.com/pkg/errors"
"golang.org/x/sync/errgroup"
-"os"
-"os/signal"
-"path/filepath"
)

type ServeCommand struct {
@@ -147,7 +148,7 @@ func (s *ServeCommand) runWithConfigFile(
commandDirHandlerOptions,
command_dir.WithGenericCommandHandlerOptions(
generic_command.WithParameterFilterOptions(
-config.WithLayerDefaults(
+config.WithMergeOverrideLayer(
esConnectionLayer.Layer.GetSlug(),
esConnectionLayer.Parameters.ToMap(),
),
@@ -254,7 +255,7 @@ func (s *ServeCommand) Run(
command_dir.WithGenericCommandHandlerOptions(
generic_command.WithTemplateLookup(datatables.NewDataTablesLookupTemplate()),
generic_command.WithParameterFilterOptions(
-config.WithLayerDefaults(
+config.WithMergeOverrideLayer(
esClientLayer.Layer.GetSlug(),
esClientLayer.Parameters.ToMap(),
),
8 changes: 4 additions & 4 deletions cmd/escuse-me/main.go
@@ -14,6 +14,7 @@ import (
"github.com/go-go-golems/escuse-me/cmd/escuse-me/cmds/indices"
es_cmds "github.com/go-go-golems/escuse-me/pkg/cmds"
"github.com/go-go-golems/escuse-me/pkg/cmds/layers"
+"github.com/go-go-golems/escuse-me/pkg/doc"
"github.com/go-go-golems/glazed/pkg/cli"
glazed_cmds "github.com/go-go-golems/glazed/pkg/cmds"
"github.com/go-go-golems/glazed/pkg/cmds/alias"
@@ -113,15 +114,12 @@ var runCommandCmd = &cobra.Command{
},
}

-//go:embed doc/*
-var docFS embed.FS
-
//go:embed queries/*
var queriesFS embed.FS

func initRootCmd() (*help.HelpSystem, error) {
helpSystem := help.NewHelpSystem()
-err := helpSystem.LoadSectionsFromFS(docFS, ".")
+err := doc.AddDocToHelpSystem(helpSystem)
cobra.CheckErr(err)

helpSystem.SetupCobraRootCommand(rootCmd)
@@ -189,6 +187,8 @@ func initAllCommands(helpSystem *help.HelpSystem) error {
repositories_,
cli.WithCobraMiddlewaresFunc(es_cmds.GetCobraCommandEscuseMeMiddlewares),
cli.WithCobraShortHelpLayers(glazed_layers.DefaultSlug, layers.EsConnectionSlug, layers.ESHelpersSlug),
+cli.WithProfileSettingsLayer(),
+cli.WithCreateCommandSettingsLayer(),
)
if err != nil {
return err
1 change: 0 additions & 1 deletion cmd/escuse-me/queries/doc/README

This file was deleted.

36 changes: 36 additions & 0 deletions cmd/escuse-me/queries/examples/search-insights.yaml
@@ -0,0 +1,36 @@
name: "search-insights"
short: "Search through summaries using embeddings and keywords"
long: "Search through the summaries using both semantic similarity via embeddings and keyword/fuzzy matching for better results"

flags:
  - name: query
    type: string
    help: "Search text to find similar content"
    required: true
  - name: k
    type: int
    help: "Number of results to return"
    default: 5

default-index: local-testing-multi-document-summarization

query:
  _source: ["content", "title", "url"]
  query:
    bool:
      should:
        - knn:
            field: content_vector
            query_vector: !Embeddings
              text: !Var query
              config:
                type: "openai"
                engine: "text-embedding-3-small"
                dimensions: 1536
            k: !Var k
            num_candidates: 100
        - match:
            content:
              query: !Var query
              boost: 4
              fuzziness: "AUTO"
29 changes: 29 additions & 0 deletions cmd/escuse-me/queries/examples/search-summaries-embeddings.yaml
@@ -0,0 +1,29 @@
name: "search-summaries-embeddings"
short: "Search through summaries using embeddings"
long: "Search through the summaries using semantic similarity via embeddings"

flags:
  - name: query
    type: string
    help: "Search text to find similar content"
    required: true
  - name: k
    type: int
    help: "Number of results to return"
    default: 5

default-index: local-testing-multi-document-summarization

query:
  _source: ["content", "title", "url"]
  query:
    knn:
      field: content_vector
      query_vector: !Embeddings
        text: !Var query
        config:
          type: "openai"
          engine: "text-embedding-3-small"
          dimensions: 1536
      k: !Var k
      num_candidates: 100
15 changes: 11 additions & 4 deletions go.mod
@@ -6,10 +6,11 @@ toolchain go1.23.3

require (
github.com/elastic/go-elasticsearch/v8 v8.17.0
-github.com/go-go-golems/clay v0.1.20
-github.com/go-go-golems/glazed v0.5.24
-github.com/go-go-golems/go-emrichen v0.0.3
-github.com/go-go-golems/parka v0.5.15
+github.com/go-go-golems/clay v0.1.27
+github.com/go-go-golems/geppetto v0.4.34
+github.com/go-go-golems/glazed v0.5.29
+github.com/go-go-golems/go-emrichen v0.0.4
+github.com/go-go-golems/parka v0.5.18
github.com/pkg/errors v0.9.1
github.com/rs/zerolog v1.33.0
github.com/spf13/cobra v1.8.1
@@ -24,6 +25,7 @@ require (
github.com/Masterminds/goutils v1.1.1 // indirect
github.com/Masterminds/semver v1.5.0 // indirect
github.com/Masterminds/sprig v2.22.0+incompatible // indirect
+github.com/ThreeDotsLabs/watermill v1.3.7 // indirect
github.com/adrg/frontmatter v0.2.0 // indirect
github.com/alecthomas/chroma/v2 v2.14.0 // indirect
github.com/araddon/dateparse v0.0.0-20210429162001-6b43995a97de // indirect
@@ -47,6 +49,7 @@ require (
github.com/bahlo/generic-list-go v0.2.0 // indirect
github.com/bmatcuk/doublestar/v4 v4.6.1 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
+github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/charmbracelet/glamour v0.7.0 // indirect
github.com/dlclark/regexp2 v1.11.4 // indirect
github.com/elastic/elastic-transport-go/v8 v8.6.0 // indirect
@@ -59,9 +62,11 @@ require (
github.com/google/uuid v1.6.0 // indirect
github.com/gorilla/css v1.0.1 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
+github.com/huandu/go-clone v1.7.2 // indirect
github.com/huandu/xstrings v1.5.0 // indirect
github.com/imdario/mergo v0.3.16 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
+github.com/invopop/jsonschema v0.12.0 // indirect
github.com/itchyny/gojq v0.12.12 // indirect
github.com/itchyny/timefmt-go v0.1.5 // indirect
github.com/jedib0t/go-pretty v4.3.0+incompatible // indirect
@@ -70,6 +75,7 @@ require (
github.com/kucherenkovova/safegroup v1.0.2 // indirect
github.com/labstack/echo/v4 v4.12.0 // indirect
github.com/labstack/gommon v0.4.2 // indirect
+github.com/lithammer/shortuuid/v3 v3.0.7 // indirect
github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
github.com/magiconair/properties v1.8.7 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
@@ -91,6 +97,7 @@ require (
github.com/rivo/uniseg v0.4.7 // indirect
github.com/sagikazarmark/locafero v0.4.0 // indirect
github.com/sagikazarmark/slog-shim v0.1.0 // indirect
+github.com/sashabaranov/go-openai v1.36.0 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
github.com/spf13/afero v1.11.0 // indirect
github.com/spf13/cast v1.7.0 // indirect