Spike 5000 queries round deux #1039

Merged
merged 132 commits into from
Jan 17, 2025

Member

@epugh epugh commented Jun 24, 2024

Description

Demonstrate running 5000 queries in a background process, storing the results in a Snapshot.

Ideally, we run the 5000 queries, building the snapshot. We then make the front end interact as if it were a CSV upload of data. If that works, great. If it still doesn't scale, then we have a separate UI. And we figure out how to run not just a single scorer, but lots of scorers, and store their scores.

  • Think about moving response_body to its own table; it's really not part of snapshot_queries, and it appears to bloat the table a lot.
  • Think about writing some code to convert the raw data from Solr into what Quepid expects. Snapshot_queries, snapshot_docs, and some scores?
  • Add to snapshot picker a sense of date of the snapshot.
  • Convert Solr response in snapshot_queries into snapshot_docs.
  • Add logic to calculate a metric using a scorer. Can we write a scorer in Ruby?
  • Read in scorer from database table, not file!
  • Calculate a global score!
  • Add a dashboard for all the case scorers?
  • Experiment with loading a Snapshot in place of the actual running of the case.
  • Need to add the number_of_rows limit when requesting docs.
  • Need to add the fl parameter to limit the fields being returned to what is defined in the case.
  • Tested with scorers that need access to the fields of a doc
  • Need to be able to manage case scores, like deleting them if you have an odd ball one.
  • Follow pattern in book and case of having a column to track the job. So either add it to case.nightly_run_job or add it to the snapshot....
  • Always show the most recent snapshot. If there are no snapshots then prompt to run the queries with a big ass button in the middle.
  • Only keep the most recent run's web_requests... otherwise too much data.
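
A rough sketch of the "convert the Solr response into snapshot_docs" items above, including the row limit and the field list. This is only an illustration, not the code in this PR: it assumes the standard Solr JSON shape (`response.docs`), and the function name `solrResponseToSnapshotDocs`, the `rows` and `fieldList` options, and the output keys `doc_id`, `position`, and `fields` are hypothetical names chosen for the example.

```javascript
// Hypothetical helper: turn a raw Solr JSON response into snapshot_docs-like rows.
// `rows` mirrors the number_of_rows limit; `fieldList` mirrors the case's fl setting.
function solrResponseToSnapshotDocs(solrResponse, { rows = 10, fieldList = ["id"] } = {}) {
  // Standard Solr JSON responses keep the hits under response.docs.
  const docs = (solrResponse.response && solrResponse.response.docs) || [];

  return docs.slice(0, rows).map((doc, index) => {
    // Keep only the fields the case asked for, so the stored payload stays small.
    const fields = {};
    for (const name of fieldList) {
      if (name in doc) fields[name] = doc[name];
    }
    return {
      doc_id: String(doc.id), // assumes the unique key field is "id"
      position: index + 1,    // 1-based rank within the result list
      fields,
    };
  });
}

// Example usage with a tiny, made-up response.
const exampleResponse = {
  response: {
    numFound: 3,
    docs: [
      { id: "a1", title: "First", body: "long text" },
      { id: "b2", title: "Second", body: "long text" },
      { id: "c3", title: "Third", body: "long text" },
    ],
  },
};
const snapshotDocs = solrResponseToSnapshotDocs(exampleResponse, {
  rows: 2,
  fieldList: ["id", "title"],
});
console.log(snapshotDocs);
```

Trimming to the case's fields at conversion time is one way to address the table bloat mentioned above; whether that happens here or in the Solr request itself (via `fl` and `rows`) is a design choice this sketch does not settle.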

Down the road, when we want multiple scorers:

  • Establish a single north star metric that shows up on the home page.
  • Establish a relationship between case_scores and scorers; right now when you change scorers, we don't track that.
  • Filter the graphs to only show scores that match the scorer being used.

Motivation and Context

We need moar data! This replaces an earlier attempt, #976, that kept the AngularJS app as the runner for JavaScript. It just didn't work... a headless app wasn't the right way to go.

describe 'recency scorer' do
  let(:the_case) { cases(:case_without_score) }

  test 'runs simple and tests eachDoc w/ a function' do
Member Author

I'd love to get a bit more detail on how to pick a baseDate, and whether the baseDate could be "today" instead. @david-fisher

Contributor

@david-fisher david-fisher Jan 2, 2025

A base date of today is just fine. The one in the code was the date of the frozen-in-time index. You can use:

baseDate = Date.now()
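
To illustrate why the choice of base date matters, here is a minimal sketch of a recency-style score. It is not the scorer under test in this PR: the function name `recencyScore`, the `publish_date` field, and the 30-day half-life are assumptions made up for the example. Each doc contributes a value that halves every `halfLifeDays` of age relative to `baseDate`, and the score is the average over the docs.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical recency score: average exponential decay of doc age relative to baseDate.
// baseDate is in epoch milliseconds and defaults to "today" via Date.now().
function recencyScore(docs, baseDate = Date.now(), halfLifeDays = 30) {
  if (docs.length === 0) return 0;

  let total = 0;
  for (const doc of docs) {
    const published = Date.parse(doc.publish_date); // assumed ISO 8601 date field
    if (Number.isNaN(published)) continue;          // unparseable dates contribute 0
    // Docs dated after baseDate are treated as age 0 rather than given a bonus.
    const ageDays = Math.max(0, (baseDate - published) / MS_PER_DAY);
    total += Math.pow(0.5, ageDays / halfLifeDays);
  }
  return total / docs.length;
}

// Example usage with a fixed base date, as you would want for a frozen index.
const frozenBaseDate = Date.parse("2025-01-02T00:00:00Z");
const exampleDocs = [
  { id: "1", publish_date: "2025-01-02T00:00:00Z" }, // age 0 days
  { id: "2", publish_date: "2024-12-03T00:00:00Z" }, // age 30 days
];
const exampleScore = recencyScore(exampleDocs, frozenBaseDate);
console.log(exampleScore);
```

With a frozen index, a fixed base date keeps scores reproducible across runs; with a live index, defaulting to `Date.now()` means the same docs score lower as time passes, which may or may not be what you want when comparing snapshots.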

@epugh epugh temporarily deployed to quepid-pr-1039 January 2, 2025 16:13 Inactive
@epugh epugh temporarily deployed to quepid-pr-1039 January 5, 2025 16:33 Inactive
@epugh epugh temporarily deployed to quepid-pr-1039 January 6, 2025 23:35 Inactive
@epugh epugh temporarily deployed to quepid-pr-1039 January 7, 2025 14:44 Inactive
Member Author

epugh commented Jan 16, 2025

Maybe done?

@epugh epugh merged commit 3bfa92d into main Jan 17, 2025
4 of 5 checks passed

2 participants