
Solr highlighting #4557

Closed
pdurbin opened this issue Mar 29, 2018 · 2 comments


@pdurbin
Member

pdurbin commented Mar 29, 2018

In #4158 and pull request #4520 we are upgrading from Solr 4.6.0 to Solr 7.2.1 (the latest, as of this writing) and we're seeing some odd behavior in the Solr "highlighting" feature, which we use in Dataverse to show people which fields matched their query. For example, when searching for "brown bag" the results show "Filename Without Extension: Brown bag" in the search card:

[Screenshot, 2018-03-29: search card showing "Filename Without Extension: Brown bag"]

If you add show_relevance=true to the Search API, you can see the matches there as well:

[Screenshot, 2018-03-29: Search API output with show_relevance=true listing the matched fields]

The example above is from Dataverse 4.8.4 running Solr 4.6.0.
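The Search API check can be scripted so it is easy to repeat on each Solr version. This is a minimal Python sketch that only builds the request URL; the base URL `http://localhost:8080` and the helper name `search_url` are assumptions, while `/api/search`, `q`, and `show_relevance` are the Search API's own endpoint and parameters.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a local Dataverse installation; adjust as needed.
BASE_URL = "http://localhost:8080"


def search_url(query: str, show_relevance: bool = True) -> str:
    """Build a Dataverse Search API URL for `query`.

    With show_relevance=true the response also reports which fields
    matched, so highlighting can be checked without the web UI.
    """
    params = {"q": query}
    if show_relevance:
        params["show_relevance"] = "true"
    return f"{BASE_URL}/api/search?{urlencode(params)}"


print(search_url("brown bag"))
```

The resulting URL can be opened with curl or a browser against an installation running either Solr version.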

As of a088d5e in the 4158-update-solr branch, which uses Solr 7.2.1, we're seeing some unexpected highlighting behavior. I wondered whether highlighting in Solr 7 is broken, deprecated, or completely different from Solr 4, so I took Dataverse out of the equation and used the "hello world" examples that ship with Solr to see whether highlighting works. Highlighting seems to work just fine in both Solr 4.6.0 and Solr 7.2.1 with their stock configs and examples. Here are my results:

Solr 4.6.0

cd solr-4.6.0/example
java -jar start.jar &
cd exampledocs
java -jar post.jar mp500.xml 

curl -s 'http://localhost:8983/solr/collection1/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=printer' | jq '.highlighting'

{
  "0579B002": {
    "features": [
      "Multifunction ink-jet color photo <em>printer</em>"
    ],
    "cat": [
      "<em>printer</em>"
    ],
    "name": [
      "Canon PIXMA MP500 All-In-One Photo <em>Printer</em>"
    ]
  }
}
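Both runs send the same highlighting parameters (hl=true, hl.fl=*) and differ only in the core name and query. A minimal Python sketch that builds those requests and extracts the highlighting section, as `jq '.highlighting'` does; the helper names `highlight_url` and `fetch_highlighting` are hypothetical, and the fetch step assumes a Solr instance listening on localhost:8983.

```python
import json
import urllib.request
from urllib.parse import urlencode


def highlight_url(core: str, query: str, host: str = "http://localhost:8983") -> str:
    """Build the same select request as the curl commands: JSON output,
    highlighting enabled (hl=true) on all fields (hl.fl=*)."""
    params = [
        ("rows", "1000000"),
        ("wt", "json"),
        ("indent", "true"),
        ("hl", "true"),
        ("hl.fl", "*"),
        ("q", query),
    ]
    # safe="*" keeps the wildcard readable instead of encoding it as %2A.
    return f"{host}/solr/{core}/select?{urlencode(params, safe='*')}"


def fetch_highlighting(url: str) -> dict:
    """Python equivalent of piping the response through jq '.highlighting'."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp).get("highlighting", {})


# Only build the URLs here; call fetch_highlighting() with Solr running.
print(highlight_url("collection1", "printer"))  # Solr 4.6.0 example core
print(highlight_url("techproducts", "video"))   # Solr 7.2.1 techproducts core
```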

Solr 7.2.1

cd solr-7.2.1
bin/solr -e techproducts

curl -s 'http://localhost:8983/solr/techproducts/select?rows=1000000&wt=json&indent=true&hl=true&hl.fl=*&q=video' | jq '.highlighting'

{
  "MA147LL/A": {
    "name": [
      "Apple 60 GB iPod with <em>Video</em> Playback Black"
    ],
    "features": [
      "Stores up to 15,000 songs, 25,000 photos, or 150 hours of <em>video</em>"
    ]
  },
  "EN7800GTX/2DHTV/256M": {
    "features": [
      "Dual DVI connectors, HDTV out, <em>video</em> input"
    ]
  },
  "100-435805": {
    "name": [
      "ATI Radeon X1900 XTX 512 MB PCIE <em>Video</em> Card"
    ]
  }
}
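When comparing runs (for instance, the same query against Dataverse on Solr 4.6.0 and on 7.2.1), it helps to reduce each highlighting section to the fields that actually produced snippets. A minimal Python sketch with a hypothetical helper `matched_fields`, applied to the techproducts output above:

```python
# The "highlighting" section from the Solr 7.2.1 techproducts run above.
HIGHLIGHTING = {
    "MA147LL/A": {
        "name": ["Apple 60 GB iPod with <em>Video</em> Playback Black"],
        "features": ["Stores up to 15,000 songs, 25,000 photos, or 150 hours of <em>video</em>"],
    },
    "EN7800GTX/2DHTV/256M": {
        "features": ["Dual DVI connectors, HDTV out, <em>video</em> input"],
    },
    "100-435805": {
        "name": ["ATI Radeon X1900 XTX 512 MB PCIE <em>Video</em> Card"],
    },
}


def matched_fields(highlighting: dict) -> dict:
    """Map each document id to the sorted names of fields with at least one snippet."""
    return {
        doc_id: sorted(field for field, snippets in fields.items() if snippets)
        for doc_id, fields in highlighting.items()
    }


for doc_id, fields in matched_fields(HIGHLIGHTING).items():
    print(doc_id, fields)
```

Empty snippet lists are filtered out, so a field that Solr lists without any fragment does not count as highlighted.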

From here we need to decide on the next steps.

Do we care that highlighting in Dataverse doesn't work as well with pull request #4520?

If we want highlighting to work as it did previously, what are the next steps? I would say we should figure out how our custom config differs from the Solr "techproducts" example above.
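One place to start that comparison is the `<field>` definitions. Highlighting generally works from a field's stored value, so one plausible cause worth checking is a field that is indexed (searchable) but not stored, which can match a query without producing a snippet. A minimal Python sketch that diffs the explicitly declared attributes of two schemas; the helper names and both schema fragments are hypothetical, and defaults inherited from a `<fieldType>` or `<dynamicField>` rules are not resolved.

```python
import xml.etree.ElementTree as ET

# Attributes compared for each explicitly declared <field>.
ATTRS = ("type", "indexed", "stored", "multiValued")


def field_defs(schema_xml: str) -> dict:
    """Return {field name: {attribute: value or None}} for every <field> element."""
    root = ET.fromstring(schema_xml)
    return {f.get("name"): {a: f.get(a) for a in ATTRS} for f in root.iter("field")}


def diff_fields(old_xml: str, new_xml: str) -> list:
    """List fields that were added, removed, or declared differently."""
    old, new = field_defs(old_xml), field_defs(new_xml)
    report = []
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            report.append(f"{name}: only in old schema")
        elif name not in old:
            report.append(f"{name}: only in new schema")
        elif old[name] != new[name]:
            changed = {a: (old[name][a], new[name][a]) for a in ATTRS if old[name][a] != new[name][a]}
            report.append(f"{name}: {changed}")
    return report


# Hypothetical schema fragments for illustration only.
OLD_SCHEMA = """<schema name="old">
  <field name="title" type="text_en" indexed="true" stored="true"/>
  <field name="subject" type="text_en" indexed="true" stored="true" multiValued="true"/>
</schema>"""
NEW_SCHEMA = """<schema name="new">
  <field name="title" type="text_en" indexed="true" stored="true"/>
  <field name="subject" type="text_en" indexed="true" stored="false" multiValued="true"/>
  <field name="subtitle" type="text_en" indexed="true" stored="true"/>
</schema>"""

for line in diff_fields(OLD_SCHEMA, NEW_SCHEMA):
    print(line)
```

For real configs, read the text of each schema file (schema.xml or managed-schema) and pass it in place of the fragments.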

@pdurbin pdurbin mentioned this issue Mar 29, 2018
@kcondon
Contributor

kcondon commented Apr 2, 2018

Some additional info:
- Highlighting does work for some fields in v7.2.1: author, contact name, keyword term, topic classification term, etc.
- It does not work for others: subject, description, subtitle, alternative title, other ID agency, etc.
- The above is taken from the citation block and is not a comprehensive list. Spot testing with other blocks shows the same pattern: all fields appear searchable with 7.2.1, but highlighting is broken for many, though not all, fields.
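Spot tests like these can be turned into a repeatable check: pick a query term known to appear in a set of fields, then list the fields that came back without a snippet. A minimal Python sketch; the helper name `missing_highlights`, the document id, and the field names are placeholders, not Dataverse's actual Solr field names.

```python
def missing_highlights(expected_fields, highlighting, doc_id):
    """Return the expected fields for doc_id that came back without any snippet."""
    returned = {
        field
        for field, snippets in highlighting.get(doc_id, {}).items()
        if snippets
    }
    return sorted(set(expected_fields) - returned)


# Hypothetical spot-test input: the query term appears in all three fields
# of "doc-1", but only one field came back highlighted.
highlighting = {"doc-1": {"author": ["<em>Smith</em>, Jane"]}}
print(missing_highlights(["author", "subject", "description"], highlighting, "doc-1"))
```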

@pdurbin
Member Author

pdurbin commented Jul 12, 2018

Dataverse 4.9 has been in production on Harvard Dataverse for a while, and no one seems to be complaining about the search highlighting. A few other installations have upgraded as well. I'm closing this issue and will refer to it if someone reports this as a bug.

@pdurbin pdurbin closed this as completed Jul 12, 2018