Upgrade Solr #4158
As we work on this issue, we should also consider the list of security concerns in https://help.hmdc.harvard.edu/Ticket/Display.html?id=253987
From https://lucene.apache.org/solr/guide/6_6/upgrading-solr.html: "Users upgrading from older versions are strongly encouraged to consult CHANGES.txt for the details of all changes since the version they are upgrading from." See also the guide's "Index Format Changes" section. Note: the index format changes note covers only the latest version's changes. I did not go through all of CHANGES.txt as the guide recommends, but this note seemed important and clear enough.
It looks like 7.1 is not much more work, but this is the lowest bar to start with.
I have updated Dataverse so that it compiles with 6.4.2 / 7.1.0 (both work if switched in the pom). The next step is to get the new server up and running. Starting with Solr 6, Solr uses a managed schema by default instead of a user-editable XML file. It can be switched back to the XML file with minimal changes: https://stackoverflow.com/questions/37324603/
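For reference, switching back to a hand-edited schema comes down to declaring Solr's `ClassicIndexSchemaFactory` in `solrconfig.xml` (instead of the implicit `ManagedIndexSchemaFactory`) and providing a `schema.xml`, typically by renaming the generated `managed-schema` file. The edit is small enough to do by hand; the Python sketch below only makes the change explicit. The function name `use_classic_schema` is hypothetical, and note that ElementTree drops XML comments when it parses, so a manual edit is gentler on the real file.

```python
import xml.etree.ElementTree as ET

# Solr class that reads a hand-edited schema.xml instead of the managed schema.
CLASSIC_FACTORY = "ClassicIndexSchemaFactory"


def use_classic_schema(solrconfig_xml: str) -> str:
    """Return solrconfig.xml text whose <schemaFactory> is the classic one.

    Caveat: ElementTree's default parser discards XML comments, so the
    returned text has no comments; hand-editing the file keeps them.
    """
    root = ET.fromstring(solrconfig_xml)
    if root.tag != "config":
        raise ValueError("expected solrconfig.xml with a <config> root element")
    # Drop any existing factory (e.g. ManagedIndexSchemaFactory) so only one remains.
    for factory in root.findall("schemaFactory"):
        root.remove(factory)
    # With no <schemaFactory> at all, Solr 6+ falls back to the managed schema,
    # so the classic factory has to be declared explicitly.
    ET.SubElement(root, "schemaFactory", {"class": CLASSIC_FACTORY})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<config>"
        "<luceneMatchVersion>7.1.0</luceneMatchVersion>"
        '<schemaFactory class="ManagedIndexSchemaFactory">'
        '<bool name="mutable">true</bool>'
        "</schemaFactory>"
        "</config>"
    )
    print(use_classic_schema(sample))
```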
I am unsure if this is actually the correct approach but am committing it as this story is getting backlogged.
The code as is still has errors when actually trying to write to Solr, but I have worked through a number of them. Note: the branch above uses 7.1.0. I see that as the best path forward, since 6 and 7 seem similar in terms of development work.
When we upgrade Solr, will we be upgrading our existing indexes (as noted by Kevin in #4158 (comment))? Or will we just reindex from scratch? If we are looking to upgrade our indexes, what is expected related to that in this story?
Issues found:
X - Following the instructions, starting Solr fails when run as root; it needs -force.
X - Same with creating the core.
X - Same with the init script.
So maybe we need to call out the recommendation not to run Solr as root, and explain how?
- Do we need to mention anything about upgrading Solr, such as the new init script (command-line args), not running as root, and shutting down the service (obvious, I know)?
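A minimal Python sketch of the root issue above: it builds the `bin/solr start` and `bin/solr create_core` command lines and only appends `-force` when the caller is root and has explicitly opted in, otherwise it refuses, nudging people toward a dedicated non-root user. The install-relative path `bin/solr` and the core name `collection1` are assumptions, and the function `solr_commands` is hypothetical.

```python
import os
import shlex
from typing import List, Optional

SOLR_BIN = "bin/solr"      # assumption: run from the Solr install directory
CORE_NAME = "collection1"  # hypothetical core name; use the one from the install guide


def solr_commands(core: str = CORE_NAME,
                  is_root: Optional[bool] = None,
                  allow_root: bool = False) -> List[List[str]]:
    """Build bin/solr command lines for starting Solr and creating the core.

    bin/solr refuses to run these as root unless -force is given. This sketch
    only adds -force when the caller is root AND explicitly opts in; otherwise
    it raises, nudging people toward a dedicated non-root "solr" user.
    """
    if is_root is None:
        # os.geteuid() only exists on POSIX systems.
        is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    extra: List[str] = []
    if is_root:
        if not allow_root:
            raise PermissionError(
                "running as root: switch to a non-root user or pass allow_root=True")
        extra = ["-force"]
    return [
        [SOLR_BIN, "start", *extra],
        [SOLR_BIN, "create_core", "-c", core, *extra],
    ]


if __name__ == "__main__":
    for cmd in solr_commands(is_root=True, allow_root=True):
        print(shlex.join(cmd))
    # bin/solr start -force
    # bin/solr create_core -c collection1 -force
```

Raising by default, rather than silently adding `-force`, keeps the recommended path (a non-root user) as the easy one.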
I took a quick look at these questions. I think they're all out of scope in the sense that we probably won't address any of them when using Solr in production at Harvard Dataverse. Firewalling off Solr is enough for us. That said, it looks like the Solr project has documentation on each of the topics above:
People who are interested in these topics should read the documentation above.
"Match" was showing up because the bundle key didn't exist. We are switching away from the deprecated "JsfHelper.localize" method.
Back in 60e640b, when I was experimenting with spelling suggestions from Solr, I changed the request handler from "/select" (the default) to "/spell". We didn't have time to fully explore Solr's spelling suggestions feature during the 4.0 rewrite, and the "/spell" request handler seems to be causing other bugs, such as not being able to search on the "identifier" portion of a DOI (e.g. "JNIUOA") from basic search. In short, we are switching back to Solr's default request handler, something I would have done before tagging 4.0 had I realized I had left the "/spell" request handler in there.
At standup, I mentioned that a basic search of the identifier worked when querying Solr directly but not from Dataverse. The difference is that my curl command against Solr was using the default request handler ("/select").
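To make the "/select" versus "/spell" difference concrete, here is a minimal Python sketch that builds the same query URL for either request handler; the first URL is what a direct curl against the default handler would hit. The base URL (local Solr on port 8983 with a core named `collection1`) and the helper name `solr_query_url` are assumptions, not Dataverse's actual code.

```python
from urllib.parse import urlencode

# Assumption: a local Solr on the default port with a core named "collection1".
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def solr_query_url(q: str, handler: str = "/select", **params: str) -> str:
    """Build a Solr query URL for the given request handler (default: /select)."""
    if not handler.startswith("/"):
        raise ValueError("request handler names start with '/', e.g. '/select'")
    query = {"q": q, "wt": "json", **params}
    return f"{SOLR_CORE_URL}{handler}?{urlencode(query)}"


if __name__ == "__main__":
    # Same search as a curl against the default handler:
    print(solr_query_url("JNIUOA"))
    # http://localhost:8983/solr/collection1/select?q=JNIUOA&wt=json
    # The custom handler that had been left in place by mistake:
    print(solr_query_url("JNIUOA", handler="/spell"))
```

Because each handler in `solrconfig.xml` can carry its own default parameters, the same `q` can behave differently depending on which handler receives it.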
I checked in with @kcondon this morning and he mentioned that highlighting of search fields is not working as consistently as on the develop branch. I tested this as of a088d5e on the Solr branch and he's right. The good news is that the object can still be found with a search, so it sounds like we might merge the pull request without a fix (we'd open a new issue and fix it later), but I'm assigning myself to this issue to at least poke around a bit and characterize the bug better. Perhaps I'll write some automated tests that exercise the bug.
I'm on the same commit (a088d5e) and this highlighting bug is really strange. It seems to depend on the data entered. For example, when filling in "otherIdAgency", if I use the value "agency1", the highlighting works; notice "otherIdAgency" under "highlighting" at the bottom of the JSON output.
But if I fill in "otherIdAgency" with the value "agency", no highlight appears.
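In the spirit of the automated tests mentioned above, a minimal Python sketch of the check involved: with `hl=true`, Solr's JSON response has a top-level "highlighting" object keyed by each document's unique key, then by field name, with a list of snippets per field. The helper `highlights_for`, the document id `dataset_1`, and the sample responses are hypothetical, shaped like the "agency1" (works) and "agency" (no highlight) cases.

```python
from typing import Any, Dict, List


def highlights_for(response: Dict[str, Any], doc_id: str, field: str) -> List[str]:
    """Return the highlight snippets Solr produced for one document and field.

    With hl=true, Solr's JSON response carries a top-level "highlighting"
    object keyed by each document's unique key, then by field name, with a
    list of snippets per field. An empty list means the document matched but
    no snippet came back for that field.
    """
    return response.get("highlighting", {}).get(doc_id, {}).get(field, [])


if __name__ == "__main__":
    # Hypothetical responses shaped like the two cases above (doc id is made up).
    works = {"highlighting": {"dataset_1": {"otherIdAgency": ["<em>agency1</em>"]}}}
    broken = {"highlighting": {"dataset_1": {}}}
    assert highlights_for(works, "dataset_1", "otherIdAgency") == ["<em>agency1</em>"]
    assert highlights_for(broken, "dataset_1", "otherIdAgency") == []
    print("highlighting checks passed")
```

In a real test, the response would come from querying the Solr branch with `hl=true` after indexing a dataset with each value.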
After standup this morning, @djbrooke, @scolapasta, @kcondon, and I talked about the highlighting issues we're seeing. I did some initial investigation, which gives me confidence that highlighting works fine in Solr 7.2.1 when you use its example config and data. I opened #4557 to post these results and so that we have an issue to estimate in the future. We decided that the highlighting issues are not a showstopper for merging pull request #4520.
FWIW, I plan to take another look at our Solr config tomorrow to see whether anything related to highlighting shows up, but I'd be surprised if I find anything.
I tried diffing the Solr configs, but because our customizations were ported between two different versions of the stock config, the diff did not reveal anything about our minor bugs.
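One possible refinement, not something tried in this thread: rather than a line-by-line diff of two whole `solrconfig.xml` files, compare only the named components (for example `<searchComponent name="highlight">` or `<requestHandler name="/select">`) after normalizing them with XML canonicalization, so reformatting and comments drop out. The helper names below are hypothetical; `ET.canonicalize` requires Python 3.8+.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def named_elements(solrconfig_xml: str, tag: str) -> Dict[str, str]:
    """Map name -> canonical XML for each top-level <tag name="..."> element."""
    root = ET.fromstring(solrconfig_xml)  # default parser drops comments
    found: Dict[str, str] = {}
    for el in root.findall(tag):
        name = el.get("name")
        if name is None:
            continue
        el.tail = None  # don't serialize text that follows the element
        # C14N normalizes attribute order and quoting; strip_text=True removes
        # indentation-only whitespace so formatting changes don't show up.
        found[name] = ET.canonicalize(ET.tostring(el, encoding="unicode"), strip_text=True)
    return found


def changed_components(old_xml: str, new_xml: str, tag: str = "searchComponent") -> List[str]:
    """Names of <tag> elements that were added, removed, or changed."""
    old, new = named_elements(old_xml, tag), named_elements(new_xml, tag)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))
```

Calling `changed_components(old, new, tag="requestHandler")` would flag a handler such as "/spell" that exists in only one of the configs.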
I just merged the latest from develop into the pull request: 5f67e56
We should upgrade Solr to at least a non-EOL version (6.4 or later).
It's important to be on a version where we can get support/patches if needed.