summary does not contain the property values #171

tloubrieu-jpl · 2022-07-27T22:06:46Z

🐛 Describe the bug

The list of properties is missing in the summary.

📜 To Reproduce

curl --request GET 'http://localhost:8080/collections/urn:nasa:pds:insight_rad:data_derived::7.0/products?summary-only=true'

Result:
{
"summary": {
"q": "",
"hits": 7,
"took": 25,
"start": 0,
"limit": 100,
"sort": [],
"properties": []
}
}

🕵️ Expected behavior

The properties used by the products of this collection should be listed.

📚 Version of Software Used

all

🩺 Test Data / Additional context

🏞Screenshots

🖥 System Info

OS: [e.g. iOS]
Browser [e.g. chrome, safari]
Version [e.g. 22]

🦄 Related requirements

⚙️ Engineering Details

tloubrieu-jpl · 2022-07-27T22:10:34Z

blocks NASA-PDS/search-api-notebook#6

al-niessner · 2022-07-28T17:42:52Z

@tloubrieu-jpl @jordanpadams

Filling in the properties is a problem. In order to correctly fill them in, the code would have to request all of the data from opensearch then collect only those that exist. It will make summary only inefficient in the sense that more data has to be exchanged between the Java code and opensearch.

An alternative is to return what the user asked for. In the case given, it would be a correct list since no fields were asked for only the defaults would be returned.

However, consider the case where the user literally asks for ?fields=foobar. Which options is the desired:

leave properties blank because verifying them is expensive [efficient and error free]
fill properties with what would be requested but not verify they are what is actually found (no foobar in any document but can still request it) [efficient but erroneous]
process every document despite none of them are to be returned to the user just to build a properties list [inefficient and error free]

jordanpadams · 2022-07-28T19:48:47Z

@al-niessner @tloubrieu-jpl can we use the _mapping api? https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html

al-niessner · 2022-07-28T22:57:51Z

@jordanpadams @tloubrieu-jpl

Yes and no. I am tired of that answer as I am sure everyone else is.

As best I understand the documentation and limited testing. Yes, the mapping can be changed on the fly but nothing is re-indexed. You have to use the reindex API for that but it also comes with caveats. You cannot reindex into an existing index. Not really a problem if one had the foresight to know that things would change or used dynamic (first instance starts the indexing which is pretty clever). If one plans on index changes, then assign an alias and use it. For instance, registry would be the alias for r1 say. Then when we update the mapping we put it into r2 then reindex r1 into r2. When all that work is done, change the alias so that registry points at r2. I am sure that comes with caveats as well like no document (old db record) inserts during reindex or something along those lines.

If you like, I can look at adding an alias to the index and use it so that we can update and move around a smidge easier in the future but if you think this is the last change we will ever need to make to the index mapping then I can keep it as is.

tloubrieu-jpl · 2022-07-29T01:08:59Z

Hi @al-niessner @jordanpadams ,

What we had before
in a previous version of the code the properties were filled from the results of the current page, by building a set of unique properties as we read the products from the results.

When summary-only=true, we read the results anyway and sent the unique properties of the page.

This was inaccurate and costly, in addition this will not work if we replace summary-only with limit=0.

What we need
The content of the properties should be:

if fields is not set in the request, all the properties available for the request, beyond the pagination limit.
if fields is set, the intersection of the fields value and the properties available for the request, beyond the pagination limit.

This is a requirement to be able to get the list of properties before returning the results since otherwise the user will have no clue how to apply request criteria on the registry.

Other thoughts
Could we have the list of properties as a keyword attributes when we run harvest ? Opensearch would then build an index of the existing properties.

Note that eventually we also want faceting which is the ability to see statistics on the values of the properties. One level higher.

al-niessner · 2022-07-29T17:43:22Z

@tloubrieu-jpl @jordanpadams

I do not understand the requirements:

if fields is not set in the request, all the properties available for the request, beyond the pagination limit.

All the fields available to the request? That would mean walking every document to build that list since not all keywords are indexed and many documents have unique elements. If dynamic were true, we could just use the index. Since dynamic cannot be true, then must build it manually at ingestion. Separate index or stuff it in registry as yet another document with a special lidvid like 1.

Do they really want the default fields that the output would use anyway? Long list for json and short list for kvp+json?

What is the pagination limit for in the requirement? Is that when summary-only existed and find the unique fields in the first limit of matching documents?

if fields is set, the intersection of the fields value and the properties available for the request, beyond the pagination limit.

Why intersection? We return the union. If a user asks for field=foobar with json output then get all the default fields plus foobar not just foobar. Is the requirement wrong or our implementation wrong?

Independent of your answers to the requirements, it seems like you want efficient and accurate which means adding something new during ingestion. Is this correct?

jordanpadams · 2022-08-18T18:03:39Z

fixed by #166

tloubrieu-jpl added bug Something isn't working needs:triage s.high High severity labels Jul 27, 2022

tloubrieu-jpl assigned jordanpadams Jul 27, 2022

tloubrieu-jpl mentioned this issue Jul 27, 2022

Notebooks no longer work with V1.0 release NASA-PDS/search-api-notebook#6

Closed

tloubrieu-jpl mentioned this issue Jul 28, 2022

summary-only does not work as expected #167

Closed

jordanpadams added B13.0 sprint-backlog and removed needs:triage labels Aug 4, 2022

jordanpadams assigned al-niessner and unassigned jordanpadams Aug 4, 2022

jordanpadams closed this as completed Aug 18, 2022

jordanpadams removed the sprint-backlog label Feb 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

summary does not contain the property values #171

summary does not contain the property values #171

tloubrieu-jpl commented Jul 27, 2022

tloubrieu-jpl commented Jul 27, 2022

al-niessner commented Jul 28, 2022

jordanpadams commented Jul 28, 2022

al-niessner commented Jul 28, 2022

tloubrieu-jpl commented Jul 29, 2022

al-niessner commented Jul 29, 2022

jordanpadams commented Aug 18, 2022

summary does not contain the property values #171

summary does not contain the property values #171

Comments

tloubrieu-jpl commented Jul 27, 2022

🐛 Describe the bug

📜 To Reproduce

🕵️ Expected behavior

📚 Version of Software Used

🩺 Test Data / Additional context

🏞Screenshots

🖥 System Info

🦄 Related requirements

⚙️ Engineering Details

tloubrieu-jpl commented Jul 27, 2022

al-niessner commented Jul 28, 2022

jordanpadams commented Jul 28, 2022

al-niessner commented Jul 28, 2022

tloubrieu-jpl commented Jul 29, 2022

al-niessner commented Jul 29, 2022

jordanpadams commented Aug 18, 2022