[Dataset quality] Added malformed docs column to table #172462

Merged on Dec 5, 2023 (19 commits)

Conversation

yngrdyn
Contributor

@yngrdyn yngrdyn commented Dec 4, 2023

Closes #170220.

Changes

Demo

Screen.Recording.2023-12-04.at.13.18.08.mov

How to test?

  1. Go to https://yngrdyn-deploy-kiban-pr172462.kb.us-west2.gcp.elastic-cloud.com/app/observability-log-explorer/dataset-quality
  2. The Malformed docs column should be present and sortable.

@yngrdyn yngrdyn requested review from a team as code owners December 4, 2023 12:18
@yngrdyn yngrdyn linked an issue Dec 4, 2023 that may be closed by this pull request
@botelastic botelastic bot added the Team:APM - DEPRECATED Use Team:obs-ux-infra_services. label Dec 4, 2023
@elasticmachine
Contributor

Pinging @elastic/apm-ui (Team:APM)

@apmmachine
Contributor

🤖 GitHub comments

Just comment with:

  • /oblt-deploy : Deploy a Kibana instance using the Observability test environments.
  • /oblt-deploy-serverless : Deploy a serverless Kibana instance using the Observability test environments.
  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@yngrdyn yngrdyn added release_note:skip Skip the PR/issue when compiling release notes and removed Team:APM - DEPRECATED Use Team:obs-ux-infra_services. labels Dec 4, 2023
@botelastic botelastic bot added the Team:APM - DEPRECATED Use Team:obs-ux-infra_services. label Dec 4, 2023
@yngrdyn yngrdyn added the Team:obs-ux-logs Observability Logs User Experience Team label Dec 4, 2023
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

Contributor

@cauemarcondes cauemarcondes left a comment

LGTM

@yngrdyn yngrdyn requested a review from a team as a code owner December 4, 2023 14:53
…e different from the number of malformed documents
@elasticmachine
Contributor

Pinging @elastic/obs-ux-management-team (Team:obs-ux-management)

.get<GetDataStreamsStatsResponse>(DATA_STREAMS_STATS_URL, {
  query: params,
})
.catch((error) => {
  throw new GetDataStreamsStatsError(`Failed to fetch data streams stats: ${error}`);
});

const { dataStreamsStats, integrations } = decodeOrThrow(
Contributor

TIL that we have this helper function 👍🏼

Contributor Author

It's awesome, from what I saw it was created in the shared repo by @weltenwort 🎉

Member

Glad it's useful. It also formats the error in a more helpful way than the default formatter (IMO).
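
For readers unfamiliar with the pattern discussed in this thread, here is a minimal, dependency-free sketch of a "decode or throw" helper. It is not the shared helper from the Kibana repository: the `Decoder`, `DecodeResult`, and `DecodeError` names and the error message format are assumptions made for illustration only. It shows the general idea of validating an untyped response and failing with a message that lists every validation problem.

```typescript
// Hypothetical stand-in for a "decode or throw" helper.
// Decoder, DecodeResult, and DecodeError are assumed names for this sketch.
type DecodeResult<T> = { ok: true; value: T } | { ok: false; errors: string[] };
type Decoder<T> = (input: unknown) => DecodeResult<T>;

class DecodeError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'DecodeError';
  }
}

// Returns a function that yields the decoded value, or throws an error
// whose message lists every validation problem on its own line.
function decodeOrThrow<T>(
  decoder: Decoder<T>,
  createError: (message: string) => Error = (message) => new DecodeError(message)
): (input: unknown) => T {
  return (input) => {
    const result = decoder(input);
    if (result.ok) {
      return result.value;
    }
    throw createError(`Failed to decode response:\n${result.errors.join('\n')}`);
  };
}

// Example decoder for a response shaped like { dataStreamsStats: unknown[] }.
const statsResponseDecoder: Decoder<{ dataStreamsStats: unknown[] }> = (input) => {
  if (typeof input !== 'object' || input === null) {
    return { ok: false, errors: ['expected an object'] };
  }
  const stats = (input as { dataStreamsStats?: unknown }).dataStreamsStats;
  if (!Array.isArray(stats)) {
    return { ok: false, errors: ['dataStreamsStats: expected an array'] };
  }
  return { ok: true, value: { dataStreamsStats: stats } };
};
```

Passing a custom `createError` lets the caller wrap decode failures in a domain-specific error type, which is the role `GetDataStreamsStatsError` appears to play in the snippet under review.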


const datasetQualityESClient = createDatasetQualityESClient(esClient);

const response = await datasetQualityESClient.search({
Contributor

For the future: I think it would be a great idea to add some latency telemetry for this query.
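
One possible shape for such telemetry is sketched below. It assumes nothing about Kibana's telemetry services: `withLatency` and `LatencyReporter` are hypothetical names, and the `report` callback stands in for whatever event client would actually be used. The wrapper records the duration on both success and failure.

```typescript
// Hypothetical telemetry hook; withLatency and LatencyReporter are assumed names.
type LatencyReporter = (event: { name: string; durationMs: number; success: boolean }) => void;

async function withLatency<T>(
  name: string,
  operation: () => Promise<T>,
  report: LatencyReporter,
  now: () => number = () => Date.now() // injectable clock, useful in tests
): Promise<T> {
  const start = now();
  let success = false;
  try {
    const result = await operation();
    success = true;
    return result;
  } finally {
    // Runs on both success and failure, so slow failing queries are recorded too.
    report({ name, durationMs: now() - start, success });
  }
}
```

Under these assumptions, the search call could be wrapped as `withLatency('dataset_quality_search', () => datasetQualityESClient.search(request), report)`, where the event name, `request`, and `report` are placeholders.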

Contributor

@achyutjhunjhunwala achyutjhunjhunwala left a comment

LGTM 👍🏼

@yngrdyn
Contributor Author

yngrdyn commented Dec 5, 2023

/oblt-deploy

@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| datasetQuality | 36 | 65 | +29 |

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

| id | before | after | diff |
| --- | --- | --- | --- |
| datasetQuality | 3 | 4 | +1 |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| datasetQuality | 13.9KB | 34.5KB | +20.5KB |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@yngrdyn yngrdyn merged commit df0a21c into main Dec 5, 2023
@yngrdyn yngrdyn deleted the 170220-malformed-api branch December 5, 2023 13:33
@kibanamachine kibanamachine added v8.12.0 backport:skip This commit does not require backporting labels Dec 5, 2023
@felixbarny
Member

Let's not call them "Malformed docs", as that would imply that the fields are in _ignored because of ignore_malformed.

Suggestions for alternatives:

  • Degraded documents
  • Documents with ignored fields

Also, note that since elastic/elasticsearch#101373 hasn't been merged yet, the exists query on the _ignored field is expected to be very expensive.
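
To make the query under discussion concrete, here is a sketch of a search request that counts documents with at least one ignored field, grouped by dataset. It is not the request this PR sends: the function name, the index pattern parameter, the time range filter, and the bucket size are all assumptions. Only the `exists` filter on `_ignored` comes from the comment above.

```typescript
// Sketch only: buildIgnoredDocsRequest and its parameters are assumed names.
interface IgnoredDocsRequest {
  index: string;
  size: number;
  query: { bool: { filter: Array<Record<string, unknown>> } };
  aggs: Record<string, unknown>;
}

function buildIgnoredDocsRequest(indexPattern: string, from: string, to: string): IgnoredDocsRequest {
  return {
    index: indexPattern,
    size: 0, // only the aggregation buckets are needed, not the hits
    query: {
      bool: {
        filter: [
          // Matches documents whose _ignored field lists at least one field name.
          { exists: { field: '_ignored' } },
          // Assumption: a time range keeps the set of candidate documents smaller.
          { range: { '@timestamp': { gte: from, lte: to } } },
        ],
      },
    },
    aggs: {
      // One bucket per dataset; doc_count is the number of matching documents.
      datasets: {
        terms: { field: 'data_stream.dataset', size: 100 },
      },
    },
  };
}
```

Because the `exists` clause sits in filter context, it does not contribute to scoring, but it still has to be evaluated against every candidate document, which is where the cost mentioned above comes from.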

@yngrdyn
Contributor Author

yngrdyn commented Dec 5, 2023

@felixbarny we have this little tooltip helper

image

Would that help reduce the confusion there?

> Suggestions for alternatives: 1. Degraded documents, 2. Documents with ignored fields

What do you think about the suggestions, @mdbirnstiehl and @ruflin? Which would be the clearest one?

> Also, note that since elastic/elasticsearch#101373 hasn't been merged yet, the exists query on the _ignored field is expected to be very expensive.

Can we do something in the meantime? I expected the query to be expensive, which is also why @weltenwort suggested not sorting by that column by default, since that would put it on the critical path for loading the table.
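
The idea of keeping the expensive count off the critical path can be sketched as follows. This is an illustration, not the table code from this PR: `DatasetRow` and the three functions are hypothetical. Rows render first with a cheap default sort, the counts are merged in once they arrive, and sorting by the count happens only when the user asks for it.

```typescript
// Hypothetical row model for this sketch.
interface DatasetRow {
  name: string;
  sizeBytes: number;
  degradedDocs?: number; // undefined until the expensive count has loaded
}

// Default ordering uses a cheap field, so the first render never waits on the count.
function sortByName(rows: DatasetRow[]): DatasetRow[] {
  return [...rows].sort((a, b) => a.name.localeCompare(b.name));
}

// Merge late-arriving counts into rows that are already rendered.
// Datasets missing from the map are treated as having zero affected documents.
function mergeDegradedCounts(rows: DatasetRow[], counts: Map<string, number>): DatasetRow[] {
  return rows.map((row) => ({ ...row, degradedDocs: counts.get(row.name) ?? 0 }));
}

// Sort descending by count; rows whose count has not loaded yet go last.
function sortByDegradedDocs(rows: DatasetRow[]): DatasetRow[] {
  return [...rows].sort((a, b) => (b.degradedDocs ?? -1) - (a.degradedDocs ?? -1));
}
```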

@ruflin
Contributor

ruflin commented Dec 5, 2023

> 1. Degraded documents

That would be my preference as it always applies to the failure store. The document is degraded because some processing didn't happen.

> expensive query

I would just ignore it for now and keep it expensive.

@yngrdyn
Contributor Author

yngrdyn commented Dec 5, 2023

Renamed to Degraded documents in this small PR.

@felixbarny
Member

> That would be my preference as it always applies to the failure store.

Not sure I fully understand. We should call documents in the failure store "failed documents" and documents that are in the regular data stream but have an _ignored field "degraded documents".
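
The distinction proposed here can be written down as a small classification sketch. The `IngestedDoc` shape and both of its field names are assumptions for illustration; they do not correspond to any real Elasticsearch or Kibana type.

```typescript
// Hypothetical document summary; both field names are assumptions for this sketch.
interface IngestedDoc {
  inFailureStore: boolean; // the document was routed to the failure store
  ignoredFields: string[]; // contents of the _ignored metadata field, if any
}

type DocState = 'ok' | 'degraded' | 'failed';

// "failed" = in the failure store; "degraded" = in the regular data stream
// but with at least one entry in _ignored; everything else is "ok".
function classifyDoc(doc: IngestedDoc): DocState {
  if (doc.inFailureStore) return 'failed';
  if (doc.ignoredFields.length > 0) return 'degraded';
  return 'ok';
}

// Tally documents per state, keeping failed and degraded as separate counts.
function countByState(docs: IngestedDoc[]): Record<DocState, number> {
  const counts: Record<DocState, number> = { ok: 0, degraded: 0, failed: 0 };
  for (const doc of docs) {
    counts[classifyDoc(doc)] += 1;
  }
  return counts;
}
```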

@ruflin
Contributor

ruflin commented Dec 5, 2023

What do you call it if you need to "munch" both into a single statistic? :-) Maybe the solution is just to have both. At the same time, I would like to have a "summary" that tells me the state of the dataset.

@felixbarny
Member

> What do you call it if you need to "munch" both into a single statistic?

I guess I'd call it health then.

@ruflin
Contributor

ruflin commented Dec 6, 2023

Health unfortunately is already used for shards / replicas of data streams ...

@felixbarny
Member

Ingestion health?

@weltenwort
Member

How about "consistency", "integrity", or "quality"?

@yngrdyn yngrdyn removed Team:APM - DEPRECATED Use Team:obs-ux-infra_services. Team:obs-ux-management Observability Management User Experience Team labels Mar 21, 2024
Labels
  • backport:skip: This commit does not require backporting
  • release_note:skip: Skip the PR/issue when compiling release notes
  • Team:obs-ux-logs: Observability Logs User Experience Team
  • v8.12.0
Development

Successfully merging this pull request may close these issues.

[Dataset quality] Add malformed docs column