feat: update ml system related UI #12334

Open
wants to merge 47 commits into
base: master

Conversation

yoonhyejin
Collaborator

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Jan 14, 2025

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 42.73504% with 335 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...p/entity/mlModelGroup/profile/ModelGroupModels.tsx 39.60% 122 Missing ⚠️
...essInstance/profile/DataProcessInstanceSummary.tsx 37.86% 64 Missing ⚠️
.../src/app/entity/mlModel/profile/MLModelSummary.tsx 37.89% 59 Missing ⚠️
...atahub-web-react/src/app/shared/time/timeUtils.tsx 11.11% 32 Missing ⚠️
...b-web-react/src/app/preview/DefaultPreviewCard.tsx 69.33% 23 Missing ⚠️
.../dataProcessInstance/DataProcessInstanceEntity.tsx 32.00% 17 Missing ⚠️
...app/entity/dataProcessInstance/preview/Preview.tsx 0.00% 10 Missing ⚠️
...b/api/entities/dataprocess/dataprocess_instance.py 81.81% 6 Missing ⚠️
datahub-web-react/src/app/entity/EntityPage.tsx 0.00% 1 Missing ⚠️
...web-react/src/app/entity/mlModel/MLModelEntity.tsx 0.00% 1 Missing ⚠️

❌ Your patch check has failed because the patch coverage (81.81%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Files with missing lines Coverage Δ
...tahub-web-react/src/app/entity/shared/constants.ts 94.95% <100.00%> (ø)
datahub-web-react/src/app/entity/EntityPage.tsx 29.03% <0.00%> (ø)
...web-react/src/app/entity/mlModel/MLModelEntity.tsx 40.11% <0.00%> (ø)
...b/api/entities/dataprocess/dataprocess_instance.py 72.14% <81.81%> (ø)
...app/entity/dataProcessInstance/preview/Preview.tsx 19.41% <0.00%> (ø)
.../dataProcessInstance/DataProcessInstanceEntity.tsx 31.38% <32.00%> (ø)
...b-web-react/src/app/preview/DefaultPreviewCard.tsx 88.14% <69.33%> (ø)
...atahub-web-react/src/app/shared/time/timeUtils.tsx 41.39% <11.11%> (ø)
.../src/app/entity/mlModel/profile/MLModelSummary.tsx 32.85% <37.89%> (ø)
...essInstance/profile/DataProcessInstanceSummary.tsx 37.86% <37.86%> (ø)
... and 1 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7ac0dc6...946854c.

<Typography.Title level={3}>Model Details</Typography.Title>
<InfoItemContainer justifyContent="left">
{/* TODO: should use versionProperties? */}
<InfoItem title="Version">
Collaborator Author

@RyanHolstien should we use the versionProperties.version for this as well? are we going to deprecate the properties.version?

"""
Represents lineage information for ML entities.
"""
type MLModelLineageInfo {
Collaborator Author

let me know if this is a valid approach @shirshanka

Collaborator

This looks right, just needs to have the backend mapping on the resolvers.

};

const renderTrainingJobs = () => {
const lineageTrainingJobs = model?.properties?.mlModelLineageInfo?.trainingJobs || [];
Collaborator Author

let me know if this is a valid approach @shirshanka

would there be any difference between pulling properties > trainingjobs or pulling relationships > trained by?

</InfoItem>
<InfoItem title="Aliases">
<InfoItemContent>
{/* use versionProperties for aliases */}
Collaborator Author

commenting this out until this gets merged: #12166



@dataclass
class Container:
Collaborator

At first glance, it took me some time to associate container with experiment. Does it make sense to change the class name to Experiment, unless there is a reason to keep the generic term "container"?

Collaborator

Agreed

// status,
// startTime,
{
duration,
Collaborator Author

@asikowitz we're adding some of the properties that are specific to data process instance in preview -- let me know if this makes sense

Collaborator

in v2 UI we have a different method of passing values down that I prefer. That method isn't supported in v1 so we can just do this here, but we'll have to coordinate with Chris on getting this to v2.

@@ -270,7 +312,9 @@ export default function DefaultPreviewCard({
event.stopPropagation();
};

- const shouldShowRightColumn = (topUsers && topUsers.length > 0) || (owners && owners.length > 0);
+ const statusPillColor = status === 'SUCCESS' ? 'green' : 'red';
+ const shouldShowRightColumn =
Collaborator Author

@asikowitz added additional condition to shouldShowRightColumn -- lmk if this makes sense

@@ -317,7 +317,7 @@ def test_lineage_backend(mock_emit, inlets, outlets, capture_executions):
assert all(map(lambda let: isinstance(let, Dataset), op2.outlets))

# Check that the right things were emitted.
- assert mock_emitter.emit.call_count == 19 if capture_executions else 11
+ assert mock_emitter.emit.call_count == 20 if capture_executions else 11
Collaborator Author

@asikowitz we're modifying this test since we have dataplatforminstance in the mock calls

@@ -619,206 +619,178 @@ def test_emit_flow(
mock_emitter.method_calls[11][1][0].entityUrn
== "urn:li:dataProcessInstance:56231547bcc2781e0c14182ceab6c9ac"
)
assert mock_emitter.method_calls[12][1][0].aspectName == "dataPlatformInstance"
Collaborator Author

@asikowitz same here -- assertion has been pushed back by 1

Collaborator

Hmm, this is kinda awkward to change. Maybe we could just check if the method calls have certain entries, rather than requiring a specific index to be a certain call.
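
One way the suggestion above might look, as a minimal sketch rather than the actual test: collect what was passed to `emit` and assert membership instead of position. The helper name `emitted_aspects` and the `SimpleNamespace` stand-ins are hypothetical, and the sketch only inspects calls to `emit`, not every entry in `method_calls`.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def emitted_aspects(mock_emitter):
    """Return the set of (entityUrn, aspectName) pairs passed to emit()."""
    pairs = set()
    for call in mock_emitter.emit.call_args_list:
        positional = call[0]  # each entry is an (args, kwargs) pair
        if positional:
            proposal = positional[0]
            pairs.add((proposal.entityUrn, proposal.aspectName))
    return pairs


# Hypothetical stand-ins carrying only the two attributes the assertions read.
mock_emitter = MagicMock()
urn = "urn:li:dataProcessInstance:56231547bcc2781e0c14182ceab6c9ac"
for aspect in ("dataProcessInstanceProperties", "dataPlatformInstance"):
    mock_emitter.emit(SimpleNamespace(entityUrn=urn, aspectName=aspect))

# Order-independent: adding another emit call earlier does not break this check.
assert (urn, "dataPlatformInstance") in emitted_aspects(mock_emitter)
```

With a check like this, adding a new aspect to the emitted sequence would not shift every later assertion by one.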

dataProduct={getDataProduct(genericProperties?.dataProduct)}
externalUrl={data.properties?.externalUrl}
parentContainers={data.parentContainers}
- parentEntities={parentEntities}
+ parentEntities={parentEntities as unknown as PreviewEntity[]}
Collaborator Author

@asikowitz this was my attempt to fix the yarn type-check -- let me know if this makes sense

Collaborator

This should be fixed once you change the type. You can add:

```
import { Entity as GraphQLEntity } from '@types';
```
and use that

@yoonhyejin yoonhyejin marked this pull request as ready for review January 24, 2025 23:33
@yoonhyejin yoonhyejin changed the title from "[WIP] feat: update mlflow UI" to "feat: update ml system related UI" Jan 24, 2025
@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Jan 24, 2025
Collaborator

@asikowitz asikowitz left a comment

Hi Hyejin, great work on getting this all working! This is a big PR so I left a lot of comments; it'd probably be good to break this up into a few PRs, e.g. ingestion, backend, and frontend, to be able to go through a smaller set of comments. Most comments are just about style, because we have a lot of ways of doing things, and some are more outdated / no longer recommended. Happy to talk through anything if you have any questions, and feel free to disagree with any of the comments if you have a different opinion.

@@ -380,6 +419,39 @@ export default function DefaultPreviewCard({
</LeftColumn>
{shouldShowRightColumn && (
<RightColumn key="right-column">
{startTime && (
Collaborator

If this new code is only for data process, maybe we can put it in a separate function and describe it as such

@@ -206,3 +206,39 @@ export function getTimeRangeDescription(startDate: moment.Moment | null, endDate

return 'Unknown time range';
}

export function formatDuration(durationMs: number): string {
Collaborator

It won't exactly line up, but you could also use moment.js for this: https://momentjs.com/docs/#/durations/humanize/

const hours = Math.floor(durationMs / 1000 / 3600);

if (hours === 0 && minutes === 0) {
return `${seconds}secs`;
Collaborator

Space in between?

@@ -88,7 +88,8 @@ private SearchUtils() {}
EntityType.DATA_PRODUCT,
EntityType.NOTEBOOK,
EntityType.BUSINESS_ATTRIBUTE,
- EntityType.SCHEMA_FIELD);
+ EntityType.SCHEMA_FIELD,
+ EntityType.DATA_PROCESS_INSTANCE);
Collaborator

Do we want these to appear in search?

Comment on lines +150 to +152
com.linkedin.datahub.graphql.generated.AuditStamp created =
AuditStampMapper.map(context, dataProcessInstanceProperties.getCreated());
properties.setCreated(created);
Collaborator

Can revert

@@ -71,6 +77,10 @@ class DataProcessInstance:
_template_object: Optional[Union[DataJob, DataFlow]] = field(
init=False, default=None, repr=False
)
data_platform: Optional[str] = None
Collaborator

Generally we call this just platform. Also, it's perhaps confusing that we allow specifying both orchestrator and data_platform. I think we should only allow one, or we should have validation that only lets you set one.
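
The validation option could be sketched roughly as follows, assuming a dataclass with a `__post_init__` hook. `InstanceSource` is a hypothetical stand-in, not the real `DataProcessInstance`, and the field is named `platform` here only to follow the naming suggestion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceSource:
    """Hypothetical stand-in used only to illustrate the validation."""

    orchestrator: Optional[str] = None
    platform: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject ambiguous input: the two fields are treated as mutually exclusive.
        if self.orchestrator is not None and self.platform is not None:
            raise ValueError(
                "Set only one of 'orchestrator' or 'platform', not both."
            )


# Setting a single field is accepted.
source = InstanceSource(platform="mlflow")
```

Failing in `__post_init__` surfaces the conflict at construction time instead of when the URN is built later.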

Comment on lines +105 to +106
if self.data_plaform_instance is None and self.cluster is not None:
self.data_plaform_instance = self.cluster
Collaborator

I don't think cluster and data_platform_instance are the same concept

Comment on lines +407 to +408
:param clone_inlets: (bool) whether to clone datajob's inlets
:param clone_outlets: (bool) whether to clone datajob's outlets
Collaborator

Do you want these params in this method? Feels like the data process instance doesn't have much meaning without these

@datahub-cyborg datahub-cyborg bot added pending-submitter-response Issue/request has been reviewed but requires a response from the submitter and removed needs-review Label for PRs that need review from a maintainer. labels Jan 26, 2025
Labels
pending-submitter-response Issue/request has been reviewed but requires a response from the submitter product PR or Issue related to the DataHub UI/UX
6 participants