acryl-datahub[mongodb] > Fail processing collection with Double fields #9081

Closed
jGuzmanSan opened this issue Oct 24, 2023 · 1 comment · Fixed by #9145
Labels
bug Bug report

Comments

jGuzmanSan commented Oct 24, 2023

Describe the bug
When emitting metadata from MongoDB to DataHub, ingestion fails while processing a collection that contains a Double field.

The ingestion stops abruptly and reports the following error:

[2023-08-01 14:26:42,034] ERROR    {datahub.entrypoints:199} - Command failed: argument of type 'float' is not iterable
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 186, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 195, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 179, in run_ingestion_and_check_upgrade
    ret = await ingestion_future
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 137, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 129, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 367, in run
    for wu in itertools.islice(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 143, in auto_workunit_reporter
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 208, in auto_browse_path_v2
    for urn, batch in _batch_workunits_by_urn(stream):
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 346, in _batch_workunits_by_urn
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 156, in auto_materialize_referenced_tags
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
    for wu in stream:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/mongodb.py", line 338, in get_workunits_internal
    collection_schema = construct_schema_pymongo(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/mongodb.py", line 196, in construct_schema_pymongo
    return construct_schema(list(documents), delimiter)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 162, in construct_schema
    "nullable": is_nullable_collection(collection, field_path),
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 80, in is_nullable_collection
    return any(is_field_nullable(doc, field_path) for doc in collection)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 80, in <genexpr>
    return any(is_field_nullable(doc, field_path) for doc in collection)
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 57, in is_field_nullable
    return any(is_field_nullable(x, remaining_fields) for x in doc[field])
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 57, in <genexpr>
    return any(is_field_nullable(x, remaining_fields) for x in doc[field])
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 37, in is_field_nullable
    if field in doc:
TypeError: argument of type 'float' is not iterable
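
Reading the last frames of the traceback, the recursion reaches `if field in doc:` with `doc` being a float, apparently while iterating over a list value that still has remaining path segments to resolve. The snippet below is a minimal sketch of that failure mode and of one possible guard. It is a simplified stand-in written for this issue, not the DataHub source: the function names, the sample documents, and the choice to treat a scalar as "nested field absent" are all assumptions.

```python
from typing import Any, Sequence


def is_field_nullable_unguarded(doc: Any, field_path: Sequence[str]) -> bool:
    """Hypothetical, simplified nullability check that assumes doc is a dict."""
    if not field_path:
        return doc is None
    field, remaining = field_path[0], field_path[1:]
    if field in doc:  # raises TypeError when doc is a float
        value = doc[field]
        if isinstance(value, list):
            return any(is_field_nullable_unguarded(x, remaining) for x in value)
        return is_field_nullable_unguarded(value, remaining)
    return True  # field missing from this document


def is_field_nullable_guarded(doc: Any, field_path: Sequence[str]) -> bool:
    """Same check, but a scalar where a sub-document was expected counts as absent."""
    if not field_path:
        return doc is None
    if not isinstance(doc, dict):
        # Assumption: a float (or any non-dict) cannot contain the nested field.
        return True
    field, remaining = field_path[0], field_path[1:]
    if field not in doc:
        return True
    value = doc[field]
    if isinstance(value, list):
        return any(is_field_nullable_guarded(x, remaining) for x in value)
    return is_field_nullable_guarded(value, remaining)


if __name__ == "__main__":
    # Assumed document shapes: "a" holds sub-documents in one doc, floats in another.
    docs = [{"a": [{"b": 1}]}, {"a": [1.5]}]
    try:
        any(is_field_nullable_unguarded(d, ["a", "b"]) for d in docs)
    except TypeError as exc:
        print("unguarded:", exc)
    print("guarded:", any(is_field_nullable_guarded(d, ["a", "b"]) for d in docs))
```

Whether the affected collection actually has this mixed shape cannot be confirmed from the traceback alone, since the screenshot of the failing value is not available here.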

To Reproduce
Ingest the content of a collection with double types.

Expected behavior
The collection content is emitted to DataHub, inferring its schema.

Screenshots
Value in the collection that fails:
image

Desktop (please complete the following information):

  • OS: Mac/Linux
  • Browser: N/A
  • Version: acryl-datahub 0.11.0.5 (Python 3.10.10)

Additional context

The mapping for Double does not appear to be covered in https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/mongodb.py#L115
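
For context, PyMongo decodes BSON Double values into Python `float`, so any lookup keyed on Python types needs a `float` entry to cover them. A minimal sketch of such a lookup with a fallback for unmapped types follows. The names `TYPE_NAMES` and `type_name` and the label strings are hypothetical and are not taken from `mongodb.py`.

```python
from typing import Any, Dict

# Hypothetical mapping from Python value types to schema type labels.
TYPE_NAMES: Dict[type, str] = {
    type(None): "null",
    bool: "boolean",
    int: "integer",
    float: "double",  # BSON Double arrives from PyMongo as a Python float
    str: "string",
    list: "array",
    dict: "object",
}


def type_name(value: Any) -> str:
    """Return the label for a value's exact type, or "unknown" if unmapped."""
    return TYPE_NAMES.get(type(value), "unknown")


if __name__ == "__main__":
    for sample in (1.5, 3, True, None, b"raw"):
        print(repr(sample), "->", type_name(sample))
```

Looking up by exact type keeps `bool` from being reported as `integer`, and the fallback means an unmapped type is reported rather than raising.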

Collaborator

hsheth2 commented Oct 30, 2023

@jGuzmanSan should be fixed by #9145
