Describe the bug
When emitting metadata from MongoDB to DataHub, ingestion fails while processing a collection that contains a Double field.
The ingestion stops abruptly and reports the following error:
[2023-08-01 14:26:42,034] ERROR {datahub.entrypoints:199} - Command failed: argument of type 'float' is not iterable
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 186, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 448, in wrapper
raise e
File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 397, in wrapper
res = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
return func(ctx, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 195, in run
ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 179, in run_ingestion_and_check_upgrade
ret = await ingestion_future
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 137, in run_pipeline_to_completion
raise e
File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 129, in run_pipeline_to_completion
pipeline.run()
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 367, in run
for wu in itertools.islice(
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 143, in auto_workunit_reporter
for wu in stream:
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 208, in auto_browse_path_v2
for urn, batch in _batch_workunits_by_urn(stream):
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 346, in _batch_workunits_by_urn
for wu in stream:
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 156, in auto_materialize_referenced_tags
for wu in stream:
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/api/source_helpers.py", line 70, in auto_status_aspect
for wu in stream:
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/mongodb.py", line 338, in get_workunits_internal
collection_schema = construct_schema_pymongo(
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/mongodb.py", line 196, in construct_schema_pymongo
return construct_schema(list(documents), delimiter)
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 162, in construct_schema
"nullable": is_nullable_collection(collection, field_path),
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 80, in is_nullable_collection
return any(is_field_nullable(doc, field_path) for doc in collection)
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 80, in <genexpr>
return any(is_field_nullable(doc, field_path) for doc in collection)
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 57, in is_field_nullable
return any(is_field_nullable(x, remaining_fields) for x in doc[field])
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 57, in <genexpr>
return any(is_field_nullable(x, remaining_fields) for x in doc[field])
File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/source/schema_inference/object.py", line 37, in is_field_nullable
if field in doc:
TypeError: argument of type 'float' is not iterable
To Reproduce
Ingest the content of a collection with double types.
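The failure can be sketched without a MongoDB instance. The snippet below is a simplified stand-in for the recursive nullability check that appears in the traceback, not the DataHub source: the function body, the sample documents, and the field names `readings` and `value` are assumptions for illustration. It shows how a list of doubles under a nested field path can reach the `field in doc` test with a float.

```python
from typing import Any, Tuple


def is_field_nullable(doc: Any, field_path: Tuple[str, ...]) -> bool:
    """Simplified stand-in for the check in the traceback (hypothetical body)."""
    if not field_path:
        return True
    field = field_path[0]
    # Raises TypeError when doc is a float: membership needs a container.
    if field in doc:
        value = doc[field]
        if value is None:
            return True
        if len(field_path) == 1:
            return False
        remaining_fields = field_path[1:]
        if isinstance(value, dict):
            return is_field_nullable(value, remaining_fields)
        if isinstance(value, list):
            return any(is_field_nullable(x, remaining_fields) for x in value)
    return True


# Hypothetical documents: one stores sub-documents, the other plain doubles.
docs = [
    {"readings": [{"value": 1.5}]},
    {"readings": [20.5, 21.0]},
]


def reproduce() -> str:
    """Return the name of the exception raised while checking the documents."""
    try:
        for doc in docs:
            is_field_nullable(doc, ("readings", "value"))
    except TypeError as exc:
        return type(exc).__name__
    return "no error"


print(reproduce())
```

In this sketch the first document passes, and the second one fails as soon as the recursion descends into a list element that is a plain double.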
Expected behavior
The collection content is emitted to DataHub, inferring its schema.
Screenshots
Value in the collection that fails:
Additional context
The mapping for Double is not covered in https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/mongodb.py#L115
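One possible defensive change, sketched under the assumption that the failing check has the shape suggested by the traceback: treat any non-dict value as lacking the nested field instead of testing membership on it. The function below, including its name `is_field_nullable_guarded`, is a hypothetical illustration and not the project's fix. Whether a scalar inside a list should count as nullable for the nested path is a design choice for the maintainers.

```python
from typing import Any, Tuple


def is_field_nullable_guarded(doc: Any, field_path: Tuple[str, ...]) -> bool:
    """Hypothetical guarded variant of the nullability check."""
    if not field_path:
        return True
    # Assumption: a scalar (float, int, str, ...) cannot hold a nested field,
    # so the remaining path is treated as missing, i.e. nullable.
    if not isinstance(doc, dict):
        return True
    field = field_path[0]
    if field not in doc:
        return True
    value = doc[field]
    if value is None:
        return True
    if len(field_path) == 1:
        return False
    remaining_fields = field_path[1:]
    if isinstance(value, list):
        return any(is_field_nullable_guarded(x, remaining_fields) for x in value)
    # Dicts recurse normally; scalars hit the guard at the top of the call.
    return is_field_nullable_guarded(value, remaining_fields)


# Hypothetical sample documents with a list of doubles and a list of sub-documents.
doubles_doc = {"readings": [20.5, 21.0]}
nested_doc = {"readings": [{"value": 1.5}]}

print(is_field_nullable_guarded(doubles_doc, ("readings", "value")))
print(is_field_nullable_guarded(nested_doc, ("readings", "value")))
```

With this guard the list of doubles no longer raises; the nested path is simply reported as nullable for that document.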