"The value of property is longer than 1500 bytes" error on BigQuery REPEATED STRING materialization #1633

Closed
adriangay opened this issue Jun 9, 2021 · 3 comments · Fixed by #2181
Labels
wontfix This will not be worked on

@adriangay

Expected Behavior

Materializing REPEATED features from a BigQuery table into the Feast online store on GCP should complete without this error.

Current Behavior

One column of the BigQuery table is a REPEATED STRING. The number of values in this column varies from a few to many hundreds of string IDs. It appears that the total size of the repeated strings causes the issue. Materialization fails after about 97% ingestion:

97%|█████████████████████████████████████████████████████▍ | 49606/51056 [00:30<00:00, 1604.20it/s]

The actual stack trace is:

Traceback (most recent call last):
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 67, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INVALID_ARGUMENT
        details = "The value of property "content_30d" is longer than 1500 bytes."
        debug_error_string = "{"created":"@1623177301.342896000","description":"Error received from peer ipv6:[2a00:1450:4009:81e::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1067,"grpc_message":"The value of property "content_30d" is longer than 1500 bytes.","grpc_status":3}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/bin/feast", line 8, in <module>
    sys.exit(cli())
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/cli.py", line 243, in materialize_command
    store.materialize(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/telemetry.py", line 151, in exception_logging_wrapper
    result = func(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/feature_store.py", line 444, in materialize
    provider.materialize_single_feature_view(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 192, in materialize_single_feature_view
    self.online_write_batch(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 121, in online_write_batch
    pool.map(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 122, in <lambda>
    lambda b: _write_minibatch(client, project, table, b, progress),
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/feast/infra/gcp.py", line 270, in _write_minibatch
    client.put_multi(entities)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 328, in __exit__
    self.commit()
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/transaction.py", line 304, in commit
    super(Transaction, self).commit(**kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 300, in commit
    self._commit(retry=retry, timeout=timeout)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore/batch.py", line 257, in _commit
    commit_response_pb = self._client._datastore_api.commit(
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/cloud/datastore_v1/services/datastore/client.py", line 627, in commit
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/gaya/Library/Caches/pypoetry/virtualenvs/explore-feast-feature-store-ECdpd3VL-py3.8/lib/python3.8/site-packages/google/api_core/grpc_helpers.py", line 69, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 The value of property "content_30d" is longer than 1500 bytes.

The Feature View is:

propensity_model_data_view = FeatureView(
    name="propensity_model_data_stats",
    entities=["customer_id"],
    ttl=Duration(seconds=86400 * 10),
    features=[
        Feature(name="avg_duration_30d", dtype=ValueType.FLOAT),
        Feature(name="content_30d", dtype=ValueType.STRING),
        Feature(name="common_genre", dtype=ValueType.STRING),
        Feature(name="tenure", dtype=ValueType.FLOAT)
    ],
    online=True,
    input=propensity_model_data,
    tags={},
)

The BQ schema is:

Field name          Type        Mode
tenure              INTEGER
target              INTEGER
sub_timestamp       TIMESTAMP
customer_id         STRING
created_timestamp   TIMESTAMP
avg_duration_30d    FLOAT
common_genre        STRING
content_30d         STRING      REPEATED

Steps to reproduce

Materialise a column of BigQuery REPEATED STRING with total byte count > 1500

After reducing the total size, materialization runs to completion.
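
To find the rows that would trigger the error before materializing, a rough check along these lines may help. It is only a sketch: `find_oversized_rows` and the sample rows are hypothetical, and it measures the raw UTF-8 size of the repeated strings, which only approximates what Feast actually writes to Datastore (the serialized value is likely to be somewhat larger).

```python
from typing import Dict, Iterable, List, Tuple

# The limit quoted in the error message above.
DATASTORE_LIMIT_BYTES = 1500


def repeated_string_bytes(values: Iterable[str]) -> int:
    """Total UTF-8 size of all strings in one REPEATED STRING cell."""
    return sum(len(v.encode("utf-8")) for v in values)


def find_oversized_rows(
    rows: Iterable[Dict[str, object]],
    key_column: str,
    list_column: str,
    limit: int = DATASTORE_LIMIT_BYTES,
) -> List[Tuple[str, int]]:
    """Return (key, size) for every row whose list column exceeds the limit."""
    oversized = []
    for row in rows:
        size = repeated_string_bytes(row[list_column])
        if size > limit:
            oversized.append((str(row[key_column]), size))
    return oversized


# Hypothetical sample rows mirroring the schema in this issue.
sample_rows = [
    {"customer_id": "c1", "content_30d": ["id-0001", "id-0002"]},
    {"customer_id": "c2", "content_30d": [f"id-{i:04d}" for i in range(300)]},
]

oversized = find_oversized_rows(sample_rows, "customer_id", "content_30d")
print(oversized)
```

Rows reported here are the ones to trim or drop when checking whether a reduced dataset materializes cleanly.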

Specifications

  • Version: 0.10.6
  • Platform: GCP
  • Subsystem: Firestore

Possible Solution

I raised this on Slack originally and Willem Pienaar thinks he knows the cause of the problem. The Slack thread is here:
https://tectonfeast.slack.com/archives/C01MSKCMB37/p1623155984116000

@woop
Member

woop commented Jul 5, 2021

Thanks for raising this issue @adriangay. It's slipped under my radar up until now. It seems specific to Firestore/Datastore.

One solution would be an optional flag that we can enable for Datastore which compresses values for storage.

Do you think that would be effective for your data? You can try it out over here http://www.txtwizard.net/compression
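
The same question can also be checked locally with Python's standard `zlib` module. This is a sketch under assumptions: `compression_report` is a hypothetical helper, the sample IDs are made up, and sequential IDs like these compress far better than real content IDs are likely to, so the numbers should be read with real data substituted in.

```python
import zlib
from typing import Dict, List

LIMIT_BYTES = 1500  # limit quoted in the error message


def compression_report(values: List[str], level: int = 6) -> Dict[str, object]:
    """Compare the raw and zlib-compressed size of one repeated-string cell."""
    raw = "\n".join(values).encode("utf-8")
    packed = zlib.compress(raw, level)
    return {
        "raw_bytes": len(raw),
        "compressed_bytes": len(packed),
        "fits_limit": len(packed) <= LIMIT_BYTES,
    }


# Hypothetical stand-in for one customer's content_30d values.
content_ids = [f"content-{i:06d}" for i in range(400)]
report = compression_report(content_ids)
print(report)
```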

@adriangay
Author

@woop apologies for missing your reply. If you mean compressing the values for storage in Datastore and decompressing them transparently on the way out at the Feast API, then I guess that's OK. It should not increase latency much [for online features].
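
A transparent compress-on-write, decompress-on-read step of the kind discussed here could look roughly like the following. It is a minimal sketch and not Feast's implementation: `encode_value`, `decode_value`, the one-byte markers and the threshold are all assumptions made for illustration.

```python
import zlib

# One-byte markers so the reader knows how a stored value was written.
# These markers are an assumption of this sketch, not part of Feast.
RAW = b"\x00"
ZLIB = b"\x01"


def encode_value(payload: bytes, threshold: int = 1024) -> bytes:
    """Compress payloads above the threshold; store small ones unchanged."""
    if len(payload) > threshold:
        packed = zlib.compress(payload)
        # Only keep the compressed form if it is actually smaller.
        if len(packed) < len(payload):
            return ZLIB + packed
    return RAW + payload


def decode_value(stored: bytes) -> bytes:
    """Reverse encode_value, based on the leading marker byte."""
    marker, body = stored[:1], stored[1:]
    if marker == ZLIB:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError(f"unknown marker {marker!r}")


small = b"drama"
large = ",".join(f"content-{i:06d}" for i in range(400)).encode("utf-8")

assert decode_value(encode_value(small)) == small
assert decode_value(encode_value(large)) == large
```

Small values skip compression entirely, so the extra read latency would apply only to the large values that need it.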

@stale

stale bot commented Dec 25, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Dec 25, 2021
@stale stale bot closed this as completed Jan 1, 2022