I have a column with a specific datetime format: '%Y%m%d%H%M%S%f'.
So a value such as 20220902110443000000 should be parsed as Sep 2, 2022, 11:04:43.000000.
Whenever I try to pass in data in this format, I get an OverflowError, even if I specify the datetime format in the metadata.
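As a quick sanity check that the format string itself matches these values, here is a minimal sketch using only the Python standard library. It does not involve the SDV or pandas; the names `FMT`, `raw`, and `parsed` are just for illustration.

```python
from datetime import datetime

# The format from the issue: year, month, day, hour, minute, second, microseconds
FMT = '%Y%m%d%H%M%S%f'

# One of the sample values from the issue (20 digits, no separators)
raw = '20220902110443000000'

# Parse the string with the explicit format
parsed = datetime.strptime(raw, FMT)
print(parsed.isoformat())

# Formatting the datetime with the same format string reproduces the input
round_tripped = parsed.strftime(FMT)
print(round_tripped == raw)
```

So the value and the format are consistent with each other when the format is applied explicitly.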
Steps to reproduce
import pandas as pd
from sdv.tabular import GaussianCopula

# create some fake data with this format
data = pd.DataFrame(data={
    'my_column': ['20220902110443000000', '20220916230356000000', '20220826173917000000'],
})

# write metadata and specify the datetime format
metadata = {
    'fields': {
        'my_column': {
            'type': 'datetime',
            'format': '%Y%m%d%H%M%S%f'
        }
    }
}

# try to run it through the SDV
model = GaussianCopula(table_metadata=metadata)
model.fit(data)
Stack Trace
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
25 frames
OverflowError: Python int too large to convert to C long
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
TypeError: invalid string coercion to datetime
During handling of the above exception, another exception occurred:
OverflowError Traceback (most recent call last)
/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py in _build_naive(self, res, default)
1233 repl['day'] = monthrange(cyear, cmonth)[1]
1234
-> 1235 naive = default.replace(**repl)
1236
1237 if res.weekday is not None and not res.day:
OverflowError: Python int too large to convert to C long
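One possible reading of this trace (an assumption on my part, not something I have confirmed in the pandas or dateutil source) is that the string is being parsed without the format, so the digits end up treated as a single number. A small sketch of the arithmetic, with illustrative names `INT64_MAX`, `as_int`, and `exceeds_int64`:

```python
# Largest value that fits in a signed 64-bit integer (a C long on most 64-bit Linux builds)
INT64_MAX = 2**63 - 1

# The sample value from the issue, read as a plain integer instead of a datetime
as_int = int('20220902110443000000')

# The 20-digit value does not fit in 64 bits
exceeds_int64 = as_int > INT64_MAX
print(as_int, INT64_MAX, exceeds_int64)
```

If that is what is happening, it would be consistent with the "Python int too large to convert to C long" message, and it would suggest the format from the metadata is not being used at this point.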
Context
What's interesting is that RDT seems to be able to handle this type of data well. So maybe something is misconfigured in the SDV?
from rdt import HyperTransformer
from rdt.transformers.datetime import UnixTimestampEncoder

ht = HyperTransformer()
ht.set_config(config={
    'sdtypes': {'my_column': 'datetime'},
    'transformers': {'my_column': UnixTimestampEncoder(datetime_format='%Y%m%d%H%M%S%f')}
})

# this works without crashing!
transformed = ht.fit_transform(data)
reversed = ht.reverse_transform(transformed)
npatki changed the title from "Datetime format is causing OverflowError: Python int too large to convert to C long" to "Some datetime formats cause InvalidDataError, even if the datetime matches the format" on Mar 29, 2023