fix for nonunique index checking JSON serialization #35

Merged (2 commits) on Sep 2, 2022

@shouples shouples commented Sep 2, 2022

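Adds tests and fixes an issue where checking whether a Series is JSON-serializable raises a ValueError when its index values aren't unique. The check calls s.to_json(), which uses orient='index' by default for a Series, and the underlying pandas JSON writer rejects a non-unique index for that orient. is_json_serializable() only caught TypeError, OverflowError, and UnicodeDecodeError, so the ValueError propagated up through the display formatter, as shown in the traceback below.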

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 5>()
      2 british_youtube = pd.read_csv("GBvideos.csv")
      4 df = pd.concat([canadian_youtube, british_youtube])
----> 5 df

File /opt/conda/lib/python3.9/site-packages/IPython/core/displayhook.py:262, in DisplayHook.__call__(self, result)
    260 self.start_displayhook()
    261 self.write_output_prompt()
--> 262 format_dict, md_dict = self.compute_format_data(result)
    263 self.update_user_ns(result)
    264 self.fill_exec_result(result)

File /opt/conda/lib/python3.9/site-packages/IPython/core/displayhook.py:151, in DisplayHook.compute_format_data(self, result)
    121 def compute_format_data(self, result):
    122     """Compute format data of the object to be displayed.
    123 
    124     The format data is a generalization of the :func:`repr` of an object.
   (...)
    149 
    150     """
--> 151     return self.shell.display_formatter.format(result)

File /opt/conda/lib/python3.9/site-packages/dx/formatters/dataresource.py:127, in DXDataResourceDisplayFormatter.format(self, obj, **kwargs)
    124 def format(self, obj, **kwargs):
    126     if isinstance(obj, tuple(settings.RENDERABLE_OBJECTS)):
--> 127         handle_dataresource_format(obj)
    128         return ({}, {})
    130     return DEFAULT_IPYTHON_DISPLAY_FORMATTER.format(obj, **kwargs)

File /opt/conda/lib/python3.9/site-packages/dx/formatters/dataresource.py:80, in handle_dataresource_format(obj, ipython_shell)
     78 orig_obj = obj.copy()
     79 orig_dtypes = orig_obj.dtypes.to_dict()
---> 80 obj = normalize_index_and_columns(obj)
     81 obj_hash = generate_df_hash(obj)
     82 update_existing_display = obj_hash in SUBSET_TO_DATAFRAME_HASH

File /opt/conda/lib/python3.9/site-packages/dx/utils/formatting.py:55, in normalize_index_and_columns(df)
     52 display_df = df.copy()
     54 display_df = normalize_index(display_df)
---> 55 display_df = normalize_columns(display_df)
     57 # build_table_schema() doesn't like pd.NAs
     58 display_df.fillna(np.nan, inplace=True)

File /opt/conda/lib/python3.9/site-packages/dx/utils/formatting.py:112, in normalize_columns(df)
    110 logger.debug("-- cleaning before display --")
    111 for column in df.columns:
--> 112     df[column] = clean_column_values_for_display(df[column])
    114 return df

File /opt/conda/lib/python3.9/site-packages/dx/utils/formatting.py:142, in clean_column_values_for_display(s)
    139 s = datatypes.handle_complex_number_series(s)
    141 s = geometry.handle_geometry_series(s)
--> 142 s = datatypes.handle_unk_type_series(s)
    143 return s

File /opt/conda/lib/python3.9/site-packages/dx/utils/datatypes.py:194, in handle_unk_type_series(s)
    193 def handle_unk_type_series(s: pd.Series) -> pd.Series:
--> 194     if not is_json_serializable(s):
    195         logger.debug(f"series `{s.name}` has non-JSON-serializable types; converting to string")
    196         s = s.astype(str)

File /opt/conda/lib/python3.9/site-packages/dx/utils/datatypes.py:205, in is_json_serializable(s)
    201 """
    202 Returns True if the object can be serialized to JSON.
    203 """
    204 try:
--> 205     s.to_json()
    206     return True
    207 except (TypeError, OverflowError, UnicodeDecodeError):

File /opt/conda/lib/python3.9/site-packages/pandas/core/generic.py:2621, in NDFrame.to_json(self, path_or_buf, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent, storage_options)
   2618 config.is_nonnegative_int(indent)
   2619 indent = indent or 0
-> 2621 return json.to_json(
   2622     path_or_buf=path_or_buf,
   2623     obj=self,
   2624     orient=orient,
   2625     date_format=date_format,
   2626     double_precision=double_precision,
   2627     force_ascii=force_ascii,
   2628     date_unit=date_unit,
   2629     default_handler=default_handler,
   2630     lines=lines,
   2631     compression=compression,
   2632     index=index,
   2633     indent=indent,
   2634     storage_options=storage_options,
   2635 )

File /opt/conda/lib/python3.9/site-packages/pandas/io/json/_json.py:110, in to_json(path_or_buf, obj, orient, date_format, double_precision, force_ascii, date_unit, default_handler, lines, compression, index, indent, storage_options)
    107 else:
    108     raise NotImplementedError("'obj' should be a Series or a DataFrame")
--> 110 s = writer(
    111     obj,
    112     orient=orient,
    113     date_format=date_format,
    114     double_precision=double_precision,
    115     ensure_ascii=force_ascii,
    116     date_unit=date_unit,
    117     default_handler=default_handler,
    118     index=index,
    119     indent=indent,
    120 ).write()
    122 if lines:
    123     s = convert_to_line_delimits(s)

File /opt/conda/lib/python3.9/site-packages/pandas/io/json/_json.py:165, in Writer.__init__(self, obj, orient, date_format, double_precision, ensure_ascii, date_unit, index, default_handler, indent)
    162 self.indent = indent
    164 self.is_copy = None
--> 165 self._format_axes()

File /opt/conda/lib/python3.9/site-packages/pandas/io/json/_json.py:202, in SeriesWriter._format_axes(self)
    200 def _format_axes(self):
    201     if not self.obj.index.is_unique and self.orient == "index":
--> 202         raise ValueError(f"Series index must be unique for orient='{self.orient}'")

ValueError: Series index must be unique for orient='index'
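
For reference, the failure can be reproduced without the CSV files, since `pd.concat` without `ignore_index=True` keeps the repeated index labels of its inputs. The snippet below is a minimal sketch and not necessarily the change made in this PR: it assumes one possible fix, which is to drop the index before calling `to_json()` and to treat `ValueError` as "not serializable". The small `a` and `b` frames and the `views` column are made up for illustration.

```python
import pandas as pd


def is_json_serializable(s: pd.Series) -> bool:
    """Return True if the series values can be serialized to JSON.

    Hypothetical variant of dx's helper; the actual fix in this PR may differ.
    """
    try:
        # Series.to_json() defaults to orient="index", which requires unique
        # index labels. Only the values matter here, so drop the index first.
        s.reset_index(drop=True).to_json()
        return True
    except (TypeError, OverflowError, UnicodeDecodeError, ValueError):
        return False


# Stand-ins for the two CSVs in the traceback (illustrative data only).
a = pd.DataFrame({"views": [1, 2]})
b = pd.DataFrame({"views": [3, 4]})
df = pd.concat([a, b])  # index labels repeat: 0, 1, 0, 1

# Calling to_json() directly on the concatenated column is what failed in the
# traceback above; record whether the installed pandas raises here as well.
try:
    df["views"].to_json()
    raised = False
except ValueError:
    raised = True

# The helper no longer depends on the index being unique.
ok = is_json_serializable(df["views"])
```

Resetting the index only inside the check leaves the caller's series untouched, so the displayed dataframe keeps its original index.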

@shouples shouples merged commit 84a04f0 into main Sep 2, 2022
@shouples shouples deleted the djs/test-indexing branch September 2, 2022 20:55