-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Functionality to export any dashboards-as-code into CSV #269
Conversation
class ExportDashboard(DashboardMetadata): | ||
def __init__(self, sql_backend, config): | ||
self._sql_backend = sql_backend | ||
self._config = config | ||
super().__init__(display_name="") | ||
|
||
def export_dashboard( | ||
self, | ||
queries_path: Path, | ||
export_path: Path, | ||
catalog: str = "hive_metastore", | ||
database: None = None, | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of adding the class, create def export_to_zipped_csv(self, sql_backend: SqlBackend) -> Path
on DashboardMetadata
class. catalog/database replacement is already handled in that class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll refactor to follow your feedback thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes Applied
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a helper function to zip the CSV to avoid increasing the scope of the function.
file_name = f"{export_path}/{tile.metadata.id}.csv" | ||
with open(file_name, mode="w", newline="", encoding="utf-8") as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file_name = f"{export_path}/{tile.metadata.id}.csv" | |
with open(file_name, mode="w", newline="", encoding="utf-8") as file: | |
tile_export = export_path / f"{tile.metadata.id}.csv" | |
with tile_export.open(mode="w", newline="", encoding="utf-8") as hadle: |
use pathlib.Path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll refactor to follow your feedback thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes applied.
for tile in dashboard.tiles: | ||
if not tile.metadata.is_query(): | ||
continue | ||
tile_export = export_path / f"{tile.metadata.id}.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to create temporary CSV files if you can write them directly to the open zip?https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open
with ZipFile(target_folder / 'ucx-export.zip', mode='w') as z:
...
with z.open(f'{tile.id}.csv') as f:
writer = csv.DictWriter(f, fieldnames=headers)
writer.writeheader()
for row in query_results:
...
this way you don't have to cleanup a file. See the feedback in https://github.com/databrickslabs/ucx/pull/2553/files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored to Use ZipFile and CVS to avoid temp files or cleanup.
def export_to_zipped_csv( | ||
self, sql_backend: SqlBackend, queries_path: Path, export_path: Path, catalog: str, database: str | ||
) -> Path: | ||
"""Export the dashboard queries to CSV files in a ZIP archive.""" | ||
dashboard = self.from_path(queries_path) | ||
dashboard = dashboard.replace_database(catalog=catalog, database=database) | ||
for tile in dashboard.tiles: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def export_to_zipped_csv( | |
self, sql_backend: SqlBackend, queries_path: Path, export_path: Path, catalog: str, database: str | |
) -> Path: | |
"""Export the dashboard queries to CSV files in a ZIP archive.""" | |
dashboard = self.from_path(queries_path) | |
dashboard = dashboard.replace_database(catalog=catalog, database=database) | |
for tile in dashboard.tiles: | |
def export_to_zipped_csv(self, sql_backend: SqlBackend, export_path: Path) -> Path: | |
"""Export the dashboard queries to CSV files in a ZIP archive.""" | |
for tile in self.tiles: |
why do you need to call from_path
if you already have a dashboard metadata instance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function is using the Dashboard Metadata instance to avoid additional parameters.
except FileNotFoundError: | ||
logger.warning(f"File {file_path} not found.") | ||
except PermissionError: | ||
logger.warning(f"Permission denied for {file_path} or {zip_path}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except FileNotFoundError: | |
logger.warning(f"File {file_path} not found.") | |
except PermissionError: | |
logger.warning(f"Permission denied for {file_path} or {zip_path}.") |
don't swallow errors here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* Added Functionality to export any dashboards-as-code into CSV ([#269](#269)). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to a CSV file. * Added support for generic types in `SqlBackend` ([#272](#272)). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` of the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.
* Added Functionality to export any dashboards-as-code into CSV ([#269](#269)). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to a CSV file. * Added support for generic types in `SqlBackend` ([#272](#272)). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` of the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgarciaf106 : Could you open another PR to address my comments
@@ -893,6 +897,35 @@ def _from_dashboard_folder(cls, folder: Path) -> "DashboardMetadata": | |||
tiles.append(tile) | |||
return cls(display_name=folder.name, _tiles=tiles) | |||
|
|||
def export_to_zipped_csv(self, sql_backend: SqlBackend, export_path: Path) -> Path: | |||
"""Export the dashboard queries to CSV files directly into a ZIP archive.""" | |||
zip_export = export_path / "export_to_zipped_csv.zip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgarciaf106 : Please except the function argument for the zip_export
instead of adding the hardcoded "export_to_zipped_csv.zip"
|
||
with ZipFile(zip_export, mode="w") as zip_file: | ||
for tile in self.tiles: | ||
if tile.metadata.is_query(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We more commonly use a pattern with continue
if not tile.metadata.is_query():
continue
...
if tile.metadata.is_query(): | ||
rows = sql_backend.fetch(tile.content) | ||
|
||
if not rows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More precies
if len(rows) == 0
|
||
bytes_buffer = BytesIO(buffer.getvalue().encode("utf-8")) | ||
|
||
with zip_file.open(f"{tile.metadata.id}.csv", "w") as csv_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing bytes, right?
with zip_file.open(f"{tile.metadata.id}.csv", "wb") as csv_file:
writer.writeheader() | ||
writer.writerow(row.asDict()) | ||
|
||
bytes_buffer = BytesIO(buffer.getvalue().encode("utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is redundant, see comment below
bytes_buffer = BytesIO(buffer.getvalue().encode("utf-8")) | ||
|
||
with zip_file.open(f"{tile.metadata.id}.csv", "w") as csv_file: | ||
csv_file.write(bytes_buffer.getvalue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bytest buffer is redudant
csv_file.write(buffer.getvalue().encode("utf-8"))
Functionality
Tests