
Added Functionality to export any dashboards-as-code into CSV #269

Merged
merged 3 commits into databrickslabs:main
Sep 11, 2024

Conversation

jgarciaf106
Contributor

Functionality

  • Added functionality to export dashboards to CSV using DashboardMetadata.

Tests

  • manually tested
  • added unit tests

Comment on lines 1024 to 1036
class ExportDashboard(DashboardMetadata):
    def __init__(self, sql_backend, config):
        self._sql_backend = sql_backend
        self._config = config
        super().__init__(display_name="")

    def export_dashboard(
        self,
        queries_path: Path,
        export_path: Path,
        catalog: str = "hive_metastore",
        database: None = None,
    ) -> None:
Collaborator

instead of adding the class, create def export_to_zipped_csv(self, sql_backend: SqlBackend) -> Path on DashboardMetadata class. catalog/database replacement is already handled in that class.

Contributor Author

I'll refactor to follow your feedback thanks.

Contributor Author

Changes Applied

Contributor Author

Added a helper function to zip the CSV to avoid increasing the scope of the function.

src/databricks/labs/lsql/dashboards.py (outdated, resolved)
Comment on lines 1046 to 1047
file_name = f"{export_path}/{tile.metadata.id}.csv"
with open(file_name, mode="w", newline="", encoding="utf-8") as file:
Collaborator

Suggested change
- file_name = f"{export_path}/{tile.metadata.id}.csv"
- with open(file_name, mode="w", newline="", encoding="utf-8") as file:
+ tile_export = export_path / f"{tile.metadata.id}.csv"
+ with tile_export.open(mode="w", newline="", encoding="utf-8") as handle:

use pathlib.Path
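As a small illustration of the pathlib point (the export directory and tile id below are hypothetical, not from the PR):

```python
from pathlib import Path

export_path = Path("/tmp/exports")  # hypothetical export directory
tile_id = "counter"                 # hypothetical tile id

# The "/" operator joins path segments portably, unlike f-string concatenation
tile_export = export_path / f"{tile_id}.csv"

print(tile_export.name)    # counter.csv
print(tile_export.suffix)  # .csv
```

Besides portable separators, `Path` objects also expose `.open()`, `.exists()`, and `.unlink()` directly, which keeps the export code free of string manipulation.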

Contributor Author

I'll refactor to follow your feedback thanks.

Contributor Author

Changes applied.

for tile in dashboard.tiles:
    if not tile.metadata.is_query():
        continue
    tile_export = export_path / f"{tile.metadata.id}.csv"
Collaborator

why do you need to create temporary CSV files if you can write them directly to the open zip? https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile.open

with ZipFile(target_folder / 'ucx-export.zip', mode='w') as z:
    ...
    with z.open(f'{tile.id}.csv', mode='w') as f, io.TextIOWrapper(f, encoding='utf-8') as text:
        writer = csv.DictWriter(text, fieldnames=headers)
        writer.writeheader()
        for row in query_results:
             ...

this way you don't have to cleanup a file. See the feedback in https://github.com/databrickslabs/ucx/pull/2553/files

Contributor Author

Refactored to use ZipFile and CSV writing to avoid temp files or cleanup.
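The direct-to-zip approach discussed above can be sketched as a standalone snippet; the tile id, headers, and rows here are hypothetical stand-ins for `tile.metadata.id` and `sql_backend.fetch(tile.content)`:

```python
import csv
import io
import tempfile
from pathlib import Path
from zipfile import ZipFile

# Hypothetical stand-ins for tile metadata and query results
tile_id = "counter"
headers = ["id", "name"]
rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]

target_folder = Path(tempfile.mkdtemp())
zip_path = target_folder / "ucx-export.zip"

with ZipFile(zip_path, mode="w") as z:
    # ZipFile.open(..., "w") yields a binary stream; TextIOWrapper layers utf-8
    # text on top, so CSV rows go straight into the archive with no temp file
    with z.open(f"{tile_id}.csv", mode="w") as binary:
        with io.TextIOWrapper(binary, encoding="utf-8") as text:
            writer = csv.DictWriter(text, fieldnames=headers)
            writer.writeheader()
            writer.writerows(rows)

# Read the entry back to confirm the archive content
with ZipFile(zip_path) as z:
    content = z.read(f"{tile_id}.csv").decode("utf-8")
print(content.splitlines())  # ['id,name', '1,alice', '2,bob']
```

Because the entry only ever exists inside the archive, there is nothing on disk to clean up afterwards.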

Comment on lines 917 to 923
def export_to_zipped_csv(
    self, sql_backend: SqlBackend, queries_path: Path, export_path: Path, catalog: str, database: str
) -> Path:
    """Export the dashboard queries to CSV files in a ZIP archive."""
    dashboard = self.from_path(queries_path)
    dashboard = dashboard.replace_database(catalog=catalog, database=database)
    for tile in dashboard.tiles:
Collaborator

Suggested change
- def export_to_zipped_csv(
-     self, sql_backend: SqlBackend, queries_path: Path, export_path: Path, catalog: str, database: str
- ) -> Path:
-     """Export the dashboard queries to CSV files in a ZIP archive."""
-     dashboard = self.from_path(queries_path)
-     dashboard = dashboard.replace_database(catalog=catalog, database=database)
-     for tile in dashboard.tiles:
+ def export_to_zipped_csv(self, sql_backend: SqlBackend, export_path: Path) -> Path:
+     """Export the dashboard queries to CSV files in a ZIP archive."""
+     for tile in self.tiles:

why do you need to call from_path if you already have a dashboard metadata instance?

Contributor Author

The function now uses the DashboardMetadata instance to avoid additional parameters.

Comment on lines 908 to 911
except FileNotFoundError:
    logger.warning(f"File {file_path} not found.")
except PermissionError:
    logger.warning(f"Permission denied for {file_path} or {zip_path}.")
Collaborator

Suggested change
- except FileNotFoundError:
-     logger.warning(f"File {file_path} not found.")
- except PermissionError:
-     logger.warning(f"Permission denied for {file_path} or {zip_path}.")

don't swallow errors here.
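The point about swallowing: if an `except` block only logs, callers never learn the export failed. A minimal log-and-re-raise sketch (the wrapper name is hypothetical):

```python
import logging

logger = logging.getLogger(__name__)


def run_export(export):  # hypothetical wrapper around an export step
    try:
        export()
    except PermissionError:
        # Add context for the operator, but re-raise so the failure still surfaces
        logger.warning("permission denied during export")
        raise
```

Compared with the original `except ... logger.warning(...)` blocks, the trailing `raise` keeps the exception visible to the caller instead of silently returning a partial archive.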

@nfx (Collaborator) left a comment

lgtm

@nfx merged commit 5dfaaa0 into databrickslabs:main Sep 11, 2024
7 of 8 checks passed
nfx added a commit that referenced this pull request Sep 11, 2024
* Added Functionality to export any dashboards-as-code into CSV ([#269](#269)). The `DashboardMetadata` class now includes a new method, `export_to_zipped_csv`, which enables exporting any dashboard as CSV files in a ZIP archive. This method accepts `sql_backend` and `export_path` as parameters and exports dashboard queries to CSV files in the specified ZIP archive by iterating through tiles and fetching dashboard queries if the tile is a query. To ensure the proper functioning of this feature, unit tests and manual testing have been conducted. A new test, `test_dashboards_export_to_zipped_csv`, has been added to verify the correct export of dashboard data to a CSV file.
* Added support for generic types in `SqlBackend` ([#272](#272)). In this release, we've added support for using rich dataclasses, including those with optional and generic types, in the `SqlBackend` of the `StatementExecutionBackend` class. The new functionality is demonstrated in the `test_supports_complex_types` unit test, which creates a `Nested` dataclass containing various complex data types, such as nested dataclasses, `datetime` objects, `dict`, `list`, and optional fields. This enhancement is achieved by updating the `save_table` method to handle the conversion of complex dataclasses to SQL statements. To facilitate type inference, we've introduced a new `StructInference` class that converts Python dataclasses and built-in types to their corresponding SQL Data Definition Language (DDL) representations. This addition simplifies data definition and manipulation operations while maintaining type safety and compatibility with various SQL data types.
@nfx mentioned this pull request Sep 11, 2024
nfx added a commit that referenced this pull request Sep 11, 2024
@JCZuurmond (Member) left a comment

@jgarciaf106 : Could you open another PR to address my comments?

@@ -893,6 +897,35 @@ def _from_dashboard_folder(cls, folder: Path) -> "DashboardMetadata":
        tiles.append(tile)
        return cls(display_name=folder.name, _tiles=tiles)

    def export_to_zipped_csv(self, sql_backend: SqlBackend, export_path: Path) -> Path:
        """Export the dashboard queries to CSV files directly into a ZIP archive."""
        zip_export = export_path / "export_to_zipped_csv.zip"
Member

@jgarciaf106 : Please accept the zip export path as a function argument instead of hardcoding "export_to_zipped_csv.zip"


with ZipFile(zip_export, mode="w") as zip_file:
    for tile in self.tiles:
        if tile.metadata.is_query():
Member

We more commonly use a pattern with continue

if not tile.metadata.is_query():
   continue
...

if tile.metadata.is_query():
    rows = sql_backend.fetch(tile.content)

    if not rows:
Member

More precise:

if len(rows) == 0


bytes_buffer = BytesIO(buffer.getvalue().encode("utf-8"))

with zip_file.open(f"{tile.metadata.id}.csv", "w") as csv_file:
Member

Writing bytes, right? (Note: ZipFile.open only accepts mode "r" or "w"; the "w" stream is already binary, so the data written must be bytes.)

with zip_file.open(f"{tile.metadata.id}.csv", "w") as csv_file:

writer.writeheader()
writer.writerow(row.asDict())

bytes_buffer = BytesIO(buffer.getvalue().encode("utf-8"))
Member

Is redundant, see comment below

bytes_buffer = BytesIO(buffer.getvalue().encode("utf-8"))

with zip_file.open(f"{tile.metadata.id}.csv", "w") as csv_file:
    csv_file.write(bytes_buffer.getvalue())
Member

Bytes buffer is redundant:

csv_file.write(buffer.getvalue().encode("utf-8"))
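To see why the intermediate BytesIO is redundant: encoding the StringIO contents directly yields exactly the same bytes, with one fewer allocation and copy:

```python
import io

buffer = io.StringIO()
buffer.write("id,name\r\n1,alice\r\n")  # hypothetical CSV content

# Round-trip through BytesIO...
via_bytesio = io.BytesIO(buffer.getvalue().encode("utf-8")).getvalue()
# ...is identical to a direct encode
direct = buffer.getvalue().encode("utf-8")

print(via_bytesio == direct)  # True
```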
