Add Convert To Qbeast #102

osopardo1 · 2022-05-09T10:22:31Z

Currently, the only way to write data in Qbeast Format is to load it and write it again with the Spark DataFrame API.

It would be good to have an easier way to convert data in other formats to Qbeast, one that stays compatible with reading files for which no Qbeast metadata is found.

For that, we can think of two approaches:

1. Write the data in the same place but organized with the Qbeast index. If more data is added while the conversion is taking place, we treat that data as non-indexed and read all of it when needed.
2. Write the data in the same place and mark it as replicated cubes, so we only duplicate the data we need for optimizing.

Doubts/things we need to figure out:

- How to specify the columns to index in the API.
- How to handle partitioning. Would it be useful to index the columns that are used as partition values?
- Study the feasibility of the second approach.
- Other design problems that could arise.

osopardo1 · 2023-01-20T09:42:09Z

UPDATE

The Convert To Qbeast command would be a naïve implementation that only marks the table with Qbeast Metadata. It would not index any of the existing files, nor would it add extra metadata to each file entry.
The objective is to convert the table into the Qbeast Format gradually, to avoid rewriting the whole dataset in a single process.

The files without Qbeast metadata in their tags would be read as usual, and we need to finish #121 to make this operation feasible. The idea is that those files sit in a "staging" area and would eventually be indexed in batches.
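
A minimal sketch of the staging idea, assuming each Delta `add` entry is available as a dictionary with an optional `tags` map. The tag key `cube` and the helper `split_staging` are hypothetical names used only for illustration; they are not taken from the Qbeast code base.

```python
from typing import Dict, List, Tuple

# Hypothetical tag key: its presence is assumed to mean "file is already indexed".
QBEAST_TAG = "cube"


def split_staging(add_files: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split Delta `add` entries into (indexed, staging) lists.

    Entries with no tags, or without the Qbeast tag, go to staging.
    """
    indexed, staging = [], []
    for entry in add_files:
        tags = entry.get("tags") or {}  # `tags` may be missing or None
        if QBEAST_TAG in tags:
            indexed.append(entry)
        else:
            staging.append(entry)
    return indexed, staging


files = [
    {"path": "part-0000.parquet", "tags": {"cube": "A"}},
    {"path": "part-0001.parquet", "tags": None},
    {"path": "part-0002.parquet"},
]
indexed, staging = split_staging(files)
print([f["path"] for f in indexed])
print([f["path"] for f in staging])
```

Staging files are read in full, while indexed files can go through the Qbeast index; a later batch job could pick entries from the staging list and index them.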

The usage would be something like:

QbeastTable.convertToQbeast(columnsToIndex="col1,col2", cubeSize=500)

The operation will trigger a Metadata Update that adds an entry like the following to the Delta Log:

{
  "metaData": {
    "id": "aa43874a-9688-4d14-8168-e16088641fdb",
    ...
    "configuration": {
      "qbeast.lastRevisionID": "1",
      "qbeast.revision.1": "{\"revisionID\":1,\"timestamp\":1637851757680,\"tableID\":\"/tmp/qb-testing1584592925006274975\",\"desiredCubeSize\":500,\"columnTransformers\":..}"
    },
    "createdTime": 1637851765848
  }
}
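
To make the shape of that entry concrete, here is a small sketch that assembles the `configuration` map for a first revision. The helper `build_qbeast_configuration` and the table path are hypothetical, and `columnTransformers` is left out because the entry above elides it.

```python
import json
from typing import Dict


def build_qbeast_configuration(table_id: str, desired_cube_size: int,
                               timestamp_ms: int, revision_id: int = 1) -> Dict[str, str]:
    """Build the Qbeast keys of a Delta `metaData.configuration` map (illustrative only)."""
    revision = {
        "revisionID": revision_id,
        "timestamp": timestamp_ms,
        "tableID": table_id,
        "desiredCubeSize": desired_cube_size,
        # "columnTransformers" is omitted: its content is elided in the issue.
    }
    return {
        "qbeast.lastRevisionID": str(revision_id),
        # The revision is stored as a JSON string nested inside the configuration map.
        f"qbeast.revision.{revision_id}": json.dumps(revision, separators=(",", ":")),
    }


config = build_qbeast_configuration("/tmp/example-table", 500, 1637851757680)
print(json.dumps({"metaData": {"configuration": config}}, indent=2))
```

Note that Delta configuration values are strings, which is why the revision is serialized to JSON before being stored.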

osopardo1 · 2023-01-23T16:05:27Z

Other aspects/scope of the command:

- A table written entirely in Delta would not be readable from Qbeast unless we trigger the Convert To Qbeast command. All the details are in "Make files without Metadata readable with Qbeast" #121.
- The files not converted to Qbeast would not be optimized, analyzed or compacted. Managing the compaction of files without Qbeast metadata can be complex. The goal of the conversion is for those files to be gradually rewritten in Qbeast Format.
- How to manage updates? This is a complex topic and it is not yet clear how it should be handled. We have not tested it yet and need to explore it further.

osopardo1 added the "type: enhancement" and "high" labels May 9, 2022

osopardo1 mentioned this issue May 10, 2022

Convert To Qbeast #103

Closed

14 tasks

osopardo1 self-assigned this May 25, 2022

osopardo1 mentioned this issue Aug 30, 2022

Make files without Metadata readable with Qbeast #121

Closed

osopardo1 removed the high label Nov 3, 2022

osopardo1 assigned Jiaweihu08 Jan 20, 2023

This was referenced Jan 20, 2023

Support update metadata through MetadataManager #149

Closed

Unexpected exception when reading non-qbeast-formatted data #53

Closed

osopardo1 added this to the Benchmarking Qbeast Format milestone Jan 25, 2023

Jiaweihu08 mentioned this issue Jan 25, 2023

Convert to Qbeast #152

Merged

5 tasks

Jiaweihu08 closed this as completed in #152 Jan 27, 2023

Jiaweihu08 added a commit that referenced this issue Jan 27, 2023

Merge pull request #152 from Jiaweihu08/read-staging-data

3706738

Fixes #102 #121 #149
