Data Objects #196

Merged
merged 78 commits into from
Nov 23, 2022

Conversation

robertdstein
Member

This is a big breaking change that abolishes the lists-of-lists-of-lists nesting and introduces a clear, concrete data model for the processors.

There are now:

Data objects. These are either Image or SourceList objects. They replace the image/header tuples and pandas DataFrames that were previously passed around everywhere.

Batch objects. These are either ImageBatch or SourceBatch objects. They replace "batch" lists.

DataSet objects. These replace the list-of-batches. Every batch in a DataSet must be of the same type.
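
To make the three layers concrete, here is a rough sketch of how such a hierarchy could fit together. It is not the code merged in this PR: the attribute names (`data`, `header`, `rows`, `data_type`), the `append` methods, and the use of plain lists in place of arrays and DataFrames are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical data object: pixel data and header travel together."""
    data: list                                   # stand-in for a pixel array
    header: dict = field(default_factory=dict)   # stand-in for a FITS-style header


@dataclass
class SourceList:
    """Hypothetical data object: a table of detected sources."""
    rows: list                                   # stand-in for a pandas DataFrame


class Batch:
    """Ordered collection that accepts only one kind of data object."""
    data_type: type = object

    def __init__(self, items=None):
        self._items = []
        for item in items or []:
            self.append(item)

    def append(self, item):
        if not isinstance(item, self.data_type):
            raise TypeError(
                f"{type(self).__name__} only accepts {self.data_type.__name__}"
            )
        self._items.append(item)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


class ImageBatch(Batch):
    data_type = Image


class SourceBatch(Batch):
    data_type = SourceList


class DataSet:
    """Collection of batches; every batch must share the first batch's type."""

    def __init__(self, batches=None):
        self._batches = []
        for batch in batches or []:
            self.append(batch)

    def append(self, batch):
        if not isinstance(batch, Batch):
            raise TypeError("DataSet only accepts Batch objects")
        if self._batches and type(batch) is not type(self._batches[0]):
            raise TypeError("All batches in a DataSet must be of the same type")
        self._batches.append(batch)

    def __len__(self):
        return len(self._batches)

    def __iter__(self):
        return iter(self._batches)


# One batch holding one image, wrapped in a dataset.
images = ImageBatch([Image(data=[[0.0, 1.0]], header={"FILTER": "r"})])
dataset = DataSet([images])
```

With type checks at each level, mixing an ImageBatch and a SourceBatch in one DataSet fails at construction time instead of deep inside a processor, which is the kind of error that nested lists cannot catch.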

Bonus: this introduces the concept of metadata for SourceList, so you can assign values to a SourceList in a dictionary-like way rather than adding a column to every individual row. You can then, e.g., batch these.
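
A minimal sketch of what dictionary-like metadata on a source list might look like, next to the per-row alternative it avoids. This is an assumption about the idea, not the actual SourceList API: the class name `SourceListSketch`, its constructor, the `rows` and `metadata` attributes, and the `field_id` key are all hypothetical.

```python
class SourceListSketch:
    """Hypothetical source list: per-source rows plus list-level metadata."""

    def __init__(self, rows, metadata=None):
        self.rows = list(rows)                # one dict per detected source
        self.metadata = dict(metadata or {})  # values shared by the whole list

    def __getitem__(self, key):
        return self.metadata[key]

    def __setitem__(self, key, value):
        self.metadata[key] = value

    def __contains__(self, key):
        return key in self.metadata


sources = SourceListSketch(
    rows=[{"ra": 10.1, "dec": -5.2}, {"ra": 10.3, "dec": -5.1}]
)

# Stored once on the list itself, not copied into every row.
sources["field_id"] = 42


def group_by_metadata(source_lists, key):
    """Group source lists that share a metadata value, e.g. to form batches."""
    groups = {}
    for source_list in source_lists:
        groups.setdefault(source_list[key], []).append(source_list)
    return groups


other = SourceListSketch(rows=[{"ra": 11.0, "dec": -4.9}], metadata={"field_id": 42})
groups = group_by_metadata([sources, other], "field_id")
```

Keeping shared values at the list level means they stay well defined even for an empty source list, and grouping lists into batches only needs one lookup per list.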

@robertdstein robertdstein marked this pull request as ready for review November 23, 2022 04:59
@robertdstein robertdstein enabled auto-merge (squash) November 23, 2022 05:00
@robertdstein robertdstein merged commit c3eb620 into main Nov 23, 2022
@robertdstein robertdstein deleted the data branch November 23, 2022 05:07
Development

Successfully merging this pull request may close these issues.

Candidates Processor should output a list of dataframes instead of a single dataframe