Invitro export rewrite #958

Merged
rabstejnek merged 6 commits into exports-v2 from invitro-export-rewrite2 on Jan 5, 2024

Conversation

rabstejnek
Collaborator

@rabstejnek rabstejnek commented Dec 8, 2023

Update invitro exports to use ORM values instead of expensive serializers.

@rabstejnek rabstejnek mentioned this pull request Dec 8, 2023
@rabstejnek rabstejnek changed the title from "Convert invitro exports" to "Invitro export rewrite" Dec 8, 2023
@rabstejnek
Collaborator Author

@shapiromatron The exports are complete except for the Category columns on the DataPivotEndpoint exporter; you're familiar with some tag methods that may be able to efficiently do this, so I will leave this task with you. Let me know if you'd like to pass it back to me or if I can help in any way!

@rabstejnek rabstejnek marked this pull request as ready for review December 8, 2023 17:07
@shapiromatron shapiromatron self-assigned this Dec 11, 2023
Owner

@shapiromatron shapiromatron left a comment

I think this is really close!

We should fix how dtxsid is handled (to fix the prior error in implementation). Please take a look at my changes here to make sure you're ok with my revisions:

  • f6b1449 add endpoint categories (19 hours ago) {Andy Shapiro}

I'll also create a card to look into some more aggressive optimizations in the future, but I think what you did here was perfectly appropriate and it should be simpler in the future to investigate larger changes after we have our comparison scripts online.

self.queryset.first().assessment_id,
study_ids,
"invitro",
def handle_dsstox(self, df: pd.DataFrame) -> pd.DataFrame:
Owner

The way this was previously handled is definitely a bug. The TSV is rendered incorrectly; this should be modified to show just the dtxsid_id if available, e.g., DTXSID7037717.

image

I'll link to a possible fix below; we might as well keep the exporter you created in the assessment because it may come in handy.
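
As a rough illustration of the intended output, the sketch below reduces a nested DSSTox record to its identifier and leaves missing records empty. The column names (`chemical_name`, `dsstox`, `dtxsid`), the dictionary layout, and the helper `to_dtxsid` are assumptions made for the example, not the exporter's actual schema.

```python
import pandas as pd

# Hypothetical export rows: each chemical has a nested DSSTox record or nothing.
df = pd.DataFrame(
    {
        "chemical_name": ["chemical A", "chemical B"],
        "dsstox": [{"dtxsid": "DTXSID7037717", "content": {"note": "placeholder"}}, None],
    }
)


def to_dtxsid(record):
    """Return only the identifier; None when no DSSTox record is linked."""
    return record["dtxsid"] if isinstance(record, dict) else None


# Swap the nested column for a flat identifier column that renders cleanly in a TSV.
df["dtxsid"] = [to_dtxsid(record) for record in df["dsstox"]]
df = df.drop(columns=["dsstox"])

tsv = df.to_csv(sep="\t", index=False)
```

Writing the dictionary itself into a delimited file would put its string representation in a single cell, which is the kind of rendering problem the comment describes.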

)
return df.drop(columns=dsstox_cols)

def handle_dose_groups(self, df: pd.DataFrame) -> pd.DataFrame:
Owner

Wow, this is impressive, and really complicated. I think keeping it as is is good for now; this seems like a conservative conversion from our initial approach. However, when I try it on a large dataset, it's slow. Not super slow, so it may be good enough.

I'll make a card in our backlog to explore more aggressive optimizations. It'll be easier to explore the impact of the optimizations once we have the jupyter notebooks you'll be building for comparing different approaches.

To document some exploration ideas, I got pretty far with this, which ran really quickly:

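import pandas as pd

# IVEndpointGroup is the invitro endpoint group model; 123 is a placeholder assessment id
df = pd.DataFrame(IVEndpointGroup.objects.filter(endpoint__assessment=123).order_by('endpoint', 'dose_group_id').values())
col_vals = ['dose','n','response','variance','difference_control','significant_control','cytotoxicity_observed']
# one row per endpoint, one column per (value, dose group) pair
df1 = pd.pivot(df.query('dose > 0'), values=col_vals,index=['endpoint_id'],columns=['dose_group_id'])
df1.columns = df1.columns.map(lambda el: f'{el[0].replace("_", " ").title()} {el[1]+1}')
df1 = df1.fillna("-")  # fillna returns a new frame, so keep the result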
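
The same reshaping can be tried without a database. This is a minimal sketch, assuming a toy list of dictionaries stands in for the `IVEndpointGroup.objects...values()` query; the rows and the two-column subset (`dose`, `response`) are made up for illustration.

```python
import pandas as pd

# Toy rows standing in for the ORM `.values()` output (hypothetical data).
rows = [
    {"endpoint_id": 1, "dose_group_id": 0, "dose": 0.0, "response": 1.0},
    {"endpoint_id": 1, "dose_group_id": 1, "dose": 10.0, "response": 1.5},
    {"endpoint_id": 1, "dose_group_id": 2, "dose": 30.0, "response": 2.5},
    {"endpoint_id": 2, "dose_group_id": 0, "dose": 0.0, "response": 4.0},
    {"endpoint_id": 2, "dose_group_id": 1, "dose": 5.0, "response": 3.0},
]
df = pd.DataFrame(rows)

# Long to wide: one row per endpoint, one column per (value, dose group) pair.
wide = pd.pivot(
    df.query("dose > 0"),
    values=["dose", "response"],
    index=["endpoint_id"],
    columns=["dose_group_id"],
)

# Flatten the two-level column index into labels such as "Dose 2".
wide.columns = [
    f'{name.replace("_", " ").title()} {group + 1}' for name, group in wide.columns
]

# Endpoints with fewer dose groups get "-" in the missing cells.
wide = wide.fillna("-")
```

In this toy data, endpoint 2 has no third dose group, so its `Dose 3` and `Response 3` cells end up as `"-"`. The pivot runs once over the whole frame rather than looping per endpoint, which is one plausible reason this style of approach ran quickly.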

doseRange[0],
doseRange[1],
number_doses,
def handle_benchmarks(self, df: pd.DataFrame) -> pd.DataFrame:
Owner

A similar approach to EndpointGroup could probably be applied here. Again, in the future, once we've got our comparison scripts complete.

"iv_chemical",
"chemical",
),
DSSToxExport(
Owner

I think DSSToxExport can be removed, and we can add dtxsid_id to the IVChemicalExport.

.reset_index(drop=True)
)

def handle_dsstox(self, df: pd.DataFrame) -> pd.DataFrame:
Owner

Handle dsstox as above.

@rabstejnek
Collaborator Author

Your changes look good! I agree about optimizing in future PRs; there's definitely some work to do there.

As for the DTXSID changes, I kept the DSSToxExport for possible future use but removed it from the current exporters in favor of just the DTXSID:

image

If these changes are fine with you, then I'm fine with merging!

Owner

@shapiromatron shapiromatron left a comment

ok to merge!

@rabstejnek rabstejnek merged commit a8ead5a into exports-v2 Jan 5, 2024
3 checks passed
@rabstejnek rabstejnek deleted the invitro-export-rewrite2 branch January 5, 2024 21:03
rabstejnek added a commit that referenced this pull request Jul 2, 2024
* Epi export rewrite (#911)

* preliminary epi rewrite

* changes

* changes

* changes

* fix

* added back something accidentally deleted

* moved code

* fix test

* remove old stuff

* cleanup:

* use vectorized timestamp conversion

* minor formatting

---------

Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>

* rewrite riskofbias exports (#921)

* preliminary epi rewrite

* changes

* changes

* changes

* fix

* added back something accidentally deleted

* moved code

* fix test

* remove old stuff

* cleanup:

* rewrite riskofbias exports

* update naming for domain and metric

---------

Co-authored-by: Daniel Rabstejnek <rabstejnek@gmail.com>

* Epimeta export rewrite (#922)

* preliminary epi rewrite

* changes

* changes

* changes

* fix

* added back something accidentally deleted

* moved code

* fix test

* remove old stuff

* cleanup:

* epimeta export rewrite

* remove obsolete code

* changes

* merge fix

* update admin site to browse data pivot by evidence type

---------

Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>

* Invitro export rewrite (#958)

* Convert invitro exports

* Fix exporter datetime converter when datetime is None

* Fix exports where there's no data

* DTXSID in export should be None instead of useless dict if missing

* add endpoint categories

* Remove bloated dsstox dict from exports in favor of dtxsid

---------

Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>

* Animal export rewrite (#961)

* Made animal model exports, began configuring exporters

* first export largely done, second export mostly done

* Some cleanups, finished fourth animal exporter

* Changes after more testing

* Fix tests

* Remove remaining flat_complete_* methods

* Cleanup

* Update sql_display to accept dict

* Add TODO to comment to easily find it

* Rename exporters to match the flat file exporter class names

* Add safeguard for qs accessing and fix groupby side effects (#982)

* two updates

* move to correct place

* exports-v2 - Updates from image review (#1058)

* remove duplicate rename calls

* add ci calculation

* update treatment period calculation

* add back space; fix caused too many false positives in comparison

* properly handle a 500 response from the server in a data pivot

* fix epi logic from plot review

* fix invitro export when category id is null; remove category id from export

* add unique rob columns

* use same method for categories

* refactor into a reusable method

* remove pytest warning "Marks applied to fixtures have no effect"

* refactor; write tests

* add tests

* rewrite animal export test using same pattern

* add udf to openapi

* remove extra whitespace (will change visuals; known change)

---------

Co-authored-by: Daniel Rabstejnek <rabstejnek@gmail.com>