Invitro export rewrite #958

rabstejnek · 2023-12-08T15:27:16Z

Update invitro exports to use ORM values instead of expensive serializers.

rabstejnek · 2023-12-08T17:07:19Z

@shapiromatron The exports are complete except for the Category columns on the DataPivotEndpoint exporter; you're familiar with some tag methods that may be able to efficiently do this, so I will leave this task with you. Let me know if you'd like to pass it back to me or if I can help in any way!

shapiromatron

I think this is really close!

We should fix how dtxsid is handled (to fix the prior error in implementation). Please take a look at my changes here to make sure you're ok with my revisions:

f6b1449 add endpoint categories (19 hours ago) {Andy Shapiro}

I'll also create a card to look into some more aggressive optimizations in the future, but I think what you did here was perfectly appropriate and it should be simpler in the future to investigate larger changes after we have our comparison scripts online.

shapiromatron · 2024-01-04T13:59:24Z

hawc/apps/invitro/exports.py

-                self.queryset.first().assessment_id,
-                study_ids,
-                "invitro",
+    def handle_dsstox(self, df: pd.DataFrame) -> pd.DataFrame:


The way this was previously handled is definitely a bug: the TSV is rendered incorrectly. This should be modified to show just the dtxsid_id if available, e.g., DTXSID7037717.

I'll link to a possible fix below; we might as well keep the exporter you created in the assessment b/c it may come in handy.
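
To illustrate the intended behavior, here is a minimal sketch that collapses a set of DSSTox columns down to a single DTXSID column. The `dsstox-` column prefix, the `dtxsid` output name, and the `collapse_dsstox` helper are assumptions made for illustration; they are not the names used in `hawc/apps/invitro/exports.py`.

```python
import pandas as pd


def collapse_dsstox(df: pd.DataFrame, prefix: str = "dsstox-") -> pd.DataFrame:
    """Drop all DSSTox columns and keep only the identifier (hypothetical helper)."""
    id_col = f"{prefix}dtxsid"  # assumed column name for the identifier
    dsstox_cols = [col for col in df.columns if col.startswith(prefix)]
    out = df.drop(columns=dsstox_cols)
    if id_col in df.columns:
        # Missing identifiers become None rather than a dict or NaN
        out["dtxsid"] = [value if pd.notna(value) else None for value in df[id_col]]
    else:
        out["dtxsid"] = None
    return out


# Hypothetical export rows: one chemical with a DTXSID, one without
example = pd.DataFrame(
    {
        "chemical-name": ["chemical a", "chemical b"],
        "dsstox-dtxsid": ["DTXSID7037717", None],
        "dsstox-content": [{"preferredName": "chemical a"}, None],
    }
)
collapsed = collapse_dsstox(example)
```

With this shape, a TSV writer only ever sees a plain string or an empty cell for the identifier.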

shapiromatron · 2024-01-04T15:23:22Z

hawc/apps/invitro/exports.py

+        )
+        return df.drop(columns=dsstox_cols)
+
+    def handle_dose_groups(self, df: pd.DataFrame) -> pd.DataFrame:


Wow, this is impressive, and really complicated. I think keeping it as-is is good for now; this seems like a conservative conversion from our initial approach. However, when I try it on a large dataset, it's slow. Not super slow, so it may be good enough.

I'll make a card in our backlog to explore more aggressive optimizations. It'll be easier to explore the impact of the optimizations once we have the jupyter notebooks you'll be building for comparing different approaches.

To document some exploration ideas, I got pretty far with this which ran really quickly:

    df = pd.DataFrame(IVEndpointGroup.objects.filter(endpoint__assessment=123).order_by('endpoint', 'dose_group_id').values())
    col_vals = ['dose', 'n', 'response', 'variance', 'difference_control', 'significant_control', 'cytotoxicity_observed']
    df1 = pd.pivot(df.query('dose > 0'), values=col_vals, index=['endpoint_id'], columns=['dose_group_id'])
    df1.columns = df1.columns.map(lambda el: f'{el[0].replace("_", " ").title()} {el[1]+1}')
    df1.fillna("-")
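
For anyone who wants to try the pivot idea without a database, here is a self-contained sketch of the same long-to-wide reshaping on a few made-up rows. The rows stand in for the output of `.values()`; the `pivot_dose_groups` helper and the reduced column list are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows standing in for IVEndpointGroup ... .values()
rows = [
    {"endpoint_id": 1, "dose_group_id": 0, "dose": 0.0, "n": 3, "response": 1.0},
    {"endpoint_id": 1, "dose_group_id": 1, "dose": 10.0, "n": 3, "response": 1.4},
    {"endpoint_id": 1, "dose_group_id": 2, "dose": 30.0, "n": 3, "response": 2.1},
    {"endpoint_id": 2, "dose_group_id": 0, "dose": 0.0, "n": 4, "response": 5.0},
    {"endpoint_id": 2, "dose_group_id": 1, "dose": 5.0, "n": 4, "response": 4.2},
]
col_vals = ["dose", "n", "response"]


def pivot_dose_groups(rows: list[dict], col_vals: list[str]) -> pd.DataFrame:
    """Reshape one row per dose group into one row per endpoint."""
    df = pd.DataFrame(rows)
    # Skip zero-dose rows, then spread each value column across dose groups
    wide = df.query("dose > 0").pivot(
        index="endpoint_id", columns="dose_group_id", values=col_vals
    )
    # Flatten (field, dose_group_id) pairs into labels such as "Dose 2";
    # the +1 mirrors the numbering in the snippet above
    wide.columns = [
        f"{field.replace('_', ' ').title()} {group_id + 1}"
        for field, group_id in wide.columns
    ]
    # Endpoints with fewer dose groups get "-" in the unused columns
    return wide.astype(object).where(wide.notna(), "-")


wide = pivot_dose_groups(rows, col_vals)
```

The reshaping happens in a single vectorized call rather than a per-endpoint loop, which is the likely reason the exploratory version ran quickly.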

shapiromatron · 2024-01-04T15:28:24Z

hawc/apps/invitro/exports.py

-                    doseRange[0],
-                    doseRange[1],
-                    number_doses,
+    def handle_benchmarks(self, df: pd.DataFrame) -> pd.DataFrame:


A similar approach to EndpointGroup could probably be applied here. Again, in the future, once we've got our comparison scripts complete.
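
As a rough sketch of what that could look like for benchmarks: number the benchmarks within each endpoint, then pivot on that position. The `name` and `value` fields, the output column labels, and the `pivot_benchmarks` helper are all assumptions for illustration, not the current exporter's schema.

```python
import pandas as pd

# Hypothetical benchmark rows, one per (endpoint, benchmark)
benchmark_rows = [
    {"endpoint_id": 1, "id": 10, "name": "AC50", "value": 12.5},
    {"endpoint_id": 1, "id": 11, "name": "LEC", "value": 3.0},
    {"endpoint_id": 2, "id": 12, "name": "AC50", "value": 40.0},
]


def pivot_benchmarks(rows: list[dict]) -> pd.DataFrame:
    """Reshape one row per benchmark into one row per endpoint."""
    df = pd.DataFrame(rows).sort_values(["endpoint_id", "id"])
    # Benchmarks have no natural group index, so number them per endpoint
    df["position"] = df.groupby("endpoint_id").cumcount() + 1
    wide = df.pivot(index="endpoint_id", columns="position", values=["name", "value"])
    # Flatten (field, position) pairs into labels such as "benchmark name 1"
    wide.columns = [f"benchmark {field} {position}" for field, position in wide.columns]
    # Endpoints with fewer benchmarks get "-" in the unused columns
    return wide.astype(object).where(wide.notna(), "-")


benchmarks_wide = pivot_benchmarks(benchmark_rows)
```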

shapiromatron · 2024-01-04T15:29:42Z

hawc/apps/invitro/exports.py

+                "iv_chemical",
+                "chemical",
+            ),
+            DSSToxExport(


I think DSSToxExport can be removed, and we can add dtxsid_id to the IVChemicalExport.

shapiromatron · 2024-01-04T15:30:48Z

hawc/apps/invitro/exports.py

+            .reset_index(drop=True)
+        )
+
+    def handle_dsstox(self, df: pd.DataFrame) -> pd.DataFrame:


handle dsstox as above

rabstejnek · 2024-01-05T16:24:31Z

> I think this is really close!
>
> We should fix how dtxsid is handled (to fix the prior error in implementation). Please take a look at my changes here to make sure you're ok with my revisions:
>
> f6b1449 add endpoint categories (19 hours ago) {Andy Shapiro}
>
> I'll also create a card to look into some more aggressive optimizations in the future, but I think what you did here was perfectly appropriate and it should be simpler in the future to investigate larger changes after we have our comparison scripts online.

Your changes look good! I agree about optimizing in future PRs, there's definitely some work to do there.

As for the DTXSID changes, I kept the DSSToxExport for possible future use but removed it from the current exporters in favor of just the DTXSID:

If these changes are fine with you then I'm fine with merging!

shapiromatron

ok to merge!

* Epi export rewrite (#911)
  * preliminary epi rewrite
  * changes
  * changes
  * changes
  * fix
  * added back something accidentally deleted
  * moved code
  * fix test
  * remove old stuff
  * cleanup:
  * use vectorized timestamp conversion
  * minor formatting
  ---------
  Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>
* rewrite riskofbias exports (#921)
  * preliminary epi rewrite
  * changes
  * changes
  * changes
  * fix
  * added back something accidentally deleted
  * moved code
  * fix test
  * remove old stuff
  * cleanup:
  * rewrite riskofbias exports
  * update naming for domain and metric
  ---------
  Co-authored-by: Daniel Rabstejnek <rabstejnek@gmail.com>
* Epimeta export rewrite (#922)
  * preliminary epi rewrite
  * changes
  * changes
  * changes
  * fix
  * added back something accidentally deleted
  * moved code
  * fix test
  * remove old stuff
  * cleanup:
  * epimeta export rewrite
  * remove obsolete code
  * changes
  * merge fix
  * update admin site to browse data pivot by evidence type
  ---------
  Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>
* Invitro export rewrite (#958)
  * Convert invitro exports
  * Fix exporter datetime converter when datetime is None
  * Fix exports where there's no data
  * DTXSID in export should be None instead of useless dict if missing
  * add endpoint categories
  * Remove bloated dsstox dict from exports in favor of dtxsid
  ---------
  Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>
* Animal export rewrite (#961)
  * Made animal model exports, began configuring exporters
  * first export largely done, second export mostly done
  * Some cleanups, finished fourth animal exporter
  * Changes after more testing
  * Fix tests
  * Remove remaining flat_complete_* methods
  * Cleanup
  * Update sql_display to accept dict
  * Add TODO to comment to easily find it
  * Rename exporters to match the flat file exporter class names
* Add safeguard for qs accessing and fix groupby side effects (#982)
  * two updates
  * move to correct place
* exports-v2 - Updates from image review (#1058)
  * remove duplicate rename calls
  * add ci calculation
  * update treatment period calculation
  * add back space; fix caused too many false positives in comparison
  * properly handle a 500 response from the server in a data pivot
  * fix epi logic from plot review
  * fix invitro export when category id is null; remove category id from export
  * add unique rob columns
  * use same method for categories
  * refactor into a reusable method
  * remove pytest warning "Marks applied to fixtures have no effect"
  * refactor; write tests
  * add tests
  * rewrite animal export test using same pattern
  * add udf to openapi
  * remove extra whitespace (will change visuals; known change)
  ---------
  Co-authored-by: Daniel Rabstejnek <rabstejnek@gmail.com>

rabstejnek added 4 commits December 8, 2023 10:25

Convert invitro exports

bc0bf27

Fix exporter datetime converter when datetime is None

3f9a857

Fix exports where there's no data

def6df1

DTXSID in export should be None instead of useless dict if missing

560f308

rabstejnek mentioned this pull request Dec 8, 2023

Invitro export rewrite #957

Closed

rabstejnek changed the title ~~Convert invitro exports~~ Invitro export rewrite Dec 8, 2023

rabstejnek requested a review from shapiromatron December 8, 2023 17:03

rabstejnek marked this pull request as ready for review December 8, 2023 17:07

shapiromatron self-assigned this Dec 11, 2023

add endpoint categories

f6b1449

shapiromatron approved these changes Jan 4, 2024

shapiromatron assigned rabstejnek and unassigned shapiromatron Jan 5, 2024

Remove bloated dsstox dict from exports in favor of dtxsid

a9f8d78

rabstejnek requested a review from shapiromatron January 5, 2024 16:26

shapiromatron approved these changes Jan 5, 2024

rabstejnek merged commit a8ead5a into exports-v2 Jan 5, 2024
3 checks passed

rabstejnek deleted the invitro-export-rewrite2 branch January 5, 2024 21:03
