Invitro export rewrite #958
@shapiromatron The exports are complete except for the
I think this is really close!
We should fix how dtxsid is handled (to fix the prior error in implementation). Please take a look at my changes here to make sure you're ok with my revisions:
- f6b1449 add endpoint categories (19 hours ago) {Andy Shapiro}
I'll also create a card to look into some more aggressive optimizations in the future, but I think what you did here was perfectly appropriate and it should be simpler in the future to investigate larger changes after we have our comparison scripts online.
hawc/apps/invitro/exports.py
Outdated
self.queryset.first().assessment_id,
study_ids,
"invitro",
def handle_dsstox(self, df: pd.DataFrame) -> pd.DataFrame:
The way this was previously handled is definitely a bug. The TSV is rendered incorrectly; this should be modified to just show the dtxsid_id if available, e.g., DTXSID7037717.
I'll link to a possible fix below; we might as well keep the exporter you created in the assessment because it may come in handy.
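As a rough illustration of the behavior described above (one identifier column, `None` when it is missing), here is a minimal pandas sketch. The helper name `keep_only_dtxsid` and the column names `dtxsid`, `dsstox_name`, and `dsstox_casrn` are hypothetical and are not taken from `exports.py`; the actual fix in the PR may look different.

```python
import pandas as pd


def keep_only_dtxsid(df: pd.DataFrame, id_col: str, detail_cols: list[str]) -> pd.DataFrame:
    """Drop expanded DSSTox detail columns and keep a single identifier column.

    Missing identifiers become None instead of NaN or a nested dict.
    """
    # Only drop the detail columns that are actually present.
    out = df.drop(columns=[c for c in detail_cols if c in df.columns])
    # Build an object-dtype column so that None is preserved as None.
    values = [v if pd.notna(v) else None for v in out[id_col]]
    out[id_col] = pd.Series(values, index=out.index, dtype=object)
    return out


# Hypothetical column names and values, for illustration only.
example = pd.DataFrame(
    {
        "chemical_name": ["chem A", "chem B"],
        "dtxsid": ["DTXSID7037717", None],
        "dsstox_name": ["name A", None],
    }
)
result = keep_only_dtxsid(example, "dtxsid", ["dsstox_name", "dsstox_casrn"])
```

The input frame is left untouched because `DataFrame.drop` returns a new object, which makes the helper safe to call inside a chain of `handle_*` steps.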
)
return df.drop(columns=dsstox_cols)

def handle_dose_groups(self, df: pd.DataFrame) -> pd.DataFrame:
Wow, this is impressive, and really complicated. I think keeping it as is is good for now; this seems like a conservative conversion from our initial approach. However, when I try it on a large dataset, it's slow. Not super slow, so it may be good enough.
I'll make a card in our backlog to explore more aggressive optimizations. It'll be easier to explore the impact of the optimizations once we have the Jupyter notebooks you'll be building for comparing different approaches.
To document some exploration ideas, I got pretty far with this which ran really quickly:
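import pandas as pd
# Load endpoint groups for one assessment (id 123 here) straight from ORM values
df = pd.DataFrame(IVEndpointGroup.objects.filter(endpoint__assessment=123).order_by('endpoint', 'dose_group_id').values())
col_vals = ['dose','n','response','variance','difference_control','significant_control','cytotoxicity_observed']
# One row per endpoint, one column per (value, dose group) pair; rows without dose > 0 are skipped
df1 = pd.pivot(df.query('dose > 0'), values=col_vals, index=['endpoint_id'], columns=['dose_group_id'])
# Flatten the column MultiIndex, e.g. ('dose', 1) becomes "Dose 2"
df1.columns = df1.columns.map(lambda el: f'{el[0].replace("_", " ").title()} {el[1]+1}')
df1 = df1.fillna("-")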
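To make the shape of that pivot easier to see without a database, the sketch below applies the same steps to a small hand-made table. The data values are invented for illustration, and only two of the value columns are used; it is not a benchmark or a drop-in replacement for `handle_dose_groups`.

```python
import pandas as pd

# Toy long-format data: one row per (endpoint, dose group). Values are made up.
long_df = pd.DataFrame(
    {
        "endpoint_id": [1, 1, 1, 2, 2],
        "dose_group_id": [0, 1, 2, 0, 1],
        "dose": [0.0, 1.0, 10.0, 0.0, 5.0],
        "response": [1.0, 1.5, 2.5, 3.0, 3.5],
    }
)

value_cols = ["dose", "response"]
wide = pd.pivot(
    long_df.query("dose > 0"),  # same filter as the snippet above: drops zero-dose rows
    values=value_cols,
    index=["endpoint_id"],
    columns=["dose_group_id"],
)
# Flatten ("dose", 1) -> "Dose 2", mirroring the el[1] + 1 labelling above.
wide.columns = [f'{name.replace("_", " ").title()} {group + 1}' for name, group in wide.columns]
# Endpoints with fewer dose groups get "-" in the columns they lack.
wide = wide.fillna("-")
```

Because the pivot is a single vectorized reshape, the cost stays in pandas rather than in a Python loop over endpoints, which is presumably why the exploratory version ran quickly.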
doseRange[0],
doseRange[1],
number_doses,
def handle_benchmarks(self, df: pd.DataFrame) -> pd.DataFrame:
A similar approach to EndpointGroup could probably be applied here, again in the future, once we've got our comparison scripts complete.
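One possible shape for such a comparison script is sketched below. It is only an assumption about what the comparison might check (column sets, row counts, and per-column equality); the helper name `summarize_differences` and the sample frames are hypothetical and do not come from the project.

```python
import pandas as pd


def summarize_differences(old: pd.DataFrame, new: pd.DataFrame) -> dict:
    """Summarize how two export tables differ: columns, row counts, changed columns."""
    shared = [c for c in old.columns if c in new.columns]
    summary = {
        "only_in_old": [c for c in old.columns if c not in new.columns],
        "only_in_new": [c for c in new.columns if c not in old.columns],
        "row_counts": (len(old), len(new)),
        "changed_columns": [],
    }
    # Per-column comparison only makes sense when the row counts match.
    if len(old) == len(new):
        for col in shared:
            left = old[col].reset_index(drop=True)
            right = new[col].reset_index(drop=True)
            # Series.equals treats NaN in the same position as equal,
            # and reports a difference when the dtypes differ.
            if not left.equals(right):
                summary["changed_columns"].append(col)
    return summary


# Hypothetical "before" and "after" exports, for illustration only.
old_export = pd.DataFrame({"a": [1, 2], "b": ["x", "y"], "c": [1.0, None]})
new_export = pd.DataFrame({"a": [1, 2], "b": ["x", "z"], "d": [0, 0]})
report = summarize_differences(old_export, new_export)
```

A summary like this can be run once per export type before and after an optimization, so that any change in output is caught before the performance gain is accepted.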
hawc/apps/invitro/exports.py
Outdated
"iv_chemical",
"chemical",
),
DSSToxExport(
I think DSSToxExport can be removed, and we add dtxsid_id to the IVChemicalExport.
hawc/apps/invitro/exports.py
Outdated
.reset_index(drop=True)
)

def handle_dsstox(self, df: pd.DataFrame) -> pd.DataFrame:
handle dsstox as above
Your changes look good! I agree about optimizing in future PRs; there's definitely some work to do there. As for the DTXSID changes, I kept the [...] If these changes are fine with you, then I'm fine with merging!
ok to merge!
* Epi export rewrite (#911)
  * preliminary epi rewrite
  * changes
  * changes
  * changes
  * fix
  * added back something accidentally deleted
  * moved code
  * fix test
  * remove old stuff
  * cleanup:
  * use vectorized timestamp conversion
  * minor formatting
  ---------
  Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>
* rewrite riskofbias exports (#921)
  * preliminary epi rewrite
  * changes
  * changes
  * changes
  * fix
  * added back something accidentally deleted
  * moved code
  * fix test
  * remove old stuff
  * cleanup:
  * rewrite riskofbias exports
  * update naming for domain and metric
  ---------
  Co-authored-by: Daniel Rabstejnek <rabstejnek@gmail.com>
* Epimeta export rewrite (#922)
  * preliminary epi rewrite
  * changes
  * changes
  * changes
  * fix
  * added back something accidentally deleted
  * moved code
  * fix test
  * remove old stuff
  * cleanup:
  * epimeta export rewrite
  * remove obsolete code
  * changes
  * merge fix
  * update admin site to browse data pivot by evidence type
  ---------
  Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>
* Invitro export rewrite (#958)
  * Convert invitro exports
  * Fix exporter datetime converter when datetime is None
  * Fix exports where there's no data
  * DTXSID in export should be None instead of useless dict if missing
  * add endpoint categories
  * Remove bloated dsstox dict from exports in favor of dtxsid
  ---------
  Co-authored-by: Andy Shapiro <shapiromatron@gmail.com>
* Animal export rewrite (#961)
  * Made animal model exports, began configuring exporters
  * first export largely done, second export mostly done
  * Some cleanups, finished fourth animal exporter
  * Changes after more testing
  * Fix tests
  * Remove remaining flat_complete_* methods
  * Cleanup
  * Update sql_display to accept dict
  * Add TODO to comment to easily find it
  * Rename exporters to match the flat file exporter class names
* Add safeguard for qs accessing and fix groupby side effects (#982)
  * two updates
  * move to correct place
* exports-v2 - Updates from image review (#1058)
  * remove duplicate rename calls
  * add ci calculation
  * update treatment period calculation
  * add back space; fix caused too many false positives in comparison
  * properly handle a 500 response from the server in a data pivot
  * fix epi logic from plot review
  * fix invitro export when category id is null; remove category id from export
  * add unique rob columns
  * use same method for categories
  * refactor into a reusable method
  * remove pytest warning "Marks applied to fixtures have no effect"
  * refactor; write tests
  * add tests
  * rewrite animal export test using same pattern
  * add udf to openapi
  * remove extra whitespace (will change visuals; known change)
  ---------
  Co-authored-by: Daniel Rabstejnek <rabstejnek@gmail.com>
Update invitro exports to use ORM values instead of expensive serializers.