restructure code #1

Open
14 of 18 tasks
svteichman opened this issue Jan 24, 2025 · 0 comments

svteichman commented Jan 24, 2025

make main function fastEmuFit():

  • takes over the functionality of fastEmuTest()
  • same idea as emuFit(): it can be run just to fit the model, or to also run score tests (for all taxa or for a specified set)
  • unlike emuFit(), it must start by determining the estimand, generating a reference set if one is not given
  • score tests can be run in two ways: all serially after estimation within a single call to fastEmuFit(), or by first running estimation with fastEmuFit() and then passing that fastEmuFit output object into other calls that run specific tests in parallel
  • for each score test, a new dataset is made from the target taxon and the reference set (plus any other taxa needed to include all samples), emuFit() is called on it, and the robust score test statistic and p-value are saved in the appropriate spot
  • borrow functionality from emuFit() so this will work with a phyloseq or TreeSummarizedExperiment object
  • align returned object with that of emuFit(), additionally including reference set and set of included categories for each score test that is run
  • decide how to choose reference set when p > 2 (and add argument to main function for this!)
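
One possible reading of "any other taxa needed to include all samples" is that a sample with zero counts in both the target taxon and the reference set would otherwise drop out, so extra taxa are added until every sample has at least one nonzero count. The base R sketch below illustrates that reading with a greedy rule. The function name `taxa_for_test()` and the greedy choice are assumptions for illustration, not existing fastEmu code, and the call to emuFit() on the reduced data is left out.

```r
# Hypothetical helper (not part of fastEmu): pick the columns of the count
# matrix Y to keep for one score test.
taxa_for_test <- function(Y, target, ref_set) {
  keep <- union(target, ref_set)
  # samples with no counts in any of the kept taxa
  uncovered <- rowSums(Y[, keep, drop = FALSE]) == 0
  while (any(uncovered)) {
    candidates <- setdiff(seq_len(ncol(Y)), keep)
    if (length(candidates) == 0) break
    # for each candidate taxon, how many uncovered samples it is observed in
    gain <- colSums(Y[uncovered, candidates, drop = FALSE] > 0)
    if (max(gain) == 0) break  # remaining samples are empty in every taxon
    keep <- c(keep, candidates[which.max(gain)])
    uncovered <- rowSums(Y[, keep, drop = FALSE]) == 0
  }
  keep
}

# Toy data: sample 3 has counts only in taxon 4
Y <- rbind(c(4, 0, 0, 0),
           c(0, 2, 0, 0),
           c(0, 0, 0, 7),
           c(1, 3, 0, 0))
keep <- taxa_for_test(Y, target = 1, ref_set = 2)
Y_test <- Y[, keep, drop = FALSE]  # reduced dataset that would go to emuFit()
```

In the toy data, taxon 4 is added so that sample 3 stays in the reduced dataset.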

make helper function chooseRefSet():

  • called by fastEmuFit()
  • takes in an emuFit object fit with the smoothed median constraint and a reference set size, and returns the set of that size containing the taxa whose estimated LFDs are closest to the smoothed median LFD over all taxa
  • update this so that there is a way to choose the reference set when p > 2 (add argument for this)
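
A minimal sketch of the selection rule described above, assuming the estimated LFDs for one covariate have already been pulled out of the emuFit object as a numeric vector. The function name `choose_ref_set_sketch()` is hypothetical, and the ordinary `median()` is only a stand-in for the smoothed median. The p > 2 case is not addressed.

```r
# Hypothetical sketch (not the fastEmu implementation).
# lfd:       numeric vector of estimated LFDs, one per taxon
# ref_size:  number of taxa to put in the reference set
# center_fn: stand-in for the smoothed median; plain median() by default
choose_ref_set_sketch <- function(lfd, ref_size, center_fn = stats::median) {
  stopifnot(ref_size >= 1, ref_size <= length(lfd))
  dist_to_center <- abs(lfd - center_fn(lfd))
  # indices of the ref_size taxa closest to the center, in taxon order
  sort(order(dist_to_center)[seq_len(ref_size)])
}

lfd <- c(-2.0, -0.1, 0.0, 0.3, 1.5, 4.0)
ref_set <- choose_ref_set_sketch(lfd, ref_size = 3)
```

Passing the center as a function argument would make it easy to swap in the actual smoothed median later.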

make helper functions chooseRefSetSS() and chooseRefSetThin():

  • called by fastEmuFit() with option ss or thin
  • take in X, Y, the reference set size, and the proportion of the Y data to go into the reference selection set; return the reference set as well as X and Y split into a reference selection set and an inference set (X, Y_ref, Y_inf for thinning; X_ref, Y_ref, X_inf, Y_inf for sample splitting)
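
The two kinds of split could look roughly like the base R sketch below. It covers only the data split, not the reference set selection that would follow. The function names are hypothetical, and binomial thinning of the counts is an assumption about what "thinning" means here.

```r
# Hypothetical sketches (not the fastEmu implementation).

# Sample splitting: whole samples (rows) go to either the reference
# selection set or the inference set.
split_samples_sketch <- function(X, Y, prop_ref) {
  n <- nrow(Y)
  n_ref <- round(prop_ref * n)
  stopifnot(n_ref >= 1, n_ref < n)
  ref_rows <- sort(sample.int(n, size = n_ref))
  list(X_ref = X[ref_rows, , drop = FALSE], Y_ref = Y[ref_rows, , drop = FALSE],
       X_inf = X[-ref_rows, , drop = FALSE], Y_inf = Y[-ref_rows, , drop = FALSE])
}

# Thinning (assumed binomial): every count is divided between Y_ref and
# Y_inf, so all samples appear in both sets and X is shared.
thin_counts_sketch <- function(X, Y, prop_ref) {
  Y_ref <- matrix(stats::rbinom(length(Y), size = Y, prob = prop_ref),
                  nrow = nrow(Y), dimnames = dimnames(Y))
  list(X = X, Y_ref = Y_ref, Y_inf = Y - Y_ref)
}

set.seed(1)
X <- cbind(1, rep(0:1, each = 4))
Y <- matrix(rpois(8 * 5, lambda = 10), nrow = 8)
ss <- split_samples_sketch(X, Y, prop_ref = 0.25)
th <- thin_counts_sketch(X, Y, prop_ref = 0.25)
```

The returned list elements match the outputs named in the task above.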

deprecate fastEmuTest():

  • keep this function runnable (for anyone with an older analysis), but have it emit a note that the maintained function is fastEmuFit()
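
One common way to do this in R is base R's `.Deprecated()`, which signals a warning of class `deprecatedWarning` and lets the function carry on. A minimal sketch, with `fastEmuTest_sketch()` standing in for the real function and the message wording made up for illustration:

```r
# Hypothetical stand-in for fastEmuTest(); the real body would stay as it is.
fastEmuTest_sketch <- function(...) {
  .Deprecated(
    new = "fastEmuFit",
    msg = "fastEmuTest() is no longer maintained; please use fastEmuFit() instead."
  )
  # ... existing fastEmuTest() code would continue here unchanged ...
  invisible(NULL)
}

# The call still runs, and the warning can be caught for inspection
w <- tryCatch(fastEmuTest_sketch(), warning = function(w) w)
```

Since this raises a warning rather than a message, tests for the old function would need to expect it.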

add testing for new functions:

  • copy all tests from fastEmuTest() over to fastEmuFit() (updating as appropriate), and add any extra ones needed to cover the other functionality
  • update fastEmuTest() tests to deal with deprecation warning

align error messages:

  • figure out how to align warnings/messages (including the verbose option) between fastEmuFit() and emuFit()

add dependency testing:

  • find some way to automatically run GitHub Actions for fastEmu when changes are made to radEmu
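
One option is sketched below as a workflow file in the fastEmu repository, using the `repository_dispatch` trigger with a scheduled run as a fallback. The file name and the event type `radEmu-updated` are placeholders. It also assumes that radEmu's own workflow would send the dispatch event (through the GitHub REST API, with a token that has access to fastEmu), and that the development version of radEmu is installed through fastEmu's declared dependencies.

```yaml
# .github/workflows/check-against-radEmu.yaml (hypothetical file name)
name: check-against-radEmu

on:
  # Runs when another repository sends a repository_dispatch event of this
  # type; "radEmu-updated" is a made-up placeholder.
  repository_dispatch:
    types: [radEmu-updated]
  # Fallback: weekly run, so radEmu changes are picked up without a dispatch.
  schedule:
    - cron: "0 6 * * 1"
  # Allows manual runs from the Actions tab.
  workflow_dispatch:

jobs:
  R-CMD-check:
    runs-on: ubuntu-latest
    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-r-dependencies@v2
        with:
          extra-packages: any::rcmdcheck
          needs: check
      - uses: r-lib/actions/check-r-package@v2
```

The scheduled trigger alone needs no changes on the radEmu side, at the cost of a delay between a radEmu change and the fastEmu check.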
svteichman self-assigned this Feb 11, 2025