Skip to content
This repository was archived by the owner on Feb 6, 2024. It is now read-only.

Numerical problem with AU test #86

Closed
bqminh opened this issue Oct 18, 2018 · 2 comments
Closed

Numerical problem with AU test #86

bqminh opened this issue Oct 18, 2018 · 2 comments

Comments

@bqminh
Copy link
Member

bqminh commented Oct 18, 2018

Personal comm. with Shimodaira:

July-Sep-2018
Minh & Heiko:

Dear Shimo,

I am writing to you regarding a question about some extreme cases in the AU test. (My colleague Minh has written about this also mid of July, but seemingly you have been too busy to answer.)

The question is about the following part in your AU paper:

Screen Shot 2018-09-10 at 10.05.25 am.png

The problem is that the quantile function of the standard normal distribution

\Phi^{-1}(0) = -infinity and
\Phi^{-1}(1) = +infinity

So the question is, how do you handle the cases in CONSEL (or in your R package) if BP is either 0% or 100%?


25-Sep-2018
Shimodaira:

Here is my quick answer based on my memory. If you think further issues, I may look at the code.

In Consel and pvclust, there are two estimation methods are implemented: weighted least square (WLS) and maximum likelihood estimator (MLE). The above formula is for WLS. WLS is used for computing the initial value for the iterative computation of MLE.

For WLE, the zero or one bp values are simply ignored.

For MLE, the zero and one values are properly included in the estimation (fully used). However, if all the scales has zero or one values, mle will not make sense, so I simply set AU = 0 or 1.

Probably, there are fine tuning in the code so that there is a threshold of the number of zero or one values to decide if AU = 0 or 1 used or try estimation.

@bqminh
Copy link
Member Author

bqminh commented Oct 18, 2018

7-Sep-2017
Ed Susko:

I have attached a plot with the comparison of the AU p-values w/ and w/o consel. Since the p-values with iqtree/consel are comparable to raxml/consel it seems pretty clear that the issue is with the AU calculation w/in iqtree.

I can only guess about what might be going wrong. There are some trees that have very large p-values under AU but that are small even under SH. I would guess that the BPs that AU uses for these trees can be very small or even 0. AU uses these BPs through Phi**(-1)(1-BP). If BP is 0, Phi**(-1)(1-BP) is infinite, so maybe some adjustment for BP=0 is what causes consel results to differ.

The iqtree version of the AU test is much faster than iqtree/consel, so if you are able to make progress on this, I would be happy to hear about it.


A student and I have been looking at procedures for calculating confidence sets of trees. In the course of doing so we seem to have come across a potential problem with the calculation of AU p-values in IQTREE.

I have attached a plot for an example data set comparing p-values:

AU RAXML/CONSEL vs AU IQTREE - many more large pvalues for IQTREE
SH RAXML/CONSEL vs SH IQTREE - comparable
SH IQTREE vs AU IQTREE - many more large pvalues for AU IQTREE

As you may know, SH should be a conservative test so the fact that AU IQTREE gives many more large pvalues than SH and gives many more large pvalues than AU RAXML/CONSEL suggests that it is the AU IQTREE values that are problematic rather than the RAXML/CONSEL ones

I have attached the data file and tree file that I used. To obtain the p-values for RAXML/CONSEL I used

$ raxmlHPC -s tmp.infile -n out -m GTRGAMMA -fg -T2 -z eight-taxon.trees > tmp.out
$ makermt --puzzle RAxML_perSiteLLs.out > tmp.out
$ consel RAxML_perSiteLLs > tmp.out
$ catpv RAxML_perSiteLLs.pv > tmp.out

To obtain the p-values for IQTREE, I used

$ iqtree-omp -s tmp.infile -z eight-taxon.trees nt 1 -m GTR+G -n 0 -zb 10000 -au -wsl

Note that this is a large example. We were comparing all 10395 eight-taxon trees. But it does seem that there is a discrepancy between the way that AU RAXML/CONSEL is calculating p-values and the way that IQTREE does. Further, in this case, AU RAXML/CONSEL seems more likely correct. So I thought you might want to know.

Drop me a note if you make progress. The student will continue his work on the topic, so if a fix is available, it would be valuable to know about.

@bqminh
Copy link
Member Author

bqminh commented Oct 28, 2018

After a careful investigation, the conclusion is that this problem is because the weighted least square approach to estimate AU p-values is NOT accurate when bp-RELL are zero or close to zero. So IQ-Tree now switches to the maximum likelihood estimates of AU p-values. This is included in the version v1.6.8.

@bqminh bqminh closed this as completed Oct 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant