HicMatrix generated cool file not supported - possible solution #66
Comments
Hi Vittorio, where ::/resolutions/10000 is the resolution you want to use
Hi @Mestizia, thanks for reporting this issue, and for your thorough investigation! To answer your question: we do rely on the bin-size information to convert between base pairs and number of diagonals, so chromosight is not expected to work with cool files that have a variable bin size. We should either:
If you feel like it, you are very welcome to propose a pull request for this and we will happily review it! Otherwise, we can handle this when time allows.
Thanks for the prompt reply to both of you.
Since I can determine the bin size from my previous work, my temporary solution would be to set the bin size to that specific value. From what I can tell, the bin size is used in one of two ways. Either it is a denominator, and the resulting value ends up small enough that the tool automatically resorts to its default of 1, or it is a numerator when it comes to the "scanning distance", and the number ends up being large. Computationally, this is not a problem on my end and I don't need heuristics to speed it up (a smaller scanning distance, if I am interpreting it correctly).
Do you think this temporary solution would introduce artifacts in the data? Once this is clarified, I am up for making a pull request (with slightly cleaner solutions).
Generally, I would commend you and the team for the clean code. It was quite easy to navigate and debug on my end! I could also see this working better if cooler itself were able to tell when a variable bin size cool file is set to a fixed bin size. I briefly looked in their repository but I couldn't find an immediate solution.
Cheers,
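To make the denominator case described above concrete, here is a minimal sketch of a base-pair to diagonal conversion. The function name `bp_to_diagonals`, the floor of 1, and the example numbers are illustrative assumptions; this is not chromosight's actual code.

```python
def bp_to_diagonals(distance_bp, binsize_bp, minimum=1):
    """Convert a genomic distance in base pairs into a number of matrix diagonals.

    Hypothetical helper for illustration only; not part of chromosight.
    """
    if binsize_bp is None or binsize_bp <= 0:
        raise ValueError("a fixed, positive bin size is required")
    # Integer division: how many whole bins fit in the distance,
    # never going below the configured floor.
    return max(distance_bp // binsize_bp, minimum)


# A very large bin size in the denominator collapses the result to the floor of 1.
print(bp_to_diagonals(20_000, 1_000_000))  # 1
# A small bin size yields a large number of diagonals to scan.
print(bp_to_diagonals(2_000_000, 1_000))   # 2000
```

The sketch only shows why a wrong bin size silently changes the scanned distance; whether that introduces artifacts depends on how the real tool uses the value.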
If you use cooler zoomify on a file with variable bin size, the resulting bin sizes are still variable; it only pools them by a factor of 2. What we mean by "fixed" is that each bin represents a segment of the same length on the chromosome. If your cool file has a variable bin size, e.g. 1 bin = 1 restriction fragment, then even if you zoomify it (1 bin = 2, 4, 8 ... restriction fragments), there would still not be a constant mapping between number of bins and number of base pairs. Thus many of our assumptions would fail and the calls would be unreliable. This is why we do not support variable bin sizes. You can of course use this hack if you know your matrix has a fixed resolution, but the clean solution is to build your matrix on a fixed bin size from the start, and this should be reflected in the metadata of the cool file.
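As a rough illustration of what "fixed" means here, the sketch below checks a bin table for a single common width, allowing the last bin of each chromosome to be shorter. The function name `common_binsize` and the toy bin tables are assumptions made for this example. With cooler, the bin table can be read via `cooler.Cooler(uri).bins()[:]`, and `cooler.Cooler(uri).binsize` is `None` when the file does not declare a uniform bin size.

```python
from collections import defaultdict


def common_binsize(bins):
    """Return the shared bin width if the bins are fixed-size, else None.

    `bins` is an iterable of (chrom, start, end) tuples, ordered by position
    within each chromosome. The last bin of a chromosome may be shorter,
    because chromosome lengths are rarely an exact multiple of the bin size.
    Hypothetical helper for illustration; not part of chromosight or cooler.
    """
    widths_by_chrom = defaultdict(list)
    for chrom, start, end in bins:
        widths_by_chrom[chrom].append(end - start)

    inner = set()  # widths of all bins except the last one per chromosome
    last = []      # width of the last bin of each chromosome
    for widths in widths_by_chrom.values():
        inner.update(widths[:-1])
        last.append(widths[-1])

    # Exactly one inner width is required to call the binning fixed.
    if len(inner) != 1:
        return None
    size = next(iter(inner))
    return size if all(w <= size for w in last) else None


fixed = [("chr1", 0, 10000), ("chr1", 10000, 20000), ("chr1", 20000, 23500),
         ("chr2", 0, 10000), ("chr2", 10000, 12000)]
variable = [("chr1", 0, 4200), ("chr1", 4200, 15100), ("chr1", 15100, 16000)]

print(common_binsize(fixed))     # 10000
print(common_binsize(variable))  # None
```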
@cmdoret Thank you for the clarification. I can look at the distribution of my bin lengths to verify that the spread is not significant. I suggest formally including a step in chromosight that checks for variable bin size and explains the reasoning. With that being said, you can close this issue. Many thanks to both of you. On the side, I will also try running chromosight with a fixed bin size matrix, and I will write a report of how similar or different the results end up being in the context of the genome I am working with.
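One possible way to look at the spread of bin lengths is sketched below using only the standard library. The function name `binsize_spread` and the example widths are made up for illustration, and what counts as a "significant" spread is left to the reader.

```python
import statistics


def binsize_spread(widths):
    """Summarise the distribution of bin widths (in base pairs).

    Returns the minimum, median, maximum and coefficient of variation
    (population standard deviation divided by the mean).
    Hypothetical helper for illustration; not part of chromosight.
    """
    widths = list(widths)
    mean = statistics.fmean(widths)
    return {
        "min": min(widths),
        "median": statistics.median(widths),
        "max": max(widths),
        "cv": statistics.pstdev(widths) / mean,
    }


# Made-up widths that deviate only slightly from 10 kb.
summary = binsize_spread([9800, 10000, 10200, 10000])
print(summary)
```

A coefficient of variation close to zero indicates nearly uniform bins, but it does not make the base-pair to bin mapping exact.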
Hi,
I am trying to compare the results I got from hicExplorer TAD prediction to chromosight. However, I am having trouble using my matrix. I exported the matrix to .cool as described in the instructions, but I get the error:
I downloaded the example.cool file and ran it in parallel for debugging.
I did some debugging on my end and got as far as the keep_distance function within contact_map.py.
Here, self.max_dist is None. Therefore,
mat_max_dist = self.matrix.shape[0]+self.largest_kernel
(10607 + 3). This value gets passed around (it goes in as max_dist in preprocess.detrend), but at no point is self.max_dist set to a different value. In the end, the error message arises because matrix.max_dist is never changed from None and gets fed to numpy.tril, which throws the error.

I checked my cool file with cooler info and the main difference compared to example.cool seems to be:
versus:
for the example.cool file. I am not 100% sure that this is the origin of the problem, but it is my best guess atm.
In order to make the tool work, I changed
self.max_dist = self.keep_distance
in contacts_map.detrend, and it seems to work now. However, I am not sure what effect this change has on the overall results, as I am not familiar with the role of max_dist in the tool. Do you think this workaround is likely to affect the final results in a major way?

Cheers,
Vittorio
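The failure mode described in this post, a `None` distance reaching a function that expects an integer diagonal offset, can be guarded against with a small fallback. The sketch below is only an illustration of that idea: the function name `resolve_max_dist` is hypothetical, and the fallback value simply reuses the matrix size plus largest kernel size quoted above. It is not the fix adopted by chromosight.

```python
def resolve_max_dist(max_dist, n_bins, largest_kernel):
    """Return a usable integer scan distance, in bins.

    Hypothetical helper: when no maximum distance was configured, fall back
    to the matrix size plus the largest kernel size, so that None never
    reaches a function expecting an integer diagonal offset.
    """
    if max_dist is None:
        return n_bins + largest_kernel
    return int(max_dist)


# Values taken from the post: a 10607-bin matrix and a largest kernel of 3.
print(resolve_max_dist(None, 10607, 3))  # 10610
# An explicitly configured distance is kept as is.
print(resolve_max_dist(500, 10607, 3))   # 500
```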