ValueError: The number of observations must be larger than the number of variables. #68
Hi ryotag, thank you for reporting this. I'll look into it. Cheers,
Thank you for your message.
However, when I tried exactly the same thing as in my last comment (using the same command and the same file), I received a different error.
Interestingly, when I ran NanoPlot with a
It is totally fine because I can work with the original file (without downsampling) and the quality-filtered file, but I'm very confused by those results...
Interesting, and confusing. Thanks for the detailed problem description!
I think I figured it out; I will try to provide a solution on Monday.
I've just pushed an update to the nanoplotter submodule (v0.39.1). I believe updating should solve your issue, but can you please confirm?
I upgraded NanoPlot with a command
Thanks for the feedback!
Hi @wdecoster! Hope you are well :) Thank you for this great tool. I have added it to the nf-core/nanoseq and nf-core/viralrecon pipelines, and it is very useful. I am getting the same error as described in the title of this issue with the latest Biocontainer (v1.32.1). I also tried running the same command via a local Conda install but got the same error. I was tempted to reopen this issue, but I will leave that to your discretion in case I am missing something obvious. I installed the Conda environment below:
name: nanoplot-1.32.1
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- conda-forge::seaborn=0.10.1
- bioconda::nanoplot=1.32.1
I have attached the raw fastq file here, and the command I used was:
NanoPlot --fastq barcode87.fastq.gz
I noticed you have an open issue on Bioconda to update to the latest version, so apologies if this is already fixed. Please let me know if you need anything else from me. Thanks in advance!
The full log:
2021-03-03 22:43:31,141 NanoPlot 1.32.1 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', colormap='Greens', cram=None, downsample=None, dpi=100, drop_outliers=False, fasta=None, fastq=['barcode87.fastq.gz'], fastq_minimal=None, fastq_rich=None, feather=None, font_scale=1, format='png', hide_stats=False, huge=False, listcolormaps=False, listcolors=False, loglength=False, maxlength=None, minlength=None, minqual=None, no_N50=False, no_supplementary=False, outdir='.', path='./', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', runtime_until=None, store=False, summary=None, threads=4, title=None, tsv_stats=False, ubam=None, verbose=False)
2021-03-03 22:43:31,141 Python version is: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27) [GCC 9.3.0]
2021-03-03 22:43:31,141 NanoPlot: valid output format png
2021-03-03 22:43:31,149 Nanoget: Starting to collect statistics from plain fastq file.
2021-03-03 22:43:31,151 Nanoget: Decompressing gzipped fastq barcode87.fastq.gz
2021-03-03 22:43:31,447 Reduced DataFrame memory usage from 4.57763671875e-05Mb to 2.6702880859375e-05Mb
2021-03-03 22:43:31,455 Nanoget: Gathered all metrics of 2 reads
2021-03-03 22:43:31,466 Calculated statistics
2021-03-03 22:43:31,467 Using sequenced read lengths for plotting.
2021-03-03 22:43:31,468 NanoPlot: Valid color #4CB391.
2021-03-03 22:43:31,475 NanoPlot: Valid colormap Greens.
2021-03-03 22:43:31,476 NanoPlot: Creating length plots for Read length.
2021-03-03 22:43:31,476 NanoPlot: Using 2 reads maximum of 609bp.
2021-03-03 22:43:37,617 Nanoplotter: orca not found, not creating static image of html. See https://github.com/plotly/orca
2021-03-03 22:43:37,617 Image generation requires the psutil package.
Install using pip:
$ pip install psutil
Install using conda:
$ conda install psutil
Traceback (most recent call last):
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplotter/plot.py", line 60, in save_static
pio.write_image(self.fig, self.path.replace('html', 'png'))
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 245, in write_image
img_data = to_image(
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 103, in to_image
return to_image_orca(
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_orca.py", line 1535, in to_image
ensure_server()
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_orca.py", line 1361, in ensure_server
raise ValueError(
ValueError: Image generation requires the psutil package.
Install using pip:
$ pip install psutil
Install using conda:
$ conda install psutil
2021-03-03 22:43:37,914 Created length plots
2021-03-03 22:43:37,915 NanoPlot: Creating Read lengths vs Average read quality plots using statistics from 2 reads.
2021-03-03 22:43:38,494 The number of observations must be larger than the number of variables.
Traceback (most recent call last):
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 101, in main
plots = make_plots(datadf, settings)
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 160, in make_plots
nanoplotter.scatter(
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplotter/nanoplotter_main.py", line 193, in scatter
plot = sns.jointplot(
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/axisgrid.py", line 2313, in jointplot
grid.plot_joint(kdeplot, **joint_kws)
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/axisgrid.py", line 1777, in plot_joint
func(self.x, self.y, **kwargs)
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/distributions.py", line 696, in kdeplot
ax = _bivariate_kdeplot(x, y, shade, shade_lowest,
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/distributions.py", line 402, in _bivariate_kdeplot
xx, yy, z = _statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/distributions.py", line 474, in _statsmodels_bivariate_kde
kde = smnp.KDEMultivariate([x, y], "cc", bw)
File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/statsmodels/nonparametric/kernel_density.py", line 108, in __init__
raise ValueError("The number of observations must be larger " \
ValueError: The number of observations must be larger than the number of variables.
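The first traceback in this log is not what stops the run: NanoPlot logs it and carries on to "Created length plots". It appears to come from plotly's static-image export, which here went through `to_image_orca` (kaleido seemingly not installed) and then failed because orca's server needs psutil. A minimal sketch, assuming you want to detect these missing optional pieces before a run rather than from the log; the helper names `missing_modules` and `static_export_backend` are hypothetical, and treating "kaleido, or orca plus psutil" as the requirement is an assumption read off this traceback, not a documented NanoPlot check.

```python
import importlib.util
import logging
import shutil
from typing import Iterable, List, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("static_export_check")  # hypothetical logger name


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def static_export_backend() -> Optional[str]:
    """Guess which backend plotly could use for static images.

    Assumption (from the traceback above): kaleido is enough on its own;
    otherwise the orca route needs the external `orca` executable plus psutil.
    Returns "kaleido", "orca", or None when neither looks usable.
    """
    if not missing_modules(["kaleido"]):
        return "kaleido"
    if shutil.which("orca") is not None and not missing_modules(["psutil"]):
        return "orca"
    logger.warning(
        "No static image backend found: install kaleido, or orca together with psutil."
    )
    return None


if __name__ == "__main__":
    print("static export backend:", static_export_backend())
```

If the log is read correctly, the nanoplot-1.32.1 environment above would take the warning branch, since it reports both "orca not found" and the missing psutil package.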
Hi, thank you for the comprehensive bug report! I expect this error is caused by plotting just two reads. But I agree it shouldn't crash; it should warn you instead. And yes, updating Bioconda appears to be a problem :-( but PyPI has the latest version.
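The traceback supports this: the length-versus-quality KDE ends in statsmodels' `KDEMultivariate`, which raises this ValueError unless the number of observations is strictly greater than the number of variables. Here there are two variables (read length and average read quality) and only two reads, so the check fails. A minimal sketch of the "warn instead of crash" behaviour, assuming the requested plot kinds arrive as a list like `plots=['kde', 'dot']` in the Namespace above; `usable_plot_kinds` and `can_fit_bivariate_kde` are hypothetical helpers, not the fix that was actually released.

```python
import logging
from typing import List, Sequence

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("nanoplot_kde_guard")  # hypothetical logger name

# Bivariate KDE of read length vs. average read quality: two variables.
N_VARIABLES = 2


def can_fit_bivariate_kde(n_reads: int, n_variables: int = N_VARIABLES) -> bool:
    """Mirror the precondition in the traceback: observations must outnumber variables."""
    return n_reads > n_variables


def usable_plot_kinds(n_reads: int, requested: Sequence[str]) -> List[str]:
    """Drop 'kde' (with a warning) when there are too few reads to fit it."""
    kinds = list(requested)
    if "kde" in kinds and not can_fit_bivariate_kde(n_reads):
        logger.warning(
            "Only %d reads available; skipping the 'kde' plot, which needs more than %d.",
            n_reads,
            N_VARIABLES,
        )
        kinds.remove("kde")
    return kinds


if __name__ == "__main__":
    # Two reads, as in the barcode87.fastq.gz log above: prints ['dot']
    print(usable_plot_kinds(2, ["kde", "dot"]))
    # Three reads satisfy the precondition: prints ['kde', 'dot']
    print(usable_plot_kinds(3, ["kde", "dot"]))
```

Filtering the requested kinds, rather than substituting a fallback, avoids drawing the dot plot twice when both kinds were requested.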
Great! Thanks for the quick fix. A warning sounds great. I only observed this issue when I set
I will look out for the new release on Bioconda, as we are only pulling Biocontainers for the entire pipeline! Thanks again.
Hi,
I tried to run NanoPlot using the command below:
NanoPlot --fastq MD0003_01A00_N1_20180215_GA10000_FAH47873.fastq --loglength --downsample 1000
and encountered the error shown in the following log file.
Before running NanoPlot, I upgraded scipy using the command
python3 -m pip install scipy -U
and got the following. Thank you,