ValueError: The number of observations must be larger than the number of variables. #68

Closed
ryotag opened this issue May 31, 2018 · 10 comments

ryotag commented May 31, 2018

Hi,

I tried to run NanoPlot using the command below,

NanoPlot --fastq MD0003_01A00_N1_20180215_GA10000_FAH47873.fastq --loglength --downsample 1000

and I got the error shown in the following log file.

2018-05-30 18:25:30,995 NanoPlot 1.13.0 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=1000, drop_outliers=False, fasta=None, fastq=['MD0003_01A00_N1_20180215_GA10000_FAH47873.fastq'], fastq_minimal=None, fastq_rich=None, format='png', listcolors=False, loglength=True, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='.', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', store=False, summary=None, threads=4, title=None, verbose=False)
2018-05-30 18:25:30,995 Python version is: 3.5.2 (default, Nov 23 2017, 16:37:01)  [GCC 5.4.0 20160609]
2018-05-30 18:25:30,995 Nanoplotter: valid output format png
2018-05-30 18:25:31,002 Nanoget: Starting to collect statistics from plain fastq file.
2018-05-30 19:08:01,167 Nanoget: Gathered all metrics of 1303066 reads
2018-05-30 19:08:01,844 Calculated statistics
2018-05-30 19:08:01,844 Using sequenced read lengths for plotting.
2018-05-30 19:08:01,877 Using Log10 scaled read lengths.
2018-05-30 19:08:01,877 Downsampling the dataset from 1303066 to 1000 reads
2018-05-30 19:08:01,908 Processed the reads, optionally filtered. 1000 reads left
2018-05-30 19:08:01,918 Calculated statistics
2018-05-30 19:08:01,918 Nanoplotter: Valid color #4CB391.
2018-05-30 19:08:01,918 Nanoplotter: Creating length plots for Read length.
2018-05-30 19:08:01,918 Nanoplotter: Using 1000 reads maximum of 38857bp.
2018-05-30 19:08:03,205 Created length plots
2018-05-30 19:08:03,206 Nanoplotter: Creating Read lengths vs Average read quality plots using statistics from 1000 reads.
2018-05-30 19:08:03,699 The number of observations must be larger than the number of variables.
Traceback (most recent call last):
  File "/home/matsudalab/.local/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 87, in main
    plots = make_plots(datadf, settings)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 382, in make_plots
    title=settings["title"])
  File "/home/matsudalab/.local/lib/python3.5/site-packages/nanoplotter/nanoplotter.py", line 206, in scatter
    size=10)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/seaborn/axisgrid.py", line 2271, in jointplot
    grid.plot_joint(kdeplot, **joint_kws)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/seaborn/axisgrid.py", line 1755, in plot_joint
    func(self.x, self.y, **kwargs)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/seaborn/distributions.py", line 653, in kdeplot
    cbar, cbar_ax, cbar_kws, ax, **kwargs)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/seaborn/distributions.py", line 383, in _bivariate_kdeplot
    xx, yy, z = _statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/seaborn/distributions.py", line 433, in _statsmodels_bivariate_kde
    kde = smnp.KDEMultivariate([x, y], "cc", bw)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/statsmodels/nonparametric/kernel_density.py", line 111, in __init__
    raise ValueError("The number of observations must be larger " \
ValueError: The number of observations must be larger than the number of variables.

Before running NanoPlot, I upgraded scipy using the command python3 -m pip install scipy -U and got the following.

Installing collected packages: numpy, scipy
Successfully installed numpy-1.14.3 scipy-1.1.0

Thank you,

@wdecoster wdecoster self-assigned this May 31, 2018
@wdecoster
Owner

Hi ryotag,

Thank you for reporting this. I'll look into it.
Could you check if removing --downsample 1000 reproduces the same error?

Cheers,
Wouter

@wdecoster wdecoster added the bug label May 31, 2018
@ryotag
Author

ryotag commented Jun 1, 2018

Thank you for your message.
I tried it without --downsample 1000, and NanoPlot finished successfully.

2018-05-31 18:35:09,060 NanoPlot 1.13.0 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=None, drop_outliers=False, fasta=None, fastq=['MD0003_01A00_N1_20180215_GA10000_FAH47873.fastq.gz'], fastq_minimal=None, fastq_rich=None, format='png', listcolors=False, loglength=True, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='.', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', store=False, summary=None, threads=4, title=None, verbose=False)
2018-05-31 18:35:09,060 Python version is: 3.5.2 (default, Nov 23 2017, 16:37:01)  [GCC 5.4.0 20160609]
2018-05-31 18:35:09,061 Nanoplotter: valid output format png
2018-05-31 18:35:09,068 Nanoget: Starting to collect statistics from plain fastq file.
2018-05-31 18:35:09,068 Nanoget: Decompressing gzipped fastq MD0003_01A00_N1_20180215_GA10000_FAH47873.fastq.gz
2018-05-31 19:20:31,142 Nanoget: Gathered all metrics of 1303066 reads
2018-05-31 19:20:31,823 Calculated statistics
2018-05-31 19:20:31,823 Using sequenced read lengths for plotting.
2018-05-31 19:20:31,855 Using Log10 scaled read lengths.
2018-05-31 19:20:31,855 Processed the reads, optionally filtered. 1303066 reads left
2018-05-31 19:20:31,855 Nanoplotter: Valid color #4CB391.
2018-05-31 19:20:31,855 Nanoplotter: Creating length plots for Read length.
2018-05-31 19:20:31,860 Nanoplotter: Using 1303066 reads maximum of 349163bp.
2018-05-31 19:20:43,467 Created length plots
2018-05-31 19:20:43,467 Nanoplotter: Creating Read lengths vs Average read quality plots using statistics from 1303066 reads.
2018-05-31 19:20:52,969 Created LengthvsQual plot
2018-05-31 19:20:52,969 Writing html report.
2018-05-31 19:21:03,245 Finished!

However, when I tried exactly the same thing as in my last comment (using the same command and the same file), I received a different error.

2018-05-31 20:47:20,816 NanoPlot 1.13.0 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=1000, drop_outliers=False, fasta=None, fastq=['MD0003_01A00_N1_20180215_GA10000_FAH47873.fastq'], fastq_minimal=None, fastq_rich=None, format='png', listcolors=False, loglength=True, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='.', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', store=False, summary=None, threads=4, title=None, verbose=False)
2018-05-31 20:47:20,816 Python version is: 3.5.2 (default, Nov 23 2017, 16:37:01)  [GCC 5.4.0 20160609]
2018-05-31 20:47:20,817 Nanoplotter: valid output format png
2018-05-31 20:47:20,846 Nanoget: Starting to collect statistics from plain fastq file.
2018-05-31 21:30:14,593 Nanoget: Gathered all metrics of 1303066 reads
2018-05-31 21:30:15,239 Calculated statistics
2018-05-31 21:30:15,239 Using sequenced read lengths for plotting.
2018-05-31 21:30:15,272 Using Log10 scaled read lengths.
2018-05-31 21:30:15,272 Downsampling the dataset from 1303066 to 1000 reads
2018-05-31 21:30:15,308 Processed the reads, optionally filtered. 1000 reads left
2018-05-31 21:30:15,319 Calculated statistics
2018-05-31 21:30:15,319 Nanoplotter: Valid color #4CB391.
2018-05-31 21:30:15,319 Nanoplotter: Creating length plots for Read length.
2018-05-31 21:30:15,319 Nanoplotter: Using 1000 reads maximum of 97993bp.
2018-05-31 21:30:17,444 Created length plots
2018-05-31 21:30:17,444 Nanoplotter: Creating Read lengths vs Average read quality plots using statistics from 1000 reads.
2018-05-31 21:30:17,943 'None of [[410 194 790 426  95 930  63 594 383 138  23 750 180 474  33 557 885 822\n 606 523 127 350 834 480 234  28 865 229  17 999 583 504 895 288 143 114\n 893 492 598 667 214 622 360 591 483 666 245 906 222 497 835 243 962 307\n 505 449  36 715 550 776 377  13 561 498 626 565 368 736 182  80 212 708\n 241 265 255 389 226 153 479 253 831 117 200 941 126 386 977  47 988 867\n 447 920 897 499 385 191 600 643 503 734 527 269 624 134 352 367 660 209\n 570 580 342 638 862 729 818 121 205  61 992 537 892  32 177  96 703 967\n 100 546 502 628 547 482 599 296 244  73 975 247  11 163 933 193  38 506\n 880 336 646 395 526 142 945 309 344 477 856 672 727 429  34 273 730 617\n 133 850 158 652 101 295 321 407 438 398 218 116 346 614 373 783 326 645\n 974 148 874 658 806 445 371 625 125 942  67 707 876 595 162 851 333 484\n  68 230 528 223 830 675 357 405 564 662 889 873 500 105 972 699 724  16\n 712 965  91 119  45 588 511 970 680 418 872 918 518 749 719 789 826 467\n 276 905 739 541 516 693 875 304  55  52 417 412 224 391 349 310 439 829\n 509 531 328 204 935 878 495 676  64  30 137 613 293 263 746 901 744 124\n  72 786 904 110 248 949 236 877 791 315 940 555 286 916 166 820  77 679\n 345 147 468 860 381 159 384 848 387 704 775 369 512 677 208 632 934 618\n 802 176 706 466 612 952 489 548 808 308 590 890 723 951 674 478 168 771\n 318 198 278 757 154 332 508 589 213 692 810 616 943 849 277 858  62 390\n 987 682 630 839 796  29 883  21 701 899 414 254 836 493 800 106 656 264\n 686 814 758 165 995 644 540 828 787 415 846 748 563 167 737 409 559 957\n 803   4 330 341 990 250 726 640 760 297 710 348 361 160 870 759 408 844\n 476 788 325  35 909 663   2 879 639 109 801 556 130 717 781 841 991 359\n 798 473 832 542 141 281 567 206 362 201 714 272 864 399 823 465 442 602\n 596  14 123 475 238 683 894 146 322 413 190 985 181 552 960 132  83 164\n 195 313  86 964 370 434  39 980 145 303 969 631 379 572 896 462 246 755\n 635 422 320 579 732 799 259 869 735 446 
420 586 156 185 782 285 891 558\n 767 837 838 425  98 702 173 959   3 544 840 539  88 577 258 549 968 711\n 553 299 532 122 514 128 491 457 764 721 257 903 298 375 522 261 713 973\n 578  66  93 705 331 924 139  69 785 463  19 592 507 661 108 536 813 795\n  99 738 649 560 144 324 179   5 866 981 275 922 487 929 543 459 455 311\n 335 337 282 107 984 944 510 436 857 871 270 793 641  71 219 233 262 915\n 842 317  84  85 956 948 225 697 978  92  31  89 456 745 573 647 215 926\n  49 584 152 424 183 186 431  97 237 809 301 113 440 779 256  57 151 538\n  65 287 210 472 728 520 925 396 111  26 312 654 868 551 513 524 709  60\n 914 611 187 566 609 490 184 769  43 363 900 770 366 525 620 659 103 938\n 136 300 339 211 316 766 716 664  70 535 338 921 946 104 249 305  79 907\n 665 157 742 898 847 402 530 794 403 852 812 694  75  94 695 917 887 673\n  51 725 642 197 845 421 953 576 569  50 271 411 188 774 235 797 404 912\n 604  22 931 481 825 700 207 884 327 433  76 279 650 824 443 574 488 372\n  48 976 998  40 756 228 947 989 670 819 923 192 765 534 807 993  12 688\n 979 633 698 908 378 687 461  81 178 242 280 861 722 668 292 419 202 356\n 718 753 441 294 636 521 131 533 655 954 927 340 651 582 804 400 376 251\n 743 290  82 485 364 231 833 777 353 252 768  10 172 216 997 392 203   8\n 886  25 761 220 268  78 118 597 354 450 380 454 189 129 696 494  74  27\n 428 607 971 966  53  58 690 314 423 936 629 653 681 881 169  24 821 603\n 684 284 691 471 882 621 453 581 430 939 140 982 608 365 827 239 452 752\n 115  15 910 416 545 470 266 217 911 575 515  59 460 859 671 135 751 486\n   0 747 406 888  20 637 855 432 983  37 937 343 358 805 517 283 554 464\n 469 585 928   7 102 772 571 444 605 733 720 657 961 634 496 260 394 627\n 171 435 623 329 501 754 351 950 741 610 593 388 740  56  42 587 347 784\n  18  44 854 762 274 780 397 863 451 843 196 149 562  54 731 619 199 401\n  46 319 811 427 685 816 669 150 815 963 955 221 601   9  41 382 568 986\n 240 615  90 170 306 289 919 529 291 958 
393 763 458 902 853 267 175 112\n 448 689 773 334 227 232 519 996 174 437 932 155 994 120 817 778 323 374\n 161 302 648 678   1  87   6 355 913 792]] are in the [index]'
Traceback (most recent call last):
  File "/home/matsudalab/.local/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 87, in main
    plots = make_plots(datadf, settings)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/nanoplot/NanoPlot.py", line 382, in make_plots
    title=settings["title"])
  File "/home/matsudalab/.local/lib/python3.5/site-packages/nanoplotter/nanoplotter.py", line 196, in scatter
    x=x[idx],
  File "/home/matsudalab/.local/lib/python3.5/site-packages/pandas/core/series.py", line 809, in __getitem__
    return self._get_with(key)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/pandas/core/series.py", line 841, in _get_with
    return self.loc[key]
  File "/home/matsudalab/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1901, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1143, in _getitem_iterable
    self._validate_read_indexer(key, indexer, axis)
  File "/home/matsudalab/.local/lib/python3.5/site-packages/pandas/core/indexing.py", line 1206, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)))
KeyError: 'None of [[410 194 790 426  95 930  63 594 383 138  23 750 180 474  33 557 885 822\n 606 523 127 350 834 480 234  28 865 229  17 999 583 504 895 288 143 114\n 893 492 598 667 214 622 360 591 483 666 245 906 222 497 835 243 962 307\n 505 449  36 715 550 776 377  13 561 498 626 565 368 736 182  80 212 708\n 241 265 255 389 226 153 479 253 831 117 200 941 126 386 977  47 988 867\n 447 920 897 499 385 191 600 643 503 734 527 269 624 134 352 367 660 209\n 570 580 342 638 862 729 818 121 205  61 992 537 892  32 177  96 703 967\n 100 546 502 628 547 482 599 296 244  73 975 247  11 163 933 193  38 506\n 880 336 646 395 526 142 945 309 344 477 856 672 727 429  34 273 730 617\n 133 850 158 652 101 295 321 407 438 398 218 116 346 614 373 783 326 645\n 974 148 874 658 806 445 371 625 125 942  67 707 876 595 162 851 333 484\n  68 230 528 223 830 675 357 405 564 662 889 873 500 105 972 699 724  16\n 712 965  91 119  45 588 511 970 680 418 872 918 518 749 719 789 826 467\n 276 905 739 541 516 693 875 304  55  52 417 412 224 391 349 310 439 829\n 509 531 328 204 935 878 495 676  64  30 137 613 293 263 746 901 744 124\n  72 786 904 110 248 949 236 877 791 315 940 555 286 916 166 820  77 679\n 345 147 468 860 381 159 384 848 387 704 775 369 512 677 208 632 934 618\n 802 176 706 466 612 952 489 548 808 308 590 890 723 951 674 478 168 771\n 318 198 278 757 154 332 508 589 213 692 810 616 943 849 277 858  62 390\n 987 682 630 839 796  29 883  21 701 899 414 254 836 493 800 106 656 264\n 686 814 758 165 995 644 540 828 787 415 846 748 563 167 737 409 559 957\n 803   4 330 341 990 250 726 640 760 297 710 348 361 160 870 759 408 844\n 476 788 325  35 909 663   2 879 639 109 801 556 130 717 781 841 991 359\n 798 473 832 542 141 281 567 206 362 201 714 272 864 399 823 465 442 602\n 596  14 123 475 238 683 894 146 322 413 190 985 181 552 960 132  83 164\n 195 313  86 964 370 434  39 980 145 303 969 631 379 572 896 462 246 755\n 635 422 320 579 732 799 259 869 735 446 420 586 156 
185 782 285 891 558\n 767 837 838 425  98 702 173 959   3 544 840 539  88 577 258 549 968 711\n 553 299 532 122 514 128 491 457 764 721 257 903 298 375 522 261 713 973\n 578  66  93 705 331 924 139  69 785 463  19 592 507 661 108 536 813 795\n  99 738 649 560 144 324 179   5 866 981 275 922 487 929 543 459 455 311\n 335 337 282 107 984 944 510 436 857 871 270 793 641  71 219 233 262 915\n 842 317  84  85 956 948 225 697 978  92  31  89 456 745 573 647 215 926\n  49 584 152 424 183 186 431  97 237 809 301 113 440 779 256  57 151 538\n  65 287 210 472 728 520 925 396 111  26 312 654 868 551 513 524 709  60\n 914 611 187 566 609 490 184 769  43 363 900 770 366 525 620 659 103 938\n 136 300 339 211 316 766 716 664  70 535 338 921 946 104 249 305  79 907\n 665 157 742 898 847 402 530 794 403 852 812 694  75  94 695 917 887 673\n  51 725 642 197 845 421 953 576 569  50 271 411 188 774 235 797 404 912\n 604  22 931 481 825 700 207 884 327 433  76 279 650 824 443 574 488 372\n  48 976 998  40 756 228 947 989 670 819 923 192 765 534 807 993  12 688\n 979 633 698 908 378 687 461  81 178 242 280 861 722 668 292 419 202 356\n 718 753 441 294 636 521 131 533 655 954 927 340 651 582 804 400 376 251\n 743 290  82 485 364 231 833 777 353 252 768  10 172 216 997 392 203   8\n 886  25 761 220 268  78 118 597 354 450 380 454 189 129 696 494  74  27\n 428 607 971 966  53  58 690 314 423 936 629 653 681 881 169  24 821 603\n 684 284 691 471 882 621 453 581 430 939 140 982 608 365 827 239 452 752\n 115  15 910 416 545 470 266 217 911 575 515  59 460 859 671 135 751 486\n   0 747 406 888  20 637 855 432 983  37 937 343 358 805 517 283 554 464\n 469 585 928   7 102 772 571 444 605 733 720 657 961 634 496 260 394 627\n 171 435 623 329 501 754 351 950 741 610 593 388 740  56  42 587 347 784\n  18  44 854 762 274 780 397 863 451 843 196 149 562  54 731 619 199 401\n  46 319 811 427 685 816 669 150 815 963 955 221 601   9  41 382 568 986\n 240 615  90 170 306 289 919 529 291 958 393 763 458 
902 853 267 175 112\n 448 689 773 334 227 232 519 996 174 437 932 155 994 120 817 778 323 374\n 161 302 648 678   1  87   6 355 913 792]] are in the [index]'

Interestingly, when I ran NanoPlot with the --downsample 1000 option on a quality-filtered file, it finished successfully.

2018-05-31 22:28:55,321 NanoPlot 1.13.0 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', cram=None, downsample=1000, drop_outliers=False, fasta=None, fastq=['MD0003filtered.fastq'], fastq_minimal=None, fastq_rich=None, format='png', listcolors=False, loglength=True, maxlength=None, minlength=None, minqual=None, no_N50=False, outdir='.', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', store=False, summary=None, threads=4, title=None, verbose=False)
2018-05-31 22:28:55,321 Python version is: 3.5.2 (default, Nov 23 2017, 16:37:01)  [GCC 5.4.0 20160609]
2018-05-31 22:28:55,321 Nanoplotter: valid output format png
2018-05-31 22:28:55,328 Nanoget: Starting to collect statistics from plain fastq file.
2018-05-31 22:30:47,500 Nanoget: Gathered all metrics of 20979 reads
2018-05-31 22:30:47,538 Calculated statistics
2018-05-31 22:30:47,538 Using sequenced read lengths for plotting.
2018-05-31 22:30:47,540 Using Log10 scaled read lengths.
2018-05-31 22:30:47,541 Downsampling the dataset from 20979 to 1000 reads
2018-05-31 22:30:47,543 Processed the reads, optionally filtered. 1000 reads left
2018-05-31 22:30:47,557 Calculated statistics
2018-05-31 22:30:47,558 Nanoplotter: Valid color #4CB391.
2018-05-31 22:30:47,558 Nanoplotter: Creating length plots for Read length.
2018-05-31 22:30:47,558 Nanoplotter: Using 1000 reads maximum of 71277bp.
2018-05-31 22:30:49,302 Created length plots
2018-05-31 22:30:49,302 Nanoplotter: Creating Read lengths vs Average read quality plots using statistics from 1000 reads.
2018-05-31 22:30:50,442 Created LengthvsQual plot
2018-05-31 22:30:50,442 Writing html report.
2018-05-31 22:30:52,181 Finished!

This is not blocking me, because I can work with the original file (without downsampling) and the quality-filtered file, but I'm very confused by these results...

@wdecoster
Owner

Interesting, and confusing. Thanks for the detailed problem description!

@wdecoster
Owner

I think I figured it out, will try to provide a solution on Monday.

wdecoster added a commit to wdecoster/nanoplotter that referenced this issue Jun 4, 2018
@wdecoster
Owner

I've just pushed an update to the submodule nanoplotter (v0.39.1). I believe that could solve your issue after updating, but can you please confirm?
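
For readers hitting something similar: the second traceback fails at `x=x[idx]`, and the KeyError lists what looks like a permutation of 0..999. One plausible reading (an assumption, not confirmed in this thread and not taken from the nanoplotter commit) is that the downsampled data kept its original row labels while `idx` held positions. A minimal pandas sketch of that situation and two ways around it, with made-up data and variable names:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a read-length column (not real NanoPlot data).
lengths = pd.Series(np.arange(10_000, 20_000), name="lengths")

# Downsampling with sample() keeps the original row labels,
# so the index is no longer 0..n-1.
sampled = lengths.sample(n=100, random_state=0)
labels_are_positional = sampled.index.equals(pd.RangeIndex(len(sampled)))

# Positions (0..n-1) of a random subset, as a plotting helper might build them.
idx = np.random.RandomState(0).permutation(len(sampled))[:50]

# Option 1: select by position, which ignores the labels entirely.
by_position = sampled.iloc[idx]

# Option 2: renumber the rows once after downsampling, then look up by label.
renumbered = sampled.reset_index(drop=True)
by_label = renumbered.loc[idx]
```

Both options pick the same rows. A label-based lookup on the un-renumbered `sampled` is what would be expected to fail whenever the positions in `idx` are not present as labels.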

@ryotag
Author

ryotag commented Jun 4, 2018

I upgraded NanoPlot with the command python3 -m pip install NanoPlot --upgrade (and also python3 -m pip install nanoplotter --upgrade just in case), and NanoPlot now finishes successfully.
Thank you for your support!

@wdecoster
Owner

Thanks for the feedback!

@drpatelh

drpatelh commented Mar 3, 2021

Hi @wdecoster ! Hope you are well :)

Thank you for this great tool. I have added it to the nf-core/nanoseq and nf-core/viralrecon pipelines and it is very useful. I am getting the same error as described in the title of this issue with the latest Biocontainer v1.32.1. I also tried running the same command via a local Conda install but got the same error. I was tempted to re-open this issue but I will leave that to your discretion just in case I am missing something obvious.

I installed the environment.yml below with conda env create -f environment.yml

name: nanoplot-1.32.1
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - conda-forge::seaborn=0.10.1
  - bioconda::nanoplot=1.32.1

I have attached the raw fastq file here and the command I used was:

NanoPlot --fastq barcode87.fastq.gz

barcode87.fastq.gz

I noticed you have had an open issue on Bioconda to update to the latest version so apologies if this is already fixed. Please let me know if you need anything else from me. Thanks in advance!

The full NanoPlot_*.log with some path cleaning is below:

2021-03-03 22:43:31,141 NanoPlot 1.32.1 started with arguments Namespace(N50=False, alength=False, bam=None, barcoded=False, color='#4CB391', colormap='Greens', cram=None, downsample=None, dpi=100, drop_outliers=False, fasta=None, fastq=['barcode87.fastq.gz'], fastq_minimal=None, fastq_rich=None, feather=None, font_scale=1, format='png', hide_stats=False, huge=False, listcolormaps=False, listcolors=False, loglength=False, maxlength=None, minlength=None, minqual=None, no_N50=False, no_supplementary=False, outdir='.', path='./', percentqual=False, pickle=None, plots=['kde', 'dot'], prefix='', raw=False, readtype='1D', runtime_until=None, store=False, summary=None, threads=4, title=None, tsv_stats=False, ubam=None, verbose=False)
2021-03-03 22:43:31,141 Python version is: 3.8.8 | packaged by conda-forge | (default, Feb 20 2021, 16:22:27)  [GCC 9.3.0]
2021-03-03 22:43:31,141 NanoPlot:  valid output format png
2021-03-03 22:43:31,149 Nanoget: Starting to collect statistics from plain fastq file.
2021-03-03 22:43:31,151 Nanoget: Decompressing gzipped fastq barcode87.fastq.gz
2021-03-03 22:43:31,447 Reduced DataFrame memory usage from 4.57763671875e-05Mb to 2.6702880859375e-05Mb
2021-03-03 22:43:31,455 Nanoget: Gathered all metrics of 2 reads
2021-03-03 22:43:31,466 Calculated statistics
2021-03-03 22:43:31,467 Using sequenced read lengths for plotting.
2021-03-03 22:43:31,468 NanoPlot:  Valid color #4CB391.
2021-03-03 22:43:31,475 NanoPlot:  Valid colormap Greens.
2021-03-03 22:43:31,476 NanoPlot:  Creating length plots for Read length.
2021-03-03 22:43:31,476 NanoPlot:  Using 2 reads maximum of 609bp.
2021-03-03 22:43:37,617 Nanoplotter: orca not found, not creating static image of html. See https://github.com/plotly/orca
2021-03-03 22:43:37,617 Image generation requires the psutil package.

Install using pip:
    $ pip install psutil

Install using conda:
    $ conda install psutil
Traceback (most recent call last):
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplotter/plot.py", line 60, in save_static
    pio.write_image(self.fig, self.path.replace('html', 'png'))
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 245, in write_image
    img_data = to_image(
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 103, in to_image
    return to_image_orca(
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_orca.py", line 1535, in to_image
    ensure_server()
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/plotly/io/_orca.py", line 1361, in ensure_server
    raise ValueError(
ValueError: Image generation requires the psutil package.

Install using pip:
    $ pip install psutil

Install using conda:
    $ conda install psutil

2021-03-03 22:43:37,914 Created length plots
2021-03-03 22:43:37,915 NanoPlot:  Creating Read lengths vs Average read quality plots using statistics from 2 reads.
2021-03-03 22:43:38,494 The number of observations must be larger than the number of variables.
Traceback (most recent call last):
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 101, in main
    plots = make_plots(datadf, settings)
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 160, in make_plots
    nanoplotter.scatter(
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/nanoplotter/nanoplotter_main.py", line 193, in scatter
    plot = sns.jointplot(
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/axisgrid.py", line 2313, in jointplot
    grid.plot_joint(kdeplot, **joint_kws)
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/axisgrid.py", line 1777, in plot_joint
    func(self.x, self.y, **kwargs)
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/distributions.py", line 696, in kdeplot
    ax = _bivariate_kdeplot(x, y, shade, shade_lowest,
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/distributions.py", line 402, in _bivariate_kdeplot
    xx, yy, z = _statsmodels_bivariate_kde(x, y, bw, gridsize, cut, clip)
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/seaborn/distributions.py", line 474, in _statsmodels_bivariate_kde
    kde = smnp.KDEMultivariate([x, y], "cc", bw)
  File "<TRUNCATED>/conda/envs/nanoplot-1.32.1/lib/python3.8/site-packages/statsmodels/nonparametric/kernel_density.py", line 108, in __init__
    raise ValueError("The number of observations must be larger " \
ValueError: The number of observations must be larger than the number of variables.

wdecoster added a commit that referenced this issue Mar 4, 2021
@wdecoster
Owner

Hi, thank you for the comprehensive bug report! I expect the cause of this error is that you are plotting just two reads. But I agree it shouldn't crash and should rather warn you about it.
I replicated your error with a fastq containing just two reads. The error went away as soon as the fastq had three reads. You will get a friendlier warning from version v1.34.1 onwards.
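
To make the two-versus-three-reads threshold concrete: the bivariate KDE uses two variables (read length and average quality), and the error message says the number of observations must be larger than that. A minimal sketch of such a guard follows. The function names are hypothetical and this is not the actual NanoPlot implementation:

```python
import logging


def enough_reads_for_kde(n_reads, n_variables=2):
    """Return True if a KDE over n_variables can be fitted to n_reads points.

    Mirrors the condition in the error message: the number of observations
    must be strictly larger than the number of variables.
    """
    return n_reads > n_variables


def choose_plots(n_reads, requested=("kde", "dot")):
    """Drop the 'kde' plot, with a warning, when there are too few reads."""
    if enough_reads_for_kde(n_reads):
        return list(requested)
    logging.warning(
        "Only %d reads: skipping the kde plot, which needs more than 2.", n_reads
    )
    return [plot for plot in requested if plot != "kde"]
```

With this sketch, `choose_plots(2)` keeps only the dot plot, while `choose_plots(3)` keeps both.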

And yes, updating bioconda appears to be a problem :-( but PyPI has the latest version.

@drpatelh

drpatelh commented Mar 4, 2021

Great! Thanks for the quick fix. A warning sounds great. I only observed this issue when I set --min_barcode_reads 1 in the viralrecon pipeline for testing. That option normally defaults to 100, so this shouldn't really be an issue unless it is set to 2 or below, but I thought I would report it anyway :)

Will look out for the new release on Bioconda as we are only pulling Biocontainers for the entire pipeline! Thanks again.
