Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base.py line 34, in make_ranges q, s, t = zip(*ib) -- ValueError: not enough values to unpack (expected 3, got 0) #690

Closed
Tong-Chen opened this issue Jul 21, 2024 · 2 comments

Comments

@Tong-Chen
Copy link
Contributor

Tong-Chen commented Jul 21, 2024

When running python -m jcvi.compara.synteny mcscan Bbalcxl0706.bed Bbalcxl0706.Aan1.lifted.anchors --iter=100 -o Bbalcxl0706.Aan1.blocks.txt, I met the error below:

  [11:24:01] DEBUG    Load file `Bbalcxl0706.bed`                       base.py:34
  [11:24:02] DEBUG    Load file `Bbalcxl0706.Aan1.lifted.anchors`       base.py:34
  Traceback (most recent call last):
    File "/anaconda3/envs/jcvi/lib/python3.10/runpy.py", line 196, in _run_module_as_main
      return _run_code(code, main_globals, None,
    File "/anaconda3/envs/jcvi/lib/python3.10/runpy.py", line 86, in _run_code
      exec(code, run_globals)
    File "/anaconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/compara/synteny.py", line 1881, in 
      main()
    File "/anaconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/compara/synteny.py", line 462, in main
      p.dispatch(globals())
    File "/anaconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/apps/base.py", line 140, in dispatch
      globals[action](sys.argv[2:])
    File "/anaconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/compara/synteny.py", line 1507, in mcscan
      ranges, block_pairs = ac.make_ranges(order, clip=clip)
    File "/anaconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/compara/base.py", line 34, in make_ranges
      q, s, t = zip(*ib)
  ValueError: not enough values to unpack (expected 3, got 0)
  

I added print(id) before q, s, t = zip(*ib) in /anaconda3/envs/jcvi/lib/python3.10/site-packages/jcvi/compara/base.py", and got the empty list which caused the error.

[['evm.model.Chr4.7317', 'mrna.AA557800.t1', '1370'], ['evm.model.Chr4.7320', 'mrna.AA557790.t1', '1090']]
[['evm.model.Chr4.7348', 'mrna.AA557800.t1', '1370'], ['evm.model.Chr4.7351', 'mrna.AA557790.t1', '1080']]
[['evm.model.Chr4.7501', 'mrna.AA556550.t1', '4720'], ['evm.model.Chr4.7503', 'mrna.AA556540.t1', '972'], ['evm.model.Chr4.7503', 'mrna.AA556560.t1', '150']]
[['evm.model.Chr4.5631', 'mrna.AA559650.t1', '919'], ['evm.model.Chr4.5633', 'mrna.AA559640.t1', '1600'], ['evm.model.Chr4.5634', 'mrna.AA559630.t1', '571']]
[]

I could add an if test to skip empty items. However, I do not know why there is an empty block.

Then I checked the input file with the context neighboring evm.model.Chr4.5634 (the block just before the empty block)

In Bbalcxl0706.Aan1.lifted.anchors

evm.model.Chr4.7503     mrna.AA556540.t1        972
evm.model.Chr4.7503     mrna.AA556560.t1        150
###
evm.model.Chr4.5631     mrna.AA559650.t1        919
evm.model.Chr4.5633     mrna.AA559640.t1        1600
evm.model.Chr4.5634     mrna.AA559630.t1        571
###
###
evm.model.Chr4.1717     mrna.AA564650.t1        1210
evm.model.Chr4.1718.1   mrna.AA564640.t1        389
###
evm.model.Chr4.7543     mrna.AA563300.t1        356
evm.model.Chr4.7546     mrna.AA563320.t1        639

I do not know if this is caused by the ###\n###? And why do we get these double comment lines?

The command used to generate the anchors:

python -m jcvi.compara.catalog ortholog Bbalcxl0706 Aan1 --no_strip_names --no_dotplot --cscore 0.5 --cpus 10 --dist 2 --min_size 1 --dbtype nucl --ignore_zero_anchor 

Thanks!

Tong

@tanghaibao
Copy link
Owner

@Tong-Chen

Thanks for reporting the issue and sending the data to debug. You are correct that the empty block is the issue here.

###
evm.model.Chr4.5631     mrna.AA559650.t1        919
evm.model.Chr4.5633     mrna.AA559640.t1        1600
evm.model.Chr4.5634     mrna.AA559630.t1        571
###
- evm.model.Chr4.7694     mrna.AA560110.t1        2300
- evm.model.Chr4.7694     mrna.AA560120.t1        1410
###
evm.model.Chr4.1717     mrna.AA564650.t1        1210
evm.model.Chr4.1718.1   mrna.AA564640.t1        389

This is the removed block. Why is it removed is a little complicated. You can skip the below if TLDR;


When JCVI processes the LAST output, it tries to identify likely tandem genes since their presence will inflate the # of syntenic anchors. In the initial LAST output, there were two pairs:

evm.model.Chr4.7692     mrna.AA560110.t1        2300
evm.model.Chr4.7693     mrna.AA560120.t1        1410

These get converted to:

evm.model.Chr4.7694     mrna.AA560110.t1        2300
evm.model.Chr4.7694     mrna.AA560120.t1        1410

since JCVI considers evm.model.Chr4.7694 to be the tandem "master" for ..7692 and ..7693. Why? You can check for example both ..7693 and ..7694 matches the same gene mrna.AA045610.t1.

Normally, this would be okay, but, since LAST is not transitive, this can lead to non-existent LAST pairs (for example, if mrna.AA045610.t1 is a fusion between ..7693 and ..7694). In particular, the above two pairs with ..7694 were never present in the LAST output. So JCVI sees them and remove both pairs in .filtered.anchors which leaves an empty block.


I'll have a fix for that empty block soon.

tanghaibao added a commit that referenced this issue Jul 22, 2024
* debug bp seq

* Do not keep empty blocks (#690, thanks @Tong-Chen)

* Add test_empty_blocks

* move filter_blocks outside print_to_file
@Tong-Chen
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants