java.lang.ArrayIndexOutOfBoundsException: 47118 #7

jdmontenegro · 2022-09-01T08:14:02Z

Hi, I am running GUSHR from inside the BRAKER pipeline. The first time I ran it, on a chromosome-level assembly, it worked nicely. Now, running it on a scaffold-level assembly (25K scaffolds), I keep getting the following error when running GUSHR:

> java -jar /scratch/molevo/jmontenegro/software/GUSHR/GeMoMa-1.6.2.jar CLI AnnotationFinalizer u=YES g=genome.fa a=gushr-TIJPJDZUYCXQ/complete_gemoma_like.gff3 i=gushr-TIJPJDZUYCXQ/introns.gff c=UNSTRANDED coverage_unstranded=gushr-TIJPJDZUYCXQ/coverage.bedgraph rename=NO outdir=gushr-TIJPJDZUYCXQ/
jar time stamp: Sat Aug 20 17:22:40 CEST 2022

Searching for the new GeMoMa updates ...
You are using GeMoMa 1.6.2, but the latest version is 1.9.
You can download the latest version from http://www.jstacs.de/index.php/GeMoMa

Parameters of tool "AnnotationFinalizer" (AnnotationFinalizer, version: 1.6.2):
a - annotation (The predicted genome annotation file (GFF))	= gushr-TIJPJDZUYCXQ/complete_gemoma_like.gff3
t - tag (A user-specified tag for transcript predictions in the third column of the returned gff. It might be beneficial to set this to a specific value for some genome browsers., default = prediction)	= prediction
u - UTR (allows to predict UTRs using RNA-seq data, range={NO, YES}, default = NO)	= YES
    No parameters for selection "NO"
    Parameters for selection "YES":
    	g - genome (The genome file (FASTA), i.e., the target sequences in the blast run. Should be in IUPAC code)	= genome.fa
    		The following parameter(s) can be used multiple times:
    		i - introns file (Introns (GFF), which might be obtained from RNA-seq)	= gushr-TIJPJDZUYCXQ/introns.gff
    	r - reads (if introns are given by a GFF, only use those which have at least this number of supporting split reads, valid range = [1, 2147483647], default = 1)	= 1
    		The following parameter(s) can be used multiple times:
    		c - coverage file (experimental coverage (RNA-seq), range={NO, UNSTRANDED, STRANDED}, default = NO)	= UNSTRANDED
    		    No parameters for selection "NO"
    		    Parameters for selection "UNSTRANDED":
    		    	coverage_unstranded - coverage_unstranded (The coverage file contains the unstranded coverage of the genome per interval. Intervals with coverage 0 (zero) can be left out.)	= gushr-TIJPJDZUYCXQ/coverage.bedgraph
    		    Parameters for selection "STRANDED":
    		    	coverage_forward - coverage_forward (The coverage file contains the forward coverage of the genome per interval. Intervals with coverage 0 (zero) can be left out.)	= null
    		    	coverage_reverse - coverage_reverse (The coverage file contains the reverse coverage of the genome per interval. Intervals with coverage 0 (zero) can be left out.)	= null
rename - rename (allows to generate generic gene and transcripts names (cf. attribute "Name"), range={COMPOSED, SIMPLE, NO}, default = COMPOSED)	= NO
         Parameters for selection "COMPOSED":
         	p - prefix (the prefix of the generic name)	= null
         	infix - infix (the infix of the generic name, default = G)	= G
         	s - suffix (the suffix of the generic name, default = 0)	= 0
         	d - digits (the number of informative digits, valid range = [4, 10], default = 5)	= 5
         	di - delete infix (a comma-separated list of infixes that is deleted from the sequence names before building the gene/transcript name, default = )	= 
         Parameters for selection "SIMPLE":
         	p - prefix (the prefix of the generic name)	= null
         	d - digits (the number of informative digits, valid range = [4, 10], default = 5)	= 5
         No parameters for selection "NO"
outdir - The output directory, defaults to the current working directory (.)	= gushr-TIJPJDZUYCXQ/
genome parts: 25454	[Seg10865, Seg10864, Seg10863, Seg10862, Seg10869, Seg10868, Seg10867, Seg10866, Seg9583, Seg9584, Seg9585, Seg9586, Seg22850, Seg9580, Seg22851, Seg9581, Seg9582, Seg19202, Seg22843, Seg19201, Seg228...
possible introns from RNA-seq (split reads>=1): 864409
+: 163226
-: 170825
.: 265179
Check RNA-seq data (introns): 48% of the sequences in the reference genome are covered.

#genes: 52801
#warnings: [0, 0]
#predictions: 52801
#warnings: [0, 0]
#CDSs: 237069
#warnings: [0, 0]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 47118
	at projects.gemoma.AnnotationFinalizer.extendUTR(AnnotationFinalizer.java:673)
	at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:564)
	at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:444)
	at de.jstacs.tools.ui.cli.CLI.run(CLI.java:427)
	at projects.gemoma.GeMoMa.main(GeMoMa.java:368)

I am trying to understand what else could be going on here and how to fix it or work around it.
The original braker command was as follows:

braker.pl --cores 16 --species=new --softmasking --UTR=on --workingdir=/tmp/slurm-5396296/braker2_rna --AUGUSTUS_BIN_PATH=/apps/augustus/3.4.0/bin --AUGUSTUS_SCRIPTS_PATH=/apps/augustus/3.4.0/scripts --genome=genome.fa --bam=merged.dd.bam

Any help would be much appreciated.

Regards,

Juan D.

Aswin2667 · 2022-09-01T09:52:59Z

Can I do this?

jdmontenegro · 2022-09-01T10:47:09Z

Would it make sense to upgrade GeMoMa to 1.9? Would BRAKER/GUSHR still be compatible with that version?

tomomano · 2022-11-07T09:55:25Z

@jdmontenegro

My comment here may fix your problem.
Gaius-Augustus/BRAKER#456 (comment)
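
For anyone hitting the same exception, one way to narrow it down is to check the inputs before rerunning. The thread itself does not establish the cause, so the following is only a sketch under an unconfirmed assumption: that the index error in `extendUTR` may come from coverage intervals in `coverage.bedgraph` that point past the end of a scaffold in `genome.fa`, or that name a scaffold missing from the FASTA. The function names `fasta_lengths` and `out_of_range_intervals` are hypothetical helpers, not part of GeMoMa, GUSHR, or BRAKER, and the sketch assumes sequence names are the first word of each FASTA header.

```python
import sys
from typing import Dict, Iterable, List, Tuple


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return the length of each FASTA record, keyed by the first word of its header.

    Assumption: the tools identify sequences by the first whitespace-delimited
    word of the header line.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_intervals(
    bedgraph_lines: Iterable[str], lengths: Dict[str, int]
) -> List[Tuple[int, str]]:
    """List (line number, reason) for bedGraph intervals that do not fit the genome.

    bedGraph coordinates are 0-based and half-open, so a valid interval
    satisfies 0 <= start < end <= sequence length.
    """
    problems: List[Tuple[int, str]] = []
    for number, line in enumerate(bedgraph_lines, start=1):
        line = line.strip()
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 3:
            problems.append((number, "fewer than 3 columns"))
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            problems.append((number, "non-integer coordinates"))
            continue
        if chrom not in lengths:
            problems.append((number, f"sequence {chrom} not in FASTA"))
        elif start < 0 or start >= end or end > lengths[chrom]:
            problems.append(
                (number, f"{chrom}:{start}-{end} outside length {lengths[chrom]}")
            )
    return problems


if __name__ == "__main__":
    # Usage: python check_coverage.py genome.fa coverage.bedgraph
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fasta:
            genome = fasta_lengths(fasta)
        with open(sys.argv[2]) as bedgraph:
            for number, reason in out_of_range_intervals(bedgraph, genome):
                print(f"line {number}: {reason}")
```

If this reports nothing, the coverage file is at least consistent with the genome and the cause lies elsewhere, for example in the introns GFF or in the GeMoMa version itself.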
