java.lang.ArrayIndexOutOfBoundsException: 47118 #7

Open
jdmontenegro opened this issue Sep 1, 2022 · 3 comments

@jdmontenegro

Hi, I am running GUSHR from inside the BRAKER pipeline. The first time I ran it, on a chromosome-level assembly, it worked nicely. Now, running it on a scaffold-level assembly (25K scaffolds), I keep getting the following error when running GUSHR:

> java -jar /scratch/molevo/jmontenegro/software/GUSHR/GeMoMa-1.6.2.jar CLI AnnotationFinalizer u=YES g=genome.fa a=gushr-TIJPJDZUYCXQ/complete_gemoma_like.gff3 i=gushr-TIJPJDZUYCXQ/introns.gff c=UNSTRANDED coverage_unstranded=gushr-TIJPJDZUYCXQ/coverage.bedgraph rename=NO outdir=gushr-TIJPJDZUYCXQ/
jar time stamp: Sat Aug 20 17:22:40 CEST 2022

Searching for the new GeMoMa updates ...
You are using GeMoMa 1.6.2, but the latest version is 1.9.
You can download the latest version from http://www.jstacs.de/index.php/GeMoMa

Parameters of tool "AnnotationFinalizer" (AnnotationFinalizer, version: 1.6.2):
a - annotation (The predicted genome annotation file (GFF))	= gushr-TIJPJDZUYCXQ/complete_gemoma_like.gff3
t - tag (A user-specified tag for transcript predictions in the third column of the returned gff. It might be beneficial to set this to a specific value for some genome browsers., default = prediction)	= prediction
u - UTR (allows to predict UTRs using RNA-seq data, range={NO, YES}, default = NO)	= YES
    No parameters for selection "NO"
    Parameters for selection "YES":
    	g - genome (The genome file (FASTA), i.e., the target sequences in the blast run. Should be in IUPAC code)	= genome.fa
    		The following parameter(s) can be used multiple times:
    		i - introns file (Introns (GFF), which might be obtained from RNA-seq)	= gushr-TIJPJDZUYCXQ/introns.gff
    	r - reads (if introns are given by a GFF, only use those which have at least this number of supporting split reads, valid range = [1, 2147483647], default = 1)	= 1
    		The following parameter(s) can be used multiple times:
    		c - coverage file (experimental coverage (RNA-seq), range={NO, UNSTRANDED, STRANDED}, default = NO)	= UNSTRANDED
    		    No parameters for selection "NO"
    		    Parameters for selection "UNSTRANDED":
    		    	coverage_unstranded - coverage_unstranded (The coverage file contains the unstranded coverage of the genome per interval. Intervals with coverage 0 (zero) can be left out.)	= gushr-TIJPJDZUYCXQ/coverage.bedgraph
    		    Parameters for selection "STRANDED":
    		    	coverage_forward - coverage_forward (The coverage file contains the forward coverage of the genome per interval. Intervals with coverage 0 (zero) can be left out.)	= null
    		    	coverage_reverse - coverage_reverse (The coverage file contains the reverse coverage of the genome per interval. Intervals with coverage 0 (zero) can be left out.)	= null
rename - rename (allows to generate generic gene and transcripts names (cf. attribute "Name"), range={COMPOSED, SIMPLE, NO}, default = COMPOSED)	= NO
         Parameters for selection "COMPOSED":
         	p - prefix (the prefix of the generic name)	= null
         	infix - infix (the infix of the generic name, default = G)	= G
         	s - suffix (the suffix of the generic name, default = 0)	= 0
         	d - digits (the number of informative digits, valid range = [4, 10], default = 5)	= 5
         	di - delete infix (a comma-separated list of infixes that is deleted from the sequence names before building the gene/transcript name, default = )	= 
         Parameters for selection "SIMPLE":
         	p - prefix (the prefix of the generic name)	= null
         	d - digits (the number of informative digits, valid range = [4, 10], default = 5)	= 5
         No parameters for selection "NO"
outdir - The output directory, defaults to the current working directory (.)	= gushr-TIJPJDZUYCXQ/
genome parts: 25454	[Seg10865, Seg10864, Seg10863, Seg10862, Seg10869, Seg10868, Seg10867, Seg10866, Seg9583, Seg9584, Seg9585, Seg9586, Seg22850, Seg9580, Seg22851, Seg9581, Seg9582, Seg19202, Seg22843, Seg19201, Seg228...
possible introns from RNA-seq (split reads>=1): 864409
+: 163226
-: 170825
.: 265179
Check RNA-seq data (introns): 48% of the sequences in the reference genome are covered.

#genes: 52801
#warnings: [0, 0]
#predictions: 52801
#warnings: [0, 0]
#CDSs: 237069
#warnings: [0, 0]
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 47118
	at projects.gemoma.AnnotationFinalizer.extendUTR(AnnotationFinalizer.java:673)
	at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:564)
	at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:444)
	at de.jstacs.tools.ui.cli.CLI.run(CLI.java:427)
	at projects.gemoma.GeMoMa.main(GeMoMa.java:368)

I am trying to understand what else could be going on here and how to fix it or work around it.
The original braker command was as follows:

braker.pl --cores 16 --species=new --softmasking --UTR=on --workingdir=/tmp/slurm-5396296/braker2_rna --AUGUSTUS_BIN_PATH=/apps/augustus/3.4.0/bin --AUGUSTUS_SCRIPTS_PATH=/apps/augustus/3.4.0/scripts --genome=genome.fa --bam=merged.dd.bam

Any help would be much appreciated.

Regards,

Juan D.

@Aswin2667

Can I do this?

@jdmontenegro

Would it make sense to upgrade GeMoMa to 1.9? Would BRAKER/GUSHR still be compatible with that version?

@tomomano

tomomano commented Nov 7, 2022

@jdmontenegro

My comment here may fix your problem.
Gaius-Augustus/BRAKER#456 (comment)
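
A possible diagnostic, offered as a sketch rather than a confirmed explanation: an `ArrayIndexOutOfBoundsException` raised in `extendUTR` would be consistent with a coordinate that points past the end of a sequence, although nothing in this thread confirms that this is the cause. To rule it out, the Python below compares the intervals in a bedGraph coverage file against the sequence lengths in the genome FASTA. The helper names `fasta_lengths` and `out_of_range_intervals` are hypothetical, and taking the sequence name as the header text up to the first whitespace is an assumption about how the names are matched.

```python
def fasta_lengths(handle):
    """Return {sequence name: length} for a FASTA file-like object.

    Assumption: the sequence name is the header text up to the first whitespace.
    """
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def out_of_range_intervals(bedgraph, lengths):
    """Yield (name, start, end, reason) for bedGraph rows that do not fit the genome.

    bedGraph intervals are 0-based and half-open, so an end equal to the
    sequence length is still valid.
    """
    for line in bedgraph:
        fields = line.split()
        # Skip comments, track/browser lines, and anything that is not a data row.
        if line.startswith("#") or len(fields) < 4 or fields[0] in ("track", "browser"):
            continue
        name, start, end = fields[0], int(fields[1]), int(fields[2])
        if name not in lengths:
            yield name, start, end, "unknown sequence"
        elif end > lengths[name]:
            yield name, start, end, "end beyond sequence length %d" % lengths[name]
```

To use it on the files from the log above, open `genome.fa` and `gushr-TIJPJDZUYCXQ/coverage.bedgraph`, pass the first handle to `fasta_lengths` and the second (with the resulting dictionary) to `out_of_range_intervals`, then print whatever it yields. An empty result would suggest that the coverage file is not the source of the bad index, in which case the same kind of check could be applied to the coordinates in the GFF files.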
