Home
The README should contain basically all of the information necessary for running the code. Additional information is available in the code's documentation. The following are FAQs:
The `Folder=` option in the configuration specifies where output will be stored. By default, after every round of convergence the model prints a human-readable version of the generative distributions it uses. In addition, lexical distributions have a second version with the suffix `.lex.gz`, which contains conditional distributions. Other files generated include the grammar (`Grammar.gz`), the induced lexicon (`Lexicon.gz`), the serialized models (`model#`), and any output from testing (`Test.#.#.JSON.gz`).
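The `.gz` outputs are ordinary gzip files, so they can be inspected without unpacking them first. The following is a minimal sketch, assuming the file holds plain text; the helper name `head_gz` and the example path are illustrative and not part of the tool.

```python
import gzip
from itertools import islice

def head_gz(path, n=10):
    """Return the first n lines of a gzip-compressed text file."""
    # "rt" opens the stream in text mode, so lines come back as str.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]

# Hypothetical usage, where "output" stands for whatever Folder= points to:
#   for line in head_gz("output/Lexicon.gz"):
#       print(line)
```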
Yes, this verbose printing can be turned off by setting the configuration flag `printModelsVerbose=False`.
Load Java and Maven modules (these can be added to your `.bash_profile`):
module load sun-jdk/1.8.0
module load apache-maven/3.0.5
If you have not registered your SSH keys with Bitbucket, set the terminal to ask for a password:
unset SSH_ASKPASS
CoNLL Shared Task
Index word lemma Coarse Fine Feats Head Label
1 Afirmó afirmar v vm num=s|per=3|mod=i|tmp=s 0 ROOT
NAACL Shared Task
Index word lemma Coarse Fine UNIVERSAL Feats Head Label
1 Afirmó afirmar v vm VERB num=s|per=3|mod=i|tmp=s 0 ROOT
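The two layouts above differ only in the extra UNIVERSAL column, so one reader can handle both. This is a rough sketch rather than the parser the code actually uses: the `Token` type and `parse_token` function are hypothetical names, columns are assumed to be whitespace-separated with no spaces inside a field, and `_` is assumed to mark an empty Feats column as in the CoNLL conventions.

```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    index: int
    word: str
    lemma: str
    coarse: str
    fine: str
    universal: Optional[str]  # only present in the 9-column NAACL layout
    feats: dict
    head: int
    label: str

def parse_token(line):
    """Parse one token line in either the 8-column or 9-column layout."""
    cols = line.split()  # assumption: no column contains whitespace
    if len(cols) == 8:
        index, word, lemma, coarse, fine, feats, head, label = cols
        universal = None
    elif len(cols) == 9:
        index, word, lemma, coarse, fine, universal, feats, head, label = cols
    else:
        raise ValueError(f"expected 8 or 9 columns, got {len(cols)}")
    # Feats is a |-separated list of key=value pairs.
    feat_map = {}
    if feats != "_":  # assumption: "_" marks an empty feature column
        for pair in feats.split("|"):
            key, _, value = pair.partition("=")
            feat_map[key] = value
    return Token(int(index), word, lemma, coarse, fine, universal,
                 feat_map, int(head), label)
```

Calling `parse_token` on the NAACL example line gives a token whose `universal` field is `"VERB"`; on the CoNLL line the same field is `None`.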
Universal tagset mappings for some languages are available at www.YonatanBisk.com/Thesis
Tagset
https://github.com/ybisk/CCG-Induction/blob/master/src/main/resources/english.pos.map
English | mapping | Tag Type |
---|---|---|
. | punct | Period |
, | punct conj | Comma |
CC | conj | Coordinating Conjunction |
JJ | Adjective | |
VBD | verb | Verb, past tense |
VBG | verb | Verb, gerund |
Roles are used by Induction to denote special restrictions
CCGBank
PARG CCG-style dependencies
SRC TAR CAT Arg Index SRC word TAR word
<s> 3
2 0 S[frg]/NP 1 year Not
2 1 NP[nb]/N 1 year this
<\s>
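To make the column layout concrete, here is a small sketch of reading one PARG sentence block into dictionaries keyed by the header names above. The function name `parse_parg` and the dictionary keys are made up for illustration, and the integer after `<s>` is returned as-is because its meaning is not stated here.

```python
def parse_parg(lines):
    """Return (header_value, dependencies) for one PARG sentence block."""
    header_value = None
    deps = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<s>"):
            # Header line, e.g. "<s> 3"; keep the integer without interpreting it.
            header_value = int(line.split()[1])
        elif line == r"<\s>":
            break  # end of this sentence block
        else:
            # Columns: SRC, TAR, CAT, Arg Index, SRC word, TAR word
            src, tar, cat, arg, src_word, tar_word = line.split()
            deps.append({
                "src": int(src),
                "tar": int(tar),
                "cat": cat,
                "arg": int(arg),
                "src_word": src_word,
                "tar_word": tar_word,
            })
    return header_value, deps
```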
AUTO A bracketed parse (we assume these are collapsed to a single line):
(<T S[frg] 0 2>
(<T S[frg] 0 2>
(<L S[frg]/NP RB RB Not S[frg]/NP_158>)
(<T NP 1 2>
(<L NP[nb]/N DT DT this NP[nb]_165/N_165>)
(<L N NN NN year N>)
)
)
(<L . . . . .>)
)
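As a quick check on a single-line AUTO parse, the leaf nodes can be pulled out with a regular expression. This is only a sketch: `leaves` is a hypothetical helper, and it assumes each leaf has the five-field shape seen above (category, two POS tags, word, indexed category) with no whitespace inside a field.

```python
import re

# One leaf looks like: (<L category POS POS word indexed-category>)
LEAF = re.compile(r"\(<L (\S+) (\S+) (\S+) (\S+) (\S+)>\)")

def leaves(parse):
    """Return (word, category, pos) for every leaf, in sentence order."""
    return [(m.group(4), m.group(1), m.group(2)) for m in LEAF.finditer(parse)]
```

Applied to the example parse collapsed onto one line, this yields the words `Not`, `this`, `year` and `.` together with their categories.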
-Xmx20g -- Specifies that the heap can grow to 20 GB
Should be set to a value below the machine's total memory
-XX:+UseParallelGC -- JVM spawns parallel garbage collection threads
-XX:ParallelGCThreads=2 -- Specifies the number of threads.
-server -- Optimize loops, etc
-XX:+UseFastAccessorMethods -- Optimize
-XX:+AggressiveOpts -- Optimize