PLAN_COLING
==== Things to Avoid (?) ====
- Amazon Mechanical Turk (I have used it before: poor quality
control, too time-consuming.)
- poems (sonnets)
==== Random Thoughts ====
- other than Shakespeare
I tried to find a 'parallel' corpus for other writers or periods in
English, but failed. I probably know too little about English
literature.
- other than English (dropped)
I also looked at Chinese literature, but collecting enough data does
not look easy either; it is probably doable but needs a lot of effort.
==== Modern -> Original ====
Claims:
- design a new metric to measure the similarity to the desired style
- using SMT techniques to mimic a particular writing style
- how far you can go without a parallel corpus
- impact of more parallel data (N->1 mapping)
- impact of language models (more texts from the same period)
- adding factors to the phrase table to discourage lexical similarity
in the paraphrases
- automatic metrics correlate with human judgements
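One way the last claim could be checked is a plain Pearson correlation
between per-sentence metric scores and per-sentence human ratings. The
sketch below is only an illustration: the function name `pearson` and
the choice of Pearson (rather than a rank correlation such as
Spearman) are assumptions, not something this plan specifies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    Hypothetical helper: xs could be automatic metric scores and ys the
    matching human ratings, one pair per sentence.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A value near 1 would support the claim; a value near 0 would suggest
the metric does not track human judgement.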
Evaluations:
- baseline: another SMT system that only has a Shakespearian language
model and a thesaurus for its phrase table
- hold out R&J as gold-standard Shakespearian text (BLEU/PINC)
- mimic style on tweets/quotations/essays, compare (BLEU/PINC) with
human judgement
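For reference, PINC (Chen and Dolan, 2011) scores how many of the
candidate's n-grams do NOT appear in the source, averaged over n-gram
orders, so a higher score means more lexical change. A minimal sketch
follows, assuming whitespace tokenization and n-gram orders 1 to 4;
skipping orders for which the candidate is too short is also an
assumption made here, and the names `ngrams` and `pinc` are
illustrative.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def pinc(source, candidate, max_n=4):
    """PINC score of a candidate paraphrase against its source sentence.

    0.0 means every candidate n-gram also occurs in the source;
    1.0 means none of them do.
    """
    src_tokens = source.split()
    cand_tokens = candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand_tokens, n)
        if not cand_ngrams:
            # Assumption: skip orders the candidate is too short for.
            continue
        overlap = len(cand_ngrams & ngrams(src_tokens, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    # An empty candidate has no n-grams at all; treat it as no change.
    return sum(scores) / len(scores) if scores else 0.0
```

BLEU against the held-out reference and PINC against the input pull
in opposite directions, which is why the two are reported together.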
==== Original -> Modern ====
Claims:
- using SMT techniques to help people read ancient English texts
Evaluations:
- hold out R&J (BLEU, human)
- other plays by playwrights of the same period, such as Christopher
Marlowe, Ben Jonson, and John Webster