More Results:
Kim
0 0.0126582620368
1 0.0112869412425
2 0.00828799305293
3 0.00886458802011
4 0.00794793706616
5 0.0110541294575
6 0.00698402315006
7 0.00707960645513
8 0.012601143411
9 0.0115979403781
10 0.0082708800156
11 0.0084946958388
12 0.00837502898381
13 0.00930394659936
14 0.00917495555052
15 0.00889141613427
16 0.00673666337711
17 0.00756673761296
18 0.0104597138789
19 0.0118336726909
20 0.0074473473605
21 0.00931552632465
22 0.012589571491
23 0.00900101114645
24 0.0173106022199
25 0.0121650713191
26 0.00862554881146
27 0.0109244862203
28 0.00866513562857
29 0.00584240838421
30 0.00800700961427
31 0.00924874124394
32 0.0100098494719
33 0.00915080581293
34 0.00952561892333
35 0.00924506535079
36 0.0139134450402
37 0.0101416990826
38 0.0110261927327
39 0.00876504341478
40 0.00950854874593
41 0.00728619209208
42 0.0107985959682
43 0.00839818811225
44 0.00939388296165
45 0.0121039718296
46 0.0102227896607
47 0.00888613097558
48 0.0101158014691
49 0.0103147721845
50 0.0084987590833
51 0.00940351606072
52 0.0107011025086
53 0.00733449162396
54 0.00612953956423
55 0.0143510117487
56 0.00788609215584
57 0.0119225257555
58 0.00807274741232
59 0.00538995422995
60 0.00837863566456
61 0.0119697263298
62 0.0111232460004
63 0.00863898259356
64 0.0119705648974
65 0.00716721440063
66 0.0108269825006
67 0.00821640531091
68 0.00983115327153
69 0.00681911969996
70 0.00459816007424
71 0.0129027050578
72 0.00710121722182
73 0.00900487522994
74 0.00883601929164
75 0.00763927377099
76 0.00621785256021
77 0.00838124271175
78 0.0088937614535
79 0.00951829371403
80 0.00752552084312
81 0.00762409091794
82 0.00737733105824
83 0.00951538586927
84 0.0104705366802
85 0.00823604621064
86 0.00533436039726
87 0.0078957730925
88 0.00985581062539
89 0.0115272614861
90 0.0081692192384
91 0.00913450446751
92 0.00869231983818
93 0.00800088204573
94 0.00618971482278
95 0.00805201464258
96 0.00722380234015
97 0.00736818187105
98 0.00638727043488
99 0.0110879239475
100 0.0182637004566
101 0.0143049563731
102 0.0105363684086
103 0.0
104 0.0102016380284
105 0.0107309162913
106 0.0171499691683
Part-of-speech and sentiment features (especially negative sentiment) had the highest feature importances, followed by selected features from the lsi-reduced tfidf.
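A minimal sketch of how a ranking like this could be reproduced from the listing: the snippet parses "index importance" lines in the format used in this file and returns the largest entries. The function name top_features and the five-line excerpt are illustrative assumptions; the mapping from index to feature name is not given in this file, so only indices are reported.

```python
def top_features(text, k=3):
    """Return the k (index, importance) pairs with the largest importance."""
    pairs = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Use only the first two fields, so trailing markers such as "**" are ignored.
        pairs.append((int(parts[0]), float(parts[1])))
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:k]


# Excerpt of the Kim importances listed above (not the full list).
kim_excerpt = """
0 0.0126582620368
24 0.0173106022199
100 0.0182637004566
103 0.0
106 0.0171499691683
"""

for index, importance in top_features(kim_excerpt):
    print(index, importance)
```

Pasting a full listing in place of the excerpt gives the top entries for that feature set.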
Confusion matrix
[[186 326]
[ 77 411]]
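For reading the confusion matrices in this file, the following sketch derives accuracy, precision, and recall from a 2x2 matrix. It assumes that rows are true classes and columns are predicted classes (the scikit-learn convention), and that class 1 is the class of interest. The rows of every matrix here sum to 512 and 488, which is consistent with that reading, but the convention is not stated in the file. The name binary_metrics is hypothetical.

```python
def binary_metrics(cm):
    """Accuracy, plus precision and recall for class 1, from a 2x2 matrix.

    Assumes cm[i][j] counts samples of true class i predicted as class j.
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Kim confusion matrix from above.
kim = [[186, 326], [77, 411]]
for name, value in binary_metrics(kim).items():
    print(f"{name}: {value:.3f}")
```

For the Kim matrix, the accuracy is (186 + 411) / 1000 = 0.597.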
O'Mahony
0 0.24
1 0.25
2 0.03
3 0.00
4 0.23
5 0.25
The frequencies of numbers and of uppercase letters had far lower feature importances (0.03 and 0.001, respectively; the latter appears as 0.00 in the rounded listing above) than review length and both positive and negative sentiment (each between 0.23 and 0.25).
Confusion matrix
[[205 307]
[123 365]]
Liu
0 0.023406501328
1 0.02385644445
2 0.0268205254886
3 0.0220956267528
4 0.0169747735056
5 0.0334339804366
6 0.0268314690447
7 0.0230837054343
8 0.017253685104
9 0.00400740665765
10 0.0270147808201
11 0.0286192985074
12 0.0363714112897 **
13 0.0295096139618
14 0.0175598672724
15 0.0140329487284
16 0.0207458067052
17 0.0311002725473
18 0.0257888371405
19 0.0338322278117
20 0.0161597481135
21 0.0135388610121
22 0.0234159131211
23 0.00502296048921
24 0.031540201536
25 0.0118261941559
26 0.0390150346887 **
27 0.0319100141881
28 0.0320277992587
29 0.0267831046043
30 0.0345671109485 **
31 0.0312426248805
32 0.0279471255402
33 0.0321005615636
34 0.00307033646303
35 0.0263321397068
36 0.0197861862317
37 0.017952093346
38 0.0301693533987
39 0.0282963471422
40 0.0349571066244 **
The feature importances were relatively even, but nouns, verbs, and sentiment (negative more so than positive) were the strongest features.
Confusion matrix
[[231 281]
[137 351]]
Zhang
0 0.181327190208
1 0.210750593843
2 0.170654795781
3 0.171186977914
4 0.0423899681306
5 0.125378026195
6 0.0983124479292
Confusion matrix
[[110 402]
[102 386]]
tfidf unigrams
terrible
tfidf unigrams + lsi
Confusion matrix
[[ 82 430]
[ 21 467]]
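One way to see how skewed a classifier's predictions are is to look at the column sums of its confusion matrix. The sketch below computes the share of samples assigned to each predicted class, assuming columns correspond to predicted classes (an assumption, since the file does not state the convention). The name predicted_share is hypothetical.

```python
def predicted_share(cm):
    """Fraction of all samples assigned to each predicted class (column sums)."""
    total = sum(sum(row) for row in cm)
    n_classes = len(cm[0])
    return [sum(row[j] for row in cm) / total for j in range(n_classes)]


# tfidf unigrams + lsi confusion matrix from above.
tfidf_lsi = [[82, 430], [21, 467]]
print(predicted_share(tfidf_lsi))
```

Under that assumption, 897 of the 1000 samples in this matrix are predicted as the second class.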
sentiment
0 0.492033548109
1 0.507966451891
Confusion matrix
[[104 408]
[ 44 444]]
pos (basic pos)
0 0.257299910975
1 0.256458449943
2 0.251828365903
3 0.0
4 0.234413273179
Confusion matrix
[[201 311]
[121 367]]
Kim + Zhang
0 0.0141372047951
1 0.0109213771348
2 0.00962194915102
3 0.00863344706189
4 0.00804182288333
5 0.0134245119641
6 0.0103679625123
7 0.00571775789681
8 0.00928085843812
9 0.00743982529346
10 0.00710365276793
11 0.00858587954706
12 0.00810478367116
13 0.00730598110139
14 0.00763947925124
15 0.00616097783662
16 0.00964093038963
17 0.0100158048144
18 0.00341943755445
19 0.010089904332
20 0.0111197056206
21 0.0106746026279
22 0.00897530596089
23 0.00939976690416
24 0.016336830746
25 0.0105670387697
26 0.00730795312612
27 0.00971953881326
28 0.00953998279369
29 0.0092254591176
30 0.0100537076315
31 0.00783059770194
32 0.0110717119927
33 0.00854409484017
34 0.00765412965921
35 0.0107162319195
36 0.0134895477573
37 0.00627672805759
38 0.0122354729784
39 0.0068718225431
40 0.00842377644774
41 0.00942438965773
42 0.00722565498107
43 0.00760954976703
44 0.00740920688751
45 0.00701037784471
46 0.00791784635897
47 0.00837060521339
48 0.00885130638948
49 0.00723023577154
50 0.00923375044945
51 0.00748376670581
52 0.00828675275128
53 0.0075081296931
54 0.00839555902064
55 0.00916711635121
56 0.0104047959427
57 0.00625440584621
58 0.00748666594888
59 0.00920818239821
60 0.00632677727915
61 0.00615961868908
62 0.00906571278689
63 0.0077251910769
64 0.00830415683769
65 0.00753543404382
66 0.0092380664026
67 0.00724063369311
68 0.0100427190082
69 0.00632388756471
70 0.0123399868668
71 0.00754207641334
72 0.00893118063864
73 0.0097990332416
74 0.00905801033826
75 0.00570702948813
76 0.00932028693992
77 0.00583241850815
78 0.00713476401364
79 0.00922325736566
80 0.0142111755641
81 0.00676484012522
82 0.00807165240693
83 0.0088366571145
84 0.00994223101632
85 0.00638863985001
86 0.00918971516711
87 0.00787999404628
88 0.00693153025352
89 0.00754357586606
90 0.0102474602954
91 0.00826881487931
92 0.00945159212028
93 0.0080562419567
94 0.00806184579445
95 0.00712039865609
96 0.00706392203291
97 0.00806533302766
98 0.00936287604684
99 0.00459413221181
100 0.0187616284995
101 0.00870507383515
102 0.0108368249644
103 0.0
104 0.00954565901355
105 0.0129466627927
106 0.012550672827
107 0.0130041621357
108 0.0083652518193
109 0.00931624620138
110 0.0113703902468
111 0.00430201135273
112 0.00624555935073
113 0.00898113494819
Confusion matrix
[[204 308]
[ 91 397]]
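To compare the feature sets side by side, the sketch below computes accuracy for each confusion matrix reported in this file and sorts the results. Accuracy uses only the diagonal and the total, so it does not depend on whether rows or columns hold the true classes. The dictionary labels are copied from the headings above; the function name accuracy is illustrative.

```python
# Confusion matrices as reported in this file.
matrices = {
    "Kim": [[186, 326], [77, 411]],
    "O'Mahony": [[205, 307], [123, 365]],
    "Liu": [[231, 281], [137, 351]],
    "Zhang": [[110, 402], [102, 386]],
    "tfidf unigrams + lsi": [[82, 430], [21, 467]],
    "sentiment": [[104, 408], [44, 444]],
    "pos (basic pos)": [[201, 311], [121, 367]],
    "Kim + Zhang": [[204, 308], [91, 397]],
}


def accuracy(cm):
    """Share of samples on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Sort feature sets from highest to lowest accuracy.
ranked = sorted(matrices, key=lambda name: accuracy(matrices[name]), reverse=True)
for name in ranked:
    print(f"{name}: {accuracy(matrices[name]):.3f}")
```

By this measure, Kim + Zhang (0.601) and Kim (0.597) come out highest and Zhang (0.496) lowest.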