-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
153 lines (121 loc) · 4.77 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
---
title: "parsent"
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
md_document:
toc: true
---
```{r, echo=FALSE}
desc <- suppressWarnings(readLines("DESCRIPTION"))
regex <- "(^Version:\\s+)(\\d+\\.\\d+\\.\\d+)"
loc <- grep(regex, desc)
ver <- gsub(regex, "\\2", desc[loc])
verbadge <- sprintf('<a href="https://img.shields.io/badge/Version-%s-orange.svg"><img src="https://img.shields.io/badge/Version-%s-orange.svg" alt="Version"/></a></p>', ver, ver)
verbadge <- ''
````
[![Build Status](https://travis-ci.org/trinker/parsent.svg?branch=master)](https://travis-ci.org/trinker/parsent)
[![Coverage Status](https://coveralls.io/repos/trinker/parsent/badge.svg?branch=master)](https://coveralls.io/r/trinker/parsent?branch=master)
`r verbadge`
```{r, echo=FALSE, message=FALSE}
library(knitr)
knit_hooks$set(htmlcap = function(before, options, envir) {
if(!before) {
paste('<p class="caption"><b><em>',options$htmlcap,"</em></b></p>",sep="")
}
})
knitr::opts_knit$set(self.contained = TRUE, cache = FALSE)
knitr::opts_chunk$set(fig.path = "tools/figure/")
```
![](tools/parsent_logo/r_parsent.png)
**parsent** is a collection of tools used to parse sentences. The package is a wrapper for the **NLP**/**openNLP** packages that simplifies and extends the user experience.
# Function Usage
Functions typically fall into the task category of (1) parsing, (2) converting, & (3) extracting. The main functions, task category, & descriptions are summarized in the table below:
| Function | Task | Description |
|---------------------------|------------|-----------------------------------------------------------|
| `parser` | parsing | Parse sentences into phrases |
| `parse_annotator` | parsing | Generate **OpenNLP** parser required by `parser` function |
| `as_tree` | converting | Convert `parser` output into tree form |
| `as_square_brace` | converting | Convert `parser` output in square brace form (vs. round) |
| `as_square_brace_latex` | converting | Convert `parser` output LaTeX ready form |
| `get_phrases` | extracting | Extract [phrases](https://en.wikipedia.org/wiki/Phrase_structure_grammar) from `parser` output |
| `get_phrase_type` | extracting | Extract phrases one step down the tree |
| `get_phrase_type_regex` | extracting | Extract phrases at any level in the tree (uses regex) |
| `get_leaves` | extracting | Extract the leaves (tokens or words) from a phrase |
| `take` | extracting | Select indexed elements from a vector |
# Installation
To download the development version of **parsent**:
Download the [zip ball](https://github.com/trinker/parsent/zipball/master) or [tar ball](https://github.com/trinker/parsent/tarball/master), decompress and run `R CMD INSTALL` on it, or use the **pacman** package to install the development version:
```r
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh(c(
"trinker/textshape",
"trinker/coreNLPsetup",
"trinker/parsent"
))
```
# Contact
You are welcome to:
* submit suggestions and bug-reports at: <https://github.com/trinker/parsent/issues>
* send a pull request on: <https://github.com/trinker/parsent/>
* compose a friendly e-mail to: <tyler.rinker@gmail.com>
# Demonstration
## Load the Packages/Data
```{r, message=FALSE}
if (!require("pacman")) install.packages("pacman")
pacman::p_load(parsent, magrittr)
txt <- c(
"Really, I like chocolate because it is good. It smells great.",
"Robots are rather evil and most are devoid of decency.",
"He is my friend.",
"Clifford the big red dog ate my lunch.",
"Professor Johns can not teach",
"",
NA
)
```
## Create Annotator
```{r}
if(!exists('parse_ann')) {
parse_ann <- parse_annotator()
}
```
## Parsing
```{r}
(x <- parser(txt, parse.annotator = parse_ann))
```
Note that the user may choose to use CoreNLP as a backend by setting `engine = "coreNLP"`. To ensure that coreNLP is setup properly use `check_setup`.
## Plotting
```{r}
par(mar = c(0,0,0,.7) + 0.2)
plot(x[[2]])
```
```{r}
par(
mfrow = c(3, 2),
mar = c(0,0,1,1) + 0.1
)
invisible(lapply(x[1:5], plot))
```
## Get Subject, Verb, and Direct Object
### Subject
```{r}
get_phrase_type(x, "NP") %>%
take() %>%
get_leaves()
```
### Predicate Verb
```{r}
get_phrase_type_regex(x, "VP") %>%
take() %>%
get_phrase_type_regex("(VB|MD)") %>%
take() %>%
get_leaves()
```
### Direct Object
```{r}
get_phrase_type_regex(x, "VP") %>%
take() %>%
get_phrase_type_regex("NP") %>%
take() %>%
get_leaves()
```