This API exposes the Java boilerpipe library over HTTP to extract raw article text from HTML pages.
There are two ways to use the API: POST a JSON body to /extract containing either a url field or an html field, as in the two requests below:
curl -X POST http://localhost:3000/extract -H "Content-Type: application/json" -d '
{
"url": "http://techcrunch.com/2014/07/07/matterport-16m-dcm/"
}
'
curl -X POST http://localhost:3000/extract -H "Content-Type: application/json" -d '
{
"html": "YOUR HTML CODE HERE"
}
'
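The same two requests can also be made from a script. The following is a minimal Python sketch using only the standard library, assuming the API is reachable at http://localhost:3000/extract as in the curl examples. The function names `build_extract_request` and `extract` are hypothetical helpers, not part of the API. The response format is not described here, so the sketch returns the response body as text without parsing it, and it assumes that body is UTF-8. Requiring exactly one of `url` or `html` is this sketch's own choice, not documented server behavior.

```python
import json
import urllib.request

# Endpoint taken from the curl examples; adjust host and port to your setup.
EXTRACT_URL = "http://localhost:3000/extract"


def build_extract_request(url=None, html=None, endpoint=EXTRACT_URL):
    """Build a POST request carrying either a "url" or an "html" field."""
    # Assumption of this sketch: exactly one of the two fields is sent.
    if (url is None) == (html is None):
        raise ValueError("pass exactly one of url or html")
    payload = {"url": url} if url is not None else {"html": html}
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract(url=None, html=None, endpoint=EXTRACT_URL, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_extract_request(url=url, html=html, endpoint=endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the body is UTF-8 text; it is returned unparsed.
        return response.read().decode("utf-8")


# Example calls (these need the API to be running):
#   extract(url="http://techcrunch.com/2014/07/07/matterport-16m-dcm/")
#   extract(html="YOUR HTML CODE HERE")
```

Keeping request construction separate from sending makes it possible to inspect the JSON payload and headers without a running server.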
The easiest way to run the API is with Docker. A published image is available on Docker Hub as blikk/boilerpipe-api.