Updated docs
IonicaBizau committed Apr 28, 2016
1 parent b88d3b1 commit 9891e1a
Showing 5 changed files with 359 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
*.swp
*.swo
*~
*.log
node_modules
66 changes: 66 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,66 @@
# :eight_spoked_asterisk: :stars: :sparkles: :dizzy: :star2: :star2: :sparkles: :dizzy: :star2: :star2: Contributing :star: :star2: :dizzy: :sparkles: :star: :star2: :dizzy: :sparkles: :stars: :eight_spoked_asterisk:

So, you want to contribute to this project! That's awesome. However, before
doing so, please read the following simple steps on how to contribute. This
will make life easier and will avoid wasting time on things which are not
requested. :sparkles:

## Discuss the changes before doing them
- First of all, open an issue in the repository, using the [bug tracker][1],
describing the contribution you would like to make, the bug you found or any
other ideas you have. This will help us to get you started on the right
foot.

- If it makes sense, add the platform and software information (e.g. operating
system, Node.js version, etc.) and screenshots (so we can see what you are
seeing).

- It is recommended to wait for feedback before continuing to the next steps.
However, if the issue is clear (e.g. a typo) and the fix is simple, you can
continue and fix it.

## Fixing issues
- Fork the project in your account and create a branch with your fix:
`some-great-feature` or `some-issue-fix`.

- Commit your changes in that branch, writing the code following the
[code style][2]. If the project contains tests (generally, the `test`
directory), you are encouraged to add a test as well. :memo:

- If the project contains a `package.json` or a `bower.json` file, add yourself
in the `contributors` array (or `authors` in the case of `bower.json`;
if the array does not exist, create it):

```json
{
   "contributors": [
      "Your Name <and@email.address> (http://your.website)"
   ]
}
```
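
If you would rather script this step than edit the file by hand, the following is a minimal sketch of the same idea. The `addContributor` helper and the sample `pkg` object are hypothetical and only illustrate the "create the array if it does not exist" rule; they are not part of this project.

```js
// Hypothetical helper: returns a copy of a parsed package.json object
// with `entry` present in its `contributors` array.
function addContributor(pkg, entry) {
    // Create the array if it does not exist yet
    const contributors = Array.isArray(pkg.contributors)
        ? pkg.contributors.slice()
        : [];

    // Avoid adding the same person twice
    if (!contributors.includes(entry)) {
        contributors.push(entry);
    }

    return Object.assign({}, pkg, { contributors });
}

// Sample input (an assumption, not a real package)
const pkg = { name: "example-package", version: "1.0.0" };
const updated = addContributor(
    pkg,
    "Your Name <and@email.address> (http://your.website)"
);

console.log(JSON.stringify(updated, null, 2));
```

The helper returns a new object instead of mutating its input, so the original data stays untouched until you decide to write the result back.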

## Creating a pull request

- Open a pull request, and reference the initial issue in the pull request
message (e.g. *fixes #<your-issue-number>*). Write a good description and
title, so everybody will know what is fixed/improved.

- If it makes sense, add screenshots, gifs etc., so it is easier to see what
is going on.

## Wait for feedback
Before accepting your contributions, we will review them. You may get feedback
about what should be fixed in your modified code. If so, just keep committing
in your branch and the pull request will be updated automatically.

## Everyone is happy!
Finally, your contributions will be merged, and everyone will be happy! :smile:
Contributions are more than welcome!

Thanks! :sweat_smile:



[1]: https://github.com/IonicaBizau/scrape-it/issues

[2]: https://github.com/IonicaBizau/code-style
81 changes: 81 additions & 0 deletions DOCUMENTATION.md
@@ -0,0 +1,81 @@
## Documentation

Below is the API reference for this module.

### `scrapeIt(url, opts, cb)`
A scraping module for humans.

#### Params
- **String|Object** `url`: The page url or request options.
- **Object|Array** `opts`: The options passed to the `scrapeCheerio` method.
- **Function** `cb`: The callback function.

#### Return
- **Tinyreq** The request object.

### `scrapeIt.scrapeCheerio($input, opts, $)`
Scrapes the data in the provided element.

#### Params
- **Cheerio** `$input`: The input element.
- **Object** `opts`: An array or object containing the scraping information.
  If you want to scrape a list, you have to use the `listItem` selector:

  - `listItem` (String): The list item selector.
  - `name` (String): The list name (e.g. `articles`).
  - `data` (Object): The fields to include in the list objects:
    - `<fieldName>` (Object|String): The selector or an object containing:
      - `selector` (String): The selector.
      - `convert` (Function): An optional function to change the value.
      - `how` (Function|String): A function or function name to access the
        value.
      - `attr` (String): If provided, the value will be taken based on
        the attribute name.
      - `trim` (Boolean): If `false`, the value will *not* be trimmed
        (default: `true`).
      - `eq` (Number): If provided, it will select the *nth* element.
      - `listItem` (Object): An object, keeping the recursive schema of
        the `listItem` object. This can be used to create nested lists.

**Example**:
```js
{
    listItem: ".article"
  , name: "articles"
  , data: {
        createdAt: {
            selector: ".date"
          , convert: x => new Date(x)
        }
      , title: "a.article-title"
      , tags: {
            selector: ".tags"
          , convert: x => x.split("|").map(c => c.trim()).slice(1)
        }
      , content: {
            selector: ".article-content"
          , how: "html"
        }
    }
}
```

If you want to collect specific data from the page, just use the same
schema used for the `data` field.

**Example**:
```js
{
    title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}
```
- **Function** `$`: The Cheerio function.

#### Return
- **Object** The scraped data.
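
To make the `convert` option more concrete, here is a small sketch that runs the two `convert` callbacks from the list example above on their own, outside the scraper. The raw input strings are assumptions about what the selected elements might contain; only the callbacks come from the example.

```js
// The `convert` callbacks from the list example
const convertTags = x => x.split("|").map(c => c.trim()).slice(1);
const convertDate = x => new Date(x);

// Hypothetical raw text values, as they might be read from the page
const rawTags = "Tags | nodejs | raspberry-pi";
const rawDate = "2016-03-14";

// Split on "|", trim each piece and drop the first one (the label)
const tags = convertTags(rawTags);

// Turn the date string into a Date object
const createdAt = convertDate(rawDate);

console.log(tags);
// [ 'nodejs', 'raspberry-pi' ]
console.log(createdAt.toISOString());
// 2016-03-14T00:00:00.000Z
```

Because `convert` is a plain function that takes the extracted value and returns the final one, it can be tested in isolation like this before being plugged into a schema.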

21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2016 Ionică Bizău <bizauionica@gmail.com> (http://ionicabizau.net)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
186 changes: 186 additions & 0 deletions README.md
@@ -0,0 +1,186 @@

[![scrape-it](https://i.imgur.com/j3Z0rbN.png)](#)

# scrape-it [![PayPal](https://img.shields.io/badge/%24-paypal-f39c12.svg)][paypal-donations] [![Version](https://img.shields.io/npm/v/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Downloads](https://img.shields.io/npm/dt/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Get help on Codementor](https://cdn.codementor.io/badges/get_help_github.svg)](https://www.codementor.io/johnnyb?utm_source=github&utm_medium=button&utm_term=johnnyb&utm_campaign=github)

> A Node.js scraper for humans.

## :cloud: Installation

```sh
$ npm i --save scrape-it
```


## :clipboard: Example



```js
const scrapeIt = require("scrape-it");

scrapeIt("http://ionicabizau.net", [
    // Fetch the articles on the page (list)
    {
        listItem: ".article"
      , name: "articles"
      , data: {
            createdAt: {
                selector: ".date"
              , convert: x => new Date(x)
            }
          , title: "a.article-title"
          , tags: {
                selector: ".tags"
              , convert: x => x.split("|").map(c => c.trim()).slice(1)
            }
          , content: {
                selector: ".article-content"
              , how: "html"
            }
        }
    }
  , {
        listItem: "li.page"
      , name: "pages"
      , data: {
            title: "a"
          , url: {
                selector: "a"
              , attr: "href"
            }
        }
    }
    // Fetch some additional data
  , {
        title: ".header h1"
      , desc: ".header h2"
      , avatar: {
            selector: ".header img"
          , attr: "src"
        }
    }
], (err, page) => {
    console.log(err || page);
});
// { articles:
//    [ { createdAt: Mon Mar 14 2016 00:00:00 GMT+0200 (EET),
//        title: 'Pi Day, Raspberry Pi and Command Line',
//        tags: [Object],
//        content: '<p>Everyone knows (or should know)...a" alt=""></p>\n' },
//      { createdAt: Thu Feb 18 2016 00:00:00 GMT+0200 (EET),
//        title: 'How I ported Memory Blocks to modern web',
//        tags: [Object],
//        content: '<p>Playing computer games is a lot of fun. ...' },
//      { createdAt: Mon Nov 02 2015 00:00:00 GMT+0200 (EET),
//        title: 'How to convert JSON to Markdown using json2md',
//        tags: [Object],
//        content: '<p>I love and ...' } ],
//   pages:
//    [ { title: 'Blog', url: '/' },
//      { title: 'About', url: '/about' },
//      { title: 'FAQ', url: '/faq' },
//      { title: 'Training', url: '/training' },
//      { title: 'Contact', url: '/contact' } ],
//   title: 'Ionică Bizău',
//   desc: 'Web Developer, Linux geek and Musician',
//   avatar: '/images/logo.png' }
```
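
The `tags: [Object]` entries in the output above are a display artifact: `console.log` stops expanding nested values past a default depth. A minimal sketch of printing the full structure with Node's built-in `util.inspect` follows. The `page` object here is a hand-written stand-in shaped like the result above, and its tag values are assumptions.

```js
const util = require("util");

// Hypothetical result object, shaped like the output above
const page = {
    articles: [
        {
            title: "Pi Day, Raspberry Pi and Command Line"
          , tags: ["raspberry-pi", "cli"]
        }
    ]
};

// `depth: null` makes util.inspect recurse without a depth limit,
// so the nested `tags` array is printed in full
const full = util.inspect(page, { depth: null });

console.log(full);
```

In the real callback, the same call could replace the plain `console.log(err || page)` when you need to see every nested field.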

## :memo: Documentation


### `scrapeIt(url, opts, cb)`
A scraping module for humans.

#### Params
- **String|Object** `url`: The page url or request options.
- **Object|Array** `opts`: The options passed to the `scrapeCheerio` method.
- **Function** `cb`: The callback function.

#### Return
- **Tinyreq** The request object.

### `scrapeIt.scrapeCheerio($input, opts, $)`
Scrapes the data in the provided element.

#### Params
- **Cheerio** `$input`: The input element.
- **Object** `opts`: An array or object containing the scraping information.
  If you want to scrape a list, you have to use the `listItem` selector:

  - `listItem` (String): The list item selector.
  - `name` (String): The list name (e.g. `articles`).
  - `data` (Object): The fields to include in the list objects:
    - `<fieldName>` (Object|String): The selector or an object containing:
      - `selector` (String): The selector.
      - `convert` (Function): An optional function to change the value.
      - `how` (Function|String): A function or function name to access the
        value.
      - `attr` (String): If provided, the value will be taken based on
        the attribute name.
      - `trim` (Boolean): If `false`, the value will *not* be trimmed
        (default: `true`).
      - `eq` (Number): If provided, it will select the *nth* element.
      - `listItem` (Object): An object, keeping the recursive schema of
        the `listItem` object. This can be used to create nested lists.

**Example**:
```js
{
    listItem: ".article"
  , name: "articles"
  , data: {
        createdAt: {
            selector: ".date"
          , convert: x => new Date(x)
        }
      , title: "a.article-title"
      , tags: {
            selector: ".tags"
          , convert: x => x.split("|").map(c => c.trim()).slice(1)
        }
      , content: {
            selector: ".article-content"
          , how: "html"
        }
    }
}
```

If you want to collect specific data from the page, just use the same
schema used for the `data` field.

**Example**:
```js
{
    title: ".header h1"
  , desc: ".header h2"
  , avatar: {
        selector: ".header img"
      , attr: "src"
    }
}
```
- **Function** `$`: The Cheerio function.

#### Return
- **Object** The scraped data.



## :yum: How to contribute
Have an idea? Found a bug? See [how to contribute][contributing].


## :scroll: License

[MIT][license] © [Ionică Bizău][website]

[paypal-donations]: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=RVXDDLKKLQRJW
[donate-now]: http://i.imgur.com/6cMbHOC.png

[license]: http://showalicense.com/?fullname=Ionic%C4%83%20Biz%C4%83u%20%3Cbizauionica%40gmail.com%3E%20(http%3A%2F%2Fionicabizau.net)&year=2016#license-mit
[website]: http://ionicabizau.net
[contributing]: /CONTRIBUTING.md
[docs]: /DOCUMENTATION.md
