1 parent b88d3b1, commit 9891e1a
Showing 5 changed files with 359 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
*.swp | ||
*.swo | ||
*~ | ||
*.log | ||
node_modules |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
# :eight_spoked_asterisk: :stars: :sparkles: :dizzy: :star2: :star2: :sparkles: :dizzy: :star2: :star2: Contributing :star: :star2: :dizzy: :sparkles: :star: :star2: :dizzy: :sparkles: :stars: :eight_spoked_asterisk: | ||
|
||
So, you want to contribute to this project! That's awesome. However, before | ||
doing so, please read the following simple steps on how to contribute. This will | ||
make life easier and avoid wasting time on things that are not | ||
requested. :sparkles: | ||
|
||
## Discuss the changes before doing them | ||
- First of all, open an issue in the repository, using the [bug tracker][1], | ||
describing the contribution you would like to make, the bug you found or any | ||
other ideas you have. This will help us to get you started on the right | ||
foot. | ||
|
||
- If it makes sense, add the platform and software information (e.g. operating | ||
system, Node.js version, etc.) and screenshots (so we can see what you are | ||
seeing). | ||
|
||
- It is recommended to wait for feedback before continuing to the next steps. | ||
However, if the issue is clear (e.g. a typo) and the fix is simple, you can | ||
continue and fix it. | ||
|
||
## Fixing issues | ||
- Fork the project in your account and create a branch with your fix: | ||
`some-great-feature` or `some-issue-fix`. | ||
|
||
- Commit your changes in that branch, writing the code following the | ||
[code style][2]. If the project contains tests (generally, the `test` | ||
directory), you are encouraged to add a test as well. :memo: | ||
|
||
- If the project contains a `package.json` or a `bower.json` file, add yourself | ||
to the `contributors` array (or `authors` in the case of `bower.json`; | ||
if the array does not exist, create it): | ||
|
||
```json | ||
{ | ||
"contributors": [ | ||
"Your Name <and@email.address> (http://your.website)" | ||
] | ||
} | ||
``` | ||
|
||
## Creating a pull request | ||
|
||
- Open a pull request, and reference the initial issue in the pull request | ||
message (e.g. *fixes #<your-issue-number>*). Write a good description and | ||
title, so everybody will know what is fixed/improved. | ||
|
||
- If it makes sense, add screenshots, gifs etc., so it is easier to see what | ||
is going on. | ||
|
||
## Wait for feedback | ||
Before accepting your contributions, we will review them. You may get feedback | ||
about what should be fixed in your modified code. If so, just keep committing | ||
in your branch and the pull request will be updated automatically. | ||
|
||
## Everyone is happy! | ||
Finally, your contributions will be merged, and everyone will be happy! :smile: | ||
Contributions are more than welcome! | ||
|
||
Thanks! :sweat_smile: | ||
|
||
|
||
|
||
[1]: https://github.com/IonicaBizau/scrape-it/issues | ||
|
||
[2]: https://github.com/IonicaBizau/code-style |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
## Documentation | ||
|
||
Below is the API reference for this module. | ||
|
||
### `scrapeIt(url, opts, cb)` | ||
A scraping module for humans. | ||
|
||
#### Params | ||
- **String|Object** `url`: The page URL or request options. | ||
- **Object|Array** `opts`: The options passed to the `scrapeCheerio` method. | ||
- **Function** `cb`: The callback function. | ||
|
||
#### Return | ||
- **Tinyreq** The request object. | ||
|
||
### `scrapeIt.scrapeCheerio($input, opts, $)` | ||
Scrapes the data in the provided element. | ||
|
||
#### Params | ||
- **Cheerio** `$input`: The input element. | ||
- **Object|Array** `opts`: An array or object containing the scraping information. | ||
If you want to scrape a list, you have to use the `listItem` selector: | ||
|
||
- `listItem` (String): The list item selector. | ||
- `name` (String): The list name (e.g. `articles`). | ||
- `data` (Object): The fields to include in the list objects: | ||
- `<fieldName>` (Object|String): The selector or an object containing: | ||
- `selector` (String): The selector. | ||
- `convert` (Function): An optional function to change the value. | ||
- `how` (Function|String): A function or function name to access the | ||
value. | ||
- `attr` (String): If provided, the value will be taken based on | ||
the attribute name. | ||
- `trim` (Boolean): If `false`, the value will *not* be trimmed | ||
(default: `true`). | ||
- `eq` (Number): If provided, it will select the *nth* element. | ||
- `listItem` (Object): An object, keeping the recursive schema of | ||
the `listItem` object. This can be used to create nested lists. | ||
|
||
**Example**: | ||
```js | ||
{ | ||
listItem: ".article" | ||
, name: "articles" | ||
, data: { | ||
createdAt: { | ||
selector: ".date" | ||
, convert: x => new Date(x) | ||
} | ||
, title: "a.article-title" | ||
, tags: { | ||
selector: ".tags" | ||
, convert: x => x.split("|").map(c => c.trim()).slice(1) | ||
} | ||
, content: { | ||
selector: ".article-content" | ||
, how: "html" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
If you want to collect specific data from the page, just use the same | ||
schema used for the `data` field. | ||
|
||
**Example**: | ||
```js | ||
{ | ||
title: ".header h1" | ||
, desc: ".header h2" | ||
, avatar: { | ||
selector: ".header img" | ||
, attr: "src" | ||
} | ||
} | ||
``` | ||
- **Function** `$`: The Cheerio function. | ||
|
||
#### Return | ||
- **Object** The scraped data. | ||
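
As a small illustration of the `convert` option, the two converters used in the example schema above can be tried on plain strings. The sample inputs below are assumptions made up for illustration; they are not taken from a real page.

```js
// Converters copied from the example schema above.
const toDate = x => new Date(x);
const toTags = x => x.split("|").map(c => c.trim()).slice(1);

// Hypothetical raw text, as it might be read from a `.tags` element.
const rawTags = "Tags: | node | scraping | cli";
const tags = toTags(rawTags);
console.log(tags); // → [ 'node', 'scraping', 'cli' ]

// Hypothetical raw text, as it might be read from a `.date` element.
const createdAt = toDate("2016-03-14");
console.log(createdAt.getUTCFullYear()); // → 2016
```

Here `.slice(1)` drops whatever precedes the first `|` (in this made-up input, a `Tags:` label), so only the tag names remain.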
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
The MIT License (MIT) | ||
|
||
Copyright (c) 2016 Ionică Bizău <bizauionica@gmail.com> (http://ionicabizau.net) | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,186 @@ | ||
|
||
[![scrape-it](https://i.imgur.com/j3Z0rbN.png)](#) | ||
|
||
# scrape-it [![PayPal](https://img.shields.io/badge/%24-paypal-f39c12.svg)][paypal-donations] [![Version](https://img.shields.io/npm/v/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Downloads](https://img.shields.io/npm/dt/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Get help on Codementor](https://cdn.codementor.io/badges/get_help_github.svg)](https://www.codementor.io/johnnyb?utm_source=github&utm_medium=button&utm_term=johnnyb&utm_campaign=github) | ||
|
||
> A Node.js scraper for humans. | ||
## :cloud: Installation | ||
|
||
```sh | ||
$ npm i --save scrape-it | ||
``` | ||
|
||
|
||
## :clipboard: Example | ||
|
||
|
||
|
||
```js | ||
const scrapeIt = require("scrape-it"); | ||
|
||
scrapeIt("http://ionicabizau.net", [ | ||
// Fetch the articles on the page (list) | ||
{ | ||
listItem: ".article" | ||
, name: "articles" | ||
, data: { | ||
createdAt: { | ||
selector: ".date" | ||
, convert: x => new Date(x) | ||
} | ||
, title: "a.article-title" | ||
, tags: { | ||
selector: ".tags" | ||
, convert: x => x.split("|").map(c => c.trim()).slice(1) | ||
} | ||
, content: { | ||
selector: ".article-content" | ||
, how: "html" | ||
} | ||
} | ||
} | ||
, { | ||
listItem: "li.page" | ||
, name: "pages" | ||
, data: { | ||
title: "a" | ||
, url: { | ||
selector: "a" | ||
, attr: "href" | ||
} | ||
} | ||
} | ||
// Fetch some additional data | ||
, { | ||
title: ".header h1" | ||
, desc: ".header h2" | ||
, avatar: { | ||
selector: ".header img" | ||
, attr: "src" | ||
} | ||
} | ||
], (err, page) => { | ||
console.log(err || page); | ||
}); | ||
// { articles: | ||
// [ { createdAt: Mon Mar 14 2016 00:00:00 GMT+0200 (EET), | ||
// title: 'Pi Day, Raspberry Pi and Command Line', | ||
// tags: [Object], | ||
// content: '<p>Everyone knows (or should know)...a" alt=""></p>\n' }, | ||
// { createdAt: Thu Feb 18 2016 00:00:00 GMT+0200 (EET), | ||
// title: 'How I ported Memory Blocks to modern web', | ||
// tags: [Object], | ||
// content: '<p>Playing computer games is a lot of fun. ...' }, | ||
// { createdAt: Mon Nov 02 2015 00:00:00 GMT+0200 (EET), | ||
// title: 'How to convert JSON to Markdown using json2md', | ||
// tags: [Object], | ||
// content: '<p>I love and ...' } ], | ||
// pages: | ||
// [ { title: 'Blog', url: '/' }, | ||
// { title: 'About', url: '/about' }, | ||
// { title: 'FAQ', url: '/faq' }, | ||
// { title: 'Training', url: '/training' }, | ||
// { title: 'Contact', url: '/contact' } ], | ||
// title: 'Ionică Bizău', | ||
// desc: 'Web Developer, Linux geek and Musician', | ||
// avatar: '/images/logo.png' } | ||
``` | ||
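
In the sample output above, `tags` shows up as `[Object]`, most likely because `console.log` stops expanding nested values after a few levels. A minimal sketch of printing the full structure with `JSON.stringify` follows; the `page` object here is a hand-written stand-in shaped like the output above, and the tag values are invented.

```js
// Hand-written stand-in for the `page` object passed to the callback.
// The tag values are invented for illustration.
const page = {
    articles: [
        {
            title: "Pi Day, Raspberry Pi and Command Line"
          , tags: ["raspberry-pi", "cli"]
        }
    ]
  , title: "Ionică Bizău"
};

// JSON.stringify serializes every nesting level, so `tags` is fully visible.
const output = JSON.stringify(page, null, 2);
console.log(output);
```

Note that `JSON.stringify` turns `Date` values (such as `createdAt`) into ISO strings, so this is better suited to inspection than to round-tripping the data.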
|
||
## :memo: Documentation | ||
|
||
|
||
### `scrapeIt(url, opts, cb)` | ||
A scraping module for humans. | ||
|
||
#### Params | ||
- **String|Object** `url`: The page URL or request options. | ||
- **Object|Array** `opts`: The options passed to the `scrapeCheerio` method. | ||
- **Function** `cb`: The callback function. | ||
|
||
#### Return | ||
- **Tinyreq** The request object. | ||
|
||
### `scrapeIt.scrapeCheerio($input, opts, $)` | ||
Scrapes the data in the provided element. | ||
|
||
#### Params | ||
- **Cheerio** `$input`: The input element. | ||
- **Object|Array** `opts`: An array or object containing the scraping information. | ||
If you want to scrape a list, you have to use the `listItem` selector: | ||
|
||
- `listItem` (String): The list item selector. | ||
- `name` (String): The list name (e.g. `articles`). | ||
- `data` (Object): The fields to include in the list objects: | ||
- `<fieldName>` (Object|String): The selector or an object containing: | ||
- `selector` (String): The selector. | ||
- `convert` (Function): An optional function to change the value. | ||
- `how` (Function|String): A function or function name to access the | ||
value. | ||
- `attr` (String): If provided, the value will be taken based on | ||
the attribute name. | ||
- `trim` (Boolean): If `false`, the value will *not* be trimmed | ||
(default: `true`). | ||
- `eq` (Number): If provided, it will select the *nth* element. | ||
- `listItem` (Object): An object, keeping the recursive schema of | ||
the `listItem` object. This can be used to create nested lists. | ||
|
||
**Example**: | ||
```js | ||
{ | ||
listItem: ".article" | ||
, name: "articles" | ||
, data: { | ||
createdAt: { | ||
selector: ".date" | ||
, convert: x => new Date(x) | ||
} | ||
, title: "a.article-title" | ||
, tags: { | ||
selector: ".tags" | ||
, convert: x => x.split("|").map(c => c.trim()).slice(1) | ||
} | ||
, content: { | ||
selector: ".article-content" | ||
, how: "html" | ||
} | ||
} | ||
} | ||
``` | ||
|
||
If you want to collect specific data from the page, just use the same | ||
schema used for the `data` field. | ||
|
||
**Example**: | ||
```js | ||
{ | ||
title: ".header h1" | ||
, desc: ".header h2" | ||
, avatar: { | ||
selector: ".header img" | ||
, attr: "src" | ||
} | ||
} | ||
``` | ||
- **Function** `$`: The Cheerio function. | ||
|
||
#### Return | ||
- **Object** The scraped data. | ||
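
To make the field options above more concrete, here is a rough sketch of how `eq`, `attr`, `trim` and `convert` could combine for a single field. It is not scrape-it's implementation: `readField` and the mock elements are hypothetical, the selector is ignored (the mock array stands in for the matched elements), `eq` is assumed to be zero-based as in Cheerio's `.eq()`, and without `eq` the sketch simply uses the first element.

```js
// Illustrative only: plain objects stand in for a Cheerio selection.
// `readField` is a hypothetical helper, not part of scrape-it.
const readField = (elements, field) => {
    // A string field is shorthand for `{ selector: "..." }`.
    const opts = typeof field === "string" ? { selector: field } : field;

    // `eq` picks one element by zero-based index (assumption).
    const elm = elements[opts.eq === undefined ? 0 : opts.eq];
    if (!elm) { return ""; }

    // `attr` reads an attribute; otherwise use the text content.
    let value = opts.attr ? elm.attrs[opts.attr] : elm.text;
    if (value === undefined) { value = ""; }

    // Values are trimmed unless `trim` is explicitly `false`.
    if (opts.trim !== false && typeof value === "string") {
        value = value.trim();
    }

    // `convert`, when present, is applied to the final value.
    return opts.convert ? opts.convert(value) : value;
};

// Mock "matched elements" for a selector such as "a".
const mock = [
    { text: "  First  ", attrs: { href: "/first" } }
  , { text: "  42  ", attrs: { href: "/second" } }
];

const trimmed = readField(mock, "a");                                // "First"
const untrimmed = readField(mock, { selector: "a", trim: false });   // "  First  "
const href = readField(mock, { selector: "a", eq: 1, attr: "href" }); // "/second"
const num = readField(mock, { selector: "a", eq: 1, convert: Number }); // 42
console.log(trimmed, untrimmed, href, num);
```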
|
||
|
||
|
||
## :yum: How to contribute | ||
Have an idea? Found a bug? See [how to contribute][contributing]. | ||
|
||
|
||
## :scroll: License | ||
|
||
[MIT][license] © [Ionică Bizău][website] | ||
|
||
[paypal-donations]: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=RVXDDLKKLQRJW | ||
[donate-now]: http://i.imgur.com/6cMbHOC.png | ||
|
||
[license]: http://showalicense.com/?fullname=Ionic%C4%83%20Biz%C4%83u%20%3Cbizauionica%40gmail.com%3E%20(http%3A%2F%2Fionicabizau.net)&year=2016#license-mit | ||
[website]: http://ionicabizau.net | ||
[contributing]: /CONTRIBUTING.md | ||
[docs]: /DOCUMENTATION.md |