Updated docs

IonicaBizau · Apr 28, 2016 · 9891e1a
1 parent b88d3b1
commit 9891e1a

Showing 5 changed files with 359 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,5 @@
+*.swp
+*.swo
+*~
+*.log
+node_modules
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,66 @@
+# :eight_spoked_asterisk: :stars: :sparkles: :dizzy: :star2: :star2: :sparkles: :dizzy: :star2: :star2: Contributing :star: :star2: :dizzy: :sparkles:  :star: :star2: :dizzy: :sparkles: :stars: :eight_spoked_asterisk:
+
+So, you want to contribute to this project! That's awesome. However, before
+doing so, please read the following simple steps on how to contribute. This
+will make life easier and avoid wasting time on things that were not
+requested. :sparkles:
+
+## Discuss the changes before doing them
+ - First of all, open an issue in the repository, using the [bug tracker][1],
+   describing the contribution you would like to make, the bug you found or any
+   other ideas you have. This will help us to get you started on the right
+   foot.
+
+ - If it makes sense, add the platform and software information (e.g. operating
+   system, Node.js version, etc.) and screenshots (so we can see what you are
+   seeing).
+
+ - It is recommended to wait for feedback before continuing to the next steps.
+   However, if the issue is clear (e.g. a typo) and the fix is simple, you can
+   continue and fix it.
+
+## Fixing issues
+ - Fork the project to your account and create a branch with your fix:
+   `some-great-feature` or `some-issue-fix`.
+
+ - Commit your changes in that branch, writing the code following the
+   [code style][2]. If the project contains tests (generally, the `test`
+   directory), you are encouraged to add a test as well. :memo:
+
+ - If the project contains a `package.json` or a `bower.json` file, add
+   yourself to the `contributors` array (or `authors` in the case of
+   `bower.json`; if the array does not exist, create it):
+
+   ```json
+   {
+      "contributors": [
+         "Your Name <and@email.address> (http://your.website)"
+      ]
+   }
+   ```
+
+## Creating a pull request
+
+ - Open a pull request, and reference the initial issue in the pull request
+   message (e.g. *fixes #<your-issue-number>*). Write a good description and
+   title, so everybody will know what is fixed/improved.
+
+ - If it makes sense, add screenshots, gifs etc., so it is easier to see what
+   is going on.
+
+## Wait for feedback
+Before accepting your contributions, we will review them. You may get feedback
+about what should be fixed in your modified code. If so, just keep committing
+in your branch and the pull request will be updated automatically.
+
+## Everyone is happy!
+Finally, your contributions will be merged, and everyone will be happy! :smile:
+Contributions are more than welcome!
+
+Thanks! :sweat_smile:
+
+
+
+[1]: https://github.com/IonicaBizau/scrape-it/issues
+
+[2]: https://github.com/IonicaBizau/code-style
diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md
@@ -0,0 +1,81 @@
+## Documentation
+
+Below is the API reference of this module.
+
+### `scrapeIt(url, opts, cb)`
+A scraping module for humans.
+
+#### Params
+- **String|Object** `url`: The page URL or request options.
+- **Object|Array** `opts`: The options passed to the `scrapeCheerio` method.
+- **Function** `cb`: The callback function.
+
+#### Return
+- **Tinyreq** The request object.
+
+### `scrapeIt.scrapeCheerio($input, opts, $)`
+Scrapes the data in the provided element.
+
+#### Params
+- **Cheerio** `$input`: The input element.
+- **Object|Array** `opts`: An array or object containing the scraping information.
+  If you want to scrape a list, you have to use the `listItem` selector:
+
+   - `listItem` (String): The list item selector.
+   - `name` (String): The list name (e.g. `articles`).
+   - `data` (Object): The fields to include in the list objects:
+      - `<fieldName>` (Object|String): The selector or an object containing:
+         - `selector` (String): The selector.
+         - `convert` (Function): An optional function to change the value.
+         - `how` (Function|String): A function or function name to access the
+           value.
+         - `attr` (String): If provided, the value will be taken based on
+           the attribute name.
+         - `trim` (Boolean): If `false`, the value will *not* be trimmed
+           (default: `true`).
+         - `eq` (Number): If provided, it will select the *nth* element.
+         - `listItem` (Object): An object, keeping the recursive schema of
+           the `listItem` object. This can be used to create nested lists.
+
+  **Example**:
+  ```js
+  {
+      listItem: ".article"
+    , name: "articles"
+    , data: {
+          createdAt: {
+              selector: ".date"
+            , convert: x => new Date(x)
+          }
+        , title: "a.article-title"
+        , tags: {
+              selector: ".tags"
+            , convert: x => x.split("|").map(c => c.trim()).slice(1)
+          }
+        , content: {
+              selector: ".article-content"
+            , how: "html"
+          }
+      }
+  }
+  ```
+
+  If you want to collect specific data from the page, just use the same
+  schema used for the `data` field.
+
+  **Example**:
+  ```js
+  {
+       title: ".header h1"
+     , desc: ".header h2"
+     , avatar: {
+           selector: ".header img"
+         , attr: "src"
+       }
+  }
+  ```
+- **Function** `$`: The Cheerio function.
+
+#### Return
+- **Object** The scraped data.
+
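The `convert` callbacks in the schema documented above are plain functions, so they can be exercised on their own, without a page or a request. The sketch below reuses the two converters from the example; the sample strings are hypothetical (an assumed `.tags` text with a leading label, and an assumed ISO date), not values taken from a real page.

```js
// Same converters as in the documented schema.
const toDate = x => new Date(x);
const toTags = x => x.split("|").map(c => c.trim()).slice(1);

// Hypothetical sample inputs (assumed shapes, not taken from the source).
const sampleTags = "Tags | node | scraping ";
const sampleDate = "2016-03-14";

// Split on "|", trim each piece, then drop the leading "Tags" label.
const tags = toTags(sampleTags);
const createdAt = toDate(sampleDate);

console.log(tags);                      // [ 'node', 'scraping' ]
console.log(createdAt instanceof Date); // true
```

The `slice(1)` only makes sense if the first segment is a label rather than a tag; for markup without such a label it would discard the first real tag.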
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2016 Ionică Bizău <bizauionica@gmail.com> (http://ionicabizau.net)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -0,0 +1,186 @@
+
+[![scrape-it](https://i.imgur.com/j3Z0rbN.png)](#)
+
+# scrape-it [![PayPal](https://img.shields.io/badge/%24-paypal-f39c12.svg)][paypal-donations] [![Version](https://img.shields.io/npm/v/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Downloads](https://img.shields.io/npm/dt/scrape-it.svg)](https://www.npmjs.com/package/scrape-it) [![Get help on Codementor](https://cdn.codementor.io/badges/get_help_github.svg)](https://www.codementor.io/johnnyb?utm_source=github&utm_medium=button&utm_term=johnnyb&utm_campaign=github)
+
+> A Node.js scraper for humans.
+
+## :cloud: Installation
+
+```sh
+$ npm i --save scrape-it
+```
+
+
+## :clipboard: Example
+
+
+
+```js
+const scrapeIt = require("scrape-it");
+
+scrapeIt("http://ionicabizau.net", [
+    // Fetch the articles on the page (list)
+    {
+        listItem: ".article"
+      , name: "articles"
+      , data: {
+            createdAt: {
+                selector: ".date"
+              , convert: x => new Date(x)
+            }
+          , title: "a.article-title"
+          , tags: {
+                selector: ".tags"
+              , convert: x => x.split("|").map(c => c.trim()).slice(1)
+            }
+          , content: {
+                selector: ".article-content"
+              , how: "html"
+            }
+        }
+    }
+  , {
+        listItem: "li.page"
+      , name: "pages"
+      , data: {
+            title: "a"
+          , url: {
+                selector: "a"
+              , attr: "href"
+            }
+        }
+    }
+    // Fetch some additional data
+  , {
+        title: ".header h1"
+      , desc: ".header h2"
+      , avatar: {
+            selector: ".header img"
+          , attr: "src"
+        }
+    }
+], (err, page) => {
+    console.log(err || page);
+});
+// { articles:
+//    [ { createdAt: Mon Mar 14 2016 00:00:00 GMT+0200 (EET),
+//        title: 'Pi Day, Raspberry Pi and Command Line',
+//        tags: [Object],
+//        content: '<p>Everyone knows (or should know)...a" alt=""></p>\n' },
+//      { createdAt: Thu Feb 18 2016 00:00:00 GMT+0200 (EET),
+//        title: 'How I ported Memory Blocks to modern web',
+//        tags: [Object],
+//        content: '<p>Playing computer games is a lot of fun. ...' },
+//      { createdAt: Mon Nov 02 2015 00:00:00 GMT+0200 (EET),
+//        title: 'How to convert JSON to Markdown using json2md',
+//        tags: [Object],
+//        content: '<p>I love and ...' } ],
+//   pages:
+//    [ { title: 'Blog', url: '/' },
+//      { title: 'About', url: '/about' },
+//      { title: 'FAQ', url: '/faq' },
+//      { title: 'Training', url: '/training' },
+//      { title: 'Contact', url: '/contact' } ],
+//   title: 'Ionică Bizău',
+//   desc: 'Web Developer,  Linux geek and  Musician',
+//   avatar: '/images/logo.png' }
+```
+
+## :memo: Documentation
+
+
+### `scrapeIt(url, opts, cb)`
+A scraping module for humans.
+
+#### Params
+- **String|Object** `url`: The page URL or request options.
+- **Object|Array** `opts`: The options passed to the `scrapeCheerio` method.
+- **Function** `cb`: The callback function.
+
+#### Return
+- **Tinyreq** The request object.
+
+### `scrapeIt.scrapeCheerio($input, opts, $)`
+Scrapes the data in the provided element.
+
+#### Params
+- **Cheerio** `$input`: The input element.
+- **Object|Array** `opts`: An array or object containing the scraping information.
+  If you want to scrape a list, you have to use the `listItem` selector:
+
+   - `listItem` (String): The list item selector.
+   - `name` (String): The list name (e.g. `articles`).
+   - `data` (Object): The fields to include in the list objects:
+      - `<fieldName>` (Object|String): The selector or an object containing:
+         - `selector` (String): The selector.
+         - `convert` (Function): An optional function to change the value.
+         - `how` (Function|String): A function or function name to access the
+           value.
+         - `attr` (String): If provided, the value will be taken based on
+           the attribute name.
+         - `trim` (Boolean): If `false`, the value will *not* be trimmed
+           (default: `true`).
+         - `eq` (Number): If provided, it will select the *nth* element.
+         - `listItem` (Object): An object, keeping the recursive schema of
+           the `listItem` object. This can be used to create nested lists.
+
+  **Example**:
+  ```js
+  {
+      listItem: ".article"
+    , name: "articles"
+    , data: {
+          createdAt: {
+              selector: ".date"
+            , convert: x => new Date(x)
+          }
+        , title: "a.article-title"
+        , tags: {
+              selector: ".tags"
+            , convert: x => x.split("|").map(c => c.trim()).slice(1)
+          }
+        , content: {
+              selector: ".article-content"
+            , how: "html"
+          }
+      }
+  }
+  ```
+
+  If you want to collect specific data from the page, just use the same
+  schema used for the `data` field.
+
+  **Example**:
+  ```js
+  {
+       title: ".header h1"
+     , desc: ".header h2"
+     , avatar: {
+           selector: ".header img"
+         , attr: "src"
+       }
+  }
+  ```
+- **Function** `$`: The Cheerio function.
+
+#### Return
+- **Object** The scraped data.
+
+
+
+## :yum: How to contribute
+Have an idea? Found a bug? See [how to contribute][contributing].
+
+
+## :scroll: License
+
+[MIT][license] © [Ionică Bizău][website]
+
+[paypal-donations]: https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=RVXDDLKKLQRJW
+[donate-now]: http://i.imgur.com/6cMbHOC.png
+
+[license]: http://showalicense.com/?fullname=Ionic%C4%83%20Biz%C4%83u%20%3Cbizauionica%40gmail.com%3E%20(http%3A%2F%2Fionicabizau.net)&year=2016#license-mit
+[website]: http://ionicabizau.net
+[contributing]: /CONTRIBUTING.md
+[docs]: /DOCUMENTATION.md
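`scrapeIt(url, opts, cb)` reports its result through a Node-style callback, as in the README example's `(err, page) => …`. A minimal sketch of wrapping that call shape in a Promise is shown below. `scrapeAsync` and `fakeScrapeIt` are hypothetical names introduced here; the stand-in replaces the real module so the snippet runs without network access, and it is not part of this commit.

```js
// Hypothetical helper: wraps any function with the (url, opts, cb) shape.
function scrapeAsync(scrapeFn, url, opts) {
    return new Promise((resolve, reject) => {
        scrapeFn(url, opts, (err, page) => err ? reject(err) : resolve(page));
    });
}

// Stand-in for scrapeIt (assumption): calls back asynchronously with fixed data.
const fakeScrapeIt = (url, opts, cb) => {
    setImmediate(() => cb(null, { title: "Example", url }));
};

const pending = scrapeAsync(fakeScrapeIt, "http://example.com", { title: "h1" });
pending.then(page => console.log(page.title)); // Example
```

With the real module, `fakeScrapeIt` would be replaced by the function returned from `require("scrape-it")`, assuming it keeps the callback signature documented above.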