HNScraper


HNScraper is a scraping library for Hacker News, written in Swift. It lets you fetch stories, comments, and user information directly from the website. It also supports logging in and voting on or favoriting posts and comments.

This library is similar to the LibHN library. In fact, some parts of the project, such as the post, comment, and user models and the parsing rules, are essentially a Swift translation of LibHN.

Before going further, note that there is an official API that implements the fundamental features, such as fetching the new, top, and best stories, user information, and comments. However, it doesn't support login or any related functionality, and it suffers from a lot of restrictions. If it fits your needs, though, I would recommend using the official API over this scraper.


Installation

Pod

Ensure you have at least the following code in your Podfile:

use_frameworks!

target 'YourAppName' do
    pod 'HNScraper', '~> 0.2.1'
end

Run pod install in your project's folder.

Then just add import HNScraper wherever you need the scraper.

Manually

Just add all the .swift files from the HNScraper folder to your project.

Documentation

Completion handlers & error handling

All the following actions work the same way: you call a method on the appropriate instance with the required parameters and a completion handler, which gives you back the results of your request and, if something went wrong, the resulting error.

func getSomething(dependingOn: , completion: ((results?, error?) -> Void))

The possible errors are defined by HNScraperError and HNLoginError. These types simplify error handling by classifying the most common errors into distinct, self-explanatory cases. This abstraction lets you handle common problems, such as no Internet connection, bad credentials, or a nonexistent post ID, without worrying about "low-level" errors such as URLErrors and JSON errors.
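To illustrate the idea of folding low-level errors into a few self-explanatory cases, here is a minimal, self-contained sketch. The enum `SketchScraperError`, its case names, and the `classify` function are hypothetical stand-ins written for this example; they are not the library's actual error types, whose cases may differ.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // only needed on some non-Apple platforms
#endif

// Hypothetical stand-in: these case names are assumptions for illustration
// and may not match the scraper's real error enums.
enum SketchScraperError: Error, Equatable {
    case noInternet
    case unknown
}

/// Maps a low-level error onto one coarse, self-explanatory case.
func classify(_ error: Error) -> SketchScraperError {
    if let urlError = error as? URLError {
        switch urlError.code {
        case .notConnectedToInternet, .networkConnectionLost:
            return .noInternet
        default:
            return .unknown
        }
    }
    return .unknown
}
```

With a mapping like this in place, calling code can switch over a handful of cases instead of inspecting URL error codes itself.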

Fetch list of posts

A list of posts is any of the HN pages defined in HNScraper.PostListPageName:

news, front, new, jobs, asks, shows, newshows, active, best, noob

To scrape the first 30 posts of one of them, use getPostsList (from HNScraper), which gives you an array of HNPost objects and the link to the next page, which you can use to fetch the following 30 items.

typealias PostListDownloadCompletionHandler = (([HNPost], String?, HNScraperError?) -> Void)

func getPostsList(page: PostListPageName, completion: PostListDownloadCompletionHandler)

For example:

HNScraper.shared.getPostsList(page: .news) { (posts, linkForMore, error) in
	// Don't forget to handle the error, if any
	for post in posts {
		print(post.title)
	}
	// You also may want to save the linkForMore somewhere.
}

To fetch items beyond the first 30, use getMoreItems, passing it the "link for more" you got with the first 30 items. This also gives you a list of HNPost instances and a link to the next page:

func getMoreItems(linkForMore: String, completionHandler: PostListDownloadCompletionHandler) 

For example:

HNScraper.shared.getMoreItems(linkForMore: "s") { (posts, linkForMore, error) in
	// do whatever you want with the stories
}
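Chaining the first page and the following pages can be sketched as a small generic helper. Everything below is hypothetical and not part of HNScraper: `PageHandler` only mirrors the (items, link for more, error) shape of the completion handlers above, and `collectPages` is a name invented for this example.

```swift
// Mirrors the shape of the list callbacks: items, optional link to the
// next page, optional error. Hypothetical; not a library type.
typealias PageHandler<Item> = ([Item], String?, Error?) -> Void

/// Hypothetical helper: follows "link for more" values until `maxPages`
/// pages have been read, a page reports an error, or no link is returned.
func collectPages<Item>(
    maxPages: Int,
    first: (@escaping PageHandler<Item>) -> Void,
    more: @escaping (String, @escaping PageHandler<Item>) -> Void,
    completion: @escaping ([Item], Error?) -> Void
) {
    var collected: [Item] = []
    var pagesRead = 0

    func handle(_ items: [Item], _ link: String?, _ error: Error?) {
        collected += items
        pagesRead += 1
        if let error = error {
            // Stop at the first failure, returning what was gathered so far.
            completion(collected, error)
        } else if let link = link, pagesRead < maxPages {
            more(link, handle)
        } else {
            completion(collected, nil)
        }
    }

    first(handle)
}
```

With the scraper, `first` would wrap a call to getPostsList and `more` a call to getMoreItems, each forwarding its results to the handler. Capping the number of pages keeps a long list from triggering an unbounded series of requests.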

Fetch comments about a post

The comments are parsed from a discussion thread (at news.ycombinator.com/item?id=<post_id>). You can fetch them in two ways: either with

func getComments(ByPostId postId: String, buildHierarchy: Bool = true, completion: @escaping ((HNPost?, [HNComment], HNScraperError?) -> Void))

or with

func getComments(ForPost post: HNPost, buildHierarchy: Bool = true, completion: @escaping ((HNPost, [HNComment], HNScraperError?) -> Void))

The buildHierarchy parameter indicates whether the comments are returned in nested format (only the root comments are in the resulting array, and each points to its replies) or in linear (flat) format.

With the ByPostId method, the data about the post itself is parsed to build an HNPost object that is passed to the completion closure.

With the ForPost method, the HNPost instance given to the completion closure is the same (unmodified) post you passed to the getComments method.

For Ask HN posts, the first comment is the OP's question itself.

Job posts should have no comments.
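The difference between the nested and flat formats selected by buildHierarchy can be sketched with a stand-in model. `SketchComment` and `flatten` are hypothetical names for this example only; the real HNComment class may expose its replies and nesting level differently.

```swift
// Hypothetical stand-in for illustration; not the library's HNComment.
struct SketchComment {
    let text: String
    var replies: [SketchComment] = []
}

/// Depth-first walk that turns a nested tree (root comments pointing to
/// their replies) into a flat list, keeping each comment's nesting level.
func flatten(_ roots: [SketchComment], level: Int = 0) -> [(level: Int, text: String)] {
    var result: [(level: Int, text: String)] = []
    for comment in roots {
        result.append((level: level, text: comment.text))
        result += flatten(comment.replies, level: level + 1)
    }
    return result
}
```

The nested form is convenient for collapsible threads, while the flat form with a level per comment maps directly onto the rows of a table view.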

Fetch information about a user

You can get the karma, description, and age of any user by passing their username to the getUserFrom method:

func getUserFrom(Username username: String, completion: ((HNUser?, HNScraperError?) -> Void)?)

Fetch the submissions, comments and favorites of a user

Use the following methods (from HNScraper) according to which list you want to grab:

func getFavorites(ForUserWithUsername username: String, completion: @escaping PostListDownloadCompletionHandler)

func getSubmissions(ForUserWithUsername username: String, completion:  PostListDownloadCompletionHandler)

As with the lists of posts described earlier, the completion closure gives you a "link for more" that you can use to fetch more items (if there are more than 30 items to fetch, of course). You can use the getMoreItems(linkForMore:completionHandler:) method as before.

Login / Logout

These actions are handled by the singleton class HNLogin.

You can log in with

func login(username: String, psw: String, completion: @escaping ((HNUser?, HTTPCookie?, HNLoginError?) -> Void))

Once a user has logged in, the HNLogin class stores the session cookie and makes it available to the other classes. In addition, with that cookie saved, the HTML retrieved from the website by any request the scraper makes will be as if the user were logged in. This means it will contain all the upvote links (and downvote links, if the user has more than 500 points), favorite links, comment links, etc.

The HNLogin class comes with its own error enum, HNLoginError, which contains the badCredentials case. It's returned as the error in the case of, ... well, wrong credentials.

Example:

HNLogin.shared.login(username: "username", psw: "pass") { (user, cookie, error) in
	if let connected_user = user {
		print("logged in user: " + connected_user.username)
	} else {
		// Handle error
		if error == .badCredentials {
			print("wrong creds")
		} else {
			// Check other types of error
		}
	} 
}

You can log out a user by calling

func logout()

All this method does is delete the stored session cookie, which makes the HTML retrieved from the HN website look as it would to a signed-out visitor.

Up/Down/Un vote a post/comment

Simply use one of the following methods:

func upvote(Comment comment: HNComment, completion: ((HNScraperError?) -> Void))
func upvote(Post post: HNPost, completion: ((HNScraperError?) -> Void))
func unvote(Post post: HNPost, completion: ((HNScraperError?) -> Void))
func unvote(Comment comment: HNComment, completion: ((HNScraperError?) -> Void))

The user obviously needs to be logged in to do that. Otherwise, an error of type .notLoggedIn is passed to the completion closure.

If the error parameter of the completion closure is nil, then the action was successful. Otherwise there was a problem.

Note: an action is considered successful when the final state of the item is the one intended by the request. So if you try to unvote a post that hasn't been upvoted, there will be no error. The same applies if you try to upvote an item that is already upvoted.

(Un)Favorite a post

Use:

func favorite(Post post: HNPost, completion: ((HNScraperError?) -> Void))
func unfavorite(Post post: HNPost, completion: ((HNScraperError?) -> Void)) 

These methods work the same way as the ones for voting on items.

Again, the user needs to be logged in, otherwise an error of type .notLoggedIn is passed to the completion closure.

Structure of the project

Models

The scraper uses 3 models:

  • HNPost
  • HNComment
  • HNUser

Endpoints

There are basically 2 singleton classes that you'll use to make requests:

  • HNScraper
  • HNLogin

Parsing configuration file

hn.json contains most of the information needed to parse every HN page. It was introduced in LibHN to

Although some things changed/have been added in it, its structure is the same as in LibHN. The following section is an update of the original version of the LibHN documentation.

This file is downloaded and stored by the singleton class HNParseConfig.

Tests

There are tests for most of the methods in the HNScraperTests folder.

Contribution

Contribution of any kind is welcome.

If you spot an error, think of an improvement, or have a suggestion, just open an issue or submit a PR directly.

Also, I'm not a native English speaker, so don't hesitate to correct some of my sentences :)

TODO

  • Complete the hn.json config file with the rest of hardcoded strings needed for parsing
  • Submit story
  • Post comments
  • Downvote
  • Edit account (about, options, mail, ...)
  • Search
  • Test for macOS apps

License

HNScraper is licensed under the standard MIT License.

Copyright (C) 2017-2018 by Stéphane Sercu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.