Retrieve data from a Facebook group or page and compute statistics.

arthurdjn/facebook-hall-of-fame

Scrape posts, reactions, comments, and replies in Facebook groups

🤖 A Facebook robot that scrapes data from your favorite group. You can retrieve statistics per user and publish or edit posts automatically, without an API key.

⚠️ Disclaimer

Use this project only with people who consent to share their personal data, or with data that is publicly available. The developers of this tool are not responsible for any misuse of data collected with it.

🧐 About

The purpose of this API is to extract statistics from a group, for example the user who published the best post (the post with the most reactions). See the list of built-in statistics for more details.

The bot retrieves data from public and private groups. You must be a member of a group to scrape its content. You can then retrieve textual data and post information (user, date, reactions and their types, comments, replies) within that group.

Status

Development   Status     Features
Group         finished   Public, Private
Post          finished   Reactions, Users, Comments
Comment       finished   Reactions, Users, Replies
API           finished   Connect, Publish Posts, Edit Posts
Statistics    finished   Best Posts, Best Comments, Best Replies, Most Posts, Most Comments, Most Replies, Most Reactions, etc.

🎥 Demo

You can also generate statistics with built-in functions and templates:

👑 𝐌𝐞𝐦𝐞𝐬 𝐀𝐜𝐭𝐮𝐬 👑
Here is a template example for a meme group.

🔥 𝗚𝗹𝗼𝗯𝗮𝗹 𝗥𝗮𝗻𝗸𝗶𝗻𝗴 🔥

🏅 𝘽𝙚𝙨𝙩 𝙈𝙚𝙢𝙚𝙨
🥇 Top 1 User
🥈 Top 2 User
🥉 Top 3 User

𝑒𝑡𝑐...

🔥 𝗛𝗼𝗻𝗼𝗿𝘀 🔥

📈 𝙈𝙤𝙨𝙩 𝘼𝙘𝙩𝙞𝙫𝙚
🥇 Top 1 User
🥈 Top 2 User
🥉 Top 3 User

𝑒𝑡𝑐...

🔥 𝗥𝗲𝗮𝗰𝘁𝗶𝗼𝗻𝘀 🔥

😆 𝙁𝙪𝙣𝙣𝙞𝙚𝙨𝙩
🥇 Top 1 User
🥈 Top 2 User
🥉 Top 3 User

𝑒𝑡𝑐...

🕙 Message generated at 2020-12-12T19:44:10.600157

You can of course create your own template.

💭 How it works

The bot needs a valid account to extract information. Once logged in, it scrapes all posts in a Facebook group feed. For each post, it extracts the reactions, comments, and replies, along with the reactions to each comment and reply.

Scraping is done with a Firefox WebDriver (geckodriver), which you need to download separately, for example from the geckodriver releases page on Mozilla's GitHub.
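
The nesting described above (a post holds reactions and comments, a comment holds reactions and replies) can be pictured with a small data model. This is only an illustrative sketch: the class and field names below are hypothetical and are not taken from the package's own Post class, whose attributes this README does not document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reply:
    user: str
    text: str
    # Hypothetical layout: reaction name -> list of users who reacted.
    reactions: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class Comment:
    user: str
    text: str
    reactions: Dict[str, List[str]] = field(default_factory=dict)
    replies: List[Reply] = field(default_factory=list)


@dataclass
class Post:
    user: str
    text: str
    reactions: Dict[str, List[str]] = field(default_factory=dict)
    comments: List[Comment] = field(default_factory=list)


def count_reactions(item) -> int:
    """Number of reactions attached directly to a post, comment, or reply."""
    return sum(len(users) for users in item.reactions.values())


post = Post(
    user="alice",
    text="A meme",
    reactions={"LIKE": ["bob"], "AHAH": ["carol", "dave"]},
    comments=[
        Comment(
            user="bob",
            text="Nice",
            reactions={"LOVE": ["alice"]},
            replies=[Reply(user="alice", text="Thanks")],
        )
    ],
)

print(count_reactions(post))              # 3
print(count_reactions(post.comments[0]))  # 1
```

Keeping reactions at every level is what makes per-post, per-comment, and per-reply rankings possible later on.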

⛏️ Built Using

  • BeautifulSoup
  • Selenium
  • Python 3.8

🏁 Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Before using this project, make sure you have Python 3.8 and Git installed. You also need a webdriver: download geckodriver, for example from the geckodriver releases page on Mozilla's GitHub.

Installing

First, clone this repository using git in your terminal:

git clone https://github.com/arthurdjn/facebook-hall-of-fame

Then install the dependencies with pip (it should already be installed with Anaconda) from your Anaconda terminal:

pip install -r requirements.txt

If you have any issues using the above command, try installing each package separately, using:

pip install NameOfPackage

🎈 Usage

⚠️ Warning!

To hinder scraping and protect users' data, Facebook serves dynamic CSS: whenever a page is loaded or refreshed, the ids of its HTML elements change. To work around this, the package keeps a table of known elements, so the bot can detect when an id has changed after a page refresh.

You need to provide a URL for each reaction type. To do so, create a group, publish several posts, and give each post a single, distinct reaction (LIKE, LOVE, AHAH, etc.). Then click on the reaction and copy its URL. Note that you should use the mobile version of Facebook to retrieve the URL, not the standard version.
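
One plausible way to picture the table of known elements is as a lookup that is rebuilt from the reference URLs each time the identifiers change. The sketch below is an assumption about the idea, not the package's implementation: `build_reaction_table` and `read_icon_id` are hypothetical names, and the fake page data stands in for what a browser session would actually return.

```python
from typing import Callable, Dict


def build_reaction_table(
    reaction2href: Dict[str, str],
    read_icon_id: Callable[[str], str],
) -> Dict[str, str]:
    """Map the identifier currently served for each reaction icon to its name.

    `read_icon_id` is a hypothetical callback that opens a reference URL
    (a post carrying exactly one kind of reaction) and returns the
    identifier the page uses for that reaction right now.
    """
    table: Dict[str, str] = {}
    for reaction, href in reaction2href.items():
        icon_id = read_icon_id(href)
        if icon_id in table:
            # Two reactions sharing one identifier would make lookups ambiguous.
            raise ValueError(f"{reaction} and {table[icon_id]} share id {icon_id!r}")
        table[icon_id] = reaction
    return table


# Fake data standing in for a real browser session (made-up ids and paths).
fake_page_ids = {"/ref/like": "x1a2", "/ref/love": "k9z0"}
table = build_reaction_table(
    {"LIKE": "/ref/like", "LOVE": "/ref/love"},
    fake_page_ids.__getitem__,
)
print(table)  # {'x1a2': 'LIKE', 'k9z0': 'LOVE'}
```

This is why each reference post should carry exactly one reaction type: the identifier read from it can then be attributed to that reaction without ambiguity.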

API

Connect to the API:

# Global parameters
EXECUTABLE_PATH = "driver/geckodriver.exe"
REACTION2HREF = {
    "LIKE":  "/ufi/reaction/profile/browser/?ft_ent_identifier=",
    "LOVE":  "/ufi/reaction/profile/browser/?ft_ent_identifier=",
    "CARE":  "/ufi/reaction/profile/browser/?ft_ent_identifier=",
    "AHAH":  "/ufi/reaction/profile/browser/?ft_ent_identifier=",
    "WOW":   "/ufi/reaction/profile/browser/?ft_ent_identifier=",
    "SAD":   "/ufi/reaction/profile/browser/?ft_ent_identifier=",
    "ANGER": "/ufi/reaction/profile/browser/?ft_ent_identifier="
}

EMAIL = "your_email"
PASSWORD = "your_password"

# Connect to the API
from halloffame import HallOfFameAPI

api = HallOfFameAPI(executable_path=EXECUTABLE_PATH, reaction2href=REACTION2HREF)
api.login(EMAIL, PASSWORD)

# Initialize the table of reactions
api.init_reactions()

Then, connect to a group and start scraping:

# To retrieve everything (posts, comments, reactions)
posts = api.get_posts("your_group_id")

# To retrieve comments
comments = api.get_comments("your_group_id", "your_post_id")

# To retrieve reactions
reactions = api.get_reactions("your_post_id")

Statistics

Statistic               Description
BEST-POST-REACTION      Ordered list of posts by their number of reactions (all categories).
BEST-COMMENT-REACTION   Ordered list of comments by their number of reactions (all categories).
BEST-REPLY-REACTION     Ordered list of replies by their number of reactions (all categories).
POST-COUNT              Ordered list of users by their number of posts.
REACTION-COUNT          Ordered list of users by their number of reactions.
COMMENT-REPLY-COUNT     Ordered list of users by their number of comments and replies.
COMMENT-COUNT           Ordered list of users by their number of comments only.
REPLY-COUNT             Ordered list of users by their number of replies only.
REACTION-AHAH           Ordered list of users by their number of AHAH reactions.
REACTION-LOVE           Ordered list of users by their number of LOVE reactions.
REACTION-CARE           Ordered list of users by their number of CARE reactions.
REACTION-WOW            Ordered list of users by their number of WOW reactions.
REACTION-SAD            Ordered list of users by their number of SAD reactions.
REACTION-ANGER          Ordered list of users by their number of ANGER reactions.
REACTION-LIKE           Ordered list of users by their number of LIKE reactions.

You can compute the statistics from a list of Post objects (posts) using:

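from halloffame import get_top_stats

# `posts` is the list returned by api.get_posts(...)
stats = get_top_stats(posts)

# `stats` maps each statistic name to an ordered list, for example:
# {
#     "BEST-POST-REACTION": [
#         {
#             "user_id": 97987,
#             ...
#         },
#         ...
#     ],
#     ...
# }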
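
To make the idea of an "ordered list" concrete, here is a minimal sketch of how two of these rankings could be computed. It does not reproduce get_top_stats: the plain dictionaries and the helper names `best_post_reaction` and `post_count` are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical, simplified posts: the author and a total reaction count.
sample_posts = [
    {"user": "alice", "reactions": 12},
    {"user": "bob", "reactions": 30},
    {"user": "alice", "reactions": 7},
]


def best_post_reaction(posts):
    """Posts ordered from most to fewest reactions."""
    return sorted(posts, key=lambda p: p["reactions"], reverse=True)


def post_count(posts):
    """(user, number of posts) pairs ordered from most to fewest posts."""
    return Counter(p["user"] for p in posts).most_common()


print(best_post_reaction(sample_posts)[0]["user"])  # bob
print(post_count(sample_posts))                     # [('alice', 2), ('bob', 1)]
```

The same pattern (count or sum per user, then sort in descending order) would cover the per-reaction rankings as well.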

Templates

To insert statistics in a Facebook post, you can use a template, which will speed up your workflow. Simply write the general structure of your text and wrap the elements that will change (either stats or fonts) in << >> tags.

For example, to apply a bold font to the text "this is a text", simply use <<BOLD>>this is a text<<BOLD>>. The same goes for statistics: <<TOP1-BEST-POST-REACTION>>. Note that BEST-POST-REACTION is a list, so to get the first user add the token TOP1 (TOP2 for the second, and so on). You can also combine both: <<BOLD>><<TOP1-BEST-POST-REACTION>><<BOLD>>

template = """
👑 <<BOLD-SERIF>>Hall Of Fame<<BOLD-SERIF>> 👑
Here is a template example for a meme group.

🔥 <<BOLD>>Rank<<BOLD>> 🔥

🏅 <<BOLD-ITALIC>>Best Memes<<BOLD-ITALIC>>
🥇 <<TOP1-BEST-POST-REACTION>>
🥈 <<TOP2-BEST-POST-REACTION>>
🥉 <<TOP3-BEST-POST-REACTION>>

𝑒𝑡𝑐...

🔥 <<BOLD>>Honors<<BOLD>> 🔥

📈 <<BOLD-ITALIC>>Most Active<<BOLD-ITALIC>>
🥇 <<TOP1-POST-COUNT>>
🥈 <<TOP2-POST-COUNT>>
🥉 <<TOP3-POST-COUNT>>

𝑒𝑡𝑐...

🔥 <<BOLD>>Reactions<<BOLD>> 🔥

😆 <<BOLD-ITALIC>>Funniest<<BOLD-ITALIC>>
🥇 <<TOP1-REACTION-AHAH>>
🥈 <<TOP2-REACTION-AHAH>>
🥉 <<TOP3-REACTION-AHAH>>

𝑒𝑡𝑐...

🕙 Message generated at <<DATE-NOW>>
"""

Then apply the template to the statistics:

from halloffame import apply_template, get_top_stats

stats = get_top_stats(posts)
generated_text = apply_template(template, stats)
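
For readers curious about how such tags might be resolved, the following is a minimal sketch of one possible approach. It is not the package's apply_template: the `render` and `to_bold` helpers are hypothetical, each statistic is assumed to be a plain list of user names, and only the <<BOLD>>, <<TOPn-...>>, and <<DATE-NOW>> tags are handled. Bold text is produced with Unicode mathematical sans-serif bold letters, since Facebook posts have no markup.

```python
import re
from datetime import datetime


def to_bold(text: str) -> str:
    """Replace ASCII letters with Unicode mathematical sans-serif bold letters."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(0x1D5D4 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D5EE + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits, spaces, emoji, etc. are left untouched
    return "".join(out)


def render(template: str, stats: dict) -> str:
    """Resolve <<TOPn-STAT>>, <<DATE-NOW>> and paired <<BOLD>> tags."""

    def top(match):
        rank, name = int(match.group(1)), match.group(2)
        ranking = stats.get(name, [])
        # Out-of-range ranks and unknown statistics resolve to an empty string.
        return str(ranking[rank - 1]) if 1 <= rank <= len(ranking) else ""

    # Statistics first, so that a surrounding font tag also styles their value.
    text = re.sub(r"<<TOP(\d+)-([A-Z-]+)>>", top, template)
    text = text.replace("<<DATE-NOW>>", datetime.now().isoformat())
    text = re.sub(
        r"<<BOLD>>(.*?)<<BOLD>>",
        lambda m: to_bold(m.group(1)),
        text,
        flags=re.DOTALL,
    )
    return text


example_stats = {"POST-COUNT": ["alice", "bob"]}
print(render("<<BOLD>><<TOP1-POST-COUNT>><<BOLD>> / <<TOP2-POST-COUNT>>", example_stats))
```

Resolving statistics before fonts is a design choice of this sketch: it lets the combined form <<BOLD>><<TOP1-...>><<BOLD>> work without any special casing.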