Scraping problems with Facebook and Instagram #249

Closed
Huertas97 opened this issue Jul 3, 2021 · 2 comments
Labels: duplicate (this issue or pull request already exists)

Comments

@Huertas97

Huertas97 commented Jul 3, 2021

I've tried to scrape some information from Facebook and Instagram, with no success.
For Facebook groups, I got this result:

snscrape -vv facebook-group aracnidosibericos

[Out]
2021-07-03 14:11:44.068 INFO snscrape.base Retrieving https://www.facebook.com/groups/aracnidosibericos/
2021-07-03 14:11:44.068 DEBUG snscrape.base ... with headers: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36', 'Accept-Language': 'en-US,en;q=0.5'}
2021-07-03 14:11:44.068 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): www.facebook.com:443
2021-07-03 14:11:44.197 DEBUG urllib3.connectionpool https://www.facebook.com:443 "GET /groups/aracnidosibericos/ HTTP/1.1" 302 0
2021-07-03 14:11:44.198 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): m.facebook.com:443
2021-07-03 14:11:44.390 DEBUG urllib3.connectionpool https://m.facebook.com:443 "GET /groups/aracnidosibericos/ HTTP/1.1" 302 0
2021-07-03 14:11:44.636 DEBUG urllib3.connectionpool https://m.facebook.com:443 "GET /login.php?next=https%3A%2F%2Fm.facebook.com%2Fgroups%2Faracnidosibericos%2F&refsrc=deprecated&_rdr HTTP/1.1" 200 None
2021-07-03 14:11:44.645 DEBUG snscrape.base https://www.facebook.com/groups/aracnidosibericos/ retrieved successfully
2021-07-03 14:11:44.665 CRITICAL snscrape.cli Dumped stack and locals to /tmp/snscrape_locals_bpqai2h6
Traceback (most recent call last):
  File "/usr/local/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/snscrape/cli.py", line 230, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/usr/local/lib/python3.7/dist-packages/snscrape/modules/facebook.py", line 201, in get_items
    raise snscrape.base.ScraperException('Code container ID marker not found (does the group exist?)')
snscrape.base.ScraperException: Code container ID marker not found (does the group exist?)

snscrape.base successfully retrieves the correct group URL (https://www.facebook.com/groups/aracnidosibericos/), but then an error is raised.
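In the DEBUG lines above, the last request in the redirect chain is `GET /login.php?next=...` on m.facebook.com, so "retrieved successfully" only means that the login page was fetched. The Instagram run in this issue likewise ends on `/accounts/login/`. A minimal sketch for spotting this case from a final URL follows. The helper names `is_login_wall` and `originally_requested` and the `LOGIN_PAGES` set are hypothetical (not part of snscrape), and the set only contains the two login pages that appear in these logs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# (host, path) pairs of the login pages seen at the end of the redirect
# chains in this issue's logs. Hypothetical and deliberately incomplete.
LOGIN_PAGES = {
    ("m.facebook.com", "/login.php"),
    ("www.instagram.com", "/accounts/login/"),
}


def is_login_wall(final_url: str) -> bool:
    """Return True if a request ended on one of the known login pages."""
    parts = urlsplit(final_url)
    return (parts.hostname, parts.path) in LOGIN_PAGES


def originally_requested(final_url: str) -> Optional[str]:
    """Return the decoded `next=` target of a login redirect, or None."""
    values = parse_qs(urlsplit(final_url).query).get("next")
    return values[0] if values else None


# Final URL reconstructed from the Facebook group log above.
final = ("https://m.facebook.com/login.php?next=https%3A%2F%2Fm.facebook.com"
         "%2Fgroups%2Faracnidosibericos%2F&refsrc=deprecated&_rdr")
print(is_login_wall(final))         # True
print(originally_requested(final))  # https://m.facebook.com/groups/aracnidosibericos/
```

In practice the final URL would come from whichever HTTP client performed the request (for example `Response.url` in `requests`, which reflects the URL after redirects were followed); that wiring is left out of this sketch.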

For Facebook and Instagram users, no results are retrieved (I used Cristiano Ronaldo's account in this example):

snscrape -vv facebook-user Cristiano

[Out]
2021-07-03 14:06:44.121 INFO snscrape.modules.facebook Retrieving initial data
2021-07-03 14:06:44.122 INFO snscrape.base Retrieving https://www.facebook.com/Cristiano/
2021-07-03 14:06:44.122 DEBUG snscrape.base ... with headers: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36', 'Accept-Language': 'en-US,en;q=0.5'}
2021-07-03 14:06:44.122 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): www.facebook.com:443
2021-07-03 14:06:44.246 DEBUG urllib3.connectionpool https://www.facebook.com:443 "GET /Cristiano/ HTTP/1.1" 302 0
2021-07-03 14:06:44.248 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): m.facebook.com:443
2021-07-03 14:06:44.602 DEBUG urllib3.connectionpool https://m.facebook.com:443 "GET /Cristiano/ HTTP/1.1" 302 0
2021-07-03 14:06:44.833 DEBUG urllib3.connectionpool https://m.facebook.com:443 "GET /login.php?next=https%3A%2F%2Fm.facebook.com%2FCristiano%2F&refsrc=deprecated&_rdr HTTP/1.1" 200 None
2021-07-03 14:06:44.950 DEBUG snscrape.base https://www.facebook.com/Cristiano/ retrieved successfully
2021-07-03 14:06:44.960 INFO snscrape.cli Done, found 0 results

snscrape -vv instagram-user Cristiano

[Out]

2021-07-03 14:18:32.571 INFO snscrape.modules.instagram Retrieving initial data
2021-07-03 14:18:32.572 INFO snscrape.base Retrieving https://www.instagram.com/Cristiano/
2021-07-03 14:18:32.572 DEBUG snscrape.base ... with headers: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
2021-07-03 14:18:32.572 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): www.instagram.com:443
2021-07-03 14:18:32.715 DEBUG urllib3.connectionpool https://www.instagram.com:443 "GET /Cristiano/ HTTP/1.1" 302 0
2021-07-03 14:18:32.904 DEBUG urllib3.connectionpool https://www.instagram.com:443 "GET /accounts/login/ HTTP/1.1" 200 20211
2021-07-03 14:18:32.910 DEBUG snscrape.base https://www.instagram.com/Cristiano/ retrieved successfully
2021-07-03 14:18:32.929 CRITICAL snscrape.cli Dumped stack and locals to /tmp/snscrape_locals_u20c35uq
Traceback (most recent call last):
  File "/usr/local/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/snscrape/cli.py", line 230, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/usr/local/lib/python3.7/dist-packages/snscrape/modules/instagram.py", line 114, in get_items
    raise snscrape.base.ScraperException('Redirected to login page')
snscrape.base.ScraperException: Redirected to login page

Am I doing something wrong? I know this may sound a little picky, but I would appreciate more documentation (for both the CLI and the Python API).

@JustAnotherArchivist
Owner

> snscrape.base.ScraperException: Code container ID marker not found (does the group exist?)

Duplicate of #121

> facebook-user

Yeah, thanks, that's a bug. Filed separately as #250.

> snscrape.base.ScraperException: Redirected to login page

Duplicate of #165

> documentation

Yeah, I've been meaning to write documentation for a while (#6 and #7) but haven't found the time yet.

JustAnotherArchivist added the duplicate label on Jul 3, 2021
@kientranasia

kientranasia commented Aug 11, 2021

Is this fixed? I have the same issue!

> snscrape -vv facebook-group tamsu.content
2021-08-11 11:15:31.140  INFO  snscrape.base  Retrieving https://upload.facebook.com/groups/tamsu.content/?sorting_setting=CHRONOLOGICAL
2021-08-11 11:15:31.140  DEBUG  snscrape.base  ... with headers: {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36', 'Accept-Language': 'en-US,en;q=0.5'}
2021-08-11 11:15:31.142  DEBUG  urllib3.connectionpool  Starting new HTTPS connection (1): upload.facebook.com:443
2021-08-11 11:15:31.587  DEBUG  urllib3.connectionpool  https://upload.facebook.com:443 "GET /groups/tamsu.content/?sorting_setting=CHRONOLOGICAL HTTP/1.1" 302 0
2021-08-11 11:15:31.593  DEBUG  urllib3.connectionpool  Starting new HTTPS connection (1): www.facebook.com:443
2021-08-11 11:15:32.040  DEBUG  urllib3.connectionpool  https://www.facebook.com:443 "GET /groups/tamsu.content/?sorting_setting=CHRONOLOGICAL HTTP/1.1" 302 0
2021-08-11 11:15:32.048  DEBUG  urllib3.connectionpool  Starting new HTTPS connection (1): m.facebook.com:443
2021-08-11 11:15:33.914  DEBUG  urllib3.connectionpool  https://m.facebook.com:443 "GET /groups/tamsu.content/?sorting_setting=CHRONOLOGICAL HTTP/1.1" 200 None
2021-08-11 11:15:33.988  DEBUG  snscrape.base  https://upload.facebook.com/groups/tamsu.content/?sorting_setting=CHRONOLOGICAL retrieved successfully
2021-08-11 11:15:34.105  CRITICAL  snscrape._cli  Dumped stack and locals to C:\Users\***\AppData\Local\Temp\snscrape_locals_pnr4d006
Traceback (most recent call last):
  File "C:\Users\***\AppData\Roaming\Python\Python39\Scripts\snscrape-script.py", line 33, in <module>
    sys.exit(load_entry_point('snscrape==0.3.5.dev121+gf9a3faf', 'console_scripts', 'snscrape')())
  File "C:\Users\***\AppData\Roaming\Python\Python39\site-packages\snscrape\_cli.py", line 280, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "C:\Users\***\AppData\Roaming\Python\Python39\site-packages\snscrape\modules\facebook.py", line 322, in get_items
    raise snscrape.base.ScraperException('Code container ID marker not found (does the group exist?)')
snscrape.base.ScraperException: Code container ID marker not found (does the group exist?)


3 participants