I noticed that you don't use tweepy (https://github.com/twintproject/twint#requi...

detaro · on April 13, 2020

Tweepy is for API access, this is a scraper.

faizshah · on April 13, 2020

And the main advantage is you don’t need to authenticate and you aren’t rate limited.

_____smurf_____ · on April 13, 2020

A question I've asked before, but I keep getting different answers: what are the -legal- limitations of scraping data when there is a limited-access API?

faizshah · on April 16, 2020

The problem is that there isn't a straight answer to this; see this recent thread: https://news.ycombinator.com/item?id=22180559

It mostly comes down to how well you can defend yourself against three claims: that your scraping amounts to a DoS attack (follow politeness standards and robots.txt), that you are violating their copyright (generally not a problem if you don't distribute the data), and that you are violating their terms of service (this is key in the case of Twitter and Reddit; read their TOS carefully).

However, scraping public information such as tweets or Reddit posts is the less problematic part. It's when you distribute the scraped data, or aggregations of it, that using it could become problematic.
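
For the "politeness standards and robots.txt" part, here is a rough sketch of what a check could look like using Python's standard-library urllib.robotparser. The robots.txt content, the "example-bot" user agent, the example.com URLs, and the helper names are all made up for illustration, and passing this check says nothing about TOS or copyright.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, inlined so the sketch runs without network access.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser: RobotFileParser, user_agent: str, default: float = 1.0) -> float:
    """Seconds to wait between requests: the Crawl-delay if given, else a default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    parser = build_parser(EXAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/public/page", "https://example.com/private/data"):
        print(url, is_allowed(parser, "example-bot", url))
    print("delay:", polite_delay(parser, "example-bot"))
```

In a real scraper you would point the parser at the site's actual robots.txt (set_url followed by read), skip any URL that is_allowed rejects, and sleep for polite_delay seconds between requests.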