
I noticed that you don't use tweepy (https://github.com/twintproject/twint#requirements). Can you highlight the difference?


Tweepy is for API access; this is a scraper.


And the main advantage is you don’t need to authenticate and you aren’t rate limited.


A question I've asked before, but I keep getting different answers: what are the -legal- limitations of scraping data when a limited-access API is available?


The problem is that there isn't a straight answer to this; see this recent thread: https://news.ycombinator.com/item?id=22180559

It largely comes down to how well you can defend yourself against three claims: that your scraping amounts to a DoS attack (follow politeness standards and robots.txt), that it violates their copyright (generally not a problem if you don't distribute the data), and that it violates their terms of service (this is key in the case of Twitter and Reddit, so read their TOS carefully).

However, scraping public information such as tweets or Reddit posts is the less problematic part. Distributing the data, or aggregations of it, is where using scraped public information could become a problem.
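
To make "follow politeness standards and robots.txt" a bit more concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `my-bot` user agent, and the example.com URLs are made up for illustration, and the helper names are my own. It only shows the mechanics of checking permission and picking a delay; it says nothing about whether a given scrape is legal or allowed by a site's TOS.

```python
from urllib import robotparser

# Hypothetical robots.txt content, parsed in memory so the sketch runs offline.
# A real crawler would point set_url() at the site's robots.txt and call read().
SAMPLE_ROBOTS = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed(parser, user_agent, url):
    """Return True if robots.txt permits this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser, user_agent, default=1.0):
    """Seconds to wait between requests: the site's Crawl-delay if it
    declares one, otherwise an assumed default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS)
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, allowed(rules, "my-bot", url))
    print("delay:", polite_delay(rules, "my-bot"))
```

In a real scraper you would sleep for `polite_delay(...)` seconds between requests and skip any URL for which `allowed(...)` is false.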



