Working Around API Request Limits with Bulk IP Proxying

(Note: this post is purely for experimentation and learning purposes. Sites have rate limits specifically to prevent data collection, because their data is proprietary and core to their business. Additionally, on smaller sites especially, excessive scraping can amount to a denial-of-service attack and run up expensive resource usage. Please respect a website’s wishes.)

Most websites enforce request limits per IP address, which make scraping or bulk data collection on their sites impractical.

Scraping and data collection require a LOT of requests, one or more for each new page of data, so you get rate limited quickly and your data collection slows to unsatisfactory speeds.

So what’s the solution? Send each request through a unique IP!

There are companies that specialize in bulk IP proxying. They maintain large pools of IPs and, for a small fee, let you proxy all of your requests through their network, which, if all goes well, should give you unique-IP-per-request goodness.

I found one of these IP providers called IPRoyal.

IPRoyal claims to have a network of 8 million residential IPs!!!

I signed up for IPRoyal and forked over the 7 bucks to see what happens.


To begin, let’s run a non-proxied reddit.com query with a curl command to see how quickly we get rate limited.

As everyone knows, a 200 status code means a successful request and a 429 status code (Too Many Requests) means a request blocked by a rate limit.

Let’s see if we will get blocked by reddit.

(Note: each request runs as a background job, so the responses arrive out of order. I had to run them in parallel for speed.)

nick@nick-XPS-9315:~$ for i in {1..1000}; do /bin/bash -c 'curl  -sL -o /dev/null -w "%{http_code} " https://www.reddit.com &'; done;
nick@nick-XPS-9315:~$ 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 200 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 429 200 429 429 429 429 429 429 429 429 429 429 429 429 429 429 200 429 429 200 429 200 200 429 200 200 200 429 200 429 429 200 429 200 200 200 429 429 200 429 200 200 200 429....

Rate limited almost immediately lol.
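
Eyeballing a wall of status codes only goes so far, so here is a minimal sketch of how the output could be tallied instead. The function name tally_codes is my own invention for illustration, and the sample input is hard-coded rather than a real measurement.

```shell
#!/usr/bin/env bash
# tally_codes (hypothetical helper): read whitespace-separated HTTP status
# codes on stdin and print one "count code" line per distinct code,
# most frequent first.
tally_codes() {
  tr -s '[:space:]' '\n' |   # one code per line
    grep -E '^[0-9]{3}$' |   # keep only three-digit codes
    sort | uniq -c |         # count each distinct code
    sort -rn |               # highest count first
    awk '{print $1, $2}'     # strip the padding uniq -c adds
}

# Hard-coded sample input, not real results:
printf '429 429 200 429 200 000 ' | tally_codes

# To tally a live run, the loop can be piped straight in, e.g.:
# { for i in {1..100}; do
#     curl -sL -o /dev/null -w "%{http_code} " https://www.reddit.com &
#   done; wait; } | tally_codes
```

For the sample above this prints 3 429, then 2 200, then 1 000, each on its own line.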


So let’s try with the IPRoyal network.

So, for 7 bucks IPRoyal gives you 1 GB worth of data.

To use their network, you are given a credential hash and a proxy host URL, which you can either configure in a proxy application or combine into a single proxy URL for curl.
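
As a rough sketch of the curl route: the snippet below glues the credential string and the host together into a proxy URL, and includes a helper for checking how many distinct exit IPs a handful of proxied requests come from. The helper names, the "USER:PASSWORD" placeholder, and the choice of https://api.ipify.org as an IP echo service are all assumptions for illustration; I have not confirmed that IPRoyal's credentials take exactly this shape.

```shell
#!/usr/bin/env bash
# proxy_url CREDS HOST PORT (hypothetical helper): print an HTTP proxy URL
# that curl accepts via -x. CREDS is whatever credential string the
# provider hands you (assumed here to look like "user:password").
proxy_url() {
  printf 'http://%s@%s:%s\n' "$1" "$2" "$3"
}

# count_unique: read one value per line on stdin, print how many are distinct.
count_unique() {
  grep -v '^$' | sort -u | wc -l | tr -d ' '
}

# exit_ips PROXY N: make N sequential proxied requests to an IP echo
# service (assumed: api.ipify.org) and print the IP each one came from.
exit_ips() {
  local proxy=$1 n=$2 i
  for ((i = 0; i < n; i++)); do
    curl -s --max-time 15 -x "$proxy" https://api.ipify.org
    echo   # make sure each answer ends with a newline
  done
}

# Placeholder credentials; substitute the string from your provider.
proxy=$(proxy_url 'USER:PASSWORD' geo.iproyal.com 12321)
echo "$proxy"

# Uncomment to hit the network and count distinct exit IPs over 10 requests:
# exit_ips "$proxy" 10 | count_unique
```

If the provider really rotates IPs per request, the count should come out close to the number of requests made.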

Here is the same test through the IPRoyal proxy. :) (-x is curl’s proxy flag. Note that this run also adds -I, which makes curl send a HEAD request instead of a GET.)

IPRoyalProxy="http://<someBigHash>@geo.iproyal.com:12321"
nick@nick-XPS-9315:~$ for i in {1..1000}; do /bin/bash -c "curl  -s -o /dev/null -w \"%{http_code} \" -I -x $IPRoyalProxy -L https://www.reddit.com &"; done;
nick@nick-XPS-9315:~$ 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 000 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 200 .....


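Not a single 429!! (The lone 000 in there is what curl prints when it gets no HTTP response at all, for example when a connection fails.)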
IPRoyal is working as advertised.
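
One thing I would change for a longer run: the loops above fire off all 1000 background jobs at once. A gentler variant, sketched here under the assumption that your xargs supports -P (GNU and BSD versions do, POSIX does not require it), caps how many requests are in flight at a time. The helper name run_n is made up for this example.

```shell
#!/usr/bin/env bash
# run_n COUNT JOBS CMD [ARGS...] (hypothetical helper): run CMD COUNT times,
# with at most JOBS copies running at once.
run_n() {
  local count=$1 jobs=$2
  shift 2
  # seq emits one line per run. The {} placeholder is never used in CMD,
  # so -I{} simply forces exactly one invocation of CMD per input line.
  seq "$count" | xargs -P "$jobs" -I{} "$@"
}

# Offline demo: 5 "requests", 2 at a time, with printf standing in for curl.
run_n 5 2 printf '200 '
echo

# The proxied test from above, limited to 20 concurrent requests:
# run_n 1000 20 curl -s -o /dev/null -w '%{http_code} ' -I \
#   -x "$IPRoyalProxy" -L https://www.reddit.com
```

Responses still come back unordered, but the machine and the proxy only ever see JOBS requests in flight at once.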
In the next post we’re going to combine this with some sort of scraper or web spider to try and scrape all of the comments of a Reddit user.


#networking