Suppose you are extracting products from Flipkart and want to extract another 100 products from each category. The technique so far cannot do this, because it grabs only the first 15 products on the page.
Flipkart uses infinite scrolling, so there is no pagination (like ?page=2, ?page=3) in the URL. If there were, we could put the request in a while loop and increment the page value, as shown below.
    page_count = 0
    while page_count < 5:
        url = "http://example.com?page=%d" % page_count
        # scraping code...
        page_count += 1
Now, let’s get back to the infinite scrolling.
Ajax is what lets a website use infinite scrolling. Each Ajax request goes to a URL from which more products are loaded onto the same page as you scroll.
To find this URL:
- Open the page in Google Chrome.
- Go to the console, right click, and enable "Log XMLHttpRequests".
- Reload the page and scroll down slowly. As new products are populated, you will see various URLs listed after “XHR finished loading: GET”. Flipkart calls several kinds of URLs. The one you are looking for begins with “flipkart.com/lc/pr/pv1/spotList1/spot1/productList?p=blahblahblah&lots_of_crap”
- Left click on that URL and it will be highlighted in the Network tab of Chrome dev tools. From there, you can copy the URL or open it in a new window. (here is the image)
When you open the link in a new tab, you will see something like this, with about 15-20 products per page.
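The same request can also be made from a script instead of a browser tab. Below is a minimal sketch using only the Python standard library. The function names `build_request` and `fetch` are hypothetical, and the two headers are assumptions: some sites expect browser-like headers, but whether Flipkart requires them is not something this example confirms.

```python
from urllib.request import Request, urlopen


def build_request(ajax_url):
    """Wrap the copied Ajax URL in a Request with browser-like headers."""
    headers = {
        # Assumed headers; some sites reject requests that lack them.
        "User-Agent": "Mozilla/5.0",
        "X-Requested-With": "XMLHttpRequest",
    }
    return Request(ajax_url, headers=headers)


def fetch(ajax_url, timeout=10):
    """Download the response body as text (performs a real network call)."""
    with urlopen(build_request(ajax_url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch()` with the URL copied from the Network tab should return the same markup you see in the new tab, provided the site accepts the request.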
1. Again, only 15 products! However, we want all the products.
Check the URL for a GET parameter called ?start=(any number). Assuming start is a zero-based offset, set it to 0 for the first 20 products and to 20 for the next 20. If there are 15 products per page, use 0, 15, 30, and so on. Iterate over the URL in a while loop like the one shown earlier and you are done.
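The iteration over the start parameter can be sketched as follows. This is only an illustration under stated assumptions: `start` is treated as a zero-based offset, `AJAX_URL` is a placeholder rather than Flipkart's real endpoint, and `PAGE_SIZE` and `build_page_urls` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Placeholder: paste the Ajax URL copied from the Network tab here.
AJAX_URL = "http://example.com/productList?p=abc&start=0"
PAGE_SIZE = 15  # assumed number of products returned per request


def build_page_urls(ajax_url, total, page_size=PAGE_SIZE):
    """Return one URL per batch, overwriting the `start` parameter each time."""
    parts = urlsplit(ajax_url)
    # Keep every other query parameter exactly as it was copied.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "start"
    ]
    urls = []
    start = 0
    while start < total:
        query = urlencode(params + [("start", start)])
        urls.append(
            urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
        )
        start += page_size
    return urls


if __name__ == "__main__":
    for url in build_page_urls(AJAX_URL, total=100):
        print(url)  # the scraping code would fetch and parse each URL here
```

Building the list of URLs first keeps the offset arithmetic separate from the fetching and parsing, which makes it easy to check the offsets before sending any requests.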
2. Another problem: where are the images?
Right click and view the page source of the URL. You will see a tag with a data-src="" attribute, which holds your product's image.
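Collecting those data-src values could look like the sketch below, which uses the standard library's `html.parser`. The class name `DataSrcCollector`, the helper `extract_image_urls`, and the sample markup are all made up for illustration. The real response may nest the attribute differently.

```python
from html.parser import HTMLParser


class DataSrcCollector(HTMLParser):
    """Collect the value of every non-empty data-src attribute in a page."""

    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased.
        for name, value in attrs:
            if name == "data-src" and value:
                self.image_urls.append(value)


def extract_image_urls(html):
    """Return all data-src values found in the given HTML string."""
    collector = DataSrcCollector()
    collector.feed(html)
    return collector.image_urls


if __name__ == "__main__":
    # Made-up markup standing in for the Ajax response.
    sample = '<div><img data-src="http://example.com/a.jpg"><img data-src=""></div>'
    print(extract_image_urls(sample))
```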
This example applies to Flipkart.com only. Other websites may use different Ajax URLs and different GET parameters in the URL.
Some websites also return JSON responses from their Ajax URLs. If you get one, you don't need to scrape at all; just read the JSON response as you would with any JSON API you have used before.
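Reading such a JSON response might look like the following minimal sketch. The keys "products" and "title", the function name `product_titles`, and the sample data are assumptions for illustration. Inspect the actual response to find the real structure.

```python
import json


def product_titles(response_text):
    """Parse a JSON response body and return the title of each product."""
    data = json.loads(response_text)
    # "products" and "title" are assumed keys; check the real response first.
    return [item.get("title") for item in data.get("products", [])]


if __name__ == "__main__":
    # Made-up response body standing in for a real Ajax reply.
    sample = (
        '{"products": [{"title": "Phone A", "price": 9999},'
        ' {"title": "Phone B", "price": 12999}]}'
    )
    print(product_titles(sample))
```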
If you have any questions, leave a comment in the section below, contact X-Byte Enterprise Crawling, or ask for a free quote!