How to Scrape Amazon Data Without an API

Amazon has its own API, but sometimes it is better not to use it (request limits are a big reason), so let’s see how to extract Amazon data without an API!

First, we need to install Python. Python has many web scraping packages, including Selenium and BeautifulSoup; in this tutorial we will use Selenium.

Amazon Website

To begin, let’s go to Amazon’s site and look at a product. We will use a laptop as the example.

[Screenshot: Amazon website]

The screenshot alone shows a lot of data, including the title, rating, price, and more. Let’s say we want to extract the title, price, and number of ratings, and see how to scrape them.

Writing the Code

Starting in the Python environment, we import the Selenium package, the pandas package, and the webdriver manager that supplies the browser driver, using the following lines of code:

#IMPORT THESE PACKAGES
import selenium
from selenium import webdriver
import pandas as pd
#OPTIONAL PACKAGE, BUT MAY BE NEEDED
from webdriver_manager.chrome import ChromeDriverManager

Next, we install and declare the driver and point it at the website. The driver is the web browser we are controlling. To do that, use this code:

#THIS INITIALIZES THE DRIVER (AKA THE WEB BROWSER)
driver = webdriver.Chrome(ChromeDriverManager().install())
#THIS PRETTY MUCH TELLS THE WEB BROWSER WHICH WEBSITE TO GO TO
driver.get('https://www.amazon.com/Acer-Display-Graphics-Keyboard-A515-43-R19L/dp/B07RF1XD36/ref=sr_1_3?dchild=1&keywords=laptop&qid=1618857971&sr=8-3')
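Readers on a recent Selenium 4 release may find that the driver path can no longer be passed as the first argument to webdriver.Chrome. The sketch below shows the Service-based form of the same startup step. The function name start_chrome is hypothetical and only used for illustration, and the imports sit inside the function so the file can be loaded without a browser installed.

```python
def start_chrome(url):
    """Open Chrome on `url` using the Selenium 4 Service-based startup."""
    # Imported here (not at module level) so this sketch can be loaded on a
    # machine that has no Selenium, webdriver_manager, or Chrome installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # webdriver_manager downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service)

    # Same navigation step as driver.get(...) in the tutorial.
    driver.get(url)
    return driver
```

Calling start_chrome with the product URL used above should give back a driver object that can be used in the same way as the one in the rest of this tutorial.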

The driver.get function tells the browser which website to go to. After that, we define three variables, Title, Price, and Rating, which will hold the text of those values from the page. We use a Selenium function named driver.find_element_by_xpath() to get the text from the page and store it in those variables. This function belongs to the Selenium 3 API; later Selenium 4 releases removed it in favor of driver.find_element(By.XPATH, ...). This is how we do that:

#TITLE OF PRODUCT
Title = driver.find_element_by_xpath('PASTE THE FULL XPATH HERE').text
#PRICE OF PRODUCT
Price = driver.find_element_by_xpath('PASTE THE FULL XPATH HERE').text
#NUMBER OF RATINGS
Rating = driver.find_element_by_xpath('PASTE THE FULL XPATH HERE').text
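For newer Selenium versions, the same lookup can be written with find_element and the By.XPATH locator. The following is a minimal sketch rather than the tutorial's own code: read_text, FakeElement, and FakeDriver are hypothetical names, and the fake classes exist only so the helper can be tried without opening a browser.

```python
try:
    from selenium.webdriver.common.by import By
    XPATH = By.XPATH
except ImportError:
    # Fallback so the sketch loads without Selenium; By.XPATH is this string.
    XPATH = "xpath"


def read_text(driver, xpath):
    """Return the visible text of the element found at `xpath`."""
    return driver.find_element(XPATH, xpath).text


# Stand-in objects (hypothetical) that mimic the two attributes used above.
class FakeElement:
    def __init__(self, text):
        self.text = text


class FakeDriver:
    def __init__(self, texts):
        self.texts = texts  # maps an XPath string to the text found there

    def find_element(self, by, value):
        return FakeElement(self.texts[value])


demo = FakeDriver({"/html/body/h1": "Example Laptop"})
title = read_text(demo, "/html/body/h1")
print(title)
```

With a real driver, read_text(driver, 'PASTE THE FULL XPATH HERE') would play the role of each of the three lines above.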

Next, we need the full XPath. Open a web browser and go to the product page, right-click the title text and click Inspect, find the highlighted element in the inspector console, right-click it, click Copy, and then click Copy full XPath. Use the image below as a guide:

[Screenshot: copying the full XPath of the title]

Then paste the XPath between the quotes in the Title variable above. The variable will look something like this:

Title = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[1]/div/h1/span').text
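To see what a full XPath like this encodes, it can help to walk a tiny page by hand. The sketch below uses Python's built-in xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, so it is an illustration and not a replacement for Selenium. The PAGE string is a hypothetical, heavily simplified stand-in for a product page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page; a real product page is far larger.
PAGE = (
    "<html><body>"
    "<div>banner</div>"
    "<div>"
    "<div><h1><span>Example Laptop</span></h1></div>"
    "<div><span>$549.99</span></div>"
    "</div>"
    "</body></html>"
)

root = ET.fromstring(PAGE)  # root is the <html> element

# Each step picks a child tag; [n] picks the n-th child with that tag (1-based).
title = root.find("./body/div[2]/div[1]/h1/span").text
price = root.find("./body/div[2]/div[2]/span").text
print(title, price)

# One extra <div> at the top shifts every index, so the old path finds nothing.
changed = ET.fromstring(PAGE.replace("<body>", "<body><div>promo</div>"))
shifted = changed.find("./body/div[2]/div[1]/h1/span")
print(shifted)
```

The second lookup shows why a copied full XPath can stop working when the page layout changes: the path depends on the exact position of every element along the way.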

Wonderful! Now let’s do the same for the price: right-click the price text, click Inspect, find the highlighted element in the inspector console, right-click it, click Copy, and then click Copy full XPath. Use the image below as a guide:

[Screenshot: copying the full XPath of the price]

After that, paste it into the Price variable above. The variable will look like this:

Price = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[10]/div[1]/div/table/tbody/tr/td[2]/span[1]').text

Finally, let’s do the same for the number of ratings: right-click the ratings text, click Inspect, find the highlighted element in the inspector console, right-click it, click Copy, and then click Copy full XPath. Use the image below as a guide:

[Screenshot: copying the full XPath of the ratings]

Rating = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[8]/div[4]/div[4]/div[3]/div/span[3]/a/span').text
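Because full XPaths depend on element positions, a lookup can fail when the page is laid out slightly differently. One possible way to soften this, sketched here under assumptions, is to try several candidate XPaths and keep the first one that matches. The names first_text, StubElement, and StubDriver are hypothetical, and the candidate paths are made-up examples, not paths taken from Amazon.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    class NoSuchElementException(Exception):
        """Stand-in so the sketch loads when Selenium is not installed."""


def first_text(driver, xpaths, default=None):
    """Return the text at the first XPath that matches, else `default`."""
    for xpath in xpaths:
        try:
            # "xpath" is the locator strategy string that By.XPATH stands for.
            return driver.find_element("xpath", xpath).text
        except NoSuchElementException:
            continue
    return default


# Hypothetical stub that only knows one path, to try the helper offline.
class StubElement:
    text = "Example Laptop"


class StubDriver:
    def find_element(self, by, value):
        if value == "/html/body/div[2]/h1/span":
            return StubElement()
        raise NoSuchElementException(value)


candidates = ["/html/body/div[1]/h1/span", "/html/body/div[2]/h1/span"]
found = first_text(StubDriver(), candidates)
missing = first_text(StubDriver(), ["/html/body/p"], default="")
print(found, repr(missing))
```

Returning a default instead of raising keeps the rest of the script running, at the cost of having to check for empty values later.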

Great! Now we just need to create an empty pandas DataFrame with one column per variable, and then append the values to it. To create the DataFrame, use the following code:

#CREATES AN EMPTY DATAFRAME
data1 = {'Title':[], 'Price':[], 'Rating':[],}
fulldf = pd.DataFrame(data1)

Nearly done! Now collect the values from the variables into a row and append that row to the pandas DataFrame, using the following lines of code:

#APPENDING THE DATA PULLED FROM ABOVE INTO THE EXISTING DATAFRAME
row = [Title, Price, Rating]
fulldf.loc[len(fulldf)] = row
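If several products are scraped, an alternative to growing the DataFrame one row at a time is to collect the rows in a list first and build the DataFrame once at the end. This is a sketch of that variation, not the approach used in this tutorial, and the row values are hypothetical placeholders for scraped text.

```python
import pandas as pd

# Hypothetical scraped values standing in for Title, Price, and Rating.
rows = [
    {"Title": "Example Laptop A", "Price": "$549.99", "Rating": "1,234 ratings"},
    {"Title": "Example Laptop B", "Price": "$629.00", "Rating": "87 ratings"},
]

# Building the frame once from a list of dicts avoids growing it row by row.
fulldf = pd.DataFrame(rows, columns=["Title", "Price", "Rating"])
print(fulldf.shape)
```

For a single product the loc-based append above is perfectly fine; the list-of-dicts form mainly pays off once there is a loop over many pages.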

Amazing! Here is all of the code we have developed in this project (with a few additions, such as printing the values):

#IMPORT THESE PACKAGES
import selenium
from selenium import webdriver
import pandas as pd
#OPTIONAL PACKAGE, BUT MAY BE NEEDED
from webdriver_manager.chrome import ChromeDriverManager
#THIS INITIALIZES THE DRIVER (AKA THE WEB BROWSER)
driver = webdriver.Chrome(ChromeDriverManager().install())
#THIS PRETTY MUCH TELLS THE WEB BROWSER WHICH WEBSITE TO GO TO
driver.get('https://www.amazon.com/Acer-Display-Graphics-Keyboard-A515-43-R19L/dp/B07RF1XD36/ref=sr_1_3?dchild=1&keywords=laptop&qid=1618857971&sr=8-3')
#TITLE OF PRODUCT
Title = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[1]/div/h1/span').text
#PRICE OF PRODUCT
Price = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[9]/div[4]/div[4]/div[10]/div[1]/div/table/tbody/tr/td[2]/span[1]').text
#NUMBER OF RATINGS
Rating = driver.find_element_by_xpath('/html/body/div[2]/div[3]/div[8]/div[4]/div[4]/div[3]/div/span[3]/a/span').text
#PRINTS OUT THE DATA PULLED FROM ABOVE
print(Title)
print(Price)
print(Rating)
#CREATES AN EMPTY DATAFRAME
data1 = {'Title':[], 'Price':[], 'Rating':[],}
fulldf = pd.DataFrame(data1)
#APPENDING THE DATA PULLED FROM ABOVE INTO THE EXISTING DATAFRAME
row = [Title, Price, Rating]
fulldf.loc[len(fulldf)] = row
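The values stored by this script are plain strings, so the price and the number of ratings cannot be sorted or averaged as numbers until they are converted. The sketch below shows one way to do that with regular expressions. The function names parse_price and parse_rating_count are hypothetical, and the code assumes US-style formatting (a comma as thousands separator and a period as decimal point), which may not hold for every Amazon site.

```python
import re


def parse_price(text):
    """Turn a price string such as '$1,049.99' into a float, or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group().replace(",", ""))


def parse_rating_count(text):
    """Turn a string such as '1,234 ratings' into an int, or None."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        return None
    return int(match.group().replace(",", ""))


print(parse_price("$1,049.99"), parse_rating_count("1,234 ratings"))
```

Both helpers return None when no number is found, for example when a product page shows a message instead of a price, so the caller can decide how to handle missing values.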

Run the Program

There are two main ways to run this program: run the .py file from the command prompt or terminal, or run the program line by line. Either way, when you run it you will see a Chrome browser pop up on the display and navigate to the Amazon product page, and the title, price, and number of ratings will print in the Python console!

[Screenshot: running the program]

Well done! You have extracted data from Amazon! There are different ways to improve the project, such as building a front end with Streamlit. We hope you enjoyed reading this blog!