How Web Scraping is Used in Extracting MercadoLibre Data Using Selenium?

This blog shows how to use Selenium, a powerful library whose main feature is automating interaction with a website through a real browser. This matters because most websites today are dynamic: their content is rendered by scripts rather than delivered as flat, static HTML.

In this post we will:

Explain how to get ChromeDriver, the driver that automates Chrome
Import the required libraries from Selenium
Discuss the different ways of finding elements in Selenium
Explore the options for retrieving the HTML path of each element
Apply all of this to the MercadoLibre website

First, download ChromeDriver, the executable that lets Selenium control Chrome, from the following page. Pick the version that matches your installed Chrome.

https://chromedriver.chromium.org/downloads

After downloading ChromeDriver, create a new Jupyter notebook and load the relevant libraries and settings:


# Install or upgrade Selenium first (notebook shell command)
!pip install selenium --upgrade

import random
from time import sleep

import jovian
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

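The WebDriverWait, expected_conditions, By, and NoSuchElementException imports are crucial: they let us wait for elements to load, choose how elements are located, and handle elements that are missing, so the scraper does not fail while retrieving information.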
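Conceptually, an explicit wait polls a condition until it succeeds or a timeout expires. The following is a minimal sketch of that idea in plain Python, not Selenium's own implementation; wait_until and button_is_ready are hypothetical names, and the "page" is simulated so the example runs without a browser.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the "button" only becomes available on the third check.
attempts = {"count": 0}


def button_is_ready():
    attempts["count"] += 1
    return "button" if attempts["count"] >= 3 else None


found = wait_until(button_is_ready, timeout=2.0, poll=0.01)
print(found, attempts["count"])
```

In the scraper itself, WebDriverWait(driver, 20).until(...) plays the role of wait_until, with an expected condition such as EC.element_to_be_clickable as the check.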

Inspecting the Website

Ways of Getting Elements


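find_element(By.XPATH, ...)
find_element(By.CSS_SELECTOR, ...)
find_element(By.NAME, ...)
find_element(By.ID, ...)
find_element(By.CLASS_NAME, ...)

Older tutorials write these as find_element_by_xpath, find_element_by_css_selector, and so on. Those helper methods were removed in Selenium 4.3, so the By-based form is the one to use with a current install.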

Check out the step wise guide to get the required information

# To use Selenium we must start the driver; accepting the cookie banner right away is also recommended
from selenium.webdriver.chrome.service import Service  # Selenium 4 takes the driver path through a Service object

driver = webdriver.Chrome(service=Service('/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()

It is critical to locate the downloaded driver on your machine and copy its path into the call above.
Like most websites, MercadoLibre shows a cookie banner, so we wait for the accept button to become clickable and click it before doing anything else.
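Because a wrong driver path is a common source of start-up errors, it can help to validate the path before handing it to Selenium. This is only a sketch using the standard library; check_driver_path is a hypothetical helper, not part of Selenium.

```python
import os
from pathlib import Path


def check_driver_path(path):
    """Return the resolved path if it points to an executable file, else raise."""
    candidate = Path(path).expanduser()
    if not candidate.is_file():
        raise FileNotFoundError(f"No file at {candidate}")
    if not os.access(candidate, os.X_OK):
        raise PermissionError(f"{candidate} is not executable")
    return str(candidate.resolve())
```

The returned string can then be used wherever the scraper expects the ChromeDriver path.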

# Click on the computing category (run after the driver has been started and the cookie banner accepted)
driver.find_element(By.XPATH, '//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))

Next we go to the computer listings. To do so, we look for the element whose text label is Computación (Computing) and click it.

# Click on the laptops box
driver.find_element(By.XPATH, '//h3[text()="Portátiles"]').click()
sleep(random.uniform(2.0, 3.0))

# Create the list where the scraped items will be stored and locate the "next page" button
main_list = []
boton = driver.find_element(By.XPATH, '//span[text()="Siguiente"]')

The last two lines of code define the list that will hold the scraped information and locate the Siguiente (Next) button.

# Loop over the result pages and read each result box through its class-based XPath.

for i in range(3):
    main_box = driver.find_elements(By.XPATH, '//div[@class="ui-search-result__content-wrapper"]')

    for pc in main_box:
        precio = pc.find_element(By.XPATH, './/span[@class="price-tag-fraction"]').get_attribute("innerHTML")
        descripcion = pc.find_element(By.XPATH, './/h2[@class="ui-search-item__title"]').get_attribute("innerHTML")

        try:
            vat_tag = pc.find_element(By.XPATH, './/span[@class="ui-search-styled-label ui-search-item__highlight-label__text"]').get_attribute("innerHTML")
            # The same label slot can hold the best-seller badge, which is not VAT information
            if vat_tag == 'MÁS VENDIDO':
                vat_tag = None
        except NoSuchElementException:
            vat_tag = None

        main_list.append({'Product Description': descripcion,
                          'Price': precio,
                          'VAT exclusion': vat_tag})

    # Move to the next results page; the button is located again because the old reference goes stale after navigation
    try:
        boton = driver.find_element(By.XPATH, '//span[text()="Siguiente"]')
        boton.click()
    except NoSuchElementException:
        break  # no next page
    sleep(random.uniform(2.0, 3.0))
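The label check inside the loop can also be pulled into a small function, which makes it easy to test without a browser. This is a sketch only; vat_label is a hypothetical name, and it assumes the best-seller badge is the only non-VAT text that appears in that label slot.

```python
def vat_label(raw_label):
    """Keep a highlight label only when it is not the best-seller badge."""
    if raw_label is None:
        return None
    label = raw_label.strip()
    if not label or label.upper() == "MÁS VENDIDO":
        return None
    return label


print(vat_label("EXENTO DE IVA"))
print(vat_label("MÁS VENDIDO"))
```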

This block of code does the following:

It runs the loop three times (range(3)) to collect the price and description of each product in the search results.

It determines whether each product is exempt from value-added tax (VAT).

It uses hand-written XPath expressions that are relative to each result box (note the leading .//). An XPath copied from the browser's inspector points at one specific element and will not match the same field in every box.

It relies on the NoSuchElementException imported earlier: without catching it, the loop would stop at the first product that has no VAT label.
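To see why relative, class-based paths work where a copied absolute path does not, here is a small offline sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Selenium, and the sample markup (including the "highlight" class) is invented for illustration. One difference to keep in mind: ElementTree's find returns None for a missing element, whereas Selenium's find_element raises NoSuchElementException.

```python
import xml.etree.ElementTree as ET

# Invented sample markup that mimics two search-result boxes.
SAMPLE = """
<div>
  <div class="ui-search-result__content-wrapper">
    <h2 class="ui-search-item__title">Laptop A</h2>
    <span class="price-tag-fraction">1.249.900</span>
    <span class="highlight">EXENTO DE IVA</span>
  </div>
  <div class="ui-search-result__content-wrapper">
    <h2 class="ui-search-item__title">Laptop B</h2>
    <span class="price-tag-fraction">2.670.000</span>
  </div>
</div>
"""

root = ET.fromstring(SAMPLE)
rows = []
# The same relative paths are evaluated inside every box.
for box in root.findall('.//div[@class="ui-search-result__content-wrapper"]'):
    title = box.find('.//h2[@class="ui-search-item__title"]').text
    price = box.find('.//span[@class="price-tag-fraction"]').text
    tag = box.find('.//span[@class="highlight"]')  # None when the box has no label
    rows.append({"title": title,
                 "price": price,
                 "vat": tag.text if tag is not None else None})

print(rows)
```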

Check the Below Screenshots of How the Information will Look:

Finally, we can use the stored data to create a DataFrame:


df_mercadolibre = pd.DataFrame(main_list)
df_mercadolibre

	Product Description	Price	VAT exclusion
0	Portátil Dell V3 3400 I5-1135g7 Ram 16gb Ssd 1...	2.670.000	None
1	Portatil Hp 245 G7 Amd 3020e 8gb 1tb 14 PuLG R...	1.249.900	EXENTO DE IVA
2	Portátil Lenovo Thinkbook Core I3 10ma 12gb 1t...	1.979.000	None
3	Portátil Huawei D15 Corei5 +16g+512g+morral+ M...	3.999.900	None
4	Portatil Lenovo Gaming 10300h Gtx 1650 M2 256+...	3.900.000	None
...	...	...	...
308	Celular Samsung Galaxy S20 Fan Edition 256gb R...	2.429.000	None
309	Celular Xiaomi Poco X3 / 64gb + Forro+mi Band ...	1.129.000	None
310	iPhone 12 Pro Max 256gb Nuevo-sellado-garantia	5.449.999	None
311	Celular Realme 7 Pro 8gb/128gb 4500 Mah	1.039.900	None
312	Celular Xiaomi Poco X3 Pro 128gb 6ram 48mp +...	1.499.000	None
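The Price column holds text such as 2.670.000, so it cannot be summed or sorted numerically as it stands. A minimal sketch of a conversion step follows; parse_price is a hypothetical helper, and it assumes the dots are thousands separators and that prices carry no decimal part.

```python
def parse_price(text):
    """Convert a price string with dot thousands separators to an integer."""
    digits = text.replace(".", "").strip()
    if not digits.isdigit():
        raise ValueError(f"unexpected price format: {text!r}")
    return int(digits)


prices = ["2.670.000", "1.249.900", "5.449.999"]
print([parse_price(p) for p in prices])  # [2670000, 1249900, 5449999]
```

With the DataFrame from this post, the same function could be applied to the column, for example with df_mercadolibre['Price'].map(parse_price).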

Summary

To begin, we installed Selenium and imported the libraries it needs.
We learned the basic commands for finding elements with XPath.
We defined waiting times so that each page could finish loading.
We created loops to collect the laptops’ descriptions and prices.
We stored the desired data in a list of dictionaries.
Finally, we built the DataFrame with Pandas.

If you are looking for someone to scrape MercadoLibre data using Selenium, contact X-Byte Enterprise Crawling today or request a quote!


December 20, 2021
