How Web Scraping Is Used to Extract MercadoLibre Data Using Selenium

This blog shows you how to use Selenium, a powerful library whose main feature is the ability to automate interaction with a website's interface. This matters because most websites today are dynamic: their content is loaded as you interact with the page rather than served as flat, static HTML.

You will learn how to:

  • Get the ChromeDriver used for automated browsing
  • Import the required libraries from Selenium
  • Use the different ways of finding elements in Selenium
  • Retrieve the HTML path for each element
  • Apply all of this to the MercadoLibre website

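First, download ChromeDriver, the driver that lets Selenium control the Chrome browser, from its download page. Make sure its version matches the version of Chrome you have installed.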


After downloading ChromeDriver, create a new Jupyter notebook and import the relevant libraries:


# Install or upgrade Selenium before importing it
!pip install selenium --upgrade

import jovian
import pandas as pd
import random
from time import sleep

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

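The last four imports are crucial. WebDriverWait and expected_conditions let us wait until an element has loaded before using it, By tells Selenium how to locate an element, and NoSuchElementException lets us handle elements that are missing instead of stopping with an error.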
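To make the idea behind an explicit wait concrete, here is a minimal sketch in plain Python, with no browser involved. The helper wait_until and its parameters are hypothetical names chosen for illustration. It mimics the polling idea behind WebDriverWait and is not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=5.0, poll=0.1):
    """Call condition() repeatedly until it is truthy or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# A fake "page" that only becomes ready on the third check (illustrative only)
checks = {"count": 0}


def page_ready():
    checks["count"] += 1
    return checks["count"] >= 3


wait_until(page_ready, timeout=2.0, poll=0.01)
print(checks["count"])
```

In the scraper, WebDriverWait(driver, 20).until(...) plays the same role: it keeps checking for up to 20 seconds and fails with a timeout if the element never becomes clickable.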

Inspecting the Website
[Screenshots: inspecting the website]

Ways of Getting Elements


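Recent Selenium 4 releases no longer include the older find_element_by_* helper methods. Each locator strategy is passed to find_element through the By class instead:

driver.find_element(By.XPATH, value)
driver.find_element(By.CSS_SELECTOR, value)
driver.find_element(By.NAME, value)
driver.find_element(By.ID, value)
driver.find_element(By.CLASS_NAME, value)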
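To see how these lookups pick out an element, the sketch below runs a few queries against a small hand-written HTML fragment using Python's built-in xml.etree.ElementTree, which supports a limited subset of XPath. The fragment is invented for illustration and is not copied from MercadoLibre. ElementTree is only a stand-in for the browser: it writes a text match as [.='...'], where Selenium's full XPath engine accepts text()="...".

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; real pages are larger and change often
html = """
<div>
  <p id="cat-1" class="category">Computación</p>
  <p id="cat-2" class="category">Celulares</p>
  <button name="accept">Entendido</button>
</div>
"""
root = ET.fromstring(html)

by_text = root.find(".//p[.='Computación']")        # match on the visible text
by_id = root.find(".//*[@id='cat-2']")              # match on the id attribute
by_name = root.find(".//button[@name='accept']")    # match on the name attribute
by_class = root.findall(".//p[@class='category']")  # every element with this class

print(by_text.get("id"))
print(by_id.text)
print(by_name.text)
print(len(by_class))
```

Matching on text or class is usually sturdier than an absolute path copied from the browser, because it does not depend on the element's exact position in the page.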
Follow the step-by-step guide below to get the required information:

# To use Selenium we must start the driver; it is also recommended to accept the cookies
# Recent Selenium 4 releases take the driver path through a Service object
driver = webdriver.Chrome(service=webdriver.ChromeService(executable_path='/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()

  • Locate the ChromeDriver file you downloaded and pass its path to the driver, as shown above.
  • Most websites set cookies, so we wait for the cookie notice to become clickable and accept it before interacting with the page.

# To use Selenium we must start the driver; it is also recommended to accept the cookies
driver = webdriver.Chrome(service=webdriver.ChromeService(executable_path='/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()

# Click on the Computers category, located by its visible text
driver.find_element(By.XPATH, '//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))
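The random pause after each click gives the page time to load and avoids sending requests at a fixed rhythm. It can be wrapped in a small helper, sketched below. The name polite_pause is hypothetical; the 2 to 3 second default mirrors the values used in this post.

```python
import random
from time import sleep


def polite_pause(low=2.0, high=3.0):
    """Sleep for a random duration between low and high seconds and return it."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


# Short bounds here only to keep the demonstration quick
waited = polite_pause(0.01, 0.02)
print(f"waited {waited:.3f} seconds")
```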

Next, let's get the information for the computers on sale. To do so, we locate each category box by its text label: first "Computación" (Computers), then "Portátiles" (Laptops).

# To use Selenium we must start the driver; it is also recommended to accept the cookies
driver = webdriver.Chrome(service=webdriver.ChromeService(executable_path='/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()

# Click on the Computers category
driver.find_element(By.XPATH, '//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))

# Click on the Laptops box
driver.find_element(By.XPATH, '//h3[text()="Portátiles"]').click()

# Create the list where the chosen elements will be stored and locate the "next page" button.
main_list = []
boton = driver.find_element(By.XPATH, '//span[text()="Siguiente"]')

The last two lines of code create the list that will store the scraped records and locate the "Siguiente" (Next) button used to move between result pages.

# Loop over the result pages and read each product box through XPath expressions relative to that box.

for i in range(3):
    main_box = driver.find_elements(By.XPATH, '//div[@class="ui-search-result__content-wrapper"]')

    for pc in main_box:
        precio = pc.find_element(By.XPATH, './/span[@class="price-tag-fraction"]').get_attribute("innerHTML")

        descripcion = pc.find_element(By.XPATH, './/h2[@class="ui-search-item__title"]').get_attribute("innerHTML")

        try:
            vat_tag = pc.find_element(By.XPATH, './/span[@class="ui-search-styled-label ui-search-item__highlight-label__text"]').get_attribute("innerHTML")

            # The same label also marks best sellers ("MÁS VENDIDO"), which is not a VAT exemption
            if vat_tag == 'MÁS VENDIDO':
                vat_tag = None
        except NoSuchElementException:
            vat_tag = None

        main_list.append({'Product Description': descripcion,
                          'Price': precio,
                          'VAT exclusion': vat_tag})

    # Go to the next results page; the button is located again because an
    # earlier reference goes stale once the page changes
    try:
        driver.find_element(By.XPATH, '//span[text()="Siguiente"]').click()
    except NoSuchElementException:
        break
    sleep(random.uniform(2.0, 3.0))

This block of code does the following:

  • Runs a loop of three iterations (range(3)) to collect the price and description of each product in the search results.
  • Determines whether the product is exempt from value-added tax (VAT).
  • Uses hand-written XPath expressions relative to each result box. Simply copying the XPath from the browser is not enough to obtain all of the required information.
  • Catches NoSuchElementException, imported earlier, so that products without a VAT label are stored with None instead of stopping the loop.
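The same pattern, a relative search inside each box plus a fallback for a missing label, can be tried without a browser. The sketch below again uses xml.etree.ElementTree on an invented fragment; the class names card, title, price and label are hypothetical and much simpler than MercadoLibre's. One difference to keep in mind: ElementTree returns None for a missing element, whereas Selenium raises NoSuchElementException, which is why the scraper needs try/except.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with two result cards; only the first has a label
html = """
<section>
  <div class="card">
    <h2 class="title">Portatil A</h2>
    <span class="price">1.249.900</span>
    <span class="label">EXENTO DE IVA</span>
  </div>
  <div class="card">
    <h2 class="title">Portatil B</h2>
    <span class="price">2.670.000</span>
  </div>
</section>
"""
root = ET.fromstring(html)

rows = []
for card in root.findall(".//div[@class='card']"):
    # The leading "." keeps each search inside the current card
    title = card.find(".//h2[@class='title']").text
    price = card.find(".//span[@class='price']").text
    label = card.find(".//span[@class='label']")  # None when the card has no label
    rows.append({
        "Product Description": title,
        "Price": price,
        "VAT exclusion": label.text if label is not None else None,
    })

for row in rows:
    print(row)
```

Without the leading "." the search would start from the top of the document again, and every card would return the first product's values.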

[Screenshots: how the information will look]

Finally, we can use the stored data to create a DataFrame:


df_mercadolibre = pd.DataFrame(main_list)
df_mercadolibre

     Product Description                                Price      VAT exclusion
0    Portátil Dell V3 3400 I5-1135g7 Ram 16gb Ssd 1...  2.670.000  None
1    Portatil Hp 245 G7 Amd 3020e 8gb 1tb 14 PuLG R...  1.249.900  EXENTO DE IVA
2    Portátil Lenovo Thinkbook Core I3 10ma 12gb 1t...  1.979.000  None
3    Portátil Huawei D15 Corei5 +16g+512g+morral+ M...  3.999.900  None
4    Portatil Lenovo Gaming 10300h Gtx 1650 M2 256+...  3.900.000  None
...  ...                                                ...        ...
308  Celular Samsung Galaxy S20 Fan Edition 256gb R...  2.429.000  None
309  Celular Xiaomi Poco X3 / 64gb + Forro+mi Band ...  1.129.000  None
310  iPhone 12 Pro Max 256gb Nuevo-sellado-garantia     5.449.999  None
311  Celular Realme 7 Pro 8gb/128gb 4500 Mah            1.039.900  None
312  Celular Xiaomi Poco X3 Pro 128gb 6ram 48mp +...    1.499.000  None

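The Price column holds text such as 2.670.000, so it has to be converted before it can be sorted or averaged. The sketch below assumes that the dot is a thousands separator and that the scraped value has no decimal part, which matches the sample rows shown in this post but is an assumption rather than something the site guarantees. The helper parse_price is a hypothetical name.

```python
def parse_price(text):
    """Convert a price string such as '2.670.000' into the integer 2670000."""
    return int(text.replace(".", ""))


# Sample values taken from the output shown in this post
prices = ["2.670.000", "1.249.900", "5.449.999"]
numeric = [parse_price(p) for p in prices]

print(numeric)
print(max(numeric))
```

With pandas, the same function could be applied to the whole column, for example with df_mercadolibre['Price'].apply(parse_price).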
Summary
  • First, we installed and imported the Selenium libraries required for scraping.
  • We learned how to use the basic Selenium commands to find elements with XPath.
  • We defined waiting times so that the pages had time to load.
  • We created loops to collect the laptops' descriptions and prices.
  • We stored the desired data in a list.
  • We built the DataFrame with Pandas.

If you are looking for someone to scrape MercadoLibre data using Selenium, contact X-Byte Enterprise Crawling today or request a quote!