This blog will show you how to use Selenium, a powerful library whose main feature is the ability to automate interaction with any website. Because most websites are highly dynamic nowadays, their content is not laid out as flat, static pages, which is where a browser-automation tool helps.
In this post we will:
- Explain how to get ChromeDriver, the driver that automates Chrome
- Import the required libraries from Selenium
- Discuss the different methods for finding elements in Selenium
- Explore the different options for retrieving the HTML path of each element
- Apply all of this to the MercadoLibre website
First, download ChromeDriver, the driver for the automated web browser, from the following page.
After downloading ChromeDriver, you can create a new Jupyter notebook and import the relevant libraries and settings:
import jovian
import pandas as pd
import random
from time import sleep
!pip install selenium --upgrade
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
The last four import lines are crucial because they let us wait for elements to load and handle missing elements without errors while retrieving information.
Inspecting the Website
Ways of Getting Elements
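- find_element_by_xpath
- find_element_by_css_selector
- find_element_by_name
- find_element_by_id
- find_element_by_class_name
Selenium 4.3 and later removed these helper methods; the equivalent call passes a By locator, for example find_element(By.XPATH, ...).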
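As a hedged aside, the helper methods listed above correspond one-to-one to the locator strategies exposed by `selenium.webdriver.common.by.By`. The sketch below spells out that mapping with plain strings so it runs without a browser. The `LEGACY_TO_BY` table and the `to_locator` helper are hypothetical names introduced only for illustration.

```python
# Locator strategy strings, matching the values of Selenium's By constants
# (By.XPATH == "xpath", By.CSS_SELECTOR == "css selector", and so on).
LEGACY_TO_BY = {
    "find_element_by_xpath": "xpath",
    "find_element_by_css_selector": "css selector",
    "find_element_by_name": "name",
    "find_element_by_id": "id",
    "find_element_by_class_name": "class name",
}


def to_locator(legacy_method, value):
    """Return the (strategy, value) pair that find_element() expects."""
    return (LEGACY_TO_BY[legacy_method], value)


# Hypothetical usage: the pair unpacks into driver.find_element(*locator).
locator = to_locator("find_element_by_xpath", '//span[text()="Siguiente"]')
print(locator)
```

With a live driver, the resulting pair could be passed as `driver.find_element(*locator)`.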
Check out the step wise guide to get the required information
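# To use Selenium we must start the driver; accepting the cookie banner is also recommended.
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the ChromeDriver path through a Service object.
driver = webdriver.Chrome(service=Service('/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
# Wait up to 20 seconds for the cookie button to become clickable, then click it.
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()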
- It is critical to locate the downloaded driver and pass its full path, as shown above.
- As you may be aware, most websites show a cookie banner; the last line waits until the accept button is clickable and then clicks it.
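To make the waiting step less of a black box, here is a rough sketch of what an explicit wait does: it calls a condition repeatedly until the condition succeeds or a timeout expires. This is an illustration only, not Selenium's implementation. `wait_until` and `button_is_clickable` are hypothetical names, and the sketch raises Python's built-in `TimeoutError` rather than Selenium's own exception.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Hypothetical stand-in for a button that becomes clickable on the third check.
attempts = {"count": 0}


def button_is_clickable():
    attempts["count"] += 1
    return "button" if attempts["count"] >= 3 else None


found = wait_until(button_is_clickable, timeout=2.0, poll=0.01)
print(found, attempts["count"])
```

Unlike a fixed `sleep`, this kind of wait returns as soon as the element is ready, so the script only pauses for as long as the page actually needs.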
# To use Selenium we must start the driver; accepting the cookie banner is also recommended.
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the ChromeDriver path through a Service object.
driver = webdriver.Chrome(service=Service('/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()
# Click on the "Computación" (Computers) category.
driver.find_element(By.XPATH, '//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))
Next, let's get the listings for computers. To do so, we look for the element whose text label is "Computación" (Computers) and click it.
# To use Selenium we must start the driver; accepting the cookie banner is also recommended.
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the ChromeDriver path through a Service object.
driver = webdriver.Chrome(service=Service('/Users/nicolasbenavides/Downloads/chromedriver'))
driver.get('https://www.mercadolibre.com.co/')
sleep(random.uniform(2.0, 3.0))
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#newCookieDisclaimerButton"))).click()
# Click on the "Computación" (Computers) category.
driver.find_element(By.XPATH, '//p[text()="Computación"]').click()
sleep(random.uniform(2.0, 3.0))
# Click on the "Portátiles" (Laptops) box.
driver.find_element(By.XPATH, '//h3[text()="Portátiles"]').click()
# Create the list where the scraped elements will be stored and locate the "Siguiente" (Next) button.
main_list = []
boton = driver.find_element(By.XPATH, '//span[text()="Siguiente"]')
The last two lines of code define the list that will store the scraped information and locate the "Siguiente" (Next) button.
# Loop over the result pages and read each result box through its class-based XPath.
for i in range(3):
    main_box = driver.find_elements(By.XPATH, '//div[@class="ui-search-result__content-wrapper"]')
    for pc in main_box:
        precio = pc.find_element(By.XPATH, './/span[@class="price-tag-fraction"]').get_attribute("innerHTML")
        descripcion = pc.find_element(By.XPATH, './/h2[@class="ui-search-item__title"]').get_attribute("innerHTML")
        try:
            vat_tag = pc.find_element(By.XPATH, './/span[@class="ui-search-styled-label ui-search-item__highlight-label__text"]').get_attribute("innerHTML")
            # "MÁS VENDIDO" (best seller) shares the label class but is not a VAT tag.
            if vat_tag == 'MÁS VENDIDO':
                vat_tag = None
        except NoSuchElementException:
            vat_tag = None
        main_list.append({'Product Description': descripcion, 'Price': precio, 'VAT exclusion': vat_tag})
    # Assumption: the original snippet never clicks the button, so this step is added to advance to the next page.
    # The button is located again because the earlier reference goes stale once the page changes.
    boton = driver.find_element(By.XPATH, '//span[text()="Siguiente"]')
    boton.click()
    sleep(random.uniform(2.0, 3.0))
This block of code does the following:
- A loop with a range of 3 collects the price and description of each product in the search results.
- It determines whether each product is exempt from value-added tax (VAT).
- To obtain all of the information required, each XPath is written by hand relative to the result box (note the leading .//); simply copying the XPath from the browser inspector would not work for every product.
- Catching NoSuchElementException, imported earlier, is what makes the VAT lookup possible: a product without the label would otherwise raise an error and stop the loop.
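The relative lookup and the missing-label case can be tried without a browser. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports a small XPath subset, on a simplified snippet. The markup and class names are hypothetical stand-ins, not MercadoLibre's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two result boxes, only the first has a label.
PAGE = """
<div>
  <div class="result">
    <h2 class="title">Laptop A</h2>
    <span class="price">1.249.900</span>
    <span class="label">EXENTO DE IVA</span>
  </div>
  <div class="result">
    <h2 class="title">Laptop B</h2>
    <span class="price">2.670.000</span>
  </div>
</div>
""".strip()

root = ET.fromstring(PAGE)
rows = []
for box in root.findall(".//div[@class='result']"):
    # The leading "." keeps each search inside the current box.
    title = box.find(".//h2[@class='title']").text
    price = box.find(".//span[@class='price']").text
    label = box.find(".//span[@class='label']")
    # ElementTree returns None for a missing element, whereas Selenium raises
    # NoSuchElementException, hence the try/except in the scraping loop.
    rows.append({
        "Product Description": title,
        "Price": price,
        "VAT exclusion": label.text if label is not None else None,
    })

print(rows)
```

Without the leading dot, an expression starting with `//` in Selenium searches the whole page, so every box would return the first product's values.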
The screenshots below show how the information will look:
Finally, we could use the stored data to create a DataFrame:
df_mercadolibre = pd.DataFrame(main_list)
df_mercadolibre
| | Product Description | Price | VAT exclusion |
|---|---|---|---|
| 0 | Portátil Dell V3 3400 I5-1135g7 Ram 16gb Ssd 1... | 2.670.000 | None |
| 1 | Portatil Hp 245 G7 Amd 3020e 8gb 1tb 14 PuLG R... | 1.249.900 | EXENTO DE IVA |
| 2 | Portátil Lenovo Thinkbook Core I3 10ma 12gb 1t... | 1.979.000 | None |
| 3 | Portátil Huawei D15 Corei5 +16g+512g+morral+ M... | 3.999.900 | None |
| 4 | Portatil Lenovo Gaming 10300h Gtx 1650 M2 256+... | 3.900.000 | None |
| ... | ... | ... | ... |
| 308 | Celular Samsung Galaxy S20 Fan Edition 256gb R... | 2.429.000 | None |
| 309 | Celular Xiaomi Poco X3 / 64gb + Forro+mi Band ... | 1.129.000 | None |
| 310 | iPhone 12 Pro Max 256gb Nuevo-sellado-garantia | 5.449.999 | None |
| 311 | Celular Realme 7 Pro 8gb/128gb 4500 Mah | 1.039.900 | None |
| 312 | Celular Xiaomi Poco X3 Pro 128gb 6ram 48mp +... | 1.499.000 | None |
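The Price column holds strings that use dots as thousands separators, so they cannot be sorted or averaged as numbers yet. Here is a minimal sketch of a conversion, assuming every price follows that format and has no decimal part. `parse_price` is a hypothetical helper, not part of the original notebook.

```python
def parse_price(text):
    """Convert a price such as '2.670.000' to the integer 2670000."""
    return int(text.replace(".", ""))


# Values copied from the Price column of the table above.
prices = ["2.670.000", "1.249.900", "5.449.999"]
print([parse_price(p) for p in prices])
```

On the DataFrame itself, the same helper could be applied with `df_mercadolibre['Price'].map(parse_price)`.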
Summary
- To begin, we installed Selenium and imported the libraries it needs.
- We learned the basic Selenium commands for finding elements with XPath.
- We set waiting times so that the page data had time to load.
- We created loops to collect the specifications and prices of all the laptops.
- We stored the desired data in a list.
- We built the DataFrame with Pandas.
If you are looking for someone to scrape MercadoLibre data using Selenium, contact X-Byte Enterprise Crawling today or request a quote!