
- Google Shopping Ads
- Google Standard Website Ads
Logic:
- Import the required libraries.
- Add a user-agent to imitate a real user's visit.
- Enter the search query.
- Get the HTML response.
- Parse the HTML code.
- Find where the data to extract is located.
- Iterate over the matching elements until none are left.
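The "enter the search query" step works by letting the HTTP library encode the query into the URL's query string. As a minimal sketch, this uses the standard library's `urllib.parse.urlencode` rather than requests itself, so it only approximates what requests sends; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Return the search URL with the query percent-encoded as ?q=... (hypothetical helper)."""
    # urlencode() turns spaces into '+' and percent-encodes non-ASCII characters as UTF-8
    return f"{base}?{urlencode({'q': query})}"


print(build_search_url("coffee buy"))
```

This also makes look-alike characters visible: a query that accidentally starts with the Cyrillic letter "с" instead of the Latin "c" is encoded as `%D1%81offee+buy`, which is a different search from `coffee+buy`.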
Google might block the requests if it:
- Recognizes the script as a script, e.g. by the default python-requests user-agent.
- Receives too many requests from a single IP address.
- Sees behavior that doesn't look human (fundamentally, everything above).
There are several ways to keep Google from blocking your scripts:
- Use a referrer or Session objects from the requests library.
- Use custom headers such as a User-Agent, and rotate through a list of different user agents.
- Use headless browsers or browser automation frameworks like Pyppeteer or Selenium.
- Use proxies and rotate them.
- Use CAPTCHA-solving services.
- Slow your requests down by adding delays between them.
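Several of the techniques above (a referrer, rotating user agents, rotating proxies, and delays) can be combined in a few lines. The following is a minimal sketch using only the standard library; the user-agent strings, the proxy addresses, and the helper names `next_request_kwargs` and `polite_pause` are hypothetical placeholders, not values that are known to get past Google's checks.

```python
import itertools
import random
import time

# Hypothetical pool of user agents; replace with real, current browser strings.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1 Safari/605.1.15",
]

# Hypothetical proxy addresses; replace with proxies you actually have access to.
PROXIES = ["http://proxy1.example:8080", "http://proxy2.example:8080"]

# Round-robin iterator over the proxy list
_proxy_cycle = itertools.cycle(PROXIES)


def next_request_kwargs(referer: str = "https://www.google.com/") -> dict:
    """Build keyword arguments for requests.get() or Session.get():
    a random User-Agent, a Referer header, and the next proxy in round-robin order."""
    proxy = next(_proxy_cycle)
    return {
        "headers": {"User-Agent": random.choice(USER_AGENTS), "Referer": referer},
        "proxies": {"http": proxy, "https": proxy},
        "timeout": 10,
    }


def polite_pause(min_s: float = 2.0, max_s: float = 6.0) -> float:
    """Sleep for a random interval so request timing looks less scripted; return the delay."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay
```

With requests, each call could then look like `session.get(url, params=params, **next_request_kwargs())` followed by `polite_pause()`, where `session = requests.Session()` keeps cookies between requests. Random choice for user agents and round-robin for proxies are just one reasonable design; either list could be rotated either way.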
Shopping Ads

import urllib.parse
import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below requires the lxml package

# Adding User-agent (default user-agent from requests library is 'python-requests')
# https://github.com/psf/requests/blob/589c4547338b592b1fb77c65663d8aa6fbb7e38b/requests/utils.py#L808-L814
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query (requests appends it to the URL as ?q=coffee+buy)
params = {'q': 'coffee buy'}

# Getting HTML response
html = requests.get('https://www.google.com/search',
                    headers=headers,
                    params=params).text

# Parsing the HTML code with BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Looking for the containers that hold all the necessary data (find_all() is the newer name for findAll())
for container in soup.find_all('div', class_='RnJeZd top pla-unit-title'):
    # Scraping title
    title = container.text
    # Creating beginning of the link to join afterwards
    start_of_link = 'https://www.googleadservices.com/pagead'
    # Scraping end of the link to join afterwards
    end_of_link = container.find('a')['href']
    # Combining (joining) the relative and absolute URLs
    ad_link = urllib.parse.urljoin(start_of_link, end_of_link)
    # Printing each title and link on a new line
    print(f'{title}\n{ad_link}\n')
# Output
'''
Jot Ultra Coffee Triple | Ultra Concentrated
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABABGgJ5bQ&sig=AOD64_0x-PlrWek-JFlDTSo7E9Z7YhUOjg&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCED4&adurl=
MUD\WTR | A Healthier Coffee Alternative, 30 servings
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAJGgJ5bQ&sig=AOD64_3gltZJ6kPrxic5o8yUO5cuJrHXnw&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEEg&adurl=
Jot Ultra Coffee Double | 2 bottles = 28 cups
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAHGgJ5bQ&sig=AOD64_3hD0JWZSLr8NUgoTW5K0HMzdFvng&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEE4&adurl=
'''
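The `urljoin()` call decides how the scraped `href` is combined with the base link, which is why the printed links start with `https://www.googleadservices.com/aclk` rather than `.../pagead/aclk`. A quick standard-library check of the cases it handles; the `href` values here are made-up examples, not links captured from Google.

```python
from urllib.parse import urljoin

base = "https://www.googleadservices.com/pagead"

hrefs = [
    "/aclk?sa=l&ai=EXAMPLE",          # root-relative: the base path is replaced entirely
    "aclk?sa=l&ai=EXAMPLE",           # relative: "pagead" has no trailing slash, so it is replaced too
    "https://example.com/product",    # absolute: returned unchanged
]

joined = [urljoin(base, href) for href in hrefs]
for link in joined:
    print(link)
```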
Note: At times there will be zero results because Google didn't show any ads when the script ran. Just run it again.
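Instead of re-running the script by hand, the scraping step can be retried automatically. A minimal sketch, assuming the scraping code is wrapped in a zero-argument function that returns the extracted items as a list; `retry_until_results` and that wrapper are hypothetical, and the fixed delay is an arbitrary choice.

```python
import time


def retry_until_results(scrape, attempts=3, delay_s=5.0):
    """Call scrape() until it returns a non-empty list or the attempts run out.

    `scrape` is any zero-argument callable (hypothetical here) that runs the
    scraping code and returns the extracted ads as a list.
    """
    for attempt in range(1, attempts + 1):
        results = scrape()
        if results:
            return results
        if attempt < attempts:
            time.sleep(delay_s)  # wait before asking Google again
    return []
```

Keeping the number of attempts small and the delay generous matters here, since repeated rapid requests are one of the things that get scripts blocked.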
Standard Website Ads

import requests
from bs4 import BeautifulSoup  # the 'lxml' parser used below requires the lxml package

# Adding user-agent to fake a real user visit
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query (requests appends it to the URL as ?q=coffee+buy)
params = {'q': 'coffee buy'}

# HTML response
html = requests.get('https://www.google.com/search',
                    headers=headers,
                    params=params).text

# Parsing the HTML code with BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Looking for the containers that hold the needed data and iterating over them
for container in soup.find_all('span', class_='Zu0yb LWAWHf qzEoUe'):
    # Using .text since the 'span' contains nothing but the link
    ad_link = container.text
    # Printing links
    print(ad_link)
# Output
'''
https://www.coffeeam.com/
https://www.sfbaycoffee.com/
https://www.onyxcoffeelab.com/
https://www.enjoybettercoffee.com/
https://www.klatchroasting.com/
https://www.pachamamacoffee.com/
Home
'''
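As the output shows, the selected `span` elements can also contain text that isn't a URL (the trailing "Home"). A minimal post-filter sketch, assuming you collect the `ad_link` values into a list instead of printing them; `keep_urls` is a hypothetical helper, and the duplicate entry in the sample list was added only to exercise de-duplication.

```python
from urllib.parse import urlparse


def keep_urls(texts):
    """Keep only strings that parse as http(s) URLs with a host,
    dropping duplicates while preserving the original order."""
    seen = set()
    urls = []
    for text in texts:
        text = text.strip()
        parsed = urlparse(text)
        if parsed.scheme in ("http", "https") and parsed.netloc and text not in seen:
            seen.add(text)
            urls.append(text)
    return urls


scraped = ["https://www.coffeeam.com/", "https://www.sfbaycoffee.com/", "Home", "https://www.coffeeam.com/"]
print(keep_urls(scraped))
```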
Use Google Ads Results API
Alternatively, you can do the same thing with the Google Ads Results API from X-Byte. The difference is that you don't need to worry about solving CAPTCHAs when you send many requests or about getting proxies; it also reduces development complexity and makes the data easy to work with.
This is a paid API.
Code to integrate:
import os
from serpapi import GoogleSearch

params = {
    "engine": "google",
    "q": "kitchen table",
    "api_key": os.getenv("API_KEY"),
    "no_cache": "true"  # add this param if it throws an error
}

search = GoogleSearch(params)
results = search.get_dict()

for ad in results['ads']:  # shopping ads -> results['shopping_results']
    ad_link = ad['tracking_link']  # shopping ads -> ad['link']
    print(ad_link)
# Output for regular ads
'''
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAPGgJxdQ&ae=2&sig=AOD64_2ZH32FlwxW1XqO9V49i2L8J5qy2A&q&adurl
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAMGgJxdQ&ae=2&sig=AOD64_2l1PVJAqbVmrcu8UpkGPVk-VK3UA&q&adurl
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAQGgJxdQ&sig=AOD64_2DDuyRZUcFi04jfneAzwnOQBuLtw&q&adurl
'''
# Output for shopping ads
'''
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAEGgJ5bQ&ae=2&sig=AOD64_2zCyytR6tDeB3BjdOX5sFQQKwOAA&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARA8&adurl=
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAFGgJ5bQ&ae=2&sig=AOD64_2HeGVTNF91vkSHjg-wRDtC1ouATw&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBI&adurl=
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAGGgJ5bQ&ae=2&sig=AOD64_1n4ztvwQxiSMInwgntgY-WyVc2eQ&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBY&adurl=
'''
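Because Google doesn't always show ads, a response may not contain the `ads` or `shopping_results` key at all, and indexing it directly would raise `KeyError`. A minimal sketch that handles both ad types defensively, assuming the key layout used in the snippet above (`tracking_link` for regular ads, `link` for shopping ads); `extract_ad_links` is a hypothetical helper and the `sample` dictionary is a trimmed, made-up response, not real API output.

```python
def extract_ad_links(results: dict) -> dict:
    """Collect links for both ad types from a results dictionary.

    Assumes regular ads live under "ads" with a "tracking_link" field and
    shopping ads under "shopping_results" with a "link" field. Missing keys
    produce empty lists instead of raising KeyError.
    """
    return {
        "ads": [ad["tracking_link"] for ad in results.get("ads", []) if "tracking_link" in ad],
        "shopping": [item["link"] for item in results.get("shopping_results", []) if "link" in item],
    }


# Hypothetical, heavily trimmed response used only to exercise the function
sample = {
    "ads": [{"tracking_link": "https://www.google.com/aclk?sa=l&ai=EXAMPLE1"}],
    "shopping_results": [{"link": "https://www.google.com/aclk?sa=l&ai=EXAMPLE2"}],
}
print(extract_ad_links(sample))
print(extract_ad_links({}))
```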
If you have any questions, if something isn't working properly, or if you need other code written, feel free to contact X-Byte Enterprise Crawling or ask for a free quote!





