How to Extract Google Ad Results Using Python

There are two kinds of ad results, each with a different layout: Shopping Ads and Standard Website Ads.

Logic:

  • Import the libraries to work with.
  • Add a user-agent to fake a real user visit.
  • Enter the search query.
  • Get the HTML response.
  • Get the HTML code with BeautifulSoup.
  • Find and specify where to extract the data from.
  • Iterate over it until nothing is left.

Google might block your requests if it:

  • Recognizes the script as a script, e.g. by the default python-requests user-agent.
  • Sees too many requests coming from a single IP address.
  • Sees behavior that doesn't look human, which is essentially everything above combined.

There are many ways to get around Google blocking your script (see the sketch after this list):

  • Use a referer or Python-requests Session objects.
  • Use customized headers, i.e. a User-Agent, together with a list of different user agents.
  • Use headless browsers or browser automation frameworks like Pyppeteer or Selenium.
  • Use proxies and rotate them.
  • Use CAPTCHA solving services.
  • Use request delays to make requests much slower.
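
As a rough sketch of a few of these points, here is how rotating user agents over a requests Session with randomized delays could look. The user-agent strings, queries, and delay values below are just example assumptions, not required values:

import random, time
import requests

# Example desktop user agents to rotate through (assumed values;
# any list of current, real browser user agents will do)
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
]

# A Session reuses the underlying connection and keeps cookies,
# which looks more like a real browser than one-off requests
session = requests.Session()

for query in ['coffee buy', 'tea buy']:
  # Pick a random user agent for each request
  session.headers.update({'User-Agent': random.choice(user_agents)})
  response = session.get('https://www.google.com/search', params={'q': query})
  print(response.status_code, response.url)

  # Random delay so the requests arrive much slower
  time.sleep(random.uniform(2, 6))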

Shopping Ads

import requests, lxml, urllib.parse
from bs4 import BeautifulSoup

# Adding User-agent (default user-agent from requests library is 'python-requests')
# https://github.com/psf/requests/blob/589c4547338b592b1fb77c65663d8aa6fbb7e38b/requests/utils.py#L808-L814
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query
params = {'q': 'coffee buy'}

# Getting HTML response
html = requests.get('https://www.google.com/search',
                    headers=headers,
                    params=params).text

# Getting HTML code from BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Looking for the container that holds all the necessary data and iterating over it
for container in soup.find_all('div', class_='RnJeZd top pla-unit-title'):
  # Scraping the title
  title = container.text

  # Beginning of the ad link, to join afterwards
  start_of_link = 'https://www.googleadservices.com/pagead'
  # Scraping the end of the ad link, to join afterwards
  end_of_link = container.find('a')['href']
  # Combining (joining) the relative and absolute URLs
  ad_link = urllib.parse.urljoin(start_of_link, end_of_link)

  # Printing each title and link on a new line
  print(f'{title}\n{ad_link}\n')


# Output
''' 
Jot Ultra Coffee Triple | Ultra Concentrated
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABABGgJ5bQ&sig=AOD64_0x-PlrWek-JFlDTSo7E9Z7YhUOjg&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCED4&adurl=
MUD\WTR | A Healthier Coffee Alternative, 30 servings
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAJGgJ5bQ&sig=AOD64_3gltZJ6kPrxic5o8yUO5cuJrHXnw&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEEg&adurl=
Jot Ultra Coffee Double | 2 bottles = 28 cups
https://www.googleadservices.com/aclk?sa=l&ai=DChcSEwiP0dmfvcbwAhX48OMHHYyRBuoYABAHGgJ5bQ&sig=AOD64_3hD0JWZSLr8NUgoTW5K0HMzdFvng&ctype=5&q=&ved=2ahUKEwjhr9GfvcbwAhXHQs0KHQCbCAUQww96BAgCEE4&adurl=
'''
Note: At times there will be zero results because Google did not show any ads at the moment the script ran. Just run the script again.
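
If you would rather have the script retry on its own, below is a minimal sketch. It wraps the scraping code above in a hypothetical scrape_shopping_ads() function (reusing the headers and params defined earlier) and retries a few times with a short pause between attempts:

import time

def scrape_shopping_ads():
  # Hypothetical wrapper around the code above; returns a list of
  # (title, ad_link) tuples, which is empty when Google shows no ads
  html = requests.get('https://www.google.com/search',
                      headers=headers,
                      params=params).text
  soup = BeautifulSoup(html, 'lxml')

  ads = []
  for container in soup.find_all('div', class_='RnJeZd top pla-unit-title'):
    title = container.text
    end_of_link = container.find('a')['href']
    ads.append((title, urllib.parse.urljoin(
        'https://www.googleadservices.com/pagead', end_of_link)))
  return ads

# Retry a few times, since Google does not show ads on every page load
for attempt in range(5):
  results = scrape_shopping_ads()
  if results:
    break
  time.sleep(2)  # short pause before the next attempt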

 

Standard Website Ads

import requests, lxml
from bs4 import BeautifulSoup

# Adding user-agent to fake real user visit
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3538.102 Safari/537.36 Edge/18.19582"
}

# Search query
params = {'q': 'coffee buy'}

# HTML response
html = requests.get('https://www.google.com/search',
                    headers=headers,
                    params=params).text
# HTML code from BeautifulSoup
soup = BeautifulSoup(html, 'lxml')

# Looking for the container that holds the needed data and iterating over it
for container in soup.find_all('span', class_='Zu0yb LWAWHf qzEoUe'):
  # Using .text since the 'span' contains nothing but the link
  ad_link = container.text
  # Printing each link
  print(ad_link)

# Output
'''
https://www.coffeeam.com/
https://www.sfbaycoffee.com/
https://www.onyxcoffeelab.com/
https://www.enjoybettercoffee.com/
https://www.klatchroasting.com/
https://www.pachamamacoffee.com/
https://www.bulletproof.com/
'''

Use Google Ads Results API

Alternatively, you can do the same thing with the Google Ad Results API from X-Byte. The difference is that you don't have to deal with solving CAPTCHAs after sending too many requests or with finding proxies; it reduces development complexity and makes data manipulation easy.

This is a paid API.

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
  "engine": "google",
  "q": "kitchen table",
  "api_key": os.getenv("API_KEY"),
  "no_cache":"true" # add this param if it throws an error
}

search = GoogleSearch(params)
results = search.get_dict()

for ad in results['ads']: # shopping ads -> ['shopping_results']
  shopping_ad = ad['tracking_link'] # shopping ads -> ['link']
  print(shopping_ad)

# Output for regular ads
'''
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAPGgJxdQ&ae=2&sig=AOD64_2ZH32FlwxW1XqO9V49i2L8J5qy2A&q&adurl
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAMGgJxdQ&ae=2&sig=AOD64_2l1PVJAqbVmrcu8UpkGPVk-VK3UA&q&adurl
https://www.google.com/aclk?sa=l&ai=DChcSEwje1bnojtHwAhWRhMgKHY0kC1oYABAQGgJxdQ&sig=AOD64_2DDuyRZUcFi04jfneAzwnOQBuLtw&q&adurl
'''
# Output for shopping ads
'''
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAEGgJ5bQ&ae=2&sig=AOD64_2zCyytR6tDeB3BjdOX5sFQQKwOAA&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARA8&adurl=
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAFGgJ5bQ&ae=2&sig=AOD64_2HeGVTNF91vkSHjg-wRDtC1ouATw&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBI&adurl=
https://www.google.com/aclk?sa=l&ai=DChcSEwijuI27jtHwAhVA5uMHHUUWAWkYABAGGgJ5bQ&ae=2&sig=AOD64_1n4ztvwQxiSMInwgntgY-WyVc2eQ&ctype=5&q=&ved=2ahUKEwjh9oO7jtHwAhUId6wKHa8mByUQ5bgDegQIARBY&adurl=
'''

If you have any questions, if anything isn't working properly, or if you need other code written, feel free to contact X-Byte Enterprise Crawling or ask for a free quote!