Python & BeautifulSoup for Data Scraping
Today my biggest challenge was extracting information from a table on a web page. This information will form the basis of an online mapping project. The aim is to get the names of veterinary clinics in Turkey, along with their phone numbers and addresses. At first, however, I only managed to get merged data, like this:
NamePhoneAddress (they are all merged)
Example: Bayat Veteriner Kliniği0322 491 26 73Kurtuluş No:1
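One common way to end up with output like this is to read the text of a whole table row at once, since BeautifulSoup then joins the text of all the cells with nothing in between. Here is a minimal sketch of that effect. The one-row table is made up from the example above (the real page's markup may differ), and it uses Python's built-in html.parser so that no extra parser is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, built from the merged example above;
# the real page's markup may differ.
html = (
    "<table><tr>"
    "<td>Bayat Veteriner Kliniği</td>"
    "<td>0322 491 26 73</td>"
    "<td>Kurtuluş No:1</td>"
    "</tr></table>"
)
soup = BeautifulSoup(html, "html.parser")
tr = soup.find("tr")

# Reading the row as a whole concatenates the cells with nothing between them
merged = tr.text

# Reading each cell separately keeps the fields apart
fields = [td.get_text(strip=True) for td in tr.find_all("td")]

print(merged)
print(fields)
```

Collecting the cells one by one is what makes it possible to put a separator between them later.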
I decided to use Facebook as an open collaboration platform and asked some friends to help me solve the issue. After a long collaborative chat, which reminded us all of IRC chat rooms, we ended up solving the problem in the code.
Finally, we got it all working, with commas separating the fields:
Region, City, Name, Address, Phone
Example: Gölbaşı Veteriner Klinikleri, Ankara, Murat Veteriner Kliniği, Bahçelievler Cemal Gürsel 125.Sok. No:212/A, 0312 484 66 53
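One caveat with a plain comma-joined line: if a field such as an address contains a comma itself, the columns can no longer be told apart. A possible way around this, sketched here with Python's standard csv module, is to let the writer quote such fields. The second data row is invented purely to show a comma inside a field, and the in-memory buffer stands in for a real output file:

```python
import csv
import io

# Rows in the Region, City, Name, Address, Phone layout; the last row
# is made up to show a field that contains a comma.
rows = [
    ["Region", "City", "Name", "Address", "Phone"],
    ["Gölbaşı Veteriner Klinikleri", "Ankara", "Murat Veteriner Kliniği",
     "Bahçelievler Cemal Gürsel 125.Sok. No:212/A", "0312 484 66 53"],
    ["Example Region", "Example City", "Example Clinic",
     "Example Street 1, Floor 2", "0000 000 00 00"],
]

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.writer(buffer)
writer.writerows(rows)  # fields containing commas are quoted automatically
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields the same five fields per row
parsed = list(csv.reader(io.StringIO(csv_text)))
```

Because the quoting is handled by the writer, the data survives a round trip through the file unchanged.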
Here is the Python code we co-created, using BeautifulSoup:
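import requests
from bs4 import BeautifulSoup

# Fetch the page and parse it ("MY_URL" is a placeholder)
a = requests.get("MY_URL")
soup = BeautifulSoup(a.text, 'lxml')
# Each post block holds a district heading plus its table of clinics
rows = soup.find_all("div", class_="entry clearfix post")
data = [["District","Name","Address","Phone"]]
for row in rows:
    district = ""
    for elem in row.find_all():
        if elem.name == "h2":
            # Strip the fixed suffix so only the district name remains
            district = elem.text.replace(" Veteriner Klinikleri, CITY_NAME","")
        elif elem.name == "tr" and len(elem.find_all("td")) > 0:
            # Assumed reconstruction: the loop body was missing from the post;
            # collect one field per <td>, prefixed with the district
            record = [district]
            for td in elem.find_all("td"):
                record.append(td.text.strip())
            data.append(record)
# Print every record as one comma-separated line
for item in data:
    print(", ".join(item))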
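A note on the find_all call above: when class_ is given a string that contains several class names, BeautifulSoup compares it with the exact value of the class attribute, so the same classes in a different order would not match. Here is a small sketch of the difference on made-up markup (not the real page), with a CSS selector as an order-independent alternative:

```python
from bs4 import BeautifulSoup

# Made-up markup: the same three classes in two different orders
html = (
    '<div class="entry clearfix post"><h2>A</h2></div>'
    '<div class="post entry clearfix"><h2>B</h2></div>'
)
soup = BeautifulSoup(html, "html.parser")

# Exact match on the attribute string: class order matters here
exact = soup.find_all("div", class_="entry clearfix post")

# CSS selector: matches any div carrying all three classes, in any order
by_selector = soup.select("div.entry.clearfix.post")

print(len(exact), len(by_selector))
```

If the site ever reorders its class names, the selector form may be the more forgiving choice.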