Women's Non-Violent Protests: A Web Scraping Example

Why Don't We Have More Open Data?

The Nonviolent Education and Research Association has created a fantastic database where one can search and read about non-violent protests in Turkey. The web page includes its own interactive map, and the protests can be filtered by three categories: environmental, anti-war and women. Check the project page here. The platform and the motivation behind it are amazing. Overall, the platform makes an exhaustive body of information more digestible than the published reports we usually see.

However, I cannot stop wondering why such an organization would not make its database more accessible to its target audience. Why not let people download the data? Or is there a good reason why organizations do not open their data?

Open data, according to the Open Data Handbook, is data that can be freely used, re-used and redistributed by anyone. The handbook goes on to explain the "availability and access" of the data: "as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the internet". By making their data easily accessible, organizations can help journalists, academics and citizens retrieve and analyze it, which will most probably encourage further investigation of the issue. As the data in this database is not downloadable, I decided to use this as a chance to practice data scraping with Python. So I scraped the "Date", "Location" and "Technique" information for women's non-violent protests in Turkey.

Scraping Data About Women's Non-Violent Protests in Turkey

While scraping, one can "call" certain tags in a web page by using their "class" or "id". In this case, the location, date and technique that I wanted to extract had neither. So I learned how to search the paragraphs for a specific text and scrape a certain tag before or after it. In other words, I searched for the texts "Location", "Date" and "Technique" separately and extracted the <p> tags (texts) that follow them. The code is as follows (YER = LOCATION), and you can see a glimpse of the cities on the right side.

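import requests
from bs4 import BeautifulSoup

url = "http://www.siddetsizeylem.org/kadin/liste"
r = requests.get(url)

# r.text is already a decoded string, so it can be passed to the parser directly
soup = BeautifulSoup(r.text, 'html.parser')
for paragraph in soup.find_all('div', attrs={'class': 'details'}):
    # Find the "YER" (location) label, then print the <p> tag that follows it
    label = paragraph.find(string="YER")
    if label is not None:
        print(label.parent.find_next('p').text)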
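The same label-then-next-tag lookup can be wrapped in a small helper that collects several fields per protest at once. This is only a sketch under stated assumptions: the function name `extract_fields`, the sample markup and the "TARIH" label are hypothetical stand-ins, since only the "YER" label is confirmed above, and the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the assumed structure of the listing page.
# Only the "YER" label is taken from the post; "TARIH" is a made-up stand-in.
SAMPLE_HTML = """
<div class="details">
  <p>YER</p><p>Istanbul</p>
  <p>TARIH</p><p>2012</p>
</div>
<div class="details">
  <p>YER</p><p>Ankara</p>
</div>
"""


def extract_fields(html, labels):
    """Return one dict per 'details' block, mapping each label to the text
    of the <p> tag that follows it (or None when the label is missing)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.find_all("div", class_="details"):
        row = {}
        for label in labels:
            # Exact-match search for the label text inside this block only
            node = block.find(string=label)
            value = node.parent.find_next("p") if node is not None else None
            row[label] = value.get_text(strip=True) if value is not None else None
        rows.append(row)
    return rows


rows = extract_fields(SAMPLE_HTML, ["YER", "TARIH"])
for row in rows:
    print(row)
```

Returning a list of dicts keeps each protest's fields together, which makes it easy to write them out as a CSV file later. Note that `find_next` is not confined to the current block, so a label placed last in a block would pick up a tag from the next one.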


Mapping Women's Non-Violent Protests

The following is the chronological map created from the scraped locations and dates, and the chart after it shows the techniques women in Turkey use most often while protesting.

From this data we can see that women in Turkey have used many techniques, such as sit-ins, publishing journals, handing out flyers and singing songs. Press releases (25%), meetings (21%) and rallies (13%), however, are the non-violent protest techniques most commonly used by women in Turkey. Also, the database contains more protests from 2010 onwards.
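Percentages like these can be derived from the scraped technique column with a simple frequency count. The sketch below is illustrative only: `technique_shares` is a hypothetical helper, and the sample list is made-up data, not the scraped results.

```python
from collections import Counter


def technique_shares(techniques):
    """Return (technique, percent) pairs, most frequent first."""
    counts = Counter(techniques)
    total = sum(counts.values())
    # An empty input produces an empty list, so no division by zero occurs
    return [(name, round(100 * n / total)) for name, n in counts.most_common()]


# Made-up sample values, used only to show the calculation
sample = ["press release", "press release", "meeting", "rally"]
shares = technique_shares(sample)
for name, percent in shares:
    print(f"{name}: {percent}%")
```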

Overall, I believe that making a downloadable, well-structured database available will pave the way for many other kinds of research and contributions. In this case, making the data easily accessible will help us better understand how and why women protest. Even better, we might get one step closer to discovering the most successful and efficient protest techniques in women's struggle.