
Analyzing the Gender Disparity Among Higher Academia in Computer Science / Engineering

I met a female computer scientist for the first time in college: she was the professor supervising my summer lab research! Before that, I had never met a female engineer or computer scientist. As I continued my education, I began to notice how few women professors there were in my college’s computer science/engineering department. If my school’s department looks like this, what do other CS departments in the US look like? Specifically, how does the representation of women among CS faculty compare to that of men in the US?

I gathered samples from the CS departments of 7 different universities.

If I were to repeat this study, I would aim for a larger sample size.

I wrote the functions below to web scrape data from MIT, Stanford, and Cal (UC Berkeley). My teammates scraped the other schools’ data.

I initialized the gender column as all male and then changed it to female where appropriate for each school. This may introduce some bias, since I manually determined each individual’s gender from the picture provided by the university.

from requests import get
from bs4 import BeautifulSoup
import re
import pandas as pd

def lst_data(website: str, tag: str, attrs_key: str, attrs_txt: str):
    """Download a page and return every `tag` element whose `attrs_key` attribute matches the regex `attrs_txt`."""
    response = get(website)
    html = BeautifulSoup(response.text, 'html.parser')
    name_data = html.find_all(tag, attrs={attrs_key: re.compile(attrs_txt)})
    return name_data

# names = [first faculty member's name, last faculty member's name]
def index_values(names, name_data):
    """Return the positions in name_data of the elements whose HTML contains each name."""
    name_str = [str(x) for x in name_data]
    lst = []
    for name in names:
        lst.extend(i for i, x in enumerate(name_str) if name in x)
    return lst

# initialize all as male and change to female accordingly
def make_df(name_lst, school, female_lst):
    df = pd.DataFrame({'Name': name_lst, 'School': school, 'Gender': 'male'})
    # match on the Name column, so names in female_lst that weren't scraped are simply ignored
    df.loc[df['Name'].isin(female_lst), 'Gender'] = 'female'
    return df
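Because the female labels are typed in by hand, a name that is misspelled or formatted differently from the scraped text silently stays labelled male. The sketch below is a hypothetical, dependency-free helper (label_genders is not part of the original code) that applies the same "default to male, flip listed names to female" rule and also reports any hand-typed names that never matched a scraped name. The names in the usage example are placeholders.

```python
def label_genders(names, female_names):
    """Label every scraped name 'male' by default and flip listed names to 'female'.

    Hypothetical helper mirroring make_df's rule without pandas.
    Returns (labels, unmatched): labels is a list of (name, gender) pairs in
    scrape order; unmatched lists hand-typed female names that were not found.
    """
    female_set = set(female_names)
    labels = [(name, 'female' if name in female_set else 'male') for name in names]
    # Names typed into the female list that never appeared in the scrape
    unmatched = sorted(female_set - set(names))
    return labels, unmatched


labels, unmatched = label_genders(
    ['Name A', 'Name B', 'Name C'],  # placeholder scraped names
    ['Name B', 'Name D'],            # placeholder hand-typed list; 'Name D' is a typo
)
print(labels)     # [('Name A', 'male'), ('Name B', 'female'), ('Name C', 'male')]
print(unmatched)  # ['Name D']
```

An empty unmatched list is a quick check that every hand-labelled name was actually applied.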

The following is an example of how I scraped faculty names from Stanford.

name_data = lst_data('', 'a', 'href', '^http')  # Stanford CS faculty page URL goes here

# Returns index values [8, 67]. Use them to index name_data
index_values(['Maneesh Agrawala', 'Matei Zaharia'], name_data)

lst = []
for faculty in name_data[8:68]:
    lst.append(faculty.get_text(strip=True))  # assumed: the link text is the faculty member's name

# female faculty names
female_lst = ['Jeannette Bohg', 'Emma Brunskill', 'Chelsea Finn', 'Monica Lam', 'Karen Liu', 'Dorsa Sadigh',
              'Caroline Trippel', 'Jennifer Widom', 'Mary Wootters']
stanford_df = make_df(lst, 'Stanford', female_lst)
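index_values reports the positions of the first and last faculty links (8 and 67 for Stanford), and since a Python slice excludes its end index, the loop reads name_data[8:68]. As a sketch of how that step could be folded into one call, the hypothetical helper slice_between below finds both positions and returns the inclusive range. It works on any list whose items stringify to HTML, so it is shown here with plain strings standing in for BeautifulSoup tags.

```python
def slice_between(items, first_name, last_name):
    """Return the items from the one mentioning first_name through the one
    mentioning last_name, inclusive. Hypothetical helper, not the original code."""
    as_text = [str(item) for item in items]

    def position(name):
        # index of the first item whose text contains the name
        for i, text in enumerate(as_text):
            if name in text:
                return i
        raise ValueError(f'{name!r} not found')

    start, end = position(first_name), position(last_name)
    # Python slice ends are exclusive, so add 1 to keep the last faculty member
    return items[start:end + 1]


links = ['<a>Home</a>', '<a>Maneesh Agrawala</a>', '<a>Jeannette Bohg</a>',
         '<a>Matei Zaharia</a>', '<a>News</a>']  # toy stand-ins for scraped <a> tags
print(slice_between(links, 'Maneesh Agrawala', 'Matei Zaharia'))
# ['<a>Maneesh Agrawala</a>', '<a>Jeannette Bohg</a>', '<a>Matei Zaharia</a>']
```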

After collecting and cleaning the data, I computed the proportion of female faculty members in the CS department of each school. The rest of the data-processing code is linked on my GitHub at the end of the article.
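As a minimal sketch of that step, assuming each school's rows are combined into (school, gender) pairs like those make_df produces, the per-school proportion is simply the share of rows labelled female. With pandas, something like df['Gender'].eq('female').groupby(df['School']).mean() should give the same result; the stdlib version below spells out the counting. The function name female_proportions and the toy numbers are illustrative, not the study's data.

```python
from collections import Counter


def female_proportions(records):
    """Return {school: share of faculty labelled 'female'} from (school, gender) pairs."""
    totals = Counter()
    females = Counter()
    for school, gender in records:
        totals[school] += 1
        if gender == 'female':
            females[school] += 1
    # Counter returns 0 for schools with no female faculty
    return {school: females[school] / count for school, count in totals.items()}


# Toy data for illustration only (not the study's counts)
toy = [('School A', 'female'), ('School A', 'male'), ('School A', 'male'),
       ('School A', 'male'), ('School B', 'female'), ('School B', 'male')]
print(female_proportions(toy))  # {'School A': 0.25, 'School B': 0.5}
```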

  • Why a one-sample t-test: the sample is small (only 7 schools, so n < 30) and the population standard deviation is unknown
  • Sample: the proportion of female faculty members in the CS department of each school (Figure 1)
Figure 1: Proportion of female CS faculty at each university
  • Null hypothesis: μ = 0.5, i.e., the mean proportion of female CS faculty is 50%, since we are testing whether the percentage of female CS faculty equals the percentage of male CS faculty (two-sided alternative: μ ≠ 0.5)
  • Significance level (alpha): 5%, meaning there is a 5% chance of rejecting the null hypothesis when it is actually true
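For reference, the one-sample t-test compares the mean of the per-school proportions with the hypothesized value μ0 = 0.5 using t = (x̄ − μ0) / (s / √n), where s is the sample standard deviation and the statistic has n − 1 degrees of freedom (6 for seven schools). The stdlib sketch below computes that statistic by hand; it should agree with the t-statistic from scipy.stats.ttest_1samp, which additionally looks up the p-value in the t distribution. The function name one_sample_t and the sample values are illustrative only.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test of H0: mean == mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError('need at least two observations')
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Toy proportions for illustration only (not the study's data)
t, df = one_sample_t([0.1, 0.2, 0.3], 0.5)
print(round(t, 3), df)  # -5.196 2
```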
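from scipy.stats import ttest_1samp
tset, pval = ttest_1samp(x, 0.5)  # x = sample: proportion of female CS faculty per school (Figure 1)
print('t-statistic:', tset)
print('pval:', pval)
if pval < 0.05:  # alpha value is 0.05 or 5%
    print('Reject the null hypothesis')
else:
    print('Fail to reject the null hypothesis')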

Our hypothesis test yields a p-value of 2.82e-07. In other words, if the null hypothesis were true, the probability of observing sample data at least as extreme as ours (Figure 1) would be 0.0000282%. Since the p-value is less than the 5% significance level, we reject the null hypothesis. The test suggests that the percentage of women in CS faculty positions is not equal to the percentage of men in these positions.
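The conversion from the reported p-value to a percentage, and the decision rule against alpha, can be checked in a couple of lines. This sketch simply reuses the numbers stated in the text (p = 2.82e-07, alpha = 0.05).

```python
p_value = 2.82e-07  # p-value reported in the text
alpha = 0.05        # 5% significance level

as_percent = f'{p_value * 100:.7f}%'
decision = 'reject H0' if p_value < alpha else 'fail to reject H0'
print(as_percent, decision)  # 0.0000282% reject H0
```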

Why is that? What do the gender demographics of qualified candidates for CS faculty positions look like?