Goal

Diversity, unconscious bias in the workplace and, more generally, the way companies treat their employees are important topics.

Data science can help uncover potential discrimination by checking the data for segments of employees that are treated worse than others.

Challenge Description

There has been a lot of discussion about diversity in the workplace, especially in technology. The Head of HR at your company is concerned about this and has asked you to analyze internal employee data to see whether the results suggest that the company treats all its employees fairly.

Specifically, she gave you the following tasks:

  • In the company there are 6 levels. Identify the corresponding level for each employee.
  • How many people does each employee manage? Indirect reports count as well: if John directly manages 2 people and each of them manages 5 people, then John manages 12 people.
  • Build a model to predict the salary of each employee.
  • Describe the main factors impacting employee salaries. Do you think the company has been treating all its employees fairly? What are the next steps you would suggest to the Head of HR?
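
The first two tasks can be sketched as a single walk over the reporting tree. This is a minimal sketch, not the notebook's actual code. It assumes that the top employee has no boss (represented here as `None`) and that an employee's level corresponds to their depth below the top. The function name `hierarchy_stats` and the toy ids are hypothetical.

```python
from collections import defaultdict

def hierarchy_stats(pairs):
    """pairs: iterable of (employee_id, boss_id); boss_id is None at the top.

    Returns two dicts keyed by employee_id: depth below the top (0 = top)
    and the total number of direct plus indirect reports.
    """
    boss_of = dict(pairs)
    children = defaultdict(list)
    for emp, boss in boss_of.items():
        if boss is not None:
            children[boss].append(emp)

    # Iterative pre-order walk from the top: every boss is visited
    # before any of their reports.
    depth, order = {}, []
    stack = [(emp, 0) for emp, boss in boss_of.items() if boss is None]
    while stack:
        emp, d = stack.pop()
        depth[emp] = d
        order.append(emp)
        stack.extend((child, d + 1) for child in children[emp])

    # Walk the same order backwards so that each employee's count is final
    # before it is added to the boss's count.
    n_reports = dict.fromkeys(order, 0)
    for emp in reversed(order):
        boss = boss_of[emp]
        if boss is not None:
            n_reports[boss] += n_reports[emp] + 1
    return depth, n_reports

# Toy data mirroring the task's example (ids are made up):
# John (id 1) has 2 direct reports, and each of them has 5 reports.
pairs = [(1, None), (2, 1), (3, 1)]
pairs += [(e, 2) for e in range(4, 9)] + [(e, 3) for e in range(9, 14)]
depth, n_reports = hierarchy_stats(pairs)
print(n_reports[1], depth[1], depth[4])  # 12 0 2
```

With a DataFrame, the pairs could be built from the `employee_id` and `boss_id` columns after mapping a missing `boss_id` to `None`.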

PS: You can assume the data for this challenge is clean (e.g. no typos, no mismatches when performing joins).

Data

   employee_id  boss_id   dept
0        46456   175361  sales
1       104708    29733     HR
2       120853    41991  sales
3       142630   171266     HR
4        72711   198240  sales

   employee_id  signing_bonus    salary  degree_level  sex  yrs_experience
0       138719              0  273000.0        Master    M               2
1         3192              0  301000.0      Bachelor    F               1
2       114657              0  261000.0        Master    F               2
3        29039              0   86000.0   High_School    F               4
4       118607              0  126000.0      Bachelor    F               3
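
Both tables share the `employee_id` column, so they can be combined with a pandas merge. The sketch below uses made-up toy rows (the datasets themselves are not provided), and the variable names `hierarchy`, `employees` and `merged` are hypothetical. Passing `validate="one_to_one"` is one way to check the assumption that each employee appears exactly once in each table.

```python
import pandas as pd

# Toy rows for illustration only; the values are made up.
hierarchy = pd.DataFrame({
    "employee_id": [1, 2, 3],
    "boss_id": [float("nan"), 1, 1],  # the top employee has no boss
    "dept": ["HR", "sales", "HR"],
})
employees = pd.DataFrame({
    "employee_id": [3, 1, 2],
    "signing_bonus": [0, 1, 0],
    "salary": [86000.0, 301000.0, 126000.0],
    "degree_level": ["High_School", "Master", "Bachelor"],
    "sex": ["F", "M", "F"],
    "yrs_experience": [4, 10, 3],
})

# validate="one_to_one" raises a MergeError if employee_id is duplicated
# in either table, which guards against silently multiplied rows.
merged = hierarchy.merge(
    employees, on="employee_id", how="inner", validate="one_to_one"
)
print(merged.shape)  # (3, 8)
```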

Skills Covered

  • Data Wrangling (Pandas)
  • Data Visualization (Seaborn, Matplotlib, etc.)
  • Machine Learning (Sklearn, RandomForest, etc.)
  • Insight Extraction (Feature Importance, Partial Dependence Plots)

Interesting Findings

Footnote

This project is my solution to one of the Data Science Challenges by Giulio Palombo. The datasets are not provided here.

URL to the notebook