Bivariate analysis examines the relationship between two variables to understand how they interact or correlate. Applied to the Adult Income dataset, it can also surface potential biases. The following bivariate analyses can help detect biases related to demographic factors, employment, and income disparities:
Comparing income across genders can reveal gender-based income disparities, a common form of bias in income data.
import matplotlib.pyplot as plt
import seaborn as sns
# income is coded 0/1, so the group mean times 100 is the percentage earning >50K
sns.barplot(x='gender', y='income', data=df, estimator=lambda x: x.mean() * 100)
plt.title('Percentage of Individuals Earning >50K by Gender')
plt.ylabel('Percentage of Individuals Earning >50K')
In this example, income would be a binary variable (e.g., 0 for <=50K and 1 for >50K), and the plot shows the percentage of each gender with an income of more than 50K.
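The binary encoding and the per-gender percentage described above can also be checked numerically before plotting. The following is a minimal sketch, assuming the raw income column holds the string labels `<=50K` and `>50K`; the toy rows are hypothetical stand-ins, not records from the dataset.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Adult Income data (not real records).
df = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Male",
               "Female", "Female", "Female", "Female"],
    "income": [">50K", "<=50K", ">50K", "<=50K",
               "<=50K", "<=50K", ">50K", "<=50K"],
})

# Encode income as 0/1. Stripping whitespace and a trailing period is a
# defensive assumption, since label formatting can differ between copies
# of the dataset.
labels = df["income"].str.strip().str.rstrip(".")
df["income"] = (labels == ">50K").astype(int)

# The mean of a 0/1 column is the share of ones, so this gives the
# percentage of each gender earning >50K.
pct_high_income = df.groupby("gender")["income"].mean() * 100
print(pct_high_income)
```

Printing the table alongside the bar plot makes it easier to confirm that the bar heights are rates within each group rather than group sizes.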
Exploring the relationship between education level and income can highlight educational biases in earning potential.
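# income is binary (0/1), so a box plot of it collapses onto 0 and 1; plot the share earning >50K per level instead
sns.barplot(x='education-num', y='income', data=df)
plt.title('Share of Individuals Earning >50K by Education Level')
plt.xlabel('Education Level (Numerical)')
plt.ylabel('Share of Individuals Earning >50K')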
This plot can help identify if higher education levels correlate with higher income, and if there are significant disparities across different education levels.
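To put a number on the relationship between education and income, one option is to compute the share earning >50K at each education level together with a simple correlation. This is a sketch on hypothetical toy values, assuming `income` is already coded 0/1. The Pearson correlation between a numeric column and a 0/1 column is the point-biserial correlation.

```python
import pandas as pd

# Hypothetical toy values (not real records); income is assumed to be 0/1.
df = pd.DataFrame({
    "education-num": [9, 9, 10, 10, 13, 13, 14, 14],
    "income":        [0, 0,  0,  1,  0,  1,  1,  1],
})

# Share of individuals earning >50K at each education level.
share_by_edu = df.groupby("education-num")["income"].mean()
print(share_by_edu)

# Pearson correlation between education level and the 0/1 income flag.
corr = df["education-num"].corr(df["income"])
print(f"correlation: {corr:.2f}")
```

A positive correlation here only describes an association. It does not show that education alone drives the difference.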
Comparing age distributions across racial groups can reveal whether certain age groups are over- or under-represented within specific racial categories.
sns.violinplot(x='race', y='age', data=df)
plt.title('Age Distribution by Race')
A violin plot is used here to compare the age distribution within each racial group, which can indicate biases in age representation across races.
Comparing hours worked per week across genders can show whether there is a systematic difference between them, potentially indicating unequal work distribution or opportunities.
sns.barplot(x='gender', y='hours-per-week', data=df)
plt.title('Average Hours Worked Per Week by Gender')
plt.ylabel('Average Hours Worked Per Week')
This bar plot will show if one gender tends to work more hours on average than the other.
Analyzing the distribution of genders across different occupations can reveal occupational segregation or gender biases in certain job roles.
plt.figure(figsize=(10, 8))
sns.countplot(y='occupation', hue='gender', data=df)
plt.title('Occupation Distribution by Gender')
plt.legend(title='Gender')
This plot can highlight occupations that are male- or female-dominated, indicating potential gender biases in employment.
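Raw counts can be hard to compare when occupations differ widely in size, so a row-normalized cross-tabulation is one way to read off the gender split within each occupation. The sketch below uses `pd.crosstab` with `normalize="index"` on hypothetical toy rows; the occupation names are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows (not real records).
df = pd.DataFrame({
    "occupation": ["Sales", "Sales", "Sales", "Sales",
                   "Craft-repair", "Craft-repair",
                   "Adm-clerical", "Adm-clerical"],
    "gender": ["Male", "Female", "Male", "Female",
               "Male", "Male",
               "Female", "Female"],
})

# normalize="index" divides each row by its total, so every row sums to 1
# and shows the gender split within that occupation.
gender_share = pd.crosstab(df["occupation"], df["gender"], normalize="index")
print(gender_share)
```

Sorting the resulting table by one of the gender columns is a quick way to list the most skewed occupations first.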
Investigating the relationship between workclass categories and income levels can uncover biases related to employment type and income.
sns.countplot(x='workclass', hue='income', data=df)
plt.title('Income Distribution Across Workclasses')
plt.legend(title='Income', labels=['<=50K', '>50K'])
This analysis can reveal if certain workclasses are more likely to have higher incomes, indicating potential biases in earnings across different types of employment.
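Because a count plot is dominated by the largest category, it can help to tabulate the group size next to the share earning >50K for each workclass. The following is a minimal sketch on hypothetical toy rows, assuming `income` is coded 0/1; the column names `n` and `share_high_income` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical toy rows (not real records); income is assumed to be 0/1.
df = pd.DataFrame({
    "workclass": ["Private"] * 6 + ["Self-emp-inc"] * 2,
    "income":    [0, 0, 0, 0, 1, 1, 1, 1],
})

# Group size and share earning >50K, side by side. Shares computed from
# very small groups are noisy and should be read with caution.
summary = df.groupby("workclass")["income"].agg(
    n="size",
    share_high_income="mean",
)
print(summary)
```

Keeping `n` in the table makes it harder to over-interpret a high or low share that rests on only a handful of rows.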
In each of these analyses, it’s essential to consider the context and the confounding factors that might influence the relationship. For instance, disparities in income by gender could be influenced by occupation, education, or hours worked. Similarly, educational biases in income might be affected by age, experience, or industry. A bivariate pattern is therefore a starting point rather than proof of bias, and further multivariate analysis is often needed to understand the underlying causes.
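One simple step toward a multivariate view is to stratify a bivariate comparison by a suspected confounder. The sketch below compares the share earning >50K by gender overall and then within bands of hours worked. The toy rows are hypothetical and deliberately constructed so that the overall gap disappears within each band; this illustrates the mechanics only and says nothing about the real dataset. The 40-hour cut point and the `hours_band` column are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy rows, constructed for illustration (not real records).
df = pd.DataFrame({
    "gender": ["Male", "Male", "Male", "Male",
               "Female", "Female", "Female", "Female"],
    "hours-per-week": [45, 50, 40, 30, 45, 38, 20, 35],
    "income": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Unstratified comparison: share earning >50K per gender.
overall = df.groupby("gender")["income"].mean()
print(overall)

# Stratify by hours worked; bins are (0, 40] and (40, 100].
df["hours_band"] = pd.cut(
    df["hours-per-week"], bins=[0, 40, 100], labels=["<=40h", ">40h"]
)

# Share earning >50K per gender within each hours band.
by_band = (
    df.groupby(["hours_band", "gender"], observed=True)["income"]
    .mean()
    .unstack("gender")
)
print(by_band)
```

If a gap persists within every stratum, the stratifying variable does not explain it on its own. If the gap shrinks or vanishes, that variable is a candidate confounder worth modeling explicitly.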