Python Data Analysis Expert
You are a senior data scientist with expertise in Python data analysis. Help me analyze this dataset:
**Dataset Context**:
- Data Source: [DESCRIBE YOUR DATA SOURCE]
- Size: [ROWS x COLUMNS]
- Data Types: [NUMERICAL/CATEGORICAL/DATETIME/etc.]
- Analysis Goal: [WHAT INSIGHTS DO YOU WANT TO FIND?]
- Business Question: [WHAT PROBLEM ARE YOU TRYING TO SOLVE?]
Please provide:
1. **Data Exploration**: EDA with pandas and numpy
2. **Data Cleaning**: Handle missing values, outliers, duplicates
3. **Statistical Analysis**: Descriptive statistics and correlations
4. **Visualizations**: Matplotlib/Seaborn charts with interpretations
5. **Feature Engineering**: Create meaningful derived features
6. **Hypothesis Testing**: Statistical tests for significance
7. **Machine Learning**: Simple models if applicable
8. **Insights & Recommendations**: Actionable business insights
9. **Code Quality**: Clean, documented, reusable functions
10. **Performance**: Optimization for large datasets
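For step 2 (data cleaning) in the list above, here is a minimal sketch of one common policy, assuming exact duplicate rows should be dropped, numeric gaps filled with the column median, other gaps filled with the most frequent value, and outliers flagged (not removed) with the 1.5 * IQR rule. The helper name `clean_data`, the `is_outlier` column, and these defaults are illustrative choices, not a prescribed method; the right policy depends on why values are missing.

```python
import numpy as np
import pandas as pd


def clean_data(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Return a cleaned copy of df (hypothetical helper for illustration).

    - drops exact duplicate rows
    - fills numeric NaNs with the column median
    - fills other NaNs with the column mode (most frequent value)
    - adds a boolean `is_outlier` column using the IQR rule on numeric columns
    """
    out = df.drop_duplicates().copy()

    numeric_cols = out.select_dtypes(include="number").columns
    other_cols = out.columns.difference(numeric_cols)

    # Median is robust to the very outliers flagged below
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    for col in other_cols:
        mode = out[col].mode(dropna=True)
        if not mode.empty:
            out[col] = out[col].fillna(mode.iloc[0])

    # Flag rows where any numeric value lies outside [Q1 - k*IQR, Q3 + k*IQR]
    q1 = out[numeric_cols].quantile(0.25)
    q3 = out[numeric_cols].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    out["is_outlier"] = ((out[numeric_cols] < lower) | (out[numeric_cols] > upper)).any(axis=1)
    return out
```

Flagging rather than dropping keeps the decision about outliers visible and reversible during later analysis.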
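For step 5 (feature engineering), a short sketch of a few widely used derived features: calendar parts from a date column, a ratio of two numeric columns, and a log transform for skewed values. The function name `add_features` and the example columns `order_date`, `revenue`, and `units` are hypothetical; which features are meaningful depends entirely on the business question.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, date_col: str, num_col: str, den_col: str) -> pd.DataFrame:
    """Return a copy of df with a few common derived features (illustrative only)."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])

    # Calendar components often capture seasonality
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["day_of_week"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["is_weekend"] = out["day_of_week"] >= 5

    # Ratio feature; a zero denominator yields NaN instead of inf
    out[f"{num_col}_per_{den_col}"] = out[num_col] / out[den_col].replace(0, np.nan)

    # log1p handles zeros and compresses right-skewed, non-negative values
    out[f"log_{num_col}"] = np.log1p(out[num_col])
    return out


orders = pd.DataFrame({
    "order_date": ["2024-01-06", "2024-01-08"],  # hypothetical sample rows
    "revenue": [100.0, 0.0],
    "units": [4, 0],
})
features = add_features(orders, "order_date", "revenue", "units")
```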
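For step 6 (hypothesis testing), `scipy.stats` provides parametric tests such as `ttest_ind`. As a dependency-light alternative, here is a sketch of a two-sided permutation test for a difference in group means using only NumPy. It assumes the two samples are independent and exchangeable under the null hypothesis; the function name and defaults are illustrative.

```python
import numpy as np


def permutation_test(a, b, n_permutations: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()

    pooled = np.concatenate([a, b])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled sample
        perm = rng.permutation(pooled)
        diff = perm[: a.size].mean() - perm[a.size :].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add-one correction keeps the p-value strictly positive
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

The p-value estimates how often a difference at least as extreme as the observed one arises when group labels carry no information; a permutation test avoids the normality assumption of the t-test at the cost of more computation.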
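For step 10 (performance), one inexpensive optimization for large DataFrames is shrinking column dtypes. This sketch downcasts integers and floats with `pd.to_numeric(..., downcast=...)` and converts low-cardinality string columns to `category`. The helper name and the `max_category_ratio` threshold are assumptions for illustration; note that float32 keeps fewer significant digits than float64, so check that the loss of precision is acceptable.

```python
import pandas as pd


def optimize_dtypes(df: pd.DataFrame, max_category_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (illustrative heuristic).

    - integer columns are downcast to the smallest integer type that fits
    - float columns are downcast to float32 where values are preserved
    - string columns whose unique/total ratio is below max_category_ratio
      become 'category'
    """
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            out[col] = pd.to_numeric(s, downcast="float")
        elif (pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)) \
                and len(s) and s.nunique() / len(s) < max_category_ratio:
            # Repeated labels are stored once; rows hold small integer codes
            out[col] = s.astype("category")
    return out
```

Comparing `df.memory_usage(deep=True).sum()` before and after the call shows whether the conversion pays off for a particular dataset.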
Comprehensive data analysis workflow using Python's data science stack, from exploration to actionable insights.
Sample
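```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Data exploration
def explore_data(df):
    """Print an overview of df and plot a heatmap of numeric-column correlations."""
    print(f"Dataset shape: {df.shape}")
    print(f"\nData types:\n{df.dtypes}")
    print(f"\nMissing values:\n{df.isnull().sum()}")

    # Statistical summary (numeric columns only by default)
    print(f"\nDescriptive statistics:\n{df.describe()}")

    # Correlation matrix: numeric_only=True is needed because pandas >= 2.0
    # raises an error on non-numeric columns instead of silently skipping them
    plt.figure(figsize=(12, 8))
    sns.heatmap(df.corr(numeric_only=True), annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
    plt.title('Feature Correlation Matrix')
    plt.tight_layout()
    plt.show()
```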
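With many columns, an annotated heatmap like the one in `explore_data` becomes hard to read. A small companion sketch, assuming Pearson correlation on numeric columns, lists the strongest pairs instead. It reads each pair once from the upper triangle of the matrix (`np.triu_indices` with `k=1` skips the diagonal). The name `top_correlations` is hypothetical.

```python
import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most strongly correlated column pairs by absolute Pearson r."""
    corr = df.corr(numeric_only=True)
    cols = corr.columns

    # Upper-triangle indices (k=1 excludes self-correlations on the diagonal)
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.DataFrame({
        "feature_1": cols[i],
        "feature_2": cols[j],
        "r": corr.to_numpy()[i, j],
    }).dropna(subset=["r"])

    # Rank by strength regardless of sign
    pairs = pairs.sort_values("r", key=lambda s: s.abs(), ascending=False)
    return pairs.head(n).reset_index(drop=True)
```

Strong pairs are candidates for a closer look (redundant features, leakage, or genuine relationships); correlation alone does not establish which.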