Python Data Analysis Expert

Master pandas, numpy, and data visualization

You are a senior data scientist with expertise in Python data analysis. Help me analyze this dataset:

**Dataset Context**:
- Data Source: [DESCRIBE YOUR DATA SOURCE]
- Size: [ROWS x COLUMNS]
- Data Types: [NUMERICAL/CATEGORICAL/DATETIME/etc.]
- Analysis Goal: [WHAT INSIGHTS DO YOU WANT TO FIND?]
- Business Question: [WHAT PROBLEM ARE YOU TRYING TO SOLVE?]

Please provide:
1. **Data Exploration**: EDA with pandas and numpy
2. **Data Cleaning**: Handle missing values, outliers, duplicates
3. **Statistical Analysis**: Descriptive statistics and correlations
4. **Visualizations**: Matplotlib/Seaborn charts with interpretations
5. **Feature Engineering**: Create meaningful derived features
6. **Hypothesis Testing**: Statistical tests for significance
7. **Machine Learning**: Simple models if applicable
8. **Insights & Recommendations**: Actionable business insights
9. **Code Quality**: Clean, documented, reusable functions
10. **Performance**: Optimization for large datasets

Comprehensive data analysis workflow using Python's data science stack, from exploration to actionable insights.

Sample

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

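# Data exploration
def explore_data(df):
    """Print a structural overview of df and plot numeric correlations."""
    print(f"Dataset shape: {df.shape}")
    print(f"\nData types:\n{df.dtypes}")
    print(f"\nMissing values:\n{df.isnull().sum()}")

    # Statistical summary (numeric columns only by default)
    print(f"\nDescriptive statistics:\n{df.describe()}")

    # Correlation matrix: restrict to numeric columns, since df.corr()
    # raises on string/categorical columns in pandas 2.x
    numeric_df = df.select_dtypes(include="number")
    plt.figure(figsize=(12, 8))
    sns.heatmap(numeric_df.corr(), annot=True, cmap='coolwarm')
    plt.title('Feature Correlation Matrix')
    plt.show()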
```
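
The data cleaning step from the list above (missing values, outliers, duplicates) could be sketched in the same style. This is only one possible approach, not a prescribed method: the function name `clean_data`, the `iqr_factor` parameter, median imputation, and clipping at the IQR fences are all assumptions chosen for illustration, and the right strategy depends on the dataset.

```python
import numpy as np
import pandas as pd


# Data cleaning (illustrative sketch; strategy choices are assumptions)
def clean_data(df, iqr_factor=1.5):
    """Return a cleaned copy of df; the input frame is not modified."""
    # Drop exact duplicate rows
    cleaned = df.drop_duplicates().copy()

    numeric_cols = cleaned.select_dtypes(include=np.number).columns
    for col in numeric_cols:
        # Fill missing numeric values with the column median
        cleaned[col] = cleaned[col].fillna(cleaned[col].median())

        # Clip values that fall outside the IQR fences
        q1, q3 = cleaned[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        cleaned[col] = cleaned[col].clip(q1 - iqr_factor * iqr,
                                         q3 + iqr_factor * iqr)

    return cleaned.reset_index(drop=True)
```

Non-numeric columns are left untouched here. Clipping keeps every row but compresses extreme values; dropping outlier rows instead is an equally reasonable choice when the extremes are likely data errors.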

tags: python, data-science, pandas, analysis
