Step-by-step EDA workflow that produces a comprehensive data profile with distributions, correlations, and anomalies.
Perform a systematic exploratory data analysis on [dataset description]. Follow this exact workflow:

**Step 1 — Shape & Schema:**
- How many rows and columns?
- Data types per column (numeric, categorical, datetime, text)
- Memory usage estimate

**Step 2 — Missing Data:**
- Missing value count and percentage per column
- Pattern analysis: are missing values random or systematic?
- Recommended handling strategy per column (drop/impute/flag)

**Step 3 — Distributions:**
- For numeric columns: mean, median, std, min, max, skewness
- For categorical columns: unique count, top 5 values with frequencies
- Flag any columns with >95% single value (near-zero variance)

**Step 4 — Correlations & Relationships:**
- Top 10 strongest correlations (positive and negative)
- Flag multicollinearity (|r| > 0.8)
- Categorical vs numeric: group-by means for key categories

**Step 5 — Anomalies & Outliers:**
- IQR-based outlier detection for numeric columns
- Impossible values (negative ages, future dates, etc.)
- Duplicate row analysis

**Step 6 — Summary & Recommendations:**
- Top 3 most interesting patterns discovered
- Data quality score (1-10) with justification
- Suggested next steps for deeper analysis

**Output format**: Structured report with code snippets in [Python/R/SQL].
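The six-step checklist above can be sketched in pandas. This is a minimal illustration, not a definitive implementation: it assumes a generic DataFrame `df`, uses the 1.5×IQR rule from Step 5, and leaves Step 6 (interpretation and recommendations) to the analyst. The function name `eda_profile` and the report keys are illustrative choices, not part of the prompt.

```python
import numpy as np
import pandas as pd


def eda_profile(df: pd.DataFrame) -> dict:
    """Minimal sketch of Steps 1-5 of the EDA workflow."""
    report = {}

    # Step 1 — Shape & Schema
    report["shape"] = df.shape
    report["dtypes"] = df.dtypes.astype(str).to_dict()
    report["memory_mb"] = df.memory_usage(deep=True).sum() / 1e6

    # Step 2 — Missing Data: count and percentage per column
    miss = df.isna().sum()
    report["missing"] = {
        c: (int(n), round(100 * n / len(df), 2)) for c, n in miss.items() if n > 0
    }

    # Step 3 — Distributions
    num = df.select_dtypes(include="number")
    report["numeric_summary"] = num.agg(
        ["mean", "median", "std", "min", "max", "skew"]
    ).to_dict()
    cat = df.select_dtypes(include=["object", "category"])
    report["top_categories"] = {c: cat[c].value_counts().head(5).to_dict() for c in cat}
    # Near-zero variance: >95% of rows share one value
    report["near_zero_variance"] = [
        c for c in df.columns
        if df[c].value_counts(normalize=True, dropna=False).iloc[0] > 0.95
    ]

    # Step 4 — Correlations: rank unique pairs by |r|, flag |r| > 0.8
    corr = num.corr()
    pairs = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1)).stack()
    ranked = pairs.reindex(pairs.abs().sort_values(ascending=False).index)
    report["top_correlations"] = ranked.head(10).to_dict()
    report["multicollinear"] = [p for p, r in pairs.items() if abs(r) > 0.8]

    # Step 5 — Outliers via 1.5*IQR fences, plus duplicate rows
    q1, q3 = num.quantile(0.25), num.quantile(0.75)
    iqr = q3 - q1
    mask = (num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)
    report["outlier_counts"] = mask.sum().to_dict()
    report["duplicate_rows"] = int(df.duplicated().sum())

    return report
```

Domain-specific checks such as impossible values (negative ages, future dates) depend on the dataset's schema and are best added as explicit column-by-column rules.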