Generates feature engineering ideas and code for any dataset, organized by feature type and expected impact.
Generate feature engineering ideas for a [task type] model predicting [target variable].

**Dataset columns:** [list your columns with types, e.g.: "signup_date (datetime), purchase_amount (float), country (categorical), page_views (int)"]

**For each suggested feature, provide:**
1. **Feature name** and formula/logic
2. **Feature type**: numeric / categorical / binary / interaction / aggregate
3. **Expected impact**: High / Medium / Low (with reasoning)
4. **Code**: Python (pandas/sklearn) to create it
5. **Gotcha**: Any edge case or data leakage risk

**Feature categories to explore:**
- **Temporal**: Day of week, month, hour, recency, time since last event, rolling windows
- **Aggregations**: Group-by statistics (mean, count, std) per categorical variable
- **Interactions**: Products/ratios between numeric columns
- **Encoding**: Target encoding, frequency encoding for high-cardinality categoricals
- **Domain-specific**: Features only an expert in [domain] would think of
- **Lag features**: Previous values for time-series patterns

**Rules:**
- No features that cause data leakage (using future info to predict past)
- Flag features that need careful train/test split handling
- Sort by expected impact (most impactful first)
- Include at least 10 feature ideas
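
For orientation, here is a minimal sketch of the kind of code the prompt might return, covering four of the categories above (temporal, frequency encoding, aggregation, interaction). It assumes the example columns from the prompt (`signup_date`, `purchase_amount`, `country`, `page_views`), and the function name `add_example_features` is hypothetical. Group statistics are fitted on the training split only, which is one way to handle the train/test split concern named in the rules.

```python
import pandas as pd


def add_example_features(train: pd.DataFrame, test: pd.DataFrame):
    """Return copies of train/test with a few illustrative features added.

    Assumes the example columns from the prompt: signup_date (datetime),
    purchase_amount (float), country (categorical), page_views (int).
    """
    train, test = train.copy(), test.copy()

    # Learn group statistics from the training split only, then apply them
    # to both splits, so no information from test rows leaks into features.
    country_freq = train["country"].value_counts(normalize=True)
    country_mean_spend = train.groupby("country")["purchase_amount"].mean()
    global_mean_spend = train["purchase_amount"].mean()

    for df in (train, test):
        # Temporal: calendar parts of the signup timestamp.
        df["signup_dayofweek"] = df["signup_date"].dt.dayofweek  # Monday = 0
        df["signup_month"] = df["signup_date"].dt.month

        # Encoding: frequency encoding; countries unseen in train get 0.
        df["country_freq"] = df["country"].map(country_freq).fillna(0.0)

        # Aggregation: mean spend per country; unseen countries fall back
        # to the overall training mean.
        df["country_mean_spend"] = (
            df["country"].map(country_mean_spend).fillna(global_mean_spend)
        )

        # Interaction: spend per page view; rows with zero page views
        # become NaN instead of dividing by zero.
        views = df["page_views"].where(df["page_views"] > 0)
        df["spend_per_view"] = df["purchase_amount"] / views

    return train, test
```

Lag features and target encoding are left out of this sketch: lags need an entity identifier and an event ordering that the example columns do not provide, and target encoding needs out-of-fold fitting to avoid leaking the target.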