Feature Engineering — Transform Raw Data for ML

Generates feature engineering ideas and code for any dataset, organized by feature type and expected impact.

by Promptsy Team

Machine Learning #feature-engineering #machine-learning #system-prompt

Prompt

Generate feature engineering ideas for a [task type] model predicting [target variable].

**Dataset columns:**
[list your columns with types, e.g.: "signup_date (datetime), purchase_amount (float), country (categorical), page_views (int)"]

**For each suggested feature, provide:**
1. **Feature name** and formula/logic
2. **Feature type**: numeric / categorical / binary / interaction / aggregate
3. **Expected impact**: High / Medium / Low (with reasoning)
4. **Code**: Python (pandas/sklearn) to create it
5. **Gotcha**: Any edge case or data leakage risk

**Feature categories to explore:**
- **Temporal**: Day of week, month, hour, recency, time since last event, rolling windows
- **Aggregations**: Group-by statistics (mean, count, std) per categorical variable
- **Interactions**: Products/ratios between numeric columns
- **Encoding**: Target encoding, frequency encoding for high-cardinality categoricals
- **Domain-specific**: Features only an expert in [domain] would think of
- **Lag features**: Previous values for time-series patterns
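As a rough illustration of several of these categories, the sketch below builds temporal, interaction, frequency-encoding, and lag features with pandas. It assumes the example columns listed under "Dataset columns"; the toy values and the helper name `add_basic_features` are hypothetical, and a real project would likely fit the frequency encoding on the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using the example columns from the prompt.
df = pd.DataFrame({
    "signup_date": pd.to_datetime(
        ["2024-01-01", "2024-01-06", "2024-01-07", "2024-02-10"]
    ),
    "purchase_amount": [20.0, 35.0, 0.0, 50.0],
    "country": ["US", "US", "DE", "DE"],
    "page_views": [4, 7, 0, 10],
})


def add_basic_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a few illustrative engineered features."""
    out = frame.sort_values("signup_date").reset_index(drop=True)

    # Temporal: day of week (Monday=0), month, weekend flag.
    out["signup_dow"] = out["signup_date"].dt.dayofweek
    out["signup_month"] = out["signup_date"].dt.month
    out["is_weekend"] = (out["signup_dow"] >= 5).astype(int)

    # Interaction: ratio of two numeric columns; zero views become NaN
    # so the division never produces inf.
    views = out["page_views"].replace(0, np.nan)
    out["amount_per_view"] = out["purchase_amount"] / views

    # Encoding: frequency encoding of a categorical column.
    # Computed on the full frame here for brevity only.
    freq = out["country"].value_counts(normalize=True)
    out["country_freq"] = out["country"].map(freq)

    # Lag: previous purchase amount within the same country,
    # relying on the date sort above so only past rows are used.
    out["prev_amount_in_country"] = (
        out.groupby("country")["purchase_amount"].shift(1)
    )
    return out


features = add_basic_features(df)
print(features)
```

The first row in each country group has no previous value, so its lag feature is NaN; how to fill it (zero, a sentinel, or leaving it missing) depends on the model.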

**Rules:**
- No features that cause data leakage (using information that would not be available at prediction time, such as future values or anything derived from the target)
- Flag features that need careful train/test split handling
- Sort by expected impact (most impactful first)
- Include at least 10 feature ideas
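To make the leakage and train/test rules concrete, here is a minimal sketch of a smoothed target encoding that is fit on the training split only and then applied to unseen data. The function names, the `converted` target column, and the `smoothing` value are assumptions for illustration, not part of the prompt. Encoding the training rows themselves with this mapping would still leak the target, so an out-of-fold scheme is usually preferred there.

```python
import pandas as pd


def fit_target_encoding(train, col, target, smoothing=10.0):
    """Learn a smoothed per-category target mean from training data only."""
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    # Shrink each category mean toward the global mean; rare categories
    # are pulled harder toward the prior.
    smoothed = (
        stats["mean"] * stats["count"] + global_mean * smoothing
    ) / (stats["count"] + smoothing)
    return smoothed.to_dict(), global_mean


def apply_target_encoding(frame, col, mapping, default):
    """Map categories to learned values; unseen categories get the prior."""
    return frame[col].map(mapping).fillna(default)


# Hypothetical toy split.
train = pd.DataFrame({
    "country": ["US", "US", "DE", "DE"],
    "converted": [1, 0, 1, 1],
})
test = pd.DataFrame({"country": ["US", "FR"]})

mapping, prior = fit_target_encoding(
    train, "country", "converted", smoothing=2.0
)
test["country_te"] = apply_target_encoding(test, "country", mapping, prior)
print(test)
```

Because the mapping is learned before the test frame is touched, the test targets never influence the encoded values, and the unseen category `FR` falls back to the training prior.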

Compatible models

Claude 4 Opus, GPT-4o, DeepSeek V3
