StandardScaler / MinMaxScaler, imputation of missing values, or one-hot encoding of categoricals.
Beyond the default settings, there are several strategies you can use to potentially push performance further. This guide covers feature engineering, feature selection, preprocessing configuration, and common pitfalls to avoid.
Feature Engineering
Feature engineering is one of the most impactful ways to improve TabPFN's performance. The goal is to encode domain knowledge that TabPFN cannot learn from raw columns alone.

Domain-Specific Features
Create features that capture known relationships in your data:

- Ratios: `price / area`, `revenue / headcount`
- Interactions: `weight / height**2` (BMI), `voltage * current` (power)
- Group aggregations: mean, count, or standard deviation of a numeric column grouped by a categorical (e.g., average spend per customer segment)
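As a sketch of how such features might be added with pandas (the column names and values here are invented for illustration):

```python
import pandas as pd

# Toy frame; "price", "area", "segment", and "spend" are illustrative columns.
df = pd.DataFrame({
    "price": [300_000, 450_000, 250_000, 600_000],
    "area": [100, 150, 90, 200],
    "segment": ["A", "B", "A", "B"],
    "spend": [120.0, 80.0, 100.0, 90.0],
})

# Ratio feature: price per unit of area.
df["price_per_sqm"] = df["price"] / df["area"]

# Group aggregation: average spend per segment, broadcast back to each row.
df["segment_mean_spend"] = df.groupby("segment")["spend"].transform("mean")
```

The engineered columns are then passed to TabPFN alongside the raw features.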
Datetime Features
TabPFN cannot interpret raw datetime objects. Extract structured features such as the year, month, day of week, and hour instead. Note that the TabPFN API automatically detects and embeds date features; this manual extraction is primarily needed when using the local package.
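A minimal pandas sketch of this extraction (the `signup` column is an invented example):

```python
import pandas as pd

df = pd.DataFrame({"signup": pd.to_datetime(["2023-01-15 08:30", "2023-06-01 17:45"])})

# Decompose the raw datetime into numeric parts the model can use.
df["signup_year"] = df["signup"].dt.year
df["signup_month"] = df["signup"].dt.month
df["signup_dayofweek"] = df["signup"].dt.dayofweek  # Monday = 0
df["signup_hour"] = df["signup"].dt.hour

df = df.drop(columns=["signup"])  # drop the raw datetime column
```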
Text and String Features
The best approach depends on cardinality and semantic content:

- Low cardinality: feed the column directly to TabPFN, which auto-encodes strings as categoricals
- Medium/high cardinality: use `CountVectorizer` or `TfidfVectorizer` with dimensionality reduction (PCA or TruncatedSVD)
- Semantic content: use the TabPFN API, which automatically handles semantic text encoding
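For the medium/high-cardinality case, a TF-IDF-plus-SVD pipeline with scikit-learn might look like this (the example texts are invented):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline

texts = [
    "late delivery, item damaged",
    "fast shipping, great quality",
    "damaged packaging but quick delivery",
    "excellent quality and fast service",
]

# TF-IDF followed by SVD compresses a high-cardinality text column into a
# few dense numeric columns that can be concatenated with the other features.
text_pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
)
text_features = text_pipeline.fit_transform(texts)  # shape: (4, 2)
```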
Feature Selection
When your dataset has many features (especially beyond 500), feature selection can improve both performance and speed.

Why It Helps
TabPFN uses transformer attention over all features. Irrelevant or noisy features dilute the model's attention budget and can reduce predictive power, especially as feature count grows.

Approaches
- Greedy feature selection: remove features one at a time and check performance, keeping removals that help. This works particularly well on smaller datasets, where the computational cost of refitting is low.
- Mutual information filtering: rank features by mutual information with the target and keep the top k.

Tuning Preprocessing Transforms
TabPFN's internal preprocessing pipeline is one of the most powerful tuning levers. Each estimator in the ensemble cycles through a list of preprocessing configurations, creating diversity.

PREPROCESS_TRANSFORMS
Control how features are transformed before being fed to the transformer.

Configuration Options
| Field | Default | Options |
|---|---|---|
| `name` | (required) | `"quantile_uni"`, `"squashing_scaler_default"`, `"safepower"`, `"quantile_uni_coarse"`, `"kdi"`, `"robust"`, `"none"` |
| `categorical_name` | `"none"` | `"none"`, `"numeric"`, `"onehot"`, `"ordinal"`, `"ordinal_shuffled"`, `"ordinal_very_common_categories_shuffled"` |
| `append_original` | `False` | `True`, `False`, `"auto"` |
| `max_features_per_estimator` | `500` | int; subsamples features above this limit |
| `global_transformer_name` | `None` | `None`, `"svd"`, `"svd_quarter_components"` |
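As a hedged sketch of how these fields fit together, a custom transform configuration could look like the fragment below. The dict form and the commented-out constructor call are illustrative; the exact import paths and parameter names vary between TabPFN versions, so verify them against your installed release:

```python
# Hypothetical configuration fragment -- field names match the table above,
# but check your TabPFN version for the exact way to pass it to the model.
custom_transform = {
    "name": "quantile_uni_coarse",       # per-feature transform
    "categorical_name": "onehot",        # how categorical columns are encoded
    "append_original": True,             # keep untransformed features alongside
    "max_features_per_estimator": 500,   # subsample features above this limit
    "global_transformer_name": "svd",    # optional global dimensionality reduction
}

# clf = TabPFNClassifier(inference_config={"PREPROCESS_TRANSFORMS": [custom_transform]})
```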
Target Transforms (Regression)
For regression tasks, you can control how the target variable `y` is transformed. This is especially useful for skewed targets:
| Transform | When to Use |
|---|---|
| `"none"` | Symmetric, well-behaved targets |
| `"safepower"` | Skewed targets (handles negatives) |
| `"quantile_norm"` | Heavily skewed or multi-modal targets |
| `"quantile_uni"` | Alternative to `quantile_norm` |
| `"1_plus_log"` | Non-negative targets with a large range |
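To see why a transform like `"1_plus_log"` helps for non-negative targets with a large range, here is a small numpy illustration of the underlying math (TabPFN applies its target transforms internally; this only demonstrates the idea):

```python
import numpy as np

# A heavily skewed, non-negative target spanning several orders of magnitude.
y = np.array([1.0, 10.0, 100.0, 1000.0, 10000.0])

# "1_plus_log" maps y -> log(1 + y), compressing the range dramatically.
y_t = np.log1p(y)

# The inverse transform (applied to predictions) recovers the original scale.
y_back = np.expm1(y_t)
```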
Other Inference Settings
- `POLYNOMIAL_FEATURES`: generates interaction features. Can help when interactions matter, but increases feature count quadratically.
- `FINGERPRINT_FEATURE`: adds a hash-based row identifier. Useful by default; try disabling it if you have very few features.
- `OUTLIER_REMOVAL_STD`: removes extreme outliers before fitting. Lower values are more aggressive.
- `SUBSAMPLE_SAMPLES`: subsamples training rows for faster iteration during experimentation.
Tuning Model Parameters
softmax_temperature
Controls prediction sharpness (classification only):

- Lower values (e.g., `0.7`): sharper, more confident predictions; useful when accuracy is already high
- Higher values (e.g., `1.2`): softer, more calibrated predictions; useful when probability calibration matters
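The effect of the temperature can be seen in a small numpy sketch of temperature-scaled softmax (TabPFN applies this internally; the function here is only for illustration):

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Scale logits by 1/temperature before applying the softmax."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, temperature=0.7)
soft = softmax_with_temperature(logits, temperature=1.2)

# Lower temperature concentrates mass on the top class;
# higher temperature spreads it out across classes.
```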
If you use `tuning_config={"calibrate_temperature": True}`, the temperature is tuned automatically and overrides this value.

Metric Tuning
For metrics that are sensitive to decision thresholds (F1, balanced accuracy, precision, recall), use the built-in metric tuning.

Handling Imbalanced Data
- Set `balance_probabilities=True` as a quick heuristic for imbalanced datasets
- For more control, use `eval_metric="balanced_accuracy"` with threshold tuning
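Threshold tuning itself is simple to picture. Here is a hedged numpy sketch of the general idea (not TabPFN's internal implementation) on invented toy data: sweep candidate thresholds over predicted probabilities and keep the one that maximizes balanced accuracy:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls (binary case)."""
    recalls = [(y_pred[y_true == cls] == cls).mean() for cls in (0, 1)]
    return float(np.mean(recalls))

# Imbalanced toy data: 6 negatives, 2 positives, with predicted P(class 1).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
proba = np.array([0.10, 0.20, 0.30, 0.32, 0.38, 0.44, 0.41, 0.62])

# Sweep thresholds and keep the one with the best balanced accuracy.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [balanced_accuracy(y_true, (proba >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
# The default 0.5 cutoff misses one positive here; the tuned threshold
# (around 0.40) recovers both positives at a small cost in negatives.
```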
Escalation Path
When the default TabPFN does not meet your needs, try these approaches in roughly this order:

Feature engineering
Add domain features, extract datetime components, encode text meaningfully. This is usually the highest-impact change.
Hyperparameter optimization
Use the HPO extension for automated search over the TabPFN hyperparameter space.
AutoTabPFN ensembles
Use the AutoTabPFN extension for an automatically tuned ensemble of TabPFN models. This typically yields a gain of a few percentage points.
Fine-tuning
Fine-tune the pretrained model on your data when you have a specialized domain or distribution shift.
Related
Fine-Tuning
Adapt TabPFN’s pretrained weights to your domain.
AutoTabPFN Ensembles
Automated ensembling for maximum accuracy.
Hyperparameter Optimization
Bayesian optimization over TabPFN’s hyperparameter space.