TabPFN works well out of the box and natively handles many things that traditional ML pipelines require manual steps for. We recommend feeding in the data as raw as possible, since additional processing can hurt performance: avoid extra scaling with StandardScaler / MinMaxScaler, imputing missing values, or one-hot encoding categoricals. Beyond the default settings, there are several strategies you can use to potentially push performance further. This guide covers feature engineering, feature selection, preprocessing configuration, and common pitfalls to avoid.

Feature Engineering

Feature engineering is one of the most impactful ways to improve TabPFN’s performance. The goal is to encode domain knowledge that TabPFN cannot learn from raw columns alone.

Domain-Specific Features

Create features that capture known relationships in your data:
  • Ratios: price / area, revenue / headcount
  • Interactions: weight / height**2 (BMI), voltage * current (power)
  • Group aggregations: mean, count, or standard deviation of a numeric column grouped by a categorical (e.g., average spend per customer segment)
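As a sketch of the patterns above (the column names and values are hypothetical), ratios and group aggregations are straightforward to add with pandas:

```python
import pandas as pd

# Hypothetical housing-style data; replace with your own columns
df = pd.DataFrame({
    "price": [250_000, 180_000, 420_000, 310_000],
    "area": [100.0, 75.0, 160.0, 120.0],
    "segment": ["A", "B", "A", "B"],
})

# Ratio feature: price per unit area
df["price_per_area"] = df["price"] / df["area"]

# Group aggregation: mean price within each categorical segment,
# broadcast back onto every row via transform
df["segment_mean_price"] = df.groupby("segment")["price"].transform("mean")
```

The resulting numeric columns can be passed to TabPFN alongside the raw ones.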

Datetime Features

TabPFN cannot interpret raw datetime objects. Extract structured features instead:
import numpy as np

df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["dayofweek"] = df["date"].dt.dayofweek
df["hour"] = df["date"].dt.hour

# Cyclical encoding for periodic features
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)
For datasets with a time dimension, also consider adding a running index feature (sequential 0, 1, 2, …) to help TabPFN detect trends.
The TabPFN API automatically detects and embeds date features. This manual extraction is primarily needed when using the local package.

Text and String Features

The best approach depends on cardinality and semantic content:
  • Low cardinality: Feed directly to TabPFN, which auto-encodes strings as categoricals
  • Medium/High cardinality: Use CountVectorizer or TfidfVectorizer with dimensionality reduction (PCA or TruncatedSVD)
  • Semantic content: Use the TabPFN API, which automatically handles semantic text encoding.
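For the medium/high-cardinality case, here is a minimal sketch combining TfidfVectorizer with TruncatedSVD (the example strings and component count are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

texts = [
    "red cotton shirt", "blue cotton shirt",
    "red leather shoe", "blue leather shoe",
]

# Sparse TF-IDF matrix over word unigrams
tfidf = TfidfVectorizer()
X_sparse = tfidf.fit_transform(texts)

# Compress to a few dense components TabPFN can consume as numeric features
svd = TruncatedSVD(n_components=2, random_state=0)
X_text = svd.fit_transform(X_sparse)  # shape: (n_samples, 2)
```

TruncatedSVD is preferred over PCA here because it operates directly on the sparse TF-IDF matrix.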

Feature Selection

When your dataset has many features (especially beyond 500), feature selection can improve both performance and speed.

Why It Helps

TabPFN uses transformer attention over all features. Irrelevant or noisy features dilute the model’s attention budget and can reduce predictive power, especially as feature count grows.

Approaches

Greedy feature selection — remove features one at a time and check performance, keeping a feature out only if the score does not drop. This works particularly well on smaller datasets, where the computational cost is low.

Mutual information filtering — rank features by mutual information with the target and keep the top k:
from sklearn.feature_selection import mutual_info_classif, SelectKBest

selector = SelectKBest(mutual_info_classif, k=50)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)
PCA / TruncatedSVD — reduce dimensionality while retaining variance:
from sklearn.decomposition import PCA

pca = PCA(n_components=50)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)
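The greedy approach can be sketched as a backward-elimination loop. This is illustrative, not a TabPFN API: `backward_select` is a hypothetical helper, and LogisticRegression stands in for TabPFN so the loop runs quickly; substitute a TabPFNClassifier for real use.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

def backward_select(X, y, model, min_features=5):
    """Repeatedly drop the feature whose removal hurts the CV score
    the least, stopping as soon as the score would decrease."""
    kept = list(range(X.shape[1]))
    best = cross_val_score(model, X[:, kept], y, cv=3).mean()
    while len(kept) > min_features:
        scores = {
            f: cross_val_score(model, X[:, [k for k in kept if k != f]],
                               y, cv=3).mean()
            for f in kept
        }
        f_drop, score = max(scores.items(), key=lambda kv: kv[1])
        if score < best:
            break
        kept.remove(f_drop)
        best = score
    return kept

kept = backward_select(X, y, LogisticRegression(max_iter=1000))
```

Because every candidate removal triggers a full cross-validation, this is only practical when fits are cheap — which is why the guide recommends it for smaller datasets.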

Tuning Preprocessing Transforms

TabPFN’s internal preprocessing pipeline is one of the most powerful tuning levers. Each estimator in the ensemble cycles through a list of preprocessing configurations, creating diversity.

PREPROCESS_TRANSFORMS

Control how features are transformed before being fed to the transformer.

Configuration Options

  • name (required): "quantile_uni", "squashing_scaler_default", "safepower", "quantile_uni_coarse", "kdi", "robust", "none"
  • categorical_name (default "none"): "none", "numeric", "onehot", "ordinal", "ordinal_shuffled", "ordinal_very_common_categories_shuffled"
  • append_original (default False): True, False, "auto"
  • max_features_per_estimator (default 500): int; subsamples features if above this limit
  • global_transformer_name (default None): None, "svd", "svd_quarter_components"
For optimal diversity, use as many different preprocessing transforms as you have estimators (default 8). Each estimator cycles through the list.
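To illustrate matching the list length to the estimator count, here is a sketch of eight configurations built from the fields above. The specific combinations are arbitrary examples, and whether your tabpfn version expects plain dicts or dedicated config objects may differ — check the package documentation.

```python
# Eight preprocessing configurations, one per estimator (default n_estimators=8).
# Field names follow the table above; the combinations are illustrative.
preprocess_transforms = [
    {"name": "quantile_uni", "categorical_name": "none", "append_original": False},
    {"name": "quantile_uni_coarse", "categorical_name": "ordinal", "append_original": True},
    {"name": "safepower", "categorical_name": "onehot", "append_original": False},
    {"name": "robust", "categorical_name": "numeric", "append_original": "auto"},
    {"name": "kdi", "categorical_name": "ordinal_shuffled", "append_original": False},
    {"name": "squashing_scaler_default", "categorical_name": "none", "append_original": True},
    {"name": "quantile_uni",
     "categorical_name": "ordinal_very_common_categories_shuffled",
     "global_transformer_name": "svd"},
    {"name": "none", "categorical_name": "onehot", "append_original": False},
]
```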

Target Transforms (Regression)

For regression tasks, you can control how the target variable y is transformed. This is especially useful for skewed targets:
from tabpfn import TabPFNRegressor

model = TabPFNRegressor(
    inference_config={
        "REGRESSION_Y_PREPROCESS_TRANSFORMS": (
            "none",
            "safepower",
            "quantile_norm",
            "quantile_uni",
            "1_plus_log"
        ),
    },
)
  • "none": Symmetric, well-behaved targets
  • "safepower": Skewed targets (handles negatives)
  • "quantile_norm": Heavily skewed or multi-modal targets
  • "quantile_uni": Alternative to quantile_norm
  • "1_plus_log": Non-negative targets with large range
Adding more transforms to the tuple increases ensemble diversity, which helps when the target distribution is non-trivial.
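A quick way to judge whether a log-style target transform is worth including is to check the skewness of y. This is a generic heuristic, not part of TabPFN; the `skew` helper below is a plain moment-based implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed, non-negative

def skew(a):
    """Sample skewness: third standardized moment."""
    a = np.asarray(a, dtype=float)
    return float(np.mean(((a - a.mean()) / a.std()) ** 3))

s_raw = skew(y)           # strongly positive: consider "1_plus_log" or quantile transforms
s_log = skew(np.log1p(y)) # much smaller after the transform
```

A large positive skew suggests adding "1_plus_log" (for non-negative targets) or a quantile transform to the tuple.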

Other Inference Settings

model = TabPFNClassifier(
    inference_config={
        "POLYNOMIAL_FEATURES": "no",       # "no", int, or "all" for O(n^2) interactions
        "FINGERPRINT_FEATURE": True,        # hash-based row identifier
        "OUTLIER_REMOVAL_STD": "auto",      # "auto" (12.0), None, or float
        "SUBSAMPLE_SAMPLES": None,          # None, int, float, or list
    },
)
  • POLYNOMIAL_FEATURES: Generates interaction features. Can help when interactions matter but increases feature count quadratically.
  • FINGERPRINT_FEATURE: Adds a hash-based row identifier. Useful by default; try disabling if you have very few features.
  • OUTLIER_REMOVAL_STD: Removes extreme outliers before fitting. Lower values are more aggressive.
  • SUBSAMPLE_SAMPLES: Subsample training rows for faster iteration during experimentation.

Tuning Model Parameters

softmax_temperature

Controls prediction sharpness (classification only):
  • Lower values (e.g., 0.7): sharper, more confident predictions — useful when accuracy is already high
  • Higher values (e.g., 1.2): softer, more calibrated predictions — useful when probability calibration matters
model = TabPFNClassifier(softmax_temperature=0.8)
If you use tuning_config={"calibrate_temperature": True}, the temperature is tuned automatically and overrides this value.
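To see what the temperature does numerically, here is a standalone sketch of the standard temperature-scaled softmax (not TabPFN internals; the logits are made up):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by a temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # shift for numerical stability; does not change the result
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.7)  # lower T: more confident top class
soft = softmax_with_temperature(logits, 1.2)   # higher T: flatter distribution
```

Dividing logits by T < 1 magnifies their differences before the softmax, so the top-class probability rises; T > 1 does the opposite.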

Metric Tuning

For metrics that are sensitive to decision thresholds (F1, balanced accuracy, precision, recall), use the built-in metric tuning:
model = TabPFNClassifier(
    eval_metric="f1",
    tuning_config={
        "calibrate_temperature": True,
        "tune_decision_thresholds": True,
    },
)

Handling Imbalanced Data

  • Set balance_probabilities=True as a quick heuristic for imbalanced datasets
  • For more control, use eval_metric="balanced_accuracy" with threshold tuning
balance_probabilities does not always help. In some cases it can balance predictions at the cost of overall predictive power. Test both settings.
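The effect of threshold tuning can also be reproduced manually on a validation set. In this sketch LogisticRegression stands in for TabPFNClassifier (swap in its predict_proba for real use), and the grid of thresholds is an arbitrary choice:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data (about 9:1 class ratio)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_val)[:, 1]

# Sweep decision thresholds instead of using the default 0.5
thresholds = np.linspace(0.05, 0.95, 19)
scores = [balanced_accuracy_score(y_val, proba >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
```

On imbalanced data the best threshold for balanced accuracy is often well below 0.5, which is exactly what tune_decision_thresholds automates.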

Escalation Path

When the default TabPFN does not meet your needs, try these approaches in roughly this order:
1. Feature engineering: Add domain features, extract datetime components, encode text meaningfully. This is usually the highest-impact change.
2. Feature selection: If you have many features (100+), try filtering to the most informative ones.
3. Metric tuning: Use eval_metric and tuning_config to optimize for your specific evaluation metric.
4. Preprocessing transforms: Experiment with different PREPROCESS_TRANSFORMS and target transforms.
5. Hyperparameter optimization: Use the HPO extension for automated search over the TabPFN hyperparameter space.
6. AutoTabPFN ensembles: Use the AutoTabPFN extension for an automatically tuned ensemble of TabPFN models. Typically gives a few percent boost.
7. Fine-tuning: Fine-tune the pretrained model on your data when you have a specialized domain or distribution shift.

Fine-Tuning

Adapt TabPFN’s pretrained weights to your domain.

AutoTabPFN Ensembles

Automated ensembling for maximum accuracy.

Hyperparameter Optimization

Bayesian optimization over TabPFN’s hyperparameter space.