Fine-tuning updates TabPFN’s pretrained transformer parameters using gradient descent on your dataset. This retains TabPFN’s learned priors while aligning the model more closely with your target data distribution. You can fine-tune both classifiers and regressors.

When to Fine-Tune

Fine-tuning is not always necessary. TabPFN’s in-context learning already adapts to your data at inference time. Fine-tuning adds value in specific scenarios:

Good Candidates for Fine-Tuning

Niche or specialized domains

Your data represents a distribution not well-covered by TabPFN’s pretraining priors — e.g., molecular properties, specialized sensor data, or domain-specific financial instruments.

Consistent data schema

You have a stable schema that you’ll predict on repeatedly. Fine-tuning amortizes the upfront cost across many future predictions.

Large training sets (10k+ rows)

With more data, fine-tuning can learn meaningful adaptations without overfitting.

Multiple related tables

You have a family of related datasets (e.g., multiple experiments, regional variants) and want to fine-tune a single model across them.

When Fine-Tuning is Less Likely to Help

  • On very small datasets (< 1000 rows), the risk of overfitting outweighs adaptation benefits. Try feature engineering or AutoTabPFN ensembles instead.
  • If baseline TabPFN is already within a few percent of your target metric, the simpler approaches in Tips & Tricks often close the gap with less effort.
  • On datasets with gradual temporal distribution shifts and many features, fine-tuning can be less stable. Make sure your train/validation split respects the time ordering.

Decision Flowchart

1. Run baseline TabPFN

   Evaluate the default TabPFNClassifier or TabPFNRegressor on your task.

2. Try quick wins first

   Apply feature engineering, metric tuning, and preprocessing tuning — these are faster to iterate on.

3. Try AutoTabPFN or HPO

   Automated ensembling and hyperparameter search can often improve performance without modifying model weights.

4. Fine-tune when plateaued

   If performance has plateaued and you have sufficient data (1000+ rows), fine-tuning can push past the ceiling by adapting the model’s internal representations.
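The flowchart can be condensed into a small decision helper. This is a pure-Python illustration of the guidance above, not part of the TabPFN API; the function name and flags are hypothetical:

```python
def next_step(baseline_done, quick_wins_tried, ensembles_tried,
              plateaued, n_rows):
    """Return the next action suggested by the decision flowchart."""
    if not baseline_done:
        return "run baseline TabPFN"
    if not quick_wins_tried:
        return "try feature engineering / preprocessing tuning"
    if not ensembles_tried:
        return "try AutoTabPFN or HPO"
    if plateaued and n_rows >= 1000:
        return "fine-tune"
    return "collect more data or revisit quick wins"

# With everything else exhausted and 5000 rows, fine-tuning is the next step.
print(next_step(True, True, True, plateaued=True, n_rows=5000))  # -> fine-tune
```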

Getting Started

Fine-tuning shares the same interface as TabPFNClassifier and TabPFNRegressor.

1. Prepare Your Dataset

Load and split your data into train and test sets. Use a proper validation strategy: for time-dependent data, use temporal splits rather than random splits.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
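For time-ordered data, the random split above would leak future information into training. A minimal temporal split keeps the test set strictly later than the training set; this is a plain-Python sketch assuming rows are already sorted by time:

```python
def temporal_split(X, y, test_size=0.2):
    """Split time-ordered data: everything before the cutoff trains,
    everything after is held out for testing."""
    cutoff = int(len(X) * (1 - test_size))
    return X[:cutoff], X[cutoff:], y[:cutoff], y[cutoff:]

# Rows must be sorted by timestamp before splitting.
X = list(range(10))
y = [v % 2 for v in X]
X_train, X_test, y_train, y_test = temporal_split(X, y)
print(X_train, X_test)  # -> [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```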

2. Configure and Train

from tabpfn.finetuning import FinetunedTabPFNClassifier

finetuned_clf = FinetunedTabPFNClassifier(
    device="cuda",
    epochs=30,
    learning_rate=1e-5,
)

finetuned_clf.fit(X_train, y_train)
By default, fine-tuning splits off 10% of the training data for validation and uses early stopping (patience of 8 epochs). You can also provide your own validation set, which is useful for temporal data or other cases where a random split isn’t appropriate:
finetuned_clf.fit(X_train, y_train, X_val=X_val, y_val=y_val)
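The early-stopping behaviour described above (stop after 8 epochs without a new best validation loss) can be illustrated with a standalone loop. The loss values here are made-up data, not TabPFN output:

```python
def early_stop_epoch(val_losses, patience=8):
    """Return the epoch index at which training would stop,
    or None if the patience budget is never exhausted."""
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None

# Loss improves for 3 epochs, then stalls for 8: training stops at epoch 10.
losses = [1.0, 0.8, 0.7] + [0.7] * 8
print(early_stop_epoch(losses))  # -> 10
```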

3. Predict

y_pred = finetuned_clf.predict(X_test)
y_pred_proba = finetuned_clf.predict_proba(X_test)

Hyperparameters

Core Parameters

Parameter        Default   Description
epochs           30        Number of fine-tuning epochs. More epochs allow deeper adaptation but risk overfitting.
learning_rate    1e-5      Step size for gradient updates. Lower values are safer but slower to converge.
device           "cuda"    GPU is strongly recommended. Fine-tuning on CPU is very slow.

Tuning Guidelines

Learning rate:
  • Start with 1e-5 (the default). This is conservative and preserves pretrained knowledge.
  • For larger datasets (10k+ rows), you can try 3e-5 to 1e-4 for faster convergence.
  • If you see training loss spike or diverge, reduce the learning rate.
Epochs:
  • 10–30 epochs is a good starting range for most datasets.
  • For high-accuracy tasks where you’re fine-tuning carefully, use more epochs (50–100) with a lower learning rate to allow gradual adaptation without destroying pretrained representations.
  • Monitor validation loss to detect overfitting — stop if validation performance degrades.
Fine-tuning requires GPU acceleration. While it will run on CPU, training times will be impractical for most use cases.

Multi-GPU Fine-Tuning

Fine-tuning supports multi-GPU training via PyTorch DDP (Distributed Data Parallel). This is auto-detected when launched with torchrun:
torchrun --nproc-per-node=4 your_finetuning_script.py
No code changes are needed. The DDP setup is handled internally based on the LOCAL_RANK environment variable that torchrun sets. Note that .fit() should only be called once per torchrun session.
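The detection keys off the LOCAL_RANK environment variable. A minimal sketch of that logic (illustrative only; the real setup also initializes the process group and wraps the model in DDP):

```python
import os

def ddp_context():
    """Detect whether we are running under torchrun, and which GPU
    this process should use."""
    local_rank = os.environ.get("LOCAL_RANK")
    if local_rank is None:
        # Plain `python script.py`: single-process training.
        return {"distributed": False, "device": "cuda"}
    # Under torchrun, each process gets its own rank and GPU.
    return {"distributed": True, "device": f"cuda:{int(local_rank)}"}

print(ddp_context())
```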

How It Works

TabPFN performs in-context learning: during inference, it processes both training data and test samples in a single forward pass, using attention to identify relevant patterns. Fine-tuning adapts the transformer’s weights so that the attention mechanism more accurately reflects the similarity structure of your specific data. Concretely, after fine-tuning:
  • The query representations of test samples and key representations of training samples produce dot products that better reflect their target similarity.
  • This allows the fine-tuned model to more appropriately weight relevant in-context samples when making predictions.
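The effect can be mimicked with a toy attention computation: the prediction for a test point is a softmax-weighted average of training labels, where the weights come from query–key dot products. Everything below is illustrative, not TabPFN’s actual architecture:

```python
import math

def attention_predict(test_query, train_keys, train_labels):
    """Weight each training label by softmax(query . key)."""
    scores = [sum(q * k for q, k in zip(test_query, key)) for key in train_keys]
    shift = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * y for w, y in zip(weights, train_labels))

# A query aligned with the first key pulls the prediction toward its label.
keys = [[1.0, 0.0], [0.0, 1.0]]
labels = [0.0, 1.0]
print(attention_predict([4.0, 0.0], keys, labels))  # close to 0.0
```

Fine-tuning adjusts how queries and keys are computed from the raw features, so that high dot products line up with genuinely similar samples in your data.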
During fine-tuning, the preprocessing pipeline is run separately to generate transformed tensors that mirror the preprocessing configurations used during inference, so the model is optimized on the same data variations it will encounter when making predictions.

Best Practices

Before fine-tuning, establish a baseline with the default TabPFNClassifier or TabPFNRegressor. Fine-tuning should measurably improve on this baseline — if it doesn’t, the simpler model is preferable.
Split a held-out validation set and monitor performance across epochs. For time-series or temporal data, use a temporal split rather than random cross-validation.
Begin with the defaults (epochs=30, learning_rate=1e-5). Only increase aggressiveness if you see clear room for improvement without signs of overfitting.
Fine-tuning and feature engineering are complementary. Good features make fine-tuning more effective by giving the model better signal to adapt to.
With fewer than ~1000 rows, fine-tuning can overfit quickly. Use fewer epochs, a lower learning rate, or consider whether AutoTabPFN ensembles might be more appropriate.

Enterprise Fine-Tuning

For organizations with proprietary datasets, Prior Labs offers an enterprise fine-tuning program that includes:
  • Fine-tuning on your organization’s data corpus for a customized, high-performance model
  • Support for fine-tuning across collections of related datasets
  • Optimized training infrastructure


Learn more about fine-tuning TabPFN for your organization.

Tips & Tricks

Quick wins to try before fine-tuning.

AutoTabPFN Ensembles

Automated ensembling as an alternative to fine-tuning.

Hyperparameter Optimization

Automated search over TabPFN’s hyperparameter space.

GitHub Examples

See more examples and fine-tuning utilities in our TabPFN GitHub repository.