- FinetunedTabPFNClassifier — for classification tasks
- FinetunedTabPFNRegressor — for regression tasks
When to Fine-Tune
Fine-tuning is not always necessary. TabPFN’s in-context learning already adapts to your data at inference time. Fine-tuning adds value in specific scenarios:

Good Candidates for Fine-Tuning
Niche or specialized domains
Your data represents a distribution not well-covered by TabPFN’s pretraining priors — e.g., molecular properties, specialized sensor data, or domain-specific financial instruments.
Consistent data schema
You have a stable schema that you’ll predict on repeatedly. Fine-tuning amortizes the upfront cost across many future predictions.
Large training sets (10k+ rows)
With more data, fine-tuning can learn meaningful adaptations without overfitting.
Multiple related tables
You have a family of related datasets (e.g., multiple experiments, regional variants) and want to fine-tune a single model across them.
When Fine-Tuning is Less Likely to Help
- On very small datasets (< 1000 rows), the risk of overfitting outweighs adaptation benefits. Try feature engineering or AutoTabPFN ensembles instead.
- If baseline TabPFN is already within a few percent of your target metric, the simpler approaches in Tips & Tricks often close the gap with less effort.
- On datasets with gradual temporal distribution shifts and many features, fine-tuning can be less stable. Make sure your train/validation split respects the time ordering.
Decision Flowchart
Try quick wins first
Apply feature engineering, metric tuning, and preprocessing tuning — these are faster to iterate on.
Try AutoTabPFN or HPO
If you need more, try AutoTabPFN ensembles or hyperparameter optimization.
Getting Started
Fine-tuning shares the same interface as TabPFNClassifier and TabPFNRegressor.
1. Prepare Your Dataset
Load and split your data into train and test sets. Use a proper validation strategy: for time-dependent data, use temporal splits rather than random splits.

2. Configure and Train
3. Predict
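As a sketch of the full workflow on synthetic data — the `FinetunedTabPFNClassifier` import path is an assumption to verify against your installed TabPFN version, and the keyword names match the hyperparameter table in this page:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# 1. Prepare: synthetic stand-in for your own table.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

try:
    # Import path is an assumption -- verify against your TabPFN version.
    from tabpfn import FinetunedTabPFNClassifier
except ImportError:
    FinetunedTabPFNClassifier = None  # package not installed in this environment

if FinetunedTabPFNClassifier is not None:
    # 2. Configure and train with the documented defaults.
    clf = FinetunedTabPFNClassifier(epochs=30, learning_rate=1e-5, device="cuda")
    clf.fit(X_train, y_train)

    # 3. Predict as with any scikit-learn estimator.
    y_pred = clf.predict(X_test)
    y_proba = clf.predict_proba(X_test)
```

The estimator follows the scikit-learn fit/predict convention, so it can drop into existing pipelines in place of the non-fine-tuned classes.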
Hyperparameters
Core Parameters
| Parameter | Default | Description |
|---|---|---|
| epochs | 30 | Number of fine-tuning epochs. More epochs allow deeper adaptation but risk overfitting. |
| learning_rate | 1e-5 | Step size for gradient updates. Lower values are safer but slower to converge. |
| device | "cuda" | GPU is strongly recommended. Fine-tuning on CPU is very slow. |
Tuning Guidelines
Learning rate:
- Start with 1e-5 (the default). This is conservative and preserves pretrained knowledge.
- For larger datasets (10k+ rows), you can try 3e-5 to 1e-4 for faster convergence.
- If you see training loss spike or diverge, reduce the learning rate.

Epochs:
- 10–30 epochs is a good starting range for most datasets.
- For high-accuracy tasks where you’re fine-tuning carefully, use more epochs (50–100) with a lower learning rate to allow gradual adaptation without destroying pretrained representations.
- Monitor validation loss to detect overfitting — stop if validation performance degrades.
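The monitoring advice can be sketched as a simple patience rule — generic logic, not a built-in TabPFN API:

```python
def should_stop(val_losses, patience=3):
    """Stop once validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

# Validation loss improves for three epochs, then degrades for three:
# the last `patience` epochs never beat the earlier best, so training stops.
history = [0.90, 0.80, 0.70, 0.71, 0.72, 0.73]
```

Record validation loss after each epoch and break out of the training loop when the rule fires, keeping the checkpoint from the best epoch.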
Multi-GPU Fine-Tuning
Fine-tuning supports multi-GPU training via PyTorch DDP (Distributed Data Parallel). This is auto-detected when launched with torchrun: each worker reads the LOCAL_RANK environment variable that torchrun sets. Note that .fit() should only be called once per torchrun session.
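A minimal sketch of the launch pattern — the script name below is hypothetical:

```python
import os

# Launched as, e.g.: torchrun --nproc_per_node=4 finetune_tabpfn.py
# (finetune_tabpfn.py is a hypothetical script name.)
# torchrun starts one process per GPU and sets LOCAL_RANK for each,
# so the same script runs unchanged in single- and multi-GPU setups.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
```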
How It Works
TabPFN performs in-context learning: during inference, it processes both training data and test samples in a single forward pass, using attention to identify relevant patterns. Fine-tuning adapts the transformer’s weights so that the attention mechanism more accurately reflects the similarity structure of your specific data. Concretely, after fine-tuning:
- The query representations of test samples and key representations of training samples produce dot products that better reflect their target similarity.
- This allows the fine-tuned model to more appropriately weight relevant in-context samples when making predictions.
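As a toy illustration of that similarity weighting — plain scaled dot-product attention in NumPy, not TabPFN’s actual architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d = 8
train_keys = rng.normal(size=(5, d))       # key representations of 5 in-context samples
train_labels = np.array([0, 0, 1, 1, 1])   # their targets
query = train_keys[2] + 0.1 * rng.normal(size=d)  # a test query near sample 2

scores = train_keys @ query / np.sqrt(d)   # scaled dot products
weights = softmax(scores)                  # attention over in-context samples
pred = weights @ train_labels              # soft vote for class 1
```

After fine-tuning, the learned query/key projections make these dot products track target similarity more closely, so the attention weight concentrates on the genuinely relevant training samples.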
Best Practices
Always compare against baseline
Before fine-tuning, establish a baseline with the default TabPFNClassifier or TabPFNRegressor. Fine-tuning should measurably improve on this baseline — if it doesn’t, the simpler model is preferable.
Use proper validation
Split a held-out validation set and monitor performance across epochs. For time-series or temporal data, use a temporal split rather than random cross-validation.
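A minimal temporal split, assuming your rows carry a timestamp column — the 80/20 proportions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
timestamps = np.arange(n)                  # stand-in for real event times
X = rng.normal(size=(n, 4))
y = (X[:, 0] > 0).astype(int)

# Sort by time, then take the earliest 80% for training and the
# latest 20% for validation -- never a random shuffle.
order = np.argsort(timestamps)
cut = int(0.8 * n)
train_idx, val_idx = order[:cut], order[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]
```

This guarantees every validation row is strictly later than every training row, so validation loss reflects how the fine-tuned model will behave on future data.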
Start conservative, then adjust
Begin with the defaults (epochs=30, learning_rate=1e-5). Only increase aggressiveness if you see clear room for improvement without signs of overfitting.
Combine with feature engineering
Fine-tuning and feature engineering are complementary. Good features make fine-tuning more effective by giving the model better signal to adapt to.
Watch for overfitting on small data
With fewer than ~1000 rows, fine-tuning can overfit quickly. Use fewer epochs, a lower learning rate, or consider whether AutoTabPFN ensembles might be more appropriate.
Enterprise Fine-Tuning
For organizations with proprietary datasets, Prior Labs offers an enterprise fine-tuning program that includes:
- Fine-tuning on your organization’s data corpus for a customized, high-performance model
- Support for fine-tuning across collections of related datasets
- Optimized training infrastructure
Learn more about fine-tuning TabPFN for your organization.
Related
Tips & Tricks
Quick wins to try before fine-tuning.
AutoTabPFN Ensembles
Automated ensembling as an alternative to fine-tuning.
Hyperparameter Optimization
Automated search over TabPFN’s hyperparameter space.
GitHub Examples
See more examples and fine-tuning utilities in our TabPFN GitHub repository.