---
license: other
license_name: tabpfn-2.5-license-v1.1
license_link: LICENSE
extra_gated_fields:
  Organization: text
  Role:
    type: select
    options:
    - Field practitioners
    - Researcher
    - Student
  Use-case: text
  May we contact you about future updates?: checkbox
extra_gated_button_content: Agree to license terms and send request to access repo.
extra_gated_description: "Model weights released under\_`tabpfn-2.5-license-v1.1`. This license is designed to be permissive for research and internal evaluation. It *explicitly allows* testing, evaluation, and internal benchmarking, so an organization can download the model and run preliminary assessments on its own datasets.\nThe key restriction is that the model, its derivatives, and its outputs cannot be used for any commercial or production purpose. This includes, but is not limited to, revenue-generating products, competitive benchmarking for procurement, client deliverables, or using the model’s results for internal commercial decision-making.\nFor all production use cases, we offer a *Commercial Enterprise License*. This provides access to our proprietary high-speed inference engine, dedicated support, integration tooling, and other internal models. Please contact us at sales@priorlabs.ai for commercial licensing inquiries."
pipeline_tag: tabular-classification
tags:
- chemistry
- biology
- finance
- legal
- climate
- medical
---
### Model Overview
TabPFN-2.5 is a transformer-based foundation model that uses in-context learning to solve tabular prediction problems in a single forward pass.
Inference code can be found at [https://github.com/PriorLabs/tabPFN](https://github.com/PriorLabs/tabPFN).

### Getting started
First, install the inference package:
```{bash}
pip install tabpfn
```

Fitting a classifier and predicting looks like this:

```{python}
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

# Load data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)

# Initialize a classifier
clf = TabPFNClassifier()  # Uses TabPFN 2.5 weights, finetuned on real data.
clf.fit(X_train, y_train)

# Predict probabilities
prediction_probabilities = clf.predict_proba(X_test)
# Predict labels
predictions = clf.predict(X_test)
print("Accuracy", accuracy_score(y_test, predictions))
```

For more examples (e.g. how to train a regressor), see the GitHub repo: [https://github.com/PriorLabs/tabPFN](https://github.com/PriorLabs/tabPFN).

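As a rough illustration of the regressor case mentioned above, the sketch below mirrors the classifier example using `TabPFNRegressor` from the same package. The `rmse` and `run_regression_demo` helpers and the choice of the scikit-learn diabetes dataset are assumptions made for this example, not part of the TabPFN API. The demo function is only defined, not called, because running it needs `tabpfn`, `scikit-learn`, and access to the gated weights.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


def run_regression_demo():
    """Fit TabPFNRegressor on a small scikit-learn dataset and return the test RMSE.

    Imports live inside the function so that the rmse helper stays usable
    without tabpfn installed or the weights downloaded.
    """
    from sklearn.datasets import load_diabetes
    from sklearn.model_selection import train_test_split
    from tabpfn import TabPFNRegressor

    # Illustrative dataset choice (assumption, not from the model card)
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=42
    )

    reg = TabPFNRegressor()  # uses the default regression checkpoint
    reg.fit(X_train, y_train)
    return rmse(y_test, reg.predict(X_test))
```

Calling `run_regression_demo()` in an environment with the package and weights available should return a single float; the repo linked above remains the authoritative source for regression usage.
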
### Developers & Affiliations
Developed by Prior Labs.

### Intended Use
Regression and classification tasks with ≤50 000 samples and ≤2000 features in structured tabular format.

### Not Intended Use
- Not suitable for unstructured data (text, images); use the API version for textual features.
- Not tested for >50 000 samples or >2000 features.

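The size limits above can be checked before fitting. The following is a minimal sketch; `check_tabpfn_limits` and the two constants are hypothetical names introduced here, with the thresholds taken from the limits stated in this section.

```python
# Limits taken from the "Intended Use" section of this model card.
MAX_SAMPLES = 50_000
MAX_FEATURES = 2_000


def check_tabpfn_limits(n_samples, n_features):
    """Return a list of human-readable warnings; empty if within the tested range."""
    issues = []
    if n_samples > MAX_SAMPLES:
        issues.append(
            f"{n_samples} samples exceeds the tested maximum of {MAX_SAMPLES}"
        )
    if n_features > MAX_FEATURES:
        issues.append(
            f"{n_features} features exceeds the tested maximum of {MAX_FEATURES}"
        )
    return issues


# Example: a 60 000 x 30 table is outside the tested sample range.
for message in check_tabpfn_limits(60_000, 30):
    print("Warning:", message)
```

With a NumPy array or DataFrame `X`, the call would typically be `check_tabpfn_limits(*X.shape)`.
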
### Model Architecture
Transformer with TabPFNv2-like alternating attention and 18-24 layers.

### Training Data and Priors
- TabPFN-2.5: trained purely on synthetic tabular tasks.
- Real-TabPFN-2.5: continued pre-training on real-world datasets (for details, see Appendix C.1 of the model tech report).

### Performance Benchmarks
Evaluated on a proprietary benchmark collection, TabArena, and RealCause (for a causal version). On each of these it yields new state-of-the-art results by a wide margin. Please see the model tech report for details.

### Different checkpoints
Beyond the default checkpoints (`tabpfn-v2.5-classifier-v2.5_default.ckpt` and `tabpfn-v2.5-regressor-v2.5_default.ckpt`), the other available checkpoints are experimental and worse on average, so we recommend always starting with the defaults.
They can be used as part of an ensembling or hyperparameter optimization system (and are used automatically in `AutoTabPFN` [here](https://github.com/PriorLabs/tabpfn-extensions/tree/main/src/tabpfn_extensions/post_hoc_ensembles)) or tried out manually.
Their name suffixes indicate what we expect them to be good at.

<details>
<summary>More detail on each TabPFN-2.5 checkpoint</summary>

We add the 🌍 emoji for checkpoints finetuned on real datasets. See the [TabPFN-2.5 paper](https://arxiv.org/abs/2511.08667) for the list of 43 datasets.

- `tabpfn-v2.5-classifier-v2.5_default.ckpt` 🌍: default classification checkpoint, finetuned on real data.
- `tabpfn-v2.5-classifier-v2.5_default-2.ckpt`: best synthetic-only classification checkpoint. Use this to get the default TabPFN-2.5 classification model without real-data finetuning.
- `tabpfn-v2.5-classifier-v2.5_large-features-L.ckpt`: specialized for larger feature counts (up to 500) and small sample sizes (< 5K).
- `tabpfn-v2.5-classifier-v2.5_large-features-XL.ckpt`: specialized for larger feature counts (up to 1000, could support `max_features_per_estimator=1000`).
- `tabpfn-v2.5-classifier-v2.5_large-samples.ckpt`: specialized for larger sample sizes (> 30K).
- `tabpfn-v2.5-classifier-v2.5_real.ckpt` 🌍: another real-data finetuned classification checkpoint. Pretty good overall but bad on large feature counts (> 100-200).
- `tabpfn-v2.5-classifier-v2.5_real-large-features.ckpt` 🌍: another real-data finetuned classification checkpoint, worse on large sample sizes (> 10K).
- `tabpfn-v2.5-classifier-v2.5_real-large-samples-and-features.ckpt` 🌍: identical to `tabpfn-v2.5-classifier-v2.5_default.ckpt`.
- `tabpfn-v2.5-classifier-v2.5_variant.ckpt`: pretty good but bad on large feature counts (> 100-200).
- `tabpfn-v2.5-regressor-v2.5_default.ckpt`: default regression checkpoint, trained on synthetic data only.
- `tabpfn-v2.5-regressor-v2.5_low-skew.ckpt`: variant specialized for data with low target skew (but quite bad on average).
- `tabpfn-v2.5-regressor-v2.5_quantiles.ckpt`: variant which might be interesting for quantile / distribution estimation, though the default should still be prioritized for this.
- `tabpfn-v2.5-regressor-v2.5_real.ckpt` 🌍: finetuned on real data. Best checkpoint among the checkpoints finetuned on real data. For regression we recommend the synthetic-only checkpoint as a default, but this checkpoint is quite a bit better on some datasets.
- `tabpfn-v2.5-regressor-v2.5_real-variant.ckpt` 🌍: another regression variant finetuned on real data.
- `tabpfn-v2.5-regressor-v2.5_small-samples.ckpt`: variant slightly better on small (< 3K) sample sizes.
- `tabpfn-v2.5-regressor-v2.5_variant.ckpt`: another variant, no clear specialty but can be better on a few datasets.

</details>

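To make the checkpoint descriptions above easier to act on, here is a small sketch that proposes checkpoints to try for a given dataset shape. It is a heuristic reading of the list, not an official selection rule: `candidate_checkpoints`, `checkpoint_name`, and the `LARGE_FEATURES_FROM` cut-off of 200 features are assumptions made for this example. The default checkpoint always comes first, in line with the recommendation to start with the defaults.

```python
# Hypothetical cut-off for "large features"; the list above only hints at 100-200.
LARGE_FEATURES_FROM = 200


def checkpoint_name(kind, suffix):
    """Build a checkpoint file name following the naming scheme listed above."""
    return f"tabpfn-v2.5-{kind}-v2.5_{suffix}.ckpt"


def candidate_checkpoints(task, n_samples, n_features):
    """Return checkpoint names to try, default first, then specialized variants."""
    if task == "classification":
        names = [checkpoint_name("classifier", "default")]
        if n_features > LARGE_FEATURES_FROM:
            # large-features-L: up to 500 features and fewer than 5K samples
            if n_features <= 500 and n_samples < 5_000:
                names.append(checkpoint_name("classifier", "large-features-L"))
            # large-features-XL: up to 1000 features
            if n_features <= 1_000:
                names.append(checkpoint_name("classifier", "large-features-XL"))
        # large-samples: more than 30K samples
        if n_samples > 30_000:
            names.append(checkpoint_name("classifier", "large-samples"))
        return names
    if task == "regression":
        names = [checkpoint_name("regressor", "default")]
        # small-samples: slightly better below 3K samples
        if n_samples < 3_000:
            names.append(checkpoint_name("regressor", "small-samples"))
        return names
    raise ValueError("task must be 'classification' or 'regression'")


print(candidate_checkpoints("classification", 2_000, 300))
```

How a chosen file name is passed to the estimator depends on the installed `tabpfn` version; recent versions are assumed here to accept a `model_path` constructor argument, which should be confirmed against the inference code's documentation.
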
### Ethical Considerations
Having been trained purely on synthetic datasets, TabPFN-2.5 is free from dataset leakage from the pretraining stage.
However, as with any other tabular prediction method, users applying it to high-risk use cases should ensure that the labelled data is free of biases.
For Real-TabPFN-2.5, you can find the dataset list in Appendix C.1 of the model tech report.

### Limitations
Performance can degrade when applied to >50 000 data points and/or >2000 features.

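One common way to stay within the sample limit is to fit on a random subset of the training rows. This is a generic workaround rather than something the model card prescribes, and it may discard useful data. The sketch below uses only the standard library; `subsample_indices` is a hypothetical helper name.

```python
import random


def subsample_indices(n_rows, max_rows=50_000, seed=0):
    """Return sorted row indices to keep.

    All rows are kept when n_rows <= max_rows; otherwise a uniform random
    subset of size max_rows is drawn without replacement.
    """
    if n_rows <= max_rows:
        return list(range(n_rows))
    rng = random.Random(seed)  # seeded for reproducibility
    return sorted(rng.sample(range(n_rows), max_rows))


# Example: keep at most 50 000 of 120 000 training rows.
keep = subsample_indices(120_000)
print(len(keep))
```

The returned indices can be used to slice the training features and labels together (for example `X_train[keep]` and `y_train[keep]` with NumPy arrays) before calling `fit`.
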
### Licensing
Model weights are released under `tabpfn-2.5-license-v1.1`.

The license is designed to be permissive for research and limited internal evaluation. It *explicitly allows* testing, evaluation, and internal benchmarking, so an organization can download the model and run preliminary assessments on its own datasets.
The key restriction is that the model, its derivatives, and its outputs cannot be used for any commercial or production purpose. This includes, but is not limited to, revenue-generating products, competitive benchmarking for procurement, client deliverables, or using the model’s results for internal commercial decision-making.
For all production use cases, we offer a *Commercial Enterprise License*. This provides access to our proprietary high-speed inference engine, dedicated support, integration tooling, and other internal models.
Please contact us at sales@priorlabs.ai for commercial licensing inquiries.

### Version
v1.0: initial release.

### Citation
```
@misc{TabPFN-2.5,
      title={TabPFN-2.5: Advancing the State of the Art in Tabular Foundation Models},
      author={Léo Grinsztajn and Klemens Flöge and Oscar Key and Felix Birkel and Brendan Roof and Phil Jund and Benjamin Jäger and Adrian Hayler and Dominik Safaric and Simone Alessi and Felix Jablonski and Mihir Manium and Rosen Yu and Anurag Garg and Jake Robertson and Shi Bin (Liam) Hoo and Vladyslav Moroshan and Magnus Bühler and Lennart Purucker and Clara Cornu and Lilly Charlotte Wehrhahn and Alessandro Bonetto and Sauraj Gambhir and Noah Hollmann and Frank Hutter},
      year={2025}
}
```