
On Finetuning Large Language Models

Published online by Cambridge University Press:  28 November 2023

Yu Wang*
Affiliation:
Fudan Institute for Advanced Study in Social Sciences, Fudan University, Shanghai, China

Abstract

A recent paper by Häffner et al. (2023, Political Analysis 31, 481–499) introduces an interpretable deep learning approach for domain-specific dictionary creation and claims that the dictionary-based approach outperforms finetuned language models in predictive accuracy while retaining interpretability. We show that the dictionary-based approach's reported superiority over large language models, BERT specifically, arises because most of the parameters in the language models were excluded from finetuning. In this letter, we first describe the architecture of BERT models, then explain the limitations of finetuning only the top classification layer, and finally report results in which, once all parameters are allowed to update during finetuning, the finetuned language models outperform the newly proposed dictionary-based approach by 27% in terms of $R^2$ and 46% in terms of mean squared error. Researchers interested in large language models, text classification, and text regression should find our results useful. Our code and data are publicly available.
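The distinction at issue can be illustrated with a minimal sketch (not the letter's replication code; the model name, label count, and learning rate are illustrative assumptions). Head-only finetuning freezes the BERT encoder so that only the top classification layer learns, whereas full finetuning lets every parameter update.

    # Minimal sketch contrasting head-only finetuning (frozen encoder)
    # with full finetuning (all parameters trainable). Model name,
    # num_labels, and learning rate are illustrative assumptions.
    import torch
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=1  # num_labels=1 -> regression head
    )

    # Head-only finetuning: freeze every encoder parameter so that only
    # the top classification layer is updated during training.
    for param in model.bert.parameters():
        param.requires_grad = False

    # Full finetuning: unfreeze everything so the encoder weights also
    # adapt to the downstream task.
    for param in model.parameters():
        param.requires_grad = True

    # Only parameters with requires_grad=True are handed to the optimizer.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=2e-5)

Under the first setting the optimizer sees only the small classification head; under the second it sees all of the model's parameters, which is the configuration the letter argues is needed for a fair comparison.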

Type
Letter
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Society for Political Methodology


Footnotes

Edited by: John Doe

References

Bestvater, S. E., and Monroe, B. L. 2023. "Sentiment Is Not Stance: Target-Aware Opinion Classification for Political Text Analysis." Political Analysis 31 (2): 235–256. https://doi.org/10.1017/pan.2022.10
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. 2019. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." In Proceedings of NAACL-HLT, edited by Burstein, J., Doran, C., and Solorio, T., 4171–4186. Minneapolis: Association for Computational Linguistics.
Ding, N., et al. 2023. "Parameter-Efficient Fine-Tuning of Large-Scale Pre-Trained Language Models." Nature Machine Intelligence 5: 220–235.
Dodge, J., Ilharco, G., Schwartz, R., Farhadi, A., Hajishirzi, H., and Smith, N. 2020. "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping." Preprint, arXiv:2002.06305.
Häffner, S., Hofer, M., Nagl, M., and Walterskirchen, J. 2023. "Introducing an Interpretable Deep Learning Approach to Domain-Specific Dictionary Creation: A Use Case for Conflict Prediction." Political Analysis 31 (4): 481–499. https://doi.org/10.1017/pan.2023.7
Houlsby, N., et al. 2019. "Parameter-Efficient Transfer Learning for NLP." In Proceedings of the 36th International Conference on Machine Learning, 2790–2799. PMLR.
Howard, J., and Ruder, S. 2018. "Universal Language Model Fine-Tuning for Text Classification." In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 328–339. Melbourne: Association for Computational Linguistics.
Hu, Y., et al. 2022. "ConfliBERT: A Pre-Trained Language Model for Political Conflict and Violence." In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics, 5469–5482. Seattle: Association for Computational Linguistics.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., and Soricut, R. 2020. "ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations." ICLR.
Liu, Y., et al. 2019. "RoBERTa: A Robustly Optimized BERT Pretraining Approach." Preprint, arXiv:1907.11692.
Mosbach, M., Andriushchenko, M., and Klakow, D. 2021. "On the Stability of Fine-Tuning BERT: Misconceptions, Explanations, and Strong Baselines." ICLR.
Wang, Y. 2019. "Comparing Random Forest with Logistic Regression for Predicting Class-Imbalanced Civil War Onset Data: A Comment." Political Analysis 27 (1): 107–110.
Wang, Y. 2023a. "Replication Data for: On Finetuning Large Language Models." Harvard Dataverse. https://doi.org/10.7910/DVN/7PCLRI
Wang, Y. 2023b. "Topic Classification for Political Texts with Pretrained Language Models." Political Analysis 31 (4): 662–668.
Zhang, T., Wu, F., Katiyar, A., Weinberger, K. Q., and Artzi, Y. 2021. "Revisiting Few-Sample BERT Fine-Tuning." ICLR.
Supplementary material

Wang supplementary material: Appendix (PDF, 194.7 KB)