GPT-4 Technical Report

OpenAI

Abstract

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.

1 Introduction

This technical report presents GPT-4, a large multimodal model capable of processing image and text inputs and producing text outputs. Such models are an important area of study as they have the potential to be used in a wide range of applications, such as dialogue systems, text summarization, and machine translation. As such, they have been the subject of substantial interest and progress in recent years [1-28].

One of the main goals of developing such models is to improve their ability to understand and generate natural language text, particularly in more complex and nuanced scenarios. To test its capabilities in such scenarios, GPT-4 was evaluated on a variety of exams originally designed for humans. In these evaluations it performs quite well and often outscores the vast majority of human test takers. For example, on a simulated bar exam, GPT-4 achieves a score that falls in the top 10% of test takers. This contrasts with GPT-3.5, which scores in the bottom 10%.

On a suite of traditional NLP benchmarks, GPT-4 outperforms both previous large language models and most state-of-the-art systems (which often
have benchmark-specific training or hand-engineering). On the MMLU benchmark [29, 30], an English-language suite of multiple-choice questions covering 57 subjects, GPT-4 not only outperforms existing models by a considerable margin in English, but also demonstrates strong performance in other languages. On translated variants of MMLU, GPT-4 surpasses the English-language state-of-the-art in 24 of 26 languages considered. We discuss these model capability results, as well as model safety improvements and results, in more detail in later sections.

This report also discusses a key challenge of the project, developing deep learning infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to make predictions about the expected performance of GPT-4 (based on small runs trained in similar ways) that were tested against the final run to increase confidence in our training.

Despite its capabilities, GPT-4 has similar limitations to earlier GPT models [1, 31, 32]: it is not fully reliable (e.g. can suffer from "hallucinations"), has a limited context window, and does not learn from experience. Care should be taken when using the outputs of GPT-4, particularly in contexts where reliability is important.

Please cite this work as "OpenAI (2023)". Full authorship contribution statements appear at the end of the document.

GPT-4's capabilities and limitations create significant and novel safety challenges, and we believe careful study of these challenges is an important area of research given the potential societal impact. This report includes an extensive system card (after the Appendix) describing some of the risks we foresee around bias, disinformation, over-reliance, privacy, cybersecurity, proliferation, and more. It also describes interventions we made to mitigate potential harms from the deployment of GPT-4, including adversarial testing with domain experts, and a model-assisted safety pipeline.

2 Scope and Limitations of this Technical Report

This report focuses on the
capabilities, limitations, and safety properties of GPT-4. GPT-4 is a Transformer-style model [33] pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers. The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF) [34]. Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

We are committed to independent auditing of our technologies, and shared some initial steps and ideas in this area in the system card accompanying this release. We plan to make further technical details available to additional third parties who can advise us on how to weigh the competitive and safety considerations above against the scientific value of further transparency.

3 Predictable Scaling

A large focus of the GPT-4 project
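The report states that some aspects of GPT-4's performance were predicted from models trained with no more than 1/1,000th of its compute. The usual mechanism behind such predictions is a scaling law: final loss is modeled as a smooth function of training compute, fitted on small runs and extrapolated to the large one. The sketch below illustrates this idea only; the power-law form L(C) = a·C^(-b) and every number in it are illustrative assumptions, not values or methods taken from the report.

```python
import math

# Hypothetical (compute in FLOPs, final loss) pairs from small training runs.
# Assumption: loss follows a power law L(C) = a * C**(-b), so log(L) is
# linear in log(C) and an ordinary least-squares line fit recovers a and b.
runs = [(1e18, 4.0), (1e19, 3.2), (1e20, 2.56), (1e21, 2.048)]

xs = [math.log(c) for c, _ in runs]
ys = [math.log(l) for _, l in runs]
n = len(runs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
a, b = math.exp(intercept), -slope  # ln L = ln a - b * ln C

def predict_loss(compute):
    """Extrapolate fitted loss to a compute budget beyond the small runs."""
    return a * compute ** (-b)

# Extrapolate 1,000x beyond the largest fitted run.
print(f"predicted loss at 1e24 FLOPs: {predict_loss(1e24):.3f}")
# -> predicted loss at 1e24 FLOPs: 1.049
```

Because the synthetic points above lie exactly on a power law, the fit extrapolates them perfectly; real training runs are noisy, and the report's contribution is precisely that its infrastructure made such extrapolations reliable at a 1,000x compute gap.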