A Technical Review of Tools and Platforms for Training and Fine-Tuning Large Language Models
Efficient Fine-Tuning Techniques and Parameter-Efficient Fine-Tuning (PEFT) Frameworks
This group of papers focuses on lowering the compute barrier to fine-tuning large language models. It covers core algorithms (e.g., QLoRA), unified fine-tuning workflow tools (LLaMA-Factory, PEFT-Factory), and adapter-integration frameworks (LLM-Adapters), along with comparative experiments on the performance of LoRA/QLoRA across different models.
- QLoRA: Efficient Finetuning of Quantized LLMs(Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer, 2023, arXiv (Cornell University))
- LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models(Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo, 2024, No journal)
- PEFT-Factory: Unified Parameter-Efficient Fine-Tuning of Autoregressive Large Language Models(Robert Belanec, Ivan Srba, Mária Bieliková, 2025, ArXiv.org)
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models(Zhiqiang Hu, Lei Wang, Yihuai Lan, Wanyu Xu, Ee‐Peng Lim, Lidong Bing, Xing Xu, Soujanya Poria, Roy Lee, 2023, No journal)
- Analyzing LLAMA3 Performance on Classification Task Using LoRA and QLoRA Techniques(Rajvardhan Patil, Priyanka Khot, Venkat N. Gudivada, 2025, Applied Sciences)
- Metric-Based Comparison of Fine-Tuned LLaMA 2 and Mixtral Large Language Models for Instruction Tasks(Bohdan M. Pavlyshenko, Ivan Bulka, 2024, Electronics and Information Technologies)
Domain Adaptation and Task-Specific Capability Enhancement
These papers examine how fine-tuning or data construction can equip general-purpose large models with domain-specific or function-specific capabilities. Research directions include enhancing Chinese-language proficiency, using external tools (APIs), automating code review, aligning with recommendation systems, and vertical applications such as classical Chinese poetry generation.
- Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca(Yiming Cui, Ziqing Yang, Xin Yao, 2023, arXiv (Cornell University))
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs(Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, Maosong Sun, 2023, arXiv (Cornell University))
- LLaMA-Reviewer: Advancing Code Review Automation with Large Language Models through Parameter-Efficient Fine-Tuning(Junyi Lu, Lei Yu, Xiaojia Li, Yang Li, Chun Zuo, 2023, No journal)
- TALLRec: An Effective and Efficient Tuning Framework to Align Large Language Model with Recommendation(Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He, 2023, No journal)
- A Simple Approach of Chinese Poetry Generation Using Pre-trained LLMs(Yei-Zen Tang, Zhengchen Li, J. Y. Yu, Yang Liu, 2025, No journal)
Enterprise Deployment Strategies, Knowledge Injection, and Security Hardening
This group focuses on deploying large models in enterprise settings. It compares fine-tuning with retrieval-augmented generation (RAG) for knowledge injection, offers practical guidelines for fine-tuning on proprietary enterprise data, presents secure fine-tuning schemes under differential privacy, and surveys the broader challenges of LLMs in enterprise informatization.
- Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs(Oded Ovadia, Menachem Brief, Moshik Mishaeli, Oren Elisha, 2024, No journal)
- Fine Tuning LLM for Enterprise: Practical Guidelines and Recommendations(Mathav Raj J, Kushala VM, Harikrishna Warrier, Yogesh Kumar Gupta, 2024, arXiv (Cornell University))
- Hardening LLM Fine-Tuning: From Differentially Private Data Selection to Trustworthy Model Quantization(Zehang Deng, Ruoxi Sun, Minhui Xue, Wanlun Ma, Sheng Wen, Surya Nepal, Yang Xiang, 2025, IEEE Transactions on Information Forensics and Security)
- 大语言模型在企业信息化中的应用探讨 [Applications of Large Language Models in Enterprise Informatization](刘浩东, 2025, 电子商务评论)
Model Architecture Evolution, Alignment Techniques, and Multimodal Surveys
This group offers a broader technical perspective on the evolution of large-model architectures, covering the iteration experience of Chinese-developed model families such as ChatGLM, the development history of Transformer-based NLP models, and an in-depth survey of the shift from single-modality models to general-purpose multimodal foundation models.
- ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools(Team GLM: Aohan Zeng, Bin Xu, et al., 2024, arXiv (Cornell University))
- 基于Transformer的自然语言处理模型综述 [A Survey of Transformer-Based Natural Language Processing Models](赖鸣姝, 2023, 人工智能与机器人研究)
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants(Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang, Linjie Li, Lijuan Wang, Jianfeng Gao, 2024, Foundations and Trends® in Computer Graphics and Vision)
Taken together, these references map the technical landscape of large language models, from underlying training algorithms and efficient fine-tuning toolchains, through domain-specific capability extension, to enterprise deployment strategies and security defenses. The research focus has shifted from simply scaling model parameters toward: 1. building integrated tooling platforms such as LLaMA-Factory and PEFT-Factory; 2. exploring parameter-efficient fine-tuning schemes such as LoRA and QLoRA to reduce resource consumption; 3. studying model alignment and generalization on specific tasks (tool use, code review, recommendation, etc.); and 4. weighing fine-tuning against RAG for knowledge updating while attending to data privacy and security.
A total of 18 related references.
Large Language Models (LLMs) have demonstrated remarkable performance across diverse domains, thereby prompting researchers to explore their potential for use in recommendation systems. Initial attempts have leveraged the exceptional capabilities of LLMs, such as rich knowledge and strong generalization through In-context Learning, which involves phrasing the recommendation task as prompts. Nevertheless, the performance of LLMs in recommendation tasks remains suboptimal due to a substantial disparity between the training tasks for LLMs and recommendation tasks, as well as inadequate recommendation data during pre-training. To bridge the gap, we consider building a Large Recommendation Language Model by tuning LLMs with recommendation data. To this end, we propose an efficient and effective Tuning framework for Aligning LLMs with Recommendations, namely TALLRec. We have demonstrated that the proposed TALLRec framework can significantly enhance the recommendation capabilities of LLMs in the movie and book domains, even with a limited dataset of fewer than 100 samples. Additionally, the proposed framework is highly efficient and can be executed on a single RTX 3090 with LLaMA-7B. Furthermore, the fine-tuned LLM exhibits robust cross-domain generalization. Our code and data are available at https://github.com/SAI990323/TALLRec.
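As a rough illustration of how a user interaction can be phrased as an instruction-tuning sample for this kind of recommendation tuning, the sketch below builds a binary "would the user enjoy this item" example; the field names and prompt wording are assumptions for illustration, not TALLRec's released format.

```python
# Illustrative only: field names and prompt wording are assumptions, not TALLRec's exact format.
def build_rec_sample(user_history, candidate_item, label):
    """Phrase one recommendation interaction as an instruction-tuning example."""
    liked = ", ".join(title for title, enjoyed in user_history if enjoyed)
    disliked = ", ".join(title for title, enjoyed in user_history if not enjoyed)
    prompt = (
        "Given the user's preferences, answer Yes or No.\n"
        f"Liked items: {liked or 'none'}\n"
        f"Disliked items: {disliked or 'none'}\n"
        f"Would the user enjoy \"{candidate_item}\"?"
    )
    return {"instruction": prompt, "output": "Yes" if label else "No"}

sample = build_rec_sample(
    user_history=[("The Matrix", True), ("Titanic", False)],
    candidate_item="Blade Runner",
    label=True,
)
print(sample["instruction"])
print(sample["output"])
```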
The success of large language models (LLMs), like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLM while achieving comparable or even better performance. To enable further research on PEFT methods of LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods of LLMs for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapters, Prompt-based learning and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyper-parameters on the best design for each adapter-based method. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on simple math reasoning datasets.
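The LLM-Adapters codebase itself is not reproduced here, but the Hugging Face peft library exposes the same families of methods; the sketch below (model id and hyperparameter values are placeholders) shows how a reparametrization-based adapter (LoRA) or a prompt-based one (prefix tuning) is attached so that only the adapter weights train.

```python
# A minimal sketch using the Hugging Face `peft` library rather than the LLM-Adapters fork;
# the model id and hyperparameter values are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PrefixTuningConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Reparametrization-based method: LoRA on BLOOM's fused attention projection.
lora_cfg = LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16,
                      target_modules=["query_key_value"])
# Prompt-based method: prefix tuning with 20 virtual tokens.
prefix_cfg = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20)

model = get_peft_model(base, lora_cfg)   # swap in prefix_cfg to compare adapter families
model.print_trainable_parameters()       # only the small adapter weights are trainable
```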
We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained on ten trillion tokens mostly in Chinese and English, along with a small set of corpus from 24 languages, and aligned primarily for Chinese and English usage. The high-quality alignment is achieved via a multi-stage post-training process, which involves supervised fine-tuning and learning from human feedback. Evaluations show that GLM-4 1) closely rivals or outperforms GPT-4 in terms of general metrics such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval, 2) gets close to GPT-4-Turbo in instruction following as measured by IFEval, 3) matches GPT-4 Turbo (128K) and Claude 3 for long context tasks, and 4) outperforms GPT-4 in Chinese alignments as measured by AlignBench. The GLM-4 All Tools model is further aligned to understand user intent and autonomously decide when and which tool(s) to use -- including web browser, Python interpreter, text-to-image model, and user-defined functions -- to effectively complete complex tasks. In practical applications, it matches and even surpasses GPT-4 All Tools in tasks like accessing online information via web browsing and solving math problems using Python interpreter. Over the course, we have open-sourced a series of models, including ChatGLM-6B (three generations), GLM-4-9B (128K, 1M), GLM-4V-9B, WebGLM, and CodeGeeX, attracting over 10 million downloads on Hugging Face in the year 2023 alone. The open models can be accessed through https://github.com/THUDM and https://huggingface.co/THUDM.
The automation of code review activities, a long-standing pursuit in software engineering, has been primarily addressed by numerous domain-specific pre-trained models. Despite their success, these models frequently demand extensive resources for pre-training from scratch. In contrast, Large Language Models (LLMs) provide an intriguing alternative, given their remarkable capabilities when supplemented with domain-specific knowledge. However, their potential for automating code review tasks remains largely unexplored. In response to this research gap, we present LLaMA-Reviewer, an innovative framework that leverages the capabilities of LLaMA, a popular LLM, in the realm of code review. Mindful of resource constraints, this framework employs parameter-efficient fine-tuning (PEFT) methods, delivering high performance while using less than 1% of trainable parameters. An extensive evaluation of LLaMA-Reviewer is conducted on two diverse, publicly available datasets. Notably, even with the smallest LLaMA base model consisting of 6.7B parameters and a limited number of tuning epochs, LLaMA-Reviewer equals the performance of existing code-review-focused models. The ablation experiments provide insights into the influence of various fine-tuning process components, including input representation, instruction tuning, and different PEFT methods. To foster continuous progress in this field, the code and all PEFT-weight plugins have been made open-source.
Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model training, and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is constructed automatically using ChatGPT. Specifically, the construction can be divided into three stages: (i) API collection: we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub; (ii) instruction generation: we prompt ChatGPT to generate diverse instructions involving these APIs, covering both single-tool and multi-tool scenarios; (iii) solution path annotation: we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To enhance the reasoning capabilities of LLMs, we develop a novel depth-first search-based decision tree algorithm. It enables LLMs to evaluate multiple reasoning traces and expand the search space. Moreover, to evaluate the tool-use capabilities of LLMs, we develop an automatic evaluator: ToolEval. Based on ToolBench, we fine-tune LLaMA to obtain an LLM ToolLLaMA, and equip it with a neural API retriever to recommend appropriate APIs for each instruction. Experiments show that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. Our ToolLLaMA also demonstrates strong zero-shot generalization ability in an out-of-distribution tool-use dataset: APIBench.
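A generic depth-first search over candidate API-call sequences conveys the control flow of such tree search; this is only loosely inspired by ToolLLM's DFSDT, with `propose_calls` and `is_solved` standing in for the LLM-driven expansion and termination checks, so it is a sketch rather than the released implementation.

```python
# Generic DFS over API-call chains; not ToolLLM's released implementation.
def dfs_solution_path(state, propose_calls, is_solved, max_depth=5, budget=None):
    """Return a chain of API calls that solves `state`, or None if none is found in budget."""
    if budget is None:
        budget = {"remaining": 20}          # cap on total node expansions
    if is_solved(state):
        return []
    if max_depth == 0 or budget["remaining"] <= 0:
        return None
    for call, next_state in propose_calls(state):   # candidate expansions, best-first
        budget["remaining"] -= 1
        tail = dfs_solution_path(next_state, propose_calls, is_solved,
                                 max_depth - 1, budget)
        if tail is not None:
            return [call] + tail            # prepend this call to the solution path
    return None                             # dead end: backtrack to the caller
```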
Critical infrastructures are increasingly integrating artificial intelligence (AI) technologies, including large language models (LLMs), into essential systems and services that are vital to societal functioning. Fine-tuning LLMs for specific domain tasks is crucial for their effective deployment in these contexts, but this process must carefully address both privacy and security concerns. Without proper safeguards, such integration can introduce additional risks, such as data leakage during training and diminished model trustworthiness due to the need for model compression to operate within limited bandwidth and computational capacity constraints. In this paper, we propose the Hardening LLM Fine-tuning framework (HARDLLM), which addresses these challenges through two key components: (i) we develop a differentially private data selection method that ensures privacy protection by training the model exclusively on sampled and synthesized public data, thereby preventing any direct use of private data and enhancing leakage resilience throughout the training process, and (ii) we introduce a trustworthiness-aware model quantization approach to improve LLM performance, such as reducing toxicity, enhancing adversarial robustness, and mitigating stereotypes, while maintaining negligible impact on model utility. Experimental results show that the proposed algorithm ensures differential privacy when the privacy budget is set at ϵ = 0.5, with only a 1% drop in accuracy, while other state-of-the-art methods experience an accuracy drop of at least 20% under the same privacy budget. Additionally, our quantization approach improves the trustworthiness of fine-tuned LLMs by an average of 3-4%, with only a negligible utility loss (approximately 1%) at a 50% compression rate.
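For intuition on the privacy side, a textbook exponential-mechanism sampler is sketched below: it picks public training examples with probability weighted by a utility score under a privacy budget ϵ. This is a generic sketch of differentially private selection, not HARDLLM's actual data-selection algorithm; the scores and parameters are placeholders.

```python
# Generic exponential-mechanism selection sketch (not the paper's method).
import numpy as np

def dp_select(scores, k, epsilon, sensitivity=1.0, seed=0):
    """Sample k indices with probability proportional to exp(epsilon * score / (2 * sensitivity))."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())       # stabilized softmax weights
    probs /= probs.sum()
    return rng.choice(len(scores), size=k, replace=False, p=probs)

# Example: scores measure how well each public/synthetic sample matches the target domain.
print(dp_select(scores=[0.9, 0.1, 0.7, 0.4], k=2, epsilon=0.5))
```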
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants. The research landscape encompasses five core topics, categorized into two classes. (i) We start with a survey of well-established research areas: multimodal foundation models pre-trained for specific purposes, including two topics – methods of learning vision backbones for visual understanding and text-to-image generation. (ii) Then, we present recent advances in exploratory, open research areas: multimodal foundation models that aim to play the role of general-purpose assistants, including three topics – unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs. The target audiences of the monograph are researchers, graduate students, and professionals in computer vision and vision-language multimodal communities who are eager to learn the basics and recent advances in multimodal foundation models.
We present QLoRA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. QLoRA backpropagates gradients through a frozen, 4-bit quantized pretrained language model into Low Rank Adapters (LoRA). Our best model family, which we name Guanaco, outperforms all previous openly released models on the Vicuna benchmark, reaching 99.3% of the performance level of ChatGPT while only requiring 24 hours of finetuning on a single GPU. QLoRA introduces a number of innovations to save memory without sacrificing performance: (a) 4-bit NormalFloat (NF4), a new data type that is information-theoretically optimal for normally distributed weights, (b) double quantization to reduce the average memory footprint by quantizing the quantization constants, and (c) paged optimizers to manage memory spikes. We use QLoRA to finetune more than 1,000 models, providing a detailed analysis of instruction following and chatbot performance across 8 instruction datasets, multiple model types (LLaMA, T5), and model scales that would be infeasible to run with regular finetuning (e.g. 33B and 65B parameter models). Our results show that QLoRA finetuning on a small high-quality dataset leads to state-of-the-art results, even when using smaller models than the previous SoTA. We provide a detailed analysis of chatbot performance based on both human and GPT-4 evaluations showing that GPT-4 evaluations are a cheap and reasonable alternative to human evaluation. Furthermore, we find that current chatbot benchmarks are not trustworthy to accurately evaluate the performance levels of chatbots. A lemon-picked analysis demonstrates where Guanaco fails compared to ChatGPT. We release all of our models and code, including CUDA kernels for 4-bit training.
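A minimal QLoRA-style setup with transformers, peft, and bitsandbytes is sketched below; the model id, rank, and target modules are placeholders rather than the paper's exact Guanaco recipe.

```python
# Hedged sketch: 4-bit NF4 base model with LoRA adapters on top (QLoRA-style fine-tuning).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_cfg = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # 4-bit NormalFloat quantization
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-7b", quantization_config=bnb_cfg, device_map="auto"
)
model = prepare_model_for_kbit_training(model)   # cast norms/outputs for stable k-bit training

lora_cfg = LoraConfig(task_type="CAUSAL_LM", r=64, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora_cfg)          # gradients flow only into the LoRA adapters
model.print_trainable_parameters()
```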
Large language models (LLMs), consisting of billions and trillions of parameters, have demonstrated exceptional ability in natural language understanding (NLU) and natural language generation (NLG) tasks. Increases in their numbers of parameters and model sizes have resulted in better performance and accuracy. However, models with such enormous numbers of parameters incur significant computational costs and resources, making them challenging to fine-tune and adapt to a specific downstream task. Several parameter-efficient fine-tuning (PEFT) techniques have been proposed to address this issue. This study demonstrates the improvement obtained over the base LLaMA3-8B model using two prominent PEFT techniques: LoRA and QLoRA. We use the sequence classification task of sentiment analysis to conduct the experiments. Additionally, we analyze the effects of hyperparameter adjustments (r and α) on the model's performance. We examine the tradeoff between efficiency and memory savings obtained using the quantized LoRA (QLoRA) technique. We also investigate and compare how the performance of the LoRA and QLoRA techniques changes when the adapters are applied only to the attention layers (query, key, value, and output projection) versus to all the linear layers during fine-tuning. We report the findings of our work along with limitations and future directions.
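Assuming the two placements compared in the study map onto standard peft configurations, the contrast looks roughly like the sketch below; the r and lora_alpha values are illustrative rather than the paper's grid, and the "all-linear" shorthand requires a recent peft release.

```python
# Illustrative LoRA placements for a sequence-classification head; values are placeholders.
from peft import LoraConfig

attn_only = LoraConfig(
    task_type="SEQ_CLS", r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
)
all_linear = LoraConfig(
    task_type="SEQ_CLS", r=16, lora_alpha=32,
    target_modules="all-linear",                              # every linear layer in the model
)
```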
No abstract
Large Language Models (LLMs), such as ChatGPT and GPT-4, have dramatically transformed natural language processing research and shown promising strides towards Artificial General Intelligence (AGI). Nonetheless, the high costs associated with training and deploying LLMs present substantial obstacles to transparent, accessible academic research. While several large language models, such as LLaMA, have been open-sourced by the community, these predominantly focus on English corpora, limiting their usefulness for other languages. In this paper, we propose a method to augment LLaMA with capabilities for understanding and generating Chinese text and its ability to follow instructions. We achieve this by extending LLaMA's existing vocabulary with an additional 20,000 Chinese tokens, thereby improving its encoding efficiency and semantic understanding of Chinese. We further incorporate secondary pre-training using Chinese data and fine-tune the model with Chinese instruction datasets, significantly enhancing the model's ability to comprehend and execute instructions. Our experimental results indicate that the newly proposed model markedly enhances the original LLaMA's proficiency in understanding and generating Chinese content. Additionally, the results on the C-Eval dataset yield competitive performance among the models with several times the size of ours. We have made our pre-trained models, training scripts, and other resources available through GitHub, fostering open research for our community. Chinese LLaMA series: \url{https://github.com/ymcui/Chinese-LLaMA-Alpaca} and Chinese Llama-2 series: \url{https://github.com/ymcui/Chinese-LLaMA-Alpaca-2}
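The vocabulary-extension step can be sketched with standard transformers calls: add new tokens, then resize the embedding (and tied LM-head) matrix so the new ids get trainable rows. The two sample tokens and the model id below are placeholders standing in for the roughly 20,000 Chinese tokens the paper merges in.

```python
# Hedged sketch of vocabulary extension before secondary pre-training on Chinese text.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")

new_tokens = ["自然语言", "大语言模型"]        # stand-ins for a full Chinese subword vocabulary
num_added = tokenizer.add_tokens(new_tokens)

# New ids get randomly initialized embedding rows, learned during continued pre-training.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens, new vocab size = {len(tokenizer)}")
```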
The paper considers a comprehensive analysis and comparative study of two advanced Large Language Models (LLMs), namely LLaMA 2 and Mixtral, with a specific focus on their performance in executing instructional tasks. These models were fine-tuned using techniques such as LoRA and QLoRA, which were applied to extensive instruction datasets. The fine-tuning process was further enhanced by the implementation of Parameter-Efficient Fine-Tuning (PEFT) on NVIDIA A100 Tensor Core GPU instances, ensuring optimal performance. Both LLaMA 2 and Mixtral models were fine-tuned using the Hugging Face and PyTorch platforms, ensuring that similar parameters were maintained to facilitate a fair comparison. Inference was then run on data not used in the initial training phase. This approach was adopted to test the models' ability to generalize and adapt to new, unseen data, thereby providing a more robust evaluation of their performance. An evaluation framework was established using the RAGAS library. The framework was designed to provide precise and reliable metrics, offering a comprehensive assessment of the models' performance. While the LLaMA 2 model demonstrates a faster rate of fine-tuning, it is susceptible to overfitting. On the other hand, Mixtral, despite requiring more time for training, outperforms it in evaluations, making it a more dependable tool for instructional tasks. Keywords: LLMs, PEFT, LoRA, QLoRA, Mixtral, LLaMA, LLM fine-tuning
There is a compelling necessity from enterprises for fine-tuning LLMs (Large Language Models) to get them trained on proprietary domain knowledge. The challenge is to imbibe the LLMs with domain-specific knowledge using the most optimal resources and cost and in the best possible time. Many enterprises rely on RAG (Retrieval Augmented Generation), which does not need LLMs to be fine-tuned, but they are limited by the quality of vector databases and their retrieval capabilities rather than the intrinsic capabilities of the LLMs themselves. In our current work we focus on fine-tuning LLaMA, an open source LLM, using proprietary documents and code from an enterprise repository and use the fine-tuned models to evaluate the quality of responses. As part of this work, we aim to guide beginners on how to start fine-tuning an LLM for documentation and code by making educated guesses on the size of GPU required and the options available for formatting the data. We also propose preprocessing recipes for both documentation and code to prepare datasets in different formats. The proposed methods of data preparation for document datasets are forming paragraph chunks, forming question and answer pairs and forming keyword and paragraph chunk pairs. For code datasets we propose forming summary and function pairs. Further, we qualitatively evaluate the results of the models for domain specific queries. Finally, we also propose practical guidelines and recommendations for fine-tuning LLMs.
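Two of the proposed preparation recipes, paragraph chunks and keyword/paragraph-chunk pairs, can be sketched in a few lines; the chunk-size limit and the frequency-based keyword heuristic below are assumptions for illustration, not the paper's exact recipe.

```python
# Hedged sketch of document preprocessing into chunks and keyword/chunk pairs.
import re
from collections import Counter

def paragraph_chunks(text, max_words=200):
    """Greedily group paragraphs into chunks of roughly max_words words."""
    chunks, current = [], []
    for para in re.split(r"\n\s*\n", text.strip()):
        words = para.split()
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

def keyword_chunk_pairs(chunks, top_k=5):
    """Pair each chunk with its most frequent long words as crude keywords."""
    pairs = []
    for chunk in chunks:
        words = [w.lower() for w in re.findall(r"[A-Za-z]{5,}", chunk)]
        keywords = [w for w, _ in Counter(words).most_common(top_k)]
        pairs.append({"keywords": keywords, "chunk": chunk})
    return pairs
```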
Large language models (LLMs) encapsulate a vast amount of factual information within their pre-trained weights, as evidenced by their ability to answer diverse questions across different domains. However, this knowledge is inherently limited, relying heavily on the characteristics of the training data. Consequently, using external datasets to incorporate new information or refine the capabilities of LLMs on previously seen information poses a significant challenge. In this study, we compare two common approaches: unsupervised fine-tuning and retrieval-augmented generation (RAG). We evaluate both approaches on a variety of knowledge-intensive tasks across different topics. Our findings reveal that while unsupervised fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through unsupervised fine-tuning, and that exposing them to numerous variations of the same fact during training could alleviate this problem.
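The RAG baseline in such comparisons boils down to retrieve-then-prompt; a minimal sketch follows, where `embed` is a hypothetical sentence-embedding function (any encoder would do) and the prompt wording is an assumption.

```python
# Minimal retrieve-then-prompt sketch of a RAG baseline; `embed` is hypothetical.
import numpy as np

def retrieve(question, chunks, embed, top_k=3):
    """Return the top_k chunks most similar to the question by cosine similarity."""
    q = embed([question])[0]
    c = embed(chunks)
    sims = c @ q / (np.linalg.norm(c, axis=1) * np.linalg.norm(q) + 1e-8)
    best = np.argsort(sims)[::-1][:top_k]
    return [chunks[i] for i in best]

def rag_prompt(question, chunks, embed):
    context = "\n\n".join(retrieve(question, chunks, embed))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
```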
Parameter-Efficient Fine-Tuning (PEFT) methods address the increasing size of Large Language Models (LLMs). Currently, many newly introduced PEFT methods are challenging to replicate, deploy, or compare with one another. To address this, we introduce PEFT-Factory, a unified framework for efficient fine-tuning of LLMs using both off-the-shelf and custom PEFT methods. While its modular design supports extensibility, it natively provides a representative set of 19 PEFT methods, 27 classification and text generation datasets addressing 12 tasks, and both standard and PEFT-specific evaluation metrics. As a result, PEFT-Factory provides a ready-to-use, controlled, and stable environment, improving replicability and benchmarking of PEFT methods. PEFT-Factory is a downstream framework that originates from the popular LLaMA-Factory, and is publicly available at https://github.com/kinit-sk/PEFT-Factory.
This paper explores a simple approach to Chinese poetry generation using pre-trained large language models (LLMs), specifically the Qwen1.5 model series. Leveraging the capabilities of these advanced models, the authors further pre-trained and fine-tuned them on customized poetry datasets using LLaMA Factory. The resulting model, Xuejiu-Poem, aims to produce authentic and aesthetically pleasing traditional Chinese poems. Results demonstrate that specialized training can surpass the performance of larger models, such as GPT-4o, in specific poetry generation tasks. The generated poems exhibit strong adherence to traditional formats and stylistic conventions of classical Chinese poetry, underscoring the potential of LLMs in creative applications. This study provides valuable insights into the application of LLMs to literary tasks and suggests promising avenues for future research in classical literature generation.
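A minimal generation sketch against an off-the-shelf Qwen1.5 chat checkpoint is shown below; the model size, prompt, and sampling settings are placeholders, and the paper's fine-tuned Xuejiu-Poem weights would be substituted for the base model.

```python
# Hedged sketch of prompting a Qwen1.5 chat model for a classical Chinese poem.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-1.8B-Chat"   # small stand-in; swap in a fine-tuned poetry checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "请以《秋夜》为题写一首七言绝句。"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```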
With the rapid development of artificial intelligence, large language models have shown great application potential and value in enterprise informatization. This paper examines a range of application scenarios for LLMs in enterprise informatization, including intelligent customer service, internal knowledge management, business process automation, and market analysis and marketing optimization. It analyzes the challenges these applications face at the model, implementation, and societal levels, and proposes corresponding countermeasures. As LLM technology matures further and its application ecosystem continues to improve, its role in enterprise informatization will become increasingly important.
Natural language processing is a branch of deep learning in computer science that aims to enable computers to understand, parse, or generate human language (including text, audio, and more). This paper surveys the many types of models in natural language processing (NLP) derived from the Transformer architecture. In recent years, with the rapid development of deep learning, the performance of NLP models has improved dramatically and more NLP tasks have been solved more effectively, progress that is largely due to the continued development of neural network models. The paper covers the most popular families of Transformer-based NLP models, including the BERT (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer), and T5 series, describing how each family has evolved and how they differ from and relate to one another in model structure and design philosophy. It concludes with an outlook on future directions for the NLP field.