Chinese-LLaMA-Alpaca-3 is launched!


🇨🇳中文 | 🌐English | 📖文档/Docs | ❓提问/Issues | 💬讨论/Discussions | ⚔️竞技场/Arena


To promote open research on large language models in the Chinese NLP community, this project open-sources the Chinese LLaMA model and the instruction-tuned Chinese Alpaca model. These models extend the original LLaMA vocabulary with Chinese tokens and continue pre-training on Chinese data, which improves basic Chinese semantic understanding. The Chinese Alpaca models are further fine-tuned from Chinese LLaMA on Chinese instruction data, which significantly improves their ability to understand and follow instructions.

Technical Report (V2): [Cui, Yang, and Yao, 2023] Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca

Main contents of this project:

  • 🚀 Extended the original LLaMA vocabulary with Chinese tokens, significantly improving encoding/decoding efficiency
  • 🚀 Open-sourced the Chinese LLaMA (general purpose) and Alpaca (instruction-tuned) models
  • 🚀 Open-sourced the pre-training and instruction fine-tuning (SFT) scripts for further tuning on your own data
  • 🚀 Quickly deploy and try the quantized models locally on the CPU/GPU of a laptop or personal PC
  • 🚀 Support for 🤗transformers, llama.cpp, text-generation-webui, LlamaChat, LangChain, privateGPT, etc.
  • Released versions: 7B (basic, Plus, Pro), 13B (basic, Plus, Pro), 33B (basic, Plus, Pro)

💡 The following image shows the actual experience effect of the 7B version model after local deployment (animation unaccelerated, tested on Apple M1 Max).


Chinese-LLaMA-Alpaca-2 | Visual Chinese-LLaMA-Alpaca | Multi-modal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation tool TextBrewer | Model pruning tool TextPruner

News

[Apr 30, 2024] Chinese-LLaMA-Alpaca-3 project introduces Llama-3-Chinese-8B and Llama-3-Chinese-8B-Instruct, based on Meta's Llama-3. Check: https://github.com/ymcui/Chinese-LLaMA-Alpaca-3

[Mar 27, 2024] This project is now online at the SOTA! model platform of Synced, see: https://sota.jiqizhixin.com/project/chinese-llama-alpaca

[Aug 14, 2023] Chinese-LLaMA-Alpaca-2 v2.0 released. We open-source Chinese-LLaMA-2-13B and Chinese-Alpaca-2-13B. See https://github.com/ymcui/Chinese-LLaMA-Alpaca-2

[July 19, 2023] Release v5.0: Alpaca-Pro models with significantly improved generation quality, along with Plus-33B models.

[July 19, 2023] We are launching Chinese-LLaMA-Alpaca-2 project.

[July 10, 2023] Beta channel preview: learn about upcoming updates in advance. See Discussion

[July 7, 2023] The Chinese-LLaMA-Alpaca family welcomes a new member: Visual Chinese-LLaMA-Alpaca model for visual question answering and chat. The 7B test version is available.

[June 30, 2023] 8K context support with llama.cpp. See Discussion. For 4K+ context support with transformers, see PR#705.

[June 16, 2023] Release v4.1: New technical report, add C-Eval inference script, add low-resource model merging script, etc.

[June 8, 2023] Release v4.0: LLaMA/Alpaca 33B versions are available. We also add privateGPT demo, C-Eval results, etc.

Content Navigation

| Chapter | Description |
| --- | --- |
| Download | Download links for Chinese LLaMA and Alpaca |
| Model Reconstruction | (Important) Explains how to merge downloaded LoRA models with the original LLaMA |
| Quick Deployment | Steps for quantizing and deploying LLMs on personal computers |
| Example Results | Examples of the system output |
| Training Details | Introduces the training details of Chinese LLaMA and Alpaca |
| FAQ | Replies to some common questions |
| Limitations | Limitations of the models involved in this project |

Model Download

⚠️ User Notice (Must Read)

The official LLaMA models released by Facebook prohibit commercial use, and the official model weights have not been open-sourced (although there are many third-party download links available online). In order to comply with the relevant licenses, it is currently not possible to release the complete model weights. We appreciate your understanding. After Facebook fully opens up the model weights, this project will update its policies accordingly. What is released here are the LoRA weights, which can be seen as a "patch" for the original LLaMA model, and the complete weights can be obtained by merging the two.

Model Overview

The following figure depicts all open-sourced models for our projects (including the second-gen project).

Which model should I use?

The following table provides a basic comparison of the Chinese LLaMA and Alpaca models, along with recommended (non-exhaustive) usage scenarios.

💡 Plus versions are trained on more data and are highly recommended.

| Comparison Item | Chinese LLaMA | Chinese Alpaca |
| --- | --- | --- |
| Training Method | Traditional CLM (trained on general corpus) | Instruction Fine-tuning (trained on instruction data) |
| Model Type | Base model | Instruction-following model (like ChatGPT) |
| Training Data | unsupervised free text | supervised instruction data |
| Vocab size<sup>[3]</sup> | 49953 | 49954=49953+1 (pad token) |
| Input Template | Not required | Must meet template requirements<sup>[1]</sup> |
| Suitable Scenarios ✔️ | Text continuation: Given a context, let the model continue writing | 1. Instruction understanding (Q&A, writing, advice, etc.)<br/>2. Multi-turn context understanding (chat, etc.) |
| Unsuitable Scenarios ❌ | Instruction understanding, multi-turn chat, etc. | Unrestricted free text generation |
| llama.cpp | Use `-p` parameter to specify context | Use `-ins` parameter to enable instruction understanding + chat mode |
| text-generation-webui | Not suitable for chat mode | Use `--cpu` to run without a GPU; if not satisfied with generated content, consider modifying prompt |
| LlamaChat | Choose "LLaMA" when loading the model | Choose "Alpaca" when loading the model |
| inference_hf.py | No additional startup parameters required | Add `--with_prompt` parameter when launching |
| web-demo | Not applicable | Simply provide the Alpaca model location; support multi-turn conversations |
| LangChain-demo / privateGPT | Not applicable | Simply provide the Alpaca model location |
| Known Issues | If not controlled for termination, it will continue writing until reaching the output length limit.<sup>[2]</sup> | Please use Pro models to avoid short responses (in Plus series). |

[1] Templates are built in for llama.cpp, LlamaChat, inference_hf.py, web-demo, and LangChain-demo.

[2] If you encounter issues such as low-quality model responses, nonsensical answers, or failure to understand questions, please check whether you are using the correct model and startup parameters for the scenario.

[3] The Alpaca vocabulary contains one more token (the pad token) than the LLaMA vocabulary. Please do not mix LLaMA and Alpaca tokenizers.
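Footnote [1] matters when an Alpaca model is called through a path that has no built-in template, since the instruction then has to be wrapped by hand. The sketch below assumes a Stanford-Alpaca-style template without an input field; the exact wording should be checked against the project's own inference scripts, and `build_prompt` is a hypothetical helper, not part of the project.

```python
# Assumption: a Stanford-Alpaca-style template (no "input" field).
# Verify the exact wording against the project's own inference scripts.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)


def build_prompt(instruction: str, use_template: bool = True) -> str:
    """Wrap an instruction for an Alpaca model; pass text through for base LLaMA."""
    text = instruction.strip()
    if not use_template:
        # Base LLaMA models do plain text continuation and need no template.
        return text
    return ALPACA_TEMPLATE.format(instruction=text)


if __name__ == "__main__":
    print(build_prompt("请介绍一下北京。"))
```

Passing `use_template=False` mirrors the "Not required" entry for Chinese LLaMA in the comparison table.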

Below is a list of models recommended for this project. These models typically use more training data and better-tuned training methods and parameters, so they should be preferred (for other models, please check Other Models). If you want ChatGPT-like interaction, please use an Alpaca model instead of a LLaMA model. For Alpaca models, use the Pro versions for longer responses; if you prefer shorter responses, use the Plus series instead.

| Model | Type | Data | Required Original Model<sup>[1]</sup> | Size<sup>[2]</sup> | Download Links<sup>[3]</sup> |
| --- | --- | --- | --- | --- | --- |
| Chinese-LLaMA-Plus-7B | base model | general 120G | LLaMA-7B | 790M | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-LLaMA-Plus-13B | base model | general 120G | LLaMA-13B | 1.0G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-LLaMA-Plus-33B 🆕 | base model | general 120G | LLaMA-33B | 1.3G<sup>[6]</sup> | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-Pro-7B 🆕 | instruction-following model | instruction 4.3M | *LLaMA-7B & LLaMA-Plus-7B*<sup>[4]</sup> | 1.1G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-Pro-13B 🆕 | instruction-following model | instruction 4.3M | *LLaMA-13B & LLaMA-Plus-13B*<sup>[4]</sup> | 1.3G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-Pro-33B 🆕 | instruction-following model | instruction 4.3M | *LLaMA-33B & LLaMA-Plus-33B*<sup>[4]</sup> | 2.1G | [🤗HF] [🤖ModelScope] [Baidu] |

[1] Access to the original LLaMA model must be requested through Facebook-LLaMA, or see this PR. Due to copyright issues, this project cannot provide downloads, and we ask for your understanding.

[2] The reconstructed model is slightly larger than the original LLaMA (due to the expanded vocabulary); the 7B model is about 13G+.

[3] After downloading, be sure to check whether the SHA256 of the ZIP file is consistent; for the full value, please see SHA256.md.

[4] The merging steps for Alpaca-Plus differ from those for the other models; please refer to the wiki.

[5] Also known as the 30B model elsewhere. There was a naming typo when Facebook released this model. We follow the naming convention of the original paper here (which also matches the actual number of weights).

[6] Stored in FP16.
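The SHA256 check from footnote [3] can be done with the Python standard library. This is a minimal sketch: the file name and the expected digest are placeholders to be filled in from SHA256.md, and `matches_expected` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with the published value (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Placeholders: use the real ZIP name and the value listed in SHA256.md.
    zip_path = "downloaded_lora.zip"
    expected = "<value from SHA256.md>"
    if Path(zip_path).exists():
        print("OK" if matches_expected(zip_path, expected) else "MISMATCH")
```

Reading in chunks keeps memory use flat even for the multi-gigabyte archives.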

The file directory inside the ZIP file is as follows (using Chinese-LLaMA as an example):

chinese_llama_lora_7b/
  - adapter_config.json       # LoRA weight configuration file
  - adapter_model.bin         # LoRA weight file
  - special_tokens_map.json   # special_tokens_map file
  - tokenizer_config.json     # tokenizer configuration file
  - tokenizer.model           # tokenizer file
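After unzipping, a quick check that the directory contains the files listed above can catch an incomplete download before merging. The sketch below is only a convenience check; `missing_lora_files` is a hypothetical helper, and the directory name in the usage line is taken from the listing above.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "adapter_config.json",
    "adapter_model.bin",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.model",
)


def missing_lora_files(lora_dir):
    """Return the expected file names that are absent from lora_dir."""
    root = Path(lora_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_lora_files("chinese_llama_lora_7b")
    print("complete" if not missing else f"missing: {missing}")
```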

Other Models

Due to factors such as training methods and training data, the models below are no longer recommended (they may still be useful in specific scenarios). Please preferentially use the recommended models in the previous section.

| Model | Type | Data | Required Original Model<sup>[1]</sup> | Size<sup>[2]</sup> | Download Links<sup>[3]</sup> |
| --- | --- | --- | --- | --- | --- |
| Chinese-LLaMA-7B | Base model | general 20G | LLaMA-7B | 770M | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-LLaMA-13B | Base model | general 20G | LLaMA-13B | 1.0G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-LLaMA-33B | Base model | general 20G | LLaMA-33B | 2.7G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-7B | Instruction-following model | instruction 2M | LLaMA-7B | 790M | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-13B | Instruction-following model | instruction 3M | LLaMA-13B | 1.1G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-33B | Instruction-following model | instruction 4.3M | LLaMA-33B | 2.8G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-Plus-7B | Instruction-following model | instruction 4M | *LLaMA-7B & LLaMA-Plus-7B* | 1.1G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-Plus-13B | Instruction-following model | instruction 4.3M | *LLaMA-13B & LLaMA-Plus-13B* | 1.3G | [🤗HF] [🤖ModelScope] [Baidu] |
| Chinese-Alpaca-Plus-33B | Instruction-following model | instruction 4.3M | *LLaMA-33B & LLaMA-Plus-33B* | 2.1G | [🤗HF] [🤖ModelScope] [Baidu] |

Use with 🤗transformers

You can download all the above models from 🤗Model Hub and use them with transformers and PEFT to invoke the Chinese LLaMA or Alpaca LoRA models. The model invocation names referred to below are the model names specified in .from_pretrained().

Detailed list and model download link: https://huggingface.co/hfl
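As a rough illustration of the transformers + PEFT route, the sketch below loads original LLaMA weights (assumed to be already converted to the Hugging Face format) and applies a single LoRA on top. Both path arguments are placeholders, `load_lora_model` and `needs_resize` are hypothetical helpers, and models that require two original models (see the tables above) need the merging steps from the wiki instead.

```python
def needs_resize(model_vocab_size: int, tokenizer_vocab_size: int) -> bool:
    """True when the base model's embedding table must be resized to fit the tokenizer."""
    return model_vocab_size != tokenizer_vocab_size


def load_lora_model(base_model_path: str, lora_path: str):
    """Load original LLaMA weights and apply one Chinese LLaMA/Alpaca LoRA.

    Both arguments are placeholders: a local directory or a Hub model name.
    """
    # Imported lazily so the helper above stays usable without these packages.
    import torch
    from peft import PeftModel
    from transformers import LlamaForCausalLM, LlamaTokenizer

    # The LoRA package ships the expanded Chinese tokenizer files.
    tokenizer = LlamaTokenizer.from_pretrained(lora_path)
    model = LlamaForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16)

    # Grow the embedding table to match the expanded vocabulary before
    # the LoRA weights are applied.
    model_vocab_size = model.get_input_embeddings().weight.size(0)
    if needs_resize(model_vocab_size, len(tokenizer)):
        model.resize_token_embeddings(len(tokenizer))

    model = PeftModel.from_pretrained(model, lora_path)
    return tokenizer, model.eval()
```

The resize step is needed because the original LLaMA vocabulary (32000 tokens) is smaller than the expanded Chinese vocabulary.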

Model Reconstruction

In order to merge the LoRA model with the original LLaMA for further tuning or inference, two methods are currently provided:

| Method | Usage | Tutorial |
| --- | --- | --- |
| Online conversion | Suitable for Google Colab users, can use notebook for online conversion and model quantization. | link |
| Manual conversion | Suitable for offline conversion, generates models in different formats for quantization or further fine-tuning. | link |

The following table lists the size of each original model and of its 8-bit and 4-bit quantized versions. Before converting a model, make sure the machine has enough memory and disk space (minimum requirements):

| | 7B | 13B | 33B | 65B |
| --- | --- | --- | --- | --- |
| Original (FP16) | 13 GB | 24 GB | 60 GB | 120 GB |
| Quantized (8-bit) | 7.8 GB | 14.9 GB | 32.4 GB | ~60 GB |
| Quantized (4-bit) | 3.9 GB | 7.8 GB | 17.2 GB | 38.5 GB |
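To check disk space against these figures before converting, something like the following sketch can help. The sizes are copied from the table above (the 65B 8-bit value is approximate), `required_gb` and `has_enough_disk` are hypothetical helpers, and the check covers free disk space only, not RAM.

```python
import shutil

# Sizes in GB, copied from the table above.
MODEL_SIZE_GB = {
    "7B": {"fp16": 13.0, "8bit": 7.8, "4bit": 3.9},
    "13B": {"fp16": 24.0, "8bit": 14.9, "4bit": 7.8},
    "33B": {"fp16": 60.0, "8bit": 32.4, "4bit": 17.2},
    "65B": {"fp16": 120.0, "8bit": 60.0, "4bit": 38.5},  # 8-bit is listed as ~60 GB
}


def required_gb(model, formats):
    """Total size of all formats that will sit on disk at the same time."""
    return sum(MODEL_SIZE_GB[model][fmt] for fmt in formats)


def has_enough_disk(model, formats, path="."):
    """Compare free space on the volume holding `path` with the required total."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(model, formats)


if __name__ == "__main__":
    # Keeping both the FP16 model and its 4-bit quantization of the 7B model.
    print(required_gb("7B", ["fp16", "4bit"]), has_enough_disk("7B", ["fp16", "4bit"]))
```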

Related documentation has been moved to the project's >>> 📚GitHub Wiki.

Quick Deployment

We mainly provide the following ways for inference and local deployment.

| Method | Features | Platform | CPU | GPU | Quantization | UI | Tutorial |
| --- | --- | --- | --- | --- | --- | --- | --- |
| llama.cpp | a tool for quantizing model and deploying on local CPU | General | | | | | link |
| 🤗Transformers | original transformers inference method, support CPU/GPU | General | | | | | link |
| text-generation-webui | a tool for deploying model as a web UI | General | | | | | link |
| LlamaChat | a macOS app that allows you to chat with LLaMA, Alpaca, etc. | macOS | | | | | link |
| LangChain | LLM application development framework, suitable for secondary development | General | | | | | link |
| privateGPT | LangChain-based multi-document QA framework | General | | | | | link |
| Colab Gradio Demo | Running a Gradio web demo in Colab | General | | | | | link |
| API Calls | A server that implements the OpenAI API | General | | | | | link |

Note: some options are supported by LangChain but not implemented in the tutorial. Please refer to the official LangChain Documentation for details.

Related documentation has been moved to the project's >>> 📚GitHub Wiki.

System Performance

Generation Performance Test

To quickly evaluate the actual performance of the related models, this project compared the outputs of Chinese Alpaca-7B, Alpaca-13B, Alpaca-Plus-7B, Alpaca-Plus-13B, and Alpaca-33B on some common tasks given the same prompt. Reply generation is random and is affected by factors such as decoding hyperparameters and random seeds. The following evaluations are therefore not strictly rigorous, and the results are for reference only. You are welcome to try the models yourself.
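The effect of decoding hyperparameters and random seeds can be seen in a toy setting. The sketch below samples from a fixed list of made-up logits with temperature scaling; it is not the project's decoding code, and `sample_token` and `generate` are hypothetical helpers.

```python
import math
import random


def sample_token(logits, temperature=1.0, rng=None):
    """Sample an index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(logits, steps, temperature, seed):
    """Draw `steps` samples with a seeded generator so runs are repeatable."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(steps)]


if __name__ == "__main__":
    toy_logits = [1.0, 3.0, 2.0]  # made-up scores for three candidate tokens
    print(generate(toy_logits, 10, temperature=1.0, seed=0))
    print(generate(toy_logits, 10, temperature=1.0, seed=1))
    print(generate(toy_logits, 10, temperature=0.01, seed=1))
```

The same seed reproduces the same sequence, a different seed generally does not, and a very low temperature collapses sampling onto the highest-scoring token.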

NLU Performance Test

This project also conducted tests on relevant models using the "NLU" objective evaluation dataset. The results of this type of evaluation are objective and only require the output of given labels, so they can provide insights into the capabilities of large models from another perspective. In the recently launched C-Eval dataset, this project tested the performance of the relevant models. The test set contains 12.3K multiple-choice questions covering 52 subjects. The following are the evaluation results (average) of some models on the validation and test sets. For complete results, please refer to our technical report.

| Models | Valid (zero-shot) | Valid (5-shot) | Test (zero-shot) | Test (5-shot) |
| --- | --- | --- | --- | --- |
| Chinese-Alpaca-Plus-33B | 46.5 | 46.3 | 44.9 | 43.5 |
| Chinese-Alpaca-33B | 43.3 | 42.6 | 41.6 | 40.4 |
| Chinese-Alpaca-Plus-13B | 43.3 | 42.4 | 41.5 | 39.9 |
| Chinese-Alpaca-Plus-7B | 36.7 | 32.9 | 36.4 | 32.3 |
| Chinese-LLaMA-Plus-33B | 37.4 | 40.0 | 35.7 | 38.3 |
| Chinese-LLaMA-33B | 34.9 | 38.4 | 34.6 | 39.5 |
| Chinese-LLaMA-Plus-13B | 27.3 | 34.0 | 27.8 | 33.3 |
| Chinese-LLaMA-Plus-7B | 27.3 | 28.3 | 26.9 | 28.4 |
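Because this kind of evaluation only needs a label as output, scoring reduces to extracting a choice letter and comparing it with the gold answer. The sketch below shows one simple heuristic for that; it is not the project's C-Eval script, and `extract_choice` and `accuracy` are hypothetical helpers.

```python
import re

# A standalone A/B/C/D, i.e. not part of an English word such as "Answer".
CHOICE_PATTERN = re.compile(r"(?<![A-Za-z])[ABCD](?![A-Za-z])")


def extract_choice(model_output):
    """Return the first standalone A/B/C/D in the output, or None if absent."""
    match = CHOICE_PATTERN.search(model_output)
    return match.group(0) if match else None


def accuracy(outputs, gold_labels):
    """Percentage of outputs whose extracted choice equals the gold label."""
    if len(outputs) != len(gold_labels):
        raise ValueError("outputs and gold_labels must have the same length")
    if not outputs:
        return 0.0
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold_labels))
    return 100.0 * correct / len(gold_labels)


if __name__ == "__main__":
    outputs = ["答案是B。", "Answer: C", "D", "no idea"]
    print(accuracy(outputs, ["B", "C", "A", "A"]))
```

The heuristic is deliberately crude: an English article "A" at the start of a reply would be read as a choice, so real evaluation scripts usually constrain the output format more tightly.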

It is important to note that the comprehensive assessment of the capabilities of large models is still an urgent and significant topic to address. It is beneficial to approach the various evaluation results of large models in a rational and balanced manner to promote the healthy development of large-scale model technology. It is recommended for users to conduct tests on their own tasks and choose models that are suitable for the relevant tasks.

For C-Eval inference code, please refer to >>> 📚GitHub Wiki.

Training Details

The entire training process includes three parts: vocabulary expansion, pre-training, and instruction fine-tuning. Please refer to merge_tokenizers.py for vocabulary expansion; refer to run_clm.py in 🤗transformers and the relevant parts of dataset processing in the Stanford Alpaca project for pre-training and self-instruct fine-tuning.
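The idea behind vocabulary expansion can be illustrated with plain Python lists. This toy sketch is not merge_tokenizers.py, which operates on real tokenizer models; the greedy longest-match tokenizer and the tiny vocabularies are simplifying assumptions, and `merge_vocab` and `count_tokens` are hypothetical helpers.

```python
def merge_vocab(base_vocab, new_pieces):
    """Append pieces missing from base_vocab, keeping existing IDs unchanged."""
    merged = list(base_vocab)
    seen = set(base_vocab)
    for piece in new_pieces:
        if piece not in seen:
            merged.append(piece)
            seen.add(piece)
    return merged


def count_tokens(text, vocab):
    """Greedy longest-match token count; unknown characters fall back to UTF-8 bytes."""
    pieces = set(vocab)
    max_len = max((len(p) for p in pieces), default=1)
    i = count = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in pieces:
                i += size
                count += 1
                break
        else:
            # No piece matched: emit one token per UTF-8 byte of this character.
            count += len(text[i].encode("utf-8"))
            i += 1
    return count


if __name__ == "__main__":
    base = ["a", "b"]                                # toy stand-in for the original vocabulary
    merged = merge_vocab(base, ["b", "中文", "你好"])  # toy Chinese pieces
    print(count_tokens("你好中文", base), count_tokens("你好中文", merged))
```

With the toy base vocabulary every Chinese character costs three byte tokens, while the merged vocabulary covers the same text in two tokens, which is the efficiency gain that vocabulary expansion targets.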

We have open-sourced the scripts for pre-training and instruction finetuning (SFT):

Please refer to our >>> 📚GitHub Wiki.

FAQ

FAQ provides answers to frequent questions. Please see our FAQ before submitting an issue.

Q1: Why can't you release the complete model weights?
Q2: Will there be 33B and 65B versions in the future?
Q3: The model doesn't perform well on some tasks!
Q4: Why expand the vocabulary? Can't you just pre-train the original LLaMA with Chinese data?
Q5: The reply is very short
Q6: Under Windows, the model cannot understand Chinese, the generation speed is very slow, etc.
Q7: Chinese-LLaMA 13B model cannot be launched with llama.cpp, reporting inconsistent dimensions.
Q8: Chinese-Alpaca-Plus does not show better performance than the others.
Q9: The model does not perform well on NLU tasks, such as text classification.
Q10: Why 33B not 30B?
Q11: Inconsistent SHA256

Please refer to our >>> 📚GitHub Wiki.

Limitations

Although the models in this project have significantly improved Chinese understanding and generation capabilities compared to the original LLaMA and Alpaca, there are also the following limitations:

  • It may produce unpredictable harmful content and content that does not conform to human preferences and values.
  • Due to computing power and data issues, the training of the related models is not sufficient, and the Chinese understanding ability needs to be further improved.
  • There is no online interactive demo available for now (Note: users can still deploy it locally themselves).

Citation

If you find the model, data, or code in our project useful, please consider citing our work as follows: https://arxiv.org/abs/2304.08177

@article{chinese-llama-alpaca,
      title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca}, 
      author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
      journal={arXiv preprint arXiv:2304.08177},
      url={https://arxiv.org/abs/2304.08177},
      year={2023}
}

| Project Name | Description | Type |
| --- | --- | --- |
| Chinese-LLaMA-Alpaca-2 (Official) | Chinese LLaMA-2, Alpaca-2 LLMs | Text |
| Visual-Chinese-LLaMA-Alpaca (Official) | Multi-modal Chinese LLaMA & Alpaca LLMs | Multi-modal |

Want to join this list? >>> Apply Here

Acknowledgements

This project builds on the following open-source projects, and we would like to thank these projects and their developers.

| Foundation Models, Codes | Quantization, Inference, Deployment | Data |
| --- | --- | --- |
| LLaMA by Facebook<br/>Alpaca by Stanford<br/>alpaca-lora by @tloen | llama.cpp by @ggerganov<br/>LlamaChat by @alexrozanski<br/>text-generation-webui by @oobabooga | pCLUE and translation data by @brightmart<br/>oasst1 by OpenAssistant |

Side note: The current logo was automatically generated by GPT-4 with the DALL·E plugin (the previous one was generated by Midjourney).

Disclaimer

The resources related to this project are for academic research purposes only and are strictly prohibited for commercial use. When using parts involving third-party code, please strictly follow the corresponding open-source agreements. The content generated by the model is affected by factors such as model calculation, randomness, and quantization accuracy loss. This project cannot guarantee its accuracy. For any content output by the model, this project does not assume any legal responsibility and does not assume responsibility for any losses that may result from the use of related resources and output results.

This project is initiated and maintained by individuals and collaborators in their spare time, so we cannot guarantee a timely response to resolving relevant issues.

Feedback

If you have any questions, please submit them in GitHub Issues.

  • Before submitting a question, please check if the FAQ can solve the problem and consult past issues to see if they can help.
  • Please use our dedicated issue template for submitting.
  • Duplicate and unrelated issues will be handled by stable-bot; please understand.
  • Raise questions politely and help build a harmonious discussion community.