How to deploy Qwen-7B-Chat

This article takes building an AI chatbot as an example to introduce how to deploy the Qwen-7B model on an Alibaba Cloud AMD CPU-based ECS instance (g8a).

Background information

Qwen-7B is the 7-billion-parameter model in the Tongyi Qianwen large model series developed by Alibaba Cloud. Qwen-7B is a Transformer-based large language model trained on ultra-large-scale pre-training data. The pre-training data is diverse and covers a wide range of sources, including large amounts of web text, professional books, and code. On top of Qwen-7B, an alignment mechanism was used to create Qwen-7B-Chat, an AI assistant based on the large language model.

Important:

The code of Qwen-7B-Chat is open-sourced under its LICENSE, and commercial use requires submitting a free commercial authorization application. You should comply with the third-party model's user agreement, usage guidelines, and applicable laws and regulations, and you bear responsibility for the legality and compliance of your use of the third-party model.

Create an ECS instance

  1. Go to the instance creation page.
  2. Follow the on-screen prompts to complete the parameter configuration and create the ECS instance. The parameters that need attention are listed below; for other parameters, see Custom Purchase Instance.
    • Instance type: Qwen-7B-Chat requires about 30 GiB of memory. To ensure the model runs stably, select at least ecs.g8a.4xlarge (64 GiB memory).
    • Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
    • Public IP: Select Assign Public IPv4 Address, choose the pay-by-traffic billing mode for bandwidth, and set the peak bandwidth to 100 Mbps to speed up the model download.
    • Disk: Running Qwen-7B-Chat requires downloading multiple model files, which take up significant storage space. To ensure the model runs smoothly, it is recommended to set the disk size to 100 GiB.
  3. Add security group rules. In the inbound direction of the ECS instance's security group, open ports 22, 443, and 7860 (7860 is used to access the WebUI service). For details, see Add a Security Group Rule.
  4. After the instance is created, obtain its public IP address on the ECS Instances page.

Deploy Qwen-7B-Chat

Manual deployment

Step-1: Install the software required to configure the model

  1. Connect to the ECS instance remotely. For details, see Log in to a Linux Instance by Using a Password or Key.
  2. Install the software required to deploy Qwen-7B-Chat.
     sudo yum install -y tmux git git-lfs wget curl gcc gcc-c++ autoconf tar zip unzip hwloc python38
  3. Switch to Python 3.8. The system's default Python version is 3.6, which does not meet the minimum version requirement for deploying Qwen-7B-Chat, so Python 3.8 needs to be used.
     sudo update-alternatives --config python
     During the run, enter 4 to select Python 3.8.
  4. Update pip to the version matching Python 3.8.
     sudo python -m ensurepip --upgrade
     sudo python -m pip install --upgrade pip
  5. Enable Git LFS. Downloading the pre-trained model requires Git LFS support.
     git lfs install
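The checks implied by the steps above can be sketched as a small shell helper that confirms each tool is actually on PATH before continuing. This is a sketch: require_tools is a hypothetical helper name, and the tool list mirrors the yum install line and can be trimmed.

```shell
# Verify that the packages installed in Step-1 are on PATH.
# require_tools is an illustrative helper, not part of the deployment.
require_tools() {
  local missing=0 t
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then
      echo "missing: $t"
      missing=1
    fi
  done
  return "$missing"
}

# Example usage, mirroring the yum install line above:
# require_tools tmux git git-lfs wget curl gcc tar zip unzip
```

If the function prints any "missing:" lines, re-run the yum install command before proceeding.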

Step-2: Download the source code and model

  1. Create a tmux session.
     tmux
     Note: Downloading the pre-trained model takes a long time, and the success rate depends heavily on network conditions. It is recommended to download inside a tmux session so that a disconnection from the ECS instance does not interrupt the download.
  2. Download the Qwen-7B project source code and the pre-trained model.
     git clone https://github.com/QwenLM/Qwen-7B.git
     git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git qwen-7b-chat
  3. Check the current directory.
     ls -l
     After the download completes, the listing should contain the Qwen-7B and qwen-7b-chat directories.
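If the clone finishes but Git LFS never hydrated the weights (for example, because git lfs install ran after cloning), the model directory contains small text pointer stubs instead of multi-hundred-megabyte weight files. A quick check, assuming GNU grep; check_lfs_hydrated is an illustrative helper name:

```shell
# Detect un-hydrated Git LFS pointer files: a pointer stub is a tiny
# text file that begins with "version https://git-lfs".
check_lfs_hydrated() {
  local stubs
  stubs=$(grep -rls '^version https://git-lfs' "$1" 2>/dev/null || true)
  if [ -n "$stubs" ]; then
    echo "LFS pointer stubs found (run 'git lfs pull' inside the repo):"
    echo "$stubs"
    return 1
  fi
  echo "all files hydrated"
}

# Example usage:
# check_lfs_hydrated ~/qwen-7b-chat
```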

Step-3: Deploy the running environment

  1. Change the pip download source. Before installing the dependency packages, it is recommended that you change the pip download source to speed up installation.
    1. Create the pip configuration directory.
       mkdir -p ~/.config/pip
    2. Configure pip to use the mirror source.

cat > ~/.config/pip/pip.conf <<EOF
[global]
index-url=http://mirrors.cloud.aliyuncs.com/pypi/simple/
trusted-host=mirrors.cloud.aliyuncs.com
EOF

  2. Install the dependencies required by ZenDNN.

The subsequent installation of ZenDNN requires the CPU version of PyTorch, so the required dependencies need to be installed manually.

pip install torch==1.13.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install -r ~/Qwen-7B/requirements.txt
pip install streamlit gradio mdtex2html
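Later pip installs could replace the pinned CPU build of torch with a different one that the ZenDNN plugin does not expect. One way to guard against that, sketched with a pip constraints file (the file path and name here are arbitrary choices, not part of the official procedure):

```shell
# Pin torch to the CPU build that the ZenDNN 4.1 plugin targets, so
# subsequent installs cannot swap it out (file name is arbitrary).
cat > /tmp/torch-cpu-constraints.txt <<'EOF'
torch==1.13.1+cpu
EOF

# Then pass the constraint to later installs, e.g.:
# pip install -r ~/Qwen-7B/requirements.txt -c /tmp/torch-cpu-constraints.txt
```

The -c flag makes pip refuse any candidate that conflicts with the pinned version instead of silently upgrading it.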

  3. Download and install ZenDNN.

The ZenDNN runtime library provides APIs for basic neural network building blocks optimized for the AMD CPU architecture, enabling deep learning application and framework developers to improve deep learning inference performance on AMD CPUs.

wget https://download.amd.com/developer/eula/zendnn/zendnn-4-1/pytorch/PT_v1.13_ZenDNN_v4.1_Python_v3.8.zip
unzip PT_v1.13_ZenDNN_v4.1_Python_v3.8.zip
cd PT_v1.13_ZenDNN_v4.1_Python_v3.8
source scripts/PT_ZenDNN_setup_release.sh

When the setup script completes without errors, ZenDNN has been installed successfully.

  4. Set the environment variables OMP_NUM_THREADS and GOMP_CPU_AFFINITY.

The ZenDNN runtime library requires the environment variables OMP_NUM_THREADS and GOMP_CPU_AFFINITY to be set explicitly for the hardware platform.

sudo bash -c 'cat > /etc/profile.d/env.sh' << EOF
export OMP_NUM_THREADS=\$(nproc --all)
export GOMP_CPU_AFFINITY=0-\$(( \$(nproc --all) - 1 ))
EOF
source /etc/profile
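The heredoc above derives both values from nproc. The same computation run inline shows what the script writes, which helps confirm the profile script did what you expect:

```shell
# Same computation as /etc/profile.d/env.sh, run inline: use one
# OpenMP thread per logical CPU and bind threads to CPUs 0..N-1.
N=$(nproc --all)
export OMP_NUM_THREADS=$N
export GOMP_CPU_AFFINITY="0-$((N - 1))"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
echo "GOMP_CPU_AFFINITY=$GOMP_CPU_AFFINITY"
```

On an ecs.g8a.4xlarge with 16 vCPUs, for example, this yields OMP_NUM_THREADS=16 and GOMP_CPU_AFFINITY=0-15.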

Step-4: Conduct AI dialogue

  1. Run the following commands to start the WebUI service.
     cd ~/Qwen-7B
     python web_demo.py -c ../qwen-7b-chat --cpu-only --server-name 0.0.0.0 --server-port 7860
     When the service URL is printed, the WebUI service has started successfully.
  2. In a browser, enter http://<ECS public IP address>:7860 in the address bar to open the web page.
  3. In the Input dialog box, enter your message and click Submit to start the AI conversation.
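To confirm from your local machine that the service is reachable through the security group rules added earlier, a minimal smoke test can fetch the HTTP status of the WebUI port. This is a sketch: check_webui is an illustrative helper name, and 203.0.113.10 is a placeholder address.

```shell
# Print the HTTP status code of the WebUI port; "200" means the
# Gradio page is being served. check_webui is an illustrative name.
check_webui() {
  curl -s --connect-timeout 5 -o /dev/null -w '%{http_code}' "http://$1:7860/"
}

# Example (substitute your ECS instance's public IP):
# check_webui 203.0.113.10
```

A "000" result usually means the connection was refused or timed out; re-check that the service is running and that port 7860 is open in the security group.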