[kunlunxin] add kunlun2 llama2-7b #348

Merged: 38 commits, merged on Dec 26, 2023

Changes shown below are from 2 of the 38 commits.

Commits (38)
- 64a1cbf add kunlun2 llama2-7b (shenzhu1993, Dec 1, 2023)
- bd494ab [kunlunxin] add kunlun2 llama2-7b (shenzhu1993, Dec 1, 2023)
- 4405c20 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 8, 2023)
- 76634d1 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 8, 2023)
- 1c3aa38 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 9, 2023)
- ca7ef21 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 9, 2023)
- c1d1c3b Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 9, 2023)
- dd356d6 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 9, 2023)
- 429626e Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 15, 2023)
- 362d75b Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 15, 2023)
- c837e02 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 15, 2023)
- 995ef6f Delete training/run_benchmarks/config/cluster_conf.py (shenzhu1993, Dec 15, 2023)
- 5451cfa Delete training/benchmarks/llama2_7b/deepspeed/config/config_A100x1x8.py (shenzhu1993, Dec 15, 2023)
- 6e60cce Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 15, 2023)
- 37edf66 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 15, 2023)
- 6bf428a Delete training/benchmarks/llama2_7b/deepspeed/run_llama.sh (shenzhu1993, Dec 15, 2023)
- 8545849 Delete training/benchmarks/llama2_7b/deepspeed/run_llama.sh (shenzhu1993, Dec 15, 2023)
- d0d9673 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 15, 2023)
- bb75dee Delete training/benchmarks/llama2_7b/deepspeed/dataset/llama_dataset.py (shenzhu1993, Dec 15, 2023)
- 2cae4ea Delete training/benchmarks/llama2_7b/deepspeed/dataset/llama_dataset.py (shenzhu1993, Dec 15, 2023)
- da49983 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- f575bfd Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 476fe5f Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 9f82254 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 07a09ee Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 476192b Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- c118469 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 0987f8a Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 06c7101 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 1b07da4 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- c900dfc Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 301ad3f Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- de8fb14 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 4667a79 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- 0c60298 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
- e897eae Merge branch 'main' into kunlunxin_llama2 (shenzhu1993, Dec 20, 2023)
- 94ada4e Merge branch 'main' into kunlunxin_llama2 (shenzhu1993, Dec 20, 2023)
- 48e6e08 Merge branch 'kunlunxin_llama2' of https://github.com/shenzhu1993/Fla… (shenzhu1993, Dec 20, 2023)
33 changes: 19 additions & 14 deletions training/benchmarks/llama2_7b/deepspeed/README.md
@@ -10,7 +10,12 @@ Llama 2, a collection of pretrained and fine-tuned large language models (LLMs)

## Data Preparation

-The data is stored in the data/ directory under the current directory.
+The data is stored at the path referenced in the config/test_conf.py file under run_benchmarks.
+
+## Optimization Strategies
+gradient_checkpointing
+fc optimization; see /data/dataset/llama2-7b/fc_autotune_fp16.log
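As a rough illustration of the gradient_checkpointing strategy named above, a HuggingFace-style model can enable it in one call. This is a minimal sketch assuming a transformers-based LLaMA 2 model; it is not taken from this PR's training script, which may wire checkpointing up differently (e.g., via its DeepSpeed config):

```python
# Illustrative sketch only -- not this PR's code.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Recompute activations during the backward pass instead of storing them,
# trading extra compute for a large reduction in activation memory.
model.gradient_checkpointing_enable()
# The KV cache is incompatible with checkpointed training, so disable it.
model.config.use_cache = False
```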
### Kunlunxin XPU Configuration and Runtime Information
@@ -24,7 +29,7 @@ Llama 2, a collection of pretrained and fine-tuned large language models (LLMs)
- OS version: Ubuntu 20.04
- OS kernel version: 5.4.0-26-generic
- Accelerator driver version: 4.0.25
-- Docker image and version: pytorch2.0.1-cu17-ubuntu20.04:v0.01
+- Docker image and version: XPyTorch2.0.1-cu17-ubuntu20.04:v0.01; contact Zhou Wei to obtain it if needed
- Training framework version: xmlir
- Training compiler version: xacc
- Dependency version: pytorch-2.0.1+cu17
@@ -37,22 +42,22 @@ Llama 2, a collection of pretrained and fine-tuned large language models (LLMs)
| Metric name | Metric value | Notes |
| -------------- | ----------------------- | ------------------------------------------- |
| Task category | Natural language understanding | |
| Model | deepspeed-llama2-7b | |
| Dataset | openwebtext | |
| Data precision | precision, see "Performance metrics" | one of fp32/amp/fp16 |
| Hyperparameter changes | fix_hp, see "Performance metrics" | special hyperparameters needed to saturate the hardware when measuring throughput |
| Hardware device (short name) | R300 | |
| Hardware memory usage | memory, see "Performance metrics" | commonly called "device memory", in GiB |
| Throughput | token/p/s, see "Performance metrics" | average tokens processed per second per card |
| Loss | loss, see "Performance metrics" | training loss |
| Compute utilization | MFU, see "Performance metrics" | as defined in the PaLM paper |

* Performance metrics

| Configuration | precision | fix_hp | tokens/p/s | loss | memory | MFU |
| ------------------- | --------- | ------------------- | ---------- | ----- | ------- | ------ |
| R300 single node, 8 cards (1x8) | fp32 | bs=8, seqlength=512 | | 5.4 | | |
| R300 single node, 8 cards (1x8) | fp32 | bs=12, seqlength=512 | | 5.4 | | |
| R300 single node, 8 cards (1x8) | fp16 | bs=12, seqlength=512 | | 6.76 | 26G/32G | |


3 changes: 1 addition & 2 deletions training/benchmarks/llama2_7b/deepspeed/run_pretraining.py
@@ -124,8 +124,7 @@ def get_metric(texts):
 dataloader = DataLoader(dataset,
                         sampler=sampler,
                         batch_size=batchsize,
-                        num_workers=4,
-                        pin_memory=False)
+                        pin_memory=True)

epoch = 0
while epoch < epochs:
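For reference on what this change does in isolation: pin_memory=True allocates page-locked host buffers, which lets host-to-device copies run asynchronously. A minimal, self-contained sketch (the dataset and tensor shapes here are made up for illustration, not taken from this repo):

```python
# Standalone illustration of the pin_memory change -- dummy data, not repo code.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 512))  # hypothetical batches
loader = DataLoader(dataset,
                    batch_size=8,
                    pin_memory=True)  # page-locked host buffers

for (batch,) in loader:
    if torch.cuda.is_available():
        # Pinned source memory is what makes non_blocking copies truly async.
        batch = batch.to("cuda", non_blocking=True)
```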
116 changes: 0 additions & 116 deletions training/kunlunxin/docker_image/deepspeed/Dockerfile.source

This file was deleted.

13 changes: 13 additions & 0 deletions training/run_benchmarks/config/cluster_conf.py
@@ -0,0 +1,13 @@
'''Cluster configs'''

# Hosts to run the benchmark. Each item is an IP address or a hostname.
HOSTS = ["10.1.2.2", "10.1.2.3", "10.1.2.4"]

# Host ports used by the tensorflow distribution_strategy = 'multi_worker_mirrored'
HOSTS_PORTS = ["2222"]

# Master port to connect to
MASTER_PORT = "29501"

# ssh connection port
SSH_PORT = "22"
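For context, a sketch of how a launcher might consume these settings. The `launch_worker` helper and the ssh/torchrun wiring below are illustrative assumptions, not the repository's actual launcher:

```python
# cluster_conf_demo.py -- illustrative only; FlagPerf's real launcher may differ.
import subprocess

from cluster_conf import HOSTS, MASTER_PORT, SSH_PORT

def launch_worker(host: str, rank: int, world_size: int) -> subprocess.Popen:
    """Start one training worker on `host` over ssh (hypothetical helper)."""
    cmd = (
        f"python -m torch.distributed.run "
        f"--nnodes={world_size} --node_rank={rank} "
        f"--master_addr={HOSTS[0]} --master_port={MASTER_PORT} "
        f"run_pretraining.py"
    )
    # Rank 0's host doubles as the rendezvous master.
    return subprocess.Popen(["ssh", "-p", SSH_PORT, host, cmd])

if __name__ == "__main__":
    procs = [launch_worker(h, i, len(HOSTS)) for i, h in enumerate(HOSTS)]
    for p in procs:
        p.wait()
```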
8 changes: 7 additions & 1 deletion training/run_benchmarks/config/test_conf.py
@@ -19,7 +19,7 @@
# "--device=/dev/davinciX --device=/dev/davinci_manager + \
# --device=/dev/devmm_svm --device=/dev/hisi_hdc + \
# -v /usr/local/Ascend/driver -v /usr/local/dcmi -v /usr/local/bin/npu-smi"
-ACCE_CONTAINER_OPT = "--gpus all"
+ACCE_CONTAINER_OPT = " --gpus all"
# XXX_VISIBLE_DEVICE item name in env
# possible value of ACCE_VISIBLE_DEVICE_ENV_NAME are:
# CUDA_VISIBLE_DEVICES for nvidia, iluvatar
@@ -58,18 +58,23 @@
# "glm:pytorch_1.8:A100:1:8:1": "/raid/home_datasets_ckpt/glm/train/",
# "cpm:pytorch_1.8:A100:1:8:1": "/raid/home_datasets_ckpt/cpm/train/",

#"llama2_7b_finetune:pytorch_2.0.1:A100:1:1:1": "/raid/dataset/llama2_finetune/",
# "mobilenetv2:pytorch_1.8:A100:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
# "vit:pytorch_1.13:A100:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
# "efficientnet:pytorch_1.13:A100:1:8:1": "/raid/dataset/ImageNet_1k_2012/",

# "faster_rcnn:pytorch_1.8:A100:1:8:1": "/raid/dataset/fasterrcnn/coco2017/",
# "bigtransfer:pytorch_1.8:A100:1:8:1": "/raid/dataset/ImageNet_1k_2012/",

#"tacotron2:pytorch_1.13:A100:1:8:1": "/raid/dataset/tacotron2/LJSpeech/",
# "resnet50:pytorch_1.8:A100:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
# "mask_rcnn:pytorch_1.8:A100:1:8:1": "/raid/dataset/maskrcnn/coco2017",

# "wav2vec2:pytorch_1.13:A100:1:8:1": "/raid/dataset/wav2vec2_data/LibriSpeech",
# "WaveGlow:pytorch_1.13:A100:1:8:1": "/raid/dataset/LJSpeech/",

# "distilbert:pytorch_1.12:A100:1:8:1": "/raid/dataset/distilbert/",

# "transformer:pytorch_1.13:A100:1:8:1": "/raid/dataset/transformer/wmt14_en_de_joined_dict",
# "swin_transformer:pytorch_1.8:A100:1:8:1": "/raid/dataset/ImageNet_1k_2012/",
# "transformer_xl:pytorch_1.8:A100:1:8:1": "/raid/dataset/transformer_xl/",
@@ -79,6 +84,7 @@
# "bert_hf:pytorch_1.13:A100:1:8:1": "/raid/dataset/bert_hf_train",
# "longformer:pytorch_1.12:A100:1:8:1": "/raid/dataset/longformer_train/",
# "detr:pytorch_1.13:A100:1:8:1": "/raid/dataset/detr/coco2017/",

# "llama1_7B:paddle_2.5.1:TP1PP1SH2SP8A10040G:1:8:1":"/raid/dataset/llama/"
# "llama1_7B:paddle_2.5.1:TP2PP1SH1SP4A10040G:1:8:1":"/raid/dataset/llama/"
# "llama1_7B:paddle_2.5.1:TP2PP1SH2SP4A10040G:1:8:1":"/raid/dataset/llama/"