Environment Setup and Tool Usage Notes ¶


huggingface¶

Hugging Face Forums - Hugging Face Community Discussion

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it (tested and working) - CSDN blog

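Use a proxy: route access to the site through a global proxy to download the model.
Use a mirror: the Hugging Face mirror for mainland China is https://hf-mirror.com/. Scroll down on that page for the usage guide, which lists four approaches; the most direct is to download the files one by one.
Set the endpoint in code: import os; os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'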
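
Setting the endpoint in code only helps if it happens before any Hugging Face library is imported, since, as far as I know, huggingface_hub reads HF_ENDPOINT when it is first imported. A minimal sketch, assuming huggingface_hub is installed; the helper name use_hf_mirror and the repo id in the commented usage are illustrative and not part of the original notes:

```python
import os

HF_MIRROR = "https://hf-mirror.com"  # mirror address from the notes above


def use_hf_mirror(endpoint: str = HF_MIRROR) -> str:
    """Point Hugging Face downloads at a mirror (hypothetical helper).

    Call this before importing huggingface_hub, transformers, or datasets.
    """
    os.environ["HF_ENDPOINT"] = endpoint
    return os.environ["HF_ENDPOINT"]


use_hf_mirror()

# Import only after the variable is set (assumes huggingface_hub is installed):
# from huggingface_hub import snapshot_download
# snapshot_download(repo_id="bert-base-uncased")  # example repo id
```

The same effect can be had from the shell by exporting HF_ENDPOINT before launching Python, which avoids any import-order concerns.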

Reproducing the Attention Is All You Need Paper ¶

Chosen implementation: hyunwoongko/transformer: Transformer : PyTorch Implementation of "Attention Is All You Need"

Problem 1: conda environment ¶

Install miniconda

Switch conda to mirror channels

conda config --add channels conda-forge
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
conda config --set show_channel_urls yes

Create the environment

conda create -n transformer python=3.10

Activate the environment

conda activate transformer

Install torchtext

conda install torchtext==0.13.1

Install torchdata

conda install torchdata==0.4.1

The arduous journey of installing torchtext - Zhihu
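
Because torchtext and torchdata releases are each built against one specific torch release, it helps to confirm what actually ended up in the environment after the installs. A small sketch using only the standard library; the package list is my own choice, and it assumes the conda packages ship standard Python metadata:

```python
from importlib import metadata

# Packages whose versions have to match each other (names from the steps above).
PACKAGES = ["torch", "torchtext", "torchdata", "numpy"]


def installed_versions(names):
    """Return {package: version or None} for the current environment."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:10s} {version or 'not installed'}")
```

Run it inside the activated transformer environment; a package reported as not installed, or a torch version that changed after installing torchtext, points to a dependency conflict.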

Problem 2: downgrading mkl ¶

Fixing lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent - 皮卡兔子屋 - 2048 AI community

I ran into this error: /site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

The fix is to downgrade mkl:

mkl

conda install mkl=2024.0

Problem 3: libffi library error ¶

ImportError: /usr/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0

Fixing libp11-kit.so.0: undefined symbol: ffi_type_pointer… in a conda virtual environment - CSDN blog

libffi

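# Run these inside the lib directory of the conda environment
ls -l | grep libffi
# Back up the copy bundled with the environment
mv libffi.so.7 libffi_bak.so.7
# Link to the system libffi instead (run only one of the two ln commands)
sudo ln -s /lib/x86_64-linux-gnu/libffi.so.7.1.0 libffi.so.7
# Without sudo, if the directory is writable by the current user:
ln -s /lib/x86_64-linux-gnu/libffi.so.7.1.0 libffi.so.7
ldconfig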
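
Before and after relinking, it is useful to see which real file each libffi entry resolves to. The sketch below lists them for a given directory; the assumption that the active environment keeps its shared libraries in <prefix>/lib is mine, and the function name is illustrative:

```python
import os
import sys


def find_libffi(lib_dir):
    """List libffi* entries in lib_dir with the real file each one resolves to."""
    entries = []
    for name in sorted(os.listdir(lib_dir)):
        if not name.startswith("libffi"):
            continue
        target = os.path.realpath(os.path.join(lib_dir, name))
        entries.append((name, target, os.path.exists(target)))
    return entries


if __name__ == "__main__":
    # Assumption: the active environment keeps its shared libraries in <prefix>/lib.
    env_lib = os.path.join(sys.prefix, "lib")
    for name, target, ok in find_libffi(env_lib):
        status = "ok" if ok else "BROKEN LINK"
        print(f"{name} -> {target} [{status}]")
```

After the fix, libffi.so.7 should resolve to the system file under /lib/x86_64-linux-gnu rather than to the copy bundled with the environment.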

Problem 4: datasets library error ¶

ImportError: cannot import name 'load_dataset' from 'datasets' (unknown location)

datasets

pip install datasets
# or, in a conda environment
conda install -c huggingface datasets
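
The "(unknown location)" part of the message often means Python found something named datasets that has no file behind it, for example a bare local directory called datasets that shadows the real library. This is a possible cause I am adding, not something the original notes confirm. A sketch to check where the name resolves, without importing it:

```python
import importlib.util


def describe_module(name):
    """Report where `name` would be imported from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not installed"
    if spec.origin is None:
        # Namespace package: a bare directory with no __init__.py, for example
        # a local folder called `datasets` that shadows the real library.
        locations = list(spec.submodule_search_locations or [])
        return f"{name}: namespace package at {locations} (likely shadowing)"
    return f"{name}: regular module at {spec.origin}"


if __name__ == "__main__":
    print(describe_module("datasets"))
```

If the result is a namespace package inside the project directory, installing the library alone may not be enough; renaming the local folder or running from another directory would be the next thing to try.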

Problem 5: torchtext library error ¶

No module named 'torchtext.legacy'

Causes:
1. torchtext is not compatible with newer versions of NumPy.
2. Current versions of torchtext no longer provide torchtext.legacy, so "from torchtext.legacy.data import Field, BucketIterator" fails.

Fix: modify the code, following Fixed 'data_loader.py' by Faizanfarhad · Pull Request #35 · hyunwoongko/transformer
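
What Field and BucketIterator used to do can be approximated in plain Python: build a vocabulary, map tokens to ids with start and end markers, and pad each batch. The sketch below is not the code from the pull request; the function names, the special tokens, and the min_freq value are assumptions chosen for illustration:

```python
from collections import Counter

SPECIALS = ["<unk>", "<pad>", "<sos>", "<eos>"]  # assumed special tokens


def build_vocab(tokenized_sentences, min_freq=2):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    itos = SPECIALS + sorted(t for t, c in counts.items() if c >= min_freq)
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, vocab):
    """Wrap a sentence in <sos>/<eos> and replace tokens with ids."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in tokens]
    return [vocab["<sos>"]] + ids + [vocab["<eos>"]]


def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    width = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (width - len(seq)) for seq in sequences]


corpus = [["a", "man", "walks"], ["a", "dog", "runs"], ["a", "man"]]
vocab = build_vocab(corpus, min_freq=2)
batch = pad_batch([numericalize(s, vocab) for s in corpus], vocab["<pad>"])
print(batch)  # [[2, 4, 5, 0, 3], [2, 4, 0, 0, 3], [2, 4, 5, 3, 1]]
```

In a real data loader the padded lists would be turned into tensors, and sentences of similar length would be grouped into the same batch to reduce padding, which is roughly what BucketIterator did.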

Problem 6: installing spacy ¶

OSError: [E050] Can't find model 'de_core_news_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

spacy

pip install spacy
python -m spacy download de_core_news_sm
python -m spacy download en_core_web_sm

spacy

pip install -U spacy==3.6.0
python -m spacy download en_core_web_sm
python -m spacy download zh_core_web_sm
python -m spacy download de_core_news_sm

If the network is unreliable, download the wheels directly and install them locally:

wget https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.6.0/de_core_news_sm-3.6.0-py3-none-any.whl
wget https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl
wget https://github.com/explosion/spacy-models/releases/download/zh_core_web_sm-3.6.0/zh_core_web_sm-3.6.0-py3-none-any.whl
pip install zh_core_web_sm-3.6.0-py3-none-any.whl de_core_news_sm-3.6.0-py3-none-any.whl en_core_web_sm-3.6.0-py3-none-any.whl
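
spaCy models installed this way are ordinary Python packages, so error E050 for a short model name usually means the package is missing from the active environment. A quick check that does not need spaCy itself to be importable; the list simply mirrors the model names used above:

```python
import importlib.util

# Model packages used in the commands above.
MODELS = ["de_core_news_sm", "en_core_web_sm", "zh_core_web_sm"]


def missing_models(names):
    """Return the model packages that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_models(MODELS)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all models installed")
```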

Downgrade numpy

pip install numpy==1.26.4

Problem 7: downloading the Multi30k dataset ¶

Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.

wmt16_files_mmt

wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz

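The archives can also be downloaded from https://github.com/neychev/small_DL_repo/tree/master/datasets/Multi30k

Place the three files in /root/.cache/torch/text/datasets/Multi30k (the exact directory depends on your user and setup).

However, this did not work for me.

So I modified data_loader.py directly to load the dataset from local files.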
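
Loading from local files can be as simple as reading two line-aligned text files into sentence pairs. The following is a sketch of that idea, not the actual change made to data_loader.py; the function name and the paths in the commented usage are hypothetical:

```python
def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def load_parallel(src_path, tgt_path):
    """Read two line-aligned text files into (source, target) sentence pairs."""
    src_lines = read_lines(src_path)
    tgt_lines = read_lines(tgt_path)
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}"
        )
    pairs = []
    for src, tgt in zip(src_lines, tgt_lines):
        src, tgt = src.strip(), tgt.strip()
        if src and tgt:  # skip pairs with an empty side
            pairs.append((src, tgt))
    return pairs


# Hypothetical layout after extracting training.tar.gz by hand:
# train = load_parallel("data/multi30k/train.en", "data/multi30k/train.de")
```

Checking the line counts up front matters because a single missing line would silently misalign every later sentence pair.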

Training results ¶

The Multi30K dataset is used here.

en2de training results

de2en training results

llama-factory usage notes ¶

Model issue (downloading the model): if the download goes wrong, loading fails with:

safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
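
This error usually points to a truncated or otherwise incomplete .safetensors file. A file starts with an 8-byte little-endian header length, followed by a JSON header and then the tensor data, so its expected size can be computed and compared with the size on disk. The checker below is my own sketch of that idea, not part of llama-factory or safetensors:

```python
import json
import os
import struct


def check_safetensors(path):
    """Compare a .safetensors file's size with what its header promises.

    Layout: 8-byte little-endian header length N, N bytes of JSON, then the
    tensor data. Returns (ok, message).
    """
    actual = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False, "file shorter than the 8-byte length prefix"
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > actual - 8:
            return False, f"header truncated: needs {header_len} bytes"
        raw = f.read(header_len)
    try:
        header = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False, "header is not valid JSON"
    if not isinstance(header, dict):
        return False, "header is not a JSON object"
    ends = [
        info["data_offsets"][1]
        for name, info in header.items()
        if name != "__metadata__"
    ]
    expected = 8 + header_len + max(ends, default=0)
    if actual != expected:
        return False, f"size mismatch: file has {actual} bytes, header implies {expected}"
    return True, f"size matches header ({actual} bytes)"
```

If a shard fails the check, deleting it and downloading it again is the usual remedy.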

Template issue: use the llama3 template; custom chat templates can be added in template.py.
LoRA issue:

ValueError: Target modules {'c_attn'} not found in the base model. Please check the target modules and try again.

Change the target modules to q_proj,v_proj:

--lora_target q_proj,v_proj
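
The error arises because c_attn is the attention projection name used by GPT-2-style models, while LLaMA-style models name their projections q_proj, k_proj, v_proj, and o_proj. A rough way to see which targets a model can accept is to compare the requested names with the last component of its module names. The sketch below uses a hand-written name list as a stand-in for a real model, and matching on the last component is a simplification of how the LoRA library resolves targets:

```python
def leaf_module_names(module_names):
    """Collect the last path component of every dotted module name."""
    return {name.rsplit(".", 1)[-1] for name in module_names if name}


def missing_targets(targets, module_names):
    """Return the requested LoRA targets that match no module in the model."""
    available = leaf_module_names(module_names)
    return [t for t in targets if t not in available]


# Illustrative names in the style of a LLaMA-type model (an assumption, not a
# dump from a real checkpoint). With PyTorch, the real list would come from
# [name for name, _ in model.named_modules()].
names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
]
print(missing_targets(["c_attn"], names))            # ['c_attn']
print(missing_targets(["q_proj", "v_proj"], names))  # []
```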