
Environment Setup and Tool Usage Notes


huggingface

Hugging Face Forums - Hugging Face Community Discussion

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it (tested and working)_checkout your internet connection or see how to ru - CSDN blog

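  1. Use a proxy to reach the site: route traffic through a global proxy so that the model can be downloaded.

  2. Use a mirror. The Hugging Face mirror for mainland China is https://hf-mirror.com/; scroll down its home page to find the usage guide, which describes four approaches. The most direct one is to download the files one by one.

  3. Set the endpoint in code: import os; os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'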
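For the third option, ordering matters: as far as I know, huggingface_hub reads HF_ENDPOINT when it is first imported, so the variable has to be set before that import. The sketch below wraps this in a small helper; `use_hf_mirror` is a hypothetical name of my own, not part of any library.

```python
import os
import sys

# Mirror endpoint taken from the notes above.
HF_MIRROR = "https://hf-mirror.com"


def use_hf_mirror(endpoint: str = HF_MIRROR) -> bool:
    """Point Hugging Face downloads at a mirror.

    Returns True if the variable was set before huggingface_hub was imported,
    False if the library was already loaded (the setting may then be ignored).
    """
    already_imported = "huggingface_hub" in sys.modules
    os.environ["HF_ENDPOINT"] = endpoint
    return not already_imported


if __name__ == "__main__":
    if not use_hf_mirror():
        print("huggingface_hub was imported first; set HF_ENDPOINT earlier")
    # Import transformers / datasets / huggingface_hub only after this point.
```

Calling the helper at the very top of the entry script, before any other import, is the simplest way to keep the ordering right.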

Reproducing the Attention Is All You Need paper

Implementation chosen: hyunwoongko/transformer: Transformer : PyTorch Implementation of "Attention Is All You Need"

Issue 1: conda environment problems

Switch conda to mirror channels
conda config --add channels conda-forge
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/
conda config --set show_channel_urls yes
Create the environment
conda create -n transformer python=3.10
Activate the environment
conda activate transformer
Install torchtext
conda install torchtext==0.13.1
Install torchdata
conda install torchdata==0.4.1
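torchtext and torchdata releases are tied to specific PyTorch versions, so it is worth confirming what actually ended up in the environment. The sketch below compares installed versions against the pins above. The torch pin of 1.12.1 is my assumption, based on the upstream compatibility tables; it is not stated in these notes.

```python
from importlib import metadata

# Pins from the install commands above; the torch pin is an assumption
# based on the upstream compatibility tables, not stated in these notes.
EXPECTED = {"torchtext": "0.13.1", "torchdata": "0.4.1", "torch": "1.12.1"}


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(expected: dict, lookup=installed_version) -> dict:
    """Map each package name to a tuple (expected, found, ok)."""
    report = {}
    for name, want in expected.items():
        found = lookup(name)
        # Local build tags such as "1.12.1+cu113" still count as a match.
        ok = found is not None and found.split("+")[0] == want
        report[name] = (want, found, ok)
    return report


if __name__ == "__main__":
    for name, (want, found, ok) in check_pins(EXPECTED).items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {found}, {status}")
```

Run it inside the activated transformer environment; a MISMATCH line usually means conda resolved a different build than the one requested.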

The arduous torchtext installation journey - Zhihu

Issue 2: downgrading mkl

Fixing lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent_python_ 皮卡兔子屋 - 2048 AI community

I ran into the error /site-packages/torch/lib/libtorch_cpu.so: undefined symbol: iJIT_NotifyEvent

Downgrading mkl resolved it.

mkl
conda install mkl=2024.0

Issue 3: libffi library error

ImportError: /usr/lib/x86_64-linux-gnu/libp11-kit.so.0: undefined symbol: ffi_type_pointer, version LIBFFI_BASE_7.0
Fixing libp11-kit.so.0: undefined symbol: ffi_type_pointer… in a Conda virtual environment - CSDN blog
libffi
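# Assumed to run inside the lib directory of the affected conda environment
# (the path is not recorded in these notes)
ls -l | grep libffi
mv libffi.so.7 libffi_bak.so.7
# Point libffi.so.7 at the system library; drop sudo if you own the directory
sudo ln -s /lib/x86_64-linux-gnu/libffi.so.7.1.0 libffi.so.7
ldconfig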

Issue 4: datasets library error

ImportError: cannot import name 'load_dataset' from 'datasets' (unknown location)

datasets
pip install datasets
# or, in a conda environment
conda install -c huggingface datasets
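If the error persists after installation, one possible cause (my assumption, not something confirmed in these notes) is that Python resolves `datasets` to a local folder of the same name rather than the installed library, which is what "(unknown location)" tends to indicate. The sketch below reports where the name would be imported from without importing it; `describe_module` is a hypothetical helper.

```python
import importlib.util


def describe_module(name: str) -> str:
    """Report where Python would import `name` from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not installed"
    if spec.origin is None:
        # Namespace package: a plain directory without __init__.py,
        # often a local folder that shadows the real library.
        locations = list(spec.submodule_search_locations or [])
        return f"{name}: namespace package at {locations}"
    return f"{name}: {spec.origin}"


if __name__ == "__main__":
    print(describe_module("datasets"))
```

A "namespace package" result pointing into the project directory suggests renaming that folder or running the script from a different working directory.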

Issue 5: torchtext library error

No module named 'torchtext.legacy'

Causes:

  1. torchtext is not compatible with newer versions of NumPy.

  2. Current torchtext releases no longer provide torchtext.legacy, so "from torchtext.legacy.data import Field, BucketIterator" fails.

Fix: modify the code, following Fixed 'data_loader.py' by Faizanfarhad · Pull Request #35 · hyunwoongko/transformer

Issue 6: installing spacy

OSError: [E050] Can't find model 'de_core_news_sm'. It doesn't seem to be a Python package or a valid path to a data directory.

spacy
pip install spacy
python -m spacy download de_core_news_sm
python -m spacy download en_core_web_sm
spacy (pinned to 3.6.0)
pip install -U spacy==3.6.0
python -m spacy download en_core_web_sm
python -m spacy download zh_core_web_sm
python -m spacy download de_core_news_sm
If the network is unreliable, download the wheels directly and install them locally
wget https://github.com/explosion/spacy-models/releases/download/de_core_news_sm-3.6.0/de_core_news_sm-3.6.0-py3-none-any.whl
wget https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.6.0/en_core_web_sm-3.6.0-py3-none-any.whl
wget https://github.com/explosion/spacy-models/releases/download/zh_core_web_sm-3.6.0/zh_core_web_sm-3.6.0-py3-none-any.whl
pip install zh_core_web_sm-3.6.0-py3-none-any.whl de_core_news_sm-3.6.0-py3-none-any.whl en_core_web_sm-3.6.0-py3-none-any.whl
Downgrade numpy
pip install numpy==1.26.4
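The three wheel URLs above follow one pattern, and the model version should match the installed spaCy minor version (3.6.x models for spaCy 3.6.x). The sketch below builds the URL from a model name and version. The pattern is inferred from the wget commands above, so other model and version combinations may not exist on the release page; `model_wheel_url` is a hypothetical helper.

```python
# URL pattern inferred from the three wget commands above; other
# model/version combinations may not exist on the release page.
RELEASE_URL = (
    "https://github.com/explosion/spacy-models/releases/download/"
    "{name}-{version}/{name}-{version}-py3-none-any.whl"
)


def model_wheel_url(name: str, version: str = "3.6.0") -> str:
    """Build the wheel URL for a spaCy model at a given version."""
    return RELEASE_URL.format(name=name, version=version)


if __name__ == "__main__":
    for model in ("de_core_news_sm", "en_core_web_sm", "zh_core_web_sm"):
        print(model_wheel_url(model))
```

The printed URLs can be passed to wget or directly to pip install.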

Issue 7: downloading the Multi30k dataset

Could not get the file at http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz. [RequestException] None.

wmt16_files_mmt
wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz

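Alternatively, download the files from https://github.com/neychev/small_DL_repo/tree/master/datasets/Multi30k and place the three files in /root/.cache/torch/text/datasets/Multi30k (the exact directory differs from machine to machine).

However, this did not work for me.

So I modified data_loader.py directly to load the dataset from local files.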
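The actual change to data_loader.py is not recorded here. As a rough illustration of the idea only, the sketch below reads two line-aligned text files into sentence pairs. The file names train.en and train.de and the data directory are hypothetical, and this is not the code used in the repository.

```python
from pathlib import Path


def load_parallel(src_path, tgt_path):
    """Read two line-aligned text files into (source, target) sentence pairs."""
    with open(src_path, encoding="utf-8") as f_src:
        src_lines = [line.rstrip("\n") for line in f_src]
    with open(tgt_path, encoding="utf-8") as f_tgt:
        tgt_lines = [line.rstrip("\n") for line in f_tgt]
    if len(src_lines) != len(tgt_lines):
        raise ValueError(
            f"line count mismatch: {len(src_lines)} vs {len(tgt_lines)}"
        )
    # Drop pairs where either side is empty after stripping whitespace.
    pairs = [(s.strip(), t.strip()) for s, t in zip(src_lines, tgt_lines)]
    return [(s, t) for s, t in pairs if s and t]


if __name__ == "__main__":
    # Hypothetical layout: data/train.en and data/train.de, one sentence per line.
    data_dir = Path("data")
    if (data_dir / "train.en").exists() and (data_dir / "train.de").exists():
        train = load_parallel(data_dir / "train.en", data_dir / "train.de")
        print(len(train), train[0])
```

The resulting pairs would still need tokenizing and batching before they can replace the torchtext iterators.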

Training results

The Multi30K dataset is used here.

en2de training results

image-20250706135450331

de2en training results

image-20250706135404376

llama-factory usage notes

  1. Model issue (downloading the model): if the download goes wrong, loading the model fails with

    safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
    
  2. Template issue: use the llama3 template. You can also add your own chat template in template.py.

  3. LoRA issue

ValueError: Target modules {'c_attn'} not found in the base model. Please check the target modules and try again.
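Change the target modules to q_proj,v_proj. c_attn is the attention projection name in GPT-2-style models, whereas LLaMA-style models name these layers q_proj, k_proj, v_proj, and o_proj: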
--lora_target q_proj,v_proj
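For the model download issue in item 1, the MetadataIncompleteBuffer error generally suggests that a .safetensors file on disk is shorter than its own header says, typically after an interrupted download. The sketch below checks this without loading any tensors. It relies on the published safetensors layout (an 8-byte little-endian header length, a JSON header, then raw tensor data); `check_safetensors` is a hypothetical helper, not part of the safetensors library.

```python
import json
import struct
from pathlib import Path


def check_safetensors(path) -> str:
    """Return "ok" or a short description of why the file looks truncated."""
    size = Path(path).stat().st_size
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return "file shorter than the 8-byte length prefix"
        # First 8 bytes: little-endian unsigned 64-bit header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if 8 + header_len > size:
            return "header is cut off"
        try:
            header = json.loads(f.read(header_len))
        except ValueError:
            return "header is not valid JSON"
    if not isinstance(header, dict):
        return "header is not a JSON object"
    # Each tensor entry records [begin, end) byte offsets into the data section.
    data_end = max(
        (e["data_offsets"][1] for k, e in header.items() if k != "__metadata__"),
        default=0,
    )
    expected = 8 + header_len + data_end
    if size < expected:
        return f"tensor data is cut off: {size} bytes on disk, {expected} expected"
    return "ok"
```

Run it on each shard in the model directory; any result other than "ok" means that file should be deleted and downloaded again.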