Paper: https://arxiv.org/pdf/2007.09384.pdf
References:
- Official installation guide: https://github.com/jiaxi-wu/MPSR/blob/master/INSTALL.md
- [Code Debugging] "Multi-scale Positive Sample Refinement for Few-shot Object Detection" (blog post by 薛铁钢)
Experimental environment
- 2 × RTX 3090
- PyTorch 1.10, CUDA 11.3
- Python 3.6.5
Environment setup
- Create a virtual environment
conda create -n maskrcnn_benchmark python=3.6.5
- Install dependencies
conda install ipython pip
Install the dependencies of maskrcnn_benchmark and the COCO API:
pip install ninja yacs cython matplotlib tqdm opencv-python
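Issue 1: opencv-python installs slowly. This is normal; it is best to install it in a separate step from the other packages.
Fix: to see the installation progress, follow https://blog.csdn.net/FRIGIDWINTER/article/details/129179235
Install torch: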
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
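To confirm that the installed wheel matches the CUDA 11.3 environment listed above, the version string can be checked from Python. A minimal sketch: the helper names `parse_torch_version` and `cuda_tag` are hypothetical, not part of PyTorch, and the code only parses strings, so it runs without torch installed.

```python
import re


def parse_torch_version(version):
    """Split a version such as '1.10.0+cu113' into ((1, 10, 0), 'cu113').

    The build tag is None for wheels without a '+local' suffix.
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:\+(\w+))?$", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    major, minor, patch, tag = match.groups()
    return (int(major), int(minor), int(patch)), tag


def cuda_tag(cuda_version):
    """Turn a CUDA version such as '11.3' into the wheel tag 'cu113'."""
    major, minor = cuda_version.split(".")[:2]
    return "cu" + major + minor
```

Inside the environment, `parse_torch_version(torch.__version__)[1] == cuda_tag(torch.version.cuda)` should be true, and `torch.cuda.device_count()` should report 2 on the dual RTX 3090 setup.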
Create a directory to hold all the repositories cloned in the following steps:
mkdir install_dir
Download MPSR:
git clone https://github.com/jiaxi-wu/MPSR.git
Install cityscapesScripts:
# install cityscapesScripts
cd install_dir
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install
Note: I initially forgot to install this, and a later build step failed with ModuleNotFoundError: No module named 'cityscapesscripts'.
Install apex:
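A missing package like this only surfaces later as a ModuleNotFoundError, so a quick preflight check can catch it earlier. A minimal sketch using `importlib.util.find_spec`, which looks a top-level module up without importing it; the module list is an assumption based on the packages installed in this guide (`cv2` is the import name of opencv-python).

```python
import importlib.util

# Top-level import names of the packages installed in this guide (assumed list).
REQUIRED_MODULES = [
    "yacs",
    "cv2",  # opencv-python
    "torch",
    "torchvision",
    "cityscapesscripts",
    "apex",
    "maskrcnn_benchmark",
]


def missing_modules(names):
    """Return the names from `names` that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```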
cd install_dir
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
The build succeeds.
Install maskrcnn_benchmark:
cd install_dir
git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
cd maskrcnn-benchmark
python setup.py build develop
This fails with the following error:
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "setup.py", line 68, in <module>
cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
_build_ext.run(self)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
_build_ext.build_ext.run(self)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
self.build_extensions()
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
build_ext.build_extensions(self)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
_build_ext.build_ext.build_extensions(self)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
self._build_extensions_serial()
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
self.build_extension(ext)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
_build_ext.build_extension(self, ext)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
depends=ext.depends)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 538, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1359, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Fix: run the following commands (from https://github.com/facebookresearch/maskrcnn-benchmark/issues/1274):
cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
# You can then run the regular setup command
python setup.py build develop
This time the build succeeds.
Prepare the datasets
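On a machine without perl, the same substitution can be done from Python. A minimal sketch, assuming it is run from the maskrcnn-benchmark root; `replace_in_files` is a hypothetical helper. Unlike the perl one-liner, which has no /g flag and so replaces only the first match on each line, it replaces every whole-word occurrence, and running it twice is harmless.

```python
import os
import re

CUDA_DIR = os.path.join("maskrcnn_benchmark", "csrc", "cuda")
FILES = ["deform_pool_cuda.cu", "deform_conv_cuda.cu"]


def replace_in_files(paths, old, new):
    """Replace every whole-word `old` with `new` in each file, in place.

    Returns a dict mapping each path to the number of replacements made.
    """
    pattern = re.compile(r"\b%s\b" % re.escape(old))
    counts = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        text, n = pattern.subn(new, text)
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        counts[path] = n
    return counts


if __name__ == "__main__":
    paths = [os.path.join(CUDA_DIR, name) for name in FILES]
    print(replace_in_files(paths, "AT_CHECK", "TORCH_CHECK"))
```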
cd MPSR
mkdir -p datasets/voc
cd datasets/voc
wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
wget https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
tar xf VOCtrainval_11-May-2012.tar
tar xf VOCtrainval_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar
cd ../..
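Before building the few-shot splits, it can help to check that the archives were extracted completely. A minimal sketch: the checked layout (VOC2007 and VOC2012, each with Annotations, JPEGImages and ImageSets/Main) is the standard PASCAL VOC structure, but the exact directory MPSR reads from is an assumption here, so pass whichever directory contains VOC2007 and VOC2012 after extraction (the tar files unpack into a VOCdevkit folder). The script name check_voc.py is hypothetical.

```python
import os
import sys

YEARS = ("VOC2007", "VOC2012")
SUBDIRS = ("Annotations", "JPEGImages", os.path.join("ImageSets", "Main"))


def missing_voc_dirs(root):
    """List the expected VOC sub-directories that do not exist under `root`."""
    missing = []
    for year in YEARS:
        for sub in SUBDIRS:
            path = os.path.join(root, year, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python check_voc.py <dir containing VOC2007 and VOC2012>")
    problems = missing_voc_dirs(sys.argv[1])
    print("\n".join(problems) if problems else "VOC layout looks complete")
```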
Build the base and few-shot datasets:
bash tools/fewshot_exp/datasets/init_fs_dataset_standard.sh
I did not use the COCO dataset, so I commented out lines 7 and 8 of tools/fewshot_exp/datasets/init_fs_dataset_standard.sh.
Error: AttributeError: module 'torch._six' has no attribute 'PY3'
Fix: see https://blog.csdn.net/pangweijian/article/details/120371802
Base training on VOC
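In both configs/fewshot/base and configs/fewshot_baseline/base, change the WEIGHT path in the three VOC-related yaml config files to the location of the ResNet-101 weight file (placed under MPSR/configs/).
[!!! Don't miss any: the files in both folders must be changed.]
[!!! For the ResNet-101 weights, search for R-101.pkl in a browser and download it.]
The modified config looks like this [!!! nothing else needs to change]: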
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "configs/R-101.pkl"
  BACKBONE:
    CONV_BODY: "R-101-FPN"
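Because it is easy to miss one of the files, the edit can be scripted. A minimal sketch that rewrites only the first `WEIGHT:` line of each file with a regular expression, leaving the rest of the file (comments, `WEIGHT_DECAY`, and so on) untouched. It assumes the VOC-related files are the ones whose names contain `voc`; compare the printed file list against the two folders before relying on it. `set_weight` and `update_configs` are hypothetical helpers.

```python
import glob
import os
import re

CONFIG_DIRS = ["configs/fewshot/base", "configs/fewshot_baseline/base"]
NEW_WEIGHT = "configs/R-101.pkl"

# An indented `WEIGHT:` key; `WEIGHT_DECAY:` does not match because of the colon.
WEIGHT_LINE = re.compile(r"^([ \t]*WEIGHT:[ \t]*).*$", re.MULTILINE)


def set_weight(text, new_path):
    """Return `text` with the value of the first WEIGHT: key set to new_path."""
    return WEIGHT_LINE.sub(lambda m: m.group(1) + '"%s"' % new_path, text, count=1)


def update_configs(config_dirs, new_path, pattern="*voc*.yaml"):
    """Rewrite WEIGHT in every matching yaml file; return the updated paths."""
    updated = []
    for config_dir in config_dirs:
        for path in sorted(glob.glob(os.path.join(config_dir, pattern))):
            with open(path, encoding="utf-8") as f:
                text = f.read()
            with open(path, "w", encoding="utf-8") as f:
                f.write(set_weight(text, new_path))
            updated.append(path)
    return updated


if __name__ == "__main__":
    for path in update_configs(CONFIG_DIRS, NEW_WEIGHT):
        print("updated", path)
```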
Adjust the number of GPUs and the GPU IDs in tools/fewshot_exp/train_voc_base.sh to match your machine.
Run base training:
bash tools/fewshot_exp/train_voc_base.sh
Error: AttributeError: module 'torch._six' has no attribute 'PY3'
Fix: open the file named in the traceback and change PY3 to PY37.
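An alternative to the PY37 rename (not the author's fix): the project only runs on Python 3, so the Python 2 branch guarded by the flag can be dropped altogether. A minimal sketch, assuming that, as in upstream maskrcnn-benchmark, the check sits in maskrcnn_benchmark/utils/imports.py and chooses between two versions of `import_file`; verify this against your copy before replacing anything. It uses only the standard `importlib.util` API.

```python
import importlib.util
import sys


def import_file(module_name, file_path, make_importable=False):
    """Load a module from an explicit file path (Python 3 only)."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if make_importable:
        # Register the module so later `import module_name` statements find it.
        sys.modules[module_name] = module
    return module
```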
Error: ValueError: signal number 32 out of range
Fix: in maskrcnn_benchmark/config/defaults.py, set _C.DATALOADER.NUM_WORKERS = 0, then rebuild:
cd MPSR
python setup.py build develop
Error: cannot import name 'container_abcs' from 'torch._six'
Fix: go to the location in the traceback and change it as described in https://blog.csdn.net/weixin_42620513/article/details/122728326
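In older PyTorch, `torch._six.container_abcs` was just the standard library's `collections.abc` re-exported, so an import that works on both old and new PyTorch versions is to fall back to the standard module. A minimal sketch, which may differ from the change in the linked post; `flatten` is a hypothetical example of code that uses the name.

```python
try:
    # Older PyTorch re-exported collections.abc under this name.
    from torch._six import container_abcs
except ImportError:
    # Newer PyTorch dropped it (this also covers torch not being installed).
    import collections.abc as container_abcs


def flatten(obj):
    """Example use: recursively flatten nested lists/tuples."""
    if isinstance(obj, container_abcs.Sequence) and not isinstance(obj, (str, bytes)):
        out = []
        for item in obj:
            out.extend(flatten(item))
        return out
    return [obj]
```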
Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)
Fix: go to the location in the traceback and change it as suggested in https://blog.csdn.net/qq_41185868/article/details/79039604
Error: RuntimeError: Address already in use
Fix (references):
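Byte 0xc2 is a UTF-8 lead byte (for example the first byte of "°" or of a non-breaking space), so this error most likely means a UTF-8 file is being opened with Python 3.6's locale-dependent default encoding, which is ASCII under a C/POSIX locale. A minimal sketch of the usual remedy, assumed rather than taken from the linked post: pass the encoding explicitly. The helper names are hypothetical.

```python
import codecs
import locale


def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the current locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def locale_is_ascii():
    """True if open() without encoding= would decode files as ASCII."""
    return codecs.lookup(locale.getpreferredencoding(False)).name == "ascii"
```

For example, `"25°C".encode("utf-8").decode("ascii")` reproduces the exact message (byte 0xc2 in position 2). If `locale_is_ascii()` returns True, running under a UTF-8 locale (for example `export LANG=C.UTF-8`) is the other common remedy.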
https://github.com/facebookresearch/maskrcnn-benchmark/issues/241
https://blog.csdn.net/qq_40682833/article/details/121235537
https://blog.csdn.net/Orientliu96/article/details/104597178
In tools/fewshot_exp/train_voc_base.sh, add a port number to the launch command with --master_port 29501.
The modified command:
python -m torch.distributed.launch --nproc_per_node=$NGPUS --master_port 29501 ./tools/train_net.py --config-file ${configfile}
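A fixed port like 29501 only works if nothing else (for example a second training job) is using it. One way to avoid guessing is to ask the OS for a free port by binding to port 0. A minimal sketch; there is a small race window between closing the probe socket and torch.distributed.launch binding the port.

```python
import socket


def find_free_port():
    """Ask the OS for a currently unused TCP port on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("", 0))  # port 0 lets the kernel pick any free port
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

Saved as a script (hypothetically find_port.py), it can feed the launch line as `--master_port $(python find_port.py)`.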
Training produces three weight files:
- model_voc_split1_base.pth
- model_voc_split2_base.pth
- model_voc_split3_base.pth
Fine-tuning
As before, adjust the number of GPUs and the GPU IDs in tools/fewshot_exp/train_voc_standard.sh to match your machine, then run:
bash tools/fewshot_exp/train_voc_standard.sh
Error:
size mismatch for roi_heads.box.predictor.cls_score.weight: copying a param with shape torch.Size([16, 1024]) from checkpoint, the shape in current model is torch.Size([21, 1024]).
size mismatch for roi_heads.box.predictor.cls_score.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([21]).
Fix: following https://github.com/jiaxi-wu/MPSR/issues/9, run:
python tools/fewshot_exp/trans_voc_pretrained.py 1
python tools/fewshot_exp/trans_voc_pretrained.py 2
python tools/fewshot_exp/trans_voc_pretrained.py 3
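[!!! I hit this problem too, but the error persisted after running these commands. In the end I reverted the modified files to their original state and training then ran successfully, which is rather odd.]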
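The two shapes in the error follow from the class counts. PASCAL VOC has 20 classes; each of the three few-shot splits holds out 5 of them as novel classes, so the base model is trained on 15, and the classifier has one extra output for background. That gives 15 + 1 = 16 rows in the base checkpoint versus 20 + 1 = 21 in the fine-tuning model. A worked check:

```python
VOC_NUM_CLASSES = 20  # PASCAL VOC object categories
NOVEL_PER_SPLIT = 5   # categories held out as novel in each few-shot split


def cls_score_rows(num_foreground):
    """Rows of roi_heads.box.predictor.cls_score: one per class plus background."""
    return num_foreground + 1


base_rows = cls_score_rows(VOC_NUM_CLASSES - NOVEL_PER_SPLIT)
finetune_rows = cls_score_rows(VOC_NUM_CLASSES)
print(base_rows, finetune_rows)  # 16 21
```

This is why the base checkpoint cannot be loaded unchanged into the 21-class fine-tuning model, which is the problem discussed in the issue linked above.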
When fine-tuning finishes, the resulting files are in fs_exp/voc_standard_results.
Evaluation
Evaluate the 1/2/3/5/10-shot settings on all 3 splits:
python tools/fewshot_exp/cal_novel_voc.py fs_exp/voc_standard_results
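Results
The results deviate somewhat from those reported in the paper but are close, so the reproduction succeeded!!!
[!!! Someone who reproduced it on a single GPU got results that differed considerably; training on two GPUs is recommended.]
Miscellaneous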
- PyTorch previous versions (installation commands): https://pytorch.org/get-started/previous-versions/
- maskrcnn_benchmark.data.build WARNING: When using more than one image per GPU you may encounter an out-of-memory (OOM) error if your GPU does not have sufficient memory. If this happens, you can reduce SOLVER.IMS_PER_BATCH (for training) or TEST.IMS_PER_BATCH (for inference). For training, you must also adjust the learning rate and schedule length according to the linear scaling rule. See for example: https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14
- CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 9.77 GiB total capacity; 4.46 GiB already allocated; 66.69 MiB free; 4.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid frag
  Reference: "pytorch: four ways to fix RuntimeError: CUDA out of memory. Tried to allocate"
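The linear scaling rule mentioned in the warning: when SOLVER.IMS_PER_BATCH is multiplied by a factor k, multiply SOLVER.BASE_LR by k and divide SOLVER.MAX_ITER and SOLVER.STEPS by k, so the model sees the same number of images at a proportionally adjusted learning rate. A minimal sketch; `linear_scale` is a hypothetical helper and the example values are illustrative, not MPSR's VOC settings, so read your own from the config.

```python
def linear_scale(base_lr, base_batch, max_iter, steps, new_batch):
    """Rescale a solver schedule for a new total batch size (linear scaling rule).

    The learning rate scales with the batch size, while the iteration count and
    the LR-decay steps scale inversely so the same number of images is seen.
    """
    k = new_batch / base_batch
    return {
        "SOLVER.IMS_PER_BATCH": new_batch,
        "SOLVER.BASE_LR": base_lr * k,
        "SOLVER.MAX_ITER": int(round(max_iter / k)),
        "SOLVER.STEPS": tuple(int(round(s / k)) for s in steps),
    }


# Illustrative: a 16-image schedule reduced to 2 images per batch to fit memory.
print(linear_scale(0.02, 16, 90000, (60000, 80000), 2))
```

In upstream maskrcnn-benchmark, tools/train_net.py accepts such KEY VALUE overrides after --config-file (for example `SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025`); check that MPSR's copy behaves the same before relying on it.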