Paper reproduction: "Multi-Scale Positive Sample Refinement for Few-Shot Object Detection"

May 20, 2023

Paper: https://arxiv.org/pdf/2007.09384.pdf

References:

Experimental environment

2x RTX 3090
PyTorch 1.10, CUDA 11.3
Python 3.6.5

Environment setup

Create a virtual environment

conda create -n maskrcnn_benchmark python=3.6.5
conda activate maskrcnn_benchmark

Install dependencies

conda install ipython pip

Install the dependencies of maskrcnn_benchmark and the COCO API

pip install ninja yacs cython matplotlib tqdm opencv-python

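Issue 1: opencv-python installs slowly. This is normal; installing it in a separate step from the other packages is recommended.

Fix: make the installation progress visible, see https://blog.csdn.net/FRIGIDWINTER/article/details/129179235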

Install torch

pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
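Before building any CUDA extensions, it can help to confirm that this wheel is the one actually imported. A minimal sketch; the helper name torch_env_report is hypothetical, and it only reads torch.__version__, torch.version.cuda and torch.cuda:

```python
import importlib.util


def torch_env_report():
    """Summarize the installed PyTorch build, or return None if torch is not importable."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch  # imported lazily so the check itself works without torch

    return {
        "torch": torch.__version__,              # "1.10.0+cu113" expected for the wheel above
        "cuda_build": torch.version.cuda,        # CUDA version the wheel was built against
        "cuda_available": torch.cuda.is_available(),
        "gpu_count": torch.cuda.device_count(),  # 2 expected on a 2x RTX 3090 machine
    }


if __name__ == "__main__":
    print(torch_env_report())
```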

Create a folder to hold all of the repositories cloned below:

mkdir install_dir

Download MPSR

git clone https://github.com/jiaxi-wu/MPSR.git

Install cityscapesScripts

# install cityscapesScripts
cd install_dir
git clone https://github.com/mcordts/cityscapesScripts.git
cd cityscapesScripts/
python setup.py build_ext install

Note: I initially forgot to install this, and a later build step failed with ModuleNotFoundError: No module named 'cityscapesscripts'.
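Because the missing package only surfaced later as a ModuleNotFoundError, a quick import check can catch such gaps early. The sketch below (the helper name missing_modules is hypothetical) uses importlib.util.find_spec; run it inside the conda environment once apex and maskrcnn_benchmark from the following steps are installed as well.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names used in this setup; run inside the activated conda environment.
    required = ["cityscapesscripts", "apex", "maskrcnn_benchmark"]
    missing = missing_modules(required)
    print("missing:", missing if missing else "none")
```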

Install apex

cd install_dir
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext

[Screenshot: successful build output]

Install maskrcnn_benchmark

cd install_dir
git clone https://github.com/facebookresearch/maskrcnn-benchmark.git
cd maskrcnn-benchmark

python setup.py build develop

Error:

subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 68, in <module>
    cmdclass={"build_ext": torch.utils.cpp_extension.BuildExtension},
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 186, in run
    _build_ext.build_ext.run(self)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 708, in build_extensions
    build_ext.build_extensions(self)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
    _build_ext.build_ext.build_extensions(self)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/distutils/command/build_ext.py", line 533, in build_extension
    depends=ext.depends)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 538, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1359, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/home/test/anaconda3/envs/mpsr/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1683, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

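Fix (see https://github.com/facebookresearch/maskrcnn-benchmark/issues/1274): the deformable-convolution CUDA sources use the AT_CHECK macro, which newer PyTorch versions replace with TORCH_CHECK. Run the following commands: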

cuda_dir="maskrcnn_benchmark/csrc/cuda"
perl -i -pe 's/AT_CHECK/TORCH_CHECK/' $cuda_dir/deform_pool_cuda.cu $cuda_dir/deform_conv_cuda.cu
# You can then run the regular setup command
python setup.py build develop
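If perl is not available, the same substitution can be done in Python. This sketch mirrors the perl one-liner, including the fact that s/AT_CHECK/TORCH_CHECK/ without the g flag replaces only the first match on each line (should the build still complain about AT_CHECK, dropping the count argument replaces every match). The helper name replace_at_check is hypothetical; run it from the maskrcnn-benchmark root before rebuilding.

```python
from pathlib import Path

CUDA_DIR = Path("maskrcnn_benchmark/csrc/cuda")
FILES = ["deform_pool_cuda.cu", "deform_conv_cuda.cu"]


def replace_at_check(path):
    """Rewrite `path` in place like perl's s/AT_CHECK/TORCH_CHECK/ (first match per line).

    Returns the number of lines that changed.
    """
    path = Path(path)
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    patched = [line.replace("AT_CHECK", "TORCH_CHECK", 1) for line in lines]
    path.write_text("".join(patched), encoding="utf-8")
    return sum(old != new for old, new in zip(lines, patched))


if __name__ == "__main__":
    for name in FILES:
        target = CUDA_DIR / name
        if target.exists():
            print(name, replace_at_check(target), "lines patched")
        else:
            print("not found:", target)
```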

[Screenshot: final successful build output]

Prepare the datasets

cd MPSR
mkdir -p datasets/voc
cd datasets/voc
wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
wget https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
tar xf VOCtrainval_11-May-2012.tar
tar xf VOCtrainval_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar
cd ../..
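To sanity-check the extraction: the standard VOC tarballs unpack into VOCdevkit/VOC2007 (the 2007 trainval and test archives merge into the same folder) and VOCdevkit/VOC2012. The sketch below, with the hypothetical helper missing_voc_dirs, lists any expected sub-folder that is absent. This post does not say exactly where MPSR expects the data, so point it at whichever directory actually holds VOC2007 and VOC2012.

```python
from pathlib import Path

# Sub-folders that the standard VOC tarballs create under <base>/<year>/
EXPECTED = ["Annotations", "ImageSets/Main", "JPEGImages"]


def missing_voc_dirs(base, years=("VOC2007", "VOC2012")):
    """List expected VOC directories that do not exist under `base`."""
    base = Path(base)
    return [
        str(base / year / sub)
        for year in years
        for sub in EXPECTED
        if not (base / year / sub).is_dir()
    ]


if __name__ == "__main__":
    # Assumed location (run from the MPSR root); adjust if the archives went elsewhere.
    problems = missing_voc_dirs("datasets/voc/VOCdevkit")
    print("VOC layout OK" if not problems else "\n".join(problems))
```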

Build the base and few-shot datasets

bash tools/fewshot_exp/datasets/init_fs_dataset_standard.sh

Since I did not use the COCO dataset, I commented out lines 7 and 8 of tools/fewshot_exp/datasets/init_fs_dataset_standard.sh.

Error: AttributeError: module 'torch._six' has no attribute 'PY3'

Fix: https://blog.csdn.net/pangweijian/article/details/120371802

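Base training on VOC

In both configs/fewshot/base and configs/fewshot_baseline/base, change WEIGHT in the 3 VOC-related YAML config files to the path of the ResNet-101 weight file (placed under MPSR/configs/):

[!!! Don't miss any: the files in both folders need editing]

[!!! For the ResNet-101 weights, search for R-101.pkl in a browser and download it]

The modified config looks like this [!!! nothing else needs to change]: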

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "configs/R-101.pkl"
  BACKBONE:
    CONV_BODY: "R-101-FPN"
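Editing the WEIGHT line by hand in six files is easy to get wrong, so a small script can do it instead. This is a sketch assuming each file has exactly one `WEIGHT:` key, as in the excerpt above, and that the VOC files can be selected with the glob pattern *voc*.yaml; that pattern is a guess at the file names and should be checked against the three files in each folder. The helper name set_weight is hypothetical.

```python
import re
from pathlib import Path

# Matches the "WEIGHT:" key line (but not WEIGHT_DECAY) and keeps its indentation.
WEIGHT_RE = re.compile(r"^([ \t]*WEIGHT:[ \t]*).*$", re.MULTILINE)


def set_weight(yaml_path, weight):
    """Point the WEIGHT entry of a maskrcnn_benchmark-style YAML file at `weight`."""
    path = Path(yaml_path)
    text = path.read_text(encoding="utf-8")
    new_text, n = WEIGHT_RE.subn(lambda m: m.group(1) + '"' + weight + '"', text)
    if n != 1:
        raise ValueError(f"{path}: expected exactly one WEIGHT line, found {n}")
    path.write_text(new_text, encoding="utf-8")


if __name__ == "__main__":
    for folder in ["configs/fewshot/base", "configs/fewshot_baseline/base"]:
        if not Path(folder).is_dir():
            continue
        # Hypothetical selection pattern; verify it matches exactly the 3 VOC files.
        for cfg in sorted(Path(folder).glob("*voc*.yaml")):
            set_weight(cfg, "configs/R-101.pkl")
            print("updated", cfg)
```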

Adjust the number of GPUs and the GPU IDs in tools/fewshot_exp/train_voc_base.sh to match your machine.

Run base training:

bash tools/fewshot_exp/train_voc_base.sh

Error: AttributeError: module 'torch._six' has no attribute 'PY3'
Fix: open the file named in the traceback and change PY3 to PY37.
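Newer PyTorch releases no longer define torch._six.PY3. Since this setup runs on Python 3, a check such as `if torch._six.PY3:` can also be replaced with a test on the interpreter version, which does not depend on torch internals at all. A minimal sketch, assuming the failing code only uses the flag to choose a Python 3 import path; the import_file helper below is illustrative:

```python
import importlib.util
import sys

# Local replacement for torch._six.PY3, derived from the running interpreter.
PY3 = sys.version_info[0] == 3


def import_file(module_name, file_path):
    """Load a Python source file as a module using the Python 3 importlib API."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```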

Error: ValueError: signal number 32 out of range
Fix: set _C.DATALOADER.NUM_WORKERS = 0 in maskrcnn_benchmark/config/defaults.py, then rebuild:

cd MPSR
python setup.py build develop

Error: cannot import name 'container_abcs' from 'torch._six'
Fix: go to the failing line and edit it as described in https://blog.csdn.net/weixin_42620513/article/details/122728326
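The usual compatibility pattern for this error is to fall back to the standard library, because torch._six.container_abcs was an alias of collections.abc. A sketch; is_sequence is a hypothetical example of how the name is typically used in collate or type-checking code:

```python
try:
    # Older PyTorch releases re-exported collections.abc under this name.
    from torch._six import container_abcs
except ImportError:
    # Newer releases dropped the alias; use the standard library module directly.
    import collections.abc as container_abcs


def is_sequence(obj):
    """True for list/tuple-like containers, excluding strings."""
    return isinstance(obj, container_abcs.Sequence) and not isinstance(obj, str)
```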

Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 2: ordinal not in range(128)
Fix: go to the failing line and change it as suggested in https://blog.csdn.net/qq_41185868/article/details/79039604
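The post does not say which file triggered this. Assuming it comes from reading a text file with open() and no encoding argument: Python 3.6 then decodes with the locale's preferred encoding, which is ASCII under a C/POSIX locale, and 0xc2 is the first byte of a two-byte UTF-8 sequence (a non-breaking space, for example, encodes as 0xc2 0xa0). Passing encoding="utf-8" explicitly is one fix; read_text_utf8 is a hypothetical helper:

```python
def read_text_utf8(path):
    """Read a text file as UTF-8 regardless of the locale's default encoding.

    Without `encoding`, open() uses the locale encoding, which is ASCII under
    LANG=C on Python 3.6 and fails on UTF-8 lead bytes such as 0xc2.
    """
    with open(path, encoding="utf-8") as f:
        return f.read()
```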

Error: RuntimeError: Address already in use
Fix:
https://github.com/facebookresearch/maskrcnn-benchmark/issues/241
https://blog.csdn.net/qq_40682833/article/details/121235537
https://blog.csdn.net/Orientliu96/article/details/104597178
In tools/fewshot_exp/train_voc_base.sh, add an explicit port, --master_port 29501, so the launcher does not collide with whatever is already using its default port (29500).
The modified command:

python -m torch.distributed.launch --nproc_per_node=$NGPUS --master_port 29501 ./tools/train_net.py --config-file ${configfile}
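A fixed port such as 29501 can itself be taken by another job. As an alternative sketch, the OS can be asked for a free port by binding to port 0; there is a small race between picking the port and the launcher binding it, so this lowers rather than eliminates the chance of a clash. The helper name is hypothetical:

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0 on localhost."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # Print a port that can be passed to --master_port.
    print(find_free_port())
```

If saved as, say, find_free_port.py (a hypothetical file name), the launch line could use `--master_port $(python find_free_port.py)`.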

Training produces three weight files:

model_voc_split1_base.pth
model_voc_split2_base.pth
model_voc_split3_base.pth

Fine-tuning

Again, adjust the number of GPUs and the GPU IDs in tools/fewshot_exp/train_voc_standard.sh to match your machine.

Then run:

bash tools/fewshot_exp/train_voc_standard.sh

Error:

size mismatch for roi_heads.box.predictor.cls_score.weight: copying a param with shape torch.Size([16, 1024]) from checkpoint, the shape in current model is torch.Size([21, 1024]).
size mismatch for roi_heads.box.predictor.cls_score.bias: copying a param with shape torch.Size([16]) from checkpoint, the shape in current model is torch.Size([21]).

Fix (see https://github.com/jiaxi-wu/MPSR/issues/9): run

python tools/fewshot_exp/trans_voc_pretrained.py 1
python tools/fewshot_exp/trans_voc_pretrained.py 2
python tools/fewshot_exp/trans_voc_pretrained.py 3

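[!!! I hit this problem too, but the error persisted after running these commands. In the end I reverted the modified files to their original state and it ran successfully, which is rather puzzling.]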
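The two shapes in the message can be read off directly: 16 = 15 base classes + background from base training, and 21 = all 20 VOC classes + background in the fine-tuning model, so the classifier head of the base checkpoint cannot be loaded as-is. The sketch below (a hypothetical helper working on plain {name: shape} dicts; with PyTorch these could be built as {k: tuple(v.shape) for k, v in state_dict.items()}) lists such mismatches, which may help confirm whether a converted checkpoint is actually the one being loaded:

```python
def mismatched_keys(checkpoint_shapes, model_shapes):
    """Return {key: (checkpoint_shape, model_shape)} for parameters whose shapes differ."""
    return {
        key: (tuple(shape), tuple(model_shapes[key]))
        for key, shape in checkpoint_shapes.items()
        if key in model_shapes and tuple(shape) != tuple(model_shapes[key])
    }


# Shapes copied from the error message above.
ckpt = {
    "roi_heads.box.predictor.cls_score.weight": (16, 1024),  # 15 base classes + background
    "roi_heads.box.predictor.cls_score.bias": (16,),
}
model = {
    "roi_heads.box.predictor.cls_score.weight": (21, 1024),  # 20 VOC classes + background
    "roi_heads.box.predictor.cls_score.bias": (21,),
}

if __name__ == "__main__":
    for key, (old, new) in mismatched_keys(ckpt, model).items():
        print(key, old, "->", new)
```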

When fine-tuning finishes, the resulting files are under fs_exp/voc_standard_results.

Evaluation

Evaluate the 1/2/3/5/10-shot settings for all 3 splits:

python tools/fewshot_exp/cal_novel_voc.py fs_exp/voc_standard_results

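Results

The results deviate somewhat from the paper's numbers but are still close, so the reproduction succeeded!

[!!! Someone who reproduced this on a single GPU got considerably different results; training on two GPUs is recommended.]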

Other notes

Installing previous PyTorch versions: https://pytorch.org/get-started/previous-versions/
maskrcnn_benchmark.data.build WARNING: When using more than one image per GPU you may encounter an out-of-memory (OOM) error if your GPU does not have sufficient memory. If this happens, you can reduce SOLVER.IMS_PER_BATCH (for training) or TEST.IMS_PER_BATCH (for inference). For training, you must also adjust the learning rate and schedule length according to the linear scaling rule. See for example: https://github.com/facebookresearch/Detectron/blob/master/configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml#L14
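The warning refers to the linear scaling rule: when the total batch size (SOLVER.IMS_PER_BATCH) changes, scale SOLVER.BASE_LR by the same factor and scale SOLVER.MAX_ITER and SOLVER.STEPS inversely, so that training covers the same number of epochs. A sketch of that arithmetic; the helper is hypothetical and the example values are illustrative only, not MPSR's actual schedule:

```python
def linear_scale(base_lr, base_batch, base_iters, base_steps, new_batch):
    """Rescale LR and schedule for a new total batch size (linear scaling rule).

    LR grows proportionally with the batch size; the iteration counts
    (max iterations and LR-decay steps) shrink inversely.
    """
    factor = new_batch / base_batch
    return {
        "BASE_LR": base_lr * factor,
        "MAX_ITER": int(round(base_iters / factor)),
        "STEPS": tuple(int(round(s / factor)) for s in base_steps),
    }


if __name__ == "__main__":
    # Illustrative: a schedule written for 16 images per batch, rerun with 2.
    print(linear_scale(0.02, 16, 90000, (60000, 80000), 2))
```

With these example values, going from 16 to 2 images per batch gives BASE_LR 0.0025, MAX_ITER 720000 and STEPS (480000, 640000).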
CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 9.77 GiB total capacity; 4.46 GiB already allocated; 66.69 MiB free; 4.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.
Related article: "pytorch: four ways to fix RuntimeError: CUDA out of memory. Tried to allocate"
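The max_split_size_mb hint targets fragmentation, that is, reserved memory much larger than allocated memory. In the quoted message, reserved (4.53 GiB) is close to allocated (4.46 GiB) while only 66.69 MiB of 9.77 GiB is free, which suggests the rest is held outside this PyTorch process, for example by another job on GPU 0; in that case freeing the GPU or lowering IMS_PER_BATCH as the warning advises is probably more effective. If fragmentation is the issue, the option is set through the PYTORCH_CUDA_ALLOC_CONF environment variable. A sketch with a hypothetical helper and an illustrative, untuned 128 MB value:

```python
import os


def set_cuda_alloc_conf(max_split_size_mb=128):
    """Set PYTORCH_CUDA_ALLOC_CONF unless it is already configured.

    Must take effect before PyTorch initializes CUDA; calling it before
    `import torch` in the training entry point is the safest place.
    """
    value = "max_split_size_mb:{}".format(max_split_size_mb)
    return os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", value)


if __name__ == "__main__":
    print(set_cuda_alloc_conf())  # 128 is illustrative, not a recommendation
```

Exporting the same variable in the shell before launching the training script has the same effect.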

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.