Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
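This paper presents a convolution-free transformer backbone that remains tractable on high-resolution inputs and produces a pyramid of feature maps that dense prediction heads can consume.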
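At stage $i$, the feature map $F_{i-1}$ of size $H_{i-1} \times W_{i-1}$ is divided into $\frac{H_{i-1}W_{i-1}}{P_i^2}$ patches of size $P_i \times P_i$.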
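The following is a minimal sketch of the patch arithmetic. It assumes a 224×224 input, non-overlapping patches, and patch sizes $P_i$ of 4, 2, 2, 2 for the four stages. I believe that is the standard PVT setting, but treat it as an assumption. The helper name `patches_per_stage` is hypothetical.

```python
# Hypothetical helper: grid size and patch count produced at each PVT stage.
def patches_per_stage(height, width, patch_sizes):
    """Return (H_i, W_i, num_patches) for each stage, assuming non-overlapping
    P_i x P_i patches and input sides divisible by the cumulative stride."""
    results = []
    h, w = height, width
    for p in patch_sizes:
        # F_{i-1} of size h x w is cut into (h * w) / p**2 patches,
        # which become the h/p x w/p token grid of the next stage.
        h, w = h // p, w // p
        results.append((h, w, h * w))
    return results


# Assumed configuration: 224x224 input, patch sizes 4, 2, 2, 2.
for i, (h, w, n) in enumerate(patches_per_stage(224, 224, [4, 2, 2, 2]), start=1):
    print(f"stage {i}: {h}x{w} grid, {n} patches")
```

The token count drops by $P_i^2$ at every stage. This shrinking sequence is what gives the backbone its pyramid of resolutions.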
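For high-resolution inputs, spatial-reduction attention (SRA), shown in the figure, downsamples the keys and values spatially by a reduction ratio $R_i$ before attention. This shrinks them from $H_iW_i$ to $\frac{H_iW_i}{R_i^2}$ tokens and keeps the cost of attention under control at high resolution.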
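A rough cost comparison makes the saving concrete. The sketch counts the entries of the $QK^\top$ attention map for a single head. It assumes the stage resolutions that come from a 224×224 input and reduction ratios $R_i$ of 8, 4, 2, 1; these ratios are assumptions and are not stated in these notes. The function names are hypothetical. The code only counts sizes and does not implement the attention itself.

```python
# Hypothetical helpers: compare attention-map sizes for full attention vs SRA.
def full_attention_entries(h, w):
    """Entries in Q K^T when each of the h*w tokens attends to all others."""
    n = h * w
    return n * n


def sra_entries(h, w, reduction_ratio):
    """Entries in Q K^T under spatial-reduction attention: queries keep all
    h*w tokens, while keys/values are downsampled by R along each side."""
    n = h * w
    reduced = (h // reduction_ratio) * (w // reduction_ratio)
    return n * reduced


# Assumed stage resolutions (224x224 input) and reduction ratios R_i.
stages = [(56, 56, 8), (28, 28, 4), (14, 14, 2), (7, 7, 1)]
for i, (h, w, r) in enumerate(stages, start=1):
    full = full_attention_entries(h, w)
    sra = sra_entries(h, w, r)
    print(f"stage {i}: full={full:,} sra={sra:,} saving={full // sra}x")
```

The saving is exactly $R_i^2$ at each stage. The largest reduction therefore goes to the first stage, which has the highest resolution.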
PVTv2: Improved Baselines with Pyramid Vision Transformer
This paper proposes three improvements to PVT:
- overlapping patch embedding
- Zero-padding and depth-wise convolution aided feed forward network
- Linear SRA
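The following back-of-the-envelope sketch covers the first and third changes, under stated assumptions. For overlapping patch embedding, it assumes a convolution with stride $S$, kernel size $2S-1$ and zero padding $S-1$. For Linear SRA, it assumes that keys and values are average-pooled to a fixed 7×7 grid. Both settings are my reading of PVTv2 and should be checked against the paper. All function names are hypothetical.

```python
# Hypothetical helpers sketching PVTv2's overlapping patch embedding and Linear SRA.
def conv_output_size(size, kernel, stride, padding):
    """Standard output-size formula for a convolution along one side."""
    return (size + 2 * padding - kernel) // stride + 1


def overlapping_patch_embed_size(size, stride):
    """Assumed setting: kernel 2S-1 and zero padding S-1. Neighbouring windows
    overlap, yet the output side is ceil(size / S), the same as a
    non-overlapping S x S split when size is divisible by S."""
    return conv_output_size(size, kernel=2 * stride - 1, stride=stride,
                            padding=stride - 1)


def sra_entries(h, w, reduction_ratio):
    """PVT v1 SRA: keys/values shrink by R per side; cost is still quadratic in h*w."""
    return (h * w) * ((h // reduction_ratio) * (w // reduction_ratio))


def linear_sra_entries(h, w, pool_size=7):
    """Linear SRA: keys/values pooled to a fixed pool_size x pool_size grid,
    so cost grows linearly with the number of query tokens h*w."""
    return (h * w) * (pool_size * pool_size)


# Stage-1 token grid for a 224x224 input (stride 4), then stage 2 (stride 2).
s1 = overlapping_patch_embed_size(224, stride=4)
s2 = overlapping_patch_embed_size(s1, stride=2)
print(s1, s2)

# Doubling the side of the stage-1 map: SRA cost grows 16x, Linear SRA only 4x.
for side in (56, 112):
    print(side, sra_entries(side, side, 8), linear_sra_entries(side, side))
```

The last loop shows the design choice. Linear SRA pools to a fixed grid, so its key/value count no longer depends on input resolution.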