Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

This paper proposes a Transformer backbone that still runs on high-resolution inputs and produces feature maps that dense prediction heads can use.

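In stage $i$, the feature map $F_{i-1}$ of size $H_{i-1} \times W_{i-1}$ is evenly divided into $\frac{H_{i-1}W_{i-1}}{P_i^2}$ patches, where $P_i$ is the patch size of that stage.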
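The patch splitting can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: it assumes non-overlapping patches, a plain matrix as the linear projection, and hypothetical stage settings (patch sizes 4, 2, 2, 2 and widths 64, 128, 320, 512 on a 224×224 input). The Transformer encoder inside each stage is left out.

```python
import numpy as np

def patch_embed(feat: np.ndarray, patch: int, weight: np.ndarray) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping patch x patch patches,
    flatten each patch, and project it with `weight` (patch*patch*C, C_out).
    Returns an (H/patch, W/patch, C_out) map."""
    h, w, c = feat.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by the patch size"
    hp, wp = h // patch, w // patch
    # (hp, patch, wp, patch, c) -> (hp, wp, patch, patch, c): group pixels by patch
    patches = feat.reshape(hp, patch, wp, patch, c).transpose(0, 2, 1, 3, 4)
    # one row per patch, H/P * W/P rows in total
    patches = patches.reshape(hp * wp, patch * patch * c)
    tokens = patches @ weight                     # (hp * wp, C_out)
    return tokens.reshape(hp, wp, -1)

# Hypothetical 4-stage setup (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = rng.standard_normal((224, 224, 3))
patch_sizes = [4, 2, 2, 2]
channels = [64, 128, 320, 512]
shapes = []
c_in = 3
for p, c_out in zip(patch_sizes, channels):
    proj = rng.standard_normal((p * p * c_in, c_out)) * 0.02
    x = patch_embed(x, p, proj)   # the stage's encoder blocks would run here
    shapes.append(x.shape)
    c_in = c_out
print(shapes)   # [(56, 56, 64), (28, 28, 128), (14, 14, 320), (7, 7, 512)]
```

Each stage therefore halves the resolution after the first one, giving the pyramid of feature maps that dense prediction heads consume.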

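For high-resolution feature maps, PVT uses spatial-reduction attention (SRA): the keys and values are spatially reduced by a ratio $R_i$ before attention, which shortens their sequence length by a factor of $R_i^2$ and keeps the attention cost under control at high resolution.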
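A single-head version of SRA might look like the following NumPy sketch. It assumes a reduction ratio of 8 on a 56×56, 64-channel map (illustrative values), implements the reduction by grouping each $R \times R$ window of tokens and projecting it back to $C$ channels, and omits multi-head splitting, the output projection, and the learnable LayerNorm parameters. The point is the shape of the attention matrix: $N \times N/R^2$ rather than $N \times N$.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm over channels, without learnable scale/shift (simplification)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_reduction_attention(x, h, w, r, wq, wk, wv, ws):
    """Single-head spatial-reduction attention (sketch).

    x: (h*w, c) tokens of one stage in row-major order; r: reduction ratio.
    Queries keep all h*w tokens; keys and values come from h*w / r^2 tokens."""
    n, c = x.shape
    q = x @ wq                                               # (n, c)
    # SR(x): gather each r x r window of tokens into one vector of length r*r*c
    windows = x.reshape(h // r, r, w // r, r, c).transpose(0, 2, 1, 3, 4)
    reduced = windows.reshape(n // (r * r), r * r * c) @ ws  # project back to (n / r^2, c)
    reduced = layer_norm(reduced)
    k, v = reduced @ wk, reduced @ wv                        # (n / r^2, c) each
    attn = softmax(q @ k.T / np.sqrt(c))                     # (n, n / r^2), not (n, n)
    return attn @ v, attn

# Illustrative stage-1 values (assumed): 56 x 56 tokens, 64 channels, R = 8
h = w = 56
c, r = 64, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((h * w, c))
wq, wk, wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
ws = rng.standard_normal((r * r * c, c)) * 0.02
out, attn = spatial_reduction_attention(x, h, w, r, wq, wk, wv, ws)
print(out.shape, attn.shape)   # (3136, 64) (3136, 49)
```

The output keeps one vector per query token, so the stage still produces a full-resolution map, while each query attends to only 49 reduced keys instead of 3136.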

PVTv2: Improved Baselines with Pyramid Vision Transformer

This paper proposes three improvements to PVT:

  • Overlapping patch embedding
  • Feed-forward network aided by zero padding and a depth-wise convolution
  • Linear SRA (linear spatial-reduction attention)
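The first and third changes can be illustrated with a small NumPy sketch. It assumes the overlapping patch embedding is a convolution with kernel $2S-1$, stride $S$ and zero padding $S-1$, and that Linear SRA average-pools the keys/values to a fixed $P \times P$ grid ($P = 7$ here). `linear_sr` is a hypothetical helper that only handles sizes divisible by $P$, and the convolutional feed-forward network is not shown. The script counts query-key scores for full attention, SRA with an assumed $R = 8$, and Linear SRA at two resolutions.

```python
import numpy as np

def overlap_patch_embed_size(h, w, stride):
    """Output size of a conv with kernel 2S-1, stride S and zero padding S-1
    (assumed overlapping patch embedding configuration)."""
    k, p = 2 * stride - 1, stride - 1
    return (h + 2 * p - k) // stride + 1, (w + 2 * p - k) // stride + 1

def linear_sr(x, pool):
    """Average-pool an (h, w, c) map to a fixed (pool, pool, c) grid.
    Sketch only: assumes h and w are divisible by pool."""
    h, w, c = x.shape
    return x.reshape(pool, h // pool, pool, w // pool, c).mean(axis=(1, 3))

def attention_scores(n_queries, n_keys):
    """Number of query-key scores computed by one attention head."""
    return n_queries * n_keys

print(overlap_patch_embed_size(224, 224, 4))   # (56, 56)

R, P = 8, 7   # assumed SRA reduction ratio and Linear SRA pool size
rows = []
for side in (56, 112):
    n = side * side
    pooled = linear_sr(np.zeros((side, side, 1)), P)
    rows.append((side,
                 attention_scores(n, n),                                     # full attention
                 attention_scores(n, n // (R * R)),                          # SRA
                 attention_scores(n, pooled.shape[0] * pooled.shape[1])))    # Linear SRA
for row in rows:
    print(row)
# (56, 9834496, 153664, 153664)
# (112, 157351936, 2458624, 614656)
```

With these settings the two reduced variants cost the same on a 56×56 map, but when the side length doubles the SRA count grows 16× (still quadratic in the number of tokens) while the Linear SRA count grows only 4×, because the number of pooled keys stays fixed at $P^2$.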

image image