torchvision.transforms.RandomResizedCrop
1. Reading the PyTorch source code
# Source excerpt from torchvision/transforms/transforms.py. Helpers such as
# _setup_size, _log_api_usage_once, _interpolation_modes_from_int and the
# functional module F are imported/defined elsewhere in the torchvision package.
class RandomResizedCrop(torch.nn.Module):
    """Crop a random portion of image and resize it to a given size.

    If the image is torch Tensor, it is expected
    to have [..., H, W] shape, where ... means an arbitrary number of leading dimensions.

    A crop of the original image is made: the crop has a random area (H * W)
    and a random aspect ratio. This crop is finally resized to the given
    size. This is popularly used to train the Inception networks.

    Args:
        size (int or sequence): expected output size of the crop, for each edge. If size is an
            int instead of sequence like (h, w), a square output size ``(size, size)`` is
            made. If provided a sequence of length 1, it will be interpreted as (size[0], size[0]).

            .. note::
                In torchscript mode size as single int is not supported, use a sequence of length 1: ``[size, ]``.
        scale (tuple of float): Specifies the lower and upper bounds for the random area of the crop,
            before resizing. The scale is defined with respect to the area of the original image.
        ratio (tuple of float): lower and upper bounds for the random aspect ratio of the crop, before
            resizing.
        interpolation (InterpolationMode): Desired interpolation enum defined by
            :class:`torchvision.transforms.InterpolationMode`. Default is ``InterpolationMode.BILINEAR``.
            If input is Tensor, only ``InterpolationMode.NEAREST``, ``InterpolationMode.BILINEAR`` and
            ``InterpolationMode.BICUBIC`` are supported.
            For backward compatibility integer values (e.g. ``PIL.Image.NEAREST``) are still acceptable.
    """

    def __init__(self, size, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0), interpolation=InterpolationMode.BILINEAR):
        super().__init__()
        _log_api_usage_once(self)
        self.size = _setup_size(size, error_msg="Please provide only two dimensions (h, w) for size.")

        if not isinstance(scale, Sequence):
            raise TypeError("Scale should be a sequence")
        if not isinstance(ratio, Sequence):
            raise TypeError("Ratio should be a sequence")
        if (scale[0] > scale[1]) or (ratio[0] > ratio[1]):
            warnings.warn("Scale and ratio should be of kind (min, max)")

        # Backward compatibility with integer value
        if isinstance(interpolation, int):
            warnings.warn(
                "Argument interpolation should be of type InterpolationMode instead of int. "
                "Please, use InterpolationMode enum."
            )
            interpolation = _interpolation_modes_from_int(interpolation)

        self.interpolation = interpolation
        self.scale = scale
        self.ratio = ratio

    @staticmethod
    def get_params(img: Tensor, scale: List[float], ratio: List[float]) -> Tuple[int, int, int, int]:
        """Get parameters for ``crop`` for a random sized crop.

        Args:
            img (PIL Image or Tensor): Input image.
            scale (list): range of scale of the origin size cropped
            ratio (list): range of aspect ratio of the origin aspect ratio cropped

        Returns:
            tuple: params (i, j, h, w) to be passed to ``crop`` for a random
            sized crop.
        """
        width, height = F.get_image_size(img)
        area = height * width

        log_ratio = torch.log(torch.tensor(ratio))
        for _ in range(10):
            target_area = area * torch.empty(1).uniform_(scale[0], scale[1]).item()
            aspect_ratio = torch.exp(torch.empty(1).uniform_(log_ratio[0], log_ratio[1])).item()

            w = int(round(math.sqrt(target_area * aspect_ratio)))
            h = int(round(math.sqrt(target_area / aspect_ratio)))

            if 0 < w <= width and 0 < h <= height:
                i = torch.randint(0, height - h + 1, size=(1,)).item()
                j = torch.randint(0, width - w + 1, size=(1,)).item()
                return i, j, h, w

        # Fallback to central crop
        in_ratio = float(width) / float(height)
        if in_ratio < min(ratio):
            w = width
            h = int(round(w / min(ratio)))
        elif in_ratio > max(ratio):
            h = height
            w = int(round(h * max(ratio)))
        else:  # whole image
            w = width
            h = height
        i = (height - h) // 2
        j = (width - w) // 2
        return i, j, h, w

    def forward(self, img):
        """
        Args:
            img (PIL Image or Tensor): Image to be cropped and resized.

        Returns:
            PIL Image or Tensor: Randomly cropped and resized image.
        """
        i, j, h, w = self.get_params(img, self.scale, self.ratio)
        return F.resized_crop(img, i, j, h, w, self.size, self.interpolation)

    def __repr__(self) -> str:
        interpolate_str = self.interpolation.value
        format_string = self.__class__.__name__ + f"(size={self.size}"
        format_string += f", scale={tuple(round(s, 4) for s in self.scale)}"
        format_string += f", ratio={tuple(round(r, 4) for r in self.ratio)}"
        format_string += f", interpolation={interpolate_str})"
        return format_string
2. Understanding the source code
def get_params(img: Tensor, scale: List[float], ratio: List[float]) -> Tuple[int, int, int, int]:
Example arguments used in the walkthrough below: size=(224, 224), scale=(0.08, 1.0), ratio=(3.0/4.0, 4.0/3.0). Note that size is a constructor argument of the transform itself, not a parameter of get_params.
**size**:
size is the expected output size after the crop is resized, not the size of the input image. It can be an int (interpreted as (size, size)), a length-1 sequence (size,), or a pair (h, w). The input image itself must be a torch.Tensor or a PIL Image, and the transform is typically used inside a transforms.Compose pipeline, as in the sketch below.
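A minimal usage sketch (the pipeline and the random tensor below are illustrative placeholders, not part of the torchvision source):
import torch
from torchvision import transforms

# Typical training pipeline: RandomResizedCrop is applied to a PIL Image first.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.08, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0)),
    transforms.ToTensor(),
])

# The transform also works directly on a [..., H, W] tensor.
img = torch.rand(3, 500, 400)
out = transforms.RandomResizedCrop(224)(img)
print(out.shape)  # torch.Size([3, 224, 224])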
**scale**:
scale bounds the area of the random crop relative to the area of the original image. A value is drawn from a uniform distribution on [scale[0], scale[1]] and multiplied by the original area:
import torch

h = w = 224
area = h * w
scale = [0.5, 0.5]
# Draw a value uniformly from [scale[0], scale[1]] and multiply it by the original area.
target_area = area * torch.empty(1).uniform_(scale[0], scale[1]).item()
print(area, target_area)
# Results:
# scale=[0.5, 0.5]  ----> (50176, 25088.0)
# scale=[0.08, 1.0] ----> (50176, 39760.93994140625)   (the default range; this value changes every run)
**ratio**:
ratio bounds the aspect ratio (w / h) of the crop. The aspect ratio is sampled log-uniformly: a value is drawn uniformly in log-space and then exponentiated, so that reciprocal ratios such as 3/4 and 4/3 are equally likely. Width and height are then recovered from the target area and the sampled aspect ratio (w * h ≈ target_area, w / h ≈ aspect_ratio):
import math
import torch

ratio = (3.0 / 4.0, 4.0 / 3.0)
log_ratio = torch.log(torch.tensor(ratio))
# Sample uniformly in log-space, then exponentiate (log-uniform sampling).
aspect_ratio = torch.exp(torch.empty(1).uniform_(log_ratio[0], log_ratio[1])).item()
# round() rounds to the nearest integer; target_area comes from the scale example above.
w = int(round(math.sqrt(target_area * aspect_ratio)))
h = int(round(math.sqrt(target_area / aspect_ratio)))
print('log_ratio:', log_ratio)
print('aspect_ratio:', aspect_ratio)
print('h, w:', h, w)
# Results:
# log_ratio: tensor([-0.2877, 0.2877])
# aspect_ratio: 1.1371617317199707
# h, w: 187 213

If the sampled crop fits inside the image (0 < w <= 224 and 0 < h <= 224), the top-left corner (i, j) is drawn uniformly from all valid positions; otherwise get_params retries (up to 10 times) and finally falls back to a central crop:
if 0 < w <= 224 and 0 < h <= 224:
    i = torch.randint(0, 224 - h + 1, size=(1,)).item()
    j = torch.randint(0, 224 - w + 1, size=(1,)).item()
    print('i, j:', i, j)
# Result:
# i, j: 3 21
**interpolation**:
If the input is a Tensor, only three interpolation modes are supported:
1. InterpolationMode.NEAREST (nearest-neighbor interpolation)
2. InterpolationMode.BILINEAR (bilinear interpolation)
3. InterpolationMode.BICUBIC (bicubic interpolation)
For backward compatibility (not backpropagation), integer values such as PIL.Image.NEAREST are still accepted; they trigger a warning and are converted to the corresponding InterpolationMode.
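A short sketch of picking a non-default interpolation mode (the input tensor here is just a placeholder):
import torch
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Request bicubic resizing instead of the default bilinear.
t = transforms.RandomResizedCrop(224, interpolation=InterpolationMode.BICUBIC)
out = t(torch.rand(3, 300, 300))
print(out.shape)  # torch.Size([3, 224, 224])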
**Resize**:
Internally, forward() crops and resizes the image to the target size by calling F.resized_crop(img, i, j, h, w, self.size, self.interpolation), where (i, j, h, w) come from get_params().
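The two steps of forward() can be reproduced by hand with the static get_params() and the functional API; a minimal sketch (the random input tensor is a placeholder):
import torch
from torchvision import transforms
import torchvision.transforms.functional as F

img = torch.rand(3, 500, 400)  # placeholder [..., H, W] image
t = transforms.RandomResizedCrop(224)

# Step 1: sample the crop rectangle, exactly as forward() does.
i, j, h, w = t.get_params(img, t.scale, t.ratio)

# Step 2: crop that rectangle and resize it to the target size.
out = F.resized_crop(img, i, j, h, w, t.size, t.interpolation)
print(i, j, h, w, out.shape)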