Accelerating Image Data Preprocessing with DALI

Posted on 2023-01-21 In Training Techniques

Speeding up preprocessing with DALI

NVIDIA DALI documentation: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/examples/general/data_loading/external_input.html

Installation: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/installation.html#pip-official-releases

1. DALI pipeline

DALI can run data loading and preprocessing entirely on the CPU, split the work between CPU and GPU, or run it on the GPU.
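DALI's device-placement rule can be modeled in a few lines. As we understand it, the file reader produces CPU data, a decoder runs with device "cpu" or "mixed" (CPU parsing plus GPU decoding, with the output on the GPU), and once data is on the GPU it cannot move back to the CPU inside the pipeline. The sketch below is a hypothetical helper, not a DALI API: `placement_for` maps the three loading modes to device strings (the mode names "cpu", "hybrid" and "gpu" are our own), and `is_valid_chain` rejects any stage chain that would move GPU data back to the CPU.

```python
# Hypothetical helper, not part of DALI: models the rule that data can move
# CPU -> GPU inside a pipeline but never GPU -> CPU.

# Where the data lives after a stage with the given device string runs.
LOCATION_AFTER = {"cpu": "cpu", "mixed": "gpu", "gpu": "gpu"}


def placement_for(mode):
    """Device strings for (reader, decoder, augmentations) per loading mode."""
    table = {
        "cpu": ("cpu", "cpu", "cpu"),     # pure CPU loading and preprocessing
        "hybrid": ("cpu", "cpu", "gpu"),  # CPU decoding, then copy to GPU for augmentation
        "gpu": ("cpu", "mixed", "gpu"),   # mixed decoding, augmentation on the GPU
    }
    return table[mode]


def is_valid_chain(devices):
    """Return True if no stage tries to consume GPU data on the CPU."""
    location = "cpu"  # file readers always produce CPU data
    for device in devices:
        # "cpu" and "mixed" stages both expect CPU input
        if location == "gpu" and device in ("cpu", "mixed"):
            return False
        location = LOCATION_AFTER[device]
    return True


for mode in ("cpu", "hybrid", "gpu"):
    print(mode, placement_for(mode), is_valid_chain(placement_for(mode)))
```

This is also why the classification example below pairs device='mixed' for the decoder with mode = 'gpu' for every later operator.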

In DALI, every data-processing task is built around an object called a Pipeline: an instance of nvidia.dali.Pipeline or of a class derived from it.

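A DALI pipeline can be defined in any of the following ways:

By implementing a function that uses DALI operators internally and decorating it with the pipeline_def() decorator.
By instantiating a Pipeline object directly, building the graph, and setting its outputs with Pipeline.set_outputs().
By inheriting from the Pipeline class and overriding Pipeline.define_graph() (the legacy way of defining DALI pipelines).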


2. Example pipeline for image classification

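In this example every operation runs on the GPU. Note: GPU preprocessing occupies GPU memory that the model also needs, so memory gets tighter as the model grows; in return, GPU utilization stays at 100%. For large models, GPU-based loading is not recommended.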

Even with all operations on the CPU, DALI still processes data faster than torchvision.

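import nvidia.dali.ops as ops
from nvidia.dali.pipeline import Pipeline

class TrainPipeline(Pipeline):
    def __init__(self, batch_size, num_threads, device_id, data_root, img_size, n_holes, length, custom_cutout=False):
        super(TrainPipeline, self).__init__(batch_size, num_threads, device_id, prefetch_queue_depth=4)
        # 'gpu': augmentations run on the GPU; for pure-CPU processing use 'cpu' here
        # and device='cpu' for the decoder as well
        mode = 'gpu'
        # 'mixed' decoding: work is split between CPU and GPU, and the output lives on the GPU
        self.decode = ops.decoders.Image(device='mixed')
        self.img_size = img_size

        # readers.File works like torchvision.datasets.ImageFolder (one subdirectory per class);
        # DALI also offers other higher-level readers worth exploring
        self.input = ops.readers.File(file_root=data_root, random_shuffle=True)
        # Resize to 1.2x the target size
        self.resize = ops.Resize(device=mode, resize_x=int(img_size*1.2), resize_y=int(img_size*1.2))
        # Random crop, similar to torchvision.transforms.RandomResizedCrop
        self.randomcrop = ops.RandomResizedCrop(device=mode, size=img_size, random_area=[0.3, 1.0])
        # CropMirrorNormalize does normalization plus an optional horizontal flip,
        # similar to torchvision.transforms.Normalize & RandomHorizontalFlip
        self.normalize = ops.CropMirrorNormalize(device=mode, mean=[0.5*255, 0.5*255, 0.5*255],
                                                 std=[0.5*255, 0.5*255, 0.5*255])
        # Random number generators: Uniform defaults to [-1, 1], CoinFlip yields 0 or 1
        self.rng1 = ops.random.Uniform()
        self.rng2 = ops.random.CoinFlip()
        # Color jitter, similar to torchvision.transforms.ColorJitter
        self.colortwist = ops.ColorTwist(device=mode)
        # Rotation, similar to torchvision.transforms.RandomRotation;
        # keep_size=True keeps every sample at img_size x img_size so batches stay uniform
        self.rotate = ops.Rotate(device=mode, fill_value=0, keep_size=True)
        # GridMask: random occlusion blocks in the spirit of cutout
        self.gridmask = ops.GridMask(device=mode)

    def define_graph(self):
        # NOTE: the original post does not show define_graph; this ordering and the
        # augmentation ranges below are assumptions for illustration
        jpegs, labels = self.input(name="Reader")  # the name is what epoch_size('Reader') looks up
        images = self.decode(jpegs)
        images = self.resize(images)
        images = self.randomcrop(images)
        images = self.rotate(images, angle=self.rng1() * 15)  # assumed range: +/-15 degrees
        images = self.colortwist(images, brightness=1.0 + 0.2 * self.rng1(),
                                 contrast=1.0 + 0.2 * self.rng1())  # assumed range: +/-20%
        images = self.gridmask(images)
        images = self.normalize(images, mirror=self.rng2())  # flip when the coin is 1
        return images, labels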
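With mean and std both set to 0.5*255 = 127.5 per channel, CropMirrorNormalize computes (x - 127.5) / 127.5, which maps raw pixels in [0, 255] to [-1, 1]; this matches torchvision's ToTensor() followed by Normalize(mean=0.5, std=0.5), since (x/255 - 0.5)/0.5 = (x - 127.5)/127.5. The pure-Python sketch below reproduces that arithmetic and the horizontal flip requested by mirror=1 for a single HWC image stored as nested lists, returning CHW output like CropMirrorNormalize's default output layout. The function name is hypothetical and this is not DALI's implementation.

```python
def crop_mirror_normalize(image_hwc, mean, std, mirror):
    """Normalize an HWC image (nested lists) per channel, optionally flip it
    horizontally, and return the result in CHW layout."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for y in range(height):
        for x in range(width):
            src_x = width - 1 - x if mirror else x  # horizontal flip reads columns in reverse
            for c in range(channels):
                out[c][y][x] = (image_hwc[y][src_x][c] - mean[c]) / std[c]
    return out


MEAN = [0.5 * 255] * 3
STD = [0.5 * 255] * 3

# A 1x2 RGB image: a black pixel followed by a white pixel
img = [[[0, 0, 0], [255, 255, 255]]]
print(crop_mirror_normalize(img, MEAN, STD, mirror=0))  # [[[-1.0, 1.0]], [[-1.0, 1.0]], [[-1.0, 1.0]]]
print(crop_mirror_normalize(img, MEAN, STD, mirror=1))  # [[[1.0, -1.0]], [[1.0, -1.0]], [[1.0, -1.0]]]
```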

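If you need a custom processing function, you can follow the approach below, using cutout as the example. This cutout runs on the CPU; to process on the GPU instead, replace NumPy with CuPy (a GPU PythonFunction receives CuPy arrays). That said, DALI's built-in operators and data augmentations are already quite rich.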

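import numpy as np

class CUTOUT(object):
    """Cutout for a single CHW image, e.g. after CropMirrorNormalize (default output layout CHW)."""

    def __init__(self, n_holes, length):
        self.n_holes = n_holes  # number of square holes per image
        self.length = length    # side length of each hole in pixels

    def __call__(self, imgs):
        c, h, w = imgs.shape
        mask = np.ones((h, w), np.float32)
        for _ in range(self.n_holes):
            # random hole center; the hole is clipped at the image border
            y = np.random.randint(h)
            x = np.random.randint(w)
            y1 = np.clip(y - self.length // 2, 0, h)
            y2 = np.clip(y + self.length // 2, 0, h)
            x1 = np.clip(x - self.length // 2, 0, w)
            x2 = np.clip(x + self.length // 2, 0, w)
            mask[y1: y2, x1: x2] = 0.
        # repeat the 2-D mask for every channel and apply it
        mask = np.expand_dims(mask, 0).repeat(c, axis=0)
        imgs = imgs * mask

        return imgs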
 
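# Then add the following line to TrainPipeline.__init__ with mode = "cpu": a CPU PythonFunction
# cannot consume GPU data, so the decoder must also use device="cpu".
# Depending on the DALI version, PythonFunction may also require the pipeline to be created
# with exec_async=False and exec_pipelined=False; check the docs for your version.
    self.mask = ops.PythonFunction(device="cpu", function=CUTOUT(n_holes, length), num_outputs=1)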
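The CUTOUT class above assumes CHW input, which matches data after CropMirrorNormalize but not freshly decoded HWC images. Below is a hypothetical NumPy variant (`cutout` and its `layout` and `rng` parameters are our own names, not part of DALI) that accepts either layout, keeps the input dtype, and takes an explicit generator so results can be reproduced. It uses the same square-hole logic with clipping at the borders, but broadcasts the 2-D mask over the channel axis instead of copying it per channel.

```python
import numpy as np


def cutout(img, n_holes, length, layout="CHW", rng=None):
    """Zero out n_holes squares of side about `length`, clipped at the borders.

    Hypothetical helper; `layout` is "CHW" or "HWC".
    """
    rng = np.random.default_rng() if rng is None else rng
    if layout == "CHW":
        h, w = img.shape[1], img.shape[2]
    else:
        h, w = img.shape[0], img.shape[1]
    mask = np.ones((h, w), dtype=img.dtype)  # same dtype keeps uint8 images uint8
    for _ in range(n_holes):
        y = int(rng.integers(h))  # hole center, uniform in [0, h)
        x = int(rng.integers(w))
        y1, y2 = max(y - length // 2, 0), min(y + length // 2, h)
        x1, x2 = max(x - length // 2, 0), min(x + length // 2, w)
        mask[y1:y2, x1:x2] = 0
    # broadcast the 2-D mask over the channel axis
    return img * (mask[None, :, :] if layout == "CHW" else mask[:, :, None])


image = np.ones((3, 8, 8), dtype=np.float32)
masked = cutout(image, n_holes=2, length=4, rng=np.random.default_rng(0))
print(masked.shape, masked.dtype)
```

Like CUTOUT, it can be wrapped for a CPU PythonFunction, for example with functools.partial to bind n_holes and length.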

How the pipeline is used when loading image-classification data (for the other iterators, see https://docs.nvidia.com/deeplearning/dali/user-guide/docs/plugins/pytorch_tutorials.html):

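from nvidia.dali.plugin.pytorch import DALIClassificationIterator
from nvidia.dali.plugin.base_iterator import LastBatchPolicy

pipe_train = TrainPipeline(batch_size, num_threads, device_id, data_root, img_size, n_holes, length, custom_cutout=custom_cutout)
pipe_train.build()

# DALIClassificationIterator plays the role of a PyTorch DataLoader: each iteration yields a list
# with one dict per pipeline, whose "data" and "label" entries are PyTorch tensors.
# epoch_size('Reader') requires the reader to be called with name="Reader" in define_graph;
# auto_reset=True restarts the iterator automatically at the end of each epoch.
train_loader = DALIClassificationIterator(pipe_train, size=pipe_train.epoch_size('Reader'),
                                          last_batch_policy=LastBatchPolicy.PARTIAL, auto_reset=True)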
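In the training loop, each item from train_loader is read as data[0]["data"] (images) and data[0]["label"] (labels). The last_batch_policy argument decides how the tail of an epoch is handled: with PARTIAL the final batch simply holds fewer samples, DROP discards an incomplete final batch, and FILL pads it up to the full batch size. The stdlib sketch below (the helper name `epoch_batches` is hypothetical) computes the resulting batch count and last-batch size for a single pipeline, ignoring multi-GPU sharding.

```python
def epoch_batches(num_samples, batch_size, policy):
    """Return (number of batches, size of the last batch) for one epoch.

    `policy` mirrors the LastBatchPolicy names: "PARTIAL", "DROP" or "FILL".
    """
    if policy not in ("PARTIAL", "DROP", "FILL"):
        raise ValueError(f"unknown policy: {policy}")
    full, remainder = divmod(num_samples, batch_size)
    if policy == "DROP" or remainder == 0:
        # incomplete last batch (if any) is discarded
        return full, (batch_size if full else 0)
    if policy == "PARTIAL":
        # last batch is returned with only the leftover samples
        return full + 1, remainder
    # FILL: last batch is padded up to the full batch size
    return full + 1, batch_size


for policy in ("PARTIAL", "DROP", "FILL"):
    print(policy, epoch_batches(1000, 64, policy))
# PARTIAL (16, 40)
# DROP (15, 64)
# FILL (16, 64)
```

PARTIAL is a natural choice for training here because every image is seen once per epoch without padding duplicates.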