Imagenet

ImageNetDataModule

Bases: ImageClassificationDataModule

ImageNet datamodule.

Extracted from https://github.com/Lightning-Universe/lightning-bolts/blob/master/src/pl_bolts/datamodules/imagenet_datamodule.py and made a subclass of `VisionDataModule`.

Notes:

- `train_dataloader` uses the train split of ImageNet 2012 and sets aside a portion of it for validation.
- `val_dataloader` uses the part of the ImageNet 2012 train split that was held out from training (set by `num_imgs_per_val_class`).
- `test_dataloader` uses the validation split of ImageNet 2012 for testing.
- TODO: pass `num_imgs_per_class=-1` and `split="test"` for the test dataset.
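The split scheme in the notes above (hold out a fixed number of images per class from the ImageNet train split for validation, and use the official validation split as the test set) can be sketched in plain Python. This is an illustrative sketch, not the datamodule's implementation: the helper name `split_per_class`, the `(path, label)` input format, and the seeded per-class shuffle are assumptions.

```python
import random
from collections import defaultdict


def split_per_class(
    samples: list[tuple[str, int]],
    num_imgs_per_val_class: int,
    seed: int = 42,
) -> tuple[list[tuple[str, int]], list[tuple[str, int]]]:
    """Hold out `num_imgs_per_val_class` samples of each class for validation.

    Hypothetical helper: illustrates the per-class train/val split described
    in the notes; it is not the datamodule's actual code.
    """
    # Group samples by class label.
    by_class: dict[int, list[tuple[str, int]]] = defaultdict(list)
    for sample in samples:
        by_class[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed -> same split on every run
    train, val = [], []
    for label in sorted(by_class):
        class_samples = by_class[label][:]  # copy before shuffling
        rng.shuffle(class_samples)
        val.extend(class_samples[:num_imgs_per_val_class])
        train.extend(class_samples[num_imgs_per_val_class:])
    return train, val


# Example: 3 classes with 10 images each, 2 images per class held out.
samples = [(f"img_{c}_{i}.JPEG", c) for c in range(3) for i in range(10)]
train, val = split_per_class(samples, num_imgs_per_val_class=2)
print(len(train), len(val))  # 24 6
```

Seeding a dedicated `random.Random` instance keeps the split reproducible without touching global random state.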

name `class-attribute` `instance-attribute`

name: str | None = 'imagenet'

Dataset name.

dataset_cls `class-attribute`

dataset_cls: type[VisionDataset] = ImageNet

Dataset class to use.

dims `class-attribute` `instance-attribute`

dims: tuple[C, H, W] = (
    C(3),
    H(image_size),
    W(image_size),
)

Shape of a single image as `(channels, height, width)`; with the default `image_size=224` this is `(3, 224, 224)`.

__init__

__init__(
    data_dir: str | Path = DATA_DIR,
    *,
    val_split: int | float = 0.01,
    num_workers: int = NUM_WORKERS,
    normalize: bool = False,
    image_size: int = 224,
    batch_size: int = 32,
    seed: int = 42,
    shuffle: bool = True,
    pin_memory: bool = True,
    drop_last: bool = False,
    train_transforms: Callable | None = None,
    val_transforms: Callable | None = None,
    test_transforms: Callable | None = None,
    **kwargs
)

Creates an ImageNet datamodule (doesn't load or prepare the dataset yet).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `data_dir` | `str \| Path` | Path to the ImageNet dataset. | `DATA_DIR` |
| `val_split` | `int \| float` | Portion of the training data of each class to hold out for validation: a fraction if a `float` (`0.01` is 1%), an absolute count if an `int`. | `0.01` |
| `image_size` | `int` | Final (square) image size, in pixels. | `224` |
| `num_workers` | `int` | Number of worker processes used for data loading. | `NUM_WORKERS` |
| `batch_size` | `int` | Number of samples per batch. | `32` |
| `shuffle` | `bool` | If `True`, reshuffles the data every epoch. | `True` |
| `pin_memory` | `bool` | If `True`, the data loader copies tensors into CUDA pinned memory before returning them. | `True` |
| `drop_last` | `bool` | If `True`, drops the last incomplete batch. | `False` |
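`val_split` accepts either an `int` or a `float`. The sketch below assumes the convention used by Lightning Bolts' `VisionDataModule` (a `float` is a fraction of the samples, an `int` is an absolute count); `resolve_val_count` is a hypothetical helper for illustration, not part of this module.

```python
from __future__ import annotations


def resolve_val_count(num_samples: int, val_split: int | float) -> int:
    """Number of samples to hold out for validation (hypothetical helper).

    Assumed convention (as in Lightning Bolts' VisionDataModule):
    a float is a fraction of `num_samples`, an int is an absolute count.
    """
    if isinstance(val_split, bool):  # bool is a subclass of int; reject it
        raise TypeError("val_split must be an int or a float, not a bool")
    if isinstance(val_split, int):
        count = val_split
    elif isinstance(val_split, float):
        count = int(val_split * num_samples)  # truncates toward zero
    else:
        raise TypeError(f"unsupported val_split type: {type(val_split)!r}")
    if not 0 <= count <= num_samples:
        raise ValueError(f"val_split={val_split!r} is out of range for {num_samples} samples")
    return count


# A class with 1,300 training images:
print(resolve_val_count(1300, 0.01))  # 13
print(resolve_val_count(1300, 50))    # 50
```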

train_transform

train_transform() -> Module

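The standard ImageNet training transforms:

```python
from torchvision import transforms

# `self` is the datamodule; `self.image_size` defaults to 224.
transforms.Compose([
    transforms.RandomResizedCrop(self.image_size),  # random crop, resized to image_size x image_size
    transforms.RandomHorizontalFlip(),  # flip horizontally with probability 0.5
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1]
    transforms.Normalize(  # per-channel ImageNet mean/std
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
```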
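`transforms.Normalize` standardizes each channel as `(x - mean[c]) / std[c]`, applied after `ToTensor` has scaled pixel values to [0, 1]. The plain-Python sketch below reproduces that arithmetic for a single RGB pixel using the mean and std values above; the helper name `normalize_pixel` is hypothetical and only illustrates the math.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8: tuple[int, int, int]) -> tuple[float, ...]:
    """Mimic ToTensor() + Normalize() for one RGB pixel (illustrative only)."""
    # ToTensor(): uint8 in [0, 255] -> float in [0, 1]
    scaled = [v / 255.0 for v in rgb_uint8]
    # Normalize(): per-channel (x - mean) / std
    return tuple((x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD))


# A pixel close to the dataset mean maps to values close to 0.
print(normalize_pixel((124, 116, 104)))  # each value is within 0.01 of 0
```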

val_transform

val_transform() -> Compose

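The standard ImageNet transforms for validation:

```python
from torchvision import transforms

# `self` is the datamodule; `self.image_size` defaults to 224.
transforms.Compose([
    transforms.Resize(self.image_size + 32),  # shorter side -> image_size + 32 (256 by default)
    transforms.CenterCrop(self.image_size),  # central image_size x image_size crop
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1]
    transforms.Normalize(  # per-channel ImageNet mean/std
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])
```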
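With the default `image_size=224`, validation images are first resized so that their shorter side is 256 px (`224 + 32`) while keeping the aspect ratio, and then the central 224×224 region is cropped. The sketch below reproduces that geometry in plain Python; `val_crop_box` is a hypothetical helper, and its rounding (truncating the longer side, rounding the crop offsets) follows torchvision's behavior as we understand it, so treat it as an approximation.

```python
def val_crop_box(
    width: int, height: int, image_size: int = 224
) -> tuple[tuple[int, int], tuple[int, int, int, int]]:
    """Return ((resized_w, resized_h), (left, top, right, bottom)).

    Approximates Resize(image_size + 32) followed by CenterCrop(image_size).
    """
    resize_to = image_size + 32
    # Resize(int): the shorter side becomes `resize_to`, the aspect ratio is
    # kept, and the longer side is truncated to an int.
    if width <= height:
        new_w, new_h = resize_to, int(resize_to * height / width)
    else:
        new_w, new_h = int(resize_to * width / height), resize_to
    # CenterCrop: take an image_size x image_size box from the middle.
    left = int(round((new_w - image_size) / 2.0))
    top = int(round((new_h - image_size) / 2.0))
    return (new_w, new_h), (left, top, left + image_size, top + image_size)


print(val_crop_box(500, 375))  # ((341, 256), (58, 16, 282, 240))
```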

prepare_imagenet

prepare_imagenet(
    root: Path,
    *,
    split: Literal["train", "val"] = "train",
    network_imagenet_dir: Path
) -> None

Custom preparation function for ImageNet, using @obilaniu's tar magic in Python form.

The core of this is equivalent to these bash commands:

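```bash
# Validation images: a single flat archive.
mkdir -p $SLURM_TMPDIR/imagenet/val
cd       $SLURM_TMPDIR/imagenet/val
tar  -xf /network/scratch/b/bilaniuo/ILSVRC2012_img_val.tar
# Training images: a tar of per-class tars. --to-command pipes each inner tar
# to a command that extracts it into a directory named after it, minus ".tar".
mkdir -p $SLURM_TMPDIR/imagenet/train
cd       $SLURM_TMPDIR/imagenet/train
tar  -xf /network/datasets/imagenet/ILSVRC2012_img_train.tar \
     --to-command='mkdir ${TAR_REALNAME%.tar}; tar -xC ${TAR_REALNAME%.tar}'
```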
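The train-archive step (one inner `.tar` per class, each unpacked into a directory named after it) can be expressed with Python's standard `tarfile` module. This is a sketch of the idea only, not the body of `prepare_imagenet`: the function name `extract_nested_tars` is hypothetical, it runs sequentially, and it reads each inner archive directly out of the outer one without writing the inner `.tar` to disk.

```python
import tarfile
from pathlib import Path


def extract_nested_tars(outer_tar: Path, dest: Path) -> None:
    """Unpack a tar of tars: each member `X.tar` is extracted into `dest / X`.

    Hypothetical helper mirroring
    tar -xf outer.tar --to-command='mkdir ${TAR_REALNAME%.tar}; tar -xC ${TAR_REALNAME%.tar}'
    """
    # Use the safe "data" extraction filter where this Python version supports it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(outer_tar, mode="r:") as outer:
        for member in outer:
            if not member.isfile():
                continue
            # Strip the ".tar" suffix like ${TAR_REALNAME%.tar}; keep only the base name.
            name = Path(member.name).name
            class_dir = dest / (name[: -len(".tar")] if name.endswith(".tar") else name)
            class_dir.mkdir(parents=True, exist_ok=True)
            # Read the inner archive straight out of the outer one.
            inner_file = outer.extractfile(member)
            with tarfile.open(fileobj=inner_file, mode="r:") as inner:
                inner.extractall(class_dir, **extract_kwargs)
```

For example, with placeholder paths: `extract_nested_tars(Path("ILSVRC2012_img_train.tar"), Path("imagenet/train"))`.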
