merlin-dataloader 23.8.0


Description:

Merlin Dataloader




The merlin-dataloader lets you quickly train recommender models for TensorFlow, PyTorch, and JAX. It eliminates the biggest bottleneck in training recommender models by providing GPU-optimized dataloaders that read data directly into GPU memory and then hand it to TensorFlow and PyTorch through a zero-copy transfer using DLPack.
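
To make the zero-copy idea concrete, here is a minimal sketch of a DLPack handoff. It is not Merlin's code: it assumes NumPy 1.22 or later and uses a CPU array as a stand-in for a GPU buffer, and the names source and view are illustrative only.

```python
import numpy as np

# Stand-in for a buffer produced by a dataloader.
# (CPU memory here; Merlin itself works with GPU memory.)
source = np.arange(6, dtype=np.float32)

# Import the buffer through the DLPack protocol. No data is copied:
# `view` refers to the same memory as `source`.
view = np.from_dlpack(source)

# A write made through the producer is visible through the consumer.
source[0] = 99.0
print(np.shares_memory(source, view), view[0])
```

Deep learning frameworks expose the same protocol, for example torch.from_dlpack and tf.experimental.dlpack.from_dlpack, which is what makes a copy-free handoff between libraries possible.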
The benefits of the Merlin Dataloader include:

Over 10x speedup over native framework dataloaders
Handles larger than memory datasets
Per-epoch shuffling
Distributed training
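
As a rough illustration of what per-epoch shuffling means, the sketch below reshuffles the row order at the start of every epoch and then yields fixed-size batches. It is a conceptual, pure-Python example and not how Merlin implements shuffling; the function epoch_batches and its parameters are hypothetical.

```python
import random
from typing import Iterator, List, Sequence


def epoch_batches(
    rows: Sequence[int], batch_size: int, epoch: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield batches of `rows` in an order that is reshuffled for each epoch."""
    order = list(range(len(rows)))
    # Derive the shuffle seed from the epoch number so that every epoch
    # gets its own order, while a given epoch stays reproducible.
    random.Random(seed * 1_000_003 + epoch).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [rows[i] for i in order[start:start + batch_size]]


rows = list(range(10))
epoch0 = list(epoch_batches(rows, batch_size=4, epoch=0))
epoch1 = list(epoch_batches(rows, batch_size=4, epoch=1))
print(epoch0)
print(epoch1)
```

Every row still appears exactly once per epoch; only the order in which the model sees the rows changes from one epoch to the next.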

Installation
merlin-dataloader requires Python 3.7 or later. GPU support additionally requires CUDA 11.0 or later.
To install using Conda:
conda install -c nvidia -c rapidsai -c numba -c conda-forge merlin-dataloader python=3.7 cudatoolkit=11.2

To install from PyPI:
pip install merlin-dataloader

Docker containers that include merlin-dataloader and its dependencies are also available on NGC.

Basic Usage
# Get a Merlin dataset from a set of parquet files.
# PARQUET_FILE_PATHS is a placeholder for the paths to your own parquet files.
import merlin.io
dataset = merlin.io.Dataset(PARQUET_FILE_PATHS, engine="parquet")

# Create a TensorFlow dataloader from the dataset, loading 65,536 rows
# per batch
from merlin.dataloader.tensorflow import Loader
loader = Loader(dataset, batch_size=65536)

# Get a single batch of data. Inputs will be a dictionary mapping column
# names to TensorFlow tensors
inputs, target = next(loader)

# Train a Keras model with the dataloader
import tensorflow as tf
model = tf.keras.Model( ... )
model.fit(loader, epochs=5)

License:

For personal and professional use. You cannot resell or redistribute these repositories in their original state.
