Monday, August 8, 2022

A Gentle Introduction to tensorflow.data API

Last Updated on July 12, 2022

When we build and train a Keras deep learning model, the training data can be provided in several different ways. Presenting the data as a NumPy array or a TensorFlow tensor is a common one. Writing a Python generator function and letting the training loop read data from it is another. Yet another way to provide data is to use a tf.data dataset.

In this tutorial, we will see how we can use a tf.data dataset for a Keras model. After finishing this tutorial, you will learn:

  • How to create and use a tf.data dataset
  • The benefit of doing so compared to a generator function

Let’s get started.

Photo by Monika MG. Some rights reserved.


This article is split into four sections; they are:

  • Training a Keras Model with NumPy Array and Generator Function
  • Creating a Dataset using tf.data
  • Creating a Dataset from Generator Function
  • Dataset with Prefetch

Training a Keras Model with NumPy Array and Generator Function

Before we see how the tf.data API works, let’s review how we usually train a Keras model.

First, we need a dataset. An example is the Fashion MNIST dataset that comes with the Keras API. It has 60,000 training samples and 10,000 test samples of 28×28 pixels in grayscale, and the corresponding classification labels are encoded as integers 0 to 9.

The dataset is a NumPy array. We can then build a Keras model for classification, and with the model’s fit() function, we provide the NumPy array as data.

The complete code is as follows:
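A minimal sketch of such a training script follows. It makes several assumptions that are not from the original text: small random arrays stand in for Fashion MNIST so that it runs offline, the network is a small dense classifier chosen for illustration, and only 2 epochs are run instead of 50.

```python
import numpy as np
import tensorflow as tf

# Assumption: synthetic arrays replace the real data. With network access,
# tf.keras.datasets.fashion_mnist.load_data() returns the real arrays.
rng = np.random.default_rng(0)
train_image = rng.random((512, 28, 28), dtype=np.float32)
train_label = rng.integers(0, 10, size=512)
test_image = rng.random((128, 28, 28), dtype=np.float32)
test_label = rng.integers(0, 10, size=128)

# Assumption: an illustrative dense classifier, not necessarily the original model
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])

# The NumPy arrays are passed directly; Keras slices them into batches itself
history = model.fit(train_image, train_label,
                    batch_size=32, epochs=2,
                    validation_data=(test_image, test_label),
                    verbose=0)
print(history.history["val_sparse_categorical_accuracy"])
```

Because the arrays have a known length, Keras can work out by itself how many batches make up one epoch.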

Running this code will print out the following:

And also create the following plot of validation accuracy over the 50 epochs we trained our model:

The other way of training the same network is to provide the data from a Python generator function instead of a NumPy array. A generator function is one with a yield statement: it emits a piece of data, pauses, and resumes when the data consumer asks for the next piece. A generator of the Fashion MNIST dataset can be created as follows:

This function is supposed to be called with the syntax batch_generator(train_image, train_label, 32). It will scan the input arrays in batches indefinitely. Once it reaches the end of the array, it will restart from the beginning.
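A generator with the behavior described above could be sketched as follows. The name batch_generator and its argument order come from the text; the body and the small stand-in arrays are assumptions made for illustration.

```python
import numpy as np

def batch_generator(image, label, batchsize):
    """Yield (image, label) batches forever, wrapping around at the end."""
    n = len(image)
    i = 0
    while True:
        yield image[i:i + batchsize], label[i:i + batchsize]
        i += batchsize
        if i + batchsize > n:
            # Not enough samples left for a full batch: restart from the beginning
            i = 0

# Hypothetical stand-in arrays with the same layout as Fashion MNIST
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = (np.arange(100) % 10).astype(np.uint8)

gen = batch_generator(train_image, train_label, 32)
for _ in range(4):
    x, y = next(gen)
    print(x.shape, y[:3])
```

In this sketch, trailing samples that do not fill a whole batch are skipped before wrapping around, so every batch has the same size.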

Training a Keras model with a generator is similar, using the fit() function:

Instead of providing the data and label, we just need to provide the generator, as the generator will give out both. When data are presented as a NumPy array, we can tell how many samples there are by looking at the length of the array, and Keras completes one epoch when the entire dataset has been used once. However, our generator function emits batches indefinitely, so we need to tell Keras when an epoch ends, using the steps_per_epoch argument to the fit() function.
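The call could look like the sketch below. The generator body, the synthetic arrays, the small dense model, and the 2-epoch run are all assumptions for illustration; only fit() and steps_per_epoch are taken from the text.

```python
import numpy as np
import tensorflow as tf

def batch_generator(image, label, batchsize):
    """Yield (image, label) batches forever, wrapping around at the end."""
    n = len(image)
    i = 0
    while True:
        yield image[i:i + batchsize], label[i:i + batchsize]
        i += batchsize
        if i + batchsize > n:
            i = 0

# Assumption: synthetic arrays stand in for Fashion MNIST
rng = np.random.default_rng(0)
train_image = rng.random((512, 28, 28), dtype=np.float32)
train_label = rng.integers(0, 10, size=512)
test_image = rng.random((128, 28, 28), dtype=np.float32)
test_label = rng.integers(0, 10, size=128)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])

batch_size = 32
# The generator never ends, so steps_per_epoch defines where an epoch stops
history = model.fit(batch_generator(train_image, train_label, batch_size),
                    steps_per_epoch=len(train_image) // batch_size,
                    epochs=2,
                    validation_data=(test_image, test_label),
                    verbose=0)
print(history.history["loss"])
```

Setting steps_per_epoch to the number of samples divided by the batch size makes one epoch cover the array roughly once.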

While in the above code we provided the validation data as a NumPy array, we can also use a generator instead and specify the validation_steps argument.

The following is the complete code using a generator function, for which the output is the same as in the previous example:

Creating a Dataset using tf.data

Given we have the Fashion MNIST data loaded, we can convert it into a tf.data dataset, like the following:

This prints the dataset’s spec, as follows:

We can see the data is a tuple (as we passed a tuple as argument to the from_tensor_slices() function), where the first element has shape (28,28) and the second element is a scalar. Both elements are stored as 8-bit unsigned integers.
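The conversion and the spec described above can be sketched as follows, assuming zero-filled uint8 arrays as stand-ins for the loaded Fashion MNIST data.

```python
import numpy as np
import tensorflow as tf

# Assumption: stand-in arrays with the same shape and dtype as Fashion MNIST
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = np.zeros((100,), dtype=np.uint8)

# Slicing along the first axis makes each dataset element one (image, label) pair
dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))
print(dataset.element_spec)
```

The element_spec property reports one TensorSpec per component of the tuple, without iterating over the data.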

If we do not present the data as a tuple of two NumPy arrays when we create the dataset, we can also combine them later. The following creates the same dataset, but first creates the datasets for the image data and the labels separately before combining them:

This will print the same spec:

The zip() function in dataset is like the zip() function in Python in the sense that it matches data one-by-one from multiple datasets into a tuple.
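A short sketch of this two-step construction, again assuming zero-filled stand-in arrays:

```python
import numpy as np
import tensorflow as tf

# Assumption: stand-in arrays with the same shape and dtype as Fashion MNIST
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = np.zeros((100,), dtype=np.uint8)

# Build one dataset per array, then pair their elements one-by-one
image_data = tf.data.Dataset.from_tensor_slices(train_image)
label_data = tf.data.Dataset.from_tensor_slices(train_label)
dataset = tf.data.Dataset.zip((image_data, label_data))
print(dataset.element_spec)
```

Note that the datasets to combine are passed to zip() as a single tuple here.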

One benefit of using a tf.data dataset is the flexibility in handling the data. Below is the complete code on how we can train a Keras model using a dataset, in which the batch size is set on the dataset:
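The sketch below shows the idea under the same assumptions as before: synthetic arrays instead of the real data, an illustrative dense model, and a short 2-epoch run.

```python
import numpy as np
import tensorflow as tf

# Assumption: synthetic arrays stand in for Fashion MNIST
rng = np.random.default_rng(0)
train_image = rng.random((512, 28, 28), dtype=np.float32)
train_label = rng.integers(0, 10, size=512)
test_image = rng.random((128, 28, 28), dtype=np.float32)
test_label = rng.integers(0, 10, size=128)

dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy"])

# Batching happens on the dataset, so fit() gets no batch_size argument
history = model.fit(dataset.batch(32),
                    epochs=2,
                    validation_data=(test_image, test_label),
                    verbose=0)
print(history.history["loss"])
```

Since this dataset is finite, Keras knows an epoch is over when the dataset is exhausted, and steps_per_epoch is not needed.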

This is the simplest use case of a dataset. If we dive deeper, we can see that a dataset is just a Python iterable. Therefore we can print out each sample in a dataset using the following:
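As a small illustration, assuming a five-sample stand-in dataset, a plain for loop is enough to walk through the samples:

```python
import numpy as np
import tensorflow as tf

# Assumption: a tiny stand-in dataset of five (image, label) pairs
train_image = np.zeros((5, 28, 28), dtype=np.uint8)
train_label = np.arange(5, dtype=np.uint8)
dataset = tf.data.Dataset.from_tensor_slices((train_image, train_label))

# Each iteration yields one sample as a tuple of eager tensors
for image, label in dataset:
    print(image.shape, label.numpy())
```

Each element arrives as a tensor; calling numpy() on it converts it back to a NumPy value.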

The dataset has many functions built in. The batch() we used before is one of them. If we create batches from the dataset and print them, we have the following:

Here, each item we get is not a sample but a batch of samples. We also have functions such as map(), filter(), and reduce() for sequence transformation, or concatenate() and interleave() for combining with another dataset. There are also repeat(), take(), take_while(), and skip(), like their familiar counterparts from Python’s itertools module. A full list of the functions can be found in the API documentation.
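A few of these functions can be tried on a toy dataset. The sketch below assumes a dataset of the integers 0 to 9, which is not from the original text, so that the effect of each call is easy to follow.

```python
import numpy as np
import tensorflow as tf

# Assumption: a toy dataset of the integers 0..9
dataset = tf.data.Dataset.from_tensor_slices(np.arange(10, dtype=np.int64))

# batch(): each item is now a batch; the last batch may be shorter
batches = [b.tolist() for b in dataset.batch(4).as_numpy_iterator()]
print(batches)

# filter() and map(): keep the even numbers, then square them
squares = [int(v) for v in
           dataset.filter(lambda x: x % 2 == 0)
                  .map(lambda x: x * x)
                  .as_numpy_iterator()]
print(squares)

# skip() and take(): drop the first two elements, then keep three
window = [int(v) for v in dataset.skip(2).take(3).as_numpy_iterator()]
print(window)

# reduce(): fold the whole dataset into one value, here the sum
total = int(dataset.reduce(np.int64(0), lambda acc, x: acc + x).numpy())
print(total)
```

All of these calls except reduce() return a new dataset, so they can be chained in any order before the data is consumed.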

Creating a Dataset from Generator Function

So far, we saw how a dataset can be used in place of a NumPy array in training a Keras model. Indeed, a dataset can also be created out of a generator function. But instead of a generator function that generates a batch, as we saw in one of the examples above, here we make a generator function that generates one sample at a time. The following is the function:

This function randomizes the input array by shuffling the index vector. Then it generates one sample at a time. Unlike the previous example, this generator will end when the samples from the array are exhausted.
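Such a generator could be sketched as below. The name shuffle_generator, the seed argument, and the stand-in arrays are assumptions made for illustration.

```python
import numpy as np

def shuffle_generator(image, label, seed):
    """Yield every (image, label) sample exactly once, in random order."""
    idx = np.arange(len(image))
    np.random.default_rng(seed).shuffle(idx)  # shuffle the index vector in place
    for i in idx:
        yield image[i], label[i]

# Hypothetical stand-in arrays
train_image = np.zeros((10, 28, 28), dtype=np.uint8)
train_label = np.arange(10, dtype=np.uint8)

for image, label in shuffle_generator(train_image, train_label, 42):
    print(image.shape, label)
```

Shuffling only the index vector leaves the arrays themselves untouched and avoids copying them.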

We create a dataset from the function using from_generator(). We need to provide the generator function itself (a callable, not an instantiated generator) and also the output signature of the dataset. The signature is required because the tf.data.Dataset API cannot infer the dataset spec before the generator is consumed.
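A minimal sketch of this step follows. The generator, its name, and the stand-in arrays are assumptions; here the generator's arguments are bound with a lambda so that from_generator() receives a zero-argument callable, which is one of several possible ways to pass them.

```python
import numpy as np
import tensorflow as tf

def shuffle_generator(image, label, seed):
    """Yield every (image, label) sample exactly once, in random order."""
    idx = np.arange(len(image))
    np.random.default_rng(seed).shuffle(idx)
    for i in idx:
        yield image[i], label[i]

# Assumption: stand-in arrays with the same shape and dtype as Fashion MNIST
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = (np.arange(100) % 10).astype(np.uint8)

dataset = tf.data.Dataset.from_generator(
    lambda: shuffle_generator(train_image, train_label, 42),
    # The signature declares what each yielded element looks like
    output_signature=(
        tf.TensorSpec(shape=(28, 28), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.uint8),
    ),
)
print(dataset.element_spec)
```

Because a callable is passed rather than a generator object, the dataset can start a fresh generator each time it is iterated, for example once per epoch.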

Running the above code will print the same spec as before:

Such a dataset is functionally equivalent to the dataset that we created previously. Hence we can use it for training as before. The following is the complete code:

Dataset with Prefetch

The real benefit of using a dataset is to use prefetch().

Using a NumPy array for training is probably the best in performance. However, this means we need to load all data into memory. Using a generator function for training allows us to prepare one batch at a time, so the data can be loaded from disk on demand, for example. However, using a generator function to train a Keras model means that either the training loop or the generator function is running at any time. It is not easy to make the generator function and Keras’ training loop run in parallel.

Dataset is the API that allows the generator and the training loop to run in parallel. If you have a generator that is computationally expensive (e.g., doing image augmentation in real time), you can create a dataset from such a generator function and then use it with prefetch(), as follows:

The number argument to prefetch() is the size of the buffer. Here we ask the dataset to keep 3 batches in memory, ready for the training loop to consume. Whenever a batch is consumed, the dataset API will resume the generator function to refill the buffer asynchronously in the background. Therefore we can allow the training loop and the data preparation algorithm inside the generator function to run in parallel.
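The pipeline described above can be sketched as follows, with an assumed sample-level generator and stand-in arrays. The dataset is batched first, so the buffer of 3 holds three batches rather than three samples.

```python
import numpy as np
import tensorflow as tf

def shuffle_generator(image, label, seed):
    """Yield every (image, label) sample exactly once, in random order."""
    idx = np.arange(len(image))
    np.random.default_rng(seed).shuffle(idx)
    for i in idx:
        yield image[i], label[i]

# Assumption: stand-in arrays with the same shape and dtype as Fashion MNIST
train_image = np.zeros((100, 28, 28), dtype=np.uint8)
train_label = (np.arange(100) % 10).astype(np.uint8)

dataset = tf.data.Dataset.from_generator(
    lambda: shuffle_generator(train_image, train_label, 42),
    output_signature=(
        tf.TensorSpec(shape=(28, 28), dtype=tf.uint8),
        tf.TensorSpec(shape=(), dtype=tf.uint8),
    ),
)

# Batch first, then prefetch: up to 3 batches are prepared ahead of the consumer
prefetched = dataset.batch(32).prefetch(3)

batch_sizes = []
for images, labels in prefetched:
    batch_sizes.append(int(images.shape[0]))
print(batch_sizes)
```

The prefetched dataset can be passed to fit() in place of the batched dataset. Recent TensorFlow versions also accept tf.data.AUTOTUNE as the buffer size to let the runtime choose it.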

It is worth mentioning that, in the previous section, we created a shuffling generator for the dataset API. Indeed, the dataset API also has a shuffle() function to do the same, but we may not want to use it unless the dataset is small enough to fit in memory.

The shuffle() function, like prefetch(), takes a buffer size argument. The shuffle algorithm fills the buffer with elements from the dataset and draws one element randomly from it. The consumed element is then replaced with the next element from the dataset. Hence we need the buffer to be as large as the dataset itself to make a truly random shuffle. We can demonstrate this limitation with the following snippet:
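A sketch of such a demonstration, assuming a dataset of the integers 0 to 9999 and a buffer of 10 elements:

```python
import numpy as np
import tensorflow as tf

# Assumption: a toy dataset of the integers 0..9999
n_dataset = tf.data.Dataset.from_tensor_slices(np.arange(10000))

# Shuffle with a buffer of only 10 elements and look at the first 20 outputs
values = [int(n) for n in n_dataset.shuffle(10).take(20).as_numpy_iterator()]
print(values)
```

With a buffer of 10, the k-th output (counting from 0) can only come from the first k + 10 elements of the dataset, so none of these 20 values can reach 30 even though the dataset goes up to 9999.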

The output from the above looks like the following:

We can see that the numbers are shuffled only within their neighborhood, and we never see large numbers in the output.

Further Reading

More about the tf.data dataset can be found in its API documentation:


In this post, you have seen how we can use the tf.data dataset and how it can be used in training a Keras model.

Specifically, you learned:

  • How to train a model using data from a NumPy array, a generator, and a dataset
  • How to create a dataset using a NumPy array or a generator function
  • How to use prefetch with a dataset to make the generator and training loop run in parallel
