Beginner’s Guide to Keras Preprocessing Layers


Content Overview

  • Keras preprocessing
  • Available preprocessing
  • Text preprocessing
  • Numerical features preprocessing
  • Categorical features preprocessing
  • Image preprocessing
  • Image data augmentation
  • The adapt() method
  • Preprocessing data before the model or inside the model
  • Benefits of doing preprocessing inside the model at inference time
  • Preprocessing during multi-worker training

Keras preprocessing

The Keras preprocessing layers API allows developers to build Keras-native input processing pipelines. These input processing pipelines can be used as independent preprocessing code in non-Keras workflows, combined directly with Keras models, and exported as part of a Keras SavedModel.

With Keras preprocessing layers, you can build and export models that are truly end-to-end: models that accept raw images or raw structured data as input; models that handle feature normalization or feature value indexing on their own.

Available preprocessing

Text preprocessing

  • tf.keras.layers.TextVectorization: turns raw strings into an encoded representation that can be read by an Embedding layer or Dense layer.

Numerical features preprocessing

  • tf.keras.layers.Normalization: performs feature-wise normalization of input features.
  • tf.keras.layers.Discretization: turns continuous numerical features into integer categorical features.
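
For instance, here is a minimal sketch of the Discretization layer listed above (the bucket boundaries are illustrative, not prescribed by the API):

import tensorflow as tf
from tensorflow.keras import layers

data = tf.constant([[-0.5, 0.2], [0.8, 1.7]])
# Values below 0.0 map to bucket 0, [0.0, 1.0) to bucket 1, and >= 1.0 to bucket 2.
layer = layers.Discretization(bin_boundaries=[0.0, 1.0])
print(layer(data))  # tf.Tensor([[0 1] [1 2]], shape=(2, 2), dtype=int64)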

Categorical features preprocessing

  • tf.keras.layers.CategoryEncoding: turns integer categorical features into one-hot, multi-hot, or count dense representations.
  • tf.keras.layers.Hashing: performs categorical feature hashing, also known as the “hashing trick”.
  • tf.keras.layers.StringLookup: turns string categorical values into an encoded representation that can be read by an Embedding layer or Dense layer.
  • tf.keras.layers.IntegerLookup: turns integer categorical values into an encoded representation that can be read by an Embedding layer or Dense layer.
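
As a quick illustration of the categorical layers above, here is a minimal sketch of CategoryEncoding and Hashing (the token values and bin counts are arbitrary examples):

import tensorflow as tf
from tensorflow.keras import layers

# One-hot encode integer categories drawn from a vocabulary of 4 tokens.
onehot = layers.CategoryEncoding(num_tokens=4, output_mode="one_hot")
print(onehot(tf.constant([0, 2, 3])))  # shape (3, 4)

# Hash arbitrary strings into a fixed number of bins (the "hashing trick").
hashing = layers.Hashing(num_bins=8)
print(hashing(tf.constant([["cat"], ["dog"], ["zebra"]])))  # integers in [0, 8)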

Image preprocessing

These layers are for standardizing the inputs of an image model.

  • tf.keras.layers.Resizing: resizes a batch of images to a target size.
  • tf.keras.layers.Rescaling: rescales and offsets the values of a batch of images (e.g., going from inputs in the [0, 255] range to inputs in the [0, 1] range).
  • tf.keras.layers.CenterCrop: returns a center crop of a batch of images.
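
Here is a minimal sketch chaining the standardization layers above (the target size and the random batch are placeholders):

import tensorflow as tf
from tensorflow.keras import layers

images = tf.random.uniform((4, 100, 120, 3), maxval=255)  # stand-in image batch
x = layers.Resizing(64, 64)(images)       # resize to a common target size
x = layers.Rescaling(scale=1.0 / 255)(x)  # map [0, 255] to [0, 1]
print(x.shape)  # (4, 64, 64, 3)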

Image data augmentation

These layers apply random augmentation transforms to a batch of images. They are only active during training.

  • tf.keras.layers.RandomCrop
  • tf.keras.layers.RandomFlip
  • tf.keras.layers.RandomTranslation
  • tf.keras.layers.RandomRotation
  • tf.keras.layers.RandomZoom
  • tf.keras.layers.RandomContrast
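
A common pattern is to group several of these layers into a Sequential block; a minimal sketch (the factor values are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

data_augmentation = tf.keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),  # up to +/-10% of a full circle
        layers.RandomZoom(0.2),
    ]
)

images = tf.random.uniform((4, 64, 64, 3))
# Augmentation layers are no-ops at inference; pass training=True to activate them.
augmented = data_augmentation(images, training=True)
print(augmented.shape)  # (4, 64, 64, 3)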

The adapt() method

Some preprocessing layers have an internal state that can be computed based on a sample of the training data. The list of stateful preprocessing layers is:

  • TextVectorization: holds a mapping between string tokens and integer indices.
  • StringLookup and IntegerLookup: hold a mapping between input values and integer indices.
  • Normalization: holds the mean and standard deviation of the features.
  • Discretization: holds information about value bucket boundaries.

Crucially, these layers are non-trainable. Their state is not set during training; it must be set before training, either by initializing them from a precomputed constant, or by “adapting” them on data.

You set the state of a preprocessing layer by exposing it to training data, via the adapt() method:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

data = np.array(
    [
        [0.1, 0.2, 0.3],
        [0.8, 0.9, 1.0],
        [1.5, 1.6, 1.7],
    ]
)
layer = layers.Normalization()
layer.adapt(data)
normalized_data = layer(data)

print("Features mean: %.2f" % (normalized_data.numpy().mean()))
print("Features std: %.2f" % (normalized_data.numpy().std()))

Features mean: -0.00
Features std: 1.00

The adapt() method takes either a NumPy array or a tf.data.Dataset object. In the case of StringLookup and TextVectorization, you can also pass a list of strings:

data = [
    "ξεῖν᾽, ἦ τοι μὲν ὄνειροι ἀμήχανοι ἀκριτόμυθοι",
    "γίγνοντ᾽, οὐδέ τι πάντα τελείεται ἀνθρώποισι.",
    "δοιαὶ γάρ τε πύλαι ἀμενηνῶν εἰσὶν ὀνείρων:",
    "αἱ μὲν γὰρ κεράεσσι τετεύχαται, αἱ δ᾽ ἐλέφαντι:",
    "τῶν οἳ μέν κ᾽ ἔλθωσι διὰ πριστοῦ ἐλέφαντος,",
    "οἵ ῥ᾽ ἐλεφαίρονται, ἔπε᾽ ἀκράαντα φέροντες:",
    "οἱ δὲ διὰ ξεστῶν κεράων ἔλθωσι θύραζε,",
    "οἵ ῥ᾽ ἔτυμα κραίνουσι, βροτῶν ὅτε κέν τις ἴδηται.",
]
layer = layers.TextVectorization()
layer.adapt(data)
vectorized_text = layer(data)
print(vectorized_text)

tf.Tensor(
[[37 12 25  5  9 20 21  0  0]
 [51 34 27 33 29 18  0  0  0]
 [49 52 30 31 19 46 10  0  0]
 [ 7  5 50 43 28  7 47 17  0]
 [24 35 39 40  3  6 32 16  0]
 [ 4  2 15 14 22 23  0  0  0]
 [36 48  6 38 42  3 45  0  0]
 [ 4  2 13 41 53  8 44 26 11]], shape=(8, 9), dtype=int64)
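
To adapt on a tf.data.Dataset instead, pass a (typically batched) dataset directly; a minimal sketch reusing the imports from the earlier example:

dataset = tf.data.Dataset.from_tensor_slices(
    np.random.rand(100, 3).astype("float32")
).batch(32)
layer = layers.Normalization()
layer.adapt(dataset)  # computes the mean and variance from the dataset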

In addition, adaptable layers always expose an option to directly set state via constructor arguments or weight assignment. If the intended state values are known at layer construction time, or are calculated outside of the adapt() call, they can be set without relying on the layer’s internal computation. For instance, if external vocabulary files for the TextVectorization, StringLookup, or IntegerLookup layers already exist, those can be loaded directly into the lookup tables by passing a path to the vocabulary file in the layer’s constructor arguments.

Here’s an example where you instantiate a StringLookup layer with precomputed vocabulary:

vocab = ["a", "b", "c", "d"]
data = tf.constant([["a", "c", "d"], ["d", "z", "b"]])
layer = layers.StringLookup(vocabulary=vocab)
vectorized_data = layer(data)
print(vectorized_data)

tf.Tensor(
[[1 3 4]
 [4 0 2]], shape=(2, 3), dtype=int64)
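
The same layer can also load its vocabulary from a file instead of an in-memory list; a hedged sketch, where vocab.txt is a hypothetical file with one token per line:

# Write a hypothetical vocabulary file ("a", "b", "c", "d", one token per line).
with open("vocab.txt", "w") as f:
    f.write("a\nb\nc\nd\n")

layer = layers.StringLookup(vocabulary="vocab.txt")
print(layer(tf.constant([["a", "c", "d"], ["d", "z", "b"]])))  # same output as above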

Preprocessing data before the model or inside the model

There are two ways you can use preprocessing layers:

Option 1: Make them part of the model, like this:

inputs = keras.Input(shape=input_shape)
x = preprocessing_layer(inputs)
outputs = rest_of_the_model(x)
model = keras.Model(inputs, outputs)

With this option, preprocessing will happen on device, synchronously with the rest of the model execution, meaning that it will benefit from GPU acceleration. If you’re training on a GPU, this is the best option for the Normalization layer, and for all image preprocessing and data augmentation layers.

Option 2: Apply them to your tf.data.Dataset, so as to obtain a dataset that yields batches of preprocessed data, like this:

dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))

With this option, your preprocessing will happen on a CPU, asynchronously, and will be buffered before going into the model. In addition, if you call dataset.prefetch(tf.data.AUTOTUNE) on your dataset, the preprocessing will happen efficiently in parallel with training:

dataset = dataset.map(lambda x, y: (preprocessing_layer(x), y))
dataset = dataset.prefetch(tf.data.AUTOTUNE)
model.fit(dataset, ...)

This is the best option for TextVectorization, and all structured data preprocessing layers. It can also be a good option if you’re training on a CPU and you use image preprocessing layers.

Note that the TextVectorization layer can only be executed on a CPU, as it is mostly a dictionary lookup operation. Therefore, if you are training your model on a GPU or a TPU, you should put the TextVectorization layer in the tf.data pipeline to get the best performance.

When running on a TPU, you should always place preprocessing layers in the tf.data pipeline (with the exception of Normalization and Rescaling, which run fine on a TPU and are commonly used as the first layer in an image model).

Benefits of doing preprocessing inside the model at inference time

Even if you go with option 2, you may later want to export an inference-only end-to-end model that includes the preprocessing layers. The key benefit of doing this is that it makes your model portable and helps reduce training/serving skew.

When all data preprocessing is part of the model, other people can load and use your model without having to be aware of how each feature is expected to be encoded and normalized. Your inference model will be able to process raw images or raw structured data, and will not require users of the model to be aware of the details of, e.g., the tokenization scheme used for text, the indexing scheme used for categorical features, or whether image pixel values are normalized to [-1, +1] or to [0, 1]. This is especially powerful if you’re exporting your model to another runtime, such as TensorFlow.js: you won’t have to reimplement your preprocessing pipeline in JavaScript.

If you initially put your preprocessing layers in your tf.data pipeline, you can export an inference model that packages the preprocessing. Simply instantiate a new model that chains your preprocessing layers and your training model:

inputs = keras.Input(shape=input_shape)
x = preprocessing_layer(inputs)
outputs = training_model(x)
inference_model = keras.Model(inputs, outputs)

Preprocessing during multi-worker training

Preprocessing layers are compatible with the tf.distribute API for running training across multiple machines.

In general, preprocessing layers should be placed inside a tf.distribute.Strategy.scope() and called either inside or before the model as discussed above.

# A sketch; assumes a distribution strategy has already been created,
# e.g. tf.distribute.MirroredStrategy (any tf.distribute strategy works).
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    inputs = keras.Input(shape=input_shape)
    preprocessing_layer = tf.keras.layers.Hashing(10)
    dense_layer = tf.keras.layers.Dense(16)

For more details, refer to the Data preprocessing section of the Distributed input tutorial.

:::info
Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples shared under the Apache 2.0 License.

:::
