How to Customize BERT Encoders with TensorFlow Model Garden | HackerNoon

News Room | Published 8 September 2025

Content Overview

  • Learning objectives
  • Install and import
  • Install the TensorFlow Model Garden pip package
  • Import TensorFlow and other libraries
  • Canonical BERT encoder
  • Customize BERT encoder
  • Use EncoderScaffold
  • Use nlp.layers.TransformerScaffold
  • Build a new encoder

Learning objectives

The TensorFlow Models NLP library is a collection of tools for building and training modern high-performance natural language models.

The tfm.nlp.networks.EncoderScaffold is the core of this library, and many new network architectures have been proposed to improve the encoder. In this tutorial, we will learn how to customize the encoder to employ such new architectures.

Install and import

Install the TensorFlow Model Garden pip package

  • tf-models-official is the stable Model Garden package. Note that it may not include the latest changes in the tensorflow_models GitHub repo. To include the latest changes, you may install tf-models-nightly, the nightly Model Garden package built automatically each day.
  • pip will install all models and dependencies automatically.
pip install -q opencv-python

pip install -q tf-models-official

Import TensorFlow and other libraries

import numpy as np
import tensorflow as tf

import tensorflow_models as tfm
nlp = tfm.nlp

Canonical BERT encoder

Before learning how to customize the encoder, let’s first create a canonical BERT encoder and use it to instantiate an nlp.models.BertClassifier for a classification task.

cfg = {
    "vocab_size": 100,
    "hidden_size": 32,
    "num_layers": 3,
    "num_attention_heads": 4,
    "intermediate_size": 64,
    "activation": tfm.utils.activations.gelu,
    "dropout_rate": 0.1,
    "attention_dropout_rate": 0.1,
    "max_sequence_length": 16,
    "type_vocab_size": 2,
    "initializer": tf.keras.initializers.TruncatedNormal(stddev=0.02),
}
bert_encoder = nlp.networks.BertEncoder(**cfg)

def build_classifier(bert_encoder):
  return nlp.models.BertClassifier(bert_encoder, num_classes=2)

canonical_classifier_model = build_classifier(bert_encoder)

canonical_classifier_model can be trained using the training data. For details about how to train the model, please see the Fine-tuning BERT notebook. We skip the full training code here.
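
That said, a minimal training sketch might look like the following; the batch of random data and the hyperparameters are made up for illustration and are not part of the original tutorial.

# Illustrative only: random inputs and labels stand in for a real dataset.
train_word_ids = np.random.randint(
    cfg["vocab_size"], size=(8, cfg["max_sequence_length"]))
train_mask = np.random.randint(2, size=(8, cfg["max_sequence_length"]))
train_type_ids = np.random.randint(
    cfg["type_vocab_size"], size=(8, cfg["max_sequence_length"]))
labels = np.random.randint(2, size=(8,))

# BertClassifier outputs logits, hence from_logits=True.
canonical_classifier_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
canonical_classifier_model.fit(
    [train_word_ids, train_mask, train_type_ids], labels, epochs=1)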

After training, we can apply the model to do prediction.

def predict(model):
  batch_size = 3
  np.random.seed(0)
  word_ids = np.random.randint(
      cfg["vocab_size"], size=(batch_size, cfg["max_sequence_length"]))
  mask = np.random.randint(2, size=(batch_size, cfg["max_sequence_length"]))
  type_ids = np.random.randint(
      cfg["type_vocab_size"], size=(batch_size, cfg["max_sequence_length"]))
  print(model([word_ids, mask, type_ids], training=False))

predict(canonical_classifier_model)

tf.Tensor(
[[ 0.03545166  0.30729884]
 [ 0.00677404  0.17251147]
 [-0.07276718  0.17345032]], shape=(3, 2), dtype=float32)

Customize BERT encoder

A BERT encoder consists of an embedding network and multiple transformer blocks, and each transformer block contains an attention layer and a feedforward layer.

We provide easy ways to customize each of those components via (1) EncoderScaffold and (2) TransformerScaffold.
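
To see those components concretely, you can list the sub-layers of the canonical encoder built above; this quick inspection is an addition to the original tutorial, and the exact layer names may vary across library versions.

# Expect embedding-related layers followed by three transformer blocks.
for layer in bert_encoder.layers:
  print(type(layer).__name__, layer.name)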

Use EncoderScaffold

networks.EncoderScaffold allows users to provide a custom embedding subnetwork (which will replace the standard embedding logic) and/or a custom hidden layer class (which will replace the Transformer instantiation in the encoder).

Without Customization

Without any customization, networks.EncoderScaffold behaves the same as the canonical networks.BertEncoder.

As shown in the following example, networks.EncoderScaffold can load networks.BertEncoder's weights and output the same values:

default_hidden_cfg = dict(
    num_attention_heads=cfg["num_attention_heads"],
    intermediate_size=cfg["intermediate_size"],
    intermediate_activation=cfg["activation"],
    dropout_rate=cfg["dropout_rate"],
    attention_dropout_rate=cfg["attention_dropout_rate"],
    kernel_initializer=cfg["initializer"],
)
default_embedding_cfg = dict(
    vocab_size=cfg["vocab_size"],
    type_vocab_size=cfg["type_vocab_size"],
    hidden_size=cfg["hidden_size"],
    initializer=cfg["initializer"],
    dropout_rate=cfg["dropout_rate"],
    max_seq_length=cfg["max_sequence_length"]
)
default_kwargs = dict(
    hidden_cfg=default_hidden_cfg,
    embedding_cfg=default_embedding_cfg,
    num_hidden_instances=cfg["num_layers"],
    pooled_output_dim=cfg["hidden_size"],
    return_all_layer_outputs=True,
    pooler_layer_initializer=cfg["initializer"],
)

encoder_scaffold = nlp.networks.EncoderScaffold(**default_kwargs)
classifier_model_from_encoder_scaffold = build_classifier(encoder_scaffold)
classifier_model_from_encoder_scaffold.set_weights(
    canonical_classifier_model.get_weights())
predict(classifier_model_from_encoder_scaffold)

WARNING:absl:The `Transformer` layer is deprecated. Please directly use `TransformerEncoderBlock`.
tf.Tensor(
[[ 0.03545166  0.30729884]
 [ 0.00677404  0.17251147]
 [-0.07276718  0.17345032]], shape=(3, 2), dtype=float32)

Customize Embedding

Next, we show how to use a customized embedding network.

We first build an embedding network that would replace the default network. This one will have 2 inputs (mask and word_ids) instead of 3, and won’t use positional embeddings.

word_ids = tf.keras.layers.Input(
    shape=(cfg['max_sequence_length'],), dtype=tf.int32, name="input_word_ids")
mask = tf.keras.layers.Input(
    shape=(cfg['max_sequence_length'],), dtype=tf.int32, name="input_mask")
embedding_layer = nlp.layers.OnDeviceEmbedding(
    vocab_size=cfg['vocab_size'],
    embedding_width=cfg['hidden_size'],
    initializer=cfg["initializer"],
    name="word_embeddings")
word_embeddings = embedding_layer(word_ids)
attention_mask = nlp.layers.SelfAttentionMask()([word_embeddings, mask])
new_embedding_network = tf.keras.Model([word_ids, mask],
                                       [word_embeddings, attention_mask])

Inspecting new_embedding_network, we can see it takes two inputs: input_word_ids and input_mask.

tf.keras.utils.plot_model(new_embedding_network, show_shapes=True, dpi=48)
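
For a plain-text view of the same structure, Keras's standard model summary works as well:

new_embedding_network.summary()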

We can then build a new encoder using the above new_embedding_network.

kwargs = dict(default_kwargs)

# Use new embedding network.
kwargs['embedding_cls'] = new_embedding_network
kwargs['embedding_data'] = embedding_layer.embeddings

encoder_with_customized_embedding = nlp.networks.EncoderScaffold(**kwargs)
classifier_model = build_classifier(encoder_with_customized_embedding)
# ... Train the model ...
print(classifier_model.inputs)

# Assert that there are only two inputs.
assert len(classifier_model.inputs) == 2

WARNING:absl:The `Transformer` layer is deprecated. Please directly use `TransformerEncoderBlock`.
[<KerasTensor: shape=(None, 16) dtype=int32 (created by layer 'input_word_ids')>, <KerasTensor: shape=(None, 16) dtype=int32 (created by layer 'input_mask')>]

Customized Transformer

Users can also override the hidden_cls argument in networks.EncoderScaffold's constructor to employ a customized Transformer layer.

See the source of nlp.layers.ReZeroTransformer for how to implement a customized Transformer layer.
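
Before that, here is a minimal sketch of the interface such a layer must satisfy: it is constructed from the hidden_cfg keyword arguments and called on a [data, attention_mask] pair. The MiniTransformerBlock below is hypothetical and added for illustration; it is a plain post-layer-norm transformer block, not the ReZero technique.

# Hypothetical minimal hidden layer for EncoderScaffold (illustration only).
class MiniTransformerBlock(tf.keras.layers.Layer):

  def __init__(self, num_attention_heads, intermediate_size,
               intermediate_activation, dropout_rate, attention_dropout_rate,
               kernel_initializer, **kwargs):
    super().__init__(**kwargs)
    self._num_heads = num_attention_heads
    self._intermediate_size = intermediate_size
    self._activation = intermediate_activation
    self._dropout_rate = dropout_rate
    self._attention_dropout_rate = attention_dropout_rate
    self._initializer = kernel_initializer

  def build(self, input_shape):
    hidden_size = input_shape[0][-1]
    self._attention = tf.keras.layers.MultiHeadAttention(
        num_heads=self._num_heads,
        key_dim=hidden_size // self._num_heads,
        dropout=self._attention_dropout_rate,
        kernel_initializer=self._initializer)
    self._attention_norm = tf.keras.layers.LayerNormalization()
    self._intermediate = tf.keras.layers.Dense(
        self._intermediate_size, activation=self._activation,
        kernel_initializer=self._initializer)
    self._output = tf.keras.layers.Dense(
        hidden_size, kernel_initializer=self._initializer)
    self._dropout = tf.keras.layers.Dropout(self._dropout_rate)
    self._output_norm = tf.keras.layers.LayerNormalization()
    super().build(input_shape)

  def call(self, inputs):
    data, attention_mask = inputs
    # Self-attention with residual connection and layer norm.
    attention_output = self._attention(
        query=data, value=data, attention_mask=attention_mask)
    attention_output = self._attention_norm(data + attention_output)
    # Feedforward with residual connection and layer norm.
    layer_output = self._dropout(
        self._output(self._intermediate(attention_output)))
    return self._output_norm(attention_output + layer_output)

# Swapping it in mirrors the ReZero example below.
mini_kwargs = dict(default_kwargs)
mini_kwargs['hidden_cls'] = MiniTransformerBlock
encoder_with_mini_block = nlp.networks.EncoderScaffold(**mini_kwargs)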

The following is an example of using nlp.layers.ReZeroTransformer:

kwargs = dict(default_kwargs)

# Use ReZeroTransformer.
kwargs['hidden_cls'] = nlp.layers.ReZeroTransformer

encoder_with_rezero_transformer = nlp.networks.EncoderScaffold(**kwargs)
classifier_model = build_classifier(encoder_with_rezero_transformer)
# ... Train the model ...
predict(classifier_model)

# Assert that the variable `rezero_alpha` from ReZeroTransformer exists.
assert 'rezero_alpha' in ''.join([x.name for x in classifier_model.trainable_weights])

tf.Tensor(
[[-0.08663296  0.09281035]
 [-0.07291833  0.36477187]
 [-0.08730186  0.1503254 ]], shape=(3, 2), dtype=float32)

tf.keras.utils.plot_model(encoder_with_rezero_transformer, show_shapes=True, dpi=48)

Use nlp.layers.TransformerScaffold

The above method of customizing the model requires rewriting the whole nlp.layers.Transformer layer, while sometimes you may only want to customize either the attention layer or the feedforward block. In this case, nlp.layers.TransformerScaffold can be used.

Customize Attention Layer

Users can also override the attention_cls argument in layers.TransformerScaffold's constructor to employ a customized attention layer.

See the source of nlp.layers.TalkingHeadsAttention for how to implement a customized Attention layer.
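
For intuition, a drop-in attention class only needs to keep MultiHeadAttention's constructor signature, which TransformerScaffold relies on. The ScaledAttention class below is hypothetical and merely rescales the attention output; the real TalkingHeadsAttention instead mixes attention heads with learned projections around the softmax.

# Hypothetical toy attention layer (illustration only): same constructor as
# MultiHeadAttention, output scaled by a constant. Assumes it is called with
# return_attention_scores=False.
class ScaledAttention(tf.keras.layers.MultiHeadAttention):

  def call(self, query, value, **kwargs):
    return 0.5 * super().call(query, value, **kwargs)

# It can be swapped in exactly as TalkingHeadsAttention is in the example below.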

The following is an example of using nlp.layers.TalkingHeadsAttention:

# Use TalkingHeadsAttention
hidden_cfg = dict(default_hidden_cfg)
hidden_cfg['attention_cls'] = nlp.layers.TalkingHeadsAttention

kwargs = dict(default_kwargs)
kwargs['hidden_cls'] = nlp.layers.TransformerScaffold
kwargs['hidden_cfg'] = hidden_cfg

encoder = nlp.networks.EncoderScaffold(**kwargs)
classifier_model = build_classifier(encoder)
# ... Train the model ...
predict(classifier_model)

# Assert that the variable `pre_softmax_weight` from TalkingHeadsAttention exists.
assert 'pre_softmax_weight' in ''.join([x.name for x in classifier_model.trainable_weights])

tf.Tensor(
[[-0.20591784  0.09203205]
 [-0.0056177  -0.10278902]
 [-0.21681327 -0.12282   ]], shape=(3, 2), dtype=float32)

Customize Feedforward Layer

Similarly, one could also customize the feedforward layer.

See the source of nlp.layers.GatedFeedforward for how to implement a customized feedforward layer.
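
As a toy illustration of the idea, the hypothetical TinyGatedFeedforward below gates the intermediate activation with a sigmoid branch and handles its own residual connection and layer norm, as GatedFeedforward does. Wiring it into TransformerScaffold would require a feedforward_cfg whose keys match this constructor, which varies across library versions, so treat it as a sketch rather than a drop-in replacement.

# Hypothetical gated feedforward block (illustration only).
class TinyGatedFeedforward(tf.keras.layers.Layer):

  def __init__(self, inner_dim, dropout_rate, kernel_initializer, **kwargs):
    super().__init__(**kwargs)
    self._inner_dim = inner_dim
    self._dropout_rate = dropout_rate
    self._initializer = kernel_initializer

  def build(self, input_shape):
    hidden_size = input_shape[-1]
    self._intermediate = tf.keras.layers.Dense(
        self._inner_dim, activation=tf.keras.activations.gelu,
        kernel_initializer=self._initializer)
    self._gate = tf.keras.layers.Dense(
        self._inner_dim, activation='sigmoid',
        kernel_initializer=self._initializer)
    self._output = tf.keras.layers.Dense(
        hidden_size, kernel_initializer=self._initializer)
    self._dropout = tf.keras.layers.Dropout(self._dropout_rate)
    self._layer_norm = tf.keras.layers.LayerNormalization()
    super().build(input_shape)

  def call(self, inputs):
    # Gated activation: the sigmoid branch modulates the intermediate branch.
    x = self._intermediate(inputs) * self._gate(inputs)
    x = self._dropout(self._output(x))
    # Residual connection and layer norm live inside the block.
    return self._layer_norm(inputs + x)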

The following is an example of using nlp.layers.GatedFeedforward:

# Use GatedFeedforward
hidden_cfg = dict(default_hidden_cfg)
hidden_cfg['feedforward_cls'] = nlp.layers.GatedFeedforward

kwargs = dict(default_kwargs)
kwargs['hidden_cls'] = nlp.layers.TransformerScaffold
kwargs['hidden_cfg'] = hidden_cfg

encoder_with_gated_feedforward = nlp.networks.EncoderScaffold(**kwargs)
classifier_model = build_classifier(encoder_with_gated_feedforward)
# ... Train the model ...
predict(classifier_model)

# Assert that the variable `gate` from GatedFeedforward exists.
assert 'gate' in ''.join([x.name for x in classifier_model.trainable_weights])

tf.Tensor(
[[-0.10270456 -0.10999684]
 [-0.03512481  0.15430304]
 [-0.23601504 -0.18162844]], shape=(3, 2), dtype=float32)

Build a new encoder

Finally, you could also build a new encoder using building blocks in the modeling library.

See the source for nlp.networks.AlbertEncoder as an example of how to do this.

Here is an example using nlp.networks.AlbertEncoder:

albert_encoder = nlp.networks.AlbertEncoder(**cfg)
classifier_model = build_classifier(albert_encoder)
# ... Train the model ...
predict(classifier_model)

tf.Tensor(
[[-0.00369881 -0.2540995 ]
 [ 0.1235221  -0.2959229 ]
 [-0.08698564 -0.17653546]], shape=(3, 2), dtype=float32)

Inspecting the albert_encoder, we see it stacks the same Transformer layer multiple times (note the loop-back on the “Transformer” block in the diagram below).

tf.keras.utils.plot_model(albert_encoder, show_shapes=True, dpi=48)
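
A quick check of that sharing, added here for illustration: because ALBERT reuses one transformer block across layers, it exposes far fewer trainable variables than the three-layer BERT encoder built earlier.

print(len(albert_encoder.trainable_variables))  # one shared block
print(len(bert_encoder.trainable_variables))    # three separate blocks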

:::info
Originally published on the TensorFlow website, this article appears here under a new headline and is licensed under CC BY 4.0. Code samples are shared under the Apache 2.0 License.

:::
