TensorBoard is a visualization library that enables data science practitioners to visualize various aspects of their machine learning modeling. For instance, you can use TensorBoard to:
- Visualize the performance of a model.
- Tune model parameters.
- Profile program execution, for example, to check GPU utilization.
- Debug machine learning code.

TensorBoard can be used with various machine learning libraries, such as TensorFlow, PyTorch, Flax, and XGBoost. Let's dive in and see how to use TensorBoard with these packages.
Advantages of using TensorBoard

The main advantages of using TensorBoard include:
- It lets data scientists visualize the architecture of neural networks, which supports better problem-solving.
- It tracks the performance of machine learning models with metrics such as accuracy and log loss on the training or validation sets.
- It makes it easier to debug the layers and operations of a network.

How to use TensorBoard

Let's look at how you can start using TensorBoard.
How to install TensorBoard

To get started, install TensorBoard with pip or conda.
PIP installation

Run the following command in a terminal or command prompt:
```shell
pip install tensorboard
```

Alternatively, in a Jupyter Notebook:
```python
!pip install tensorboard
```

Conda installation

Open the Anaconda command prompt and run the following command:
```shell
conda install tensorboard
```

Docker installation

If you use a Docker image of the Jupyter Notebook server, expose both the notebook's port and TensorBoard's port. To do so, run the following command:
```shell
docker run -it -p 8888:8888 -p 6006:6006 \
  tensorflow/tensorflow:nightly-py3-jupyter
```
Using TensorBoard with Jupyter notebooks and Google Colab

To install Jupyter Notebook, use either Anaconda or pip:
```shell
pip install notebook
```

After installing Jupyter Notebook, start a notebook server:
```shell
jupyter notebook
```

If you prefer Google Colab, go to https://colab.research.google.com/ and create a new notebook.
You are now set to use TensorBoard. Run the following command in a notebook instance (Jupyter or Google Colab):
```python
%load_ext tensorboard
```

If the TensorBoard extension was loaded previously, reload it instead:
```python
%reload_ext tensorboard
```

Next, set up the log directory where all the logs will be stored. Logs are the data TensorBoard uses to generate its visualizations. If you are running Jupyter Notebook on Linux, remove any existing logs from a terminal:
```shell
rm -rf ./logs/
```

If you are using Google Colab, or running the command from a notebook cell, prefix it with `!`:
```python
!rm -rf ./logs/
```

If you are running TensorBoard from a Jupyter Notebook on Windows, use Python's `shutil` module instead:
```python
import shutil

# Works on Windows (and any other OS): remove old log folders,
# silently skipping any folder that does not exist
for folder in ("logs", "logsx"):
    shutil.rmtree(folder, ignore_errors=True)
```

Now define the directory where the logs will be stored.
```python
import datetime

log_dir = "logs/model_fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
```

Appending a timestamp gives each run its own subdirectory, so you can store and compare logs from different runs.
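The same idea can be wrapped in a small helper that also creates the directory. This is a minimal sketch that assumes one subdirectory per run under a common root. `make_run_logdir` is a hypothetical name, not a TensorBoard API. TensorBoard itself only needs the path.

```python
import datetime
import os


def make_run_logdir(root="logs/model_fit"):
    """Hypothetical helper (not part of TensorBoard): create and return a
    timestamped subdirectory for one training run."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = os.path.join(root, stamp)
    # exist_ok=True: a second run started in the same second reuses the folder
    os.makedirs(run_dir, exist_ok=True)
    return run_dir
```

Call `make_run_logdir()` before each training run and pass the result as `log_dir`. Each run then appears separately in TensorBoard's run selector. Two runs started within the same second would share a directory, so add a suffix if you launch runs in quick succession.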
How to run TensorBoard

To demonstrate model visualization in TensorBoard, including metrics, consider the iris classification problem, which involves classifying iris plants into three species.
First, load the TensorBoard extension:
```python
%load_ext tensorboard
```

Then load the data and define the model:
```python
import datetime
import os

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import normalize

# Load the iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Scale each feature (column) to unit L2 norm
X = normalize(X, axis=0)

# Create training and test split: 70% train, 30% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Create one-hot (categorical) labels
y_train = keras.utils.to_categorical(y_train)
y_test = keras.utils.to_categorical(y_test)

def create_model():
    # Create the model
    model = keras.models.Sequential()
    model.add(Dense(512, activation='relu', input_shape=(4,)))
    model.add(Dense(3, activation='softmax'))
    # Compile the model
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

Next, train the model with a TensorBoard callback that writes to a timestamped log directory:
```python
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))

def train_model():
    '''Utility function for training the model.'''
    model = create_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    model.fit(x=X_train, y=y_train, epochs=10,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback])
    # Optionally, evaluate on the test set:
    # test_loss, test_acc = model.evaluate(X_test, y_test)
    # print('Test Accuracy:', test_acc, '\nTest Loss:', test_loss)

# Optional: dump debugging information for TensorBoard's Debugger V2 plugin
tf.debugging.experimental.enable_dump_debug_info(
    "/tmp/tfdbg2_logdir",
    tensor_debug_mode="FULL_HEALTH",
    circular_buffer_size=-1)

# Train the model
train_model()
```

Now load the TensorBoard notebook extension and define a variable `log_folder` that points to the logs folder you created.
```python
%load_ext tensorboard
log_folder = 'logs'
```

How to use the TensorBoard callback

A callback is an object that performs actions at various stages of training, such as:
- At the end of an epoch.
- Before or after a specified number of batches.

Keras exposes these stages as callback methods, including:

- `on_batch_begin` – when a batch begins.
- `on_batch_end` – when a batch ends.
- `on_train_begin` – when training begins.
- `on_train_end` – when training ends.

The TensorBoard callback writes logs for TensorBoard, including:
- Plots summarizing metrics.
- Training graph visualization.
- Weight histograms.
- Sampled profiling.

When the callback is used with `Model.evaluate` (or during regular validation), it also writes a summary of the evaluation metrics against `Model.optimizer.iterations`, in addition to the per-epoch summaries. The metric names are prefixed with `evaluation`, and `Model.optimizer.iterations` serves as the step in TensorBoard.
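The sketch below shows, without any framework, how a training loop calls callback methods at each of these stages. `Callback`, `RecordingCallback`, and `fit` are hypothetical stand-ins that only reuse the Keras method names (`on_train_begin`, `on_epoch_end`, and so on). This is not Keras's actual implementation.

```python
class Callback:
    """Hypothetical base class: every hook is a no-op unless overridden."""

    def on_train_begin(self, logs=None):
        pass

    def on_epoch_begin(self, epoch, logs=None):
        pass

    def on_batch_begin(self, batch, logs=None):
        pass

    def on_batch_end(self, batch, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass


class RecordingCallback(Callback):
    """Records when hooks fire (a stand-in for writing TensorBoard logs)."""

    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end:{epoch}:loss={logs['loss']:.2f}")

    def on_train_end(self, logs=None):
        self.events.append("train_end")


def fit(num_epochs, batches_per_epoch, callbacks):
    """Toy training loop that calls each callback at the matching stage."""
    for cb in callbacks:
        cb.on_train_begin()
    loss = 1.0
    for epoch in range(num_epochs):
        for cb in callbacks:
            cb.on_epoch_begin(epoch)
        for batch in range(batches_per_epoch):
            for cb in callbacks:
                cb.on_batch_begin(batch)
            loss *= 0.9  # pretend each batch reduces the loss
            for cb in callbacks:
                cb.on_batch_end(batch, logs={"loss": loss})
        for cb in callbacks:
            cb.on_epoch_end(epoch, logs={"loss": loss})
    for cb in callbacks:
        cb.on_train_end()


recorder = RecordingCallback()
fit(num_epochs=2, batches_per_epoch=3, callbacks=[recorder])
print(recorder.events)
```

The TensorBoard callback follows the same pattern. It overrides these hooks to write summaries to the log directory instead of appending strings to a list.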
Import the TensorBoard callback:

```python
from tensorflow.keras.callbacks import TensorBoard
```

Create the TensorBoard callback:
```python
import datetime
import os

logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = TensorBoard(logdir, histogram_freq=1,
                                   write_graph=False, write_images=False)
```

The callback accepts several parameters, including:
- `write_graph` – whether to visualize the graph in TensorBoard. The log file can become much larger when this is set to True.
- `write_images` – whether to write model weights as images for visualization in TensorBoard.
- `write_steps_per_second` – whether to log the number of training steps per second to TensorBoard. It works with both epoch and batch frequency logging.
- `update_freq` – `'batch'`, `'epoch'`, or an integer. With `'batch'`, losses and metrics are written to TensorBoard after each batch. With `'epoch'`, they are written after each epoch. With an integer, for example 1000, metrics and losses are written every 1000 batches.
- `profile_batch` – a non-negative integer or a tuple of integers specifying the batch (or range of batches) to profile to sample compute characteristics. Profiling is disabled by default.
- `embeddings_freq` – frequency (in epochs) at which embedding layers are visualized. If set to 0, embeddings are not visualized.
- `embeddings_metadata` – a dictionary that maps embedding layer names to the file where the metadata for that layer is saved. You can pass a single filename if all embedding layers use the same metadata file.
- `histogram_freq` – frequency (in epochs) at which to compute activation and weight histograms for the model's layers. If set to 0, histograms are not computed.

The next step is to compile and fit the model with the callback, which stores the information in the logs.
```python
import datetime

import tensorflow as tf
from tensorflow import keras

logdir = "logs/scalars/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Default writer for any tf.summary.* calls made outside a `with` block
file_writer = tf.summary.create_file_writer(logdir + "/metrics")
file_writer.set_as_default()

def train_model():
    '''Utility function for training the model.'''
    model = create_model()
    tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir, histogram_freq=1)
    model.fit(x=X_train, y=y_train, epochs=20,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback])
    # Optionally, evaluate on the test set:
    # test_loss, test_acc = model.evaluate(X_test, y_test)
    # print('Test Accuracy:', test_acc, '\nTest Loss:', test_loss)

train_model()
```
Once the model is trained, the next step is to visualize it in TensorBoard, which reads the logs written by the callbacks.
How to launch TensorBoard

You can launch TensorBoard from the command line:
```shell
tensorboard --logdir logs
```

Alternatively, you can launch TensorBoard inside Jupyter Notebook or Google Colab:
```python
%tensorboard --logdir logs
```

When you start it from the command line, TensorBoard is served by default at http://localhost:6006 (equivalently, http://127.0.0.1:6006/).
If you have set everything right, you will see a window with interactive functionality like the one shown below.
Running TensorBoard remotely

It is common to run experiments on a remote server with GPUs, especially when a TensorFlow model requires a lot of computational resources. To use TensorBoard on a remote server:
1. Open an SSH tunnel to access the TensorBoard web user interface. On the command prompt, run:

```shell
ssh -L 6006:127.0.0.1:6006 username@server_ip
```

If you are using PuTTY, replace `ssh` in the command with `putty`. Either way, this creates an SSH tunnel from port 6006 on your local machine to port 6006 on the server you connected to. The tunnel stays open as long as the SSH connection is active.
2. With TensorBoard running on the server (for example, started with `tensorboard --logdir logs --port 6006`), open http://localhost:6006 or http://127.0.0.1:6006/ in your local browser.
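Before opening the browser, you can check that something is listening on the forwarded local port. The following is a minimal sketch using Python's standard `socket` module, and `port_is_open` is a hypothetical helper. A successful connection only shows that the tunnel (or a local TensorBoard) accepts TCP connections. It does not show that TensorBoard is healthy.

```python
import socket


def port_is_open(host="127.0.0.1", port=6006, timeout=1.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False


print("TensorBoard port reachable:", port_is_open())
```

If this prints `False`, check that the SSH session is still open and that TensorBoard was started with the same `--port` on the server.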
Sometimes, however, you can reach the GPU server only through an intermediate (contact) server. In that case, add an extra port-forwarding step:
1. Forward the port from the contact server to your local machine using SSH. On your local machine, run:

```shell
ssh -L 6006:127.0.0.1:6006 username@contact_server_ip
```

2. Forward the port from the GPU server to the contact server. On the contact server, run:
```shell
ssh -L 6006:127.0.0.1:6006 username@GPU_server_ip
```

3. Now start TensorBoard on the GPU server:
```shell
tensorboard --logdir ./tensorboard_dirs --port 6006
```
TensorBoard dashboards

As shown in the sample dashboard earlier, a single TensorBoard dashboard includes several components:
- Scalars.
- Images.
- Graphs.
- Distributions.
- Histograms.
- Fairness Indicators.
- What-If Tool (WIT).

Each of these components provides information about the model, as illustrated below.
TensorBoard scalars

The TensorBoard scalars dashboard visualizes scalar statistics such as classification accuracy, model loss, or learning rate.
The `train_model()` function defined in the callback section above already logs these scalars to `logs/scalars/`. After training, load TensorBoard:
%tensorboard --logdir logs/scalars
You can also include custom scalars. For instance, if you want to have a custom learning rate that decreases as epochs increase, you can define a function as shown below.
```python
def lr_schedule(epoch):
    """Returns a custom learning rate that decreases as epochs progress."""
    learning_rate = 0.2
    if epoch > 10:
        learning_rate = 0.02
    if epoch > 20:
        learning_rate = 0.01
    if epoch > 50:
        learning_rate = 0.005
    # Written to the default writer set earlier with file_writer.set_as_default()
    tf.summary.scalar('learning rate', data=learning_rate, step=epoch)
    return learning_rate

lr_callback = keras.callbacks.LearningRateScheduler(lr_schedule)

def train_model(epochs=20):
    '''Utility function for training the model.'''
    model = create_model()
    tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
    model.fit(x=X_train, y=y_train, epochs=epochs,
              validation_data=(X_test, y_test),
              callbacks=[tensorboard_callback, lr_callback])
    # Optionally, evaluate on the test set:
    # test_loss, test_acc = model.evaluate(X_test, y_test)
    # print('Test Accuracy:', test_acc, '\nTest Loss:', test_loss)

train_model(4)
```

Next, load TensorBoard.
```python
%tensorboard --logdir logs/scalars
```

Notice that there is now a new scalar output: the learning rate.
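To see which values `lr_schedule` produces, you can evaluate the same thresholds in plain Python without the `tf.summary` call. This sketch copies the thresholds from the function above into a hypothetical `step_learning_rate` helper. It also shows why `train_model(4)` logs only a single learning-rate value.

```python
def step_learning_rate(epoch):
    """Hypothetical copy of lr_schedule's thresholds, without TensorBoard logging."""
    learning_rate = 0.2
    if epoch > 10:
        learning_rate = 0.02
    if epoch > 20:
        learning_rate = 0.01
    if epoch > 50:
        learning_rate = 0.005
    return learning_rate


# Keras passes 0-based epoch indices to the schedule function.
for epochs in (4, 25):
    rates = sorted({step_learning_rate(e) for e in range(epochs)}, reverse=True)
    print(f"{epochs} epochs -> learning rates used: {rates}")
```

With 4 epochs, the indices run from 0 to 3, so every epoch uses 0.2. Train for more than 11 epochs to see the first drop in the learning-rate chart.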
TensorBoard images

TensorBoard can display images logged with the `tf.summary.image` API. Consider the popular Fashion-MNIST dataset. You can display an image as shown below.
```python
# Import libraries
import datetime
import io
import itertools
import shutil

import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics
import tensorflow as tf
from tensorflow import keras

# Remove logs from earlier runs, if any
shutil.rmtree('logsx', ignore_errors=True)

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Names of the integer classes
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Reshape the image for the Summary API: (batch, height, width, channels)
img = np.reshape(train_images[100], (-1, 28, 28, 1))

# Set up a timestamped log/images directory
logdir = "logsx/images/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Create a file writer for the log directory
file_writer = tf.summary.create_file_writer(logdir)

# Using the file writer, log the reshaped image
with file_writer.as_default():
    tf.summary.image("Image data", img, step=0)

%reload_ext tensorboard
%tensorboard --logdir logsx/images
```
You can also display multiple images with the `max_outputs` argument, which specifies the maximum number of images to visualize.
```python
with file_writer.as_default():
    # Don't forget to reshape.
    images = np.reshape(train_images[50:53], (-1, 28, 28, 1))
    tf.summary.image("Plotting multiple images", images, max_outputs=3, step=0)

%tensorboard --logdir logsx/images
```
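`tf.summary.image` expects a rank-4 tensor shaped `[k, height, width, channels]`, which is why both examples reshape the data first. The NumPy sketch below shows that reshape. A random array stands in for three 28×28 grayscale Fashion-MNIST images, so the stand-in data is an assumption and not the real dataset.

```python
import numpy as np

# Stand-in for train_images[50:53]: three 28x28 uint8 grayscale images
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Add a trailing channel axis: (3, 28, 28) -> (3, 28, 28, 1)
images = np.reshape(batch, (-1, 28, 28, 1))
print(images.shape)

# A single image also needs a leading batch axis: (28, 28) -> (1, 28, 28, 1)
single = np.reshape(batch[0], (-1, 28, 28, 1))
print(single.shape)
```

The `-1` lets NumPy infer the batch size, so the same call works for one image or many.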
TensorBoard graphs

The Graphs dashboard can help with model debugging. To see the graph, click the GRAPHS tab in the upper pane, then select your preferred run in the upper-left corner. You can then inspect the model and check that it matches your intended design.

You will also notice options such as the op-level graph, which can give you insight into how to change your model. With the Trace inputs option turned on, selecting a node shows its upstream dependencies.
TensorBoard distributions

Deep neural network (DNN) models are made up of many layers, each with its own weights and biases. The Distributions dashboard shows how the distributions of these weights and biases change over training.
TensorBoard histograms

A histogram is a collection of values aggregated by frequency. TensorBoard histograms visualize how the weights change over time, which helps you establish whether something is wrong with the weight initialization or the learning rate. Histograms are located in the HISTOGRAMS tab.

You can specify the histogram mode as either OVERLAY:
(Figure: Overlay histogram mode)

or as OFFSET:
(Figure: Offset histogram mode)

As shown, the Histograms dashboard displays the same information as the Distributions dashboard, but as 3-D histograms that change across iterations.
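Each histogram slice is a frequency count of a layer's values at one training step. The NumPy sketch below computes one such slice. Randomly drawn values stand in for a layer's weights, and the normal initialization is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for a 4x512 weight matrix, like the first Dense layer in the iris model
weights = rng.normal(loc=0.0, scale=0.05, size=(4, 512))

# Bucket the flattened weights into 10 equal-width bins
counts, edges = np.histogram(weights.ravel(), bins=10)
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"[{left:+.3f}, {right:+.3f}): {count}")
```

TensorBoard computes a slice like this every `histogram_freq` epochs. The HISTOGRAMS and DISTRIBUTIONS tabs then stack these slices along the step axis.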
Fairness Indicators

No matter how much care goes into implementing and evaluating a model, bias can creep in at various stages of the model pipeline.
It is therefore essential to check the model for human bias at every step. In TensorBoard, the Fairness Indicators plugin lets developers evaluate fairness metrics, such as the false positive rate (FPR) and false negative rate (FNR), for binary and multi-class classification and regression models.
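Both metrics come from confusion-matrix counts computed separately for each group (slice) of the data: FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The plain-Python sketch below does that per-group computation for a binary classifier. `rates_by_group` and the labels are made up for illustration, and this is not the Fairness Indicators API.

```python
from collections import defaultdict


def rates_by_group(y_true, y_pred, groups):
    """Hypothetical helper: return {group: (FPR, FNR)} for binary labels (1 = positive)."""
    tallies = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        # "t"/"f" = prediction correct or not; "p"/"n" = predicted positive or negative
        key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
        tallies[g][key] += 1
    result = {}
    for g, c in tallies.items():
        negatives = c["fp"] + c["tn"]
        positives = c["fn"] + c["tp"]
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        result[g] = (fpr, fnr)
    return result


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(rates_by_group(y_true, y_pred, groups))
```

Here group `a` gets FPR = FNR = 0.5, while group `b` gets 0.0 for both. Fairness Indicators is designed to surface this kind of gap between slices.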
Install the Fairness Indicators plugin:
```shell
pip install --upgrade pip
pip install fairness_indicators
pip install tensorboard-plugin-fairness-indicators
```

Restart the kernel so that the plugin is included in TensorBoard. You can then access the Fairness Indicators widget from the dialog box.
For a complete walkthrough, see the Fairness_Indicators_Example_Colab.ipynb notebook in the tensorflow/fairness-indicators GitHub repository (TensorFlow's Fairness Evaluation and Visualization Toolkit).

What-If Tool

When building machine learning models, developers often want to understand when a model performs well and when it underperforms. The What-If Tool (WIT) comes in handy when you are interested in:
- Counterfactual reasoning.
- Investigating decision boundaries.
- Exploring how general changes to data points affect predictions.
- Simulating different scenarios to see how a model behaves, directly from the tool's visual interface.

In TensorBoard, the What-If Tool can be configured from the dialog box. After opening the What-If widget, you need to provide:
- The host and port of the model server.
- The name of the model being served.
- The type of model.
- The path to the TFRecord file to load.

Next, click Accept. The tool does the rest and displays the results.
Displaying data in TensorBoard

TensorBoard supports logging and visualizing various data types, including scalars, images, audio, histograms, and graphs.
Using the TensorBoard embedding projector

TensorBoard's Embedding Projector makes embeddings easier to interpret. By projecting high-dimensional embeddings into two or three dimensions, you can see which items lie close to each other. This guide considers a simple example of vectors and metadata. You will use PyTorch's `SummaryWriter` (from `torch.utils.tensorboard`) to write the embedding: create a writer instance, then add an embedding to it.
Delete previous logs.
```python
!rm -rf runs
```

Create some vectors and metadata, and write them as an embedding:
```python
%load_ext tensorboard
import numpy as np
import tensorflow as tf
import tensorboard as tb
# Workaround: with TensorFlow installed, PyTorch's add_embedding can fail on
# tf.io.gfile, so substitute TensorBoard's stub implementation
tf.io.gfile = tb.compat.tensorflow_stub.io.gfile

# Install PyTorch if needed:
# !pip install torch
from torch.utils.tensorboard import SummaryWriter

vectors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]])
metadata = ['001', '010', '100', '111', '101']  # labels

writer = SummaryWriter()  # writes under ./runs/ by default
writer.add_embedding(vectors, metadata)
writer.close()

%tensorboard --logdir runs
```

In the dashboard, navigate to the PROJECTOR tab.
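PCA is one of the projection methods the projector offers, alongside t-SNE, UMAP, and custom projections. The NumPy sketch below computes a 2-D PCA for the five vectors above by taking the SVD of the centered data. It illustrates the idea and is not the projector's own code.

```python
import numpy as np

vectors = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]], dtype=float)
labels = ['001', '010', '100', '111', '101']

# Center each dimension, then keep the top-2 principal directions
centered = vectors - vectors.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T  # shape (5, 2)

# Share of the total variance captured by each principal component
explained = singular_values**2 / np.sum(singular_values**2)
for label, (x, y) in zip(labels, projected):
    print(f"{label}: ({x:+.2f}, {y:+.2f})")
print("variance explained by 2 components:", explained[:2].sum().round(3))
```

Points that end up close together in `projected` are the ones the projector draws near each other when PCA is selected.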
Plot training examples with TensorBoard

Before fitting a model, you can visualize the training data as shown below.
```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Clear previous logs
!rm -rf logs/train_data

# Download the MNIST data. The data is already divided into train and test sets.
# The labels are integers representing classes.
handwriting_mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = handwriting_mnist.load_data()

logdir = "logs/train_data/"
file_writer = tf.summary.create_file_writer(logdir)

with file_writer.as_default():
    # Reshape to (batch, height, width, channels) for tf.summary.image
    images = np.reshape(train_images[50:53], (-1, 28, 28, 1))
    tf.summary.image("3 Digits", images, max_outputs=3, step=0)
```

Load TensorBoard.
%tensorboard --logdir logs/train_data
Visualize images in TensorBoard

Instead of logging image tensors directly, you can plot arbitrary matplotlib figures and log them to TensorBoard as images. To demonstrate, consider the MNIST dataset.
```python
import datetime
import io

import matplotlib.pyplot as plt
import tensorflow as tf

# Clear out prior logging data.
!rm -rf logs/plots

logdir = "logs/plots/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
file_writer = tf.summary.create_file_writer(logdir)

# Class names
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

def plot_to_image(figure):
    """Converts the matplotlib plot specified by 'figure' to a PNG image and
    returns it. The supplied figure is closed and inaccessible after this call."""
    # Save the plot to a PNG in memory.
    buf = io.BytesIO()
    plt.savefig(buf, format='png')
    # Closing the figure prevents it from being displayed directly inside
    # the notebook.
    plt.close(figure)
    buf.seek(0)
    # Convert PNG buffer to TF image
    image = tf.image.decode_png(buf.getvalue(), channels=4)
    # Add the batch dimension
    image = tf.expand_dims(image, 0)
    return image

def image_grid():
    """Return a 5x5 grid of the MNIST images as a matplotlib figure."""
    # Create a figure to contain the plot.
    figure = plt.figure(figsize=(10, 10))
    for i in range(25):
        # Start next subplot.
        plt.subplot(5, 5, i + 1, title=class_names[train_labels[i]])
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
    return figure

# Prepare the plot
figure = image_grid()
# Convert to image and log
with file_writer.as_default():
    tf.summary.image("Image data", plot_to_image(figure), step=0)

%tensorboard --logdir logs/plots
```
Displaying text data in TensorBoard

Using the TensorFlow Text Summary API, you can log text and visualize it in TensorBoard.
```python
# Define the text to log
your_text = "This is some text in TensorBoard!"

# Remove prior log data.
!rm -rf logs

# Set up a timestamped log directory.
logdir = "logs/text_basics/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Create a file writer for the log directory.
file_writer = tf.summary.create_file_writer(logdir)

# Using the file writer, log the text.
with file_writer.as_default():
    tf.summary.text("TensorBoard Text", your_text, step=0)
```

Reload TensorBoard from the logs directory.
%tensorboard --logdir logs
Log a confusion matrix to TensorBoard

You can log a confusion matrix and display it as an image. Sticking with the Fashion-MNIST dataset, log the confusion matrix as follows.
Clear previous logs:

```python
!rm -rf logs
```

Download and prepare the data, and define the model:
```python
# Download the dataset
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# All the classes
class_names = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')

# Define and compile the model (training happens later, in model.fit)
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(512, activation='relu'),
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])
```

Next, create a function that logs the confusion matrix, and attach it to training with a `LambdaCallback`.
```python
import datetime
import itertools

import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics
import tensorflow as tf
from tensorflow import keras

# Clear out prior logging data.
!rm -rf logs/image

def plot_confusion_matrix(cm, class_names):
    """Returns a matplotlib figure containing the plotted confusion matrix."""
    figure = plt.figure(figsize=(8, 8))
    plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
    plt.title("Confusion Matrix of the Results")
    plt.colorbar()
    tick_marks = np.arange(len(class_names))
    plt.xticks(tick_marks, class_names, rotation=90)
    plt.yticks(tick_marks, class_names)

    # Normalize each row (true class) so that the cells show fractions
    labels = np.around(cm.astype('float') / cm.sum(axis=1)[:, np.newaxis], decimals=2)

    # White text on dark cells, black text on light cells
    threshold = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        color = "white" if cm[i, j] > threshold else "black"
        plt.text(j, i, labels[i, j], horizontalalignment="center", color=color)

    plt.tight_layout()
    plt.ylabel('Real Class')
    plt.xlabel('Predicted Class')
    return figure

logdir = "logs/image/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Define the basic TensorBoard callback.
tensorboard_callback = keras.callbacks.TensorBoard(log_dir=logdir)
file_writer_cm = tf.summary.create_file_writer(logdir + '/cm')

def log_confusion_matrix(epoch, logs):
    # Use the model to predict the values from the validation dataset.
    test_pred_raw = model.predict(test_images)
    test_pred = np.argmax(test_pred_raw, axis=1)

    # Calculate the confusion matrix.
    cm = sklearn.metrics.confusion_matrix(test_labels, test_pred)
    figure = plot_confusion_matrix(cm, class_names=class_names)
    # plot_to_image() is defined in the "Visualize images in TensorBoard" section
    cm_image = plot_to_image(figure)

    # Log the confusion matrix as an image summary.
    with file_writer_cm.as_default():
        tf.summary.image("Confusion Matrix", cm_image, step=epoch)

# Define the per-epoch callback.
cm_callback = keras.callbacks.LambdaCallback(on_epoch_end=log_confusion_matrix)
```

Train the model with both callbacks.
```python
# Train the classifier.
model.fit(
    train_images,
    train_labels,
    epochs=2,
    verbose=0,  # suppress console output
    callbacks=[tensorboard_callback, cm_callback],
    validation_data=(test_images, test_labels),
)
```

Load TensorBoard with the confusion matrix logs.
```python
# Start TensorBoard.
%tensorboard --logdir logs/image
```
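`plot_confusion_matrix` row-normalizes the matrix, so each cell shows the fraction of a true class that was assigned to each predicted class. The NumPy sketch below shows the counting and normalization on a tiny three-class example. The labels are made up for illustration, and `confusion_counts` is a hypothetical helper. For these inputs, scikit-learn's `confusion_matrix` uses the same layout: rows are true classes and columns are predicted classes.

```python
import numpy as np


def confusion_counts(y_true, y_pred, num_classes):
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 2, 0]
cm = confusion_counts(y_true, y_pred, num_classes=3)

# Same normalization as in plot_confusion_matrix: divide each row by its total
normalized = np.around(cm.astype("float") / cm.sum(axis=1)[:, np.newaxis], decimals=2)
print(cm)
print(normalized)
```

Row-normalizing keeps large classes from dominating the color scale. A diagonal cell of 0.75 reads as "75% of this class was classified correctly", whatever the class size.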
Hyperparameter tuning with TensorBoard

Models are built with hyperparameters that influence how the model learns and performs. During modeling, you select hyperparameters to optimize before settling on the 'best' model.
Examples of hyperparameters include the number of epochs, the dropout rate, and the learning rate. Optimizing the selected hyperparameters is known as hyperparameter optimization or tuning, and the goal is to improve the model's performance.
To tune hyperparameters in TensorBoard, use TensorBoard's HParams plugin. Consider the iris classification problem again.
Clear earlier logs.
```python
import shutil

# Remove old logs (works on Windows, Linux, and macOS).
# On Linux you could instead run: !rm -rf ./logs/
for folder in ("logs", "logsx"):
    shutil.rmtree(folder, ignore_errors=True)
```

Reload the TensorBoard extension.
```python
%reload_ext tensorboard
```

Define the hyperparameters you want to optimize. The training and test data come from the iris example above.
```python
from tensorboard.plugins.hparams import api as hp

# Create hyperparameters
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([5, 10]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_LEARNING_RATE = hp.HParam('learning_rate', hp.Discrete([0.001, 0.0005, 0.0001]))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd', 'rmsprop']))
METRIC_ACCURACY = 'accuracy'
```

Write the experiment configuration to the experiment's top-level log directory.
```python
# Set up the configuration log files. The configuration goes in the same
# top-level directory that TensorBoard is later pointed at.
with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
    hp.hparams_config(
        hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER, HP_LEARNING_RATE],
        metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
    )
```

Fit the models and log the metrics and hyperparameters.
```python
from tensorflow.keras.layers import Dense, Dropout

def train_test_model(hparams, run_dir):
    # Create the model, using this trial's number of units and dropout rate
    model = keras.models.Sequential()
    model.add(Dense(hparams[HP_NUM_UNITS], activation='relu', input_shape=(4,)))
    model.add(Dropout(hparams[HP_DROPOUT]))
    model.add(Dense(3, activation='softmax'))

    # Set the optimizer and learning rate
    optimizer_name = hparams[HP_OPTIMIZER]
    learning_rate = hparams[HP_LEARNING_RATE]
    if optimizer_name == "adam":
        optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    elif optimizer_name == "sgd":
        optimizer = tf.keras.optimizers.SGD(learning_rate=learning_rate)
    elif optimizer_name == "rmsprop":
        optimizer = tf.keras.optimizers.RMSprop(learning_rate=learning_rate)
    else:
        raise ValueError("unexpected optimizer name: %r" % (optimizer_name,))

    # Compile the model with the optimizer and learning rate specified in hparams
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])

    # Fit the model; run with 1 epoch to speed things up for demo purposes.
    # The hyperparameters themselves are recorded by hp.hparams() in run();
    # hp.KerasCallback(run_dir, hparams) is an alternative way to log them.
    model.fit(X_train, y_train, epochs=1,
              callbacks=[tf.keras.callbacks.TensorBoard(run_dir)])  # log metrics
    _, accuracy = model.evaluate(X_test, y_test)
    return accuracy

def run(run_dir, hparams):
    with tf.summary.create_file_writer(run_dir).as_default():
        hp.hparams(hparams)  # record the values used in this trial
        accuracy = train_test_model(hparams, run_dir)
        tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)
```

All that remains is to run the experiments and log the metrics and hyperparameters.
```python
session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
    for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
        for optimizer in HP_OPTIMIZER.domain.values:
            for learning_rate in HP_LEARNING_RATE.domain.values:
                hparams = {
                    HP_NUM_UNITS: num_units,
                    HP_DROPOUT: dropout_rate,
                    HP_OPTIMIZER: optimizer,
                    HP_LEARNING_RATE: learning_rate,
                }
                run_name = "run-%d" % session_num
                print('--- Starting trial: %s' % run_name)
                print({h.name: hparams[h] for h in hparams})
                run('logs/hparam_tuning/' + run_name, hparams)
                session_num += 1
```

Now load TensorBoard. All the model runs and their performance can be accessed from the HPARAMS tab in the upper pane.
```python
%tensorboard --logdir logs/hparam_tuning
```

The TABLE VIEW lists the experiment runs. Each row shows the values of the hyperparameters being optimized and the corresponding accuracy.

You can also view the results in the PARALLEL COORDINATES VIEW, which shows each run as a line passing through one axis for each hyperparameter and one for the accuracy metric. Hover over a line to see its hyperparameters and accuracy.

The SCATTER PLOT VIEW plots each hyperparameter against the metrics.
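The nested loops above run every combination of the discrete values plus the two endpoints of the dropout interval. The sketch below checks that count with `itertools.product`. It uses plain Python lists copied from the `hp.HParam` definitions, so no TensorBoard import is needed.

```python
import itertools

num_units = [5, 10]
dropout = [0.1, 0.2]  # min and max of hp.RealInterval(0.1, 0.2)
optimizers = ["adam", "sgd", "rmsprop"]
learning_rates = [0.001, 0.0005, 0.0001]

trials = list(itertools.product(num_units, dropout, optimizers, learning_rates))
print(len(trials))  # 2 * 2 * 3 * 3
```

That gives 36 trials, which is why the HPARAMS views show 36 runs. Adding one more value to any hyperparameter multiplies the grid, so keep grids small or sample from them randomly.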
TensorFlow Profiler

The TensorFlow Profiler records CPU (host) operations and CUDA kernel launches on the GPU. You can visualize this information in TensorBoard to quickly analyze performance bottlenecks.
To get started, install the plugin.
```shell
pip install -U tensorboard-plugin-profile
```

Next, create a TensorBoard callback and specify the batches to profile with the `profile_batch` argument. Returning to the iris classification problem:
```python
# Directory to store the profiles
log_dir = 'logs/profile/' + datetime.datetime.now().strftime('%Y%m%d-%H%M%S')

# Profile batches 10 through 50
callbacks = [tf.keras.callbacks.TensorBoard(log_dir=log_dir, profile_batch=(10, 50))]

# X_train and y_train come from the iris data prepared earlier
model = create_model()
model.fit(X_train, y_train, epochs=10, validation_split=0.2, callbacks=callbacks)
```

Load TensorBoard and select Profile from the dropdown menu to view the captured profile.
%tensorboard --logdir logs/profile
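A quick calculation helps when you choose `profile_batch`. In recent TensorFlow versions, the callback counts training batches across epochs, so the profiling range should fall within the total number of training steps. This sketch assumes the iris split above (105 training rows), a 20% validation hold-out, and Keras's default `batch_size` of 32.

```python
import math

train_rows = 105      # 70% of the 150 iris samples (test_size=0.3)
validation_rows = 21  # validation_split=0.2 holds out 20% of 105
batch_size = 32       # Keras default when fit() gets no batch_size
epochs = 10

fit_rows = train_rows - validation_rows
steps_per_epoch = math.ceil(fit_rows / batch_size)
total_steps = steps_per_epoch * epochs
print(f"{fit_rows} rows, {steps_per_epoch} steps/epoch, {total_steps} steps total")
```

Under these assumptions, training runs for only about 30 steps, so a range that ends at batch 50 extends past the last step. A range that fits entirely inside training, or more epochs or a smaller batch size, makes the captured profile easier to reason about.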
Overview page

The overview page provides:

- A Performance Summary for the CPU and GPU.
- The Run Environment.
- A Step-time Graph, which shows how the step time during training and testing breaks down into categories such as compilation, output, and input.
- Recommendations for Next Steps.
Performance Summary shows the information on:
- The time spent in various categories, including Compilation, Output, Input, Kernel Launch, Host Compute, Device Collective Communication, Device to Device, and Device Compute.
- TF Op Placement.
- Op Time Spent on Eager Execution.
- Device Compute Precisions.
Run Environment describes the system on which profiling was run. For instance, the environment used for this guide consists of one host with a CPU device.

The Step-time graph plots the device step time across all sampled steps. It breaks each training and testing step into the same time components listed in the Performance Summary.
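The breakdown behind the Step-time graph amounts to averaging each time component over the sampled steps and expressing it as a share of the mean step time. A minimal sketch with made-up timings follows; the numbers and the helper name average_breakdown are hypothetical, not profiler output.

```python
from collections import defaultdict

# Hypothetical per-step timings in milliseconds for two sampled steps
steps = [
    {"Compilation": 0.0, "Input": 6.0, "Kernel Launch": 1.0, "Device Compute": 13.0},
    {"Compilation": 0.0, "Input": 2.0, "Kernel Launch": 1.0, "Device Compute": 17.0},
]


def average_breakdown(steps):
    """Average each component over the sampled steps and give its share of the mean step time."""
    totals = defaultdict(float)
    for step in steps:
        for component, ms in step.items():
            totals[component] += ms
    means = {c: t / len(steps) for c, t in totals.items()}
    step_time = sum(means.values())
    shares = {c: 100.0 * m / step_time for c, m in means.items()}
    return step_time, means, shares


step_time, means, shares = average_breakdown(steps)
print(f"mean step time: {step_time:.1f} ms")
for component, share in shares.items():
    print(f"{component:>15}: {means[component]:5.1f} ms ({share:4.1f}%)")
```

Here a fifth of the average step is spent waiting for input, which is the kind of signal the input pipeline analyzer reports in more detail.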

The overview page also includes Recommendations for Next Steps, which suggest how to improve your pipeline. The recommendations depend on the kind of model you have implemented.
Trace viewer
Selecting Trace Viewer from the Tools drop-down should show a dashboard similar to the one shown below. It displays a timeline of the events that ran on the GPU or CPU during profiling.

The Trace Viewer is laid out as follows:
- On the left (vertical grey column), there are two major sections: /device and /host. They show which TensorFlow op was executed on which device (GPU or CPU, respectively).
- On the right, colored bars show how long each TensorFlow op ran.

The Trace Viewer makes it easy to understand performance bottlenecks in the input pipeline. It is also interactive: use the keyboard shortcuts W and S to zoom in and out, and A and D to move left and right, respectively. Alternatively, use the navigation widget in the Trace Viewer window.

To analyze an individual event, use the selection option and click on a TensorFlow Op.

To analyze several events at once, drag with your mouse to select them, or hold the Ctrl key while clicking the desired events.
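Gaps between the colored bars on a row are periods when that device or host thread had nothing running. The sketch below computes busy and idle time per row from a list of events by merging overlapping or nested intervals. The event tuples, row names, and the 100-microsecond window are hypothetical, and the Trace Viewer's own data format is not reproduced here.

```python
from collections import defaultdict

# Hypothetical trace events: (row, op name, start in microseconds, duration in microseconds)
events = [
    ("/device:GPU:0", "MatMul", 0, 40),
    ("/device:GPU:0", "Relu", 60, 10),
    ("/host:CPU", "IteratorGetNext", 0, 55),
    ("/host:CPU", "MapDataset", 50, 20),
]


def busy_and_idle(events, window_us):
    """Merge overlapping intervals per row and return {row: (busy_us, idle_us)}."""
    by_row = defaultdict(list)
    for row, _name, start, dur in events:
        by_row[row].append((start, start + dur))
    result = {}
    for row, intervals in by_row.items():
        intervals.sort()
        busy = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)  # overlapping or nested: extend the current span
            else:
                busy += cur_end - cur_start
                cur_start, cur_end = start, end
        busy += cur_end - cur_start
        result[row] = (busy, window_us - busy)
    return result


print(busy_and_idle(events, window_us=100))
```

Merging first matters because overlapping ops on the host would otherwise be double-counted as busy time.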
Input pipeline analyzer
The input pipeline analyzer inspects the input pipeline and shows whether it has a performance bottleneck. It also tells you whether the model is input-bound. The tool contains:
- Summary of input-pipeline analysis.
- Recommendation for next step.
- Device-side analysis details.
- Host-side analysis details.
- Input Op statistics.
Summary of input-pipeline analysis includes information on the overall input pipeline. The information shows whether the application is input bound and, if so, to what extent.

Recommendation for next step suggests actions to take based on the analysis.

Device-side analysis details show the device step-time summary statistics and a graph of the time taken by various processes, including:
- Compilation
- Output
- Input
- Kernel launch
- Host compute
- Device collective communication
- Device to device
- Device compute
Host-side analysis details break down the input processing time on the host into:
- Enqueuing data
- Data preprocessing
- Reading data in advance
- Reading data on demand
- Other data reading or processing
In this guide, most of the host-side input time went to data preprocessing, as shown below.


The Host-side analysis details also include a section for recommendations on what can be done based on the host-side statistics.

Lastly, the Input Op statistics shows details of various input operations, including:
- Input Op: the name of the underlying TensorFlow input operation.
- Count: the number of times the operation executed during the profiling session.
- Total Time: the cumulative time spent on all instances of the operation.
- Total Time %: the total time spent on an operation as a percentage of the total time spent processing the input.
- Total Self Time: the cumulative self-time spent on all instances of the operation.
- Total Self Time %: the total self-time as a percentage of the total time spent on input processing.
- Category: the processing category of the input operation.
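The difference between Total Time and Total Self Time is that self-time excludes time spent in child ops. The sketch below aggregates per-instance timings into Input Op-style statistics. The op names, timings, and the (name, total time, child time) tuple layout are hypothetical, and it assumes the overall input processing time equals the sum of all self-times.

```python
from collections import defaultdict

# Hypothetical op instances: (op name, total time in ms, time spent in child ops in ms).
# Each Iterator::Map instance calls one Iterator::TFRecord instance as its child.
instances = [
    ("Iterator::Map", 8.0, 5.0),
    ("Iterator::Map", 6.0, 3.0),
    ("Iterator::TFRecord", 5.0, 0.0),
    ("Iterator::TFRecord", 3.0, 0.0),
]


def input_op_stats(instances):
    stats = defaultdict(lambda: {"count": 0, "total": 0.0, "self": 0.0})
    for name, total, child_time in instances:
        s = stats[name]
        s["count"] += 1
        s["total"] += total
        s["self"] += total - child_time  # self-time excludes time spent in child ops
    # Assumption: overall input processing time is the sum of all self-times
    input_time = sum(s["self"] for s in stats.values())
    for s in stats.values():
        s["total_pct"] = 100.0 * s["total"] / input_time
        s["self_pct"] = 100.0 * s["self"] / input_time
    return dict(stats)


for name, s in input_op_stats(instances).items():
    print(f"{name}: count={s['count']}, total={s['total']} ms ({s['total_pct']:.1f}%), "
          f"self={s['self']} ms ({s['self_pct']:.1f}%)")
```

The outer Map op reaches 100% Total Time because its time includes the TFRecord reads, while its Self Time share shows how much of the work is its own.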
TensorFlow stats
The TensorFlow stats tool displays the performance of every TensorFlow op executed on the host or device during profiling. The charts shown vary depending on the host device and the TensorFlow processes. In this case, there are two pie charts.

The plot on the left shows the distribution of the total self-execution time of each operation on the host, while the plot on the right shows the distribution of self-execution time for each operation type on the host.

You can include or exclude IDLE time from the statistics using the drop-down option. IDLE time is the portion of the total execution time during which the device (or host) is idle.

The dashboard also includes a table of TensorFlow operations with details about each one, such as its type, how many times it ran, and its execution time.
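Including or excluding IDLE time changes every other slice of the pie, because the percentages are taken over a different total. The sketch below shows the effect with hypothetical self-times; the op names, numbers, and the helper name shares are illustrative only.

```python
# Hypothetical self-times (ms) per op type on the host, including IDLE time
self_time_ms = {"IDLE": 30.0, "MatMul": 42.0, "Relu": 14.0, "BiasAdd": 14.0}


def shares(times, include_idle=True):
    """Return each op type's percentage of total self-time, optionally dropping IDLE."""
    kept = {op: t for op, t in times.items() if include_idle or op != "IDLE"}
    total = sum(kept.values())
    return {op: round(100.0 * t / total, 1) for op, t in kept.items()}


with_idle = shares(self_time_ms)
without_idle = shares(self_time_ms, include_idle=False)
print(with_idle)     # IDLE takes 30% of the pie
print(without_idle)  # MatMul's share rises from 42% to 60%
```

Excluding IDLE is useful when you want to compare ops against each other rather than against time the device spent waiting.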
GPU kernel stats
If your program runs GPU kernels, the kernel_stats tool shows performance statistics and the originating op for each GPU-accelerated kernel.
The figure below shows a sample overview of GPU-accelerated kernels.
Memory profile page
The memory profile tool monitors the GPU memory usage of TensorFlow ops during the profiling interval. You can use it to analyze and debug out-of-memory (OOM) errors, which are raised when the GPU's memory is exhausted.
The Memory profile page includes:
- Memory Profile Summary: a summary of the memory profile of the TensorFlow application.
- Memory Timeline Graph: a plot of memory usage (in GiB) and the percentage of fragmentation versus time (in milliseconds).
- Memory Breakdown Table: the active memory allocations at the point of highest memory usage in the profiling interval.
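The two quantities on the Memory Timeline Graph can be sketched as follows, assuming the definition of fragmentation given in the TensorFlow Profiler guide: one minus the ratio of the largest free memory chunk to the total free memory. The free-chunk sizes and timeline samples are hypothetical. A low value means free memory is mostly contiguous; a high value means it is scattered, which can cause allocation failures even when enough memory is free in total.

```python
def fragmentation(free_chunks_bytes):
    """Percentage of fragmentation: 100 * (1 - largest free chunk / total free memory)."""
    total_free = sum(free_chunks_bytes)
    if total_free == 0:
        return 0.0
    return 100.0 * (1 - max(free_chunks_bytes) / total_free)


def peak_usage(timeline):
    """Return (time_ms, GiB) at the highest memory usage in a list of (time_ms, bytes) samples."""
    t, used = max(timeline, key=lambda point: point[1])
    return t, used / 2**30


# Hypothetical free chunks (bytes) and usage samples
print(fragmentation([512, 256, 256]))
timeline = [(0, 1 * 2**30), (5, 3 * 2**30), (9, 2 * 2**30)]
print(peak_usage(timeline))
```

The peak found this way corresponds to the moment the Memory Breakdown Table describes.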
How to enable debugging on TensorBoard
With TensorBoard's debugger, you can:
- Select particular nodes and debug them.
- Graphically control the execution of the model.
- Visualize the tensors and their values.
To enable debugging, add the following code before the model begins training.
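import datetime
import os

import tensorflow as tf

logdir = os.path.join("logs/debugg", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
# FULL_HEALTH records a health summary (e.g., NaN and infinity counts) for each floating-point tensor;
# circular_buffer_size=-1 disables the circular buffer so no execution events are dropped
tf.debugging.experimental.enable_dump_debug_info(logdir, tensor_debug_mode="FULL_HEALTH", circular_buffer_size=-1)
Load TensorBoard.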
%tensorboard --logdir logs/debugg
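FULL_HEALTH mode summarizes each floating-point tensor with counts of problematic and ordinary values, which is how NaNs and infinities become easy to spot. As a plain-Python illustration of that idea (not TensorFlow's implementation; the category names here are assumptions), the sketch below classifies every element of a flat list:

```python
import math


def full_health_summary(values):
    """Count element categories for a flat list of floats, in the spirit of FULL_HEALTH mode."""
    summary = {"size": len(values), "-inf": 0, "+inf": 0, "nan": 0,
               "negative": 0, "zero": 0, "positive": 0}
    for v in values:
        if math.isnan(v):
            summary["nan"] += 1
        elif math.isinf(v):
            summary["+inf" if v > 0 else "-inf"] += 1
        elif v < 0:
            summary["negative"] += 1
        elif v == 0:
            summary["zero"] += 1
        else:
            summary["positive"] += 1
    return summary


tensor = [1.5, -2.0, 0.0, float("nan"), float("inf"), 3.0]
health = full_health_summary(tensor)
print(health)
# A tensor needs attention if it contains any NaN or infinity
print("unhealthy:", health["nan"] + health["-inf"] + health["+inf"] > 0)
```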
Using TensorBoard with deep learning frameworks
TensorBoard also integrates with machine learning frameworks other than TensorFlow.
TensorBoard in PyTorch
PyTorch is a popular open-source machine learning framework. You can log PyTorch events to TensorBoard to track metrics such as loss, RMSE, and accuracy.
First, create a SummaryWriter instance. By default, it writes events to ./runs/, so delete any prior logs first:
rm -rf ./runs/
import torch
from torch.utils.tensorboard import SummaryWriter

# SummaryWriter instance; events are written to ./runs/ by default
writer = SummaryWriter()
Next, define the data and model, and write the metrics to the SummaryWriter instance.
# install torch
# pip install torch
import torch

# data
x = torch.arange(-5, 5, 0.1).view(-1, 1)
y = -5 * x + 0.1 * torch.randn(x.size())

# model
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def train_model(n_epochs):
    for epoch in range(n_epochs):
        y1 = model(x)
        loss = criterion(y1, y)
        # log the training loss under the "Loss" group
        writer.add_scalar("Loss/train", loss.item(), epoch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

train_model(10)
# make sure all pending events are written to disk
writer.flush()
# close writer
writer.close()
To avoid clutter, especially when you log many metrics, group related scalars by giving their tags a shared prefix such as Loss/ or Accuracy/, as shown below.
from torch.utils.tensorboard import SummaryWriter
import numpy as np

writer = SummaryWriter()
for n_iter in range(100):
    writer.add_scalar('Loss/train', np.random.random(), n_iter)
    writer.add_scalar('Loss/test', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/train', np.random.random(), n_iter)
    writer.add_scalar('Accuracy/test', np.random.random(), n_iter)
writer.close()
Load TensorBoard.
%tensorboard --logdir=runs
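Grouping in the scalar dashboard is driven by tag names. The sketch below assumes tags are grouped by the text before the first slash, so Loss/train and Loss/test land in the same section; group_tags is a hypothetical helper for illustration, not part of PyTorch or TensorBoard.

```python
from collections import defaultdict


def group_tags(tags):
    """Group tags by the text before the first '/'; a tag without '/' forms its own group."""
    groups = defaultdict(list)
    for tag in tags:
        groups[tag.split("/", 1)[0]].append(tag)
    return dict(groups)


tags = ["Loss/train", "Loss/test", "Accuracy/train", "Accuracy/test"]
print(group_tags(tags))
```

Choosing a consistent prefix scheme up front keeps the dashboard readable as the number of logged metrics grows.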
TensorBoard in Keras
To log a Keras model to TensorBoard, first create a tf.keras.callbacks.TensorBoard callback whose logs are saved in the experiment folder inside the main logs folder.
import tensorflow as tf

# histogram_freq=1 computes weight histograms once per epoch
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs/experiment", histogram_freq=1)
model = create_model()  # defined earlier in this guide
model.fit(X_train, y_train, epochs=10, callbacks=[tb_callback])
Now visualize the run in TensorBoard.
%tensorboard --logdir logs/experiment
TensorBoard in XGBoost
XGBoost is another popular ML package for classification and regression problems. To log events from XGBoost training, you need the tensorboardX package, which you can install with:
pip install tensorboardX
To install XGBoost, run the following from your command prompt:
conda install -c anaconda py-xgboost
or, in a Google Colab notebook:
!conda install -c anaconda py-xgboost
This example logs the events for an XGBoost model trained on the popular Ames housing dataset.
Remove prior logs with rm -rf ./runs/, then define the XGBoost callback, load the data, and train the model.
import datetime
import os

# conda install -c anaconda py-xgboost
import xgboost as xgb

# set some xgboost attributes that are missing in version 1.6.x
new_attrs = ['grow_policy', 'max_bin', 'eval_metric', 'callbacks', 'early_stopping_rounds',
             'max_cat_to_onehot', 'max_leaves', 'sampling_method']
for attr in new_attrs:
    setattr(xgb, attr, None)

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from tensorboardX import SummaryWriter


class TensorBoardCallback(xgb.callback.TrainingCallback):
    '''Run experiments while scoring the model and saving the error to train or test folders'''

    def __init__(self, experiment: str = None, data_name: str = None):
        self.experiment = experiment or "logs"
        self.data_name = data_name or "test"
        self.datetime_ = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        # save the logs to the 'runs/' folder
        self.log_dir = f"runs/{self.experiment}/{self.datetime_}"
        self.train_writer = SummaryWriter(log_dir=os.path.join(self.log_dir, "train/"))
        if self.data_name:
            self.test_writer = SummaryWriter(log_dir=os.path.join(self.log_dir, f"{self.data_name}/"))

    def after_iteration(self, model, epoch: int,
                        evals_log: xgb.callback.TrainingCallback.EvalsLog) -> bool:
        if not evals_log:
            return False
        for data, metric in evals_log.items():
            for metric_name, log in metric.items():
                # with cross-validation each entry is a (mean, std) tuple; keep the mean
                score = log[-1][0] if isinstance(log[-1], tuple) else log[-1]
                if data == "train":
                    self.train_writer.add_scalar(metric_name, score, epoch)
                else:
                    self.test_writer.add_scalar(metric_name, score, epoch)
        # returning False tells XGBoost to keep training
        return False

    def after_training(self, model):
        # flush and close the writers once training ends
        self.train_writer.close()
        self.test_writer.close()
        return model


X, y = fetch_openml(name="house_prices", return_X_y=True)
# subset numerical variables
numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
X = X.select_dtypes(include=numerics)
# split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

params = {'objective': 'reg:squarederror', 'eval_metric': 'rmse'}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtrain, 'train'), (dtest, 'test')],
                callbacks=[TensorBoardCallback(experiment='exp_1', data_name='test')])
Next, load TensorBoard using the logs written by the SummaryWriter.
%tensorboard --logdir runs/
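XGBoost calls after_iteration after every boosting round and stops training when a callback returns True, which is why TensorBoardCallback always returns False. The stand-in training loop below illustrates that contract without XGBoost. run_training and PlateauStop are hypothetical, and the evals_log layout ({data name: {metric name: [scores]}}) mirrors the one TensorBoardCallback reads.

```python
from typing import Dict, List

EvalsLog = Dict[str, Dict[str, List[float]]]


class PlateauStop:
    """Hypothetical callback: stop once the test RMSE has not improved for `patience` rounds."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def after_iteration(self, model, epoch: int, evals_log: EvalsLog) -> bool:
        score = evals_log["test"]["rmse"][-1]
        if score < self.best:
            self.best, self.stale = score, 0
        else:
            self.stale += 1
        return self.stale >= self.patience  # True asks the trainer to stop


def run_training(test_rmse: List[float], callbacks) -> int:
    """Stand-in for the boosting loop: feed one score per round to every callback."""
    evals_log: EvalsLog = {"test": {"rmse": []}}
    for epoch, score in enumerate(test_rmse):
        evals_log["test"]["rmse"].append(score)
        stop_flags = [cb.after_iteration(None, epoch, evals_log) for cb in callbacks]
        if any(stop_flags):
            return epoch
    return len(test_rmse) - 1


last = run_training([5.0, 4.0, 3.5, 3.6, 3.7, 3.2], [PlateauStop(patience=2)])
print("stopped after round", last)
```

Because the loop stops at round 4, the better score in round 5 is never seen, which is the usual trade-off when choosing a patience value.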
TensorBoard in JAX and Flax
When training with JAX, you can log evaluation metrics to TensorBoard. You can also use TensorBoard to profile JAX programs, calling jax.profiler.start_trace() and jax.profiler.stop_trace() to start and stop tracing, respectively.
from torch.utils.tensorboard import SummaryWriter

# state, train_loader, test_images, test_labels, num_epochs, train_one_epoch() and
# evaluate_model() come from the full JAX & Flax training example
log_folder = "runs"
writer = SummaryWriter(log_folder)

training_loss, training_accuracy = [], []
testing_loss, testing_accuracy = [], []

for epoch in range(1, num_epochs + 1):
    state, train_metrics = train_one_epoch(state, train_loader)
    training_loss.append(train_metrics['loss'])
    training_accuracy.append(train_metrics['accuracy'])
    print(f"Train epoch: {epoch}, loss: {train_metrics['loss']}, accuracy: {train_metrics['accuracy'] * 100}")

    test_metrics = evaluate_model(state, test_images, test_labels)
    testing_loss.append(test_metrics['loss'])
    testing_accuracy.append(test_metrics['accuracy'])

    # float() converts JAX scalars to Python numbers for the writer
    writer.add_scalar('Loss/train', float(train_metrics['loss']), epoch)
    writer.add_scalar('Loss/test', float(test_metrics['loss']), epoch)
    writer.add_scalar('Accuracy/train', float(train_metrics['accuracy']), epoch)
    writer.add_scalar('Accuracy/test', float(test_metrics['accuracy']), epoch)
    print(f"Test epoch: {epoch}, loss: {test_metrics['loss']}, accuracy: {test_metrics['accuracy'] * 100}")

writer.close()
The figure below shows a sample profile of a JAX program captured manually.

Read more: How to use TensorBoard in JAX & Flax
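The JAX profiling calls mentioned in this section can be sketched as follows, assuming a recent JAX release. The function being traced, the warm-up call, and the log directory are illustrative choices rather than anything prescribed by JAX; point TensorBoard (with the profile plugin installed) at the same directory to open the trace.

```python
import os
import tempfile

import jax
import jax.numpy as jnp

# Illustrative log directory; any writable path works
log_dir = os.path.join(tempfile.gettempdir(), "jax_trace")


@jax.jit
def sum_of_squares(x):
    return jnp.sum(x * x)


x = jnp.arange(100.0)
# Warm-up call so that compilation time is not part of the trace
sum_of_squares(x).block_until_ready()

jax.profiler.start_trace(log_dir)
# block_until_ready() waits for the asynchronous computation to finish inside the trace
result = sum_of_squares(x).block_until_ready()
jax.profiler.stop_trace()

print(float(result))
```

Because JAX dispatches work asynchronously, blocking before stop_trace() ensures the traced window actually contains the computation.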
Download TensorBoard data as Pandas DataFrame
After you have finished modeling, you might want to conduct post-hoc analyses and create custom visualizations from the log data. For experiments uploaded to TensorBoard.dev, TensorBoard lets you access the log data through the tb.data.experimental.ExperimentFromDev() class.
Consider the Iris classification problem. You can access the data for a given experiment using:
import tensorboard as tb

experiment_id = "c1KCv3X3QvGwaXfgX1c4tg"
experiment = tb.data.experimental.ExperimentFromDev(experiment_id)
df = experiment.get_scalars()
df.head()
You can also obtain the DataFrame in wide format, because in this experiment the two tags (epoch_loss and epoch_accuracy) are present at the same set of steps in each run.
from IPython.display import display

experiment_id = "c1KCv3X3QvGwaXfgX1c4tg"
experiment = tb.data.experimental.ExperimentFromDev(experiment_id)
try:
    # pivot=True returns one column per tag
    df_wide = experiment.get_scalars(pivot=True)
except Exception:
    # pivoting needs every tag to be present at the same set of steps
    print("The tags are not present at the same steps; falling back to long format")
    df_wide = experiment.get_scalars(pivot=False)
display(df_wide.head())
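The wide format needs every tag at every step, which is why pivoting can fail. A minimal pure-Python sketch of that pivot follows, assuming the long-format columns are run, tag, step, and value; the rows and the helper name pivot_wide are hypothetical and stand in for what get_scalars(pivot=True) does with pandas.

```python
from collections import defaultdict

# Hypothetical long-format rows shaped like (run, tag, step, value)
rows = [
    ("train", "epoch_loss", 0, 0.9), ("train", "epoch_accuracy", 0, 0.6),
    ("train", "epoch_loss", 1, 0.5), ("train", "epoch_accuracy", 1, 0.8),
]


def pivot_wide(rows):
    """Turn (run, tag, step, value) rows into one record per (run, step) with a column per tag."""
    table = defaultdict(dict)
    tags = set()
    for run, tag, step, value in rows:
        table[(run, step)][tag] = value
        tags.add(tag)
    for key, record in table.items():
        missing = tags - set(record)
        if missing:
            # A tag absent at some step would leave a hole in the wide table
            raise ValueError(f"{key} is missing tags {sorted(missing)}")
    return [{"run": run, "step": step, **record}
            for (run, step), record in sorted(table.items())]


for record in pivot_wide(rows):
    print(record)
```

Adding a loss value at a step with no matching accuracy value makes pivot_wide raise ValueError, mirroring the fallback to long format in the code above.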
Finally, you can save the Pandas DataFrame as a CSV file.
import pandas as pd

# path of the CSV file
csv_path = 'tensor_experiment_1.csv'
df_wide.to_csv(csv_path, index=False)
# read the file back and check that the round trip preserved the data
df_wide_roundtrip = pd.read_csv(csv_path)
pd.testing.assert_frame_equal(df_wide_roundtrip, df_wide)
You can now visualize the data with a visualization package such as Matplotlib.
TensorBoard.dev
With TensorBoard.dev, you can host, track, and share ML experiments: all you need to do is upload your logs. You can upload logs from Google Colab, Jupyter, or the command prompt.
In your working directory, open the command prompt and run:
tensorboard dev upload --logdir logs
In a Google Colab or Jupyter notebook, prefix the command with !:
!tensorboard dev upload --logdir logs
You will be prompted to confirm the upload by entering y/yes; any other answer aborts it. After you confirm, a Google authorization page opens so you can complete the process. When the upload succeeds, a unique link to the experiment is created. The following link shows an example of an uploaded TensorBoard.
To stop uploading, interrupt the execution in Jupyter and Google Colab notebooks or press Ctrl-C if you are using the command prompt.
Limitations of using TensorBoard
TensorBoard has its share of limitations, including:
- Lack of private hosting. All experiments shared using TensorBoard.dev are public, so be careful not to upload sensitive information.
- Support for only specific data types, which limits logging and visualizing other formats such as video or custom HTML.
- Lack of user and workspace management features, which larger organizations often need.
- Scalability issues. TensorBoard's performance degrades as the number of runs grows.
- No team collaboration features, which is a disadvantage for users who build ML products as a team.
Final thoughts
Machine learning engineering, which every data scientist deals with from time to time, requires extensive modeling with different frameworks to optimize a model's predictive ability. However, model optimization, debugging, and deployment present their fair share of challenges. Tools like TensorBoard give developers the resources to build better machine learning models and produce quality results with less effort.
From setting up TensorBoard to debugging and visualizing logs from other libraries, this guide delves into the functionality of TensorBoard in visualizing the machine learning modeling process.