# Introduction to Tensorflow

We examine Google's open source library Tensorflow, and go through its components to understand how it can be used to create scalable machine learning models.

Tensorflow is likely the most popular and fastest-growing machine learning framework in existence. With over 70000 stars on GitHub and backing from Google, it not only has more stars than Linux, but also has a ton of resources behind it.

If that doesn't pique your interest, I have no idea what will.

If you've been following the machine learning 101 series up to now, you will have noticed that we've used the sklearn framework to implement our models. However, as we begin venturing into neural networks, deep learning, and the inner workings of some of the algorithms, we will start using the Tensorflow framework, which gives us access to lower-level APIs and thus more nuanced control over the model.

Because of this, we will spend some time familiarizing ourselves with Tensorflow and its design philosophy, so that we can use it in subsequent tutorials without further introduction.

In this tutorial we will talk about:

- General design philosophy
- Visualization
- Examples covering common use cases
- How it relates to machine learning

In the official white-paper, Tensorflow is described as "an interface for expressing machine learning algorithms, and an implementation for executing such algorithms". Its main advantage over other frameworks is how easy it is to execute the code on a wide array of devices. This is related to the initial motivation for its development, before it was open-sourced. Google initially developed Tensorflow to bridge the gap between research and production, aspiring to an ideal where no edits to the code have to be made to go from research to production.

To achieve this, Tensorflow implements a computational graph behind the scenes; in your code, you're just defining that graph: the flow of tensors.

Wait, what is a tensor?

Just like a vector can be thought of as an array, or a list, of scalars (ordinary numbers like 1, 2, and π), and a matrix can be thought of as an array of vectors, a tensor can be thought of as an array of matrices. More generally, a tensor is just an n-dimensional array: a scalar has zero dimensions, a vector one, a matrix two, and so on. It turns out, as we will see in the coding examples, that this architecture makes a lot of sense when working with machine learning.
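To make the idea of dimensions concrete, here is a minimal sketch in plain Python (no Tensorflow involved) that treats nested lists as tensors and reports their shape. The helper `shape_of` is a hypothetical name made up for this illustration, and it assumes the lists are non-empty and regularly nested.

```python
# Plain Python sketch (not Tensorflow API): infer the shape of a nested list.
def shape_of(value):
    """Return the shape of a regularly nested, non-empty list as a tuple."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0]  # descend one level; assumes all siblings look alike
    return tuple(shape)

scalar = 3
vector = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6]]
tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor", tensor)]:
    shape = shape_of(value)
    print(name, "shape:", shape, "dimensions:", len(shape))
```

The scalar comes out with the empty shape `()`, which is the same notation Tensorflow uses for zero dimensional tensors.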

What is the flow?

The flow is how tensors are passed around in the network. When the tensors are passed around, their values and shapes are updated by the graph operations.

As an analogy, you can think of the graph as a car factory with a series of workstations. One station may put on the wheels of the car while another installs the gearbox. The flow then describes the route a car skeleton has to take in order to become a fully functional car. The tensors passed around in this analogy would be the car prototype, or skeleton.

## Installing Tensorflow

You can install Tensorflow with pip using the following command:

```
pip install tensorflow
```

Or if you have a GPU:

```
pip install tensorflow-gpu
```

Note that if you're installing the GPU version, you need to have CUDA and cuDNN installed.

As of writing this, Tensorflow (v1.3) supports CUDA 8 and cuDNN 6.

Once you have installed Tensorflow, you can verify that everything works correctly using:

```
import tensorflow as tf
```

```
# Figure out what devices are available
from tensorflow.python.client import device_lib

def get_devices():
    return [x.name for x in device_lib.list_local_devices()]

print (get_devices())
```

```
['/cpu:0', '/gpu:0']
```

For more information, you can refer to the installation page.

## The atoms of Tensorflow

We already discussed how Tensorflow literally is the flow of tensors, but we didn't go into much detail. In order to better justify the architectural decisions, we will elaborate a bit on this.

### Three types of tensors

In Tensorflow, there are three primary types of tensors:

- tf.Variable
- tf.constant
- tf.placeholder

It's worth taking a look at each of these to discuss the differences, and when they are to be used.

### tf.Variable

The `tf.Variable` tensor is the most straightforward basic tensor, and is in many ways analogous to pure Python variables in that its value is, well, variable.

Variables retain their value during the entire session, and are therefore useful when defining learnable parameters such as weights in neural networks, or anything else that's going to change as the code is running.

You define a variable like so:

```
a = tf.Variable([1,2,3], name="a")
```

Here, we create a tensor variable with the initial state `[1,2,3]`, and the name `a`. Notice that Tensorflow is not able to inherit the Python variable name, so if you want to have a name on the graph (more on that later), you need to specify a name.

There are a few more options, but this is only meant to cover the basics. As with any of the things discussed here, you can read more about it on the documentation page.

### tf.constant

The `tf.constant` tensor is very similar to `tf.Variable` with one major difference: it is immutable, that is, its value is constant (wow, Google really nailed the naming of tensors).

The usage follows that of the `tf.Variable` tensor:

```
b = tf.constant([1,2,3], name="b")
```

You use this whenever you have a value that doesn't change during the execution of the code, for example to denote some property of the data, or to store the learning rate when using neural networks.

### tf.placeholder

Finally, we have the `tf.placeholder` tensor. As the name implies, this tensor type is used to define variables, or graph nodes (operations), for which you don't have an initial value. You then defer setting a value until you actually do the computation using `sess.run`. This is useful, for example, as a proxy for your training data when defining the network.

When running the operations, you need to pass actual data for the placeholders. This is done like so:

```
c = tf.placeholder(tf.int32, shape=[1,2], name="myPlaceholder")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(c,
        feed_dict={
            c:[[5,6]]
        })
    print (res)
```

```
[[5 6]]
```

Notice that we define a placeholder by first passing a non-optional parameter of the element type (here `tf.int32`), and then we define the `shape` using matrix dimension notation. The `[1,2]` denotes a matrix with one row and two columns. If you haven't studied linear algebra, this may seem confusing at first: why denote the height before the width? And isn't `[1,2]` a 1 by 2 matrix itself with the values 1 and 2?

These are valid questions, but in-depth answers are out of the scope of this essay. However, to give you the gist of it, the apparently weird notation has some quite neat mnemonic properties for some matrix operations, and yes, `[1,2]` can also be seen as a one by two matrix in itself. Tensorflow uses the list-like notation because it supports n-dimensional matrices, and it's therefore very convenient as we will see later.

You can find a complete list of supported Tensorflow datatypes here.

When we evaluate the value of `c` with `sess.run`, we pass in the actual data using a `feed_dict`. Notice that we use the Python variable name, and not the name given to the Tensorflow graph, to target the placeholder. This same approach also extends to multiple placeholders, where each placeholder is used as a dictionary key that maps to its data.

### Wildcards when defining shapes

Sometimes, you don't know some, or all, of the shape of a placeholder when defining it. For example, you may use a variable batch size when training. This is where wildcards come in.

Wildcards essentially allow you to say "I don't know" to Tensorflow, and let it infer the shapes from the incoming tensors.

What's the difference between `-1` and `None`?

Honestly, I tried to figure out the answer to this, but I haven't been able to find any documented difference between them, and the little I dug around in the source code of Tensorflow didn't yield any results either. However, I've run into a couple of examples where one would raise an error while the other one wouldn't.

Of the two, `None` seems to work better for me, so that's what I always use, and if I get an error related to the size of my placeholders, I try changing it to `-1`. For what it's worth, `None` is the documented way to mark an unknown dimension in a placeholder's shape, while `-1` is the convention `tf.reshape` uses for a dimension it should infer, which may explain why the two are not always interchangeable.

Why not just wildcard EVERYTHING!?!

Having explicit shapes helps debugging, as a lot of errors will be caught at "compile time" rather than during training, allowing you to spot mistakes more quickly, and it ensures that errors don't creep up on you silently (at least it tries to).

So to save your future self from headaches, you should only use wildcards when describing something variable such as input size, and not something static such as network parameter size.
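As a rough illustration of what a wildcard buys you, the plain Python sketch below checks the shape of some incoming data against a declared shape in which `None` marks a dimension that is allowed to vary. This is not how Tensorflow implements its shape checks; `is_compatible` is a hypothetical helper written only for this example.

```python
# Plain Python sketch (not Tensorflow API): compare a declared shape,
# where None means "any size", against the shape of the actual data.
def is_compatible(declared, actual):
    if len(declared) != len(actual):
        return False  # the number of dimensions must always match
    return all(d is None or d == a for d, a in zip(declared, actual))

declared = [None, 784]  # any batch size, but exactly 784 values per example

print(is_compatible(declared, [32, 784]))   # True: batch of 32
print(is_compatible(declared, [100, 784]))  # True: batch of 100
print(is_compatible(declared, [32, 783]))   # False: the static dimension is wrong
print(is_compatible(declared, [784]))       # False: a dimension is missing
```

Only the batch dimension is left open here, so a mistake in the static dimension is still caught.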

## Basic computation example

Knowing how variables work, we can now look at how to create more complex interactions.

A graph in Tensorflow consists of interconnected operations (ops). An op is essentially a function, that is, anything that takes some input and produces some output. As we discussed before, the default datatype of Tensorflow is the tensor, so operations can be said to be doing tensor manipulations.

As a very basic example, multiplying two scalars can be done like so:

```
a = tf.Variable(3)
b = tf.Variable(4)
c = tf.multiply(a,b)
print (c)
```

```
Tensor("Mul:0", shape=(), dtype=int32)
```

```
print (a)
print (b)
```

```
<tf.Variable 'Variable_4:0' shape=() dtype=int32_ref>
<tf.Variable 'Variable_5:0' shape=() dtype=int32_ref>
```

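Note that when we print the result we get another Tensor, and not the actual result. Also, notice that the variables have the shape `()`, which is because a scalar is a zero dimensional tensor. Finally, because we didn't specify a name, Tensorflow generated the names `'Variable_4:0'` and `'Variable_5:0'`. The number after the underscore is added to keep the names unique (these are the fifth and sixth unnamed variables created on the graph), and the `:0` refers to the first output of the operation that produces the tensor.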
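The numbered default names can be mimicked with a few lines of plain Python. The sketch below only approximates the observable behaviour (a base name followed by an increasing suffix); it is not Tensorflow's actual implementation, and `make_namer` and `unique_name` are hypothetical helpers.

```python
# Plain Python sketch (not Tensorflow's implementation) of unique default names.
def make_namer():
    counts = {}

    def unique_name(base="Variable"):
        """Return base the first time, then base_1, base_2, and so on."""
        seen = counts.get(base, 0)
        counts[base] = seen + 1
        return base if seen == 0 else "{}_{}".format(base, seen)

    return unique_name

unique_name = make_namer()

# six unnamed variables created one after another
names = [unique_name() for _ in range(6)]
print(names)

# explicitly named variables are kept unique in the same way
print(unique_name("a"), unique_name("a"))
```

This is also why it pays to name anything you want to find again later: the default names depend purely on the order in which things were created.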

To get the actual result, we have to compute the value in the context of a session. This can be done like so:

```
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer()) # this is important
    print (sess.run(c))
```

```
12
```

You can also use `tf.InteractiveSession`, which is useful if you're using something like IDLE or a Jupyter notebook. Furthermore, it's also possible to start a session by declaring `sess = tf.Session()`, and then close it using `sess.close()`. However, I do not recommend this practice, as it's easy to forget to close the session, and using this method as an interactive session may have performance implications, as Tensorflow really likes to eat as many resources as it can get its hands on (it's a bit like Chrome in this regard).

We start by creating a session, which signals to Tensorflow that we want to start doing actual computations. Behind the scenes, Tensorflow does a few things: it chooses a device to perform the computations on (by default your first CPU, or your first GPU if the GPU version is installed and one is available), and it initializes the computational graph. While you can use multiple graphs, it's generally recommended to use just one, because data cannot be sent between two graphs without going through Python (which, as discussed below, is slow). This holds true even if you have multiple disconnected parts.

Next, we initialize the variables. Why you cannot do this while starting a session I don't know, but it fills in the values of our variables in the graph, so we can use them in our computation. This is one of those small annoyances which you have to remember every time you want to compute something.

It might help to remember that Tensorflow is really lazy, and wants to do as little as possible. As an implication of this, you will have to explicitly tell Tensorflow to initialize the variables.

### Tensorflow is lazy

It might be useful to explore this in a bit more detail as it's really important to understand how and why this was chosen in order to use Tensorflow effectively.

Tensorflow likes to defer computation for as long as possible. It does so because Python is slow, so it wants to run the computation outside Python. Normally, we use libraries such as numpy to accomplish this, but transferring data between Python and optimized libraries such as numpy is expensive.

Tensorflow gets around this by first defining a graph using Python without doing any computation, and then it sends all the data to the graph outside Python where it can be run using efficient GPU libraries (CUDA). This way, the time spent on transferring data is kept at a minimum.

As a result of this, Tensorflow only has to compute the part of the graph you actually need. It does this by propagating back through the network when you run an operation to discover all the dependencies the computation relies on, and only computes those. It ignores the rest of the network.

Consider the code below for example:

```
a = tf.Variable(3)
b = tf.Variable(4)
c = tf.multiply(a,b)
d = tf.add(a,c)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    c_value = sess.run(c)
    d_value = sess.run(d)
    print (c_value, d_value)
```

```
12 15
```

Here, we have two primitive values, a, and b, and two composite values, c, and d.

- c relies on a, and b.
- d relies on a, and c.

So what happens when we compute the values of the composites? If we start with the simplest, c, we see that it relies on the primitive values a and b, so when computing c, Tensorflow discovers this by walking backwards through the graph (which is not the same as backpropagation through a neural network), gets the value of these primitives, and multiplies them together.

The value of d is computed in a similar fashion. Tensorflow finds that d is an addition operation that relies on the value of a and c, so Tensorflow gets the value of each of them. For the value a, all is great, and Tensorflow is able to use the primitive value as is, but with the value c, Tensorflow discovers that it itself is a composite value, here a multiply operation that relies on a and b. Tensorflow now gets the value of a and b, which it uses to compute the value of c, so it can compute the value of d.

Tensorflow recursively computes the dependencies of an operation to find its computed value.

However, this also means that values are discarded once a `sess.run` call has finished, and can therefore not be used to speed up later calls. Using the example above, this means that the value of c is recalculated when computing the value of d, even though we just computed c and it hasn't changed since then. If you need both values, you can fetch them in a single call with `sess.run([c, d])`, in which case c only has to be computed once.
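The recursive lookup can be sketched in plain Python with a tiny hand-rolled graph. This is only a toy model of the idea, not Tensorflow's execution engine: the `Node` class is hypothetical, and unlike a real `sess.run` call it does not deduplicate work within one call. It does show the two properties discussed here: only the dependencies of the requested node are evaluated, and nothing is remembered between calls.

```python
# Plain Python sketch (not Tensorflow API) of lazy, dependency-driven evaluation.
class Node:
    def __init__(self, name, func=None, inputs=(), value=None):
        self.name = name
        self.func = func        # how to combine the inputs (None for primitives)
        self.inputs = inputs    # the nodes this node depends on
        self.value = value      # only set for primitive nodes
        self.evaluations = 0    # how many times this node has been evaluated

    def run(self):
        """Recursively evaluate this node; nothing is cached between calls."""
        self.evaluations += 1
        if self.func is None:
            return self.value
        return self.func(*(node.run() for node in self.inputs))

a = Node("a", value=3)
b = Node("b", value=4)
c = Node("c", func=lambda x, y: x * y, inputs=(a, b))
d = Node("d", func=lambda x, y: x + y, inputs=(a, c))
unused = Node("unused", func=lambda x: x * 100, inputs=(b,))

c_value = c.run()   # evaluates a, b and c
d_value = d.run()   # evaluates a and c again, and c in turn evaluates a and b

print(c_value, d_value)      # 12 15
print(c.evaluations)         # 2: once on its own, once as a dependency of d
print(unused.evaluations)    # 0: nothing requested it, so it never ran
```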

Below, this concept is explored further. We see that while the result of `c` is immediately discarded after being computed, you can save the result into a variable (here `res`), and when you do that, you can even access the result after the session is closed.

```
a = tf.Variable(3)
b = tf.Variable(4)
c = tf.multiply(a,b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(c)
# the session is closed here, but res still holds the computed value
print (res,c)
```

```
12 Tensor("Mul:0", shape=(), dtype=int32)
```

```
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    res = sess.run(c)
print (res,c)
```

```
12 Tensor("Mul:0", shape=(), dtype=int32)
```

## Choosing devices

You can choose to compute some operations on a specific device using the template below:

```
with tf.device("/gpu:0"):
    # do stuff with GPU
    pass
with tf.device("/cpu:0"):
    # do some other stuff with CPU
    pass
```

Here, the strings `"/gpu:0"` and `"/cpu:0"` can be replaced with any of the available device name strings you found when verifying that Tensorflow was correctly installed.

If you installed the GPU version, Tensorflow will automatically try to run the graph on the GPU without you having to explicitly define it.

If a GPU is available it will be prioritized over the CPU.

When using multiple devices, it's worth considering that switching between devices is rather slow because all the data has to be copied over to the memory of the new device.

## Distributed computing

For when one computer simply isn't enough.

Tensorflow allows for distributed computing. I imagine that this will not be relevant for most of us, so feel free to skip this section. However, if you believe you might use multiple computers to work on a problem, this section might have some value to you.

Tensorflow's distributed model can be broken down into two parts:

- Server
- Cluster

These are analogous to a server/client model. While the server contains the master copy, the cluster contains a set of jobs that each have a set of tasks which are the actual computations.

A server that manages a cluster with one job and two workers sharing the load between two tasks can be created like so:

```
cluster = tf.train.ClusterSpec({"my_job": ["worker1.ip:2222", "worker2.ip:2222"]})
server = tf.train.Server(cluster, job_name="my_job", task_index=1)
a = tf.Variable(5)
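with tf.device("/job:my_job/task:0"):
    b = tf.multiply(a, 10)
with tf.device("/job:my_job/task:1"):
    c = tf.add(b, a)
with tf.Session("grpc://localhost:2222") as sess:
    # variables must be initialized before they can be used
    sess.run(tf.global_variables_initializer())
    res = sess.run(c)
    print(res)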
```

A corresponding worker-client can be created like so:

```
# Get task number from command line
import sys
task_number = int(sys.argv[1])
import tensorflow as tf
cluster = tf.train.ClusterSpec({"my_job": ["worker1.ip:2222", "worker2.ip:2222"]})
server = tf.train.Server(cluster, job_name="my_job", task_index=task_number)
print("Worker #{}".format(task_number))
server.start()
server.join()
```

If the client code is saved to a file, you can start the workers by typing into a terminal:

```
python filename.py 0
```

```
python filename.py 1
```

This will start two workers that listen for task 0 and task 1 of the `my_job` job. Once the server is started, it will send the tasks to the workers, which will return the answers to the server.

For a more in-depth look at distributed computing with Tensorflow, please refer to the documentation.

## Saving variables (model)

Having to throw out the hard learned parameters after they have been computed isn't much fun.

Luckily, saving a model in Tensorflow is quite simple using the saver object, as illustrated in the example below:

```
a = tf.Variable(5)
b = tf.Variable(4, name="my_variable")
# set the value of a to 3
op = tf.assign(a, 3)
# create saver object
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(op)
    print ("a:", sess.run(a))
    print ("my_variable:", sess.run(b))
    # use saver object to save variables
    # within the context of the current session
    saver.save(sess, "/tmp/my_model.ckpt")
```

```
a: 3
my_variable: 4
```

## Loading variables (model)

As with saving the model, loading a model from a file is also simple.

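Note: Variables are matched by their Tensorflow name, not their Python name. If you have specified a Tensorflow name, you must use that same name in your loader. If you haven't specified one, the variable is saved under an automatically generated name (such as `Variable`), so the loader has to create its unnamed variables in the same order as the code that saved them.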

```
# Only necessary if you use IDLE or a jupyter notebook
tf.reset_default_graph()
# make a dummy variable
# the value is arbitrary, here just zero
# but the shape must be the same as in the saved model
a = tf.Variable(0)
c = tf.Variable(0, name="my_variable")
saver = tf.train.Saver()
with tf.Session() as sess:
    # use saver object to load variables from the saved model
    saver.restore(sess, "/tmp/my_model.ckpt")
    print ("a:", sess.run(a))
    print ("my_variable:", sess.run(c))
```

```
INFO:tensorflow:Restoring parameters from /tmp/my_model.ckpt
a: 3
my_variable: 4
```

## Visualizing the graph

It's easy to lose the big picture when looking at the model as code, and it can be difficult to see the evolution of a model's performance over time from `print` statements alone. This is where visualization comes in.

Tensorflow offers some tools that can take a lot of the work out of creating graphs.

The visualization kit consists of two parts: tensorboard and a summary writer. Tensorboard is where you will see the visualizations, and the summary writer is what will convert the model and variables into something tensorboard can render.

Without any work, the summary writer can give you a graphical representation of a model, and with very little work you can get more detailed summaries such as the evolution of loss, and accuracy as the model learns.

Let's start by considering the simplest form for visualization that Tensorflow supports: visualizing the graph.

To achieve this, we simply create a summary writer, give it a path to save the summary, and point it to the graph we want saved. This can be done in one line of code:

```
fw = tf.summary.FileWriter("/tmp/summary", sess.graph)
```

Integrated in an example, this becomes:

```
a = tf.Variable(5, name="a")
b = tf.Variable(10, name="b")
c = tf.multiply(a,b, name="result")
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print (sess.run(c))
    fw = tf.summary.FileWriter("/tmp/summary", sess.graph)
```

Running tensorboard using the command below, and opening the URL, we get a simple overview of the graph.

`tensorboard --logdir=/tmp/summary`

### Naming and scopes

Sometimes when working with large models, the graph visualization can become complex. To help with this, we can define scopes using `tf.name_scope` to add another level of abstraction. In fact, we can define scopes within scopes, as illustrated in the example below:

```
with tf.name_scope('primitives') as scope:
    a = tf.Variable(5, name='a')
    b = tf.Variable(10, name='b')
with tf.name_scope('fancy_pants_procedure') as scope:
    # this procedure has no significant interpretation
    # and was purely made to illustrate why you might want
    # to work at a higher level of abstraction
    c = tf.multiply(a,b)
    with tf.name_scope('very_mean_reduction') as scope:
        d = tf.reduce_mean([a,b,c])
    e = tf.add(c,d)
with tf.name_scope('not_so_fancy_procedure') as scope:
    # this procedure suffers from imposter syndrome
    d = tf.add(a,b)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print (sess.run(c))
    print (sess.run(e))
    fw = tf.summary.FileWriter("/tmp/summary", sess.graph)
```

Note that the scope names must be one word.

Opening this summary in tensorboard we get:

We can expand the scopes to see the individual operations that make up the scope.

If we expand `very_mean_reduction` even further, we can see `Rank` and `Mean`, which are a part of the `reduce_mean` function. We can even expand those to see how they are implemented.

### Visualizing changing data

While just visualizing the graph is pretty cool, when learning parameters, it'd be useful to be able to visualize how certain variables change over time.

The simplest way of visualizing changing data is by adding a scalar summary. Below is an example that implements this and logs the change of c.

```
import random
a = tf.Variable(5, name="a")
b = tf.Variable(10, name="b")
# set the initial value of c to be the product of a and b
# in order to write a summary of c, c must be a variable
init_value = tf.multiply(a,b, name="result")
c = tf.Variable(init_value, name="ChangingNumber")
# update the value of c by incrementing it by a placeholder number
number = tf.placeholder(tf.int32, shape=[], name="number")
c_update = tf.assign(c, tf.add(c,number))
# create a summary to track the progress of c
tf.summary.scalar("ChangingNumber", c)
# in case we want to track multiple summaries
# merge all summaries into a single operation
summary_op = tf.summary.merge_all()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # initialize our summary file writer
    fw = tf.summary.FileWriter("/tmp/summary", sess.graph)
    # do 'training' operation
    for step in range(1000):
        # set placeholder number somewhere between 0 and 100
        num = int(random.random()*100)
        sess.run(c_update, feed_dict={number:num})
        # compute summary
        summary = sess.run(summary_op)
        # add merged summaries to filewriter,
        # so they are saved to disk
        fw.add_summary(summary, step)
```

So what happens here?

If we start by looking at the actual logic, we see that the value of c, the changing variable, starts by being the product of a, and b (50).

We then run an update operation 1000 times which increments the value of c by a randomly selected amount between 0 and 100.

This way, if we were to plot the value of c over time, we'd expect to see it increase roughly linearly, by about 50 per step on average.
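If you want to convince yourself of the expected trend without starting Tensorflow, the update rule can be simulated in plain Python. This is only a sketch of the arithmetic, using the same increment rule as the example; the fixed seed is an assumption added here so that repeated runs give the same numbers.

```python
import random

random.seed(0)   # fixed seed (an assumption for repeatability)

c = 5 * 10       # initial value: the product of a and b
history = []
for step in range(1000):
    # same increment rule as in the example: a whole number from 0 to 99
    c += int(random.random() * 100)
    history.append(c)

average_increment = (history[-1] - 50) / 1000
print("final value:", history[-1])
print("average increment per step:", average_increment)
```

Since each increment is a whole number between 0 and 99, the average increment should land close to 49.5, which is the slope of the line we expect to see in the plot.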

With that out of the way, let's see how we create a summary of c.

Before the session, we start by telling Tensorflow that we do in fact want a summary of c.

```
tf.summary.scalar("ChangingNumber", c)
```

In this case, we use a scalar summary because, well, c is a scalar. However, Tensorflow supports an array of different summarizers including:

- histogram (which accepts a tensor array)
- text
- audio
- images

The last three are useful if you need to summarize rich data you may be using to feed a network.

Next, we add all the summaries to a summary op to simplify the computation.

```
summary_op = tf.summary.merge_all()
```

Strictly speaking, this is not necessary here as we only record the summary of one value, but in a more realistic example, you'd typically have multiple summaries, which makes this very useful. You can also use `tf.summary.merge` to merge specific summaries like so:

```
summary = tf.summary.merge([summ1, summ2, summ3])
```

This can be powerful if coupled with scopes.

Next, we start the session where we do the actual summary writing. We have to tell Tensorflow what and when to write; it won't automatically write a summary entry every time a variable changes even though it'd be useful.

Therefore, every time we want a new entry in the summary, we have to run the summary operation. This allows for flexibility in how often, or with what precision, you want to log your progress. For example, you could choose to log progress only every thousand iterations to speed up computation and reduce the number of IO calls.

Here we just log the progress at every iteration with the following line of code:

```
summary = sess.run(summary_op)
```

We now have the summary tensorboard uses, but we haven't written it to disk yet. For this, we need to add the summary to the filewriter:

```
fw.add_summary(summary, step)
```

Here, the second argument `step` indicates the location index for the summary, or the x-value in a plot of it. This can be any number you want, and when training networks, you can often just use the iteration number. By manually specifying the index number, the summary writer allows for a lot of flexibility when creating the graphs as you can walk backwards, skip values, and even compute two, or more, values for the same index.

This is all we need. If we now open tensorboard, we see the resulting graph, and the plot that has been made from the summary.

And as predicted, the trend of the summary plot is indeed linear with a positive slope.

## An almost practical example

While the small examples up until now are great at demonstrating individual ideas, they do a poor job of showing how it all comes together.

To illustrate this, we will now use everything (well, almost everything) we have learned about Tensorflow to make something that we can at least pretend is somewhat practical: we will build a very simple neural network to classify digits from the classic MNIST dataset. If you're not fully up to speed with neural networks, you can read this introduction (coming soon) before coming back to this.

The construction and training of the neural network can be broken down into a couple of phases:

- Importing the data.
- Constructing the model architecture.
- Defining a loss function to optimize, and a way to optimize it.
- Actually training the model.
- Evaluating the model.

However, before we can start creating the model, we must first prepare Tensorflow:

```
import tensorflow as tf
tf.reset_default_graph() # again, this is not needed if run as a script
```

Next, we import the data.

```
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
```

```
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
```

Since MNIST is such a well-known dataset, we can use the built-in data extractor to get a nice wrapper around the data.

Now, it's time to define the actual model that's going to be used. For this task, we will use a feed forward network with two hidden layers that have 500 and 100 neurons respectively.
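To get a feel for the size of this network, the short plain Python sketch below counts its learnable parameters. It assumes, as in the model code that follows, a 784-value input (28 by 28 pixels), 10 output classes, and one bias per neuron.

```python
# Layer sizes: input, first hidden layer, second hidden layer, output
layer_sizes = [28 * 28, 500, 100, 10]

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    weights = n_in * n_out   # one weight per connection between the layers
    biases = n_out           # one bias per neuron in the receiving layer
    total += weights + biases
    print(n_in, "->", n_out, ":", weights + biases, "parameters")

print("total:", total)   # 443610
```

Almost all of the parameters sit in the first layer, simply because the input is so much wider than the hidden layers.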

Using the idea about scopes to separate the graph into chunks, we can implement the model like so:

```
# input
with tf.name_scope('input') as scope:
    x = tf.placeholder(tf.float32, [None, 28*28], name="input")
    # a placeholder to hold the correct answer during training
    labels = tf.placeholder(tf.float32, [None, 10], name="label")
    # the probability of a neuron being kept during dropout
    keep_prob = tf.placeholder(tf.float32, name="keep_prob")
with tf.name_scope('model') as scope:
    with tf.name_scope('fc1') as scope: # fc1 stands for 1st fully connected layer
        # 1st layer goes from 784 neurons (input) to 500 in the first hidden layer
        w1 = tf.Variable(tf.truncated_normal([28*28, 500], stddev=0.1), name="weights")
        b1 = tf.Variable(tf.constant(0.1, shape=[500]), name="biases")
        with tf.name_scope('softmax_activation') as scope:
            # softmax activation
            a1 = tf.nn.softmax(tf.matmul(x, w1) + b1)
        with tf.name_scope('dropout') as scope:
            # dropout
            drop1 = tf.nn.dropout(a1, keep_prob)
    with tf.name_scope('fc2') as scope:
        # takes the first hidden layer of 500 neurons to 100 (second hidden layer)
        w2 = tf.Variable(tf.truncated_normal([500, 100], stddev=0.1), name="weights")
        b2 = tf.Variable(tf.constant(0.1, shape=[100]), name="biases")
        with tf.name_scope('relu_activation') as scope:
            # relu activation, and dropout for second hidden layer
            a2 = tf.nn.relu(tf.matmul(drop1, w2) + b2)
        with tf.name_scope('dropout') as scope:
            drop2 = tf.nn.dropout(a2, keep_prob)
    with tf.name_scope('fc3') as scope:
        # takes the second hidden layer of 100 neurons to 10 (which is the output)
        w3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1), name="weights")
        b3 = tf.Variable(tf.constant(0.1, shape=[10]), name="biases")
        with tf.name_scope('logits') as scope:
            # final layer doesn't have dropout
            logits = tf.matmul(drop2, w3) + b3
```
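The model relies on dropout, which is controlled by the `keep_prob` placeholder. As a rough plain Python sketch of the idea (not the Tensorflow implementation), each value is kept with probability `keep_prob` and scaled up by `1 / keep_prob`, so that the expected sum stays the same, and is otherwise set to zero. The `dropout` function and the seeded random generator below are made up for this illustration.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)         # seeded generator (an assumption for repeatability)
activations = [1.0] * 10000    # pretend layer output where every neuron fires at 1.0

dropped = dropout(activations, keep_prob=0.5, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print("fraction kept:", kept_fraction)
print("mean after dropout:", mean_after)
```

Roughly half of the values survive, and the surviving ones are doubled, so the mean stays close to 1. Setting `keep_prob` to 1 keeps everything unchanged, which is what we want when evaluating the network.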

For training, we are going to use the cross entropy loss function together with the ADAM optimizer with a learning rate of 0.001. Following the example above, we continue the use of scopes to organize the graph.
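If the cross entropy loss is new to you, the plain Python sketch below computes it by hand for a single example with three classes. It is only meant to show the arithmetic (a softmax over the logits followed by the negative log of the probability given to the correct class); the helper names and the example numbers are made up for this illustration.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot_label, logits):
    """Negative log probability assigned to the correct class."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(one_hot_label, probs))

label = [0, 0, 1]   # the correct class is the third one

confident_right = cross_entropy(label, [0.0, 0.0, 5.0])
unsure = cross_entropy(label, [0.0, 0.0, 0.0])
confident_wrong = cross_entropy(label, [5.0, 0.0, 0.0])

print(confident_right, unsure, confident_wrong)
```

The loss is small when the network is confidently right, equal to the log of the number of classes when it is completely unsure, and large when it is confidently wrong, which is exactly the behaviour we want the optimizer to push against.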

We also add two summarizers for accuracy and the average loss, and create a merged summary operation to simplify later steps.

Finally, once we add the saver object, so we don't lose the model after training (which would be a shame), we have this:

```
with tf.name_scope('train') as scope:
    with tf.name_scope('loss') as scope:
        # loss function
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    # use adam optimizer for training with a learning rate of 0.001
    train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)
    with tf.name_scope('evaluation') as scope:
        # evaluation
        correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
        accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# create a summarizer that summarizes loss and accuracy
tf.summary.scalar("Accuracy", accuracy)
# add average loss summary over entire batch
tf.summary.scalar("Loss", tf.reduce_mean(cross_entropy))
# merge summaries
summary_op = tf.summary.merge_all()
# create saver object
saver = tf.train.Saver()
```
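The evaluation step boils down to comparing the position of the largest logit with the position of the 1 in the one-hot label, and averaging the matches. A minimal plain Python sketch of that computation, with made-up numbers for a batch of four examples and three classes, could look like this:

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(logits_batch, labels_batch):
    """Fraction of examples where the predicted class matches the label."""
    correct = [argmax(logits) == argmax(label)
               for logits, label in zip(logits_batch, labels_batch)]
    return sum(correct) / len(correct)

# hypothetical outputs for four examples and three classes
logits_batch = [[2.0, 0.1, 0.3], [0.2, 0.1, 3.0], [1.5, 2.5, 0.0], [0.9, 0.2, 0.1]]
labels_batch = [[1, 0, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0]]

print(accuracy(logits_batch, labels_batch))   # 0.75: the third example is wrong
```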

It's now time to begin training the network. Using the techniques discussed previously, we write a summary every 100 steps for a total of 20000 steps.

At each step we train the network with a batch of 100 examples by running the `train_step` operation, which will update the weights of the network in accordance with the learning rate.

Finally, once the learning is done, we print out the test accuracy, and save the model.

```
with tf.Session() as sess:
    # initialize variables
    tf.global_variables_initializer().run()
    # initialize summarizer filewriter
    fw = tf.summary.FileWriter("/tmp/nn/summary", sess.graph)
    # train the network
    for step in range(20000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, labels: batch_ys, keep_prob:0.2})
        if step%1000 == 0:
            acc = sess.run(accuracy, feed_dict={
                x: batch_xs, labels: batch_ys, keep_prob:1})
            print("mid train accuracy:", acc, "at step:", step)
        if step%100 == 0:
            # compute summary using test data every 100 steps
            summary = sess.run(summary_op, feed_dict={
                x: mnist.test.images, labels: mnist.test.labels, keep_prob:1})
            # add merged summaries to filewriter,
            # so they are saved to disk
            fw.add_summary(summary, step)
    print ("Final Test Accuracy:", sess.run(accuracy, feed_dict={
        x: mnist.test.images, labels: mnist.test.labels, keep_prob:1}))
    # save trained model
    saver.save(sess, "/tmp/nn/my_nn.ckpt")
```

```
mid train accuracy: 0.1 at step: 0
mid train accuracy: 0.91 at step: 1000
mid train accuracy: 0.89 at step: 2000
mid train accuracy: 0.91 at step: 3000
[...]
mid train accuracy: 0.97 at step: 17000
mid train accuracy: 0.98 at step: 18000
mid train accuracy: 0.97 at step: 19000
Final Test Accuracy: 0.9613
```

96% accuracy, is that any good?

No, that actually kind of sucks, but the point of this network is not to be the best network. Instead, the point of it is to demonstrate how you can use Tensorflow to construct a network, and get a lot of visualization pizzazz for very little work.

If we run the model, and open it in tensorboard, we get:

Furthermore, we can see the summaries Tensorflow made for the accuracy and loss, and that they do, as expected, behave approximately like inverses of each other. We also see that the accuracy increases a lot in the beginning, but flattens out over time, which is expected partly because we use the ADAM optimizer, and partly because of the nature of gradients.

The use of nested scopes lets us progressively change the abstraction level. Notice how, if we expand the model, we can see the individual layers before the individual layer components.

If you want to run this network yourself, you can access the code on GitHub.

## Conclusion

Wow, you're still here. You deserve a cute picture of a cat.

If you have followed this far, you should now be comfortable with the basics of Tensorflow: How it functions, how to do basic computations, how to visualize the graph, and finally you have seen a real example of how it can be used to create a basic neural network.

Also, send me a tweet @kasperfredn if you made it all the way through: You're awesome.

As this was just an introduction to Tensorflow, there's a lot we didn't cover, but you should know enough now to be able to understand the API documentation where you can find modules you can incorporate into your code.

If you want a challenge to test your comprehension, try to use Tensorflow to implement another machine learning model by either working from the model we created here, or starting from scratch.

For feedback, send your results to "homework [at] kasperfred.com". Remember to include the title of the essay in the subject line.