Enhancing Graph Readability with Labels and Implementing Backpropagation
This post explores how to improve the readability of computational graphs by adding labels to nodes and demonstrates the process of backpropagation for calculating gradients.
Adding Labels for Enhanced Clarity
A drawn computational graph that shows only numeric values is hard to follow, because nothing tells you which node corresponds to which variable. Labeling each node makes the flow of operations easy to trace.
Here’s how to add labels within a Python Value class:
class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    self.data = data
    self._prev = set(_children)
    self._op = _op
    self.label = label
  # ... other methods ...
Each Value object now stores its label as an attribute. The label defaults to an empty string, so existing code that omits it keeps working.
To display these labels in the graph visualization, modify the draw_dot function. The original node representation likely looked like this:
dot.node(name=uid, label="{ data %.4f }" % (n.data,), shape='record')
Update it to include the label:
dot.node(name=uid, label="{ %s | data %.4f }" % (n.label, n.data), shape='record')
Now, visualizing the graph will display both the data and the assigned label for each node, greatly enhancing interpretability.
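To preview what the updated record label looks like without running Graphviz, the format string can be evaluated on its own. This is only a small sketch: the helper name node_label is hypothetical and is not part of draw_dot.

```python
def node_label(label, data):
    # Same format string as the updated dot.node call. In a Graphviz
    # 'record' shaped node, '|' separates the fields of the box.
    return "{ %s | data %.4f }" % (label, data)

print(node_label('a', 2.0))    # { a | data 2.0000 }
print(node_label('b', -3.0))   # { b | data -3.0000 }
```

A node created without a label gets an empty first field, since label defaults to ''.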
Building a More Complex Expression
Let’s expand our example with additional nodes:
a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'
Visualizing this expression with draw_dot(L) will produce a more intricate graph, clearly showing the relationships between variables and operations due to the added labels.
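The forward values of this expression can also be checked without drawing anything. The sketch below assumes a minimal Value class whose __add__ and __mul__ record their operands and operator; the post's "other methods" are not shown, so these two are a guess at a plausible implementation rather than the original code.

```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label

    # Assumed operator implementations: each result remembers the
    # operands that produced it and the operation used.
    def __add__(self, other):
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), '*')

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

# Print each intermediate node with its value and producing operation.
for node in (e, d, L):
    print(node.label, node.data, node._op)
```

The intermediate results are e = -6.0, d = 4.0, and L = -8.0. The manual gradient calculation later relies on these values.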
Understanding and Calculating Gradients with Backpropagation
The goal of backpropagation is to understand how changes in input values affect the output of a function, specifically the loss function (L in our example). This involves calculating gradients such as dL/da and dL/db, each of which tells us how much L changes for a small change in that input.
Adding the grad Attribute
To facilitate backpropagation, add a grad attribute to the Value class:
class Value:
  def __init__(self, data, _children=(), _op='', label=''):
    # ... (existing code)
    self.grad = 0.0 # Initialize gradient to zero
Update the graph visualization to display the gradient:
dot.node(name=uid, label="{ %s | data %.4f | grad %.4f }" % (n.label, n.data, n.grad), shape='record')
Manual Backpropagation
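We’ll manually calculate gradients for our example, working backward from L. The forward pass gives e = a * b = -6.0, d = e + c = 4.0, and L = d * f = -8.0.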
- Node L: dL/dL is trivially 1, so L.grad = 1
- Node d: dL/dd = f = -2.0, so d.grad = -2.0
- Node f: dL/df = d = 4.0, so f.grad = 4.0
- Node c: Using the chain rule, dL/dc = dL/dd * dd/dc = -2.0 * 1 = -2.0, so c.grad = -2.0
- Node e: Similarly, dL/de = dL/dd * dd/de = -2.0 * 1 = -2.0, so e.grad = -2.0
- Node a: dL/da = dL/de * de/da = -2.0 * -3.0 = 6.0, so a.grad = 6.0
- Node b: dL/db = dL/de * de/db = -2.0 * 2.0 = -4.0, so b.grad = -4.0
Redrawing the graph at each step visually demonstrates how the gradients propagate backward through the network.
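The manual steps above can be written out as plain assignments, one per node, each multiplying a local derivative by the gradient already stored downstream. This is a sketch under the same assumption as before: the __add__ and __mul__ methods here are stand-ins for the class's unshown methods.

```python
class Value:
    def __init__(self, data, _children=(), _op='', label=''):
        self.data = data
        self._prev = set(_children)
        self._op = _op
        self.label = label
        self.grad = 0.0  # no gradient has been propagated yet

    def __add__(self, other):  # assumed implementation
        return Value(self.data + other.data, (self, other), '+')

    def __mul__(self, other):  # assumed implementation
        return Value(self.data * other.data, (self, other), '*')

a = Value(2.0, label='a')
b = Value(-3.0, label='b')
c = Value(10.0, label='c')
e = a * b; e.label = 'e'
d = e + c; d.label = 'd'
f = Value(-2.0, label='f')
L = d * f; L.label = 'L'

# Walk backward from the output, applying the chain rule at each node.
L.grad = 1.0               # dL/dL
d.grad = f.data * L.grad   # L = d * f, so dL/dd = f
f.grad = d.data * L.grad   # L = d * f, so dL/df = d
c.grad = 1.0 * d.grad      # d = e + c, so dd/dc = 1
e.grad = 1.0 * d.grad      # d = e + c, so dd/de = 1
a.grad = b.data * e.grad   # e = a * b, so de/da = b
b.grad = a.data * e.grad   # e = a * b, so de/db = a

for node in (L, d, f, c, e, a, b):
    print(node.label, node.grad)
```

Addition passes the incoming gradient through unchanged, while multiplication scales it by the value of the other operand.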
Numerical Verification of Gradients
Numerical verification checks the hand-derived gradients: nudge one input by a small amount h, recompute the output, and divide the change in the output by h. The result approximates the derivative with respect to that input. For example, to verify dL/df:
def verify_dL_by_df():
  h = 0.001
  # Calculate L1 with original f value
  # ...
  # Calculate L2 with f + h
  # ...
  print((L2 - L1)/h) # Approximation of dL/df
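The same check can be applied to every input at once. The sketch below uses plain floats rather than Value objects to keep it self-contained; the function names forward and numerical_grad are hypothetical, chosen only for this illustration.

```python
def forward(a, b, c, f):
    # Same expression as the example graph: L = (a * b + c) * f
    return (a * b + c) * f

def numerical_grad(inputs, name, h=0.001):
    # Forward finite difference: bump one input by h and measure
    # how much the output changes per unit of that input.
    bumped = dict(inputs)
    bumped[name] += h
    return (forward(**bumped) - forward(**inputs)) / h

inputs = dict(a=2.0, b=-3.0, c=10.0, f=-2.0)
for name in inputs:
    print(name, round(numerical_grad(inputs, name), 4))
```

The printed estimates should come out close to the manual results: 6.0 for a, -4.0 for b, -2.0 for c, and 4.0 for f. Because L is linear in each individual input, the finite difference matches the true derivative here up to floating-point rounding. For nonlinear functions the estimate would only be approximate.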
Conclusion
This post demonstrated how to add labels to computational graph nodes for improved understanding and walked through a manual backpropagation example. We also touched upon numerical gradient verification. These concepts are fundamental to understanding and implementing neural networks.