TensorFlow Profiling
This article introduces the TensorFlow profiler for Python user in practice. First, I will present the basic configuration for the profiler, then I use an example to show how to integrate it with existing code. In the end, I conclude with how to access the results from the profiler's output.
TensorFlow profiler (profiler hereafter) is a tool for analyzing deep neural network and performance tuning.
import tensorflow as tf
import tabulate
from tensorflow.profiler import ProfileOptionBuilder as POB
Profile Option Builder¶
The profiler relies on a dictionary of parameters to control the trackings. The documentation can be found at the following places.
- TensorFlow documentaion: https://www.tensorflow.org/api_docs/python/tf/profiler/ProfileOptionBuilder
- Souce code Definition: https://github.com/tensorflow/tensorflow/blob/r1.11/tensorflow/python/profiler/option_builder.py
- Detailed documentation: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/profiler/g3doc/options.md
Functions to create default options¶
TensorFlow provides three methods to create a configuration for basic statistics:
tf.profiler.ProfileOptionBuilder.Protrainable_variables_parameter()
tf.profiler.ProfileOptionBuilder.float_operation()
tf.profiler.ProfileOptionBuilder.time_and_memory()
The above functions returns a dict
object contains the default values as below:
opts = [POB.trainable_variables_parameter(), POB.float_operation(), POB.time_and_memory()]
compare_table = [(i, opts[0][i], opts[1][i], opts[2][i]) for i in opts[0].keys()]
print(tabulate.tabulate(compare_table, headers=["Name","trainable_variables_parameter", "float_operation", "time_and_memory"]))
Customize options¶
The above detailed configurations can be further modified. There are two approaches:
-
Not preferred, directly change the value in the
dict
; - Create a
ProfileOptionBuilder
object without default, or with defaut in a dictionary. Then update values (see next section), and usebuild()
to create thedict
.
config_dict = POB.time_and_memory()
new_config = POB(config_dict)
new_config.with_max_depth(100)
new_config.build()
Profiler update function and value¶
Comments are from the detailed documentation in g3doc.
function | Key | Comment |
---|---|---|
with_max_depth | max_depth | Show nodes that are at most this number of hops from starting node in the data structure. |
with_min_memory | min_bytes min_peak_bytes min_residual_bytes min_output_bytes |
Show nodes that request at least this number of bytes. Show nodes that using at least this number of bytes during peak memory usage. Show nodes that have at least this number of bytes not being de-allocated after Compute. Show nodes that have at least this number of bytes output (no necessarily allocated by the nodes). |
with_min_execution_time | min_micros min_accelerator_micros min_cpu_micros |
Show nodes that spend at least this number of microseconds to run. It sums accelerator_micros and cpu_micros. Note: cpu and accelerator can run in parallel. Show nodes that spend at least this number of microseconds to run on accelerator (e.g. GPU). Show nodes that spend at least this number of microseconds to run on CPU. |
with_min_parameters | min_parameters | Show nodes that contains at least this number of parameters. |
with_min_occurrence | min_occurrence | Show nodes that appear at least this number of times. |
with_min_float_operations | min_float_ops | Show nodes that contain at least this number of float operations. Only available if an node has op.RegisterStatistics() defined and OpLogProto is provided |
with_accounted_types | account_type_regexes | Account and display the nodes whose types match one of the type regexes specified. tfprof allow user to define extra operation types for graph nodes through tensorflow.tfprof.OpLogProto proto. regexes are comma-sperated. |
with_node_names | start_name_regexes trim_name_regexes show_name_regexes hide_name_regexes |
Show node starting from the node that matches the regexes, recursively. regexes are comma-separated. Hide node starting from the node that matches the regexes, recursively, regexes are comma-seprated. Show node that match the regexes. regexes are comma-seprated. Hide node that match the regexes. regexes are comma-seprated. |
account_displayed_op_only | account_displayed_op_only | If True, only account the statistics of ops eventually displayed. If False, account all op statistics matching -account_type_regexes recursively. |
with_empty_output | output | Do not generate side-effect outputs. |
with_stdout_output with_file_output with_timeline_output with_pprof_output |
output | Print the result to stdout. Print the result to a file. Generate a timeline json file. Generate a pprof profile gzip file. |
order_by | order_by | Order the results by [name| depth| bytes| peak_bytes| residual_bytes| output_bytes| micros| accelerator_micros| cpu_micros| params| float_ops| occurrence]. |
select | select | Comma-separated list of attributes to show. Supported attributes: [bytes| peak_bytes| residual_bytes| output_bytes| micros| accelerator_micros| cpu_micros| params| float_ops| occurrence| tensor_value| device| op_types| input_shapes]. |
with_step | step | Show the stats of the this step when multiple steps of RunMetadata were added. By default, show the average of all steps. |
A simple demo¶
# Build a graph
tf.reset_default_graph()
x = [[1]*5 for i in range(2)] # 2x5 matrix
A = [[1]*2 for i in range(2)] # 2x2 matrix
tf_x = tf.constant(x)
tf_A = tf.constant(A)
tf_y = tf.matmul(tf_A, tf_x)
opts = POB.float_operation()
output = tf.profiler.profile(tf.get_default_graph(),
cmd='scope',
options=opts)
print("The total number of FLOPs is %d"%output.total_float_ops)
type(output)
How to use the result¶
The output of profiler is a Protocol Buffer, whose content is defined here
For people not familiar with it (such as me), its values are access by name directly.
Or you can convert it to a dict
by using this.
The basic keys are:
print("\n".join(output.DESCRIPTOR.fields_by_name.keys()))
And the values associated with those keys are:
compare_table = [(i, getattr(output,i)) for i in output.DESCRIPTOR.fields_by_name.keys()]
print(tabulate.tabulate(compare_table, headers=["Name","Value"]))
Other notes¶
If you are careful enough, you will find that the number of float operations seems a bit large.
For two matrices in $R^{j\times k}, R^{k\times l}$, it should be $(2k-1)jl$, whereas we have $2jkl$.
Because the profiler relies on the RegisterStatistics('flops')
defined with the operation, and the matmul
is defined here as we found above.
So be careful when you are sensitive to the result. Read more here for the disccusion about profiler's caveat in float operations.
Comments