Reflection on Tensorflow Documentation by a short user journey

The TensorFlow community keeps improving the framework to address its problems.

At the time of the TF 2.0 release, I still found it very painful to follow the TF documentation to get things done. Here I write down some random notes from my short journey of using TF Lite for quantization. I also hope this tour can help other people find an easier life when using TF.

In [1]:
import tensorflow as tf
import numpy as np
tf.__version__
Out[1]:
'1.13.1'

Honeymoon

The first glance at the TF Lite documentation delighted me with its well-organized structure and detailed content.

I followed the tutorial on model conversion without worrying about the different ways of saving a TF model. And quickly, I got what I wanted - a saved, quantized model:

with tf.Graph().as_default():
  img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 64, 64, 3))
  const = tf.constant([1., 2., 3.]) + tf.constant([1., 4., 4.])
  val = img + const
  out = tf.fake_quant_with_min_max_args(val, min=0., max=1., name="output")

  with tf.Session() as sess:
    converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
    converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
    input_arrays = converter.get_input_arrays()
    converter.quantized_input_stats = {input_arrays[0] : (0., 1.)}  # mean, std_dev
    tflite_model = converter.convert()
    open("converted_model.tflite", "wb").write(tflite_model)

Now, the pain starts - I could not find a solution to restore the quantized model and produce the right prediction!

And during my exploration of this challenge (yes, it is a challenge, given the many misleading and confusing messages in the official documentation), I even started wondering whether the TF community welcomes users at all, or whether it is just a propaganda tool for Google with no true intention of sharing.

Now let me show you my user journey towards the evil side of TF.

A simplified model, but wait ...

OK, as the first step, let me use a simpler model: a $1\times3$ vector added to a constant vector $[1,2,3]$.

In [2]:
with tf.Graph().as_default():
  img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 3))
  const = tf.constant([1., 2., 3.])
  val = img + const
  out = tf.fake_quant_with_min_max_args(val, min=0., max=1., name="output")
  
  with tf.Session() as sess:
    v = sess.run(out, {img:[[1,2,3]]})
v
Out[2]:
array([[1., 1., 1.]], dtype=float32)

OK, a quick glance hints at the problem: fake_quant_with_min_max_args(). Why was it used in the original code, especially when the constants are all larger than 1? Digging further into the documentation, I found there are 6 different functions that all start with fake_quant_with_min_max. If you are interested in understanding how to use them properly, follow the official documentation or check the Stack Overflow explanation.
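
For instance (a minimal check of my own, not from the tutorial), widening the min/max range so that it actually covers the values removes the clamping:

with tf.Graph().as_default():
  x = tf.constant([[1., 2., 3.]]) + tf.constant([1., 2., 3.])   # values 2, 4, 6
  clamped = tf.fake_quant_with_min_max_args(x, min=0., max=1.)  # range too narrow
  covered = tf.fake_quant_with_min_max_args(x, min=0., max=6.)  # range covers the values
  with tf.Session() as sess:
    print(sess.run(clamped))  # [[1. 1. 1.]] - everything clipped to max
    print(sess.run(covered))  # approximately [[2. 4. 6.]]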

I will skip it for now, as my goal is to recover the quantized model first. And a quantized output of nothing but 0s and 1s clearly makes it harder to debug (or reduces the chance of finding a bug).

For now, I replace it with an identity function and label the output name correspondingly. And because of that, I add default_ranges_stats below to specify the range of the internal values (see reference).

In [0]:
with tf.Graph().as_default():
  img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 3))
  const = tf.constant([1., 2., 3.])
  val = img + const
  out = tf.identity(val, name='output') # tf.fake_quant_with_min_max_args(val, min=0., max=1., name="output")

  with tf.Session() as sess:
    converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
    converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
    input_arrays = converter.get_input_arrays()
    converter.quantized_input_stats = {input_arrays[0] : (0, 1.)}  # mean, std_dev
    converter.default_ranges_stats = (0, 3)
    tflite_model = converter.convert()
    open("example1.tflite", "wb").write(tflite_model)

Originally, I searched for how to restore the model in Python under the Inference section, but in reality it is only documented under the Python API section.

In [4]:
itp_full = tf.lite.Interpreter(model_path="example1.tflite")
itp_full.allocate_tensors()
itp_full.get_input_details(), itp_full.get_output_details()
Out[4]:
([{'dtype': numpy.uint8,
   'index': 1,
   'name': 'img',
   'quantization': (1.0, 0),
   'shape': array([1, 3], dtype=int32)}],
 [{'dtype': numpy.uint8,
   'index': 2,
   'name': 'output',
   'quantization': (0.0117647061124444, 0),
   'shape': array([1, 3], dtype=int32)}])
In [5]:
imgx = [[2,1,0]]
itp_full.set_tensor(1, imgx)
itp_full.invoke()
res=  itp_full.get_tensor(2)
res
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-138418d20125> in <module>()
      1 imgx = [[2,1,0]]
----> 2 itp_full.set_tensor(1, imgx)
      3 itp_full.invoke()
      4 res=  itp_full.get_tensor(2)
      5 res

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter.py in set_tensor(self, tensor_index, value)
    173       ValueError: If the interpreter could not set the tensor.
    174     """
--> 175     self._interpreter.SetTensor(tensor_index, value)
    176 
    177   def resize_tensor_input(self, input_index, tensor_size):

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py in SetTensor(self, i, value)
    134 
    135     def SetTensor(self, i, value):
--> 136         return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_SetTensor(self, i, value)
    137 
    138     def GetTensor(self, i):

ValueError: Cannot set tensor: Got tensor of type 4 but expected type 3 for input 1 

Comparing with the input_details in the previous block, it might already be clear to the reader that the type is wrong. But could TF kindly report the expected type without leaving me to hunt down what the heck types 3 and 4 are???

A simple search guides me to this issue, which states the type is an "internal TFLite type casted from the C++ type". Fine, but still no clue what those types are... And beneath that, Onetaken provided a URL, which leads to a 404.
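
A bit more of my own digging through the TF 1.x C headers (so worth double-checking against the exact version you run) suggests the numbers come from the TfLiteType enum, roughly:

# My reading of the TfLiteType enum in the TF 1.x C headers (assumption, verify per version)
TFLITE_TYPE_NAMES = {
    0: "kTfLiteNoType",
    1: "kTfLiteFloat32",
    2: "kTfLiteInt32",
    3: "kTfLiteUInt8",   # what the converted model expects for this input
    4: "kTfLiteInt64",   # what a plain Python list of ints gets converted to
    5: "kTfLiteString",
}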

Fine, that is probably enough digging into the details; let's get back to the theme of this article -

In [6]:
imgx = np.array([[2,1,0]]).astype(np.uint8) # cast to uint8
itp_full.set_tensor(1, imgx)
itp_full.invoke()
res=  itp_full.get_tensor(2)
res
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-11989c7451e0> in <module>()
      1 imgx = np.array([[2,1,0]]).astype(np.uint8) # cast to uint8
      2 itp_full.set_tensor(1, imgx)
----> 3 itp_full.invoke()
      4 res=  itp_full.get_tensor(2)
      5 res

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter.py in invoke(self)
    274       ValueError: When the underlying interpreter fails raise ValueError.
    275     """
--> 276     self._ensure_safe()
    277     self._interpreter.Invoke()
    278 

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter.py in _ensure_safe(self)
     99       in the interpreter in the form of a numpy array or slice. Be sure to
    100       only hold the function returned from tensor() if you are using raw
--> 101       data access.""")
    102 
    103   def _get_tensor_details(self, tensor_index):

RuntimeError: There is at least 1 reference to internal data
      in the interpreter in the form of a numpy array or slice. Be sure to
      only hold the function returned from tensor() if you are using raw
      data access.

What's the problem?? It follows the official documentation. Anyway, I found a solution on Stack Overflow.

Again, could the TF documentation be more specific about the expected type of set_tensor()?

In [7]:
imgx = tf.convert_to_tensor([[2,1,0]], np.uint8) # cast to uint8
itp_full.set_tensor(1, imgx)
itp_full.invoke()
res=  itp_full.get_tensor(2)
res
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-da5825085942> in <module>()
      1 imgx = tf.convert_to_tensor([[2,1,0]], np.uint8) # cast to uint8
----> 2 itp_full.set_tensor(1, imgx)
      3 itp_full.invoke()
      4 res=  itp_full.get_tensor(2)
      5 res

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter.py in set_tensor(self, tensor_index, value)
    173       ValueError: If the interpreter could not set the tensor.
    174     """
--> 175     self._interpreter.SetTensor(tensor_index, value)
    176 
    177   def resize_tensor_input(self, input_index, tensor_size):

/usr/local/lib/python3.6/dist-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py in SetTensor(self, i, value)
    134 
    135     def SetTensor(self, i, value):
--> 136         return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_SetTensor(self, i, value)
    137 
    138     def GetTensor(self, i):

ValueError: Cannot set tensor: Got tensor of type 5 but expected type 3 for input 1 

Still a weird problem. No luck.

In [8]:
# if this fails, run the cell twice
itp_full = tf.lite.Interpreter(model_path="example1.tflite")
itp_full.allocate_tensors()
imgx = np.array([[2,1,0]]).astype(np.uint8) # cast to uint8
itp_full.set_tensor(1, imgx)
itp_full.invoke()
res=  itp_full.get_tensor(2)
res
Out[8]:
array([[255, 255, 255]], dtype=uint8)

What just happened? Any clue is more than welcome!
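
As an aside, and just a sketch of what I ended up doing rather than anything from the tutorial: the hard-coded tensor indices and dtypes can be avoided by reading them from the interpreter itself, using only fields that get_input_details() / get_output_details() already return above.

# Sketch: read index and dtype from the interpreter instead of hard-coding them
itp = tf.lite.Interpreter(model_path="example1.tflite")
itp.allocate_tensors()
inp = itp.get_input_details()[0]
outd = itp.get_output_details()[0]
itp.set_tensor(inp['index'], np.array([[2, 1, 0]], dtype=inp['dtype']))
itp.invoke()
print(itp.get_tensor(outd['index']))  # expect [[255, 255, 255]] as above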

OK, let's decipher the output. As we are out in the wilds of the TF documentation, I will briefly summarize the conversion. Long story short, here is the thing:

$f = (i-m)\times s, \text{ or } i = f/s+m$

where $f$ is the float value (what you need), $i$ is the quantized value (what the interpreter outputs), and $s,m$ are the quantization values in get_output_details. Notice the order: it is $s,m$, i.e. scale first, then zero point.
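
As a quick sanity check (my own arithmetic, not from the docs): default_ranges_stats = (0, 3) maps the float range $[0,3]$ onto the uint8 range $[0,255]$, so

$s = \frac{3-0}{255} \approx 0.0117647, \quad m = 0$

which matches the (0.0117647..., 0) tuple reported by get_output_details() above.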

In [0]:
v = itp_full.get_output_details()[0]['quantization']
print(v)
(res-v[1])*v[0]  # expect to see [3,3,3]
(0.0117647061124444, 0)
Out[0]:
array([[3.00000006, 3.00000006, 3.00000006]])

The above conversion is also explained in the post-training-quantization documentation.

Great, it seems that we have finished our job.
But think for one more second: did we miss something? What happens if the input/output falls outside of the above range, or if our input lies in $[0,1]$, as when the network was trained?

In [0]:
itp_full.allocate_tensors()
imgx = np.array([[-1,0,1]]).astype(np.uint8) # cast to uint8
itp_full.set_tensor(1, imgx)
itp_full.invoke()
res=  itp_full.get_tensor(2)
v = itp_full.get_output_details()[0]['quantization']
(res-v[1])*v[0]  # float-precision result should be [0,2,4] !!
Out[0]:
array([[3.00000006, 2.00000004, 3.00000006]])

The right way is to set quantized_input_stats correspondingly. However, it is fairly misleading when the tutorial only appends a "mean, std_dev" comment...

That is, until I found out that they are actually the mean_values and std_dev_values from the command line reference, where the actual math formula is attached.

And another potential source of confusion - notice that the order of the quantization tuple is $s,m$, whereas in quantized_input_stats it is $m',s'$. Their relation is:

$s'=1/s, m=m'$.

If you find this confusing, think of it this way: your original input data $f$ lies in the range $[f_{min}, f_{max}]$, and you want to map it into $[0,255]$, therefore:

$s' = \frac{255}{f_{max}-f_{min}}$

and

$m = -f_{min}\times s'$
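
As a concrete check (again my own arithmetic): if the input was trained in roughly $[-1, 1]$, then

$s' = \frac{255}{1-(-1)} = 127.5 \approx 127, \quad m = -(-1)\times 127.5 \approx 128$

which matches the (128, 127.) pair used in the next block.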

In [0]:
with tf.Graph().as_default():
  img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 3))
  const = tf.constant([1., 2., 3.])
  val = img + const
  out = tf.identity(val, name='output') # tf.fake_quant_with_min_max_args(val, min=0., max=1., name="output")

  with tf.Session() as sess:
    converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
    converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
    input_arrays = converter.get_input_arrays()
    converter.quantized_input_stats = {input_arrays[0] : (128, 127.)}  # mean, std_dev
    converter.default_ranges_stats = (0, 3)
    tflite_model = converter.convert()
    open("example2.tflite", "wb").write(tflite_model)
In [0]:
itp_pruned = tf.lite.Interpreter(model_path="example2.tflite")
v_input = itp_pruned.get_input_details()[0]['quantization']
v_output = itp_pruned.get_output_details()[0]['quantization']
imgx = np.array([[0.1, 0.1, 0.1]])
imgb = (imgx / v_input[0] + v_input[1]).astype(np.uint8)

itp_pruned.allocate_tensors()
itp_pruned.set_tensor(1, imgb)
itp_pruned.invoke()
res=  itp_pruned.get_tensor(2)
(res-v_output[1])*v_output[0]  # float-precision result should be [1.1, 2.1, 3.1]
Out[0]:
array([[1.09411767, 2.09411769, 3.00000006]])

Now we have successfully recovered the first two values. And I believe you can also figure out that the last value is clipped at 3, which is defined by converter.default_ranges_stats = (0, 3) in the previous block.
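
To convince yourself, here is a quick sketch of my own (reusing the exact same toy graph; example3.tflite is just a name I picked): widening default_ranges_stats lets the last value through, at the price of a coarser scale (4/255 instead of 3/255).

with tf.Graph().as_default():
  img = tf.placeholder(name="img", dtype=tf.float32, shape=(1, 3))
  out = tf.identity(img + tf.constant([1., 2., 3.]), name="output")
  with tf.Session() as sess:
    converter = tf.lite.TFLiteConverter.from_session(sess, [img], [out])
    converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
    converter.quantized_input_stats = {converter.get_input_arrays()[0]: (128, 127.)}  # mean, std_dev
    converter.default_ranges_stats = (0, 4)  # wider than (0, 3), so 3.1 is no longer clipped
    open("example3.tflite", "wb").write(converter.convert())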

Last but not least, the tutorial also fails -

In [0]:
converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-42-c2316bf03cdb> in <module>()
----> 1 converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]

AttributeError: module 'tensorflow._api.v1.lite' has no attribute 'Optimize'
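
My best guess, and it is only a guess: tf.lite.Optimize was added in a later TF release, so the tutorial is simply ahead of the version installed here. A defensive sketch could look like this (post_training_quantize is the older TF 1.x flag as far as I can tell; verify against your version):

# Hedged workaround: guard against TF builds that predate tf.lite.Optimize
if hasattr(tf.lite, "Optimize"):
  converter.optimizations = [tf.lite.Optimize.OPTIMIZE_FOR_SIZE]
else:
  converter.post_training_quantize = True  # older post-training weight quantization flag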

Summary

The well-documented first half of the tutorial intrigued me enough to try TensorFlow Lite; however, the second half made me feel like I had been taught to sail on land and then thrown onto a boat in the middle of the Pacific Ocean. With this journey, I want to highlight a couple of problems with using TensorFlow from scratch. They are not limited to TF Lite; they also represent the majority of the problems I encounter daily with TensorFlow.

  1. Version consistency - Although enforcement of version compatibility is getting better, TensorFlow is still poor in this respect. Could TF provide an installation matrix similar to PyTorch's to simplify the installation process?
  2. Documentation - The documentation needs further cleanup and improvement. Don't just provide a link to the source code. A good example in TF is TFLiteConverter: it not only contains explanations but also provides examples. A bad one, like this, points to another function without even a URL.
  3. Be more specific with type hints. TensorFlow is complicated because of the behind-the-scenes computational graph. It gets even worse when we are lost about the expected inputs and the correct order in which to execute things (e.g. Session, Graph, Namespace, etc.). It is crucial to document the input type, with a corresponding explanation, for each function.
  4. Clean up outdated documentation. Even with version control, the mismatch between the code and the documentation is still surprisingly bad. For instance, we failed to access the Optimize attribute from the latest tutorial with the latest release.
  5. Create a friendly Stack Overflow community. Just check the issue page of TensorFlow on GitHub: how many questions are redirected to Stack Overflow? And how many of the questions and solutions on Stack Overflow are outdated? The pain of using TF is tightly linked to the difficulty of finding a suitable solution. Given the scale of Google, it should not be a challenge to maintain a good community with reliable solutions. If problems are always redirected to other parties without reliable answers, users will probably leave the tool for a replacement.

Acknowledgement

I want to thank Google for providing Colab for free. Without it, I could not easily have gotten TensorFlow and CUDA working together without trouble.
