
DeepSpeech exports INT64 instead of the expected INT32 into the SparseToDense op, not friendly to mobile TensorFlow #1309

Closed
evangel76 opened this issue Mar 19, 2018 · 11 comments

@evangel76

evangel76 commented Mar 19, 2018

The parameters of tf.sparse_to_dense should be cast to INT32 instead of the INT64 default:

[tf.sparse_tensor_to_dense(sparse_tensor) for sparse_tensor in decoded], name='output_node')

I used TensorFlow 1.6 on Ubuntu 16.04.

The lack of casting makes the model import crash on iOS and Android, and requires post-processing to be mobile friendly. This line:
[tf.sparse_tensor_to_dense(sparse_tensor) for sparse_tensor in decoded], name='output_node')

should be something like this:

[tf.sparse_tensor_to_dense(tf.to_int32(sparse_tensor)) for sparse_tensor in decoded], name='output_node')

There are probably better solutions than a brutal C-style cast.
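
For illustration, here is a rough sketch of what I mean (assuming decoded is the list of tf.SparseTensor returned by tf.nn.ctc_beam_search_decoder; the helper name is made up):

```python
import tensorflow as tf

def sparse_to_dense_int32(decoded):
    # Hypothetical helper: cast each SparseTensor's values to int32 before
    # densifying, so the exported SparseToDense op uses T=DT_INT32 rather
    # than the DT_INT64 that mobile builds do not register.
    dense = []
    for st in decoded:
        st32 = tf.SparseTensor(indices=st.indices,  # indices can stay int64
                               values=tf.cast(st.values, tf.int32),
                               dense_shape=st.dense_shape)
        dense.append(tf.sparse_tensor_to_dense(st32))
    return dense
```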

Because of this, importing the model on iOS gives:

No OpKernel was registered to support Op 'SparseToDense' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='CPU'; T in [DT_INT32]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_INT32]; Tindices in [DT_INT64]
  device='CPU'; T in [DT_FLOAT]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tindices in [DT_INT64]
  device='CPU'; T in [DT_BOOL]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_BOOL]; Tindices in [DT_INT64]
  device='CPU'; T in [DT_STRING]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_STRING]; Tindices in [DT_INT64]

[[Node: SparseToDense = SparseToDense[T=DT_INT64, Tindices=DT_INT64, validate_indices=true](CTCBeamSearchDecoder, CTCBeamSearchDecoder:2, CTCBeamSearchDecoder:1, SparseToDense/default_value)]]
@lissyx
Collaborator

lissyx commented Mar 19, 2018

See also #1095

@evangel76
Author

I am running my own training of DeepSpeech on the side and got to pretty high accuracy on desktop.
I am attempting to move the whole thing to iOS, and I got it to work, but I had to add the cast and remove the IS_MOBILE_PLATFORM guard, because it strips ops (TensorFlow features) you need to run DeepSpeech.
I have posted the steps here: https://github.com/evangel76/TensorFlow15CocoPod

@lissyx
Collaborator

lissyx commented Mar 19, 2018

@evangel76 Right, thanks. Then I would suggest you try to work on the streaming branch https://github.com/mozilla/DeepSpeech/tree/streaming-inference that @reuben is going to land, because you are losing your time with that right now :-). Doing that kind of hack on the current model is a dead end, and if we need these kinds of changes, we would rather do them on the streaming model. It might be of interest to @reuben if you can experiment on iOS, because none of us has time for that right now :).

FYI, with the streaming model, see also #1224: I've run some experiments with toco, and only TensorFlowMinimum is missing; there's an upstream issue: tensorflow/tensorflow#17289
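
For reference, those experiments look roughly like this with the TF 1.6-era Python entry point (a sketch only; the tiny graph below is a stand-in, not the actual DeepSpeech export):

```python
import tensorflow as tf

# Rough sketch of a toco conversion using tf.contrib.lite as it existed around
# TF 1.5/1.6; the placeholder/softmax graph here replaces the real inference graph.
with tf.Session() as sess:
    inp = tf.placeholder(tf.float32, [1, 16, 26], name='input_node')
    out = tf.nn.softmax(inp, name='output_node')
    tflite_model = tf.contrib.lite.toco_convert(sess.graph_def, [inp], [out])
    with open('model.tflite', 'wb') as f:
        f.write(tflite_model)
```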

I'm going to close this, but really, thanks for hacking on iOS, we are interested in that :)

@lissyx lissyx closed this as completed Mar 19, 2018
@lissyx lissyx added the invalid label Mar 19, 2018
@lissyx
Collaborator

lissyx commented Mar 19, 2018

@evangel76 Also, you might want to come and discuss with us on IRC (#machinelearning on irc.mozilla.org) and/or on Discourse: https://discourse.mozilla.org/c/deep-speech. I'd hate for you to burn your time for nothing: contributors' time has too much value to me, especially when you are hacking on stuff that is this useful! So again, really, thanks for your feedback and issue :-)

@evangel76
Author

That is an odd reply ... I got the whole thing working pretty accurately on iOS; saying it is a dead end is strange at best. But well, since we are sharing opinions, here is mine:
I know what you are referring to: you fear the memory footprint of the whole thing. (There are programmers like me to solve those problems; that is what performance experts are for.)
I am pretty happy with my fix, and I'll keep going in my direction; it works great for what I need it for.

@lissyx
Collaborator

lissyx commented Mar 19, 2018

@evangel76 I never said it was not good, I said that the code you need to change is going away:

DeepSpeech/DeepSpeech.py

Lines 1720 to 1763 in 1926bce

```python
def create_inference_graph(batch_size=1, n_steps=16, use_new_decoder=False):
    # Input tensor will be of shape [batch_size, n_steps, n_input + 2*n_input*n_context]
    input_tensor = tf.placeholder(tf.float32, [batch_size, n_steps, n_input + 2*n_input*n_context], name='input_node')
    seq_length = tf.placeholder(tf.int32, [batch_size], name='input_lengths')

    previous_state_c = variable_on_worker_level('previous_state_c', [batch_size, n_cell_dim], initializer=None)
    previous_state_h = variable_on_worker_level('previous_state_h', [batch_size, n_cell_dim], initializer=None)
    previous_state = tf.contrib.rnn.LSTMStateTuple(previous_state_c, previous_state_h)

    rnn_impl = partial(rnn_impl_inference, previous_state=previous_state)

    logits, layers = BiRNN(batch_x=input_tensor,
                           seq_length=seq_length if FLAGS.use_seq_length else None,
                           dropout=no_dropout,
                           batch_size=batch_size,
                           n_steps=n_steps,
                           rnn_impl=rnn_impl)

    fw_cell = layers['fw_cell']
    new_state_c, new_state_h = layers['rnn_output_state']

    # Initial zero state
    zero_state_c, zero_state_h = fw_cell.zero_state(batch_size, tf.float32)
    zero_state_c = tf.identity(zero_state_c, name='zero_state_c')
    zero_state_h = tf.identity(zero_state_h, name='zero_state_h')

    initialize_c = tf.assign(previous_state_c, zero_state_c)
    initialize_h = tf.assign(previous_state_h, zero_state_h)
    initialize_state = tf.group(initialize_c, initialize_h, name='initialize_state')

    with tf.control_dependencies([tf.assign(previous_state_c, new_state_c), tf.assign(previous_state_h, new_state_h)]):
        logits = tf.identity(logits)

    return (
        {
            'input': input_tensor,
            'input_lengths': seq_length,
        },
        {
            'outputs': logits,
            'initialize_state': initialize_state,
        }
    )
```
There is no use of sparse_to_dense anymore. It's not just about memory footprint; it's about the complexity of the network.

So, if you use that branch for your work, it should even be easier.
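
For illustration, driving that graph looks roughly like this (a sketch only; the zero-filled windows and the n_input=26 / n_context=9 values are assumptions, not part of the branch):

```python
import numpy as np
import tensorflow as tf

# Sketch of a driver for create_inference_graph() above: reset the LSTM state
# once, then feed successive windows of features through the graph.
n_steps, feat_dim = 16, 26 + 2 * 26 * 9
inputs, outputs = create_inference_graph(batch_size=1, n_steps=n_steps)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(outputs['initialize_state'])   # zero previous_state_c/h
    for _ in range(4):                         # stand-in for streamed audio windows
        window = np.zeros((1, n_steps, feat_dim), dtype=np.float32)
        logits = session.run(outputs['outputs'],
                             feed_dict={inputs['input']: window,
                                        inputs['input_lengths']: [n_steps]})
```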

@evangel76
Author

Fair, thanks

@lissyx
Collaborator

lissyx commented Mar 20, 2018

@evangel76 Just to give more context: changing this datatype might have an impact on the recognition rate itself (or it might not); this is something that would require careful investigation to ensure we are not breaking anything. Now, we still have several pending benchmarks before we can release v0.2.0, a version where we do not want to change the model. Then, streaming will arrive in v0.3.0. Still, we are very interested in any results you get :)

@evangel76
Author

I have started to replace the code with the streaming version, and am re-plugging Metal acceleration into it for iOS.

@lock

lock bot commented Jan 3, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 3, 2019