
DeepSpeech exports INT64 instead of the expected INT32 into the SparseToDense op, not friendly to mobile TensorFlow #1309

Closed
evangel76 opened this issue Mar 19, 2018 · 11 comments

@evangel76

evangel76 commented Mar 19, 2018

The parameters of tf.sparse_to_dense should be cast to INT32 instead of the INT64 default:

[tf.sparse_tensor_to_dense(sparse_tensor) for sparse_tensor in decoded], name='output_node')

I used TensorFlow 1.6 on Ubuntu 16.04.

The lack of casting makes the model import crash on iOS and Android, and requires post-processing to be mobile friendly. This line:
[tf.sparse_tensor_to_dense(sparse_tensor) for sparse_tensor in decoded], name='output_node')

should be something like this:

[tf.sparse_tensor_to_dense(tf.to_int32(sparse_tensor)) for sparse_tensor in decoded], name='output_node')

There are probably better solutions than a brutal C-style cast.
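
For illustration, here is a rough sketch of what I mean (assuming decoded is the list of tf.SparseTensor returned by tf.nn.ctc_beam_search_decoder; the helper name is made up):

```python
import tensorflow as tf

def sparse_to_dense_int32(decoded):
    # Hypothetical helper: cast each SparseTensor's values to int32 before
    # densifying, so the exported SparseToDense op uses T=DT_INT32 rather
    # than the DT_INT64 that mobile builds do not register.
    dense = []
    for st in decoded:
        st32 = tf.SparseTensor(indices=st.indices,  # indices can stay int64
                               values=tf.cast(st.values, tf.int32),
                               dense_shape=st.dense_shape)
        dense.append(tf.sparse_tensor_to_dense(st32))
    return dense
```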

Because of this, importing the model on iOS gives:

No OpKernel was registered to support Op 'SparseToDense' with these attrs.  Registered devices: [CPU], Registered kernels:
  device='CPU'; T in [DT_INT32]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_INT32]; Tindices in [DT_INT64]
  device='CPU'; T in [DT_FLOAT]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_FLOAT]; Tindices in [DT_INT64]
  device='CPU'; T in [DT_BOOL]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_BOOL]; Tindices in [DT_INT64]
  device='CPU'; T in [DT_STRING]; Tindices in [DT_INT32]
  device='CPU'; T in [DT_STRING]; Tindices in [DT_INT64]

[[Node: SparseToDense = SparseToDense[T=DT_INT64, Tindices=DT_INT64, validate_indices=true](CTCBeamSearchDecoder, CTCBeamSearchDecoder:2, CTCBeamSearchDecoder:1, SparseToDense/default_value)]]
@lissyx
Collaborator

lissyx commented Mar 19, 2018

See also #1095

@evangel76
Author

I am running my own training of DeepSpeech on the side and got to pretty high accuracy on desktop.
I am attempting to move the whole thing to iOS, and I got it to work, but I had to add the cast and remove the IS_MOBILE_PLATFORM guard, because it strips ops (TensorFlow features) you need to run DeepSpeech.
I have posted the steps here: https://github.com/evangel76/TensorFlow15CocoPod

@lissyx
Collaborator

lissyx commented Mar 19, 2018

@evangel76 Right, thanks. Then I would suggest you try to work on the streaming branch https://github.com/mozilla/DeepSpeech/tree/streaming-inference that @reuben is going to land, because you are losing your time with that right now :-). Doing that kind of hack on the current model is a dead end, and if we need these kinds of changes, we would rather do them on the streaming model. It might be of interest to @reuben if you can experiment on iOS, because none of us has time for that right now :).

FYI, with the streaming model, see also #1224: I've run some experiments with toco, and only TensorFlowMinimum is missing; there's an upstream issue: tensorflow/tensorflow#17289
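
For reference, those experiments look roughly like this with the TF 1.6-era Python entry point (a sketch only; the tiny graph below is a stand-in, not the actual DeepSpeech export):

```python
import tensorflow as tf

# Rough sketch of a toco conversion using tf.contrib.lite as it existed around
# TF 1.5/1.6; the placeholder/softmax graph here replaces the real inference graph.
with tf.Session() as sess:
    inp = tf.placeholder(tf.float32, [1, 16, 26], name='input_node')
    out = tf.nn.softmax(inp, name='output_node')
    tflite_model = tf.contrib.lite.toco_convert(sess.graph_def, [inp], [out])
    with open('model.tflite', 'wb') as f:
        f.write(tflite_model)
```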

I'm going to close this, but really, thanks for hacking on iOS, we are interested in that :)

@lissyx lissyx closed this as completed Mar 19, 2018
@lissyx lissyx added the invalid label Mar 19, 2018
@lissyx
Collaborator

lissyx commented Mar 19, 2018

@evangel76 Also, you might want to come and discuss with us on IRC (#machinelearning on irc.mozilla.org) and/or on Discourse: https://discourse.mozilla.org/c/deep-speech. I'd hate for you to burn your time for nothing: contributors' time has too much value to me, especially when you are hacking on stuff that is this useful! So again, really, thanks for your feedback and issue :-)

@evangel76
Author

That is an odd reply ... I got the whole thing working pretty accurately on iOS; saying it is a dead end is strange at best. But well, since we are sharing opinions, here is mine:
I know what you are referring to: you fear the memory footprint of the whole thing. (There are programmers like me to solve those problems; that is what performance experts are for.)
I am pretty happy with my fix, and I'll keep going in my direction; it works great for what I need it for.

@lissyx
Collaborator

lissyx commented Mar 19, 2018

@evangel76 I never said it was not good, I said that the code you need to change is going away:

DeepSpeech/DeepSpeech.py

Lines 1720 to 1763 in 1926bce

```python
def create_inference_graph(batch_size=1, n_steps=16, use_new_decoder=False):
    # Input tensor will be of shape [batch_size, n_steps, n_input + 2*n_input*n_context]
    input_tensor = tf.placeholder(tf.float32, [batch_size, n_steps, n_input + 2*n_input*n_context], name='input_node')
    seq_length = tf.placeholder(tf.int32, [batch_size], name='input_lengths')

    previous_state_c = variable_on_worker_level('previous_state_c', [batch_size, n_cell_dim], initializer=None)
    previous_state_h = variable_on_worker_level('previous_state_h', [batch_size, n_cell_dim], initializer=None)
    previous_state = tf.contrib.rnn.LSTMStateTuple(previous_state_c, previous_state_h)

    rnn_impl = partial(rnn_impl_inference, previous_state=previous_state)

    logits, layers = BiRNN(batch_x=input_tensor,
                           seq_length=seq_length if FLAGS.use_seq_length else None,
                           dropout=no_dropout,
                           batch_size=batch_size,
                           n_steps=n_steps,
                           rnn_impl=rnn_impl)

    fw_cell = layers['fw_cell']
    new_state_c, new_state_h = layers['rnn_output_state']

    # Initial zero state
    zero_state_c, zero_state_h = fw_cell.zero_state(batch_size, tf.float32)
    zero_state_c = tf.identity(zero_state_c, name='zero_state_c')
    zero_state_h = tf.identity(zero_state_h, name='zero_state_h')

    initialize_c = tf.assign(previous_state_c, zero_state_c)
    initialize_h = tf.assign(previous_state_h, zero_state_h)
    initialize_state = tf.group(initialize_c, initialize_h, name='initialize_state')

    with tf.control_dependencies([tf.assign(previous_state_c, new_state_c), tf.assign(previous_state_h, new_state_h)]):
        logits = tf.identity(logits)

    return (
        {
            'input': input_tensor,
            'input_lengths': seq_length,
        },
        {
            'outputs': logits,
            'initialize_state': initialize_state,
        }
    )
```
There is no use of sparse_to_dense anymore. It's not just about memory footprint; it's about the complexity of the network.

So, if you use that branch for your work, it should even be easier.
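
For illustration, driving that graph looks roughly like this (a sketch only; the zero-filled windows and the n_input=26 / n_context=9 values are assumptions, not part of the branch):

```python
import numpy as np
import tensorflow as tf

# Sketch of a driver for create_inference_graph() above: reset the LSTM state
# once, then feed successive windows of features through the graph.
n_steps, feat_dim = 16, 26 + 2 * 26 * 9
inputs, outputs = create_inference_graph(batch_size=1, n_steps=n_steps)

with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    session.run(outputs['initialize_state'])   # zero previous_state_c/h
    for _ in range(4):                         # stand-in for streamed audio windows
        window = np.zeros((1, n_steps, feat_dim), dtype=np.float32)
        logits = session.run(outputs['outputs'],
                             feed_dict={inputs['input']: window,
                                        inputs['input_lengths']: [n_steps]})
```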

@evangel76
Author

Fair, thanks

@lissyx
Collaborator

lissyx commented Mar 20, 2018

@evangel76 Just to give more context: changing this datatype might have an impact on the recognition rate itself (or it might not); this is something that would require careful investigation to ensure we are not breaking anything. Now, we still have several pending benchmarks before we can release v0.2.0, a version where we do not want to change the model. Then, streaming will arrive in v0.3.0. Still, we are very interested in any results you get :)

@evangel76
Author

I have started to replace the code with the streaming version, and am re-plugging Metal acceleration into it for iOS.

@lock

lock bot commented Jan 3, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Jan 3, 2019