
graph::SetDefaultDevice with shared library build #136

Closed

dshawul opened this issue Sep 7, 2018 · 21 comments


dshawul commented Sep 7, 2018

Hi,

I have a problem using graph::SetDefaultDevice with the shared library build. I understand why the static build will not work for my example below (it does more than inference), but the tensorflow_cc shared library should behave the same as one built with bazel. If I build the code from within tensorflow using bazel it works fine, but if I link against libtensorflow_cc built with cmake it doesn't. Why?

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} }); 
  // Vector b = [3 5]
  auto b = Const(root, { {3.f, 5.f} }); 
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));

  GraphDef def;
  TF_CHECK_OK(root.ToGraphDef(&def));

  graph::SetDefaultDevice(false ? "/device:GPU:0" : "/cpu:0", &def);
  for (auto &node: *def.mutable_node()) {
        node.set_device("/cpu:0");
        std::cout << node.name() << " = '" << node.device() <<"'"<< std::endl;
  }
  std::cout << "=======================\n";

  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}

I get the following error when running the resulting program

2018-09-06 18:18:13.853316: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-06 18:18:13.856079: F /home/daniel/tensorflow_cc/example/example.cpp:27] Non-OK-status: session->Create(def) status: Not found: No attr named '/cpu:0' in NodeDef:
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
Aborted (core dumped)

FloopCZ commented Sep 9, 2018

Hi, does it work correctly for you if you move

  GraphDef def;
  TF_CHECK_OK(root.ToGraphDef(&def));
  graph::SetDefaultDevice(false ? "/device:GPU:0" : "/cpu:0", &def);

right after

  Scope root = Scope::NewRootScope();

?


dshawul commented Sep 9, 2018

Thanks! Yes, it does work that way.
How do I use this fix for a program that loads a .pb file rather than constructing the graph in code? (See the sketch at the end of this comment.)

Edit: It is not printing the nodes after that fix, though, so it is actually not setting the device type properly.

And more importantly, why does compiling the original program (without your fix) from within tensorflow using bazel work, but not when using a shared tensorflow library? I have also tried the same program on Windows using tensorflow.dll files I got from here and hit the same problem.

Could it be related to the following issues:
tensorflow/tensorflow#5379
tensorflow/tensorflow#16291
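
A minimal sketch of the .pb-loading case, assuming a placeholder path and device string (ReadBinaryProto and SetDefaultDevice are the same calls used elsewhere in this thread):

// Hedged sketch: apply SetDefaultDevice to a GraphDef loaded from disk
// instead of one built with the C++ op wrappers. The path argument and
// "/cpu:0" are placeholders.
#include <memory>
#include <string>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

tensorflow::Status LoadWithDevice(const std::string& path,
                                  std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef def;
  tensorflow::Status s =
      tensorflow::ReadBinaryProto(tensorflow::Env::Default(), path, &def);
  if (!s.ok()) return s;
  tensorflow::graph::SetDefaultDevice("/cpu:0", &def);  // or "/device:GPU:0"
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  return (*session)->Create(def);
}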


FloopCZ commented Sep 14, 2018

Sorry, I am not really sure. You could try freezing your graph before loading it, since freezing drops the assigned device from the individual nodes. It could be related to the issues you mention. Let us know if you manage to make it work.
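
A hedged in-code sketch of the same idea (not the freeze_graph tool itself): clear the device field of every node after loading the GraphDef, assuming a tensorflow::GraphDef named graph_def as in the snippets in this thread:

// Hedged sketch: strip any device placement baked into a loaded GraphDef so
// the session is free to place the nodes (or SetDefaultDevice can be applied
// afterwards). Assumes graph_def is a tensorflow::GraphDef already loaded.
for (tensorflow::NodeDef& node : *graph_def.mutable_node()) {
  node.clear_device();
}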


FloopCZ commented Nov 16, 2018

Hi, you can try the latest master, which is based on TF 1.12, if you still have the issue. Feel free to reopen if it persists.

FloopCZ closed this as completed Nov 16, 2018

ttdd11 commented Mar 25, 2019

@dshawul Were you able to resolve this?


ttdd11 commented Mar 25, 2019

@FloopCZ I am having this issue with the bazel build not being able to call SetDefaultDevice. Is there a solution for this that you know of?


dshawul commented Mar 25, 2019

@ttdd11 No, but I used a workaround: compiling the program directly with bazel using a BUILD file.
That works correctly with the above program, but if I try to use the shared build of tensorflow_cc it still doesn't work, even with the latest version of tensorflow.


ttdd11 commented Mar 25, 2019

@dshawul Can you explain a bit more? Instead of linking tensorflow_cc to your app, you are building your app with bazel from the tensorflow source?


dshawul commented Mar 25, 2019

Yes, that is correct.
Since my app was simple, I took that easy solution as soon as I figured out how to write a bazel BUILD file for it. Also, I didn't need to ship the libtensorflow_cc.so file along with my app when building with bazel, which was a plus.
I spent days looking for solutions before going the bazel way. For more details on why monolithic tensorflow builds have this problem, please refer to the links I gave above.


ttdd11 commented Mar 25, 2019

Thanks for the advice. My application is unfortunately somewhat complicated, and we regularly need to debug it in Visual Studio. However, I may need to do this, given that tensorflow has had this issue for a while and we require RTX support, which means CUDA 10 and later versions of tensorflow.


ttdd11 commented Mar 26, 2019

@dshawul I think I am going to try what you are doing and just maintain two builds. Can you provide some advice on using the BUILD file, or copy some samples showing how you got the tensorflow source into the build file?


dshawul commented Mar 26, 2019

First I put my app somewhere inside the tensorflow source tree, e.g. tensorflow/cc/myapp; this is a must to build with bazel. Then I write a BUILD file and put it in the myapp directory. Example BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_shared_object")

tf_cc_shared_object(
    name = "myapp.dll",
    srcs = [
        "app.cpp",
        "app.h",
    ],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
    defines = ["TENSORFLOW"],
)

My app is actually a shared object, not an exe. Put all your *.cpp and *.h files in "srcs". You can set preprocessor macros in "defines" -- for example, if you want -DTENSORFLOW to be passed to the compiler.

Then execute:

bazel build --config=opt --config=monolithic //tensorflow/cc/myapp:myapp.dll

Then you should get a myapp.dll that is a standalone binary (no libtensorflow dependency) in

tensorflow/bazel-bin/tensorflow/cc/myapp

Hope that helps


ttdd11 commented Mar 26, 2019

@dshawul Okay, this could actually work. I just need to isolate all the tensorflow code into a few files, build them as a dll, and then link against that to get tensorflow working. Is this basically what you did?


dshawul commented Mar 26, 2019

Note that you can directly build an exe (your app) with bazel. My app was a dll from the beginning. It is not required that you build a shared library for the workaround. Good luck.


ttdd11 commented Mar 26, 2019

Okay, sounds good. I'm just thinking, for ease of debugging, that if I build a dll I can link against it and test other code pretty easily. Does this sound like it should work?


dshawul commented Mar 26, 2019

Yes that sounds good. It is also good to have all tensorflow code in a separate module.


ttdd11 commented Mar 26, 2019

@dshawul this has been very helpful. How did you configure tensorflow with different options (such as CUDA 10) before running your build file? Did you just run configure.py first and then run the build as you described above?


ttdd11 commented Mar 27, 2019

@dshawul I've gotten the lib to build but sadly am unable to use it. Here is my code for the lib. It's very simple; I just want to see if SetDefaultDevice works on Windows.

This is my header:

#ifndef INFERENCELIB_H
#define INFERENCELIB_H

#ifdef TF_API
#define TF_API __declspec(dllexport)
#else
#define TF_API __declspec(dllimport)
#endif

// Standard includes
#include <memory>

// tensorflow includes
#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"

class TF_API InferenceLib
{
public:
    InferenceLib(const char* strGraphPath, bool& bStatus, const int nGPU = -1);
    ~InferenceLib();

private:
    std::unique_ptr<tensorflow::Session> m_TFSession;
    bool loadGraphAndSession(const char* strPathGraph, int nGPU = -1);
};

#endif //INFERENCELIB_H

This is the cpp:

#include "InferenceLib.h"
using tensorflow::Flag;
using tensorflow::Tensor;
using tensorflow::Status;
using tensorflow::string;
using tensorflow::int32;

InferenceLib::InferenceLib(const char* strPathGraph,
bool& bStatus,
const int nGPU /=-1/)
{
if (loadGraphAndSession(strPathGraph, nGPU))
{
bStatus = true;
}
}

bool InferenceLib::loadGraphAndSession(const char* strPathGraph, int nGPU /= -1/)
{
bool bReturn = true;
std::string path = strPathGraph;
tensorflow::GraphDef graph_def;
Status load_graph_status = ReadBinaryProto(tensorflow::Env::Default(), path, &graph_def);
if (!load_graph_status.ok())
{
bReturn = false;
}

tensorflow::SessionOptions sessOps;
if (-1 != nGPU)	//here asssigning the graph to a specific gpu, useful is building multiple instances of this 
{
	std::string strGPU = std::to_string(nGPU);
	tensorflow::graph::SetDefaultDevice("/gpu:" + strGPU, &graph_def);
	sessOps.config.set_allow_soft_placement(true);
	sessOps.config.set_log_device_placement(true);
	sessOps.config.mutable_gpu_options()->set_allow_growth(true);					//this line avoids large preallocations (I don't care about this)																					//sessOps.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(50); //100 should be defauly
}

m_TFSession.reset(tensorflow::NewSession(sessOps));
Status session_create_status = (*this->m_TFSession).Create(graph_def);

if (!session_create_status.ok())
{
	bReturn = false;
}
return bReturn;

}

InferenceLib::~InferenceLib()
{

}
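
For reference, a minimal hedged usage sketch of this class from the consuming application (the graph path and GPU index are placeholders):

// Hypothetical caller; assumes InferenceLib.h is on the include path and the
// dll/lib pair is linked. "frozen_graph.pb" is a placeholder path.
#include "InferenceLib.h"

int main() {
    bool bLoaded = false;
    InferenceLib lib("frozen_graph.pb", bLoaded, /*nGPU=*/0);
    return bLoaded ? 0 : 1;
}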

My build file is:

load("//tensorflow:tensorflow.bzl", "tf_cc_shared_object")

tf_cc_shared_object(
name = "InfLib2.dll",
srcs = [
"InferenceLib.cpp",
"InferenceLib.h",
],
deps = [
"//tensorflow/cc:cc_ops",
"//tensorflow/cc:client_session",
"//tensorflow/core:tensorflow"
],
defines = [ "TENSORFLOW",
"TF_API"]
)

I run this in PowerShell:
bazel build --config=opt --config=monolithic //tensorflow/cc/InfLib2:InfLib2.dll

And the lib builds. But I get link errors when calling any function, and the .lib seems to be corrupted (at least depends says it's not a valid 32 or 64 bit lib).

Does this look correct? I've been on this for about a month now and don't seem to be any closer. Any help would be greatly appreciated. The symbols I need are actually in the dll, so I'm thinking it's how I built it.


dshawul commented Mar 27, 2019

I think you need to add more dependencies in 'deps' in your case, enough to cover all of your tensorflow includes. I had only three because I only included these:

#ifdef TENSORFLOW
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/graph/default_device.h"
#endif

You have many more includes than me. A good test to see whether the bazel solution works is to compile the example I provided in my original post, by putting it in //tensorflow/cc/example/example.cc and using the BUILD file below:

   load("//tensorflow:tensorflow.bzl", "tf_cc_binary")
   
   tf_cc_binary(
       name = "example",
       srcs = ["example.cc"],
       deps = [
           "//tensorflow/cc:cc_ops",
           "//tensorflow/cc:client_session",
           "//tensorflow/core:tensorflow",
      ],
   )
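
The corresponding build command would follow the same pattern as the earlier one (assuming the example sits in tensorflow/cc/example):

bazel build --config=opt --config=monolithic //tensorflow/cc/example:example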


ttdd11 commented Mar 27, 2019

I don't think I need all those includes. I will try again without them. Here is the link error for reference:

__cdecl tensorflow::core::GetVarint32PtrFallback unresolved symbol


ttdd11 commented Mar 27, 2019

@dshawul I think this is working now besides that one symbol. It is located in source\tensorflow\core\lib\core. I looked at the BUILD file for core, and I should be able to add "//tensorflow/core:lib" to get those symbols, but that did not work. Are you running inference from a loaded .pb with your app? It seems like the trouble line is where I use TensorShape, which your example doesn't use. Any ideas on how to get this tensorflow::core::GetVarint32PtrFallback symbol into the lib?
