
graph::SetDefaultDevice with shared library build #136

Closed

dshawul opened this issue Sep 7, 2018 · 21 comments


dshawul commented Sep 7, 2018

Hi,

I have a problem using graph::SetDefaultDevice with the shared library build. I understand why the static build will not work for my example below (it does more than inference), but the tensorflow_cc shared library should behave the same as one built with bazel. If I build the code from within tensorflow using bazel it works fine, but if I link against libtensorflow_cc built with cmake it doesn't. Why?

#include "tensorflow/cc/client/client_session.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"

int main() {
  using namespace tensorflow;
  using namespace tensorflow::ops;
  Scope root = Scope::NewRootScope();
  // Matrix A = [3 2; -1 0]
  auto A = Const(root, { {3.f, 2.f}, {-1.f, 0.f} }); 
  // Vector b = [3 5]
  auto b = Const(root, { {3.f, 5.f} }); 
  // v = Ab^T
  auto v = MatMul(root.WithOpName("v"), A, b, MatMul::TransposeB(true));

  GraphDef def;
  TF_CHECK_OK(root.ToGraphDef(&def));

  graph::SetDefaultDevice(false ? "/device:GPU:0" : "/cpu:0", &def);
  for (auto &node: *def.mutable_node()) {
        node.set_device("/cpu:0");
        std::cout << node.name() << " = '" << node.device() <<"'"<< std::endl;
  }
  std::cout << "=======================\n";

  std::vector<Tensor> outputs;
  ClientSession session(root);
  // Run and fetch v
  TF_CHECK_OK(session.Run({v}, &outputs));
  // Expect outputs[0] == [19; -3]
  LOG(INFO) << outputs[0].matrix<float>();
  return 0;
}

I get the following error when running the resulting program

2018-09-06 18:18:13.853316: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-09-06 18:18:13.856079: F /home/daniel/tensorflow_cc/example/example.cpp:27] Non-OK-status: session->Create(def) status: Not found: No attr named '/cpu:0' in NodeDef:
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
     [[Node: Const = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [2,2] values: [3 2][-1]...>, _device="/cpu:0"]()]]
Aborted (core dumped)

FloopCZ commented Sep 9, 2018

Hi, does it work correctly for you if you move

  GraphDef def;
  TF_CHECK_OK(root.ToGraphDef(&def));
  graph::SetDefaultDevice(false ? "/device:GPU:0" : "/cpu:0", &def);

right after

  Scope root = Scope::NewRootScope();

?


dshawul commented Sep 9, 2018

Thanks! Yes, it does work that way.
How do I use this fix for a program that loads a .pb file rather than constructing the graph in code? (See the sketch at the end of this comment.)

Edit: It is not printing the nodes after that fix, though, so it is actually not setting the device type properly.

And more importantly, why does compiling the original program (without your fix) from within tensorflow using bazel work, but not when using a shared tensorflow library? I have also tried the same program on Windows using tensorflow.dll files I got from here and hit the same problem.

Could it be related to the following issues:
tensorflow/tensorflow#5379
tensorflow/tensorflow#16291
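
A minimal sketch of the .pb-loading case, assuming a placeholder path and device string (ReadBinaryProto and SetDefaultDevice are the same calls used elsewhere in this thread):

// Hedged sketch: apply SetDefaultDevice to a GraphDef loaded from disk
// instead of one built with the C++ op wrappers. The path argument and
// "/cpu:0" are placeholders.
#include <memory>
#include <string>

#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

tensorflow::Status LoadWithDevice(const std::string& path,
                                  std::unique_ptr<tensorflow::Session>* session) {
  tensorflow::GraphDef def;
  tensorflow::Status s =
      tensorflow::ReadBinaryProto(tensorflow::Env::Default(), path, &def);
  if (!s.ok()) return s;
  tensorflow::graph::SetDefaultDevice("/cpu:0", &def);  // or "/device:GPU:0"
  session->reset(tensorflow::NewSession(tensorflow::SessionOptions()));
  return (*session)->Create(def);
}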


FloopCZ commented Sep 14, 2018

Sorry, I am not really sure. You could try freezing your graph before loading it, since freezing drops the assigned device from the individual nodes. It could be related to the issues you mention. Let us know if you manage to make it work.
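
A hedged in-code sketch of the same idea (not the freeze_graph tool itself): clear the device field of every node after loading the GraphDef, assuming a tensorflow::GraphDef named graph_def as in the snippets in this thread:

// Hedged sketch: strip any device placement baked into a loaded GraphDef so
// the session is free to place the nodes (or SetDefaultDevice can be applied
// afterwards). Assumes graph_def is a tensorflow::GraphDef already loaded.
for (tensorflow::NodeDef& node : *graph_def.mutable_node()) {
  node.clear_device();
}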


FloopCZ commented Nov 16, 2018

Hi, you can try the latest master, which is based on TF 1.12, if you still have the issue. Feel free to reopen if it persists.

FloopCZ closed this as completed Nov 16, 2018

ttdd11 commented Mar 25, 2019

@dshawul Were you able to resolve this?


ttdd11 commented Mar 25, 2019

@FloopCZ I am having this issue with the bazel build not being able to call SetDefaultDevice. Is there a solution for this that you know of?


dshawul commented Mar 25, 2019

@ttdd11 No, but I used a workaround: compiling the program directly with bazel using a BUILD file.
That works correctly with the above program, but if I try to use the shared build of tensorflow_cc it still doesn't work, even with the latest version of tensorflow.


ttdd11 commented Mar 25, 2019

@dshawul Can you explain a bit more? Instead of linking tensorflow_cc to your app, you are building your app with bazel from the tensorflow source?


dshawul commented Mar 25, 2019

Yes, that is correct.
Since my app was simple, I took that easy solution as soon as I figured out how to write a bazel BUILD file for it. Also, I didn't need to ship the libtensorflow_cc.so file along with my app when building with bazel, which was a plus.
I spent days looking for solutions before going the bazel way. For more details on why monolithic tensorflow builds have this problem, please refer to the links I gave above.


ttdd11 commented Mar 25, 2019

Thanks for the advice. My application is unfortunately somewhat complicated, and we regularly need to debug it in Visual Studio. However, I may need to do this, given that tensorflow has had this issue for a while and we require RTX support, which means CUDA 10 and later versions of tensorflow.


ttdd11 commented Mar 26, 2019

@dshawul I think I am going to try what you are doing and just maintain two builds. Can you provide some advice on using the BUILD file, or copy some samples showing how you got the tensorflow source into the build file?


dshawul commented Mar 26, 2019

First I put my app somewhere inside the tensorflow source tree, e.g. tensorflow/cc/myapp; this is a must to build with bazel. Then I write a BUILD file and put it in the myapp directory. Example BUILD file:

load("//tensorflow:tensorflow.bzl", "tf_cc_shared_object")

tf_cc_shared_object(
    name = "myapp.dll",
    srcs = [
        "app.cpp",
        "app.h",
    ],
    deps = [
        "//tensorflow/cc:cc_ops",
        "//tensorflow/cc:client_session",
        "//tensorflow/core:tensorflow",
    ],
    defines = ["TENSORFLOW"],
)

My app is actually a shared object, not an exe. Put all your *.cpp and *.h files in "srcs". You can set preprocessor macros in "defines" -- for example, if you want -DTENSORFLOW to be passed to the compiler.

Then execute:

bazel build --config=opt --config=monolithic //tensorflow/cc/myapp:myapp.dll

Then you should get a myapp.dll that is a standalone binary (no libtensorflow dependency) in

tensorflow/bazel-bin/tensorflow/cc/myapp

Hope that helps


ttdd11 commented Mar 26, 2019

@dshawul Okay, this could actually work. I just need to isolate all the tensorflow code into a few files, build them as a dll, and then link against that to get tensorflow working. Is this basically what you did?


dshawul commented Mar 26, 2019

Note that you can directly build an exe (your app) with bazel. My app was a dll from the beginning. It is not required that you build a shared library for the workaround. Good luck.


ttdd11 commented Mar 26, 2019

Okay, sounds good. I'm just thinking, for ease of debugging, that if I build a dll I can link against it and test other code pretty easily. Does this sound like it should work?


dshawul commented Mar 26, 2019

Yes that sounds good. It is also good to have all tensorflow code in a separate module.


ttdd11 commented Mar 26, 2019

@dshawul this has been very helpful. How did you configure tensorflow with different options (such as CUDA 10) before running your build file? Did you just run configure.py first and then run the build as you described above?


ttdd11 commented Mar 27, 2019

@dshawul I've gotten the lib to build but sadly am unable to use it. Here is my code for the lib. It's very simple; I just want to see if SetDefaultDevice works on Windows.

This is my header:

#ifndef INFERENCELIB_H
#define INFERENCELIB_H

#ifdef TF_API
#define TF_API __declspec(dllexport)
#else
#define TF_API __declspec(dllimport)
#endif

// Standard includes
#include <memory>

// tensorflow includes
#include "tensorflow/cc/ops/const_op.h"
#include "tensorflow/cc/ops/image_ops.h"
#include "tensorflow/cc/ops/standard_ops.h"
#include "tensorflow/core/framework/graph.pb.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/graph/default_device.h"
#include "tensorflow/core/graph/graph_def_builder.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/lib/core/stringpiece.h"
#include "tensorflow/core/lib/core/threadpool.h"
#include "tensorflow/core/lib/io/path.h"
#include "tensorflow/core/lib/strings/stringprintf.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/platform/init_main.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/util/command_line_flags.h"

class TF_API InferenceLib
{
public:
    InferenceLib(const char* strGraphPath, bool& bStatus, const int nGPU = -1);
    ~InferenceLib();

private:
    std::unique_ptr<tensorflow::Session> m_TFSession;
    bool loadGraphAndSession(const char* strPathGraph, int nGPU = -1);
};

#endif //INFERENCELIB_H

This is the cpp:

#include "InferenceLib.h"
using tensorflow::Flag;
using tensorflow::Tensor;
using tensorflow::Status;
using tensorflow::string;
using tensorflow::int32;

InferenceLib::InferenceLib(const char* strPathGraph,
bool& bStatus,
const int nGPU /=-1/)
{
if (loadGraphAndSession(strPathGraph, nGPU))
{
bStatus = true;
}
}

bool InferenceLib::loadGraphAndSession(const char* strPathGraph, int nGPU /= -1/)
{
bool bReturn = true;
std::string path = strPathGraph;
tensorflow::GraphDef graph_def;
Status load_graph_status = ReadBinaryProto(tensorflow::Env::Default(), path, &graph_def);
if (!load_graph_status.ok())
{
bReturn = false;
}

tensorflow::SessionOptions sessOps;
if (-1 != nGPU)	//here asssigning the graph to a specific gpu, useful is building multiple instances of this 
{
	std::string strGPU = std::to_string(nGPU);
	tensorflow::graph::SetDefaultDevice("/gpu:" + strGPU, &graph_def);
	sessOps.config.set_allow_soft_placement(true);
	sessOps.config.set_log_device_placement(true);
	sessOps.config.mutable_gpu_options()->set_allow_growth(true);					//this line avoids large preallocations (I don't care about this)																					//sessOps.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(50); //100 should be defauly
}

m_TFSession.reset(tensorflow::NewSession(sessOps));
Status session_create_status = (*this->m_TFSession).Create(graph_def);

if (!session_create_status.ok())
{
	bReturn = false;
}
return bReturn;

}

InferenceLib::~InferenceLib()
{

}
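
For reference, a minimal hedged usage sketch of this class from the consuming application (the graph path and GPU index are placeholders):

// Hypothetical caller; assumes InferenceLib.h is on the include path and the
// dll/lib pair is linked. "frozen_graph.pb" is a placeholder path.
#include "InferenceLib.h"

int main() {
    bool bLoaded = false;
    InferenceLib lib("frozen_graph.pb", bLoaded, /*nGPU=*/0);
    return bLoaded ? 0 : 1;
}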

My build file is:

load("//tensorflow:tensorflow.bzl", "tf_cc_shared_object")

tf_cc_shared_object(
name = "InfLib2.dll",
srcs = [
"InferenceLib.cpp",
"InferenceLib.h",
],
deps = [
"//tensorflow/cc:cc_ops",
"//tensorflow/cc:client_session",
"//tensorflow/core:tensorflow"
],
defines = [ "TENSORFLOW",
"TF_API"]
)

I run this in PowerShell:
bazel build --config=opt --config=monolithic //tensorflow/cc/InfLib2:InfLib2.dll

And the lib builds. But I get link errors when calling any function, and the .lib seems to be corrupted (at least depends says it's not a valid 32 or 64 bit lib).

Does this look correct? I've been on this for about a month now and don't seem to be any closer. Any help would be greatly appreciated. The symbols I need are actually in the dll, so I'm thinking it's how I built it.


dshawul commented Mar 27, 2019

I think you need to add more dependencies in 'deps' in your case, enough to cover all of your tensorflow includes. I had only three because I only included these:

#ifdef TENSORFLOW
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"
#include "tensorflow/core/graph/default_device.h"
#endif

You have many more includes than me. A good test to see whether the bazel solution works is to compile the example I provided in my original post, by putting it in //tensorflow/cc/example/example.cc and using the BUILD file below:

   load("//tensorflow:tensorflow.bzl", "tf_cc_binary")
   
   tf_cc_binary(
       name = "example",
       srcs = ["example.cc"],
       deps = [
           "//tensorflow/cc:cc_ops",
           "//tensorflow/cc:client_session",
           "//tensorflow/core:tensorflow",
      ],
   )
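
The corresponding build command would follow the same pattern as the earlier one (assuming the example sits in tensorflow/cc/example):

bazel build --config=opt --config=monolithic //tensorflow/cc/example:example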


ttdd11 commented Mar 27, 2019

I don't think I need all those includes. I will try again without them. Here is the link error for reference:

__cdecl tensorflow::core::GetVarint32PtrFallback unresolved symbol


ttdd11 commented Mar 27, 2019

@dshawul I think this is working now besides that one symbol. It is located in source\tensorflow\core\lib\core. I looked at the BUILD file for core, and I should be able to add "//tensorflow/core:lib" to get those symbols, but that did not work. Are you running inference from a loaded .pb with your app? It seems like the trouble line is where I use TensorShape, which your example doesn't use. Any ideas on how to get this tensorflow::core::GetVarint32PtrFallback symbol into the lib?
