Unable to use Faiss GPU on IBM Power9 (Python) #846

Closed
2 of 4 tasks
satyakrishnagorti opened this issue May 29, 2019 · 5 comments

satyakrishnagorti commented May 29, 2019

Summary

Hi, I managed to install Faiss on the IBM Power9 architecture. However, I am unable to use Faiss on the GPU from Python. I get the error:
module 'faiss' has no attribute 'StandardGpuResources'

I am able to use the CPU version fine.

I am pasting my makefile.inc below (see end of issue).

Edit:

I noticed a gpu folder with a Makefile, so I tried:

cd gpu/
make

I get the following error:

um.cu -o impl/BroadcastSum.o
impl/../utils/Tensor.cuh(362): error: type name is not allowed

impl/../utils/Tensor.cuh(362): error: expected an expression

2 errors detected in the compilation of "/tmp/tmpxft_0000965d_00000000-4_BroadcastSum.cpp4.ii".                                    
Makefile:67: recipe for target 'impl/BroadcastSum.o' failed
make: *** [impl/BroadcastSum.o] Error 2

Also, the configure argument --with-cuda-arch is not recognised. Is the documentation written for a version of Faiss newer than 1.4.0?

My g++ version is: 7.4.0
Cuda: 10.1

Thanks

Platform

IBM Power9

OS: Ubuntu 18.04 LTS

Faiss version: 1.4.0

Faiss compilation options:
LDFLAGS=-L/path/to/openblas/ ./configure --with-cuda=/usr/local/cuda

Running on:

  • CPU
  • GPU

Interface:

  • C++
  • Python

Reproduction instructions

import faiss
res = faiss.StandardGpuResources()
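
A quick way to tell a CPU-only build from a GPU build is to check which names the imported module actually exposes. The sketch below is an illustration, not part of the original report: `missing_gpu_names` and `report` are hypothetical helper names, and `index_cpu_to_gpu` is used as a second probe on the assumption that the GPU wrapper defines it alongside `StandardGpuResources`.

```python
import importlib

# Names the GPU Python wrapper is expected to expose. StandardGpuResources
# comes from the error message; index_cpu_to_gpu is an assumed second probe.
GPU_NAMES = ("StandardGpuResources", "index_cpu_to_gpu")


def missing_gpu_names(module, names=GPU_NAMES):
    """Return the names from `names` that `module` does not define."""
    return [name for name in names if not hasattr(module, name)]


def report(module_name="faiss"):
    """Import `module_name` and describe whether the GPU names are present."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return f"cannot import {module_name}: {exc}"
    missing = missing_gpu_names(module)
    origin = getattr(module, "__file__", "unknown location")
    if missing:
        return f"{module_name} loaded from {origin} but lacks: {', '.join(missing)}"
    return f"{module_name} loaded from {origin} exposes the GPU names"


if __name__ == "__main__":
    print(report())
```

If the names are reported as missing, the printed location shows which copy of the module was picked up, which helps rule out a stale CPU-only install shadowing the new build.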

Here is my makefile.inc

# Copyright (c) 2015-present, Facebook, Inc.                                                                                       
# All rights reserved.                                                                                                             
#                                                                                                                                  
# This source code is licensed under the BSD+Patents license found in the                                                          
# LICENSE file in the root directory of this source tree.

CXX          = g++ -std=c++11
CXXCPP       = g++ -std=c++11 -E
# TODO: Investigate the LAPACKE wrapper for LAPACK, which defines the correct                                                      
#   type for FORTRAN integers.
CPPFLAGS     = -DFINTEGER=int
CXXFLAGS     = -fPIC -fopenmp -m64 -Wno-sign-compare -g -O3 -Wall -Wextra                                                          
CPUFLAGS     =
LDFLAGS      = -fopenmp -L/home/layer6/anaconda3/envs/pytorch/lib/
LIBS         = -lopenblas
PYTHONCFLAGS = -I/usr/include/python2.7 -I/usr/include/powerpc64le-linux-gnu/python2.7 -I/home/layer6/anaconda3/lib/python3.7/site-packages/numpy/core/include

NVCC         = /usr/local/cuda/bin/nvcc
NVCCLDFLAGS  = -L/usr/local/cuda/lib64 -L/home/layer6/anaconda3/envs/pytorch/lib/                                                  
NVCCLIBS     = -lcudart -lcublas -lcuda
CUDAROOT     = /usr/local/cuda
CUDACFLAGS   = -I/usr/local/cuda/include
NVCCFLAGS    = -I $(CUDAROOT)/targets/ppc64le-linux/include/ \
-Xcompiler -fPIC \
-Xcudafe --diag_suppress=unrecognized_attribute \
-gencode arch=compute_35,code="compute_35" \
-gencode arch=compute_52,code="compute_52" \
-gencode arch=compute_60,code="compute_60" \
-gencode arch=compute_61,code="compute_61" \
-gencode arch=compute_70,code="compute_70" \
-gencode arch=compute_72,code="compute_72" \
-gencode arch=compute_75,code="compute_75" \
-lineinfo \
-ccbin $(CXX) -DFAISS_USE_FLOAT16

OS = $(shell uname -s)

SHAREDEXT   = so
SHAREDFLAGS = -shared

ifeq ($(OS),Darwin)
        SHAREDEXT   = dylib
        SHAREDFLAGS = -dynamiclib -undefined dynamic_lookup
endif

MKDIR_P      = /bin/mkdir -p
PYTHON       = python
SWIG         = swig

prefix      ?= /usr/local
exec_prefix ?= ${prefix}
libdir       = ${exec_prefix}/lib
includedir   = ${prefix}/include

Author

satyakrishnagorti commented May 29, 2019

Also, it is worth mentioning that I changed NVCCFLAGS from -I $(CUDAROOT)/targets/x86_64-linux/include/ to -I $(CUDAROOT)/targets/ppc64le-linux/include/ and removed the Intel flags from CPUFLAGS.

satyakrishnagorti changed the title from "Unable to use Faiss GPU on IBM Power9" to "Unable to use Faiss GPU on IBM Power9 (Python)" on May 29, 2019

Contributor

mdouze commented May 30, 2019

This is interesting. AFAIK, it is the first time that FAISS is compiled on a PPC architecture (so far only x64 and ARM). Is the PPC running in little endian mode?

The error that you get looks like a g++ error. This is surprising because the gcc version you are using is quite recent. Are you sure that nvcc is indeed using that compiler?
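
One way to start answering the compiler question is to print which `nvcc` and `g++` come first on PATH and what versions they report. This is a minimal sketch, assuming the host compiler is passed as `-ccbin g++` and therefore resolved through PATH; `tool_version` is a hypothetical helper name. The script only reports versions; it does not prove which host compiler nvcc actually invoked.

```python
import shutil
import subprocess


def tool_version(tool):
    """Return '<path>\\n<--version output>' for `tool`, or None if unavailable."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Some tools print their version banner on stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return f"{path}\n{output}"


if __name__ == "__main__":
    for tool in ("nvcc", "g++"):
        info = tool_version(tool)
        print(f"== {tool} ==")
        print(info if info is not None else "not found on PATH")
```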

Author

satyakrishnagorti commented May 30, 2019

Hi @mdouze, yes, the PPC is running in little-endian mode.

Things I found so far:

  1. I applied the fix from "GPU compilation fails: Tensor.cuh(362): type name is not allowed" (#751), as v1.4 did not include it.

  2. Since the machine has a Tesla V100, and version 1.4 seems to require CUDA_ARCH <= 70, I copied over the latest code from https://github.com/facebookresearch/faiss/blob/master/gpu/utils/DeviceDefs.cuh. With these two fixes, the GPU code compiles and I get libgpufaiss.so.

  3. I now only have trouble with Python. I get the following error when importing faiss:
    ModuleNotFoundError: No module named '_swigfaiss'

I have _swigfaiss.so and _swigfaiss_gpu.so on PYTHONPATH.

  4. I also found that only v1.4 builds on PPC. Any later version fails because it uses Intel-specific flags, although I cannot find any in makefile.inc.

Do you have an idea of what is going wrong?
Thanks!
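
A `ModuleNotFoundError` for a file that does exist usually means the running interpreter is not looking where the file is, or was not the interpreter the file was built for. The sketch below is an illustration under assumptions, not a confirmed diagnosis: it lists every `_swigfaiss*` shared library found in the directories on `sys.path` and prints the interpreter's version, include directory and accepted extension suffixes, so they can be compared with the Python include paths in PYTHONCFLAGS. `find_extension_files` is a hypothetical helper name.

```python
import importlib.machinery
import os
import sys
import sysconfig


def find_extension_files(stem, search_path=None):
    """List shared-library files starting with `stem` in the given directories."""
    search_path = sys.path if search_path is None else search_path
    found = []
    for entry in search_path:
        directory = entry or os.getcwd()  # an empty entry means the current dir
        if not os.path.isdir(directory):
            continue
        try:
            names = sorted(os.listdir(directory))
        except OSError:
            continue
        for name in names:
            if name.startswith(stem) and name.endswith((".so", ".pyd", ".dylib")):
                found.append(os.path.join(directory, name))
    return found


if __name__ == "__main__":
    print("interpreter :", sys.executable)
    print("version     :", sys.version.split()[0])
    print("include dir :", sysconfig.get_paths()["include"])
    print("ext suffixes:", importlib.machinery.EXTENSION_SUFFIXES)
    for path in find_extension_files("_swigfaiss") or ["(none found on sys.path)"]:
        print("candidate   :", path)
```

If no candidate is listed, the directory holding the .so files is not on the import path of this interpreter. If candidates are listed but the include directory differs from the one used at build time, a Python version mismatch is worth checking.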

Contributor

mdouze commented Jun 3, 2019

Could you try to import _swigfaiss and _swigfaiss_gpu directly? If there are missing symbols, they should appear in the error message.
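
The direct import can be scripted so that the full error text is kept for each module. A minimal sketch, assuming the compiled wrappers are named `_swigfaiss` and `_swigfaiss_gpu` as in the files mentioned earlier in the thread; `try_import` is a hypothetical helper name.

```python
import importlib


def try_import(name):
    """Import `name`; return (True, 'ok') or (False, '<ErrorType>: <message>')."""
    try:
        importlib.import_module(name)
    except Exception as exc:  # ImportError, or loader errors such as undefined symbols
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    for name in ("_swigfaiss", "_swigfaiss_gpu"):
        ok, detail = try_import(name)
        print(f"{name}: {detail}")
```

A message naming an undefined symbol points at a link problem, while "No module named ..." means the file was not found or not recognised by this interpreter.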

Author

satyakrishnagorti commented

I got a little busy with other things and cannot try this right away. I will close the issue for now and open another one if need be. Thanks
