Implement Matrix class to abstract algorithms away from data storage details #54
Comments
I am in general against writing a matrix class, or using an existing matrix library. I am also a little against Matlab-style implementations. For example, `activations = input.exp();` effectively allocates two arrays, activations and probs, and then discards the temporaries. I do like the idea of separating interface from actual implementations.

Yangqing

On Thu, Jan 23, 2014 at 9:20 PM, kloudkl notifications@github.com wrote:
Thanks for your suggestions! In the larger context of this proposal, I have been wondering for a while what the vision, scope, prioritized dos, and don'ts of Caffe are. If you have a plan that can direct the community toward a shared destination, it would concentrate the limited resources out there and lead to more effective development and wider adoption in the near future.
Closed per #85.
Currently, the algorithm code is quite aware of the memory layout of the underlying data. Adding a Matrix class in between would separate the concerns of different modules, which is good software-engineering practice.
The biggest benefit is simpler code and improved development productivity. It would also make existing and future algorithms easier to understand. As a result, we would see faster development and wider adoption.
The Matrix class is intended to be a view of the 2D array contained in a Blob. Its main functionality is to provide high-level wrappers for the common operations.
This would let us write code like the following snippets.
The convolution:
The fully connected layer:
The ReLU activation:
activation = input.max(0);
The Softmax activation:
As you can see, the API is highly inspired by MATLAB, which also motivated ArrayFire's C++ interface. But of course the snippets are only rough sketches, and many more details need to be considered. For example, if the performance cost of boost move operations is too high, they could be replaced by shared_ptr, which would complicate user code a little. Another question is whether we should pass in a shared_ptr to the result matrix instead of returning it. More importantly, the GPU code may differ greatly from the CPU code, depending on whether CUDA plays well with the proposed API syntax.
Therefore, this issue's scope is limited to implementing the Matrix classes for both kinds of devices. Porting the algorithms should be deferred to independent issues until benchmark results show no performance gap between the low-level API and the proposed high-level API.
Efforts to refine the API and to help implement it are welcome.