Intelligent volume provisioning in GD2 #466
Part of #417
@prashanthpai @aravindavk I feel this feature should be part of core glusterd2, i.e. I should update the existing volume create code. I will detect whether a size is passed in the request, and if it is, the dynamic volume provisioning code path will be executed.
@rishubhjain I can see that it's easy for you to implement it in volume create. Maybe it's suitable for a refactor in the far future, but not right away. The volume create handler is already non-trivial and quite long, and I don't prefer to add more complexity there right now. Besides, if parts of heketi become a library, it's cleaner for it to be imported in a plugin and middleware.
Once the initial functionality is in, would it be possible to allow the administrator to specify the workload or type for the volume to be created? This would enable some flexibility, which would be great to have in gadmin. Also, would it make sense to allow the administrator to optionally specify the hosts on which the bricks would be spread out? To sum up, I'm wondering if the following format of the volume create command in gadmin would make sense:
We can either have two separate request-type APIs for volume create or add fields to the existing one. I'd like inputs from @aravindavk, @kshlm and @raghavendra-talur on this. For the initial implementation, we are looking at the request API to be minimal, something along these lines:

```go
type VolCreateDynamicReq struct {
	Name string `json:"name"`
	Size string `json:"size"`
	Type string `json:"type,omitempty"`
}
```

One approach is for the API endpoint to be:

```go
func Heketi(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		q := r.URL.Query()
		if _, ok := q["dynamic"]; !ok {
			// pass through to the core volume create handler
			next.ServeHTTP(w, r)
			return
		}
		// unmarshal request into VolCreateDynamicReq type
		// process the dynamic request
		// do the heketi magic to figure out bricks, replicas and subvols
		// prepare devices/bricks
		// this is where you can (optionally) introduce heketi's
		// async model and reply to the client
		// send a normal volume create request to gd2
	})
}
```

With a separate API, the middleware can have a simple and efficient pass-through. Without a separate API for dynamic volume create, we'll have to:
With the middleware approach, we can't make brick preparation and the actual volume create a single Transaction. I think we should have this as part of the Volume Create handler itself.
Preparing the brick can be one of the Transaction steps; on success, populate the Subvols info in volinfo, which will be used by the other transaction steps. We can also add
@prashanthpai Sharing one API seems to be the better approach as it makes volume create a less complex process. I think introducing a small flow in the already existing volume create handler should do the job, though it then doesn't make dynamic_volume pluggable. Also, I think we should discuss which approach (normal volume create or dynamic volume create) will be used more w.r.t. the new direction that was discussed in meetings, and restructure the documentation accordingly.
I'm okay with making it part of volume create. However, I prefer the volume create handler to not be async. IIRC, the heketi way is async, i.e. give the client back an ID/URI of some sort that the client can check back on. Can someone from heketi confirm this?
Maybe.
Maybe not. Now you'll have more steps, some of them conditional. @kshlm Thoughts on this?
Yes, volume create is an async operation, and keeping it async seems to be the better approach for components such as OpenShift and Kubernetes.
What's the exact need for volume create to be an async operation? IMO, commands which do not involve a heavy-lifting transaction workflow can remain synchronous, and volume create is definitely one of them.
I am yet to go through the complete discussion, but I suggested it be an async op because, with brick creation and other transactions coming in with the plugin, it will become a long operation.
Yes, we discussed the same. Async is required if the unique ID is generated by the server and served back later. In GD2, the client gives the unique ID for the volume, so it is fine if you choose a synchronous model. There are other benefits to using async, though.
@prashanthpai @kshlm @aravindavk
The other model, where the URL changes for dynamic provisioning, is not acceptable. Either of the above two models should work. I suggest Type be replaced by Options if you wish to use it later for other purposes. It could be a map of key-value pairs.
@rishubhjain @raghavendra-talur The glusterd2 process/service can support async requests if deemed necessary for certain operations. However, the volume create REST handler will remain synchronous. This means that the core handler will not be dealing with serving/replying to asynchronous requests. The "asynchronicity" shall be added by the heketi middleware. It will handle only async requests, convert them to synchronous requests and pass them down. The middleware will maintain the state of asynchronous requests (job queue/ID) in its store namespace.
I still think we should not implement this as middleware, because volume create with size would not be treated as a single Transaction. If the middleware succeeds and the actual volume create fails, then no rollback is available for the middleware (another API would be required to clean up). We can split this into two parts,
With this approach, all the steps will be executed in the same Transaction. Let me know your thoughts.
Agreed. That is the better approach. I'm only not in favour of making the volume create handler async. The middleware should be the one tracking the state of the async request. All our core handlers will be synchronous.
We can make requests async later if required. We need to work on providing a general framework for handling async requests, including volume create.
Sharing some notes about the Intelligent Volume Provisioning required from Glusterd2. Please add if any change is required to the logic/implementation.

Cluster Setup
Attach all the nodes using
Status: Already available

Register the available devices
Register available devices for each Peer. This also prepares the device by creating
Example Request
This needs to be done for all the Peer nodes.
Status: Already available

Peer Grouping
If all Peers belong to different failure domains, then configuring
Once the grouping is available, if a Replica 3 Volume is requested,
Without the Group information, Bricks will be created as,
If Group information is modified after Volume Create, no change to be
Note: If Group is not configured,
Status: Patch under review

Choosing Bricks for a Volume
Heketi's Simple Ring Allocator
Storing the Ring in the DB is not flexible since we need to update the
This gives more flexibility to choose bricks compared to choosing
Pseudo code to choose bricks,
The above logic looks similar to the Simple Ring Allocator, but Glusterd2 will
The following details are required from the user

FAQs
That general framework is the async middleware described above.
From a gadmin perspective (well, UX perspective in general), it would be nice to be able to split up a transaction into steps and provide updates to the user about the steps, so as to provide a 'progress report'. I would like all your thoughts regarding the following:
Looking at debugging scenarios where a problem needs to be traced to the block devices, the composition of the volume via subvolumes on specific per-node block devices would probably be necessary information. We need to consider this as a user experience problem across all the gd2 APIs, rather than just this particular API. The consistency of the information presented should, IMHO, be a prime concern.
The comment #466 (comment) talks about multiple APIs (cluster register, add devices, volume create). Glusterd2 will not combine all these steps into a single transaction. Async APIs are low priority right now; we would like to see the functionality working with the synchronous API.
A Transaction can have multiple steps (already supported); once we add async API support, a status breakdown is possible for Transaction steps (for example, Step 1: Complete, Step 2: In progress, etc.). It is also possible to give other information like the number of steps in a Transaction, time taken for each step, etc. Note: We are not planning for the API to accept a list of steps from the user and turn it into a Transaction. New Glusterd2 APIs should be implemented as plugins or middleware only.
Volume create will return the Volume info of the volume created, which will have subvolume details.
Subvolume and device information is already available. Please provide more details on this use case.
@aravindavk If the group is changed and no changes are made to the already created volumes, then won't the volume lose its property of being distributed or replicated?
Volume functionality is unaffected, but if two peers that were in different zones earlier have now moved to the same zone, then the brick distribution is not optimal. It is also possible that multiple bricks of the same replica reside in the same zone. But moving bricks on a zone/group change is an expensive operation; for now, we can keep this as a known issue.
New issues have been opened for enhancements to this feature. Closing this issue since the main feature is merged.
As of now in GD1, a user has to provide the exact volume and brick topology details as part of the volume create request to carve out storage. Going forward, GD2 should have a way to intelligently provision volumes where one can just specify the size, and GD2 should rely on its algorithm to carve out the relevant volume for the user.
This GitHub issue is a tracker to assess the work required to enable this capability in GD2.
Checklist for Definition of Done