-
Notifications
You must be signed in to change notification settings - Fork 990
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Tencent GooseFS Cache Engine to fluid backend #912
Conversation
523a03b
to
ed12216
Compare
c0209fe
to
ec806f5
Compare
Codecov Report
@@ Coverage Diff @@
## master #912 +/- ##
==========================================
+ Coverage 43.57% 46.76% +3.18%
==========================================
Files 122 162 +40
Lines 6079 8780 +2701
==========================================
+ Hits 2649 4106 +1457
- Misses 3179 4221 +1042
- Partials 251 453 +202
Continue to review full report at Codecov.
|
ec806f5
to
4eec434
Compare
6dab0ae
to
614a6b4
Compare
case common.ALLUXIO_RUNTIME: | ||
imageName, imageTag = docker.GetWorkerImage(r.Client, databackup.Spec.Dataset, "alluxio", databackup.Namespace) | ||
javaEnv = "-Dalluxio.master.hostname=" + ip + " -Dalluxio.master.rpc.port=" + strconv.Itoa(int(rpcPort)) | ||
runtimeType = "alluxio" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use common.ALLUXIO_RUNTIME
as runtimeType ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
ctx.Log.Error(err, "Failed to get the ddc runtime") | ||
return utils.RequeueIfError(errors.Wrap(err, "Unable to get ddc runtime")) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest this error processing logic :
return ctrl.Result{},errors.Wrapf(utils.IgnoreNotFound(err),"failed to get ddc runtime %s",ctx.NamespacedName)
utils.RequeueIfError
there is no need for encapsulation, and it is not very readable
ctx.Log.V(1).Info("process the request", "request", req) | ||
|
||
// 1.Load the Runtime | ||
runtime, err := r.getRuntime(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRuntime
there is no need for encapsulation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, i am not clear your question. Can you describe more details? Thanks a lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that getRuntime
is a simple logic, and we don't really need to wrap a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though the logic is sample, but i think warp function is ok for me : ) . WDYT @cheyang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think getRuntime is need for encapsulation because every developer need to implement their own getRuntime function
pkg/databackup/constants.go
Outdated
FINALIZER = "fluid-databackup-controller-finalizer" | ||
ALLUXIO_BACPUP_PATH_POD = "/alluxio_backups" | ||
GOOSEFS_BACPUP_PATH_POD = "/goosefs_backups" | ||
DATABACKUP_CHART = "fluid-databackup" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CamelCase ? CodeReviewComments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a folder name, we can follow the naming habits of folders. : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean the name of the variable, not the value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, clear, lgtm
pkg/ddc/base/runtime.go
Outdated
@@ -297,6 +297,17 @@ func GetRuntimeInfo(client client.Client, name, namespace string) (RuntimeInfoIn | |||
} | |||
runtimeInfo.SetupFuseDeployMode(jindoRuntime.Spec.Fuse.Global, jindoRuntime.Spec.Fuse.NodeSelector) | |||
return runtimeInfo, nil | |||
case "goosefs": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unifies a constant called common.GooseFS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tks, lgtm
@@ -8,6 +8,7 @@ CRD_OPTIONS ?= "crd:trivialVersions=true" | |||
DATASET_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/dataset-controller | |||
ALLUXIORUNTIME_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/alluxioruntime-controller | |||
JINDORUNTIME_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/jindoruntime-controller | |||
GOOSEFSRUNTIME_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/goosefsruntime-controller |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating it.
release: {{ .Release.Name }} | ||
role: dataload-job | ||
data: | ||
dataloader.init: | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to change to dataloader.goosefs.init
to be in parallel with the other two runtimes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pkg/ddc/alluxio/utils.go
Outdated
@@ -243,7 +243,7 @@ func (e *AlluxioEngine) parseFuseImage(image string, tag string, imagePullPolicy | |||
} | |||
|
|||
func (e *AlluxioEngine) GetMetadataInfoFile() string { | |||
return cdatabackup.BACPUP_PATH_POD + "/" + e.GetMetadataInfoFileName() | |||
return cdatabackup.ALLUXIO_BACPUP_PATH_POD + "/" + e.GetMetadataInfoFileName() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo here. Should be ALLUXIO_BACKUP_PATH_POD
and GOOSEFS_BACKUP_PATH_POD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tks, i will fix it.
@@ -0,0 +1,7 @@ | |||
FROM centos:centos8.2.2004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need runtime-specific scripts to do initialization like init-users.sh
or chmod_fuse_mountpoint.sh
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of maintainability and extensibility, I feel like runtime-specific will be better. WDYT @cheyang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good code.
1、In the future, we can refactor the logic of backup to logic of engine like dataload.
2、After discuss, we decide to change the name of the constant to CamelCase
3. In the past, there were many irregularities in naming when developing AlluxioRuntime. During your developing, it’s not necessary to copy the original naming convention(such as the name is same as package)
cmd/goosefs/main.go
Outdated
func main() { | ||
if err := cmd.Execute(); err != nil { | ||
fmt.Fprintf(os.Stderr, "%s", err.Error()) | ||
os.Exit(1) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the discuss, we decide to split main to multi files.
You can refer to main of webhook
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it may be not in this PR. we can do this in future. Thanks.
pkg/common/goosefs.go
Outdated
GOOSEFS_RUNTIME = "goosefs" | ||
|
||
GOOSEFS_MOUNT_TYPE = "fuse.goosefs-fuse" | ||
|
||
GOOSEFS_NAMESPACE = "goosefs-system" | ||
|
||
GOOSEFS_CHART = GOOSEFS_RUNTIME | ||
|
||
GOOSEFS_RUNTIME_IMAGE_ENV = "GOOSEFS_RUNTIME_IMAGE_ENV" | ||
|
||
GOOSEFS_FUSE_IMAGE_ENV = "GOOSEFS_FUSE_IMAGE_ENV" | ||
|
||
DEFAULT_GOOSEFS_RUNTIME_IMAGE = "registry.aliyuncs.com/fluid/goosefs:v1.0.1" | ||
|
||
DEFAULT_GOOSEFS_FUSE_IMAGE = "registry.aliyuncs.com/fluid/goosefs-fuse:v1.0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the discuss, we decide to use CamelCase in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
api/v1alpha1/goosefsruntime_types.go
Outdated
RunAs *User `json:"runAs,omitempty"` | ||
|
||
// Disable monitoring for GooseFS Runtime | ||
// Promethous is enabled by default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prometheus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tjs, i will fix it.
ctx.Log.V(1).Info("process the request", "request", req) | ||
|
||
// 1.Load the Runtime | ||
runtime, err := r.getRuntime(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think getRuntime is need for encapsulation because every developer need to implement their own getRuntime function
pkg/ddc/goosefs/worker.go
Outdated
} | ||
|
||
e.Log.Info("check the desired and current replicas", | ||
"desriedReplicas", replicas, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
desried
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tks , i will fix it.
|
||
// transform dataset which has ufsPaths and ufsVolumes | ||
func (e *GooseFSEngine) transformInitUsers(runtime *datav1alpha1.GooseFSRuntime, value *GooseFS) { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think every independent function need to check nil pointer at the beginning, not only transform.go
pkg/ddc/alluxio/metadata.go
Outdated
@@ -260,7 +260,7 @@ func (e *AlluxioEngine) syncMetadataInternal() (err error) { | |||
// sync local dir if necessary | |||
for _, mount := range dataset.Spec.Mounts { | |||
if common.IsFluidNativeScheme(mount.MountPoint) { | |||
localDirPath := utils.UFSPathBuilder{}.GenLocalStoragePath(mount) | |||
localDirPath := utils.UFSPathBuilder{}.GenAlluxioLocalStoragePath(mount) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about GenAlluxioLocalStoragePath
-> GenLocalStoragePathForAlluxio
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pkg/utils/dataset.go
Outdated
@@ -73,7 +73,8 @@ func GetAccessModesOfDataset(client client.Client, name, namespace string) (acce | |||
func IsTargetPathUnderFluidNativeMounts(targetPath string, dataset datav1alpha1.Dataset) bool { | |||
for _, mount := range dataset.Spec.Mounts { | |||
|
|||
mPath := UFSPathBuilder{}.GenAlluxioMountPath(mount, dataset.Spec.Mounts) | |||
mAlluxioPath := UFSPathBuilder{}.GenAlluxioMountPath(mount, dataset.Spec.Mounts) | |||
mGooseFSPath := UFSPathBuilder{}.GenGooseFSMountPath(mount, dataset.Spec.Mounts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wonder what does the mGooseFSPath
look like? Do we need duplicate GenGooseFSUFSRootPath
? What's your opinion? @allenhaozi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a DataSet be used by two Runtime simultaneously ?
Or whether GooseFS can follow the same rules as Aluxio,
So we don't need a duplicate GenXXXMountPath
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, Dataset should can be used for different type runtime.
Secondly, at present, GooseFS can follow ths same rules as Alluxio, but the future may be different.
pkg/utils/ufs_path_builder.go
Outdated
@@ -41,6 +41,22 @@ func (u UFSPathBuilder) GenAlluxioMountPath(curMount datav1alpha1.Mount, mounts | |||
return fmt.Sprintf(common.AlluxioMountPathFormat, curMount.Name) | |||
} | |||
|
|||
// dataset.spec.mounts mount to goosefs instance strategy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have concern on this package. Seems it's the utils for alluxio or Goose instead of all the runtime. If we really think it's not common. Maybe the common part can be placed in UFSPathBuilder. How about provide AlluxioUFSPathBuilder and GooseUFSPathBuilder?
type GooseUFSPathBuilder struct{
*UFSPathBuilder
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @allenhaozi WDYT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to know the difference ?
Whether they can all follow one rule ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At present, it seems that it can be consistent, but we want to maintain expansibility and isolation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it fine to be consistent now? We can support GooseUFSPathBuilder
in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will seperate it in the future. @cheyang
v1.EventTypeNormal, | ||
common.RuntimeNotReady, | ||
"Bounded accelerate runtime not supported") | ||
} | ||
|
||
if len(imageName) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we wrap docker.GetImageTagFromEnv
and docker.GetImageRepoFromEnv
in docker.GetWorkerImage
function ?
imageName, imageTag := docker.GetWorkerImage(r.Client, databackup.Spec.Dataset, runtimeType, databackup.Namespace)
image := fmt.Sprintf("%s:%s", imageName, imageTag)
Do we have the case where the image name and the image tag come from different places ?
Or can it be avoided ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we get image info from configmap and host env, will double check. But i think it is too strict.
defaultImageInfo := strings.Split(common.DEFAULT_ALLUXIO_RUNTIME_IMAGE, ":") | ||
if len(defaultImageInfo) < 2 { | ||
defaultImageInfo := strings.Split(defaultImage, ":") | ||
if len(defaultImageInfo) < 1 { | ||
panic("invalid default databackup image!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it have to be a panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tks, i think it is necessary be panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here if it's panic and what we see ? pod restart ?
return endpoint, err | ||
} | ||
|
||
if svc == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have a situation where err
and svc
are both nil ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean you have right endpoint in kubernetes cluster but err
and svc
is still nil?
) | ||
|
||
const ( | ||
runtimeType = "goosefs" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common.GooseFSRuntime.String()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common.GooseFSRuntime, lgtm
pkg/utils/databackup.go
Outdated
func GetAddressOfMaster(pod *v1.Pod) (nodeName string, ip string, rpcPort int32) { | ||
// TODO: Get address of master by calling runtime controller interface instead of reading pod object | ||
nodeName = pod.Spec.NodeName | ||
ip = pod.Status.HostIP | ||
for _, container := range pod.Spec.Containers { | ||
if container.Name == "alluxio-master" { | ||
if container.Name == "alluxio-master" || container.Name == "goosefs-master" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest:
wrap a function with ut
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
value = q.String() | ||
|
||
if strings.HasSuffix(value, "i") { | ||
value = strings.ReplaceAll(value, "i", "B") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic strings.ReplaceAll
means that 1Ki == 1KB is true ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, Ki is legal for kubernetes and KB is legal for Alluxio and GooseFS.
046ffef
to
3f04dff
Compare
@cheyang @TrafalgarZZZ @allenhaozi @yangyuliufeng Thanks for taking time for reviewing, i have fix some comment. Can we check the pr again. If any problem, ping me is welcome : ) |
43cf762
to
3febac9
Compare
Signed-off-by: xieydd <xieydd@gmail.com> resolve ci bug Signed-off-by: xieydd <xieydd@gmail.com> Sync test Signed-off-by: xieydd <xieydd@gmail.com>
…engine type Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
da35497
to
681a908
Compare
Thanks, @xieydd . Great work! |
Thank your for contributing the Tecent GooseFS Cache Engine as a Fuild backend. @xieydd ! |
Signed-off-by: chrisydxie chrisydxie@tencent.com
Ⅰ. Describe what this PR does
Add support for tencent goosefs engine for fluid backend.
Proposal can be found in https://docs.google.com/document/d/1t6wQNP1X7TDlWVZCq0dURK0msIAAN5XWmR-ubIWju6Q/edit#heading=h.8nkpj2y22jpd.
Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
In next few pr, i will add addation test case for improving the test cover rate.
Ⅴ. Special notes for reviews
@cheyang @RongGu @TrafalgarZZZ cc