Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tencent GooseFS Cache Engine to fluid backend #912

Merged
merged 8 commits into from
Jul 20, 2021

Conversation

xieydd
Copy link
Member

@xieydd xieydd commented Jul 9, 2021

Signed-off-by: chrisydxie chrisydxie@tencent.com

Ⅰ. Describe what this PR does

Add support for tencent goosefs engine for fluid backend.
Proposal can be found in https://docs.google.com/document/d/1t6wQNP1X7TDlWVZCq0dURK0msIAAN5XWmR-ubIWju6Q/edit#heading=h.8nkpj2y22jpd.

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

In next few pr, i will add addation test case for improving the test cover rate.

Ⅴ. Special notes for reviews

@cheyang @RongGu @TrafalgarZZZ cc

@xieydd xieydd force-pushed the add-goosefs-runtime branch from 523a03b to ed12216 Compare July 9, 2021 04:18
@xieydd xieydd closed this Jul 9, 2021
@xieydd xieydd reopened this Jul 9, 2021
@xieydd xieydd force-pushed the add-goosefs-runtime branch 2 times, most recently from c0209fe to ec806f5 Compare July 9, 2021 09:35
@codecov
Copy link

codecov bot commented Jul 9, 2021

Codecov Report

Merging #912 (5db4011) into master (df7e5cf) will increase coverage by 3.18%.
The diff coverage is 43.57%.

❗ Current head 5db4011 differs from pull request most recent head 681a908. Consider uploading reports for the commit 681a908 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #912      +/-   ##
==========================================
+ Coverage   43.57%   46.76%   +3.18%     
==========================================
  Files         122      162      +40     
  Lines        6079     8780    +2701     
==========================================
+ Hits         2649     4106    +1457     
- Misses       3179     4221    +1042     
- Partials      251      453     +202     
Impacted Files Coverage Δ
pkg/ddc/alluxio/shutdown.go 22.59% <ø> (ø)
pkg/ddc/goosefs/create_volume.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/operations/cached.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/operations/conf.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/replicas.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/status.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/transform.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/transform_hadoop_config.go 0.00% <0.00%> (ø)
pkg/ddc/goosefs/types.go 100.00% <ø> (ø)
pkg/ddc/goosefs/types_selector.go 90.00% <ø> (ø)
... and 90 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update df7e5cf...681a908. Read the comment docs.

@xieydd xieydd force-pushed the add-goosefs-runtime branch from ec806f5 to 4eec434 Compare July 9, 2021 09:43
@cheyang cheyang requested a review from yangyuliufeng July 14, 2021 07:02
@xieydd xieydd force-pushed the add-goosefs-runtime branch from 6dab0ae to 614a6b4 Compare July 14, 2021 08:02
case common.ALLUXIO_RUNTIME:
imageName, imageTag = docker.GetWorkerImage(r.Client, databackup.Spec.Dataset, "alluxio", databackup.Namespace)
javaEnv = "-Dalluxio.master.hostname=" + ip + " -Dalluxio.master.rpc.port=" + strconv.Itoa(int(rpcPort))
runtimeType = "alluxio"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use common.ALLUXIO_RUNTIME as runtimeType ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@cheyang cheyang requested a review from TrafalgarZZZ July 14, 2021 10:53
ctx.Log.Error(err, "Failed to get the ddc runtime")
return utils.RequeueIfError(errors.Wrap(err, "Unable to get ddc runtime"))
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest this error processing logic :

return ctrl.Result{},errors.Wrapf(utils.IgnoreNotFound(err),"failed to get ddc runtime %s",ctx.NamespacedName)

utils.RequeueIfError there is no need for encapsulation, and it is not very readable

ctx.Log.V(1).Info("process the request", "request", req)

// 1.Load the Runtime
runtime, err := r.getRuntime(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRuntime there is no need for encapsulation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i am not clear your question. Can you describe more details? Thanks a lot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that getRuntime is a simple logic, and we don't really need to wrap a function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the logic is sample, but i think warp function is ok for me : ) . WDYT @cheyang

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think getRuntime is need for encapsulation because every developer need to implement their own getRuntime function

FINALIZER = "fluid-databackup-controller-finalizer"
ALLUXIO_BACPUP_PATH_POD = "/alluxio_backups"
GOOSEFS_BACPUP_PATH_POD = "/goosefs_backups"
DATABACKUP_CHART = "fluid-databackup"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CamelCase ? CodeReviewComments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a folder name, we can follow the naming habits of folders. : )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean the name of the variable, not the value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, clear, lgtm

@@ -297,6 +297,17 @@ func GetRuntimeInfo(client client.Client, name, namespace string) (RuntimeInfoIn
}
runtimeInfo.SetupFuseDeployMode(jindoRuntime.Spec.Fuse.Global, jindoRuntime.Spec.Fuse.NodeSelector)
return runtimeInfo, nil
case "goosefs":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unifies a constant called common.GooseFS ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tks, lgtm

@@ -8,6 +8,7 @@ CRD_OPTIONS ?= "crd:trivialVersions=true"
DATASET_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/dataset-controller
ALLUXIORUNTIME_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/alluxioruntime-controller
JINDORUNTIME_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/jindoruntime-controller
GOOSEFSRUNTIME_CONTROLLER_IMG ?= registry.aliyuncs.com/fluid/goosefsruntime-controller
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating it.

release: {{ .Release.Name }}
role: dataload-job
data:
dataloader.init: |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to change to dataloader.goosefs.init to be in parallel with the other two runtimes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -243,7 +243,7 @@ func (e *AlluxioEngine) parseFuseImage(image string, tag string, imagePullPolicy
}

func (e *AlluxioEngine) GetMetadataInfoFile() string {
return cdatabackup.BACPUP_PATH_POD + "/" + e.GetMetadataInfoFileName()
return cdatabackup.ALLUXIO_BACPUP_PATH_POD + "/" + e.GetMetadataInfoFileName()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo here. Should be ALLUXIO_BACKUP_PATH_POD and GOOSEFS_BACKUP_PATH_POD

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tks, i will fix it.

@@ -0,0 +1,7 @@
FROM centos:centos8.2.2004
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need runtime-specific scripts to do initialization like init-users.sh or chmod_fuse_mountpoint.sh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of maintainability and extensibility, I feel like runtime-specific will be better. WDYT @cheyang

Copy link
Member

@yangyuliufeng yangyuliufeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good code.
1、In the future, we can refactor the logic of backup to logic of engine like dataload.
2、After discuss, we decide to change the name of the constant to CamelCase
3. In the past, there were many irregularities in naming when developing AlluxioRuntime. During your developing, it’s not necessary to copy the original naming convention(such as the name is same as package)

Comment on lines 95 to 101
func main() {
if err := cmd.Execute(); err != nil {
fmt.Fprintf(os.Stderr, "%s", err.Error())
os.Exit(1)
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the discuss, we decide to split main to multi files.
You can refer to main of webhook

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it may be not in this PR. we can do this in future. Thanks.

Comment on lines 5 to 19
GOOSEFS_RUNTIME = "goosefs"

GOOSEFS_MOUNT_TYPE = "fuse.goosefs-fuse"

GOOSEFS_NAMESPACE = "goosefs-system"

GOOSEFS_CHART = GOOSEFS_RUNTIME

GOOSEFS_RUNTIME_IMAGE_ENV = "GOOSEFS_RUNTIME_IMAGE_ENV"

GOOSEFS_FUSE_IMAGE_ENV = "GOOSEFS_FUSE_IMAGE_ENV"

DEFAULT_GOOSEFS_RUNTIME_IMAGE = "registry.aliyuncs.com/fluid/goosefs:v1.0.1"

DEFAULT_GOOSEFS_FUSE_IMAGE = "registry.aliyuncs.com/fluid/goosefs-fuse:v1.0.1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the discuss, we decide to use CamelCase in the future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

RunAs *User `json:"runAs,omitempty"`

// Disable monitoring for GooseFS Runtime
// Promethous is enabled by default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prometheus

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tjs, i will fix it.

ctx.Log.V(1).Info("process the request", "request", req)

// 1.Load the Runtime
runtime, err := r.getRuntime(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think getRuntime is need for encapsulation because every developer need to implement their own getRuntime function

}

e.Log.Info("check the desired and current replicas",
"desriedReplicas", replicas,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

desried

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tks , i will fix it.


// transform dataset which has ufsPaths and ufsVolumes
func (e *GooseFSEngine) transformInitUsers(runtime *datav1alpha1.GooseFSRuntime, value *GooseFS) {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think every independent function need to check nil pointer at the beginning, not only transform.go

@@ -260,7 +260,7 @@ func (e *AlluxioEngine) syncMetadataInternal() (err error) {
// sync local dir if necessary
for _, mount := range dataset.Spec.Mounts {
if common.IsFluidNativeScheme(mount.MountPoint) {
localDirPath := utils.UFSPathBuilder{}.GenLocalStoragePath(mount)
localDirPath := utils.UFSPathBuilder{}.GenAlluxioLocalStoragePath(mount)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about GenAlluxioLocalStoragePath -> GenLocalStoragePathForAlluxio ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@@ -73,7 +73,8 @@ func GetAccessModesOfDataset(client client.Client, name, namespace string) (acce
func IsTargetPathUnderFluidNativeMounts(targetPath string, dataset datav1alpha1.Dataset) bool {
for _, mount := range dataset.Spec.Mounts {

mPath := UFSPathBuilder{}.GenAlluxioMountPath(mount, dataset.Spec.Mounts)
mAlluxioPath := UFSPathBuilder{}.GenAlluxioMountPath(mount, dataset.Spec.Mounts)
mGooseFSPath := UFSPathBuilder{}.GenGooseFSMountPath(mount, dataset.Spec.Mounts)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wonder what does the mGooseFSPath look like? Do we need duplicate GenGooseFSUFSRootPath ? What's your opinion? @allenhaozi

Copy link
Contributor

@allenhaozi allenhaozi Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a DataSet be used by two Runtime simultaneously ?
Or whether GooseFS can follow the same rules as Aluxio,

So we don't need a duplicate GenXXXMountPath ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, Dataset should can be used for different type runtime.
Secondly, at present, GooseFS can follow ths same rules as Alluxio, but the future may be different.

@@ -41,6 +41,22 @@ func (u UFSPathBuilder) GenAlluxioMountPath(curMount datav1alpha1.Mount, mounts
return fmt.Sprintf(common.AlluxioMountPathFormat, curMount.Name)
}

// dataset.spec.mounts mount to goosefs instance strategy:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concern on this package. Seems it's the utils for alluxio or Goose instead of all the runtime. If we really think it's not common. Maybe the common part can be placed in UFSPathBuilder. How about provide AlluxioUFSPathBuilder and GooseUFSPathBuilder?

type GooseUFSPathBuilder struct{
*UFSPathBuilder
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @allenhaozi WDYT

Copy link
Contributor

@allenhaozi allenhaozi Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to know the difference ?
Whether they can all follow one rule ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At present, it seems that it can be consistent, but we want to maintain expansibility and isolation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it fine to be consistent now? We can support GooseUFSPathBuilder in future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will seperate it in the future. @cheyang

v1.EventTypeNormal,
common.RuntimeNotReady,
"Bounded accelerate runtime not supported")
}

if len(imageName) == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap docker.GetImageTagFromEnv and docker.GetImageRepoFromEnv in docker.GetWorkerImage function ?

imageName, imageTag := docker.GetWorkerImage(r.Client, databackup.Spec.Dataset, runtimeType, databackup.Namespace)
image := fmt.Sprintf("%s:%s", imageName, imageTag)

Do we have the case where the image name and the image tag come from different places ?
Or can it be avoided ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we get image info from configmap and host env, will double check. But i think it is too strict.

defaultImageInfo := strings.Split(common.DEFAULT_ALLUXIO_RUNTIME_IMAGE, ":")
if len(defaultImageInfo) < 2 {
defaultImageInfo := strings.Split(defaultImage, ":")
if len(defaultImageInfo) < 1 {
panic("invalid default databackup image!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it have to be a panic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tks, i think it is necessary be panic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here if it's panic and what we see ? pod restart ?

return endpoint, err
}

if svc == nil {
Copy link
Contributor

@allenhaozi allenhaozi Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a situation where err and svc are both nil ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean you have right endpoint in kubernetes cluster but err and svc is still nil?

)

const (
runtimeType = "goosefs"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common.GooseFSRuntime.String() ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common.GooseFSRuntime, lgtm

func GetAddressOfMaster(pod *v1.Pod) (nodeName string, ip string, rpcPort int32) {
// TODO: Get address of master by calling runtime controller interface instead of reading pod object
nodeName = pod.Spec.NodeName
ip = pod.Status.HostIP
for _, container := range pod.Spec.Containers {
if container.Name == "alluxio-master" {
if container.Name == "alluxio-master" || container.Name == "goosefs-master" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest:
wrap a function with ut

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

value = q.String()

if strings.HasSuffix(value, "i") {
value = strings.ReplaceAll(value, "i", "B")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic strings.ReplaceAll means that 1Ki == 1KB is true ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, Ki is legal for kubernetes and KB is legal for Alluxio and GooseFS.

@xieydd xieydd force-pushed the add-goosefs-runtime branch from 046ffef to 3f04dff Compare July 17, 2021 16:31
@xieydd
Copy link
Member Author

xieydd commented Jul 17, 2021

@cheyang @TrafalgarZZZ @allenhaozi @yangyuliufeng Thanks for taking time for reviewing, i have fix some comment. Can we check the pr again. If any problem, ping me is welcome : )

@xieydd xieydd force-pushed the add-goosefs-runtime branch 3 times, most recently from 43cf762 to 3febac9 Compare July 19, 2021 04:31
xieydd added 8 commits July 20, 2021 00:09
Signed-off-by: xieydd <xieydd@gmail.com>

resolve ci bug

Signed-off-by: xieydd <xieydd@gmail.com>

Sync test

Signed-off-by: xieydd <xieydd@gmail.com>
…engine type

Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
Signed-off-by: xieydd <xieydd@gmail.com>
@xieydd xieydd force-pushed the add-goosefs-runtime branch from da35497 to 681a908 Compare July 19, 2021 16:09
@cheyang
Copy link
Collaborator

cheyang commented Jul 20, 2021

Thanks, @xieydd . Great work!

@cheyang cheyang merged commit 9a965ec into fluid-cloudnative:master Jul 20, 2021
@RongGu
Copy link
Member

RongGu commented Jul 20, 2021

Thank your for contributing the Tecent GooseFS Cache Engine as a Fuild backend. @xieydd !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants