Driver does not work for DIND Kubernetes (important since standard dev environment for FfDL) #14

Open
fplk opened this issue Jan 14, 2019 · 2 comments

@fplk
Contributor

fplk commented Jan 14, 2019

I just opened a PR with Helm charts for the driver, which work fine on IBM Cloud. However, on DIND I run into issues with both the Helm charts and the manual driver installation: after a while the deployers go into Error and then CrashLoopBackOff. I think the underlying error is:

+ chmod 700 /host/root/.ssh/
+ chmod 600 /host/root/.ssh/authorized_keys
+ touch /host/etc/ssh/sshd_config
touch: cannot touch '/host/etc/ssh/sshd_config': No such file or directory
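
Since `touch` creates a missing file but not a missing parent directory, this error indicates that `/host/etc/ssh` does not exist on the node. A minimal sketch of how a deployer script might guard against that is shown below. It assumes the host filesystem is mounted at `/host`, as in the log above; `ensure_sshd_config` is a hypothetical helper name and not part of the actual deployer.

```shell
#!/usr/bin/env bash
# Sketch only: touch sshd_config under a mounted host root, but skip hosts
# that have no etc/ssh directory (which the log above suggests for DIND nodes).
# `ensure_sshd_config` is a hypothetical name, not taken from the deployer.
ensure_sshd_config() {
  local host_root="$1"
  local ssh_dir="${host_root}/etc/ssh"
  if [ -d "${ssh_dir}" ]; then
    # Directory exists, so touch can create or update the file.
    touch "${ssh_dir}/sshd_config"
  else
    # No sshd layout on this host: report it and carry on instead of failing.
    echo "skipping sshd setup: ${ssh_dir} does not exist" >&2
  fi
}
```

In a deployer this would presumably be called as `ensure_sshd_config /host`. Whether skipping the sshd step is acceptable for the driver on DIND is an open question that the maintainers would need to confirm.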

The following workaround (as described in https://github.com/fplk/ffdl-trainer/blob/dind_support/bin/README.md) helped me deploy the driver on DIND:

# Create storage volumes
cd "${GOPATH}/src/github.com/AISphere/ffdl-trainer/bin/cos_storage_driver"
make deploy-nfs-volume
make setup-cos-plugin
make create-volumes

# Currently necessary to copy over the library.
# 5f64e5843b30 is a container ID from my environment; substitute your own.
docker cp 5f64e5843b30:/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1 ~/libcrypto.so.1.1

# Copy both libcrypto versions into every DIND node container
declare -a arrNodes=($(docker ps --format '{{.Names}}' | grep "kube-node-\|kube-master"))
for node in "${arrNodes[@]}"
do
    docker cp /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 "${node}:/root/libcrypto.so.1.0.0"
    docker cp ~/libcrypto.so.1.1 "${node}:/root/libcrypto.so.1.1"
    # Move the libraries into the node's library path
    docker exec -i "${node}" /bin/bash <<_EOF
cp /root/libcrypto.so.1.0.0 /lib/x86_64-linux-gnu/
cp /root/libcrypto.so.1.1 /lib/x86_64-linux-gnu/
_EOF
done

The main idea is that the libcrypto libraries are missing on the DIND nodes, so I copy them over (s3fs is compiled and copied by the Make targets I call first). Once the binary and libraries have been deployed manually, I can launch the plugin and it works.
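
One way to confirm this diagnosis, before and after applying the workaround, is to check whether the two library files exist in the directory the workaround copies them to. The sketch below is an illustration only: `check_libs` is a hypothetical helper and not part of the driver or of FfDL.

```shell
#!/usr/bin/env bash
# Sketch only: report which of the named library files exist in a directory.
# Returns 0 if all are present, 1 if at least one is missing.
# `check_libs` is a hypothetical helper name.
check_libs() {
  local lib_dir="$1"
  shift
  local lib
  local missing=0
  for lib in "$@"; do
    if [ -e "${lib_dir}/${lib}" ]; then
      echo "found: ${lib}"
    else
      echo "missing: ${lib}"
      missing=1
    fi
  done
  return "${missing}"
}
```

Inside a node container it could be invoked as `check_libs /lib/x86_64-linux-gnu libcrypto.so.1.0.0 libcrypto.so.1.1`; a non-zero exit status means at least one of the files is absent.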

However, it would be better if you could integrate this fix into the deployer so that we can use the Helm charts from #13 for all relevant targets. Since DIND is the main local development environment for FfDL, it would be great if you could test against it.

@fplk
Contributor Author

fplk commented Jan 15, 2019

Ultimately, it would be great if the Helm charts from https://github.ibm.com/watson-foundation-services/kubernetes-cluster-admin/issues/2366 worked. However, when I try them, the last step throws the following error:

ERROR: Cluster provider is not supported. Exiting!!!

Any thoughts on this? Could you add support for DIND?

@MohammedFadin

MohammedFadin commented Jun 25, 2019

@fplk I'm facing the same issue. I'm not sure why this Helm chart still hasn't been updated, even though it's included in the official documentation!
