[native] PrestoCpp build from source pipeline #18572

Merged: 4 commits merged into prestodb:master on Nov 28, 2022

@Mionsz Mionsz commented Oct 27, 2022

A proposal for a fully automated build-from-source process for presto-native-execution (PrestoCpp and Velox).
A README file is added for clarification. I appreciate any and all feedback.

Prestissimo - Dockerfile build

💡 PrestoDB repository: Presto - https://github.com/prestodb/presto

💡 Velox repository: Velox - https://github.com/facebookincubator/velox

Practical Velox implementation using PrestoCpp

📝 Note: This README and the build process were adapted from an internal pipeline. If you have questions, you can e-mail the author at milosz.linkiewicz@intel.com

Prestissimo, marked in the PrestoDB GitHub repository as 'presto-native-execution', is an effort to make PrestoDB even better, using the Velox library as a starting point. Both PrestoCpp and Velox are written mainly in low-level C and C++17, which makes building them from scratch very complicated. To simplify this, the Intel Cloud Native Data Services Team is introducing a 3-stage, fully automated Docker build process based on the unmodified project GitHub repository.

Quick Start

1. Clone this repository

git clone https://github.com/prestodb/presto prestodb

2. (Optional) Define and export Docker registry, image name and image tag variables

📝 Note: Remember to end your IMAGE_REGISTRY with / as this is required for full tag generation.

💡 Tip: Depending on your configuration, you may need to run all of the commands below as the root user. To switch, run sudo su as your first command.

💡 Tip: If IMAGE_REGISTRY is not specified, IMAGE_PUSH should be set to '0', or the docker image push stage will fail.

Type in your console, changing the variable values to meet your needs:

# defaults to 'avx', more info on Velox GitHub
export CPU_TARGET="avx"
# defaults to 'presto/prestissimo-${CPU_TARGET}-centos'
# (double quotes, so that the shell expands ${CPU_TARGET})
export IMAGE_NAME="presto/prestissimo-${CPU_TARGET}-centos"
# defaults to 'latest'
export IMAGE_TAG='latest'
# defaults to ''; registry host only, without a URL scheme, ending with '/'
export IMAGE_REGISTRY='my-docker-registry.com/'
# defaults to '0'
export IMAGE_PUSH='0'
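
The two rules from the note and the tip above (trailing `/` on the registry, no push without a registry) can be checked before starting a long build. This is a minimal sketch; `check_image_settings` is a hypothetical helper and not part of the PR scripts.

```shell
# Hypothetical pre-flight check; not part of the PR scripts.
check_image_settings() (
    registry="${IMAGE_REGISTRY:-}"
    push="${IMAGE_PUSH:-0}"
    # The full tag is built by plain concatenation, so a non-empty
    # registry has to end with '/'.
    case "${registry}" in
        ''|*/) ;;
        *) echo "IMAGE_REGISTRY must end with '/': ${registry}" >&2; exit 1 ;;
    esac
    # Pushing without a registry would fail at the push stage.
    if [ -z "${registry}" ] && [ "${push}" != "0" ]; then
        echo "IMAGE_PUSH must be '0' when IMAGE_REGISTRY is empty" >&2
        exit 1
    fi
)
```

It could be chained in front of the build, for example `check_image_settings && make runtime-container`.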

3. Make sure Docker daemon is running

(Ubuntu users) Type in your console:

systemctl status docker
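
On systems without systemd, a rough alternative is to ask the Docker client itself, since `docker info` exits with a non-zero status when it cannot reach the daemon. The sketch below assumes that behaviour; `docker_ready` is a hypothetical helper whose optional argument (the client binary) exists only to make it easy to try out.

```shell
# Hypothetical helper: succeeds when the given client (default: docker)
# can talk to its daemon.
docker_ready() {
    "${1:-docker}" info >/dev/null 2>&1
}

if docker_ready; then
    echo "Docker daemon is reachable"
else
    echo "Docker daemon is not reachable" >&2
fi
```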

4. Build Dockerfile repo

Type in your console:

cd prestodb/presto-native-execution
make runtime-container

The process is fully automated and requires no interaction from the user. Building the images for the first time can take up to a couple of hours (~1-2h using 10 processor cores).

5. Run container

📝 Note: Remember to start the Presto Java server first.

Depending on the values you have set, the container tag is defined as:

PRESTO_CPP_TAG="${IMAGE_REGISTRY}${IMAGE_NAME}:${IMAGE_TAG}"

For the default values this will be just:

PRESTO_CPP_TAG=presto/prestissimo-avx-centos:latest
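
The same composition, with the documented defaults filled in through shell parameter expansion, can be sketched as follows. `compose_tag` is a hypothetical name; the defaults are the ones listed in the Quick Start comments.

```shell
# Hypothetical helper that mirrors the tag formula above and falls back
# to the documented defaults for unset or empty variables.
compose_tag() (
    cpu_target="${CPU_TARGET:-avx}"
    image_name="${IMAGE_NAME:-presto/prestissimo-${cpu_target}-centos}"
    image_tag="${IMAGE_TAG:-latest}"
    image_registry="${IMAGE_REGISTRY:-}"
    printf '%s%s:%s\n' "${image_registry}" "${image_name}" "${image_tag}"
)

compose_tag
```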

To run the container built with the default tag, execute:

docker run "presto/prestissimo-avx-centos:latest" \
            --use-env-params \
            --discovery-uri=http://localhost:8080 \
            --http-server-port=8080

To run the container interactively, without executing the entrypoint file:

docker run -it --entrypoint=/bin/bash "presto/prestissimo-avx-centos:latest"

Container manual build

For a manual build outside the Intel network, or without access to the Cloud Native Data Services Poland Docker registry, follow the steps below.
In your terminal, in the same session in which you want to build the images, define and export the environment variables:

export CPU_TARGET="avx"
export IMAGE_NAME="presto/prestissimo-${CPU_TARGET}-centos"
export IMAGE_TAG='latest'
export IMAGE_REGISTRY='some-registry.my-domain.com/'
export IMAGE_PUSH='0'
export PRESTODB_REPOSITORY=$(git config --get remote.origin.url)
export PRESTODB_CHECKOUT=$(git show -s --format="%H" HEAD)

Here IMAGE_NAME and IMAGE_TAG are the Prestissimo release image name and tag. IMAGE_REGISTRY is the registry that the image will be tagged with, and which will be used to download the images from previous stages if they are not cached locally. CPU_TARGET can be left unchanged in most cases; for more information read the Velox documentation. PRESTODB_REPOSITORY and PRESTODB_CHECKOUT are used as the build repository and the revision to check out inside the container. You can set them manually or, as shown above, using git commands.
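
If the local clone uses an SSH origin, `git config --get remote.origin.url` returns a URL that a container without SSH keys may not be able to clone. One possible workaround is to rewrite it to HTTPS before exporting PRESTODB_REPOSITORY. This is a sketch only; `to_https_url` is a hypothetical helper, and the PR's own handling of SSH origins may differ.

```shell
# Hypothetical helper: rewrite common SSH remote forms to HTTPS.
# URLs that are already HTTPS pass through unchanged.
to_https_url() {
    printf '%s\n' "$1" | sed -E \
        -e 's#^git@([^:]+):#https://\1/#' \
        -e 's#^ssh://git@([^/]+)/#https://\1/#'
}

to_https_url 'git@github.com:prestodb/presto.git'
```

It could be used as `export PRESTODB_REPOSITORY="$(to_https_url "$(git config --get remote.origin.url)")"`.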

Then, for example, to build the containers from behind a proxy server, change to the Dockerfile directory and type:

cd presto-native-execution/scripts/release-centos-dockerfile
docker build \
    --network=host \
    --build-arg http_proxy  \
    --build-arg https_proxy \
    --build-arg no_proxy    \
    --build-arg CPU_TARGET  \
    --build-arg PRESTODB_REPOSITORY \
    --build-arg PRESTODB_CHECKOUT \
    --tag "${IMAGE_REGISTRY}${IMAGE_NAME}:${IMAGE_TAG}" .
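
A `--build-arg NAME` given without a value takes its value from the environment of the shell that runs `docker build`, so the variables have to be exported there. If you would rather pass the proxy arguments only when a proxy is actually configured, a small helper can generate them. This is a sketch; `proxy_build_args` is a hypothetical name.

```shell
# Hypothetical helper: print one "--build-arg NAME" line for every
# proxy variable that is set to a non-empty value.
proxy_build_args() (
    for name in http_proxy https_proxy no_proxy; do
        # Indirect lookup of the variable whose name is in ${name}.
        eval "value=\${${name}:-}"
        if [ -n "${value}" ]; then
            printf '%s %s\n' '--build-arg' "${name}"
        fi
    done
)
```

The output is meant to be word-split on purpose, for example `docker build --network=host $(proxy_build_args) --build-arg CPU_TARGET ...`.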

Build process - more info - prestissimo (with artifacts ~35 GB, without ~10 GB)

Most of the runtime and build-time dependencies are downloaded, configured and installed in this step. The result of this step is the starting point for both the second and the third stage. This container will be built 'once per breaking change' in any of the repositories. It can be used as a starting point for CI/CD integrated systems.
This step installs Maven, Java 8, Python3-Dev, libboost-dev and many other large frameworks, libraries and applications, and ensures that all of the steps in stage 2 run without errors.

On top of the container from step 1, the repository is initialized, Velox and the submodules are updated, and the adapters, connectors and side dependencies are built and configured. A full build of the native PrestoDB repository is done using the Maven wrapper mvnw. After all of those partial steps, make and build are run for PrestoCpp and Velox with Parquet, ORC, and the Hive connector with Thrift, with the S3-EMRFS filesystem implementation (scheme s3://) and the Hadoop filesystem implementation.

### DIRECTORY AND MAIN BUILD ARTIFACTS
## Native Presto JAVA build artifacts:
/root/.m2/

## Build, third party dependencies, mostly for adapters
/opt/dependency/
/opt/dependency/aws-sdk-cpp
/opt/dependency/install/
/opt/dependency/install/run/
/opt/dependency/install/bin/
/opt/dependency/install/lib64/

## Root PrestoDB application directory
/opt/presto/

## Root GitHub clone of PrestoDB repository
/opt/presto/_repo/

## Root PrestoCpp subdirectory
/opt/presto/_repo/presto-native-execution/

## Root Velox GitHub repository directory, as PrestoDB submodule
/opt/presto/_repo/presto-native-execution/Velox

## Root build results directory for PrestoCpp with Velox
/opt/presto/_repo/presto-native-execution/_build/release/
/opt/presto/_repo/presto-native-execution/_build/release/velox/
/opt/presto/_repo/presto-native-execution/_build/release/presto_cpp/

Release container build - contains mostly just the must-have runtime files, including the presto_server executable and some libraries. What goes into the final released container depends on user needs and can be adjusted.

Prestissimo - runtime configuration and settings

⚠️ Notice: With default settings, the presto-native-execution binary requires 32 GB of RAM to start. To override this and avoid the runtime error, add the line node.memory_gb=8 to node.properties.
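
One way to add or update that line without creating duplicates is sketched below. `set_property` is a hypothetical helper, not something shipped in the image; it rewrites the file so that the key appears exactly once.

```shell
# Hypothetical helper: set key=value in a properties file, replacing an
# existing definition of the same key if there is one.
set_property() (
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)"
    if [ -f "${file}" ]; then
        # Keep every line whose key (text before the first '=') differs.
        awk -F= -v k="${key}" '$1 != k' "${file}" > "${tmp}"
    fi
    printf '%s=%s\n' "${key}" "${value}" >> "${tmp}"
    cat "${tmp}" > "${file}"
    rm -f "${tmp}"
)
```

For example: `set_property /opt/presto/node.properties node.memory_gb 8`.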

Presto server with all dependencies can be found inside /opt/presto/, runtime name is presto_server. There are 2 ways of starting PrestoCpp using provided entry point /opt/entrypoint.sh.

1) Quick start - pass parameters to entrypoint

This is valid when running with Docker and with Kubernetes. This method is not advised; prefer mounting configuration files when using Kubernetes.

"/opt/entrypoint.sh --use-env-params --discovery-uri=http://presto-coordinaator.default.svc.cluster.local:8080 --http-server-port=8080"

2) Using in Kubernetes environment:

Mount the config file inside the container as /opt/presto/node.properties.template. Replace each variable with your configuration values or leave it as is:

Notice: Set the same values for the Java coordinator as for PrestoCpp. Version, location and environment should be the same, or you will get connection errors.

presto.version=0.273.3
node.location=datacenter-warsaw
node.environment=test-environment
node.data-dir=/var/presto/data
catalog.config-dir=/opt/presto/catalog
plugin.dir=/opt/presto/plugin
# node.id is generated and filled during machine startup if not specified
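
How the entrypoint generates node.id is not shown here. As an illustration only, the "fill it in if missing" behaviour could look like the sketch below. `ensure_node_id` is a hypothetical helper; it assumes a Linux host that exposes /proc/sys/kernel/random/uuid and falls back to `uuidgen` otherwise.

```shell
# Hypothetical helper: append a generated node.id unless the file
# already contains an uncommented node.id line.
ensure_node_id() (
    file="$1"
    if grep -q '^node\.id=' "${file}" 2>/dev/null; then
        exit 0
    fi
    if [ -r /proc/sys/kernel/random/uuid ]; then
        uuid="$(cat /proc/sys/kernel/random/uuid)"
    else
        uuid="$(uuidgen)"
    fi
    printf 'node.id=%s\n' "${uuid}" >> "${file}"
)
```

Calling it twice on the same file leaves the first generated value in place.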

Mount the config file inside the container as /opt/presto/config.properties.template. Replace each variable with your configuration values:

coordinator=false
http-server.http.port=8080
discovery.uri=http://presto-coordinaator.default.svc.cluster.local:8080

3) Hive-Metastore connector and S3 configuration:

For the minimum required configuration, just mount the file /opt/presto/catalog/hive.properties inside the container at the given path (fill hive.metastore.uri with your metastore endpoint address):

connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore-service.default.svc:9098
hive.pushdown-filter-enabled=true
cache.enabled=true

Settings required by the S3 connector and the Velox query engine. Replace them with your values; refer to the Presto Hive connector settings help:

hive.s3.path-style-access={{ isPathstyle }}
hive.s3.endpoint={{ scheme }}://{{ serviceFqdnTpl . }}:{{ portRest }}
hive.s3.aws-access-key={{ accessKey }}
hive.s3.aws-secret-key={{ secretKey }}
hive.s3.ssl.enabled={{ sslEnabled }}
hive.s3select-pushdown.enabled={{ s3selectPushdownFilterEnabled }}
hive.parquet.pushdown-filter-enabled={{ parquetPushdownFilterEnabled }}


Signed-off-by: Linkiewicz, Milosz milosz.linkiewicz@intel.com

@Mionsz Mionsz requested a review from a team as a code owner October 27, 2022 20:44
@Mionsz Mionsz force-pushed the private/mlinkiew/docker_release_build branch 3 times, most recently from 3f862d5 to 7d539c2 Compare October 27, 2022 22:25

Mionsz commented Oct 31, 2022

@majetideepak what do you think

Mionsz commented Oct 31, 2022

@mbasmanova

@mbasmanova

CC: @mshang816 @majetideepak Michael, Deepak, would you take a look?

@mbasmanova

CC: @raulcd

@mbasmanova

CC: @kgpai

Mionsz commented Oct 31, 2022

This pipeline builds the whole of PrestoCpp from source in a container. I have almost finished modifying the branch it is currently built from so that it can be included in this PR. I will finish the job in 2-3 days, as tomorrow is a day off in Poland :)

edit: here is the branch: master...Mionsz:presto:private/mlinkiew/docker_release_build_work

Mionsz commented Nov 2, 2022

@mbasmanova I have got first reply from user that managed to build prestoCpp from source using this script - #18497

@mbasmanova

> @mbasmanova I have got first reply from user that managed to build prestoCpp from source using this script - #18497

Nice. What did they say?

@raulcd raulcd left a comment

Hi @Mionsz ! Thanks! I tried to build using the README but should we refer on the README to build using the build-images.sh script or using the Makefile target directly instead of the docker command manually? I tried to follow the README to build but for example there's no reference on how to build the second image (release-dockerfile/Dockerfile). I'll give a try to build using the provided Makefile target instead of following the README.

Mionsz commented Nov 3, 2022

> > @mbasmanova I have got first reply from user that managed to build prestoCpp from source using this script - #18497
>
> Nice. What did they say?

They have managed to build the prestocpp using make command :)

@Mionsz Mionsz force-pushed the private/mlinkiew/docker_release_build branch 3 times, most recently from 96cf333 to beab4d3 Compare November 4, 2022 10:51

Mionsz commented Nov 4, 2022

@raulcd I have dramatically simplified the process - can you take a look?

@Mionsz Mionsz force-pushed the private/mlinkiew/docker_release_build branch 2 times, most recently from 63bbf72 to efac05f Compare November 5, 2022 06:53

linux-foundation-easycla bot commented Nov 5, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Mionsz / name: Miłosz Linkiewicz (99cd291, 048bb77629bde8247c04a7b34e0fdb5abeafa71f)

@Mionsz Mionsz force-pushed the private/mlinkiew/docker_release_build branch 4 times, most recently from 24f3d59 to 048bb77 Compare November 8, 2022 10:59

raulcd commented Nov 8, 2022

> @raulcd I have dramatically simplified the process - can you take a look?

I have built successfully using make runtime-container

Mionsz commented Nov 19, 2022

@mbasmanova @majetideepak @raulcd @kgpai are we considering moving this forward and addressing issues in new PR as this is becoming big and hard to maintain?
edit: By the way I have added MacOS workflow in the script if anyone would like to test.

Mionsz commented Nov 19, 2022

@kesavkolla Try with this build and the latest commits, as those fix lots of possible issues

kgpai commented Nov 19, 2022

@Mionsz Thanks, I get another failure now though :

#11 951.2 make[1]: Leaving directory '/opt/dependency/gperf-3.1/doc'
#11 951.2 + ln -s /usr/local/gperf/3_1/bin/gperf /usr/local/bin/
#11 951.2 + git clone https://github.com/facebook/folly
#11 951.2 fatal: destination path 'folly' already exists and is not an empty directory.

This happens when building presto_native after velox. Recently some changes re-added folly to the setup, so I believe that's the reason for this failure.

> @mbasmanova @majetideepak @raulcd @kgpai are we considering moving this forward and addressing issues in new PR as this is becoming big and hard to maintain?

I think we should attempt to get this PR into good shape before we merge.
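
The `fatal: destination path 'folly' already exists` error in the log above is what an unconditional `git clone` produces on a second run. A guard like the one below makes the step safe to repeat. This is a sketch with a hypothetical helper name, not the fix that was applied in the PR.

```shell
# Hypothetical helper: clone only when the target is not a git checkout yet.
clone_once() (
    url="$1"; dir="$2"
    if [ -d "${dir}/.git" ]; then
        echo "${dir} already cloned, skipping"
        exit 0
    fi
    git clone "${url}" "${dir}"
)
```

For example: `clone_once https://github.com/facebook/folly folly`.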

Mionsz commented Nov 19, 2022

> @Mionsz Thanks, I get another failure now though :
>
> #11 951.2 make[1]: Leaving directory '/opt/dependency/gperf-3.1/doc'
> #11 951.2 + ln -s /usr/local/gperf/3_1/bin/gperf /usr/local/bin/
> #11 951.2 + git clone https://github.com/facebook/folly
> #11 951.2 fatal: destination path 'folly' already exists and is not an empty directory.
>
> This happens when building presto_native after velox. Recently some changes readded folly to the setup so I believe thats the reason for this failure.
>
> > @mbasmanova @majetideepak @raulcd @kgpai are we considering moving this forward and addressing issues in new PR as this is becoming big and hard to maintain?
>
> I think we should attempt to get this PR in a good shape before we can merge.

Nah, you should fetch the repository and use this commit for building - I fixed this a couple of hours ago. They have reverted the folly build in their scripts, so the ! notation in our CentOS build is enough: it will ignore Folly clone errors and continue building.
image

raulcd commented Nov 21, 2022

Hi,
As there have been several changes on this PR, I've come back to try again.

I've been able to build the current PR on an Ubuntu 22.04.
Using the following environment variables:

export CPU_TARGET="avx"
export IMAGE_NAME='presto/prestissimo-avx-centos'
export IMAGE_TAG='latest'
export IMAGE_PUSH='0'
export PRESTODB_REPOSITORY=https://github.com/Mionsz/presto
export PRESTODB_CHECKOUT=2b8f6ca9364a8c8a2d3d3d0fdc2e1c6bde1591dd

Executing:

make runtime-container

I only had to modify DOCKER_BUILDKIT to 0 to avoid some ssh requirements:

-export DOCKER_BUILDKIT=1
+export DOCKER_BUILDKIT=0

When running the container using:

$ docker run presto/prestissimo-avx-centos:latest --use-env-params --discovery-uri=http://presto-coordinaator.default.svc.cluster.local:8080 --http-server-port=8080

I got the following error:

+ /opt/presto//presto_server --logtostderr=1 --v=1 -
ERROR: something wrong with flag 'folly_hazptr_use_executor' in file '/opt/dependency/folly/folly/synchronization/Hazptr.cpp'.  One possibility: file '/opt/dependency/folly/folly/synchronization/Hazptr.cpp' is being linked both statically and dynamically into this executable.

The folder /opt/dependency does not seem to exist in the container:

$ docker run --entrypoint /bin/bash -it presto/prestissimo-avx-centos:latest
bash-4.4$ ls -lrt /opt/   
total 12
-rwxrwxr-x 1 presto presto 3395 Nov 21 15:31 entrypoint.sh
-rwxrwxr-- 1 presto presto 3924 Nov 21 15:31 common.sh
drwxrwxr-x 1 presto root   4096 Nov 21 15:50 presto

I am confused because if I check the build logs folly seems to be installed as a system dependency as per log when building velox:

-- Found folly: /usr/local

and inside the container I can see:

$ docker run --entrypoint /bin/bash -it presto/prestissimo-avx-centos:latest
bash-4.4$ ls -lrt  /usr/local/lib/libfolly*
-rw-r--r-- 1 presto presto   5702634 Nov 21 15:17 /usr/local/lib/libfolly_test_util.a
-rw-r--r-- 1 presto presto   2140834 Nov 21 15:17 /usr/local/lib/libfolly_exception_tracer_base.a
-rw-r--r-- 1 presto presto   2729498 Nov 21 15:17 /usr/local/lib/libfolly_exception_tracer.a
-rw-r--r-- 1 presto presto   2770450 Nov 21 15:17 /usr/local/lib/libfolly_exception_counter.a
-rw-r--r-- 1 presto presto 492457304 Nov 21 15:17 /usr/local/lib/libfolly.a
-rw-r--r-- 1 presto presto   6926458 Nov 21 15:17 /usr/local/lib/libfollybenchmark.a

Mionsz commented Nov 21, 2022

@raulcd Could you try building it now? I have made a simple fix.
I had the same issue some time ago and, as far as I remember, adding -DBUILD_SHARED_LIBS=ON should help, but that would need a Velox PR. I have just added a sed command that comments out the Folly wget and cmake lines in the Velox script, so Folly should be built by the PrestoCpp script with the proper flags.
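
The exact sed command is not quoted in this thread. As an illustration of the technique only, the sketch below comments out every line that mentions folly and is not already a comment. The helper name and the sample input lines are hypothetical and are not taken from the Velox setup script.

```shell
# Hypothetical helper: print the file with every uncommented line that
# mentions "folly" prefixed by "# ". Leading indentation is preserved.
comment_out_folly() {
    sed -E '/folly/ s/^([[:space:]]*)([^#[:space:]])/\1# \2/' "$1"
}

# Made-up sample input, for demonstration only.
sample="$(mktemp)"
printf '%s\n' 'wget https://example.invalid/folly.tar.gz' 'install_fmt' > "${sample}"
comment_out_folly "${sample}"
rm -f "${sample}"
```

To edit a file in place with GNU sed, the same expression can be passed to `sed -i -E`.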

There is a real need for a pipeline between Velox and PrestoCpp so that one does not break the other. This is starting to get annoying, as the main focus seems to be keeping up with and fixing the breaking changes :-/

kgpai commented Nov 21, 2022

@Mionsz I was able to build the docker image fine with your latest changes. I am going to test the image out.

raulcd commented Nov 22, 2022

> @raulcd Could you try building it now?

I have tried and I can run presto now. I had to limit the node.memory_gb but all worked now for me. Thanks for the fix!

Mionsz commented Nov 22, 2022

@raulcd no problem, I am glad it helped :-)!
So you had to add something like node.memory_gb=8 in node.properties, yes? As far as I remember, the default is set to 32. I will include it in the README later, as this can be a common type of error.

raulcd commented Nov 22, 2022

@Mionsz yes. I added this to node.properties via the entrypoint.sh file:

+++ b/presto-native-execution/scripts/release-centos-dockerfile/opt/entrypoint.sh
@@ -55,6 +55,7 @@ function node_command_line_config()
   printf "node.location=torun-cluster\n"  >> "${PRESTO_HOME}/node.properties"
   printf "node.id=${NODE_UUID}\n"        >> "${PRESTO_HOME}/node.properties"
   printf "node.ip=$(hostname -I)\n"      >> "${PRESTO_HOME}/node.properties"
+  printf "node.memory_gb=8\n" >> "${PRESTO_HOME}/node.properties"
 }

Mionsz commented Nov 22, 2022

@raulcd check common.sh and entrypoint.sh now :-). I have added an automated way of handling the 32 GB memory constraint.

Leaving the ENV variable undefined allows the script to probe for a configuration that is valid for the runtime machine by trying to lower the node memory, starting from 32 down to 4.
If the value was set in any way - by a parameter, by an env variable (the script strictly tests whether the value is valid) or by attaching a prefilled template - the script leaves it alone and does not override it.
Edit:
For testing, the best option now is just to pass --node-memory-gb=8 as a parameter to the script.
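
The probing idea can be sketched as picking the largest candidate between 32 and 4 that fits the machine. Both the halving step and the helper name `pick_node_memory_gb` are assumptions made for illustration; the actual logic lives in common.sh and entrypoint.sh and may work differently.

```shell
# Hypothetical sketch: try 32, 16, 8 and 4 GB in turn and print the first
# value that does not exceed the available memory (given in whole GB).
pick_node_memory_gb() (
    available_gb="$1"
    candidate=32
    while [ "${candidate}" -ge 4 ]; do
        if [ "${candidate}" -le "${available_gb}" ]; then
            echo "${candidate}"
            exit 0
        fi
        candidate=$((candidate / 2))
    done
    echo "less than 4 GB of memory available" >&2
    exit 1
)
```

On Linux, the argument could be derived from /proc/meminfo, for example `pick_node_memory_gb "$(awk '/MemTotal/ {print int($2/1024/1024)}' /proc/meminfo)"`.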

Mionsz and others added 4 commits November 23, 2022 13:58
Fully automated build process for PrestoCpp and Velox from source.
README file added for clarification.

Signed-off-by: Linkiewicz, Milosz <milosz.linkiewicz@intel.com>
…andline flags provided nor config file mounted.

Fixes referencing reviewers suggestions
FBThrift paths issue fix
Added python six required dependency for FBThrift
Add cache registry parameter for base image.
Added MacOS scripting full support
GitHub SSH cloned repository origin support
Minor improvements

Co-authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Linkiewicz, Milosz <milosz.linkiewicz@intel.com>
Signed-off-by: Linkiewicz, Milosz <milosz.linkiewicz@intel.com>
Due to the high memory requirements on the PrestoCpp side,
a script for preflight checking has been introduced.
For user-defined memory values the constraints are strict.
For the default value the script tries to lower 32GB to match the hardware

Signed-off-by: Linkiewicz, Milosz <milosz.linkiewicz@intel.com>
@Mionsz Mionsz force-pushed the private/mlinkiew/docker_release_build branch from 9cd5d50 to 3d3a92d Compare November 23, 2022 13:59

kgpai commented Nov 27, 2022

@Mionsz Was able to test this out and works as expected.

Mionsz commented Nov 28, 2022

@mbasmanova @majetideepak

Mionsz commented Nov 28, 2022

Could you review @kgpai ?

kgpai commented Nov 28, 2022

@Mionsz Can you fix/validate that the other modules failure is unrelated ?

@kgpai kgpai closed this Nov 28, 2022
@kgpai kgpai reopened this Nov 28, 2022

Mionsz commented Nov 28, 2022

@kgpai Yes. Let us wait for the build result, and in case something fails I will check/fix it.

Mionsz commented Nov 28, 2022

@kgpai The only one that seems to have an issue is a Maven Java test. It is totally unrelated, as no changes beyond presto-native-execution were made:
image

Edit: BTW other branches have passed this but still have an error, why?:
https://github.com/prestodb/presto/actions/runs/3566313041

Mionsz commented Nov 28, 2022

@kgpai Notice that the warning/issue with fbthrift that is present in other builds is no longer there:

https://github.com/prestodb/presto/actions/runs/3566915657/jobs/5993964104#step:8:67

-- Found FBThrift: /usr/local

https://github.com/prestodb/presto/actions/runs/3566313044/jobs/5992587501#step:8:66

CMake Warning at CMakeLists.txt:159 (find_package):
  By not providing "FindFBTHRIFT.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "FBTHRIFT",
  but CMake did not find one.

  Could not find a package configuration file provided by "FBTHRIFT" with any
  of the following names:

    FBTHRIFTConfig.cmake
    fbthrift-config.cmake

  Add the installation prefix of "FBTHRIFT" to CMAKE_PREFIX_PATH or set
  "FBTHRIFT_DIR" to a directory containing one of the above files.  If
  "FBTHRIFT" provides a separate development package or SDK, be sure it has
  been installed.

Mionsz commented Nov 28, 2022

@kgpai all other green :-)

@kgpai kgpai merged commit bffedbb into prestodb:master Nov 28, 2022
@wanglinsong wanglinsong mentioned this pull request Jan 12, 2023