Skip to content

Commit

Permalink
Adding support for sanitised builds (#182)
Browse files Browse the repository at this point in the history
* adding sanitiser flags and workflow file

* adding files to diff against

* cleaning up workflow file

* run only on workflow dispatch

* properly use caching

* setting flag as default to false

* cache the right folder

* Fix ASAN_OPTIONS typo

* Try using cachev2 instead of artifact for conan dependencies

* Use a local conan cache folder when using the cli, just like in faasm

* UBSan fix: Make sure created int pointer are 4-aligned

* ASan,TSan: Latch destruction race-condition fix

* ASan: MPI World test buffer too small

* ASan: fix mismatched new[]-delete

* ASan: out-of-bound read, this was a simple copy-paste error

* ASan: Copy temporary strings in RapidJSON

* fix #183: Use unique_ptr for all redis reply objects so they get cleaned up on exceptions and any scope exit

* ASan/leak: use unique_ptr for pulled and dirtyMask to free the arrays automatically in the destructor

* Change the BYTES macro to a template to catch potential type errors

* Replace all other uses of new/delete with unique_ptr/vector.
Also minor bugfix to the redis api, dequeueBytes did not return the actual number of bytes read - fixed that and the corresponding test.

* Add a TSan ignore list

* TSan: Fix data race in Scheduler testing code

* Ignore zeromq races

* TSan: fix shared_ptr data races (in a shared_ptr the counter is atomic, but ptr is not, and requires explicit atomic accesses)

C++20 has a different api for this, but not yet supported by libstdc++

* Ignore config change data races, as they could only affect tests

* TSan: Missing lock on scheduler reset

* Ignore catch2 bad signal handler errors in tsan

* TSan: Fix racy executor cleanup/deadlock

* Missing locks in SnapshotClient mock code

* MockMode needs to be atomic to work across threads without a mutex

* Missing mutexes in FunctionCallClient mock code

* More missing locks on reads in the scheduler

* Missing locks on *ThisHostResources in Scheduler

* Skip mpi::allGather as I don't know what this exactly does

* Catch2 REQUIRE is not thread-safe, use assert

* TSan snapshot fixes

* Run TSan with maximum history size so it doesn't lose stack traces

* Missing shared lock in KV State

* Disable two tests that break in TSan builds

* Stop gid generation from tripping TSan using relaxed atomics

* Avoid racy destruction on FlagWaiter just like the Latch

* Missing lock on a queue push

* Fix formatting

* Address review comments

* Remove mode checking from python as it's already done by cmake

* Fix gha cache key, add restore-keys to fall back on cache misses and try a different key

* Another attempt at fixing checksum hashes, this time use bash instead of GHA builtins

* Show full test and sanitiser logs, it halts on errors so still should be readable

Co-authored-by: Jakub Szewczyk <git@kubasz.xyz>
  • Loading branch information
csegarragonz and eigenraven authored Nov 30, 2021
1 parent dba98f1 commit 27ffd58
Show file tree
Hide file tree
Showing 43 changed files with 755 additions and 314 deletions.
264 changes: 264 additions & 0 deletions .github/workflows/sanitisers.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,264 @@
name: Sanitiser Checks

on:
push:
branches: [master]
pull_request:
branches: [master]
types: [opened, synchronize, reopened, ready_for_review]

jobs:
conan-cache:
if: github.event.pull_request.draft == false
runs-on: ubuntu-latest
env:
HOST_TYPE: ci
REDIS_QUEUE_HOST: redis
REDIS_STATE_HOST: redis
container:
image: faasm/faabric:0.2.2
defaults:
run:
working-directory: /code/faabric
services:
redis:
image: redis
steps:
# --- Code update ---
- name: "Fetch ref"
run: git fetch origin ${GITHUB_REF}:ci-branch
- name: "Check out branch"
run: git checkout --force ci-branch
# --- Set-up ---
- name: "Ping redis"
run: redis-cli -h redis ping
# --- Cache based on Conan version and ExternalProjects cmake source
- name: Get Conan version
id: get-conan-version
run: |
echo "::set-output name=conan-version::$(conan --version | tr -d '[:alpha:] ')"
shell: bash
- name: Hash ExternalProjects
id: hash-external
run: |
echo "::set-output name=hash-external::$(sha256sum cmake/ExternalProjects.cmake | cut '-d ' -f 1)"
shell: bash
- uses: actions/cache@v2
with:
path: '~/.conan'
key: ${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-${{ steps.hash-external.outputs.hash-external }}
restore-keys: |
${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-
${{ runner.os }}-
# --- Build dependencies
- name: "Build dependencies to be shared by all sanitiser runs"
run: inv dev.cmake -b Debug

address-sanitiser:
if: github.event.pull_request.draft == false
needs: [conan-cache]
runs-on: ubuntu-latest
env:
HOST_TYPE: ci
REDIS_QUEUE_HOST: redis
REDIS_STATE_HOST: redis
container:
image: faasm/faabric:0.2.2
defaults:
run:
working-directory: /code/faabric
services:
redis:
image: redis
steps:
# --- Code update ---
- name: "Fetch ref"
run: git fetch origin ${GITHUB_REF}:ci-branch
- name: "Check out branch"
run: git checkout --force ci-branch
# --- Set-up ---
- name: "Ping redis"
run: redis-cli -h redis ping
# --- Cache based on Conan version and ExternalProjects cmake source
- name: Get Conan version
id: get-conan-version
run: |
echo "::set-output name=conan-version::$(conan --version | tr -d '[:alpha:] ')"
shell: bash
- name: Hash ExternalProjects
id: hash-external
run: |
echo "::set-output name=hash-external::$(sha256sum cmake/ExternalProjects.cmake | cut '-d ' -f 1)"
shell: bash
- uses: actions/cache@v2
with:
path: '~/.conan'
key: ${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-${{ steps.hash-external.outputs.hash-external }}
restore-keys: |
${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-
${{ runner.os }}-
# --- Build with thread sanitising
- name: "Build tests with address sanitising"
run: inv dev.sanitise Address --noclean
- name: "Run tests with address sanitising on and print to stderr to file"
run: ./bin/faabric_tests
working-directory: /build/faabric/static
env:
ASAN_OPTIONS: "verbosity=1:halt_on_error=1"

thread-sanitiser:
if: github.event.pull_request.draft == false
needs: [conan-cache]
runs-on: ubuntu-latest
env:
HOST_TYPE: ci
REDIS_QUEUE_HOST: redis
REDIS_STATE_HOST: redis
container:
image: faasm/faabric:0.2.2
defaults:
run:
working-directory: /code/faabric
services:
redis:
image: redis
steps:
# --- Code update ---
- name: "Fetch ref"
run: git fetch origin ${GITHUB_REF}:ci-branch
- name: "Check out branch"
run: git checkout --force ci-branch
# --- Set-up ---
- name: "Ping redis"
run: redis-cli -h redis ping
# --- Cache based on Conan version and ExternalProjects cmake source
- name: Get Conan version
id: get-conan-version
run: |
echo "::set-output name=conan-version::$(conan --version | tr -d '[:alpha:] ')"
shell: bash
- name: Hash ExternalProjects
id: hash-external
run: |
echo "::set-output name=hash-external::$(sha256sum cmake/ExternalProjects.cmake | cut '-d ' -f 1)"
shell: bash
- uses: actions/cache@v2
with:
path: '~/.conan'
key: ${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-${{ steps.hash-external.outputs.hash-external }}
restore-keys: |
${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-
${{ runner.os }}-
# --- Build with thread sanitising
- name: "Build tests with thread sanitising"
run: inv dev.sanitise Thread --noclean
# --- Tests ---
- name: "Run tests with thread sanitising on and print stderr to file"
run: ./bin/faabric_tests
working-directory: /build/faabric/static
env:
TSAN_OPTIONS: "verbosity=1:halt_on_error=1:suppressions=/code/faabric/thread-sanitizer-ignorelist.txt:history_size=7"

undefined-sanitiser:
if: github.event.pull_request.draft == false
needs: [conan-cache]
runs-on: ubuntu-latest
env:
HOST_TYPE: ci
REDIS_QUEUE_HOST: redis
REDIS_STATE_HOST: redis
container:
image: faasm/faabric:0.2.2
defaults:
run:
working-directory: /code/faabric
services:
redis:
image: redis
steps:
# --- Code update ---
- name: "Fetch ref"
run: git fetch origin ${GITHUB_REF}:ci-branch
- name: "Check out branch"
run: git checkout --force ci-branch
# --- Set-up ---
- name: "Ping redis"
run: redis-cli -h redis ping
# --- Cache based on Conan version and ExternalProjects cmake source
- name: Get Conan version
id: get-conan-version
run: |
echo "::set-output name=conan-version::$(conan --version | tr -d '[:alpha:] ')"
shell: bash
- name: Hash ExternalProjects
id: hash-external
run: |
echo "::set-output name=hash-external::$(sha256sum cmake/ExternalProjects.cmake | cut '-d ' -f 1)"
shell: bash
- uses: actions/cache@v2
with:
path: '~/.conan'
key: ${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-${{ steps.hash-external.outputs.hash-external }}
restore-keys: |
${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-
${{ runner.os }}-
# --- Build with thread sanitising
- name: "Build tests with undefined sanitising"
run: inv dev.sanitise Undefined --noclean
# --- Tests ---
- name: "Run tests with undefined sanitising on and print stderr to file"
run: ./bin/faabric_tests
working-directory: /build/faabric/static
env:
UBSAN_OPTIONS: "print_stacktrace=1:halt_on_error=1"

leak-sanitiser:
if: github.event.pull_request.draft == false
needs: [conan-cache]
runs-on: ubuntu-latest
env:
HOST_TYPE: ci
REDIS_QUEUE_HOST: redis
REDIS_STATE_HOST: redis
container:
image: faasm/faabric:0.2.2
defaults:
run:
working-directory: /code/faabric
services:
redis:
image: redis
steps:
# --- Code update ---
- name: "Fetch ref"
run: git fetch origin ${GITHUB_REF}:ci-branch
- name: "Check out branch"
run: git checkout --force ci-branch
# --- Set-up ---
- name: "Ping redis"
run: redis-cli -h redis ping
# --- Cache based on Conan version and ExternalProjects cmake source
- name: Get Conan version
id: get-conan-version
run: |
echo "::set-output name=conan-version::$(conan --version | tr -d '[:alpha:] ')"
shell: bash
- name: Hash ExternalProjects
id: hash-external
run: |
echo "::set-output name=hash-external::$(sha256sum cmake/ExternalProjects.cmake | cut '-d ' -f 1)"
shell: bash
- uses: actions/cache@v2
with:
path: '~/.conan'
key: ${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-${{ steps.hash-external.outputs.hash-external }}
restore-keys: |
${{ runner.os }}-${{ steps.get-conan-version.outputs.conan-version }}-
${{ runner.os }}-
# --- Build with thread sanitising
- name: "Build tests with leak sanitising"
run: inv dev.sanitise Leak --noclean
# --- Tests ---
- name: "Run tests with leak sanitising on and print stderr to file"
run: ./bin/faabric_tests
working-directory: /build/faabric/static
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
build/

conan-cache/
# Clang
.clangd
compile_commands.json
Expand Down
15 changes: 15 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,21 @@ set(CMAKE_EXE_LINKER_FLAGS "-fuse-ld=lld")
# Compile comamnds for clang tools
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# Set-up use of sanitisers
if (FAABRIC_USE_SANITISER STREQUAL "Address")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address")
elseif (FAABRIC_USE_SANITISER STREQUAL "Thread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=thread -fsanitize-ignorelist=${CMAKE_CURRENT_LIST_DIR}/thread-sanitizer-ignorelist.txt")
elseif (FAABRIC_USE_SANITISER STREQUAL "Undefined")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=undefined")
elseif (FAABRIC_USE_SANITISER STREQUAL "Leak")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=leak")
elseif (FAABRIC_USE_SANITISER STREQUAL "Memory")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=memory")
elseif (NOT ((FAABRIC_USE_SANITISER STREQUAL "None") OR (NOT FAABRIC_USE_SANITISER)))
message(FATAL_ERROR "Invalid FAABRIC_USE_SANITISER setting: ${FAABRIC_USE_SANITISER}")
endif()

# Global include dir
set(FAABRIC_INCLUDE_DIR ${CMAKE_CURRENT_LIST_DIR}/include)

Expand Down
2 changes: 2 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ services:
- /usr/bin/docker:/usr/bin/docker
- ./:/code/faabric
- ./build:/build/faabric
- ./conan-cache/:/root/.conan
working_dir: /code/faabric
stdin_open: true
tty: true
Expand All @@ -27,6 +28,7 @@ services:
volumes:
- ./:/code/faabric
- ./build:/build/faabric
- ./conan-cache/:/root/.conan
working_dir: /build/faabric/static
environment:
- LOG_LEVEL=debug
Expand Down
14 changes: 9 additions & 5 deletions include/faabric/redis/Redis.h
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
#include <faabric/util/exception.h>

#include <hiredis/hiredis.h>
#include <memory>
#include <mutex>
#include <set>
#include <string>
Expand Down Expand Up @@ -64,6 +65,9 @@ return 0
)---";
};

using UniqueRedisReply =
std::unique_ptr<redisReply, decltype(&freeReplyObject)>;

class Redis
{

Expand Down Expand Up @@ -189,10 +193,10 @@ class Redis
std::vector<uint8_t> dequeueBytes(const std::string& queueName,
int timeout = DEFAULT_TIMEOUT);

void dequeueBytes(const std::string& queueName,
uint8_t* buffer,
size_t bufferLen,
int timeout = DEFAULT_TIMEOUT);
size_t dequeueBytes(const std::string& queueName,
uint8_t* buffer,
size_t bufferLen,
int timeout = DEFAULT_TIMEOUT);

void dequeueMultiple(const std::string& queueName,
uint8_t* buff,
Expand All @@ -211,7 +215,7 @@ class Redis

const RedisInstance& instance;

redisReply* dequeueBase(const std::string& queueName, int timeout);
UniqueRedisReply dequeueBase(const std::string& queueName, int timeout);
};

class RedisNoResponseException : public faabric::util::FaabricException
Expand Down
3 changes: 2 additions & 1 deletion include/faabric/scheduler/Scheduler.h
Original file line number Diff line number Diff line change
Expand Up @@ -123,7 +123,8 @@ class Scheduler
int getFunctionRegisteredHostCount(const faabric::Message& msg);

std::set<std::string> getFunctionRegisteredHosts(
const faabric::Message& msg);
const faabric::Message& msg,
bool acquireLock = true);

void broadcastFlush();

Expand Down
7 changes: 7 additions & 0 deletions include/faabric/snapshot/SnapshotServer.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#pragma once

#include <atomic>

#include <faabric/flat/faabric_generated.h>
#include <faabric/scheduler/Scheduler.h>
#include <faabric/snapshot/SnapshotApi.h>
Expand All @@ -12,6 +14,10 @@ class SnapshotServer final : public faabric::transport::MessageEndpointServer
public:
SnapshotServer();

// Returns how many diffs have been applied since started, useful for
// testing
size_t diffsApplied() const;

protected:
void doAsyncRecv(int header,
const uint8_t* buffer,
Expand All @@ -34,5 +40,6 @@ class SnapshotServer final : public faabric::transport::MessageEndpointServer

private:
faabric::transport::PointToPointBroker& broker;
std::atomic_size_t diffsAppliedCounter;
};
}
Loading

0 comments on commit 27ffd58

Please sign in to comment.