Project: Alpaka task-parallel constructs for CMS #88

Merged
52 changes: 52 additions & 0 deletions projects/cms-alpaka.yml
@@ -0,0 +1,52 @@
---
name: Alpaka for CMS
postdate: 2024-03-06
categories:
- Analysis tools
- Open science
- Computing
durations:
- 3 months
experiments:
- CMS
skillset:
- C++
- CUDA
status:
- Available
project:
- IRIS-HEP
location:
- Any
commitment:
- Any
program:
- IRIS-HEP fellow
shortdescription: Extending the [Alpaka](https://github.com/alpaka-group/alpaka) performance portability library with task-parallel constructs for the [CMS pixel reconstruction](https://github.com/cms-patatrack/pixeltrack-standalone/)
description: >
This project proposes to extend the [Alpaka](https://github.com/alpaka-group/alpaka) performance portability library with task-parallel constructs,
such as task graphs and cooperative groups, and to evaluate their performance in the [pixel track](https://github.com/cms-patatrack/pixeltrack-standalone/)
reconstruction software of the CMS experiment at CERN. As data volume and complexity grow, CMS pixel
reconstruction, which is crucial for accurate particle tracking and collision event analysis, demands optimized
computational strategies for timely data processing. Alpaka provides a unified API for writing parallel
software for CPUs, GPUs, and FPGAs across diverse hardware architectures; the project will extend it
with task graph and cooperative groups APIs to meet this demand.

Integrating task graphs into Alpaka will streamline the scheduling and execution of interdependent tasks,
optimizing resource utilization and reducing time-to-solution for complex data analyses.
Cooperative groups will facilitate more flexible and efficient thread collaboration, crucial
for fine-grained parallelism and dynamic workload distribution. These developments aim to
improve the performance and scalability of CMS pixel track reconstruction algorithms,
ensuring faster and more accurate data analysis for high-energy physics research.

Students participating in this project will have the opportunity to contribute to
a state-of-the-art C++ library, working directly with a developer group that follows
best software engineering practices. They will gain hands-on experience in developing
and implementing advanced computational solutions within a real-world scientific framework,
strengthening their skills in high-performance computing and software development
in a collaborative, cutting-edge research environment.
contacts:
- name: Jiri Vyskocil
email: jiri@vyskocil.com
- name: Volodymyr Bezguba
email: v.bezguba@kau.edu.ua
60 changes: 30 additions & 30 deletions projects/uproot-awkwardforth-refactor.yml
@@ -20,39 +20,39 @@ commitment:
program:
- IRIS-HEP fellow
shortdescription: >
Keeping the functionality of Uproot's accelerated reading through AwkwardForth, but
making it more maintainable by removing mutable state and coding it in a functional style.
description: >
Uproot is a Python library for reading and writing ROOT files (the most common file
format in particle physics). While it is relatively fast at reading "columnar" data,
either arrays of numbers or arrays of numbers that are grouped into variable-length
lists, any other data type requires iteration, which is a performance limitation in
the Python language. ("for" loops in Python are hundreds of times slower than in compiled
languages.) To improve this situation, we introduced a domain-specific language (DSL)
called AwkwardForth, in which loops are much faster to execute than they are in Python
(again by factors of hundreds). This language was created in 2021 (https://arxiv.org/abs/2102.13516)
and added to Uproot in 2022 (https://arxiv.org/abs/2303.02202). In the end, an example
data structure (std::vector<std::vector<float>>) could be read 400× faster with
AwkwardForth than with Python. Users of Uproot don't have to opt in or change their
code; it just runs faster.

That would be the end of the story, except that the AwkwardForth-generating code in
Uproot has been very hard to maintain. In part, it's because it's doing something
complicated: generating code that runs later or generating code that generates code
that runs later. But it is also more complicated than it needs to be, with Python
objects that change their own attributes in arbitrary ways as information about what
AwkwardForth needs to be generated accumulates. The code would be much easier to read
and reason about if it were stateless or append-only (see: functional programming),
and it easily could be. The goal of this project is to restructure the AwkwardForth-generating
code in a functional style, to "remove the moving parts."

To be clear, the project will not require you to understand the AwkwardForth that is
being generated (though that's not a bad thing), and it will not require you to figure
out how to generate the right AwkwardForth for a given data type. This part of the problem
has been solved, and there are many unit tests that check correctness, so you can
do test-driven development. The project is about software engineering: how to structure
code so that it can be read and understood, while keeping the problem-solving aspect
unchanged.
contacts:
- name: Ioana Ifrim
email: ioana.ifrim@cern.ch