OpenCL Homework: Parallel Prefix Minimum Computation
Background:
Hillis and Steele introduced the parallel prefix sum algorithm [1] (Algorithm 1). In a subsequent paper [2], the authors, in Algorithm 4 (prefixMin2), adapted the parallel prefix sum algorithm to function as a parallel prefix minimum algorithm.
Dear Students, The Prefix minimum algorithms (Algorithm 3 and Algorithm 4) are new algorithms published two years ago, with limited documentation available. AI tools like ChatGPT may not provide direct solutions for this homework. Please study the paper [2] carefully (the PrefixMin2 sections) and reach out via the course forum or during office hours if you need help.
Objective:
Develop an OpenCL kernel to compute the parallel prefix minimum of a given vector [[b]], following the conditions defined in the vector [[T]]. In the paper[2], two versions of the PrefixMin2 algorithm are presented, labeled as Algorithm 3 and Algorithm 4. The creation of this algorithm (prefix minimum or PrefixMin2) is based on the parallel prefix sum algorithm. First, you need to implement Algorithm 3 or Algorithm 4 in C++ and then convert this C++ code into OpenCL/C++ code.
Requirements:
1- Implement the Kernel:
Write an OpenCL kernel function that computes the prefix minimum for elements of vector [[b]] based on ranges specified in vector [[T]].
- This example may help you gain a deeper understanding of the prefix minimum concept.
Input Data : vector<int> b = { 3, 7, 2, 8, 4, 6, 5, 9, 1, 0, 4, 5, 6, 2, 3 } vector<int> t = { 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3 } // Result [ 3, 3, 2, 8, 4, 4, 5, 5, 1, 0, 4, 4, 4, 2, 2 ]
2- Host Code (C++):
- Configure the OpenCL environment using C++.
- Create and initialize vectors [[b]] and [[T]].
- Create OpenCL buffers and assign kernel arguments.
- Launch the kernel execution and retrieve the results.
- Display the resulting vector after performing the prefix minimum computation.
3- Testing:
- Test the kernel with various input sizes and different vector values to ensure correctness.
- Compare the output of the OpenCL implementation with a sequential CPU-based implementation for correctness validation.
4- Report:
- Provide a report detailing the implementation, including an explanation of how the kernel computes the prefix minimum and the role of the vector [[T]].
- Discuss the parallelism in the kernel and the benefits of using OpenCL for such computations.
- Include a performance analysis comparing the execution times of CPU and GPU implementations across different input sizes.
Hints
- The notation [[b]] denotes a vector of values (array).
- For [[e]] ← min([[b]], [[d]]), the result is the minimum value in the corresponding value either in [[b]] or [[d]].
- The 'choose' operation, denoted as [[b]] ← choose([[T]] = [[U]], [[e]], [[b]]), signifies that the resulting value ([[b]]) is [[e]] if the condition ([[T]]=[[U]]) holds true, and it remains [[b]] if the condition ([[T]]=[[U]]) is false.
Notes:
This assignment offers students an opportunity to practice parallel programming concepts using OpenCL, understand data parallelism, concurrency, and explore optimizations for parallel algorithms. It also encourages them to analyze performance and understand the trade-offs between CPU and GPU implementations.