OpenCL Homework: Parallel Prefix Minimum Computation
Background:
Hillis and Steele introduced the parallel prefix sum algorithm [1] (Algorithm 1). In a subsequent paper [2], the authors, in Algorithm 4 (prefixMin2), adapted the parallel prefix sum algorithm to function as a parallel prefix minimum algorithm.
Objective:
Implement an OpenCL kernel for computing the parallel prefix minimum of a given vector [[b]] based on conditions specified in vector [[T]].
Requirements:
1- Implement the Kernel:
Write an OpenCL kernel function that computes the prefix minimum for elements of vector b based on ranges specified in vector T.
2- Host Code (C++):
- Set up the OpenCL environment in C++.
- Create and initialize vectors [[b]] and [[T]].
- Create OpenCL buffers and set kernel arguments.
- Execute the kernel and retrieve the result.
- Print the resultant vector after the prefix minimum computation.
3- Testing:
- Test the kernel with various input sizes and different vector values to ensure correctness.
- Compare the output of the OpenCL implementation with a sequential CPU-based implementation for correctness validation.
4- Report:
- Provide a report explaining the implementation details, including how the kernel computes the prefix minimum and the role of the vector [[T]].
- Discuss the parallelism in the kernel and the advantages of using OpenCL for such computations.
- Include the performance analysis, comparing the execution time of the CPU and GPU implementations for different input sizes.
Hints
- The paper[2] presents an alternative version of this algorithm, labeled as Algorithm 3. The creation of this algorithm (prefix minimum or prefixMin2 ) is based on the parallel prefix sum algorithm. Anyway, You need first to write a C++ code for Algorithm 4 or Algorithm 3. Later, convert this C++ code to OpenCL/C++ code.
- The notation [[b]] denotes a vector of values (array).
- For [[e]] ← min([[b]], [[d]]), the result is the minimum value in the corresponding value either in [[b]] or [[d]].
- The 'choose' operation, denoted as [[b]] ← choose([[T]] = [[U]], [[e]], [[b]]), signifies that the resulting value ([[b]]) is [[e]] if the condition ([[T]]=[[U]]) holds true, and it remains [[b]] if the condition ([[T]]=[[U]]) is false.
Notes:
This assignment offers students an opportunity to practice parallel programming concepts using OpenCL, understand data parallelism, concurrency, and explore optimizations for parallel algorithms. It also encourages them to analyze performance and understand the trade-offs between CPU and GPU implementations.