CSC 5241 - GPU Programming

4 lecture hours, 0 lab hours, 4 credits

Course Description
This course provides an introduction to GPU programming. Topics include parallel programming paradigms, the CUDA programming model and libraries, code profiling, optimization strategies, GPU architecture, parallel algorithms, and applications of GPU acceleration. Students will implement linear algebra operations, image processing algorithms, parallel algorithm patterns, and application case studies, and will compare their implementations with CUDA libraries. The course ends with a multi-week team project.
Prereq: CSC 2210 or instructor consent (quarter system prereq: CS 2040 or consent of instructor)
Note: This course is open to qualified undergraduate students.

Course Learning Outcomes
Upon successful completion of this course, the student will be able to:

Program massively parallel processors using the CUDA programming API, tools, and techniques
Design and develop algorithms that take advantage of highly parallel co-processors to solve technical and scientific problems
Apply the principles and patterns of parallel algorithms
Describe NVIDIA GPU architecture features and constraints
Understand data-parallel hardware in order to develop efficient algorithms
Design numerical methods optimized for data-parallel architectures
Leverage data-parallel hardware to process large data sets common in modern big data applications
Implement and analyze parallel algorithm patterns in the CUDA programming model
Identify performance bottlenecks in parallel code
Improve performance by applying common parallel techniques
Compare the performance of a from-scratch parallel implementation with that of an existing CUDA library when both are applied to accelerate a unique application
Review and analyze recent research papers on using GPU programming to accelerate scientific computing applications

Prerequisites by Topic

A working knowledge of the C/C++ programming language
Familiarity with basic linear algebra

Course Topics

Heterogeneous parallel computing
GPU architecture
Data parallelism
CUDA programming structure
Mapping threads to multidimensional data
Memory access efficiency
CUDA memory types
Warps and SIMD hardware
Floating-point data representation
Numerical stability
Parallel patterns

Laboratory Topics

Querying hardware resources; coding environment and tools
Linear algebra operations, such as vector addition and matrix multiplication
Performance optimization leveraging shared and constant memory
Image processing algorithms, such as image blur, color-to-grayscale conversion, convolution, and stencil computations
Using CUDA libraries
Common parallel programming patterns
Application case studies

Coordinator
Dr. Sebastian Berisha
