Description

This course is all about CUDA programming. We start by looking at the basic concepts, including the CUDA programming model, execution model, and memory model. Then we show you how to implement advanced algorithms using CUDA. CUDA programming is all about performance, so throughout this course you will learn multiple optimization techniques and how to apply them when implementing algorithms. We also discuss profiling techniques extensively, along with several tools in the CUDA Toolkit, including nvprof, nvvp, CUDA-MEMCHECK, and CUDA-GDB. This course contains the following sections:

  • Introduction to CUDA programming and the CUDA programming model
  • CUDA execution model
  • CUDA memory model: global memory
  • CUDA memory model: shared and constant memory
  • CUDA streams
  • Tuning CUDA instruction-level primitives
  • Algorithm implementation with CUDA
  • CUDA tools

This course also includes many programming exercises and quizzes. Working through all of them will help you digest the concepts we discuss.

This course is the first in the CUDA Master Class series we are currently working on, so the knowledge you gain here is essential for following the later courses in the series.

What you'll learn

All the basic knowledge of CUDA programming

Ability to design and implement optimized parallel algorithms

Basic workflow of parallel algorithm design

Advanced CUDA concepts

Requirements

  • Basic C or C++ programming knowledge
  • How to use the Visual Studio IDE
  • CUDA Toolkit
  • NVIDIA GPU
  • Familiarity with the basic setup of a C++ project, such as how to change project properties

Course content

8 sections

Introduction to CUDA programming and CUDA programming model

20 lectures
Very very important (07:48)
Introduction to parallel programming (08:50)
Parallel computing and supercomputing (07:19)
Let's investigate some background (4 questions)
How to install the CUDA Toolkit and a first look at a CUDA program (06:12)
Basic elements of a CUDA program (16:50)
Organization of threads in a CUDA program - threadIdx (08:38)
Organization of threads in a CUDA program - blockIdx, blockDim, gridDim (06:14)
Programming exercise 1 (00:29)
Unique index calculation using threadIdx, blockIdx, and blockDim (09:20)
Unique index calculation for 2D grid 1 (05:53)
Unique index calculation for 2D grid 2 (05:09)
Memory transfer between host and device (11:13)
Programming exercise 2 (01:04)
Sum array example with validity check (09:13)
Sum array example with error handling (04:32)
Sum array example with timing (08:18)
Extend sum array implementation to sum up 3 arrays (1 question)
Device properties (05:30)
Summary (04:17)

CUDA Execution model

16 lectures
Understand the device better (08:46)
All about warps (09:43)
Warp divergence (12:28)
Resource partitioning and latency hiding 1 (05:35)
Resource partitioning and latency hiding 2 (10:41)
Occupancy (11:16)
Profile-driven optimization with nvprof (12:04)
Parallel reduction as synchronization example (19:08)
Parallel reduction as warp divergence example (10:11)
Parallel reduction with loop unrolling (07:03)
Parallel reduction with warp unrolling (06:48)
Reduction with complete unrolling (04:09)
Performance comparison of reduction kernels (05:18)
CUDA dynamic parallelism (10:03)
Reduction with dynamic parallelism (05:33)
Summary (04:36)

CUDA memory model

12 lectures
CUDA memory model (06:49)
Different memory types in CUDA (09:04)
Memory management and pinned memory (07:19)
Zero-copy memory (08:45)
Unified memory (04:39)
Global memory access patterns (12:55)
Global memory writes (03:53)
AOS vs SOA (06:03)
Matrix transpose (19:34)
Matrix transpose with unrolling (06:21)
Matrix transpose with diagonal coordinate system (08:36)
Summary (03:00)

CUDA Shared memory and constant memory

13 lectures
Introduction to CUDA shared memory (09:04)
Shared memory access modes and memory banks (09:06)
Row-major and column-major access to shared memory (08:51)
Static and dynamic shared memory (04:19)
Shared memory padding (05:44)
Parallel reduction with shared memory (04:44)
Synchronization in CUDA (03:38)
Matrix transpose with shared memory (11:53)
CUDA constant memory (13:10)
Matrix transpose with shared memory padding (05:47)
CUDA warp shuffle instructions (14:59)
Parallel reduction with warp shuffle instructions (03:50)
Summary (02:10)

CUDA Streams

8 lectures
Introduction to CUDA streams and events (06:25)
How to use CUDA asynchronous functions (07:10)
How to use CUDA streams (10:28)
Overlapping memory transfer and kernel execution (05:23)
Stream synchronization and blocking behavior of the NULL stream (06:57)
Explicit and implicit synchronization (02:31)
CUDA events and timing with CUDA events (06:03)
Creating inter-stream dependencies with events (04:31)

Performance Tuning with CUDA instruction level primitives

4 lectures
Introduction to different types of instructions in CUDA (04:01)
Floating-point operations (06:46)
Standard and intrinsic functions (08:29)
Atomic functions (08:22)

Parallel Patterns and Applications

6 lectures
Scan algorithm introduction (05:38)
Simple parallel scan (08:24)
Work-efficient parallel exclusive scan (09:33)
Work-efficient parallel inclusive scan (07:41)
Parallel scan for large data sets (04:52)
Parallel compact algorithm (07:49)

Bonus: Introduction to Image processing with CUDA

6 lectures
Introduction part 1 (08:04)
Introduction part 2 (11:41)
Digital image processing (09:39)
Digital image fundamentals: human perception (11:10)
Digital image fundamentals: image formation (15:22)
OpenCV installation (06:28)
