Hyeonjoon-Nam/Cuda-Study-Journey

CUDA Learning Journey

This repository documents my progress in learning CUDA programming and High-Performance Computing (HPC). The goal is to understand GPU hardware architecture well enough to write highly optimized kernels.

Environment

  • GPU: NVIDIA GeForce RTX 3070 Laptop GPU
  • IDE: Visual Studio 2022
  • Toolkit: CUDA 13.1
  • Profiler: NVIDIA Nsight Compute / Nsight Systems

Project List

| # | Project | Key Concepts | Status |
|---|---|---|---|
| 01 | Vector Addition | Grid-Stride Loop, Unified Memory, Profiling | Done |
| 02 | Matrix Multiplication | Shared Memory, Tiling, Vectorized Access (float4) | Done |
| 03 | Parallel Reduction | Warp Divergence, Loop Unrolling, Volatile, Bank Conflicts | Done |
| 04 | N-Body Simulation | Compute vs Memory Bound, Tiling, Thread Coarsening, Occupancy | Done |
| 05 | Spatial Partitioning | Uniform Grid, Atomic Operations | Integrated into Project 06 |
| 06 | Heterogeneous HPC System | Wireless UDP, CUDA-GL Interop, Swarm Logistics | In Progress |

Current Focus: Project 06 - Heterogeneous HPC Simulation System

This project builds an end-to-end control pipeline that spans low-level hardware, systems programming, and High-Performance Computing. It supports both wired (bare-metal UART) and wireless (UDP over Wi-Fi) telemetry.

System Architecture

The system simulates an Edge Computing environment in which an external input node (ESP32 or Arduino) controls a large particle simulation ($N = 16{,}384$) in real time. A dedicated I/O thread on the host receives the input and hands it to the simulation.

graph LR
    subgraph Input_Nodes
        A[Wireless: ESP32] -- "UDP (Wi-Fi)" --> B
        E[Legacy: Arduino] -- "UART (Serial)" --> B
    end
    
    subgraph Host_PC
        B[IO Thread: Udp/Serial Reader] -- std::atomic --> C[HPC Core: CUDA Kernel]
        C -- Zero-Copy Interop --> D[Render: OpenGL]
    end

(Text Representation) [ESP32/Arduino] --(UDP/UART)--> [IO Thread: Receiver] --(Atomic Memory)--> [HPC Core: CUDA Kernel] --(Interop)--> [Render: OpenGL]
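The `std::atomic` hand-off between the I/O thread and the simulation loop could look roughly like the sketch below. It is not the repository's code: `ControlSample`, `LatestControl`, and the two 16-bit axes are hypothetical names and a hypothetical packet layout, chosen so that a whole sample fits in one 32-bit atomic word.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical control sample: two 16-bit axis readings (layout is an assumption).
struct ControlSample {
    std::uint16_t x;
    std::uint16_t y;
};

// Pack both axes into one 32-bit word so a sample is published in a single store.
inline std::uint32_t pack(ControlSample s) {
    return (static_cast<std::uint32_t>(s.x) << 16) | s.y;
}

inline ControlSample unpack(std::uint32_t w) {
    return ControlSample{static_cast<std::uint16_t>(w >> 16),
                         static_cast<std::uint16_t>(w & 0xFFFFu)};
}

// Single-slot mailbox: the writer overwrites, the reader always sees the newest sample.
class LatestControl {
public:
    // Would be called by the I/O thread whenever a UDP or serial packet arrives.
    void publish(ControlSample s) {
        word_.store(pack(s), std::memory_order_release);
    }

    // Would be called by the simulation loop once per frame, before the kernel launch.
    ControlSample latest() const {
        return unpack(word_.load(std::memory_order_acquire));
    }

private:
    std::atomic<std::uint32_t> word_{0};
};
```

In this design the reader never blocks and never sees a half-written sample, at the cost of dropping intermediate packets. That trade-off suits a control input where only the most recent value matters.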

Key Technical Objectives & Results

  • Wireless Modernization: Implemented UDP telemetry between an ESP32-S3 (SoftAP mode) and a C++ WinSock2 receiver, so the input node no longer needs a USB cable.
  • HPC Core (CUDA & OpenGL): Zero-copy rendering through CUDA-OpenGL interop, with uniform-grid spatial partitioning to keep the simulation real-time.
  • Embedded Interface (Bare-metal): Direct register manipulation (ADMUX, UBRR0) in place of the standard Arduino libraries, to minimize input latency.
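To make the uniform-grid idea concrete, here is a CPU reference sketch of one common way to build such a grid: count particles per cell, prefix-sum the counts, then scatter particle indices into cell order. The approach and all names (`Vec2`, `UniformGrid`, `buildGrid`) are assumptions for illustration and are not taken from the project. On the GPU the counting and scatter steps would typically use `atomicAdd`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

struct UniformGrid {
    int cols, rows;
    float cellSize;
    std::vector<int> cellStart;  // cols*rows + 1 entries; cell c owns [cellStart[c], cellStart[c+1])
    std::vector<int> sortedIds;  // particle indices grouped by cell
};

// Map a position to its row-major cell index, clamping to the grid bounds.
inline int cellIndex(const UniformGrid& g, Vec2 p) {
    int cx = static_cast<int>(p.x / g.cellSize);
    int cy = static_cast<int>(p.y / g.cellSize);
    cx = cx < 0 ? 0 : (cx >= g.cols ? g.cols - 1 : cx);
    cy = cy < 0 ? 0 : (cy >= g.rows ? g.rows - 1 : cy);
    return cy * g.cols + cx;
}

inline UniformGrid buildGrid(const std::vector<Vec2>& pos, int cols, int rows, float cellSize) {
    assert(cols > 0 && rows > 0 && cellSize > 0.0f);
    UniformGrid g{cols, rows, cellSize,
                  std::vector<int>(cols * rows + 1, 0),
                  std::vector<int>(pos.size())};

    // 1. Count particles per cell (an atomicAdd per particle on the GPU).
    for (const Vec2& p : pos) ++g.cellStart[cellIndex(g, p) + 1];

    // 2. Running sum turns the counts into the first slot of each cell.
    for (int c = 0; c < cols * rows; ++c) g.cellStart[c + 1] += g.cellStart[c];

    // 3. Scatter each particle index into the next free slot of its cell.
    std::vector<int> cursor(g.cellStart.begin(), g.cellStart.end() - 1);
    for (std::size_t i = 0; i < pos.size(); ++i)
        g.sortedIds[cursor[cellIndex(g, pos[i])]++] = static_cast<int>(i);

    return g;
}
```

A neighbor query then only scans the `sortedIds` ranges of the cells around a particle, rather than all $N$ particles.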

View Full Project & Code


Future Roadmap: From Simulation to Solution

Goal 1: Logistics Swarm Simulator (Ongoing)

Transforming simple boids into a large-scale Multi-Agent Pathfinding (MAPF) simulation that mimics thousands of automated guided vehicles (AGVs) in a warehouse.

  • Environmental Physics: Implemented a potential field using constant memory to handle static obstacles parsed from 2D floor plans. (Done)
  • HPC Routing: Replaced per-agent A* with a vector flow field stored in GPU constant memory, so each agent finds its next direction with a single $O(1)$ cell lookup regardless of agent count. (Done)
  • Local Avoidance: Implementing GPU-accelerated collision avoidance to resolve traffic deadlocks in narrow corridors. (Next)
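The flow-field routing above can be illustrated with a small CPU sketch. It assumes a 4-connected grid and a single goal cell, and builds the field with a breadth-first search from the goal. The project's actual construction is not described here, so `FlowField`, `buildFlowField`, and `lookup` are hypothetical names. The point is the last function: once the field exists, routing an agent is one array read.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

struct Dir { int dx, dy; };

struct FlowField {
    int cols, rows;
    std::vector<Dir> dirs;  // one direction per cell, row-major
};

// Breadth-first search outward from the goal; every reachable free cell then
// points at a 4-neighbour that is exactly one step closer to the goal.
inline FlowField buildFlowField(const std::vector<int>& blocked,
                                int cols, int rows, int goalX, int goalY) {
    assert(cols > 0 && rows > 0);
    assert(blocked.size() == static_cast<std::size_t>(cols) * rows);
    assert(goalX >= 0 && goalX < cols && goalY >= 0 && goalY < rows);

    const int kUnreached = -1;
    const int nx[4] = {1, -1, 0, 0};
    const int ny[4] = {0, 0, 1, -1};
    std::vector<int> dist(blocked.size(), kUnreached);
    std::queue<int> frontier;
    dist[goalY * cols + goalX] = 0;
    frontier.push(goalY * cols + goalX);

    while (!frontier.empty()) {
        const int cell = frontier.front();
        frontier.pop();
        for (int k = 0; k < 4; ++k) {
            const int x = cell % cols + nx[k], y = cell / cols + ny[k];
            if (x < 0 || x >= cols || y < 0 || y >= rows) continue;
            const int n = y * cols + x;
            if (blocked[n] || dist[n] != kUnreached) continue;
            dist[n] = dist[cell] + 1;
            frontier.push(n);
        }
    }

    // Walls, unreachable cells, and the goal itself keep the zero direction.
    FlowField f{cols, rows, std::vector<Dir>(blocked.size(), Dir{0, 0})};
    for (int cell = 0; cell < cols * rows; ++cell) {
        if (dist[cell] <= 0) continue;
        for (int k = 0; k < 4; ++k) {
            const int x = cell % cols + nx[k], y = cell / cols + ny[k];
            if (x < 0 || x >= cols || y < 0 || y >= rows) continue;
            if (dist[y * cols + x] == dist[cell] - 1) {
                f.dirs[cell] = Dir{nx[k], ny[k]};
                break;
            }
        }
    }
    return f;
}

// The O(1) per-agent step: one array read, independent of agent count.
inline Dir lookup(const FlowField& f, int cx, int cy) {
    return f.dirs[cy * f.cols + cx];
}
```

In a CUDA version, `dirs` would be the array placed in `__constant__` memory. Constant memory is limited to 64 KB, so the grid resolution has to be chosen with that budget in mind.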

Goal 2: Unified HPC Sandbox Architecture

Consolidating standalone projects into a single, cohesive engine framework.

  • Framework: Integrating Dear ImGui over the GLFW/OpenGL pipeline.
  • System Design: Abstracting simulations into a Scene management system.

Goal 3: Cross-Platform HPC Deployment (AMD ROCm)

Expanding the system's hardware abstraction by porting the CUDA-based simulation to the AMD ROCm (HIP) ecosystem.

  • Objective: Cross-validate the simulation's throughput across different GPU architectures.

About

High-Performance Computing (HPC) & Optimization studies using CUDA C++. Includes Grid-Stride Loops, Shared Memory tiling, and Nsight Compute profiling analysis.
