Language Breakdown
Lines of code distribution across 17 owned repositories
T-Shaped Developer
T-shaped: deep in C with broad versatility
Collaboration Network
Global Impact visualization
Repos: 18
PRs: 0
Growth: +18%
Top Collaborators
No collaborator data yet.
Coding Streak
Contribution activity over the past year
Top Repositories
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
Hooks CUDA-related dynamic libraries using automated code generation tools.
Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference.
Performance of the C++ interfaces of Flash Attention and Flash Attention v2 in large language model (LLM) inference scenarios.
NCU-driven iterative optimization workflow for CUDA/CUTLASS/Triton/CuTe DSL kernels.
Multiple GEMM operators built with CUTLASS to support LLM inference.
Several common matrix multiplication methods implemented on CPU and NVIDIA GPU using C++11 and CUDA.
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
A simple and efficient memory pool implemented in C++11.
Open Source Impact
Contributions to external projects