Nsight Systems & CUPTI

Installation and Container Mounting Guide

Mounting custom profiler versions into containerized GPU workloads

Why Mount Custom Profiler Versions?

The Problem

Container images (like NVIDIA NeMo, PyTorch NGC, etc.) ship with pre-installed versions of Nsight Systems (nsys) and CUPTI libraries. However, these bundled versions may contain bugs or lack features needed for your specific profiling requirements.

Container's Built-in Version

  • May contain known bugs
  • Cannot be updated without rebuilding the image
  • Version locked to the container release
  • Missing the latest profiling features

Mounted Custom Version

  • Use any version you need
  • Quickly swap versions without rebuilding
  • Apply bug fixes immediately
  • Test new profiler features

Architecture Overview

Host System
  Nsight Systems:  /opt/tools/nsys/2025.5.1/
  CUPTI Library:   /opt/tools/cupti/13.0.85/
        │
        │  mount
        ▼
Container
  Original install (shadowed):  /usr/local/.../nsys
  Mounted host version:         active
The mount overlays the container's built-in version with your custom version from the host

Installing Nsight Systems

Step 1: Download the Installer

Download from NVIDIA Developer portal

# Set your desired version (include build number)
NSYS_VERSION="2025.5.1.121-3638078"

# For x86_64 systems:
NSYS_URL="https://developer.download.nvidia.com/devtools/nsight-systems/NsightSystems-linux-public-${NSYS_VERSION}.run"

# For ARM64/aarch64 systems (Grace Hopper, etc.):
NSYS_URL="https://developer.download.nvidia.com/devtools/nsight-systems/NsightSystems-linux-sbsa-public-${NSYS_VERSION}.run"

# Download
wget -O nsys_installer.run "${NSYS_URL}"
chmod +x nsys_installer.run
Finding versions: Visit developer.nvidia.com/nsight-systems for available versions
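The two URL patterns above differ only in the architecture token. As a sketch, a small helper (the `nsys_url` function name is our own) can pick the right one based on the machine architecture:

```shell
# Build the Nsight Systems installer URL for a given version and architecture.
# arch: "x86_64" or "aarch64" (anything else is rejected).
nsys_url() {
    local version="$1" arch="$2"
    local base="https://developer.download.nvidia.com/devtools/nsight-systems"
    case "$arch" in
        x86_64)  echo "${base}/NsightSystems-linux-public-${version}.run" ;;
        aarch64) echo "${base}/NsightSystems-linux-sbsa-public-${version}.run" ;;
        *)       echo "unsupported arch: $arch" >&2; return 1 ;;
    esac
}

# Example: pick the URL for the current machine
# wget -O nsys_installer.run "$(nsys_url "2025.5.1.121-3638078" "$(uname -m)")"
```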
Step 2: Run the Installer

Install to a custom directory (non-interactive mode)

# Define installation directory
INSTALL_DIR="/opt/tools/nsys/${NSYS_VERSION}"

# Create installation directory
mkdir -p "${INSTALL_DIR}"

# Run installer in non-interactive mode
#   --target:      temporary extraction location
#   -noprompt:     no user interaction
#   -targetpath:   final installation location
./nsys_installer.run --target /tmp/nsys_temp -- -noprompt -targetpath "${INSTALL_DIR}"

# Clean up temporary files
rm -rf /tmp/nsys_temp
Step 3: Verify Installation

Check that nsys binary exists and works

# Verify binary exists
ls -la "${INSTALL_DIR}/bin/nsys"

# Check version
"${INSTALL_DIR}/bin/nsys" --version
# Expected output:
# NVIDIA Nsight Systems version 2025.5.1.121-250505693968v0

Installation Directory Structure

/opt/tools/nsys/2025.5.1.121-3638078/
├── bin/
│   └── nsys                      # Main profiler binary
├── target-linux-x64/             # x86_64 target libraries
│   ├── libcupti.so.13.0
│   └── ...
├── target-linux-sbsa-armv8/      # ARM64 target libraries
│   ├── libcupti-sbsa.so.13.0
│   └── ...
└── host-linux-x64/               # Host-side components
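Which target-linux-* subdirectory is relevant depends on the machine architecture. A hedged sketch of that mapping (the `nsys_target_dir` helper name is our own):

```shell
# Map a machine architecture (as reported by `uname -m`) to the
# target subdirectory inside an Nsight Systems installation.
nsys_target_dir() {
    case "$1" in
        x86_64)        echo "target-linux-x64" ;;
        aarch64|arm64) echo "target-linux-sbsa-armv8" ;;
        *)             echo "unknown" ;;
    esac
}

# Example: list the target libraries for the current machine
# ls "/opt/tools/nsys/2025.5.1.121-3638078/$(nsys_target_dir "$(uname -m)")"
```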

Installing CUPTI Library

When to Install CUPTI Separately?

CUPTI (CUDA Profiling Tools Interface) is included with Nsight Systems. Install it separately only when you need a specific CUPTI version that differs from what's in your nsys installation or container.

Step 1: Download CUPTI Archive

From NVIDIA CUDA redistributables

# Set CUPTI version
CUPTI_VERSION="13.0.85"

# For x86_64:
CUPTI_URL="https://developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/linux-x86_64/cuda_cupti-linux-x86_64-${CUPTI_VERSION}-archive.tar.xz"

# For ARM64/aarch64:
CUPTI_URL="https://developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/linux-sbsa/cuda_cupti-linux-sbsa-${CUPTI_VERSION}-archive.tar.xz"

# Download
wget -O cupti.tar.xz "${CUPTI_URL}"
Step 2: Extract to Installation Directory

Extract and strip top-level directory

# Create installation directory
CUPTI_DIR="/opt/tools/cupti/${CUPTI_VERSION}"
mkdir -p "${CUPTI_DIR}"

# Extract (strip first directory level)
tar -xf cupti.tar.xz -C "${CUPTI_DIR}" --strip-components=1

# Verify
ls -la "${CUPTI_DIR}/lib/"
# Should show: libcupti.so, libcupti.so.13, libcupti.so.13.0.85, etc.
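When mounting a single library file later, you want the fully versioned shared object (e.g. libcupti.so.13.0.85) rather than one of its symlinks. A small sketch that picks it out of a lib directory (the `latest_cupti_so` helper is our own):

```shell
# Return the highest-versioned libcupti shared object in a directory,
# e.g. libcupti.so.13.0.85 rather than the libcupti.so.13 symlink.
latest_cupti_so() {
    ls "$1"/libcupti.so.* 2>/dev/null | sort -V | tail -n 1
}

# Example:
# latest_cupti_so "/opt/tools/cupti/13.0.85/lib"
```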

Container Mounting Strategy

Key Concept: Mount Over Container Path

To replace the container's built-in nsys, you mount your host directory over the exact path where nsys is installed inside the container. This "shadows" the original installation.

Finding the Container's Nsys Path

# Run a shell in the container to find the nsys location
docker run --rm -it nvcr.io/nvidia/nemo:25.07.01 bash

# Inside the container, find nsys
which nsys
# Output: /usr/local/cuda-12.9/NsightSystems-cli-2025.1.1/bin/nsys

# The installation root (the directory containing bin/) is what you
# need to mount over:
# /usr/local/cuda-12.9/NsightSystems-cli-2025.1.1

Common Container Nsys Paths

Container Image                   Nsys Install Path
nvcr.io/nvidia/nemo:25.07.01      /usr/local/cuda-12.9/NsightSystems-cli-2025.1.1
nvcr.io/nvidia/nemo:25.09.00      /usr/local/cuda-12.9/NsightSystems-cli-2025.4.1
nvcr.io/nvidia/pytorch:xx.xx      Check with `which nsys`

Enroot/Pyxis Configuration (SLURM)

Enroot, together with the Pyxis plugin, is commonly used on SLURM clusters for containerized HPC workloads.

# Mount format: host_path:container_path

# Mount nsys installation
NSYS_MOUNT="/opt/tools/nsys/2025.5.1.121-3638078:/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1"

# Mount CUPTI library (if needed separately)
CUPTI_MOUNT="/opt/tools/cupti/13.0.85/lib/libcupti.so.13.0.85:/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1/target-linux-x64/libcupti.so.13.0"

# Combine mounts (comma-separated)
CONTAINER_MOUNTS="${NSYS_MOUNT},${CUPTI_MOUNT}"

# Use with srun/sbatch
srun --container-image=nvcr.io/nvidia/nemo:25.07.01 \
     --container-mounts="${CONTAINER_MOUNTS}" \
     nsys profile --stats=true python ./my_training_script.py
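Building the comma-separated mount string by hand gets error-prone with more than two mounts. A sketch of a tiny join helper (the `join_mounts` name is our own):

```shell
# Join any number of host:container mount specs into the comma-separated
# string expected by flags such as Pyxis's --container-mounts.
join_mounts() {
    local IFS=','
    echo "$*"
}

# Example:
# CONTAINER_MOUNTS="$(join_mounts "${NSYS_MOUNT}" "${CUPTI_MOUNT}")"
```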

Environment Variable Approach

Many job launchers support passing mounts via environment variables:

# Set mounts as an environment variable
export RUN_CONF_MOUNTS="${NSYS_MOUNT}"

# The job launcher reads RUN_CONF_MOUNTS and applies the mounts

Docker/Podman Configuration

# Docker mount format: -v host_path:container_path

# Define paths
HOST_NSYS="/opt/tools/nsys/2025.5.1.121-3638078"
CONTAINER_NSYS="/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1"

# Run container with nsys mount
docker run --gpus all --rm -it \
    -v "${HOST_NSYS}:${CONTAINER_NSYS}" \
    nvcr.io/nvidia/nemo:25.07.01 \
    nsys --version
# Should output your custom version, not the container's built-in one

Docker Compose Example

version: '3.8'
services:
  training:
    image: nvcr.io/nvidia/nemo:25.07.01
    volumes:
      # Mount custom nsys over container path
      - /opt/tools/nsys/2025.5.1:/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1
      # Your training code
      - ./workspace:/workspace
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Verification

# Inside the container, verify the mounted version is active

# Check nsys version
nsys --version
# Should show your custom version: NVIDIA Nsight Systems version 2025.5.1.xxx

# Check which binary is being used
which nsys
# Should point to the mounted path

# Verify CUPTI library (if mounted)
ldd $(which nsys) | grep cupti
# Should show your mounted library path

# Test a quick profile
nsys profile -o test_profile --stats=true \
    python -c "import torch; x = torch.randn(1000, 1000, device='cuda'); torch.cuda.synchronize()"
# If successful, you'll see profiling output and a test_profile.nsys-rep file
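For scripted jobs, the version check above can be automated. A sketch (the `check_nsys_version` helper is our own, and it assumes the `NVIDIA Nsight Systems version X` output format shown earlier):

```shell
# Check that the nsys on PATH reports the expected version prefix.
# Returns 0 and prints OK if the reported version starts with the
# expected string; otherwise prints a mismatch message and returns 1.
check_nsys_version() {
    local expected="$1" actual
    actual="$(nsys --version 2>/dev/null | sed -n 's/.*version //p')"
    case "$actual" in
        "$expected"*) echo "OK: nsys $actual" ;;
        *) echo "MISMATCH: expected ${expected}*, got '${actual}'" >&2; return 1 ;;
    esac
}

# Example: fail the job early if the mount did not take effect
# check_nsys_version "2025.5.1" || exit 1
```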

Version Mismatch Check

If nsys --version shows the container's original version, the mount didn't work. Common causes:

  • Incorrect container path (check which nsys in an unmodified container)
  • Mount syntax error
  • Host path doesn't exist or isn't accessible

Troubleshooting

Error: "nsys: error while loading shared libraries"

The mounted nsys can't find required libraries.

Solution: Mount the entire nsys installation directory, not just the binary. The directory contains required libraries and dependencies.
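One way to catch this before launching the job is to verify that the host-side directory looks like a complete installation, not just a lone binary. A sketch (the `check_nsys_install` helper is our own):

```shell
# Sanity-check that a host nsys directory looks like a complete
# installation: an executable bin/nsys plus at least one target-linux-*
# library directory, so the mount carries nsys's dependencies too.
check_nsys_install() {
    local dir="$1"
    [ -x "$dir/bin/nsys" ] || { echo "missing $dir/bin/nsys" >&2; return 1; }
    ls -d "$dir"/target-linux-* >/dev/null 2>&1 || {
        echo "missing target-linux-* directories in $dir" >&2; return 1; }
    echo "OK: $dir looks complete"
}

# Example:
# check_nsys_install "/opt/tools/nsys/2025.5.1.121-3638078"
```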

Error: "CUPTI_ERROR_INSUFFICIENT_PRIVILEGES"

Profiling requires elevated privileges.

Solution: Run with --privileged or set --cap-add=SYS_ADMIN in Docker. For SLURM, ensure node configuration allows profiling.

Error: Architecture mismatch

x86_64 nsys on ARM64 container (or vice versa).

Solution: Download the version built for the container's architecture. Use the linux-sbsa-public installer for ARM64 and linux-public for x86_64.
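You can detect a mismatch on the host by inspecting the binary with `file`. A sketch that classifies `file -b` output (the `elf_arch` helper is our own):

```shell
# Map the output of `file -b` on a binary to a short architecture name,
# so it can be compared against the container's architecture.
elf_arch() {
    case "$1" in
        *x86-64*)  echo "x86_64" ;;
        *aarch64*) echo "aarch64" ;;
        *)         echo "unknown" ;;
    esac
}

# Example:
# elf_arch "$(file -b "${INSTALL_DIR}/bin/nsys")"
```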

Tip: Preserve Original as Fallback

Keep your mount configuration modular so you can quickly disable it if issues arise. Use environment variables or config files to toggle mounts.

Quick Reference

Nsys Download URLs

  • Base URL: developer.download.nvidia.com/devtools/nsight-systems/
  • x86_64: NsightSystems-linux-public-{version}.run
  • ARM64: NsightSystems-linux-sbsa-public-{version}.run

CUPTI Download URLs

  • Base: developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/
  • x86_64: linux-x86_64/cuda_cupti-linux-x86_64-{version}-archive.tar.xz
  • ARM64: linux-sbsa/cuda_cupti-linux-sbsa-{version}-archive.tar.xz

Installer Flags Reference

./nsys_installer.run --help   # Show all options

--target DIR      # Temporary extraction directory
--                # Separator before installer-specific flags
-noprompt         # Non-interactive installation
-targetpath DIR   # Final installation directory