Nsight Systems & CUPTI

Installation and Container Mounting Guide

Mounting custom profiler versions into containerized GPU workloads

Why Mount Custom Profiler Versions?

The Problem

Container images (like NVIDIA NeMo, PyTorch NGC, etc.) ship with pre-installed versions of Nsight Systems (nsys) and CUPTI libraries. However, these bundled versions may contain bugs or lack features needed for your specific profiling requirements.

Container's Built-in Version

  • May contain known bugs
  • Cannot be updated without rebuilding the image
  • Version locked to the container release
  • Missing the latest profiling features

Mounted Custom Version

  • Use any version you need
  • Quickly swap versions without rebuilding
  • Apply bug fixes immediately
  • Test new profiler features

Architecture Overview

Host System
  Nsight Systems:  /opt/tools/nsys/2025.5.1/
  CUPTI Library:   /opt/tools/cupti/13.0.85/
        │
        │  mount
        ▼
Container
  Original install (shadowed):  /usr/local/.../nsys
  Mounted host version:         active
The mount overlays the container's built-in version with your custom version from the host

Installing Nsight Systems

Step 1: Download the Installer

Download from NVIDIA Developer portal

# Set your desired version (include build number)
NSYS_VERSION="2025.5.1.121-3638078"

# For x86_64 systems:
NSYS_URL="https://developer.download.nvidia.com/devtools/nsight-systems/NsightSystems-linux-public-${NSYS_VERSION}.run"

# For ARM64/aarch64 systems (Grace Hopper, etc.):
NSYS_URL="https://developer.download.nvidia.com/devtools/nsight-systems/NsightSystems-linux-sbsa-public-${NSYS_VERSION}.run"

# Download
wget -O nsys_installer.run "${NSYS_URL}"
chmod +x nsys_installer.run
Finding versions: Visit developer.nvidia.com/nsight-systems for available versions
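The two URL patterns above differ only in the architecture token. As a sketch, a small helper (the `nsys_url` function name is our own) can pick the right one based on the machine architecture:

```shell
# Build the Nsight Systems installer URL for a given version and architecture.
# arch: "x86_64" or "aarch64" (anything else is rejected).
nsys_url() {
    local version="$1" arch="$2"
    local base="https://developer.download.nvidia.com/devtools/nsight-systems"
    case "$arch" in
        x86_64)  echo "${base}/NsightSystems-linux-public-${version}.run" ;;
        aarch64) echo "${base}/NsightSystems-linux-sbsa-public-${version}.run" ;;
        *)       echo "unsupported arch: $arch" >&2; return 1 ;;
    esac
}

# Example: pick the URL for the current machine
# wget -O nsys_installer.run "$(nsys_url "2025.5.1.121-3638078" "$(uname -m)")"
```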
Step 2: Run the Installer

Install to a custom directory (non-interactive mode)

# Define installation directory
INSTALL_DIR="/opt/tools/nsys/${NSYS_VERSION}"

# Create installation directory
mkdir -p "${INSTALL_DIR}"

# Run installer in non-interactive mode
#   --target:      temporary extraction location
#   -noprompt:     no user interaction
#   -targetpath:   final installation location
./nsys_installer.run --target /tmp/nsys_temp -- -noprompt -targetpath "${INSTALL_DIR}"

# Clean up temporary files
rm -rf /tmp/nsys_temp
Step 3: Verify Installation

Check that nsys binary exists and works

# Verify binary exists
ls -la "${INSTALL_DIR}/bin/nsys"

# Check version
"${INSTALL_DIR}/bin/nsys" --version
# Expected output:
# NVIDIA Nsight Systems version 2025.5.1.121-250505693968v0

Installation Directory Structure

/opt/tools/nsys/2025.5.1.121-3638078/
├── bin/
│   └── nsys                      # Main profiler binary
├── target-linux-x64/             # x86_64 target libraries
│   ├── libcupti.so.13.0
│   └── ...
├── target-linux-sbsa-armv8/      # ARM64 target libraries
│   ├── libcupti-sbsa.so.13.0
│   └── ...
└── host-linux-x64/               # Host-side components
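Which target-linux-* subdirectory is relevant depends on the machine architecture. A hedged sketch of that mapping (the `nsys_target_dir` helper name is our own):

```shell
# Map a machine architecture (as reported by `uname -m`) to the
# target subdirectory inside an Nsight Systems installation.
nsys_target_dir() {
    case "$1" in
        x86_64)        echo "target-linux-x64" ;;
        aarch64|arm64) echo "target-linux-sbsa-armv8" ;;
        *)             echo "unknown" ;;
    esac
}

# Example: list the target libraries for the current machine
# ls "/opt/tools/nsys/2025.5.1.121-3638078/$(nsys_target_dir "$(uname -m)")"
```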

Installing CUPTI Library

When to Install CUPTI Separately?

CUPTI (CUDA Profiling Tools Interface) is included with Nsight Systems. Install it separately only when you need a specific CUPTI version that differs from what's in your nsys installation or container.

Step 1: Download CUPTI Archive

From NVIDIA CUDA redistributables

# Set CUPTI version
CUPTI_VERSION="13.0.85"

# For x86_64:
CUPTI_URL="https://developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/linux-x86_64/cuda_cupti-linux-x86_64-${CUPTI_VERSION}-archive.tar.xz"

# For ARM64/aarch64:
CUPTI_URL="https://developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/linux-sbsa/cuda_cupti-linux-sbsa-${CUPTI_VERSION}-archive.tar.xz"

# Download
wget -O cupti.tar.xz "${CUPTI_URL}"
Step 2: Extract to Installation Directory

Extract and strip top-level directory

# Create installation directory
CUPTI_DIR="/opt/tools/cupti/${CUPTI_VERSION}"
mkdir -p "${CUPTI_DIR}"

# Extract (strip first directory level)
tar -xf cupti.tar.xz -C "${CUPTI_DIR}" --strip-components=1

# Verify
ls -la "${CUPTI_DIR}/lib/"
# Should show: libcupti.so, libcupti.so.13, libcupti.so.13.0.85, etc.
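When mounting a single library file later, you want the fully versioned shared object (e.g. libcupti.so.13.0.85) rather than one of its symlinks. A small sketch that picks it out of a lib directory (the `latest_cupti_so` helper is our own):

```shell
# Return the highest-versioned libcupti shared object in a directory,
# e.g. libcupti.so.13.0.85 rather than the libcupti.so.13 symlink.
latest_cupti_so() {
    ls "$1"/libcupti.so.* 2>/dev/null | sort -V | tail -n 1
}

# Example:
# latest_cupti_so "/opt/tools/cupti/13.0.85/lib"
```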

Container Mounting Strategy

Key Concept: Mount Over Container Path

To replace the container's built-in nsys, you mount your host directory over the exact path where nsys is installed inside the container. This "shadows" the original installation.

Finding the Container's Nsys Path

# Run a shell in the container to find the nsys location
docker run --rm -it nvcr.io/nvidia/nemo:25.07.01 bash

# Inside the container, find nsys
which nsys
# Output: /usr/local/cuda-12.9/NsightSystems-cli-2025.1.1/bin/nsys

# The installation root (the directory containing bin/) is what you
# need to mount over:
# /usr/local/cuda-12.9/NsightSystems-cli-2025.1.1

Common Container Nsys Paths

Container Image                   Nsys Install Path
nvcr.io/nvidia/nemo:25.07.01      /usr/local/cuda-12.9/NsightSystems-cli-2025.1.1
nvcr.io/nvidia/nemo:25.09.00      /usr/local/cuda-12.9/NsightSystems-cli-2025.4.1
nvcr.io/nvidia/pytorch:xx.xx      Check with `which nsys`

Enroot/Pyxis Configuration (SLURM)

Enroot, together with the Pyxis plugin, is commonly used on SLURM clusters for containerized HPC workloads.

# Mount format: host_path:container_path

# Mount nsys installation
NSYS_MOUNT="/opt/tools/nsys/2025.5.1.121-3638078:/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1"

# Mount CUPTI library (if needed separately)
CUPTI_MOUNT="/opt/tools/cupti/13.0.85/lib/libcupti.so.13.0.85:/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1/target-linux-x64/libcupti.so.13.0"

# Combine mounts (comma-separated)
CONTAINER_MOUNTS="${NSYS_MOUNT},${CUPTI_MOUNT}"

# Use with srun/sbatch
srun --container-image=nvcr.io/nvidia/nemo:25.07.01 \
     --container-mounts="${CONTAINER_MOUNTS}" \
     nsys profile --stats=true python ./my_training_script.py
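Building the comma-separated mount string by hand gets error-prone with more than two mounts. A sketch of a tiny join helper (the `join_mounts` name is our own):

```shell
# Join any number of host:container mount specs into the comma-separated
# string expected by flags such as Pyxis's --container-mounts.
join_mounts() {
    local IFS=','
    echo "$*"
}

# Example:
# CONTAINER_MOUNTS="$(join_mounts "${NSYS_MOUNT}" "${CUPTI_MOUNT}")"
```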

Environment Variable Approach

Many job launchers support passing mounts via environment variables:

# Set mounts as an environment variable
export RUN_CONF_MOUNTS="${NSYS_MOUNT}"

# The job launcher reads RUN_CONF_MOUNTS and applies the mounts

Docker/Podman Configuration

# Docker mount format: -v host_path:container_path

# Define paths
HOST_NSYS="/opt/tools/nsys/2025.5.1.121-3638078"
CONTAINER_NSYS="/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1"

# Run container with nsys mount
docker run --gpus all --rm -it \
    -v "${HOST_NSYS}:${CONTAINER_NSYS}" \
    nvcr.io/nvidia/nemo:25.07.01 \
    nsys --version
# Should output your custom version, not the container's built-in one

Docker Compose Example

version: '3.8'
services:
  training:
    image: nvcr.io/nvidia/nemo:25.07.01
    volumes:
      # Mount custom nsys over container path
      - /opt/tools/nsys/2025.5.1:/usr/local/cuda-12.9/NsightSystems-cli-2025.1.1
      # Your training code
      - ./workspace:/workspace
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

Verification

# Inside the container, verify the mounted version is active

# Check nsys version
nsys --version
# Should show your custom version: NVIDIA Nsight Systems version 2025.5.1.xxx

# Check which binary is being used
which nsys
# Should point to the mounted path

# Verify CUPTI library (if mounted)
ldd $(which nsys) | grep cupti
# Should show your mounted library path

# Test a quick profile
nsys profile -o test_profile --stats=true \
    python -c "import torch; x = torch.randn(1000, 1000, device='cuda'); torch.cuda.synchronize()"
# If successful, you'll see profiling output and a test_profile.nsys-rep file
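For scripted jobs, the version check above can be automated. A sketch (the `check_nsys_version` helper is our own, and it assumes the `NVIDIA Nsight Systems version X` output format shown earlier):

```shell
# Check that the nsys on PATH reports the expected version prefix.
# Returns 0 and prints OK if the reported version starts with the
# expected string; otherwise prints a mismatch message and returns 1.
check_nsys_version() {
    local expected="$1" actual
    actual="$(nsys --version 2>/dev/null | sed -n 's/.*version //p')"
    case "$actual" in
        "$expected"*) echo "OK: nsys $actual" ;;
        *) echo "MISMATCH: expected ${expected}*, got '${actual}'" >&2; return 1 ;;
    esac
}

# Example: fail the job early if the mount did not take effect
# check_nsys_version "2025.5.1" || exit 1
```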

Version Mismatch Check

If nsys --version shows the container's original version, the mount didn't work. Common causes:

  • Incorrect container path (check which nsys in an unmodified container)
  • Mount syntax error
  • Host path doesn't exist or isn't accessible

Troubleshooting

Error: "nsys: error while loading shared libraries"

The mounted nsys can't find required libraries.

Solution: Mount the entire nsys installation directory, not just the binary. The directory contains required libraries and dependencies.
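One way to catch this before launching the job is to verify that the host-side directory looks like a complete installation, not just a lone binary. A sketch (the `check_nsys_install` helper is our own):

```shell
# Sanity-check that a host nsys directory looks like a complete
# installation: an executable bin/nsys plus at least one target-linux-*
# library directory, so the mount carries nsys's dependencies too.
check_nsys_install() {
    local dir="$1"
    [ -x "$dir/bin/nsys" ] || { echo "missing $dir/bin/nsys" >&2; return 1; }
    ls -d "$dir"/target-linux-* >/dev/null 2>&1 || {
        echo "missing target-linux-* directories in $dir" >&2; return 1; }
    echo "OK: $dir looks complete"
}

# Example:
# check_nsys_install "/opt/tools/nsys/2025.5.1.121-3638078"
```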

Error: "CUPTI_ERROR_INSUFFICIENT_PRIVILEGES"

Profiling requires elevated privileges.

Solution: Run with --privileged or set --cap-add=SYS_ADMIN in Docker. For SLURM, ensure node configuration allows profiling.

Error: Architecture mismatch

x86_64 nsys on ARM64 container (or vice versa).

Solution: Download the version built for the container's architecture. Use the linux-sbsa-public installer for ARM64 and linux-public for x86_64.
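You can detect a mismatch on the host by inspecting the binary with `file`. A sketch that classifies `file -b` output (the `elf_arch` helper is our own):

```shell
# Map the output of `file -b` on a binary to a short architecture name,
# so it can be compared against the container's architecture.
elf_arch() {
    case "$1" in
        *x86-64*)  echo "x86_64" ;;
        *aarch64*) echo "aarch64" ;;
        *)         echo "unknown" ;;
    esac
}

# Example:
# elf_arch "$(file -b "${INSTALL_DIR}/bin/nsys")"
```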

Tip: Preserve Original as Fallback

Keep your mount configuration modular so you can quickly disable it if issues arise. Use environment variables or config files to toggle mounts.

Quick Reference

Nsys Download URLs

  • Base URL: developer.download.nvidia.com/devtools/nsight-systems/
  • x86_64: NsightSystems-linux-public-{version}.run
  • ARM64: NsightSystems-linux-sbsa-public-{version}.run

CUPTI Download URLs

  • Base: developer.download.nvidia.com/compute/cuda/redist/cuda_cupti/
  • x86_64: linux-x86_64/cuda_cupti-linux-x86_64-{version}-archive.tar.xz
  • ARM64: linux-sbsa/cuda_cupti-linux-sbsa-{version}-archive.tar.xz

Installer Flags Reference

./nsys_installer.run --help   # Show all options

--target DIR      # Temporary extraction directory
--                # Separator before installer-specific flags
-noprompt         # Non-interactive installation
-targetpath DIR   # Final installation directory