
Computer Graphics Project (MTAT.03.328), 2024/25 spring


optixVR engine

Artem Kabakov

This project is a rewrite of an OptiX 8.1 engine, updated to support multi-GPU configurations and VR integration. The code has been migrated to OptiX 9 with CUDA 12.8 on Linux, addressing previous limitations and enabling advanced rendering techniques. Detailed benchmarks, the hardware configuration, and demo videos are provided below. For the full project plan, please refer to the linked documentation (if available).

Final build, repository link, and a brief (10-20 sec) final result video will be added once the project is completed.

Milestone 1 (Target Date: 10.03)

  • Rendering Fixes (estimated: 2-3 hours): Resolve artifacts in OptiX 9 videos caused by incorrect handling of object pointers between the Python and C++ bindings.
  • Memory & Shader Updates (estimated: 2 hours): Improve texture memory management and apply necessary shader fixes.
  • GPU Selection Feature (estimated: 1-2 hours): Implement a feature to select the target GPU in code instead of relying on CUDA_VISIBLE_DEVICES; see the sketch after this list.
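
For the GPU selection feature, a minimal sketch of the intended mechanism using the plain CUDA runtime API (selectGpu and preferredDevice are illustrative names, not project API):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Minimal sketch: pick the target CUDA device at runtime instead of
    // relying on the CUDA_VISIBLE_DEVICES environment variable.
    // "selectGpu"/"preferredDevice" are illustrative names, not project API.
    bool selectGpu(int preferredDevice) {
        int deviceCount = 0;
        if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0)
            return false;
        if (preferredDevice < 0 || preferredDevice >= deviceCount)
            return false;
        if (cudaSetDevice(preferredDevice) != cudaSuccess)
            return false;

        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, preferredDevice);
        std::printf("Using GPU %d: %s\n", preferredDevice, prop.name);
        return true;
    }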

**Hardware & Benchmark Details:**

  • System Configuration: 2 × NVIDIA GeForce RTX 5090, Intel i9-12900K, 96 GB DDR4-3200.

Transfer speed benchmark report (a sketch of this kind of measurement follows the list):

  1. GPU 0 (PCIe Root Complex: 0000:01)
    - Host → Device: 33638 MB/s
    - Device → Host: 41879.4 MB/s
    - NVML PCIe Throughput: TX 21012 KB/s, RX 410 KB/s
  2. GPU 1 (PCIe Root Complex: 0000:05)
    - Host → Device: 848.386 MB/s
    - Device → Host: 851.835 MB/s
    - NVML PCIe Throughput: TX 394 KB/s, RX 399 KB/s
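
The host↔device numbers above come from a simple copy benchmark; a sketch of that kind of measurement (illustrative, not the exact tool used):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Sketch of a host-to-device bandwidth measurement of the kind reported
    // above; the exact benchmark tool is not shown on this page.
    int main() {
        const size_t bytes = 256ull << 20;   // 256 MiB test buffer
        void *hostBuf = nullptr, *devBuf = nullptr;
        cudaMallocHost(&hostBuf, bytes);     // pinned memory for full PCIe speed
        cudaMalloc(&devBuf, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        std::printf("Host -> Device: %.1f MB/s\n", (bytes / 1e6) / (ms / 1e3));

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(devBuf);
        cudaFreeHost(hostBuf);
        return 0;
    }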

**Demo Videos:**

  • Original OptiX 8.1 demo recorded on an RTX 3070: optix81game.mp4
  • OptiX 9 demo on GPU 0 (PCIe 5.0 x16, 100-130 fps at 2064×2208): OPTIX9_GPU0-pcie5x16.mp4
  • OptiX 9 demo on GPU 1 (PCIe 3.0 x1, 58-62 fps at 2064×2208): OPTIX9_GPU1-pcie3x1.mp4
  • OpenXR with CUDA interop demo (dual Mandelbrot render to OpenGL buffers): OpenXR_CUDA_2DEV.mp4

**Development Notes:**

  • The original OptiX 8.1 application was migrated to OptiX 9 to support current hardware.
  • For OptiX 9, frames are rendered at 2064×2208 and the full-resolution images are copied to the CPU before being cropped to 1920×1080 (sketched after this list).
  • When using OpenVR/XR, the system writes directly to OpenGL/Vulkan buffers via CUDA interop.
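
A sketch of that copy-then-crop path, assuming a tightly packed RGB float frame and a top-left crop region (names are illustrative, not project API):

    #include <cuda_runtime.h>
    #include <cstring>
    #include <vector>

    // Illustrative copy-then-crop: download the full 2064x2208 frame, then
    // keep a 1920x1080 region (assumes tightly packed RGB floats and a
    // top-left crop; the project's actual crop offset is not shown here).
    void downloadAndCrop(const float* devFrame, std::vector<float>& cropped) {
        const int srcW = 2064, srcH = 2208, dstW = 1920, dstH = 1080, ch = 3;
        std::vector<float> full(static_cast<size_t>(srcW) * srcH * ch);
        cudaMemcpy(full.data(), devFrame, full.size() * sizeof(float),
                   cudaMemcpyDeviceToHost);

        cropped.resize(static_cast<size_t>(dstW) * dstH * ch);
        for (int y = 0; y < dstH; ++y)   // copy one cropped row at a time
            std::memcpy(&cropped[static_cast<size_t>(y) * dstW * ch],
                        &full[static_cast<size_t>(y) * srcW * ch],
                        static_cast<size_t>(dstW) * ch * sizeof(float));
    }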

Milestone 2 (Target Date: 24.03)

  • Initial fixes from Milestone 1:
    - memory fixes for the renderer (2 h)
    - GPU selection (2 h; works only for a single GPU at a time)
    - memory & shader updates (4 h; done in the new resource system)
  • New resource system (~30 h total):
    - Python tensor problems (3 h)
    - new architecture research (2 h)
    - refactoring and reimplementing the OptiX usage for the new architecture (9 h)
    - multi-GPU resource synchronisation (8 h)
    - new Python bindings and GUI code (4 h)
  • More details in the presentation: https://docs.google.com/presentation/d/144S91CHOROZGn3a6xB_6yOgCVZv76l-j2oFlbHrci-Q/edit?usp=sharing

Milestone 3 (Target Date: 4.07)

  • Headset rendering (not done)
    - ALVR Linux setup for the Oculus (3 h)

ALVR tries to connect to the Oculus over a network connection forwarded to the device via ADB.

It establishes a connection, but the connection then drops; at the same time ALVR rewrites the network configuration to pretend that a wireless device is connected over IPv4. Once the wireless connection has been established, the existing wired connection has to be torn down, otherwise it conflicts with the new wireless one.

ALVR does not work out of the box when following the newest manual, but with some hacks and by partially using older methods it is feasible to establish a connection between SteamVR and the Oculus.

Main function of the CUDA interop with OpenVR:


    int main(int argc, char* argv[]) {
        // Initialize CUDA (cudaFree(nullptr) forces lazy context creation)
        const cudaError_t cudaStatus = cudaFree(nullptr);
        if (cudaStatus != cudaSuccess) {
            std::cerr << "CUDA initialization failed: " << cudaGetErrorString(cudaStatus) << std::endl;
            return -1;
        }

        // Create OpenVR system
        OpenVRSystem vrSystem;
        if (!vrSystem.Initialize()) {
            std::cerr << "Failed to initialize OpenVR system" << std::endl;
            return -1;
        }

        std::cout << "Running main loop... Press ESC to exit." << std::endl;

        // Main loop
        while (!vrSystem.ShouldClose()) {
            // Render a frame
            vrSystem.RenderFrame();

            // Poll for window events
            vrSystem.PollEvents();

            // Sleep briefly to prevent 100% CPU usage
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }

        std::cout << "Shutting down..." << std::endl;
        return 0;
    }

Launching different CUDA kernels depending on the eye:

        dim3 blockSize(16, 16);
        dim3 gridSize((m_renderWidth + blockSize.x - 1) / blockSize.x,
                      (m_renderHeight + blockSize.y - 1) / blockSize.y);

        if (eye == vr::Eye_Left) {
            renderLeftEyeGrid<<<gridSize, blockSize>>>(deviceBuffer, m_renderWidth, m_renderHeight, pitch, m_time);
        } else {
            renderRightEyeGrid<<<gridSize, blockSize>>>(deviceBuffer, m_renderWidth, m_renderHeight, pitch, m_time);
        }

The actual kernel drawing the left eye:

    __global__ void renderLeftEyeGrid(unsigned char* surface, int width, int height, int pitch, float time) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;

        if (x >= width || y >= height) return;

        // Calculate output position in the surface
        unsigned char* pixel = surface + y * pitch + x * 4;

        // Create a grid pattern that moves slowly over time
        int gridSize = 32; // Size of each grid cell
        int offsetX = (int)(time * 10.0f) % gridSize;
        int offsetY = (int)(time * 5.0f) % gridSize;

        bool isGridLine = ((x + offsetX) % gridSize < 2) || ((y + offsetY) % gridSize < 2);

        if (isGridLine) {
            // Grid lines: red (BGRA layout)
            pixel[0] = 0;     // B
            pixel[1] = 0;     // G
            pixel[2] = 255;   // R
            pixel[3] = 255;   // A
        } else {
            // Grid cells: black
            pixel[0] = 0;     // B
            pixel[1] = 0;     // G
            pixel[2] = 0;     // R
            pixel[3] = 255;   // A
        }
    }
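
The matching right-eye kernel is not shown on this page; a plausible minimal counterpart, assuming it differs from the left eye only in grid colour:

    // Hypothetical right-eye counterpart of the kernel above (the real body
    // is not shown on this page); assumed to differ only in grid colour.
    __global__ void renderRightEyeGrid(unsigned char* surface, int width, int height, int pitch, float time) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        unsigned char* pixel = surface + y * pitch + x * 4;
        int gridSize = 32;
        int offsetX = (int)(time * 10.0f) % gridSize;
        int offsetY = (int)(time * 5.0f) % gridSize;
        bool isGridLine = ((x + offsetX) % gridSize < 2) || ((y + offsetY) % gridSize < 2);

        // Grid lines: blue instead of red, so the two eyes are easy to tell apart
        pixel[0] = isGridLine ? 255 : 0;  // B
        pixel[1] = 0;                     // G
        pixel[2] = 0;                     // R
        pixel[3] = 255;                   // A
    }
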
  • Python bindings problems (2 h): linker error (not finished)
    > python test_openvr_system.py
    Error importing OpenVRSystem: libc10.so: cannot open shared object file: No such file or directory
    Make sure the extension is properly compiled with OpenVR support.
  • Renderer component integration

Video: https://drive.google.com/file/d/1aJcXFPpY6meKdMigORkyu8KTGh-POX3W/view?resourcekey&pli=1 shows this view (controlled by the headset) running on a headset.

  • **MTL materials (bonus task, not done)**
    - examples research (3 h): https://github.com/NVIDIA/OptiX_Apps.git
    - simple implementation attempts, adding support for callable programs (4 h); see the sketch below
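
The callable-program mechanism mentioned above looks roughly like this in OptiX device code (a sketch of the general OptiX feature, not the project's implementation):

    #include <optix.h>
    #include <vector_functions.h>

    // Sketch of an OptiX direct callable used for material evaluation;
    // this shows the general mechanism, not the project's implementation.
    extern "C" __device__ float3 __direct_callable__eval_mtl(float2 uv) {
        return make_float3(uv.x, uv.y, 0.0f);   // placeholder material response
    }

    // A closest-hit shader would then select the material via its SBT index:
    //   float3 c = optixDirectCall<float3, float2>(materialSbtIndex, uv);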

No headset recordings are available for now due to an NVIDIA kernel driver crash after clicking Record in ALVR. I was working on the bonus task until I got the headset on 04.07.2025.

Milestone 4 (Target Date: 21.07)

  • Full documentation cleanup (est: 4 h)
  • OpenVR integration with the renderer (est: 6 h)

First working attempt after 3 h of work: https://node.ukrain.ee/first_working_wr.mp4. The linking problem turned out to be runtime-only and required setting LD_LIBRARY_PATH to the PyTorch libraries that were used for the build. The video artifacts are caused by the SteamVR VR-view window and do not appear in the headset.

  • Recalculate the pinhole camera matrix from the HMD rotation and translation plus the per-eye translation:
    def setup_vr_camera(
        eye_matrix,                  # <-- accepted but not used
        eye_to_head_matrix,          # 3×4 per-eye
        head_pose_matrix,            # 3×4 HMD pose in world space
        proj_matrix,                 # 4×4 per-eye projection
        base_pos=(0.0, 0.0, 0.0),    # locked-camera fallback
        lock_position=False
    ):
        """
        Builds an ors.CameraParameters for OptiX ray generation (UVW method).

        Parameters
        ----------
        eye_matrix : 3×4
            Reserved for future use (time-warp, etc.); ignored by this function.
        eye_to_head_matrix : 3×4
            Eye-to-head transform from OpenVR for this eye.
        head_pose_matrix : 3×4
            Headset pose in world space.
        proj_matrix : 4×4
            Asymmetric projection matrix for this eye.
        base_pos : 3-tuple
            World-space position used when lock_position is True.
        lock_position : bool
            If True, ignore positional tracking and keep the camera at base_pos.
        """
        import numpy as np
        from scipy.spatial.transform import Rotation as R

        cam            = ors.CameraParameters()
        head_pose_np   = np.asarray(head_pose_matrix,   dtype=float).reshape(3, 4)
        eye_to_head_np = np.asarray(eye_to_head_matrix, dtype=float).reshape(3, 4)
        proj_np        = np.asarray(proj_matrix,        dtype=float).reshape(4, 4)

        # ---------- position ----------------------------------------------------
        if lock_position:
            position = np.array(base_pos, dtype=np.float64)
        else:
            eye_offset  = eye_to_head_np[:, 3]
            hp4         = np.eye(4, dtype=np.float64)
            hp4[:3, :4] = head_pose_np
            position    = (hp4 @ np.append(eye_offset, 1.0))[:3]
            position[2] *= -1

        # ---------- orientation (quaternion) -----------------------------------
        rot_q = R.from_matrix(head_pose_np[:, :3])
        pitch, yaw, roll = rot_q.as_euler("xyz", degrees=False)

        # Uncomment any axis you want to invert
        pitch = -pitch      # nodding
        yaw   = -yaw        # turning
        # roll  = -roll       # tilting

        rot_q   = R.from_euler("xyz", [pitch, yaw, roll])
        right   = rot_q.apply([1, 0, 0])
        up      = rot_q.apply([0, 1, 0])
        forward = rot_q.apply([0, 0, -1])

        # ---------- projection parameters --------------------------------------
        tan_half_w = 1.0 / abs(proj_np[0, 0])
        tan_half_h = 1.0 / abs(proj_np[1, 1])
        off_x      = proj_np[0, 2]
        off_y      = proj_np[1, 2]

        camera_u = right   # * (2.0 * tan_half_w)
        camera_v = up      # * (2.0 * tan_half_h)
        camera_w = (position
                    - tan_half_w * (off_x + 1.0) * right
                    - tan_half_h * (off_y + 1.0) * up
                    - forward)

        # ---------- write to struct --------------------------------------------
        cam.position = position.tolist()
        cam.camera_u = camera_u.tolist()
        cam.camera_v = camera_v.tolist()
        cam.camera_w = camera_w.tolist()
        return cam
  • CUDA memory copy and synchronisation between two devices for OpenVR:
        // Get tensor data pointer
        void* tensor_ptr = nullptr;
        size_t buffer_size = m_renderWidth * m_renderHeight * 3 * sizeof(float);

        // Move tensor to the current GPU if needed
        torch::Tensor gpu_result;
        if (result.is_cuda() && result.device().index() != m_deviceId) {
            //std::cout << "Moving tensor from device " << result.device().index()
            //        << " to device " << m_deviceId << std::endl;
            void* source_ptr = result.data_ptr();
            if (!renderer->getDeviceContext()->setDevice()) {
                exit(-1);
            }

            // Create and start timing for GPU to CPU copy
            cudaEvent_t startToCpu, stopToCpu;
            CHECK_CUDA_ERROR(cudaEventCreate(&startToCpu));
            CHECK_CUDA_ERROR(cudaEventCreate(&stopToCpu));
            CHECK_CUDA_ERROR(cudaEventRecord(startToCpu, 0));

            // Copy from GPU to CPU
            float* host_buffer = new float[m_renderWidth * m_renderHeight * 3];
            CHECK_CUDA_ERROR(cudaMemcpy(host_buffer, source_ptr, buffer_size, cudaMemcpyDeviceToHost));
            CHECK_CUDA_ERROR(cudaDeviceSynchronize());

            // Record and calculate the time for GPU to CPU copy
            CHECK_CUDA_ERROR(cudaEventRecord(stopToCpu, 0));
            CHECK_CUDA_ERROR(cudaEventSynchronize(stopToCpu));
            float toCpuTimeMs = 0.0f;
            CHECK_CUDA_ERROR(cudaEventElapsedTime(&toCpuTimeMs, startToCpu, stopToCpu));

            // Save the GPU to CPU copy time
            if (eye == vr::Eye_Left) {
                leftEyeToCpuCopyTime = toCpuTimeMs;
            } else {
                rightEyeToCpuCopyTime = toCpuTimeMs;
            }

            // Add to total for this frame
            totalToCpuCopyTime += toCpuTimeMs;

            // Clean up events
            CHECK_CUDA_ERROR(cudaEventDestroy(startToCpu));
            CHECK_CUDA_ERROR(cudaEventDestroy(stopToCpu));

            // Ensure we're in the right CUDA context for resource mapping
            if (useDeviceContext) {
                if (!m_deviceContext->setDevice()) {
                    //std::cerr << "Failed to set device context for resource mapping" << std::endl;
                    useDeviceContext = false;
                    cudaSetDevice(m_deviceId);
                }
            } else {
                // Fallback to setting the device directly
                CHECK_CUDA_ERROR(cudaSetDevice(m_deviceId));
            }

            // Create and start timing for CPU to GPU copy
            cudaEvent_t startFromCpu, stopFromCpu;
            CHECK_CUDA_ERROR(cudaEventCreate(&startFromCpu));
            CHECK_CUDA_ERROR(cudaEventCreate(&stopFromCpu));
            CHECK_CUDA_ERROR(cudaEventRecord(startFromCpu, 0));

            // Copy from CPU to GPU
            CHECK_CUDA_ERROR(cudaMalloc(&tensor_ptr, buffer_size));
            CHECK_CUDA_ERROR(cudaMemcpy(tensor_ptr, host_buffer, buffer_size, cudaMemcpyHostToDevice));
            CHECK_CUDA_ERROR(cudaDeviceSynchronize());

            // Record and calculate the time for CPU to GPU copy
            CHECK_CUDA_ERROR(cudaEventRecord(stopFromCpu, 0));
            CHECK_CUDA_ERROR(cudaEventSynchronize(stopFromCpu));
            float fromCpuTimeMs = 0.0f;
            CHECK_CUDA_ERROR(cudaEventElapsedTime(&fromCpuTimeMs, startFromCpu, stopFromCpu));

            // Save the CPU to GPU copy time
            if (eye == vr::Eye_Left) {
                leftEyeFromCpuCopyTime = fromCpuTimeMs;
            } else {
                rightEyeFromCpuCopyTime = fromCpuTimeMs;
            }

            // Add to total for this frame
            totalFromCpuCopyTime += fromCpuTimeMs;

            // Clean up events
            CHECK_CUDA_ERROR(cudaEventDestroy(startFromCpu));
            CHECK_CUDA_ERROR(cudaEventDestroy(stopFromCpu));

            // Clean up host buffer
            delete[] host_buffer;
            //gpu_result = result.to(torch::Device(torch::kCUDA, m_deviceId));

        } else if (!result.is_cuda()) {
            gpu_result = result.to(torch::Device(torch::kCUDA, m_deviceId));
            tensor_ptr = gpu_result.data_ptr();
        } else {
            tensor_ptr = result.data_ptr(); // Already on the right device
        }
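
The path above stages the frame through host memory. A direct device-to-device copy is a possible alternative when the GPUs can reach each other; a minimal sketch (not the code path used above):

    #include <cuda_runtime.h>

    // Hedged alternative to the host-staging copy above: cudaMemcpyPeer
    // copies directly between devices and falls back to a driver-managed
    // staged copy when direct peer access is unavailable.
    bool copyBetweenGpus(void* dst, int dstDev, const void* src, int srcDev, size_t bytes) {
        return cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes) == cudaSuccess;
    }
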
  • Per-eye time usage by the system (charts: both eyes on a single GPU vs. each eye on a separate GPU)

  • MTL base showkeys (bonus*, est: ...h)

https://github.com/artyom0906/OptiX_PyTorch_Extension

  • Basic shader with a hardcoded light and complex material textures:
    #include <optix.h>
    #include <cuda_runtime.h>
    #include <cuda.h>
    #include <vector_types.h>
    #include <vector_functions.h>
    #include "material_types.h"

    using namespace optix_renderer;

    // Vector math helper functions
    __device__ float3 operator+(const float3& a, const float3& b) {
        return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
    }

    __device__ float3 operator-(const float3& a, const float3& b) {
        return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
    }

    __device__ float3 operator*(const float3& a, float b) {
        return make_float3(a.x * b, a.y * b, a.z * b);
    }

    __device__ float3 operator*(float a, const float3& b) {
        return make_float3(a * b.x, a * b.y, a * b.z);
    }

    __device__ float3 operator*(const float3& a, const float3& b) {
        return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
    }

    __device__ float3 operator-(const float3& a) {
        return make_float3(-a.x, -a.y, -a.z);
    }

    __device__ float dot(const float3& a, const float3& b) {
        return a.x * b.x + a.y * b.y + a.z * b.z;
    }

    __device__ float3 normalize(const float3& v) {
        float invLen = 1.0f / sqrtf(dot(v, v));
        return v * invLen;
    }

    __device__ inline float3 cross(const float3& a, const float3& b) {
        return make_float3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x);
    }

    // Hit group data - the payload for hit group records
    struct HitGroupData {
        SimpleMaterial material;

        // Geometry data for this hit group
        CUdeviceptr vertices;
        CUdeviceptr indices;
        CUdeviceptr normals;
        CUdeviceptr texcoords;
        CUdeviceptr tangents;
        CUdeviceptr bitangents;

        // Texture data using CUtexObject for driver API compatibility
        CUtexObject albedo_texture;
        CUtexObject normal_texture;
        CUtexObject metallic_roughness_texture;
        CUtexObject emission_texture;
        CUtexObject specular_texture;
        CUtexObject specular_tint_texture;
        CUtexObject sheen_texture;
        CUtexObject clearcoat_texture;

        // Additional settings
        bool has_normals;
        bool has_texcoords;
        bool has_tangents;
        bool has_bitangents;

        // Legacy compatibility fields
        float3 albedo;        // Direct color access for basic shading
        float3 emission;      // Direct emission color
    };

    // Launch parameters
    struct Params {
        float3* image;
        float3 camera_pos;
        float3 camera_u;
        float3 camera_v;
        float3 camera_w;
        OptixTraversableHandle traversable;
        unsigned int width;
        unsigned int height;
        unsigned int samples_per_pixel;
        unsigned int max_depth;
        CUdeviceptr output_buffer;
    };

    // Get texture coordinates at the hit point
    __device__ float2 getTexCoords(const HitGroupData* data, const float2& barycentrics) {
        if (!data->has_texcoords || !data->texcoords) {
            return make_float2(0.5f, 0.5f);
        }

        const int primitiveIndex = optixGetPrimitiveIndex();
        int3 indices;
        if (data->indices) {
            indices = *((int3*)(data->indices) + primitiveIndex);
        } else {
            indices.x = 3 * primitiveIndex;
            indices.y = 3 * primitiveIndex + 1;
            indices.z = 3 * primitiveIndex + 2;
        }

        float2* texcoords = (float2*)(data->texcoords);
        float2 tc0 = texcoords[indices.x];
        float2 tc1 = texcoords[indices.y];
        float2 tc2 = texcoords[indices.z];

        float w0 = 1.0f - barycentrics.x - barycentrics.y;
        float w1 = barycentrics.x;
        float w2 = barycentrics.y;

        float2 result;
        result.x = w0 * tc0.x + w1 * tc1.x + w2 * tc2.x;
        result.y = w0 * tc0.y + w1 * tc1.y + w2 * tc2.y;

        return result;
    }

    // Sample a texture
    __device__ float4 sampleTexture(CUtexObject tex, float2 uv) {
        return tex2D<float4>(tex, uv.x, uv.y);
    }

    extern "C" {
        __constant__ Params params;
    }

    // Ray generation shader
    extern "C" __global__ void __raygen__renderFrame() {
        const uint3 idx = optixGetLaunchIndex();
        const uint3 dim = optixGetLaunchDimensions();

        unsigned int x = idx.x;
        unsigned int y = idx.y;

        unsigned int width = dim.x;
        unsigned int height = dim.y;

        if (params.width > 0) width = params.width;
        if (params.height > 0) height = params.height;

        if (x >= width || y >= height) {
            return;
        }

        float3 color = make_float3(0.1f, 0.1f, 0.1f);

        if (params.traversable) {
            const float3 ray_origin = params.camera_pos;

            // Flip the y coordinate so that Y+ points up in camera space.
            // Standard NDC has y from -1 (bottom) to 1 (top).
            const float2 screen_pos = make_float2(
                (float)x / width * 2.0f - 1.0f,
                -((float)y / height * 2.0f - 1.0f) // Flip Y coordinate
            );

            float3 dir = make_float3(
                params.camera_u.x * screen_pos.x + params.camera_v.x * screen_pos.y + params.camera_w.x,
                params.camera_u.y * screen_pos.x + params.camera_v.y * screen_pos.y + params.camera_w.y,
                params.camera_u.z * screen_pos.x + params.camera_v.z * screen_pos.y + params.camera_w.z
            );

            // The Z- forward convention is handled in the camera basis calculation
            // in setupLaunchParams; it is controlled by the camera "up" and "lookAt" vectors.

            float length = sqrtf(dir.x*dir.x + dir.y*dir.y + dir.z*dir.z);
            const float3 ray_direction = make_float3(dir.x/length, dir.y/length, dir.z/length);

            unsigned int p0 = 0;
            unsigned int p1 = 0;
            unsigned int p2 = 0;
            unsigned int sbtOffset = 0;

            optixTrace(
                params.traversable,
                ray_origin,
                ray_direction,
                0.0f,
                1e16f,
                0.0f,
                OptixVisibilityMask(1),
                OPTIX_RAY_FLAG_DISABLE_ANYHIT,
                sbtOffset,
                1,
                0,
                p0, p1, p2
            );

            color.x = __uint_as_float(p0);
            color.y = __uint_as_float(p1);
            color.z = __uint_as_float(p2);
        }

        float3* output = (float3*)params.image;
        if (params.output_buffer != 0) {
            unsigned int pixel_idx = y * width + x;  // renamed to avoid shadowing the launch index
            if (x < width && y < height) {
                output[pixel_idx] = color;
            }
        }
    }

    // Miss shader
    extern "C" __global__ void __miss__environment() {
        float3 color = make_float3(0.1f, 0.1f, 0.2f);
        optixSetPayload_0(__float_as_uint(color.x));
        optixSetPayload_1(__float_as_uint(color.y));
        optixSetPayload_2(__float_as_uint(color.z));
    }

    // Get the normal at the hit point
    __device__ float3 getNormal(const HitGroupData* data, const float2& barycentrics) {
        if (!data->has_normals || !data->normals) {
            // Return a default normal if normals are not available
            return make_float3(0.0f, 1.0f, 0.0f);
        }

        const int primitiveIndex = optixGetPrimitiveIndex();
        int3 indices;
        if (data->indices) {
            indices = *((int3*)(data->indices) + primitiveIndex);
        } else {
            indices.x = 3 * primitiveIndex;
            indices.y = 3 * primitiveIndex + 1;
            indices.z = 3 * primitiveIndex + 2;
        }

        float3* normals = (float3*)(data->normals);
        float3 n0 = normals[indices.x];
        float3 n1 = normals[indices.y];
        float3 n2 = normals[indices.z];

        float w0 = 1.0f - barycentrics.x - barycentrics.y;
        float w1 = barycentrics.x;
        float w2 = barycentrics.y;

        float3 result = make_float3(
            w0 * n0.x + w1 * n1.x + w2 * n2.x,
            w0 * n0.y + w1 * n1.y + w2 * n2.y,
            w0 * n0.z + w1 * n1.z + w2 * n2.z
        );

        return normalize(result);
    }

    // Simple lighting calculation
    __device__ float3 calculateLighting(
        const float3& position,
        const float3& normal,
        const float3& view_dir,
        const float3& albedo,
        float metallic,
        float roughness
    ) {
        // Fixed light position for simplicity
        float3 light_dir = normalize(make_float3(1.0f, 1.0f, 1.0f));
        float3 light_color = make_float3(1.0f, 1.0f, 1.0f);
        float light_intensity = 1.5f;

        // Ambient lighting
        float ambient_strength = 0.2f;
        float3 ambient = ambient_strength * albedo;

        // Diffuse lighting (Lambert model)
        float diff = max(dot(normal, light_dir), 0.0f);
        float3 diffuse = diff * light_color * albedo;

        // Specular lighting (Blinn-Phong approximation)
        float3 halfway_dir = normalize(light_dir + view_dir);
        float spec = pow(max(dot(normal, halfway_dir), 0.0f), 32.0f);
        float specular_strength = 0.5f * (1.0f - roughness);
        float3 specular = specular_strength * spec * light_color;

        // Combine components
        float3 result = ambient + (diffuse + specular) * light_intensity;

        // Clamp values to the valid range
        result.x = min(result.x, 1.0f);
        result.y = min(result.y, 1.0f);
        result.z = min(result.z, 1.0f);

        return result;
    }

    // Closest hit shader
    extern "C" __global__ void __closesthit__radiance() {
        const HitGroupData* data = (const HitGroupData*)optixGetSbtDataPointer();
        const float2 barycentrics = optixGetTriangleBarycentrics();
        float2 texCoords = getTexCoords(data, barycentrics);
        float3 normal = getNormal(data, barycentrics);

        // Material properties
        float3 albedo = data->material.albedo;
        float metallic = data->material.metallic;
        float roughness = data->material.roughness;

        // Sample textures if available
        if (data->albedo_texture) {
            float4 texColor = sampleTexture(data->albedo_texture, texCoords);
            albedo = make_float3(texColor.x, texColor.y, texColor.z);
        }

        if (data->metallic_roughness_texture) {
            float4 mrTexColor = sampleTexture(data->metallic_roughness_texture, texCoords);
            // In most PBR textures:
            // - metallic is in the blue channel
            // - roughness is in the green channel
            metallic = mrTexColor.z;
            roughness = mrTexColor.y;
        }

        if (data->normal_texture && data->normals) {
            float4 normalTexColor = sampleTexture(data->normal_texture, texCoords);
            float3 tangentSpaceNormal = make_float3(
                normalTexColor.x * 2.0f - 1.0f,
                normalTexColor.y * 2.0f - 1.0f,
                normalTexColor.z * 2.0f - 1.0f
            );

            // For visualization we use the RGB value directly for now;
            // a full shader would transform it with a TBN matrix.
            normal = normalize(tangentSpaceNormal);
        }

        // Get the hit position and view direction for the lighting calculation
        float3 ray_dir = optixGetWorldRayDirection();
        float ray_t = optixGetRayTmax();
        float3 position = optixGetWorldRayOrigin() + ray_t * ray_dir;
        float3 view_dir = -ray_dir;  // Direction from hit point to camera

        // Choose a visualization mode - uncomment the one you want to use

        // Option 1: Apply lighting to the material (recommended)
        float3 color = calculateLighting(position, normal, view_dir, albedo, metallic, roughness);

        // Option 2: Display only albedo (base color)
        // float3 color = albedo;

        // Option 3: Display normals mapped from [-1,1] to [0,1]
        // float3 color = make_float3(normal.x * 0.5f + 0.5f, normal.y * 0.5f + 0.5f, normal.z * 0.5f + 0.5f);

        // Option 4: Display PBR properties in RGB channels
        // float3 color = make_float3(roughness, roughness, metallic);

        // Option 5: Mix albedo with metallic and roughness for debugging
        // color = albedo;
        // color.z = color.z * 0.5f + metallic * 0.5f;  // Add metallic to blue channel
        // color.y = color.y * 0.5f + roughness * 0.5f; // Add roughness to green channel

        optixSetPayload_0(__float_as_uint(color.x));
        optixSetPayload_1(__float_as_uint(color.y));
        optixSetPayload_2(__float_as_uint(color.z));
    }

https://tartuulikool-my.sharepoint.com/:p:/g/personal/kabakov_ut_ee/EYciYR_YNqFPvRIztPWggxMBQXTyRuPyrOUS0-7d2RLONw?e=zB91b1

Milestone 5 (Target Date: 5.05)

  • **Local Windows build** — spent 6 h; MSVC segfaults when compiling with OpenVR, so the project is runnable on Linux only.
  • **C++ camera-code optimisation & code cleanup** — spent 5 h.
    optix_renderer::CameraParameters setupVRCamera(const CameraSetupParams& params) {
        optix_renderer::CameraParameters cam;

        // Convert matrices to Eigen
        Eigen::Map<const Eigen::Matrix<float, 3, 4, Eigen::RowMajor>> headPoseNp(params.headPoseMatrix.data());
        Eigen::Map<const Eigen::Matrix<float, 3, 4, Eigen::RowMajor>> eyeToHeadNp(params.eyeToHeadMatrix.data());
        Eigen::Map<const Eigen::Matrix<float, 4, 4, Eigen::RowMajor>> projNp(params.projMatrix.data());

        // Calculate position
        Eigen::Vector3f position;
        if (params.lockPosition) {
            position = Eigen::Map<const Eigen::Vector3f>(params.basePos.data());
        } else {
            Eigen::Vector3f eyeOffset = eyeToHeadNp.block<3,1>(0,3);
            Eigen::Matrix4f hp4 = Eigen::Matrix4f::Identity();
            hp4.block<3,4>(0,0) = headPoseNp;

            Eigen::Vector4f homogeneous;
            homogeneous << eyeOffset, 1.0f;
            position = (hp4 * homogeneous).head<3>();
            position.z() *= -1.0f;
        }

        // Calculate orientation using the rotation matrix
        Eigen::Matrix3f rotMatrix = headPoseNp.block<3,3>(0,0);
        Eigen::Vector3f euler = rotMatrix.eulerAngles(0,1,2); // xyz order

        // Invert pitch and yaw
        euler[0] = -euler[0];  // pitch
        euler[1] = -euler[1];  // yaw

        // Create a rotation matrix from the adjusted angles
        Eigen::Matrix3f adjustedRot;
        adjustedRot = Eigen::AngleAxisf(euler[2], Eigen::Vector3f::UnitZ()) *
                      Eigen::AngleAxisf(euler[1], Eigen::Vector3f::UnitY()) *
                      Eigen::AngleAxisf(euler[0], Eigen::Vector3f::UnitX());

        // Calculate the camera basis vectors
        Eigen::Vector3f right = adjustedRot * Eigen::Vector3f(1,0,0);
        Eigen::Vector3f up = adjustedRot * Eigen::Vector3f(0,1,0);
        Eigen::Vector3f forward = adjustedRot * Eigen::Vector3f(0,0,-1);

        // Calculate projection parameters
        float tanHalfW = 1.0f / std::abs(projNp(0,0));
        float tanHalfH = 1.0f / std::abs(projNp(1,1));
        float offX = projNp(0,2);
        float offY = projNp(1,2);

        // Calculate the camera vectors
        Eigen::Vector3f cameraU = right;
        Eigen::Vector3f cameraV = up;
        Eigen::Vector3f cameraW = position -
                                  tanHalfW * (offX + 1.0f) * right -
                                  tanHalfH * (offY + 1.0f) * up -
                                  forward;

        // Copy the results into the camera parameters
        cam.position = {position[0], position[1], position[2]};
        cam.u = {cameraU[0], cameraU[1], cameraU[2]};
        cam.v = {cameraV[0], cameraV[1], cameraV[2]};
        cam.w = {cameraW[0], cameraW[1], cameraW[2]};

        return cam;
    }

Python bindings:

    py::class_<ors::CameraSetupParams>(m, "CameraSetupParams")
        .def(py::init<>())
        .def_readwrite("eye_matrix", &ors::CameraSetupParams::eyeMatrix)
        .def_readwrite("eye_to_head_matrix", &ors::CameraSetupParams::eyeToHeadMatrix)
        .def_readwrite("head_pose_matrix", &ors::CameraSetupParams::headPoseMatrix)
        .def_readwrite("proj_matrix", &ors::CameraSetupParams::projMatrix)
        .def_readwrite("base_pos", &ors::CameraSetupParams::basePos)
        .def_readwrite("lock_position", &ors::CameraSetupParams::lockPosition);

Milestone 6 (Target Date: 19.05)

  • **Documentation cleanup** — 5 h.
  • **Dynamic geometry rebuild** — 2 h.

Python bindings:

        .def("set_scene_changed", &Renderer::setSceneChanged)
        .def("is_scene_changed", &Renderer::isSceneChanged)
        .def("reset_scene_changed", &Renderer::resetSceneChanged);

Build and rebuild of the acceleration structure:

    bool buildInstanceAccelerationStructure() {
        if (!m_deviceContext->setDevice()) {
            std::cerr << "Failed to set device context for acceleration structure build" << std::endl;
            return false;
        }
        // Calculate the number of instances
        const size_t numInstances = m_instances.size();

        if (numInstances == 0) {
            return true; // Nothing to do
        }

        // Create array of OptixInstance for the IAS
        std::vector<OptixInstance> optixInstances;

        // Initialize array - we'll only add visible instances
        // instead of pre-allocating and potentially skipping entries
        optixInstances.clear();

        // Fill in the instances with transforms and references to GAS
        for (size_t i = 0; i < numInstances; i++) {
            auto& instance = m_instances[i];

            // Skip if the instance is not visible
            if (!instance->isVisible()) {
                continue;
            }

            // Create new instance and add to vector
            OptixInstance newInstance = {};
            optixInstances.push_back(newInstance);

            // Get reference to the newly added instance
            auto& optixInstance = optixInstances.back();

            // Get the resource IDs
            GeometryHandle geometryId = instance->getGeometryHandle();
            MaterialHandle materialId = instance->getMaterialHandle();

            // Get the device geometry data
            //std::cout << "Configuring instance " << i << " with geometry ID: " << geometryId << std::endl;
            auto geomData = m_deviceContext->getGeometryData(geometryId);
            if (!geomData) {
                std::cerr << "Error: Geometry data not found for instance " << i << std::endl;
                continue;
            }

            // Skip any geometry with a null traversable handle
            if (geomData->traversable == 0) {
                std::cerr << "Error: Geometry " << i << " has a null traversable handle" << std::endl;
                continue;
            }

            // Debug - print traversable handle
            //std::cout << "Instance " << i << " using traversable handle: " << geomData->traversable << std::endl;

            // Get the transform matrix from the instance
            torch::Tensor transform = instance->getTransform();

            // Ensure the transform tensor is contiguous in memory
            if (!transform.is_contiguous()) {
                transform = transform.contiguous();
            }

            // Print the transform matrix for debugging (Row-major format from PyTorch)
            //std::cout << "Transform BEFORE transpose for instance " << i << " (row-major):" << std::endl;
            float* transformPtr = (float*)transform.data_ptr();
            //for (int row = 0; row < 4; row++) {
            //    std::cout << "  ";
            //    for (int col = 0; col < 4; col++) {
            //        std::cout << transformPtr[row * 4 + col] << " ";
            //    }
            //    std::cout << std::endl;
            //}

            // NO transpose - try using row-major directly
            // Let's copy the first 3 rows of the 4x4 matrix directly
            optixInstance.transform[0] = transformPtr[0];  // Row 0, Col 0
            optixInstance.transform[1] = transformPtr[1];  // Row 0, Col 1
            optixInstance.transform[2] = transformPtr[2];  // Row 0, Col 2
            optixInstance.transform[3] = transformPtr[3];  // Row 0, Col 3 (translation X)

            optixInstance.transform[4] = transformPtr[4];  // Row 1, Col 0
            optixInstance.transform[5] = transformPtr[5];  // Row 1, Col 1
            optixInstance.transform[6] = transformPtr[6];  // Row 1, Col 2
            optixInstance.transform[7] = transformPtr[7];  // Row 1, Col 3 (translation Y)

            optixInstance.transform[8] = transformPtr[8];  // Row 2, Col 0
            optixInstance.transform[9] = transformPtr[9];  // Row 2, Col 1
            optixInstance.transform[10] = transformPtr[10]; // Row 2, Col 2
            optixInstance.transform[11] = transformPtr[11]; // Row 2, Col 3 (translation Z)

            // Print the OptiX transform matrix (column-major) for debugging
            //std::cout << "Transform AFTER transpose for instance " << i << " (column-major):" << std::endl;
            //std::cout << "  [" << optixInstance.transform[0] << ", " << optixInstance.transform[4] << ", " << optixInstance.transform[8] << ", " << "translation X: " << optixInstance.transform[3] << "]" << std::endl;
            //std::cout << "  [" << optixInstance.transform[1] << ", " << optixInstance.transform[5] << ", " << optixInstance.transform[9] << ", " << "translation Y: " << optixInstance.transform[7] << "]" << std::endl;
            //std::cout << "  [" << optixInstance.transform[2] << ", " << optixInstance.transform[6] << ", " << optixInstance.transform[10] << ", " << "translation Z: " << optixInstance.transform[11] << "]" << std::endl;

            // Set instance ID for hit group selection
            optixInstance.instanceId = static_cast<unsigned int>(i);

            // Set the SBT offset for material selection
            // Use the instance index as SBT offset to select different materials
            // Instance 0 = Cube = Blue, Instance 1 = Floor = Green
            optixInstance.sbtOffset = static_cast<unsigned int>(i);

            // Set the visibility mask (all visible by default)
            optixInstance.visibilityMask = 255;

            // Set the traversable handle to the geometry's GAS
            optixInstance.traversableHandle = geomData->traversable;

            // Set bitfield flags
            optixInstance.flags = OPTIX_INSTANCE_FLAG_NONE;
        }

        // Check if we have any instances after filtering for visibility
        if (optixInstances.empty()) {
            std::cerr << "WARNING: No visible instances to render!" << std::endl;
            // Set empty traversable
            m_ias = 0;
            return true;
        }

        // Debug output for instances
        //std::cout << "Building IAS with " << optixInstances.size() << " visible instances" << std::endl;

        // Create CUdeviceptr for the instance array - use consistent CUDA driver API
        size_t instanceBufferSize = sizeof(OptixInstance) * optixInstances.size();
        CUdeviceptr d_instances;
        CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_instances, instanceBufferSize, m_deviceContext->getStream()));
        CUDA_DRIVER_CHECK(cuMemcpyHtoDAsync(
            d_instances,
            optixInstances.data(),
            instanceBufferSize
            , m_deviceContext->getStream()
        ));
        // Set up the build input
        OptixBuildInput buildInput = {};
        buildInput.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
        buildInput.instanceArray.instances = d_instances;
        buildInput.instanceArray.numInstances = static_cast<unsigned int>(optixInstances.size());

        // Set up the acceleration structure build options
        OptixAccelBuildOptions accelOptions = {};
        accelOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE | OPTIX_BUILD_FLAG_ALLOW_COMPACTION 
                                  | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
        accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

        // Compute buffer sizes
        OptixAccelBufferSizes bufferSizes;
        OPTIX_CHECK(optixAccelComputeMemoryUsage(
            m_deviceContext->getContext(),
            &accelOptions,
            &buildInput,
            1, // one build input
            &bufferSizes
        ));

        // Allocate buffers - use consistent CUDA driver API
        CUdeviceptr d_tempBuffer;
        CUdeviceptr d_iasOutputBuffer;

        CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_tempBuffer, bufferSizes.tempSizeInBytes, m_deviceContext->getStream()));
        CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_iasOutputBuffer, bufferSizes.outputSizeInBytes, m_deviceContext->getStream()));

        // Debug: Check valid input
        //std::cout << "Before IAS build: Instances buffer is " << (d_instances ? "valid" : "null") << std::endl;
        //std::cout << "Before IAS build: Temp buffer is " << (d_tempBuffer ? "valid" : "null") << std::endl;
        //std::cout << "Before IAS build: Output buffer is " << (d_iasOutputBuffer ? "valid" : "null") << std::endl;

        // Build the acceleration structure
        OptixTraversableHandle tempHandle = 0;
        OPTIX_CHECK(optixAccelBuild(
            m_deviceContext->getContext(),
            0, // CUDA stream
            &accelOptions,
            &buildInput,
            1, // one build input
            d_tempBuffer,
            bufferSizes.tempSizeInBytes,
            d_iasOutputBuffer,
            bufferSizes.outputSizeInBytes,
            &tempHandle, // output to temp variable first
            nullptr, // no emitted properties
            0 // no emitted properties
        ));

        // Store the handle and debug output
        m_ias = tempHandle;
        //std::cout << "IAS build complete, traversable handle: " << m_ias << std::endl;

        // Clean up temporary buffers - use CUDA driver API for consistency
        CUDA_DRIVER_CHECK(cuMemFreeAsync(d_tempBuffer, m_deviceContext->getStream()));

        // Store the buffers we need to keep - free old buffers with CUDA driver API
        if (m_d_iasOutputBuffer) {
            CUDA_DRIVER_CHECK(cuMemFreeAsync(m_d_iasOutputBuffer, m_deviceContext->getStream()));
        }
        if (m_d_instances) {
            CUDA_DRIVER_CHECK(cuMemFreeAsync(m_d_instances, m_deviceContext->getStream()));
        }

        m_d_iasOutputBuffer = d_iasOutputBuffer;
        m_d_instances = d_instances;
        m_numInstances = optixInstances.size();

        return true;
    }
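
A sketch of how the scene-changed flag and the rebuild might be tied together per frame (illustrative; launchOptixPipeline is a hypothetical placeholder):

    // Illustrative per-frame integration of the scene-changed flag with the
    // IAS rebuild; launchOptixPipeline is a hypothetical placeholder name.
    void Renderer::renderFrame() {
        if (isSceneChanged()) {
            buildInstanceAccelerationStructure();  // rebuild IAS for moved/added instances
            resetSceneChanged();
        }
        launchOptixPipeline(m_ias);                // trace rays against the rebuilt IAS
    }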

