optixVR engine
Artem Kabakov
This project is a rewrite of the OptiX 8.1 engine, updated to support multi-GPU configurations and VR integration. The code has been migrated to OptiX 9 with CUDA 12.8 on Linux, addressing previous limitations and enabling advanced rendering techniques. Detailed benchmarks, the hardware configuration, and demo videos are provided below. For a full project plan, please refer to the linked documentation (if available).
Final build, repository link, and a brief (10-20 sec) final result video will be added once the project is completed.
Milestone 1 (Target Date: 10.03)
- Rendering Fixes (estimated: 2-3 hours): Resolve artifacts in OptiX 9 videos caused by incorrect handling of object pointers between the Python and C++ bindings.
- Memory & Shader Updates (estimated: 2 hours): Improve texture memory management and apply necessary shader fixes.
- GPU Selection Feature (estimated: 1-2 hours): Implement a feature to select the target GPU in code (instead of using CUDA_VISIBLE_DEVICES); see the sketch after this list.
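A minimal sketch of what in-code GPU selection can look like; the `pickDevice` helper is hypothetical, not the engine's actual API:

```cpp
#include <cuda_runtime.h>
#include <iostream>

// Hypothetical helper: select a CUDA device by index at runtime instead
// of restricting visibility through the CUDA_VISIBLE_DEVICES variable.
bool pickDevice(int requestedId) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || requestedId >= count) {
        std::cerr << "Requested GPU " << requestedId << " not available\n";
        return false;
    }
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, requestedId);
    std::cout << "Using GPU " << requestedId << ": " << prop.name << "\n";
    // All subsequent CUDA calls on this host thread target this device.
    return cudaSetDevice(requestedId) == cudaSuccess;
}
```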
**Hardware & Benchmark Details:**
- System Configuration: 2 × NVIDIA GeForce RTX 5090, Intel i9-12900K, 96 GB DDR4-3200.

**Transfer Speed Benchmark Report:**
1. GPU 0 (PCIe Root Complex: 0000:01)
   - Host → Device: 33638 MB/s
   - Device → Host: 41879.4 MB/s
   - NVML PCIe Throughput: TX 21012 KB/s, RX 410 KB/s
2. GPU 1 (PCIe Root Complex: 0000:05)
   - Host → Device: 848.386 MB/s
   - Device → Host: 851.835 MB/s
   - NVML PCIe Throughput: TX 394 KB/s, RX 399 KB/s
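Bandwidth figures of this kind can be reproduced with a timed copy over pinned host memory; a minimal sketch, not the exact benchmark used for the report above:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Rough host-to-device bandwidth probe for the currently selected GPU.
int main() {
    const size_t bytes = 256ull << 20; // 256 MiB test buffer
    void *host = nullptr, *dev = nullptr;
    cudaMallocHost(&host, bytes);      // pinned memory for peak PCIe rates
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host -> Device: %.1f MB/s\n", (bytes / 1.0e6) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dev);
    cudaFreeHost(host);
    return 0;
}
```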
**Demo Videos:**
- Original OptiX 8.1 demo recorded on an RTX 3070: optix81game.mp4
- OptiX 9 demo on GPU 0 (PCIe 5.0 x16, 100-130 fps at 2064×2208): OPTIX9_GPU0-pcie5x16.mp4
- OptiX 9 demo on GPU 1 (PCIe 3.0 x1, 58-62 fps at 2064×2208): OPTIX9_GPU1-pcie3x1.mp4
- OpenXR with CUDA interop demo (dual mandelbrot render to OpenGL buffers): OpenXR_CUDA_2DEV.mp4
**Development Notes:**
- The original OptiX 8.1 application was migrated to OptiX 9 to support current hardware.
- For OptiX 9, frames are rendered at 2064×2208 and the full-resolution images are then copied to the CPU before being cropped to 1920×1080 (see the sketch after this list).
- When using OpenVR/XR, the system writes directly to OpenGL/Vulkan buffers via CUDA interop.
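A minimal sketch of the crop step mentioned above, assuming a centered window and an illustrative `uchar4` pixel format (the engine's actual buffer layout may differ):

```cpp
#include <cuda_runtime.h>
#include <cstring>
#include <vector>

// Illustrative only: download the full 2064x2208 frame to the host,
// then cut a 1920x1080 window out of it row by row.
void downloadAndCrop(const uchar4* d_frame, uchar4* out /* 1920x1080 */) {
    const int srcW = 2064, srcH = 2208;
    const int dstW = 1920, dstH = 1080;
    const int x0 = (srcW - dstW) / 2;  // assumed centered crop
    const int y0 = (srcH - dstH) / 2;

    std::vector<uchar4> full(static_cast<size_t>(srcW) * srcH);
    cudaMemcpy(full.data(), d_frame, full.size() * sizeof(uchar4),
               cudaMemcpyDeviceToHost);
    for (int y = 0; y < dstH; ++y)  // row-by-row host-side crop
        std::memcpy(out + static_cast<size_t>(y) * dstW,
                    full.data() + static_cast<size_t>(y0 + y) * srcW + x0,
                    dstW * sizeof(uchar4));
}
```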
Milestone 2 (Target Date: 24.03)
- Initial fixes from Milestone 1:
  - Memory fixes for the renderer (2 h)
  - GPU selection (2 h; works only for a single GPU at a time)
  - Memory and shader updates (4 h; done in the new resource system)
- New resource system (~30 h total):
  - Python tensor problems (3 h)
  - New architecture research (2 h)
  - Refactoring and reimplementing OptiX usage for the new architecture (9 h)
  - Multi-GPU resource synchronisation (8 h)
  - New Python bindings and GUI code (4 h)
- More details in the presentation: https://docs.google.com/presentation/d/144S91CHOROZGn3a6xB_6yOgCVZv76l-j2oFlbHrci-Q/edit?usp=sharing
Milestone 3 (Target Date: 4.07)
- Headset rendering (Not done)
  - ALVR Linux setup for Oculus (3 h)


  - ALVR tries to connect to the Oculus over a network connection forwarded to the device by an ADB forwarder.

  - This enables a connection, but it drops; at the same time, ALVR rewrites the network configuration to pretend that a wireless device is connected over IPv4. Once the wireless connection has been established, the existing wired connection has to be torn down, otherwise it conflicts with the new wireless one.
  - ALVR does not work out of the box following the newest manual, but with some hacks and partial use of older methods it is feasible to establish a connection between SteamVR and the Oculus.

Main function of the CUDA interop with OpenVR:

```cpp
#include <cuda_runtime.h>
#include <chrono>
#include <iostream>
#include <thread>

// OpenVRSystem is the project's wrapper class (header not shown).
int main(int argc, char* argv[]) {
    // Initialize CUDA
    const cudaError_t cudaStatus = cudaFree(nullptr);
    if (cudaStatus != cudaSuccess) {
        std::cerr << "CUDA initialization failed: " << cudaGetErrorString(cudaStatus) << std::endl;
        return -1;
    }

    // Create OpenVR system
    OpenVRSystem vrSystem;
    if (!vrSystem.Initialize()) {
        std::cerr << "Failed to initialize OpenVR system" << std::endl;
        return -1;
    }

    std::cout << "Running main loop... Press ESC to exit." << std::endl;

    // Main loop
    while (!vrSystem.ShouldClose()) {
        vrSystem.RenderFrame(); // Render a frame
        vrSystem.PollEvents();  // Poll for window events

        // Sleep briefly to prevent 100% CPU usage
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

    std::cout << "Shutting down..." << std::endl;
    return 0;
}
```
Launching different CUDA kernels depending on the eye:

```cpp
dim3 blockSize(16, 16);
dim3 gridSize((m_renderWidth  + blockSize.x - 1) / blockSize.x,
              (m_renderHeight + blockSize.y - 1) / blockSize.y);
if (eye == vr::Eye_Left) {
    renderLeftEyeGrid<<<gridSize, blockSize>>>(deviceBuffer, m_renderWidth, m_renderHeight, pitch, m_time);
} else {
    renderRightEyeGrid<<<gridSize, blockSize>>>(deviceBuffer, m_renderWidth, m_renderHeight, pitch, m_time);
}
```
Actual kernel to draw the left eye:

```cpp
__global__ void renderLeftEyeGrid(unsigned char* surface, int width, int height, int pitch, float time) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    // Calculate output position in the surface
    unsigned char* pixel = surface + y * pitch + x * 4;

    // Create a grid pattern that moves slowly over time
    int gridSize = 32; // Size of each grid cell
    int offsetX = (int)(time * 10.0f) % gridSize;
    int offsetY = (int)(time * 5.0f) % gridSize;
    bool isGridLine = ((x + offsetX) % gridSize < 2) || ((y + offsetY) % gridSize < 2);

    if (isGridLine) {
        // Grid lines: red
        pixel[0] = 0;   // B
        pixel[1] = 0;   // G
        pixel[2] = 255; // R
        pixel[3] = 255; // A
    } else {
        // Grid cells: black
        pixel[0] = 0;   // B
        pixel[1] = 0;   // G
        pixel[2] = 0;   // R
        pixel[3] = 255; // A
    }
}
```
- Python bindings problems (2 h): linker error (not finished)

```
> python test_openvr_system.py
Error importing OpenVRSystem: libc10.so: cannot open shared object file: No such file or directory
Make sure the extension is properly compiled with OpenVR support.
```
- Renderer component integration: https://drive.google.com/file/d/1aJcXFPpY6meKdMigORkyu8KTGh-POX3W/view?resourcekey&pli=1 (running this view, with control by the headset, on a headset)
- MTL materials **(bonus task, not done)**
  - Examples research (3 h): https://github.com/NVIDIA/OptiX_Apps.git
  - Simple implementation attempts: adding support for callable programs (4 h); see the sketch below

No headset recordings are available for now due to an NVIDIA kernel driver crash after clicking record in ALVR. I was working on the bonus task until I got the headset on 04.07.2025.
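For context, an OptiX direct callable of the kind these attempts target looks roughly as follows; the program name and signature are hypothetical examples, not code from OptiX_Apps:

```cpp
#include <optix.h>
#include <cuda_runtime.h>

// Hypothetical direct callable: evaluate a material's albedo at given
// texture coordinates. Each MTL material could register its own
// __direct_callable__ program in the SBT's callables table.
extern "C" __device__ float3 __direct_callable__eval_albedo(float2 uv)
{
    return make_float3(uv.x, uv.y, 0.5f);
}

// Call site inside a closest-hit program: dispatch through the SBT by
// index, so the material is chosen at runtime rather than compile time.
// float3 albedo = optixDirectCall<float3, float2>(materialCallableIndex, uv);
```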
Milestone 4 (Target Date: 21.07)
- Full documentation cleanup (est: 4 h?)
- OpenVR integration with the renderer (est: 6 h)

First working attempt after 3 h of work. The linking problem was runtime-only and required setting LD_LIBRARY_PATH to the PyTorch libraries that were used for the build: https://node.ukrain.ee/first_working_wr.mp4. The video artifacts are caused by the SteamVR "VR View" window and do not appear in the headset.
- Recalculate the pinhole camera matrix from the HMD rotation and translation plus per-eye translation:

```python
def setup_vr_camera(
    eye_matrix,          # <-- now accepted but not used
    eye_to_head_matrix,  # 3x4 per-eye
    head_pose_matrix,    # 3x4 HMD pose in world space
    proj_matrix,         # 4x4 per-eye projection
    base_pos=(0.0, 0.0, 0.0),  # locked-camera fallback
    lock_position=False,
):
    """
    Builds an ors.CameraParameters for OptiX ray generation (UVW method).

    Parameters
    ----------
    eye_matrix : 3x4
        Reserved for future use (time-warp, etc.); ignored by this function.
    eye_to_head_matrix : 3x4
        Eye-to-head transform from OpenVR for this eye.
    head_pose_matrix : 3x4
        Headset pose in world space.
    proj_matrix : 4x4
        Asymmetric projection matrix for this eye.
    base_pos : 3-tuple
        World-space position used when lock_position is True.
    lock_position : bool
        If True, ignore positional tracking and keep the camera at base_pos.
    """
    import numpy as np
    from scipy.spatial.transform import Rotation as R

    cam = ors.CameraParameters()

    head_pose_np = np.asarray(head_pose_matrix, dtype=float).reshape(3, 4)
    eye_to_head_np = np.asarray(eye_to_head_matrix, dtype=float).reshape(3, 4)
    proj_np = np.asarray(proj_matrix, dtype=float).reshape(4, 4)

    # ---------- position ----------------------------------------------------
    if lock_position:
        position = np.array(base_pos, dtype=np.float64)
    else:
        eye_offset = eye_to_head_np[:, 3]
        hp4 = np.eye(4, dtype=np.float64)
        hp4[:3, :4] = head_pose_np
        position = (hp4 @ np.append(eye_offset, 1.0))[:3]
        position[2] *= -1

    # ---------- orientation (quaternion) ------------------------------------
    rot_q = R.from_matrix(head_pose_np[:, :3])
    yaw, pitch, roll = rot_q.as_euler("xyz", degrees=False)
    # Uncomment any axis you want to invert
    pitch = -pitch  # nodding
    yaw = -yaw      # turning
    # roll = -roll  # tilting
    rot_q = R.from_euler("xyz", [yaw, pitch, roll])

    right = rot_q.apply([1, 0, 0])
    up = rot_q.apply([0, 1, 0])
    forward = rot_q.apply([0, 0, -1])

    # ---------- projection parameters ---------------------------------------
    tan_half_w = 1.0 / abs(proj_np[0, 0])
    tan_half_h = 1.0 / abs(proj_np[1, 1])
    off_x = proj_np[0, 2]
    off_y = proj_np[1, 2]

    camera_u = right  # * (2.0 * tan_half_w)
    camera_v = up     # * (2.0 * tan_half_h)
    camera_w = (position
                - tan_half_w * (off_x + 1.0) * right
                - tan_half_h * (off_y + 1.0) * up
                - forward)

    # ---------- write to struct ----------------------------------------------
    cam.position = position.tolist()
    cam.camera_u = camera_u.tolist()
    cam.camera_v = camera_v.tolist()
    cam.camera_w = camera_w.tolist()
    return cam
```
- CUDA memory copy and sync between the two devices for OpenVR:

```cpp
// Get tensor data pointer
void* tensor_ptr = nullptr;
size_t buffer_size = m_renderWidth * m_renderHeight * 3 * sizeof(float);

// Move tensor to the current GPU if needed
torch::Tensor gpu_result;
if (result.is_cuda() && result.device().index() != m_deviceId) {
    void* source_ptr = result.data_ptr();
    if (!renderer->getDeviceContext()->setDevice()) {
        exit(-1);
    }

    // Create and start timing for the GPU-to-CPU copy
    cudaEvent_t startToCpu, stopToCpu;
    CHECK_CUDA_ERROR(cudaEventCreate(&startToCpu));
    CHECK_CUDA_ERROR(cudaEventCreate(&stopToCpu));
    CHECK_CUDA_ERROR(cudaEventRecord(startToCpu, 0));

    // Copy from GPU to CPU
    float* host_buffer = new float[m_renderWidth * m_renderHeight * 3];
    CHECK_CUDA_ERROR(cudaMemcpy(host_buffer, source_ptr, buffer_size, cudaMemcpyDeviceToHost));
    CHECK_CUDA_ERROR(cudaDeviceSynchronize());

    // Record and calculate the time for the GPU-to-CPU copy
    CHECK_CUDA_ERROR(cudaEventRecord(stopToCpu, 0));
    CHECK_CUDA_ERROR(cudaEventSynchronize(stopToCpu));
    float toCpuTimeMs = 0.0f;
    CHECK_CUDA_ERROR(cudaEventElapsedTime(&toCpuTimeMs, startToCpu, stopToCpu));

    // Save the GPU-to-CPU copy time
    if (eye == vr::Eye_Left) {
        leftEyeToCpuCopyTime = toCpuTimeMs;
    } else {
        rightEyeToCpuCopyTime = toCpuTimeMs;
    }
    totalToCpuCopyTime += toCpuTimeMs; // Add to total for this frame

    // Clean up events
    CHECK_CUDA_ERROR(cudaEventDestroy(startToCpu));
    CHECK_CUDA_ERROR(cudaEventDestroy(stopToCpu));

    // Ensure we're in the right CUDA context for resource mapping
    if (useDeviceContext) {
        if (!m_deviceContext->setDevice()) {
            useDeviceContext = false;
            cudaSetDevice(m_deviceId);
        }
    } else {
        // Fallback to setting the device directly
        CHECK_CUDA_ERROR(cudaSetDevice(m_deviceId));
    }

    // Create and start timing for the CPU-to-GPU copy
    cudaEvent_t startFromCpu, stopFromCpu;
    CHECK_CUDA_ERROR(cudaEventCreate(&startFromCpu));
    CHECK_CUDA_ERROR(cudaEventCreate(&stopFromCpu));
    CHECK_CUDA_ERROR(cudaEventRecord(startFromCpu, 0));

    // Copy from CPU to GPU
    CHECK_CUDA_ERROR(cudaMalloc(&tensor_ptr, buffer_size));
    CHECK_CUDA_ERROR(cudaMemcpy(tensor_ptr, host_buffer, buffer_size, cudaMemcpyHostToDevice));
    CHECK_CUDA_ERROR(cudaDeviceSynchronize());

    // Record and calculate the time for the CPU-to-GPU copy
    CHECK_CUDA_ERROR(cudaEventRecord(stopFromCpu, 0));
    CHECK_CUDA_ERROR(cudaEventSynchronize(stopFromCpu));
    float fromCpuTimeMs = 0.0f;
    CHECK_CUDA_ERROR(cudaEventElapsedTime(&fromCpuTimeMs, startFromCpu, stopFromCpu));

    // Save the CPU-to-GPU copy time
    if (eye == vr::Eye_Left) {
        leftEyeFromCpuCopyTime = fromCpuTimeMs;
    } else {
        rightEyeFromCpuCopyTime = fromCpuTimeMs;
    }
    totalFromCpuCopyTime += fromCpuTimeMs; // Add to total for this frame

    // Clean up events and the host staging buffer
    CHECK_CUDA_ERROR(cudaEventDestroy(startFromCpu));
    CHECK_CUDA_ERROR(cudaEventDestroy(stopFromCpu));
    delete[] host_buffer;
} else if (!result.is_cuda()) {
    gpu_result = result.to(torch::Device(torch::kCUDA, m_deviceId));
    tensor_ptr = gpu_result.data_ptr();
} else {
    tensor_ptr = result.data_ptr(); // Already on the right device
}
```
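The staging path above routes every frame through host memory, which is exactly what the PCIe bandwidth gap between the two GPUs penalizes. When the devices can access each other, a direct peer-to-peer copy is a possible alternative; a hedged sketch (the `peerCopy` helper is hypothetical):

```cpp
#include <cuda_runtime.h>

// Alternative to staging through the host: copy directly between GPUs.
// Sketch only; falls back to the caller's host path when peer access
// is unavailable.
bool peerCopy(void* dst, int dstDev, const void* src, int srcDev, size_t bytes) {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev);
    if (!canAccess) return false;
    cudaSetDevice(dstDev);
    // One-time enable per device pair; returns an error code (harmless
    // here) if peer access was already enabled.
    cudaDeviceEnablePeerAccess(srcDev, 0);
    return cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes) == cudaSuccess;
}
```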
- Per-eye time usage by the system:

Both eyes on a single GPU:

Each eye on a separate GPU:

- MTL-based showcase, bonus* (est: ...h)
https://github.com/artyom0906/OptiX_PyTorch_Extension
- Basic shader with a hardcoded light and complex material textures:

```cpp
#include <optix.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include <vector_types.h>
#include <vector_functions.h>
#include "material_types.h"

using namespace optix_renderer;

// Vector math helper functions
__device__ float3 operator+(const float3& a, const float3& b) { return make_float3(a.x + b.x, a.y + b.y, a.z + b.z); }
__device__ float3 operator-(const float3& a, const float3& b) { return make_float3(a.x - b.x, a.y - b.y, a.z - b.z); }
__device__ float3 operator*(const float3& a, float b) { return make_float3(a.x * b, a.y * b, a.z * b); }
__device__ float3 operator*(float a, const float3& b) { return make_float3(a * b.x, a * b.y, a * b.z); }
__device__ float3 operator*(const float3& a, const float3& b) { return make_float3(a.x * b.x, a.y * b.y, a.z * b.z); }
__device__ float3 operator-(const float3& a) { return make_float3(-a.x, -a.y, -a.z); }
__device__ float dot(const float3& a, const float3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
__device__ float3 normalize(const float3& v) { float invLen = 1.0f / sqrtf(dot(v, v)); return v * invLen; }
__device__ inline float3 cross(const float3& a, const float3& b) {
    return make_float3(a.y * b.z - a.z * b.y,
                       a.z * b.x - a.x * b.z,
                       a.x * b.y - a.y * b.x);
}

// Hit group data - will be the payload for hit group records
struct HitGroupData {
    SimpleMaterial material;

    // Geometry data for this hit group
    CUdeviceptr vertices;
    CUdeviceptr indices;
    CUdeviceptr normals;
    CUdeviceptr texcoords;
    CUdeviceptr tangents;
    CUdeviceptr bitangents;

    // Texture data using CUtexObject for driver API compatibility
    CUtexObject albedo_texture;
    CUtexObject normal_texture;
    CUtexObject metallic_roughness_texture;
    CUtexObject emission_texture;
    CUtexObject specular_texture;
    CUtexObject specular_tint_texture;
    CUtexObject sheen_texture;
    CUtexObject clearcoat_texture;

    // Additional settings
    bool has_normals;
    bool has_texcoords;
    bool has_tangents;
    bool has_bitangents;

    // Legacy compatibility fields
    float3 albedo;   // Direct color access for basic shading
    float3 emission; // Direct emission color
};

// Launch parameters
struct Params {
    float3* image;
    float3 camera_pos;
    float3 camera_u;
    float3 camera_v;
    float3 camera_w;
    OptixTraversableHandle traversable;
    unsigned int width;
    unsigned int height;
    unsigned int samples_per_pixel;
    unsigned int max_depth;
    CUdeviceptr output_buffer;
};

// Get texture coordinates at hit point
__device__ float2 getTexCoords(const HitGroupData* data, const float2& barycentrics) {
    if (!data->has_texcoords || !data->texcoords) {
        return make_float2(0.5f, 0.5f);
    }
    const int primitiveIndex = optixGetPrimitiveIndex();
    int3 indices;
    if (data->indices) {
        indices = *((int3*)(data->indices) + primitiveIndex);
    } else {
        indices.x = 3 * primitiveIndex;
        indices.y = 3 * primitiveIndex + 1;
        indices.z = 3 * primitiveIndex + 2;
    }
    float2* texcoords = (float2*)(data->texcoords);
    float2 tc0 = texcoords[indices.x];
    float2 tc1 = texcoords[indices.y];
    float2 tc2 = texcoords[indices.z];
    float w0 = 1.0f - barycentrics.x - barycentrics.y;
    float w1 = barycentrics.x;
    float w2 = barycentrics.y;
    float2 result;
    result.x = w0 * tc0.x + w1 * tc1.x + w2 * tc2.x;
    result.y = w0 * tc0.y + w1 * tc1.y + w2 * tc2.y;
    return result;
}

// Sample texture
__device__ float4 sampleTexture(CUtexObject tex, float2 uv) {
    return tex2D<float4>(tex, uv.x, uv.y);
}

extern "C" {
__constant__ Params params;
}

// Ray generation shader
extern "C" __global__ void __raygen__renderFrame() {
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();
    unsigned int x = idx.x;
    unsigned int y = idx.y;
    unsigned int width = dim.x;
    unsigned int height = dim.y;
    if (params.width > 0) width = params.width;
    if (params.height > 0) height = params.height;
    if (x >= width || y >= height) {
        return;
    }

    float3 color = make_float3(0.1f, 0.1f, 0.1f);
    if (params.traversable) {
        const float3 ray_origin = params.camera_pos;
        // Flip the y coordinate to make Y+ point up in camera space.
        // Standard NDC has y from -1 (bottom) to 1 (top).
        const float2 screen_pos = make_float2(
            (float)x / width * 2.0f - 1.0f,
            -((float)y / height * 2.0f - 1.0f) // Flip Y coordinate
        );
        float3 dir = make_float3(
            params.camera_u.x * screen_pos.x + params.camera_v.x * screen_pos.y + params.camera_w.x,
            params.camera_u.y * screen_pos.x + params.camera_v.y * screen_pos.y + params.camera_w.y,
            params.camera_u.z * screen_pos.x + params.camera_v.z * screen_pos.y + params.camera_w.z
        );
        // The Z- forward convention is handled in the camera basis calculation
        // in setupLaunchParams; it is controlled by the camera "up" and "lookAt" vectors.
        float length = sqrtf(dir.x*dir.x + dir.y*dir.y + dir.z*dir.z);
        const float3 ray_direction = make_float3(dir.x/length, dir.y/length, dir.z/length);

        unsigned int p0 = 0;
        unsigned int p1 = 0;
        unsigned int p2 = 0;
        unsigned int sbtOffset = 0;
        optixTrace(
            params.traversable,
            ray_origin,
            ray_direction,
            0.0f, 1e16f, 0.0f,
            OptixVisibilityMask(1),
            OPTIX_RAY_FLAG_DISABLE_ANYHIT,
            sbtOffset, 1, 0,
            p0, p1, p2
        );
        color.x = __uint_as_float(p0);
        color.y = __uint_as_float(p1);
        color.z = __uint_as_float(p2);
    }

    float3* output = (float3*)params.image;
    if (params.output_buffer != 0) {
        unsigned int pixel_idx = y * width + x;
        if (x < width && y < height) {
            output[pixel_idx] = color;
        }
    }
}

// Miss shader
extern "C" __global__ void __miss__environment() {
    float3 color = make_float3(0.1f, 0.1f, 0.2f);
    optixSetPayload_0(__float_as_uint(color.x));
    optixSetPayload_1(__float_as_uint(color.y));
    optixSetPayload_2(__float_as_uint(color.z));
}

// Get normal at hit point
__device__ float3 getNormal(const HitGroupData* data, const float2& barycentrics) {
    if (!data->has_normals || !data->normals) {
        // Return a default normal if normals are not available
        return make_float3(0.0f, 1.0f, 0.0f);
    }
    const int primitiveIndex = optixGetPrimitiveIndex();
    int3 indices;
    if (data->indices) {
        indices = *((int3*)(data->indices) + primitiveIndex);
    } else {
        indices.x = 3 * primitiveIndex;
        indices.y = 3 * primitiveIndex + 1;
        indices.z = 3 * primitiveIndex + 2;
    }
    float3* normals = (float3*)(data->normals);
    float3 n0 = normals[indices.x];
    float3 n1 = normals[indices.y];
    float3 n2 = normals[indices.z];
    float w0 = 1.0f - barycentrics.x - barycentrics.y;
    float w1 = barycentrics.x;
    float w2 = barycentrics.y;
    float3 result = make_float3(
        w0 * n0.x + w1 * n1.x + w2 * n2.x,
        w0 * n0.y + w1 * n1.y + w2 * n2.y,
        w0 * n0.z + w1 * n1.z + w2 * n2.z
    );
    return normalize(result);
}

// Simple lighting calculation
__device__ float3 calculateLighting(
    const float3& position,
    const float3& normal,
    const float3& view_dir,
    const float3& albedo,
    float metallic,
    float roughness
) {
    // Fixed light position for simplicity
    float3 light_dir = normalize(make_float3(1.0f, 1.0f, 1.0f));
    float3 light_color = make_float3(1.0f, 1.0f, 1.0f);
    float light_intensity = 1.5f;

    // Ambient lighting
    float ambient_strength = 0.2f;
    float3 ambient = ambient_strength * albedo;

    // Diffuse lighting (Lambert model)
    float diff = max(dot(normal, light_dir), 0.0f);
    float3 diffuse = diff * light_color * albedo;

    // Specular lighting (Blinn-Phong approximation)
    float3 halfway_dir = normalize(light_dir + view_dir);
    float spec = pow(max(dot(normal, halfway_dir), 0.0f), 32.0f);
    float specular_strength = 0.5f * (1.0f - roughness);
    float3 specular = specular_strength * spec * light_color;

    // Combine components
    float3 result = ambient + (diffuse + specular) * light_intensity;

    // Ensure values are in valid range
    result.x = min(result.x, 1.0f);
    result.y = min(result.y, 1.0f);
    result.z = min(result.z, 1.0f);
    return result;
}

// Closest hit shader
extern "C" __global__ void __closesthit__radiance() {
    const HitGroupData* data = (const HitGroupData*)optixGetSbtDataPointer();
    const float2 barycentrics = optixGetTriangleBarycentrics();
    float2 texCoords = getTexCoords(data, barycentrics);
    float3 normal = getNormal(data, barycentrics);

    // Material properties
    float3 albedo = data->material.albedo;
    float metallic = data->material.metallic;
    float roughness = data->material.roughness;

    // Sample textures if available
    if (data->albedo_texture) {
        float4 texColor = sampleTexture(data->albedo_texture, texCoords);
        albedo = make_float3(texColor.x, texColor.y, texColor.z);
    }
    if (data->metallic_roughness_texture) {
        float4 mrTexColor = sampleTexture(data->metallic_roughness_texture, texCoords);
        // In most PBR textures:
        // - Metallic is in the blue channel
        // - Roughness is in the green channel
        metallic = mrTexColor.z;
        roughness = mrTexColor.y;
    }
    if (data->normal_texture && data->normals) {
        float4 normalTexColor = sampleTexture(data->normal_texture, texCoords);
        float3 tangentSpaceNormal = make_float3(
            normalTexColor.x * 2.0f - 1.0f,
            normalTexColor.y * 2.0f - 1.0f,
            normalTexColor.z * 2.0f - 1.0f
        );
        // For visualization, we just use the RGB directly for now.
        // In a real shader, we would transform this using a TBN matrix.
        normal = normalize(tangentSpaceNormal);
    }

    // Get hit position and view direction for lighting calculation
    float3 ray_dir = optixGetWorldRayDirection();
    float ray_t = optixGetRayTmax();
    float3 position = optixGetWorldRayOrigin() + ray_t * ray_dir;
    float3 view_dir = -ray_dir; // Direction from hit point to camera

    // Choose visualization mode - uncomment the one you want to use
    // Option 1: Apply lighting to the material (recommended)
    float3 color = calculateLighting(position, normal, view_dir, albedo, metallic, roughness);
    // Option 2: Display only albedo (base color)
    // float3 color = albedo;
    // Option 3: Display normals mapped from [-1,1] to [0,1]
    // float3 color = make_float3(normal.x * 0.5f + 0.5f, normal.y * 0.5f + 0.5f, normal.z * 0.5f + 0.5f);
    // Option 4: Display PBR properties in RGB channels
    // float3 color = make_float3(roughness, roughness, metallic);
    // Option 5: Mix albedo with metallic and roughness for debugging
    // color = albedo;
    // color.z = color.z * 0.5f + metallic * 0.5f;  // Add metallic to blue channel
    // color.y = color.y * 0.5f + roughness * 0.5f; // Add roughness to green channel

    optixSetPayload_0(__float_as_uint(color.x));
    optixSetPayload_1(__float_as_uint(color.y));
    optixSetPayload_2(__float_as_uint(color.z));
}
```
Milestone 5 (Target Date: 5.05)
- **Local Windows build** (spent 6 h): MSVC segfaults on Windows when trying to compile with OpenVR, so the project is currently runnable on Linux only.
- **C++ camera-code optimisation & code cleanup** (spent 5 h):

```cpp
optix_renderer::CameraParameters setupVRCamera(const CameraSetupParams& params) {
    optix_renderer::CameraParameters cam;

    // Convert matrices to Eigen
    Eigen::Map<const Eigen::Matrix<float, 3, 4, Eigen::RowMajor>> headPoseNp(params.headPoseMatrix.data());
    Eigen::Map<const Eigen::Matrix<float, 3, 4, Eigen::RowMajor>> eyeToHeadNp(params.eyeToHeadMatrix.data());
    Eigen::Map<const Eigen::Matrix<float, 4, 4, Eigen::RowMajor>> projNp(params.projMatrix.data());

    // Calculate position
    Eigen::Vector3f position;
    if (params.lockPosition) {
        position = Eigen::Map<const Eigen::Vector3f>(params.basePos.data());
    } else {
        Eigen::Vector3f eyeOffset = eyeToHeadNp.block<3,1>(0,3);
        Eigen::Matrix4f hp4 = Eigen::Matrix4f::Identity();
        hp4.block<3,4>(0,0) = headPoseNp;
        Eigen::Vector4f homogeneous;
        homogeneous << eyeOffset, 1.0f;
        position = (hp4 * homogeneous).head<3>();
        position.z() *= -1.0f;
    }

    // Calculate orientation using the rotation matrix
    Eigen::Matrix3f rotMatrix = headPoseNp.block<3,3>(0,0);
    Eigen::Vector3f euler = rotMatrix.eulerAngles(0,1,2); // xyz order

    // Invert pitch and yaw
    euler[0] = -euler[0]; // pitch
    euler[1] = -euler[1]; // yaw

    // Create rotation matrix from adjusted angles
    Eigen::Matrix3f adjustedRot;
    adjustedRot = Eigen::AngleAxisf(euler[2], Eigen::Vector3f::UnitZ())
                * Eigen::AngleAxisf(euler[1], Eigen::Vector3f::UnitY())
                * Eigen::AngleAxisf(euler[0], Eigen::Vector3f::UnitX());

    // Calculate camera basis vectors
    Eigen::Vector3f right   = adjustedRot * Eigen::Vector3f(1,0,0);
    Eigen::Vector3f up      = adjustedRot * Eigen::Vector3f(0,1,0);
    Eigen::Vector3f forward = adjustedRot * Eigen::Vector3f(0,0,-1);

    // Calculate projection parameters
    float tanHalfW = 1.0f / std::abs(projNp(0,0));
    float tanHalfH = 1.0f / std::abs(projNp(1,1));
    float offX = projNp(0,2);
    float offY = projNp(1,2);

    // Calculate camera vectors
    Eigen::Vector3f cameraU = right;
    Eigen::Vector3f cameraV = up;
    Eigen::Vector3f cameraW = position
                            - tanHalfW * (offX + 1.0f) * right
                            - tanHalfH * (offY + 1.0f) * up
                            - forward;

    // Copy results to camera parameters
    cam.position = {position[0], position[1], position[2]};
    cam.u = {cameraU[0], cameraU[1], cameraU[2]};
    cam.v = {cameraV[0], cameraV[1], cameraV[2]};
    cam.w = {cameraW[0], cameraW[1], cameraW[2]};
    return cam;
}
```
Python bindings:

```cpp
py::class_<ors::CameraSetupParams>(m, "CameraSetupParams")
    .def(py::init<>())
    .def_readwrite("eye_matrix", &ors::CameraSetupParams::eyeMatrix)
    .def_readwrite("eye_to_head_matrix", &ors::CameraSetupParams::eyeToHeadMatrix)
    .def_readwrite("head_pose_matrix", &ors::CameraSetupParams::headPoseMatrix)
    .def_readwrite("proj_matrix", &ors::CameraSetupParams::projMatrix)
    .def_readwrite("base_pos", &ors::CameraSetupParams::basePos)
    .def_readwrite("lock_position", &ors::CameraSetupParams::lockPosition);
```
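On the C++ side, the struct bound above would be filled from OpenVR's per-eye matrices. A sketch, assuming the matrix members are flat `std::array<float, N>` fields (the actual member types are not shown in the bindings):

```cpp
#include <openvr.h>
#include <cstring>

// Sketch: flatten OpenVR's row-major matrices into the plain arrays that
// ors::CameraSetupParams (see the bindings above) is assumed to store.
ors::CameraSetupParams makeParams(vr::IVRSystem* hmd, vr::EVREye eye,
                                  const vr::HmdMatrix34_t& headPose) {
    ors::CameraSetupParams p{};
    vr::HmdMatrix34_t e2h  = hmd->GetEyeToHeadTransform(eye);
    vr::HmdMatrix44_t proj = hmd->GetProjectionMatrix(eye, 0.1f, 100.0f);
    std::memcpy(p.eyeToHeadMatrix.data(), &e2h.m[0][0],      12 * sizeof(float)); // 3x4
    std::memcpy(p.headPoseMatrix.data(),  &headPose.m[0][0], 12 * sizeof(float)); // 3x4
    std::memcpy(p.projMatrix.data(),      &proj.m[0][0],     16 * sizeof(float)); // 4x4
    p.lockPosition = false;
    return p;
}
```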
Milestone 6 (Target Date: 19.05)
- **Documentation cleanup** (5 h)
- **Dynamic geometry rebuild** (2 h)
Python bindings:

```cpp
.def("set_scene_changed", &Renderer::setSceneChanged)
.def("is_scene_changed", &Renderer::isSceneChanged)
.def("reset_scene_changed", &Renderer::resetSceneChanged);
```
Build and rebuild of the acceleration structure:

```cpp
bool buildInstanceAccelerationStructure() {
    if (!m_deviceContext->setDevice()) {
        std::cerr << "Failed to set device context for acceleration structure build" << std::endl;
        return false;
    }

    const size_t numInstances = m_instances.size();
    if (numInstances == 0) {
        return true; // Nothing to do
    }

    // Create the array of OptixInstance for the IAS. Only visible instances
    // are added, instead of pre-allocating and potentially skipping entries.
    std::vector<OptixInstance> optixInstances;

    // Fill in the instances with transforms and references to their GAS
    for (size_t i = 0; i < numInstances; i++) {
        auto& instance = m_instances[i];
        if (!instance->isVisible()) {
            continue; // Skip invisible instances
        }

        optixInstances.push_back(OptixInstance{});
        auto& optixInstance = optixInstances.back();

        // Get the resource IDs
        GeometryHandle geometryId = instance->getGeometryHandle();
        MaterialHandle materialId = instance->getMaterialHandle();

        // Get the device geometry data
        auto geomData = m_deviceContext->getGeometryData(geometryId);
        if (!geomData) {
            std::cerr << "Error: Geometry data not found for instance " << i << std::endl;
            continue;
        }
        // Skip any geometry with a null traversable handle
        if (geomData->traversable == 0) {
            std::cerr << "Error: Geometry " << i << " has a null traversable handle" << std::endl;
            continue;
        }

        // Get the transform matrix from the instance and ensure the
        // tensor is contiguous in memory
        torch::Tensor transform = instance->getTransform();
        if (!transform.is_contiguous()) {
            transform = transform.contiguous();
        }
        float* transformPtr = (float*)transform.data_ptr();

        // No transpose - use PyTorch's row-major layout directly by
        // copying the first 3 rows of the 4x4 matrix
        optixInstance.transform[0]  = transformPtr[0];  // Row 0, Col 0
        optixInstance.transform[1]  = transformPtr[1];  // Row 0, Col 1
        optixInstance.transform[2]  = transformPtr[2];  // Row 0, Col 2
        optixInstance.transform[3]  = transformPtr[3];  // Row 0, Col 3 (translation X)
        optixInstance.transform[4]  = transformPtr[4];  // Row 1, Col 0
        optixInstance.transform[5]  = transformPtr[5];  // Row 1, Col 1
        optixInstance.transform[6]  = transformPtr[6];  // Row 1, Col 2
        optixInstance.transform[7]  = transformPtr[7];  // Row 1, Col 3 (translation Y)
        optixInstance.transform[8]  = transformPtr[8];  // Row 2, Col 0
        optixInstance.transform[9]  = transformPtr[9];  // Row 2, Col 1
        optixInstance.transform[10] = transformPtr[10]; // Row 2, Col 2
        optixInstance.transform[11] = transformPtr[11]; // Row 2, Col 3 (translation Z)

        // Set instance ID for hit group selection
        optixInstance.instanceId = static_cast<unsigned int>(i);

        // Use the instance index as SBT offset to select different materials
        // (Instance 0 = Cube = Blue, Instance 1 = Floor = Green)
        optixInstance.sbtOffset = static_cast<unsigned int>(i);

        // All instances visible by default
        optixInstance.visibilityMask = 255;

        // Point the instance at the geometry's GAS
        optixInstance.traversableHandle = geomData->traversable;
        optixInstance.flags = OPTIX_INSTANCE_FLAG_NONE;
    }

    // Check if we have any instances after filtering for visibility
    if (optixInstances.empty()) {
        std::cerr << "WARNING: No visible instances to render!" << std::endl;
        m_ias = 0; // Set empty traversable
        return true;
    }

    // Upload the instance array - use the CUDA driver API consistently
    size_t instanceBufferSize = sizeof(OptixInstance) * optixInstances.size();
    CUdeviceptr d_instances;
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_instances, instanceBufferSize, m_deviceContext->getStream()));
    CUDA_DRIVER_CHECK(cuMemcpyHtoDAsync(d_instances, optixInstances.data(), instanceBufferSize,
                                        m_deviceContext->getStream()));

    // Set up the build input
    OptixBuildInput buildInput = {};
    buildInput.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
    buildInput.instanceArray.instances = d_instances;
    buildInput.instanceArray.numInstances = static_cast<unsigned int>(optixInstances.size());

    // Set up the acceleration structure build options
    OptixAccelBuildOptions accelOptions = {};
    accelOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE |
                              OPTIX_BUILD_FLAG_ALLOW_COMPACTION |
                              OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
    accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;

    // Compute buffer sizes
    OptixAccelBufferSizes bufferSizes;
    OPTIX_CHECK(optixAccelComputeMemoryUsage(
        m_deviceContext->getContext(),
        &accelOptions,
        &buildInput,
        1, // one build input
        &bufferSizes
    ));

    // Allocate buffers - use the CUDA driver API consistently
    CUdeviceptr d_tempBuffer;
    CUdeviceptr d_iasOutputBuffer;
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_tempBuffer, bufferSizes.tempSizeInBytes, m_deviceContext->getStream()));
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_iasOutputBuffer, bufferSizes.outputSizeInBytes, m_deviceContext->getStream()));

    // Build the acceleration structure
    OptixTraversableHandle tempHandle = 0;
    OPTIX_CHECK(optixAccelBuild(
        m_deviceContext->getContext(),
        0, // CUDA stream
        &accelOptions,
        &buildInput,
        1, // one build input
        d_tempBuffer,
        bufferSizes.tempSizeInBytes,
        d_iasOutputBuffer,
        bufferSizes.outputSizeInBytes,
        &tempHandle, // output to a temp variable first
        nullptr,     // no emitted properties
        0
    ));
    m_ias = tempHandle;

    // Clean up the temporary buffer
    CUDA_DRIVER_CHECK(cuMemFreeAsync(d_tempBuffer, m_deviceContext->getStream()));

    // Keep the new buffers and free the old ones
    if (m_d_iasOutputBuffer) {
        CUDA_DRIVER_CHECK(cuMemFreeAsync(m_d_iasOutputBuffer, m_deviceContext->getStream()));
    }
    if (m_d_instances) {
        CUDA_DRIVER_CHECK(cuMemFreeAsync(m_d_instances, m_deviceContext->getStream()));
    }
    m_d_iasOutputBuffer = d_iasOutputBuffer;
    m_d_instances = d_instances;
    m_numInstances = optixInstances.size();
    return true;
}
```
At the conclusion of the project, the final build, repository link, and a short final result video will be provided.