optixVR engine
Artem Kabakov
This project is a rewrite of the OptiX 8.1 engine, updated to support multi-GPU configurations and VR integration. The code has been migrated to OptiX 9 with CUDA 12.8 on Linux, addressing previous limitations and enabling advanced rendering techniques. Detailed benchmarks, hardware configuration, and demo videos are provided below. For the full project plan, please refer to the linked documentation (if available).
Final build, repository link, and a brief (10-20 sec) final result video will be added once the project is completed.
Milestone 1 (Target Date: 10.03)
- Rendering Fixes (estimated: 2-3 hours): Resolve artifacts in OptiX 9 videos caused by incorrect handling of object pointers between the Python and C++ bindings.
- Memory & Shader Updates (estimated: 2 hours): Improve texture memory management and apply necessary shader fixes.
- GPU Selection Feature (estimated: 1-2 hours): Implement a feature to select the target GPU in code instead of relying on CUDA_VISIBLE_DEVICES; see the sketch after this list.
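A minimal sketch of how in-code GPU selection could look with the CUDA runtime API; the `selectGpu` helper and its error handling are illustrative, not the project's actual code:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Pick a target GPU at runtime instead of masking devices with
// CUDA_VISIBLE_DEVICES before launch.
bool selectGpu(int requestedDevice) {
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount == 0) {
        std::fprintf(stderr, "No CUDA devices available\n");
        return false;
    }
    if (requestedDevice < 0 || requestedDevice >= deviceCount) {
        std::fprintf(stderr, "Device %d out of range (0..%d)\n",
                     requestedDevice, deviceCount - 1);
        return false;
    }
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, requestedDevice);
    std::printf("Using GPU %d: %s\n", requestedDevice, prop.name);
    // All subsequent allocations and kernel launches on this thread
    // target the selected device.
    return cudaSetDevice(requestedDevice) == cudaSuccess;
}
```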
**Hardware & Benchmark Details:**
- System Configuration: 2 × NVIDIA GeForce RTX 5090, Intel i9-12900K, 96GB DDR4 3200.
Transfer speed benchmark report:
1. GPU 0 (PCIe Root Complex: 0000:01)
- Host → Device: 33638 MB/s
- Device → Host: 41879.4 MB/s
- NVML PCIe Throughput: TX 21012 KB/s, RX 410 KB/s
2. GPU 1 (PCIe Root Complex: 0000:05)
- Host → Device: 848.386 MB/s
- Device → Host: 851.835 MB/s
- NVML PCIe Throughput: TX 394 KB/s, RX 399 KB/s
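For reference, the host↔device numbers above can be reproduced with a simple CUDA-event-timed copy; a minimal sketch (buffer size is illustrative; pinned memory is needed to reach peak PCIe rates, and the device→host direction is measured the same way):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Times a host->device copy with CUDA events and prints MB/s.
int main() {
    const size_t bytes = 256ull << 20;   // 256 MiB test buffer (illustrative)
    void* hostBuf = nullptr;
    void* devBuf = nullptr;
    cudaMallocHost(&hostBuf, bytes);     // pinned memory for peak PCIe rates
    cudaMalloc(&devBuf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(devBuf, hostBuf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    std::printf("Host -> Device: %.1f MB/s\n",
                (bytes / (1024.0 * 1024.0)) / (ms / 1000.0));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}
```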
**Demo Videos:**
- Original OptiX 8.1 demo recorded on an RTX 3070: optix81game.mp4
- OptiX 9 demo on GPU 0 (PCIe 5.0 x16, 100-130 fps at 2064×2208): OPTIX9_GPU0-pcie5x16.mp4
- OptiX 9 demo on GPU 1 (PCIe 3.0 x1, 58-62 fps at 2064×2208): OPTIX9_GPU1-pcie3x1.mp4
- OpenXR with CUDA interop demo (dual Mandelbrot render to OpenGL buffers): OpenXR_CUDA_2DEV.mp4
**Development Notes:**
- The original OptiX 8.1 application was migrated to OptiX 9 to support current hardware.
- For OptiX 9, frames are rendered at 2064×2208; the full-resolution images are then copied to the CPU and cropped to 1920×1080 (a strided-copy sketch follows this list).
- When using OpenVR/XR, the system writes directly to OpenGL/Vulkan buffers via CUDA interop (see the mapping sketch below).
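The crop step mentioned above could be folded into the transfer itself with a single strided `cudaMemcpy2D` instead of copying the full frame and cropping on the CPU; a sketch, assuming RGBA8 frames (the crop origin below is hypothetical):

```cpp
#include <cuda_runtime.h>

// Copies a 1920x1080 window out of a 2064x2208 RGBA8 device frame
// directly into a tightly packed host buffer.
void cropToHost(const unsigned char* devFrame, unsigned char* hostCrop) {
    const size_t bpp = 4;                  // RGBA8
    const size_t srcPitch = 2064 * bpp;    // full-resolution row size
    const size_t dstPitch = 1920 * bpp;    // cropped row size
    const int offsetX = 72, offsetY = 564; // hypothetical centered crop origin

    cudaMemcpy2D(hostCrop, dstPitch,
                 devFrame + offsetY * srcPitch + offsetX * bpp, srcPitch,
                 dstPitch,                 // width of each copied row in bytes
                 1080,                     // number of rows
                 cudaMemcpyDeviceToHost);
}
```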
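The CUDA-OpenGL interop path follows the standard register/map/write/unmap pattern; a minimal self-contained sketch of that pattern (`renderKernel` and the buffer handling are placeholders, not the engine's actual code):

```cpp
#include <cuda_gl_interop.h>

// Placeholder kernel: fills the mapped buffer with a solid test color.
__global__ void renderKernel(uchar4* out, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    out[y * width + x] = make_uchar4(255, 0, 0, 255);
}

cudaGraphicsResource* cudaRes = nullptr;

// One-time registration of an existing OpenGL pixel buffer object.
void registerGlBuffer(GLuint pbo) {
    cudaGraphicsGLRegisterBuffer(&cudaRes, pbo, cudaGraphicsRegisterFlagsWriteDiscard);
}

// Per-frame: map the buffer, let CUDA write into it, unmap for OpenGL use.
void renderIntoGlBuffer(int width, int height) {
    cudaGraphicsMapResources(1, &cudaRes, 0);
    void* devPtr = nullptr;
    size_t size = 0;
    cudaGraphicsResourceGetMappedPointer(&devPtr, &size, cudaRes);

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    renderKernel<<<grid, block>>>((uchar4*)devPtr, width, height);

    cudaGraphicsUnmapResources(1, &cudaRes, 0);
}
```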
Milestone 2 (Target Date: 24.03)
- Initial fixes from Milestone 1:
  - Memory fixes for the renderer (2h)
  - GPU selection (2h; works for only a single GPU at a time)
  - Memory/shader updates (4h; done in the new resource system)
- New resource system (~30h total):
  - Python tensor problems (3h)
  - New architecture research (2h)
  - Refactoring and reimplementing the OptiX usage for the new architecture (9h)
  - Multi-GPU resource synchronisation (8h)
  - New Python bindings and GUI code (4h)
- More details in the presentation: https://docs.google.com/presentation/d/144S91CHOROZGn3a6xB_6yOgCVZv76l-j2oFlbHrci-Q/edit?usp=sharing
Milestone 3 (Target Date: 4.07)
- Headset rendering (not done)
  - ALVR Linux setup for Oculus (3h)
    - ALVR tries to connect to the Oculus using a network connection forwarded to the device by ADB.
    - It establishes a connection but then drops it; at the same time it rewrites the network configuration to pretend that a wireless device is connected over IPv4. Once the wireless connection is established, the existing wired connection has to be torn down, otherwise it conflicts with the new wireless one.
    - ALVR does not work out of the box by the newest manual, but with some hacks and by partially utilizing older methods it is feasible to establish a connection between SteamVR and the Oculus.

Main function of the CUDA interop with OpenVR:
```cpp
#include <chrono>
#include <iostream>
#include <thread>
#include <cuda_runtime.h>
#include "OpenVRSystem.h" // project header providing OpenVRSystem (assumed name)

int main(int argc, char* argv[]) {
    // Initialize CUDA
    const cudaError_t cudaStatus = cudaFree(nullptr);
    if (cudaStatus != cudaSuccess) {
        std::cerr << "CUDA initialization failed: " << cudaGetErrorString(cudaStatus) << std::endl;
        return -1;
    }
    // Create OpenVR system
    OpenVRSystem vrSystem;
    if (!vrSystem.Initialize()) {
        std::cerr << "Failed to initialize OpenVR system" << std::endl;
        return -1;
    }
    std::cout << "Running main loop... Press ESC to exit." << std::endl;
    // Main loop
    while (!vrSystem.ShouldClose()) {
        // Render a frame
        vrSystem.RenderFrame();
        // Poll for window events
        vrSystem.PollEvents();
        std::cout << "poll" << std::endl;
        // Sleep briefly to prevent 100% CPU usage
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        std::cout << "sleep" << std::endl;
    }
    std::cout << "Shutting down..." << std::endl;
    return 0;
}
```
Launching different CUDA kernels depending on the eye:
```cpp
dim3 blockSize(16, 16);
dim3 gridSize((m_renderWidth + blockSize.x - 1) / blockSize.x,
              (m_renderHeight + blockSize.y - 1) / blockSize.y);
if (eye == vr::Eye_Left) {
    renderLeftEyeGrid<<<gridSize, blockSize>>>(deviceBuffer, m_renderWidth, m_renderHeight, pitch, m_time);
} else {
    renderRightEyeGrid<<<gridSize, blockSize>>>(deviceBuffer, m_renderWidth, m_renderHeight, pitch, m_time);
}
```
The actual kernel that draws the left eye:
```cpp
__global__ void renderLeftEyeGrid(unsigned char* surface, int width, int height, int pitch, float time) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    // Calculate output position in the surface
    unsigned char* pixel = surface + y * pitch + x * 4;
    // Create a grid pattern that moves slowly over time
    int gridSize = 32; // Size of each grid cell
    int offsetX = (int)(time * 10.0f) % gridSize;
    int offsetY = (int)(time * 5.0f) % gridSize;
    bool isGridLine = ((x + offsetX) % gridSize < 2) || ((y + offsetY) % gridSize < 2);
    if (isGridLine) {
        // Grid lines: red (BGRA byte order)
        pixel[0] = 0;   // B
        pixel[1] = 0;   // G
        pixel[2] = 255; // R
        pixel[3] = 255; // A
    } else {
        // Grid cells: black
        pixel[0] = 0;   // B
        pixel[1] = 0;   // G
        pixel[2] = 0;   // R
        pixel[3] = 255; // A
    }
}
```
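The right-eye kernel is referenced in the launch code above but not reproduced here; presumably it differs only in the line color so the two eyes can be told apart in the headset. A hedged sketch of what that counterpart might look like (the green color is an assumption, not taken from the original code):

```cpp
// Hypothetical right-eye counterpart: same moving grid, green lines so the
// two eyes are visually distinguishable in the headset.
__global__ void renderRightEyeGrid(unsigned char* surface, int width, int height, int pitch, float time) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    unsigned char* pixel = surface + y * pitch + x * 4;
    int gridSize = 32;
    int offsetX = (int)(time * 10.0f) % gridSize;
    int offsetY = (int)(time * 5.0f) % gridSize;
    bool isGridLine = ((x + offsetX) % gridSize < 2) || ((y + offsetY) % gridSize < 2);

    pixel[0] = 0;                    // B
    pixel[1] = isGridLine ? 255 : 0; // G: green grid lines on black
    pixel[2] = 0;                    // R
    pixel[3] = 255;                  // A
}
```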
- Python bindings problems (2h): linker error (not finished)
```
> python test_openvr_system.py
Error importing OpenVRSystem: libc10.so: cannot open shared object file: No such file or directory
Make sure the extension is properly compiled with OpenVR support.
```
- Renderer component integration: https://drive.google.com/file/d/1aJcXFPpY6meKdMigORkyu8KTGh-POX3W/view?resourcekey&pli=1 (running this view, with headset control, on a headset)
- MTL materials (bonus task, not done)
  - Examples research (3h): https://github.com/NVIDIA/OptiX_Apps.git
  - Simple implementation attempts, adding support for callable programs (4h)

No headset recordings are available for now due to an NVIDIA kernel driver crash after clicking Record in ALVR. I was working on the bonus task until I got the headset on 04.07.2025.
Milestone 4 (Target Date: 21.07)
- Full documentation cleanup (est: 4h?)
- OpenVR integration with the renderer (est: 6h)

First working attempt after 3h of work. The linking problem turned out to be runtime-only and required setting LD_LIBRARY_PATH to the PyTorch libraries that were used for the build: https://node.ukrain.ee/first_working_wr.mp4. The video artifacts are caused by the SteamVR "VR View" window and do not appear in the headset.
- Recalculate the pinhole camera matrix from the HMD rotation and translation plus the per-eye translation:
```python
def setup_vr_camera(
    eye_matrix,                 # <-- now accepted but not used
    eye_to_head_matrix,         # 3x4 per-eye
    head_pose_matrix,           # 3x4 HMD pose in world space
    proj_matrix,                # 4x4 per-eye projection
    base_pos=(0.0, 0.0, 0.0),   # locked-camera fallback
    lock_position=False,
):
    """
    Builds an ors.CameraParameters for OptiX ray generation (UVW method).

    Parameters
    ----------
    eye_matrix : 3x4
        Reserved for future use (time-warp, etc.); ignored by this function.
    eye_to_head_matrix : 3x4
        Eye-to-head transform from OpenVR for this eye.
    head_pose_matrix : 3x4
        Headset pose in world space.
    proj_matrix : 4x4
        Asymmetric projection matrix for this eye.
    base_pos : 3-tuple
        World-space position used when lock_position is True.
    lock_position : bool
        If True, ignore positional tracking and keep the camera at base_pos.
    """
    import numpy as np
    from scipy.spatial.transform import Rotation as R

    cam = ors.CameraParameters()
    head_pose_np = np.asarray(head_pose_matrix, dtype=float).reshape(3, 4)
    eye_to_head_np = np.asarray(eye_to_head_matrix, dtype=float).reshape(3, 4)
    proj_np = np.asarray(proj_matrix, dtype=float).reshape(4, 4)

    # ---------- position ----------------------------------------------------
    if lock_position:
        position = np.array(base_pos, dtype=np.float64)
    else:
        eye_offset = eye_to_head_np[:, 3]
        hp4 = np.eye(4, dtype=np.float64)
        hp4[:3, :4] = head_pose_np
        position = (hp4 @ np.append(eye_offset, 1.0))[:3]
        position[2] *= -1

    # ---------- orientation (quaternion) -----------------------------------
    rot_q = R.from_matrix(head_pose_np[:, :3])
    yaw, pitch, roll = rot_q.as_euler("xyz", degrees=False)
    # Uncomment any axis you want to invert
    pitch = -pitch  # nodding
    yaw = -yaw      # turning
    # roll = -roll  # tilting
    rot_q = R.from_euler("xyz", [yaw, pitch, roll])
    right = rot_q.apply([1, 0, 0])
    up = rot_q.apply([0, 1, 0])
    forward = rot_q.apply([0, 0, -1])

    # ---------- projection parameters --------------------------------------
    tan_half_w = 1.0 / abs(proj_np[0, 0])
    tan_half_h = 1.0 / abs(proj_np[1, 1])
    off_x = proj_np[0, 2]
    off_y = proj_np[1, 2]
    camera_u = right  # * (2.0 * tan_half_w)
    camera_v = up     # * (2.0 * tan_half_h)
    camera_w = (position
                - tan_half_w * (off_x + 1.0) * right
                - tan_half_h * (off_y + 1.0) * up
                - forward)

    # ---------- write to struct --------------------------------------------
    cam.position = position.tolist()
    cam.camera_u = camera_u.tolist()
    cam.camera_v = camera_v.tolist()
    cam.camera_w = camera_w.tolist()
    return cam
```
- CUDA memory copy and synchronisation between the two devices for OpenVR:
```cpp
// Get tensor data pointer
void* tensor_ptr = nullptr;
size_t buffer_size = m_renderWidth * m_renderHeight * 3 * sizeof(float);
// Move tensor to the current GPU if needed
torch::Tensor gpu_result;
if (result.is_cuda() && result.device().index() != m_deviceId) {
    //std::cout << "Moving tensor from device " << result.device().index()
    //          << " to device " << m_deviceId << std::endl;
    void* source_ptr = result.data_ptr();
    if (!renderer->getDeviceContext()->setDevice()) {
        exit(-1);
    }
    // Create and start timing for GPU to CPU copy
    cudaEvent_t startToCpu, stopToCpu;
    CHECK_CUDA_ERROR(cudaEventCreate(&startToCpu));
    CHECK_CUDA_ERROR(cudaEventCreate(&stopToCpu));
    CHECK_CUDA_ERROR(cudaEventRecord(startToCpu, 0));
    // Copy from GPU to CPU
    float* host_buffer = new float[m_renderWidth * m_renderHeight * 3];
    CHECK_CUDA_ERROR(cudaMemcpy(host_buffer, source_ptr, buffer_size, cudaMemcpyDeviceToHost));
    CHECK_CUDA_ERROR(cudaDeviceSynchronize());
    // Record and calculate the time for GPU to CPU copy
    CHECK_CUDA_ERROR(cudaEventRecord(stopToCpu, 0));
    CHECK_CUDA_ERROR(cudaEventSynchronize(stopToCpu));
    float toCpuTimeMs = 0.0f;
    CHECK_CUDA_ERROR(cudaEventElapsedTime(&toCpuTimeMs, startToCpu, stopToCpu));
    // Save the GPU to CPU copy time
    if (eye == vr::Eye_Left) {
        leftEyeToCpuCopyTime = toCpuTimeMs;
    } else {
        rightEyeToCpuCopyTime = toCpuTimeMs;
    }
    // Add to total for this frame
    totalToCpuCopyTime += toCpuTimeMs;
    // Clean up events
    CHECK_CUDA_ERROR(cudaEventDestroy(startToCpu));
    CHECK_CUDA_ERROR(cudaEventDestroy(stopToCpu));
    // Ensure we're in the right CUDA context for resource mapping
    if (useDeviceContext) {
        if (!m_deviceContext->setDevice()) {
            //std::cerr << "Failed to set device context for resource mapping" << std::endl;
            useDeviceContext = false;
            cudaSetDevice(m_deviceId);
        }
    } else {
        // Fall back to setting the device directly
        CHECK_CUDA_ERROR(cudaSetDevice(m_deviceId));
    }
    // Create and start timing for CPU to GPU copy
    cudaEvent_t startFromCpu, stopFromCpu;
    CHECK_CUDA_ERROR(cudaEventCreate(&startFromCpu));
    CHECK_CUDA_ERROR(cudaEventCreate(&stopFromCpu));
    CHECK_CUDA_ERROR(cudaEventRecord(startFromCpu, 0));
    // Copy from CPU to GPU
    CHECK_CUDA_ERROR(cudaMalloc(&tensor_ptr, buffer_size));
    CHECK_CUDA_ERROR(cudaMemcpy(tensor_ptr, host_buffer, buffer_size, cudaMemcpyHostToDevice));
    CHECK_CUDA_ERROR(cudaDeviceSynchronize());
    // Record and calculate the time for CPU to GPU copy
    CHECK_CUDA_ERROR(cudaEventRecord(stopFromCpu, 0));
    CHECK_CUDA_ERROR(cudaEventSynchronize(stopFromCpu));
    float fromCpuTimeMs = 0.0f;
    CHECK_CUDA_ERROR(cudaEventElapsedTime(&fromCpuTimeMs, startFromCpu, stopFromCpu));
    // Save the CPU to GPU copy time
    if (eye == vr::Eye_Left) {
        leftEyeFromCpuCopyTime = fromCpuTimeMs;
    } else {
        rightEyeFromCpuCopyTime = fromCpuTimeMs;
    }
    // Add to total for this frame
    totalFromCpuCopyTime += fromCpuTimeMs;
    // Clean up events
    CHECK_CUDA_ERROR(cudaEventDestroy(startFromCpu));
    CHECK_CUDA_ERROR(cudaEventDestroy(stopFromCpu));
    // Clean up host buffer
    delete[] host_buffer;
    //gpu_result = result.to(torch::Device(torch::kCUDA, m_deviceId));
} else if (!result.is_cuda()) {
    gpu_result = result.to(torch::Device(torch::kCUDA, m_deviceId));
    tensor_ptr = gpu_result.data_ptr();
} else {
    tensor_ptr = result.data_ptr(); // Already on the right device
}
```
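The snippet above stages every cross-device frame through host memory, which matches the timing instrumentation but is the slow path. A possible optimisation (not in the current code) is a direct peer copy when the GPUs allow it; a minimal sketch:

```cpp
#include <cuda_runtime.h>

// Attempts a direct device-to-device copy; the caller keeps the staged
// GPU->CPU->GPU path used above when peer access is unavailable.
bool tryPeerCopy(void* dst, int dstDev, const void* src, int srcDev, size_t bytes) {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, dstDev, srcDev);
    if (!canAccess) {
        return false;
    }
    cudaSetDevice(dstDev);
    cudaDeviceEnablePeerAccess(srcDev, 0);
    cudaGetLastError(); // clear the "already enabled" status if set
    return cudaMemcpyPeer(dst, dstDev, src, srcDev, bytes) == cudaSuccess;
}
```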
- Per-eye time usage by the system (charts):
  - Both eyes on a single GPU
  - Each eye on a separate GPU
- MTL base showcase (bonus*, est: ...h): https://github.com/artyom0906/OptiX_PyTorch_Extension
- Basic shader with a hardcoded light and complex material textures:
```cpp
#include <optix.h>
#include <cuda_runtime.h>
#include <cuda.h>
#include <vector_types.h>
#include <vector_functions.h>
#include "material_types.h"

using namespace optix_renderer;

// Vector math helper functions
__device__ float3 operator+(const float3& a, const float3& b) {
    return make_float3(a.x + b.x, a.y + b.y, a.z + b.z);
}
__device__ float3 operator-(const float3& a, const float3& b) {
    return make_float3(a.x - b.x, a.y - b.y, a.z - b.z);
}
__device__ float3 operator*(const float3& a, float b) {
    return make_float3(a.x * b, a.y * b, a.z * b);
}
__device__ float3 operator*(float a, const float3& b) {
    return make_float3(a * b.x, a * b.y, a * b.z);
}
__device__ float3 operator*(const float3& a, const float3& b) {
    return make_float3(a.x * b.x, a.y * b.y, a.z * b.z);
}
__device__ float3 operator-(const float3& a) {
    return make_float3(-a.x, -a.y, -a.z);
}
__device__ float dot(const float3& a, const float3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
__device__ float3 normalize(const float3& v) {
    float invLen = 1.0f / sqrtf(dot(v, v));
    return v * invLen;
}
__device__ inline float3 cross(const float3& a, const float3& b) {
    return make_float3(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x);
}

// Hit group data - the payload for hit group records
struct HitGroupData {
    SimpleMaterial material;
    // Geometry data for this hit group
    CUdeviceptr vertices;
    CUdeviceptr indices;
    CUdeviceptr normals;
    CUdeviceptr texcoords;
    CUdeviceptr tangents;
    CUdeviceptr bitangents;
    // Texture data using CUtexObject for driver API compatibility
    CUtexObject albedo_texture;
    CUtexObject normal_texture;
    CUtexObject metallic_roughness_texture;
    CUtexObject emission_texture;
    CUtexObject specular_texture;
    CUtexObject specular_tint_texture;
    CUtexObject sheen_texture;
    CUtexObject clearcoat_texture;
    // Additional settings
    bool has_normals;
    bool has_texcoords;
    bool has_tangents;
    bool has_bitangents;
    // Legacy compatibility fields
    float3 albedo;   // Direct color access for basic shading
    float3 emission; // Direct emission color
};

// Launch parameters
struct Params {
    float3* image;
    float3 camera_pos;
    float3 camera_u;
    float3 camera_v;
    float3 camera_w;
    OptixTraversableHandle traversable;
    unsigned int width;
    unsigned int height;
    unsigned int samples_per_pixel;
    unsigned int max_depth;
    CUdeviceptr output_buffer;
};

// Get texture coordinates at the hit point
__device__ float2 getTexCoords(const HitGroupData* data, const float2& barycentrics) {
    if (!data->has_texcoords || !data->texcoords) {
        return make_float2(0.5f, 0.5f);
    }
    const int primitiveIndex = optixGetPrimitiveIndex();
    int3 indices;
    if (data->indices) {
        indices = *((int3*)(data->indices) + primitiveIndex);
    } else {
        indices.x = 3 * primitiveIndex;
        indices.y = 3 * primitiveIndex + 1;
        indices.z = 3 * primitiveIndex + 2;
    }
    float2* texcoords = (float2*)(data->texcoords);
    float2 tc0 = texcoords[indices.x];
    float2 tc1 = texcoords[indices.y];
    float2 tc2 = texcoords[indices.z];
    float w0 = 1.0f - barycentrics.x - barycentrics.y;
    float w1 = barycentrics.x;
    float w2 = barycentrics.y;
    float2 result;
    result.x = w0 * tc0.x + w1 * tc1.x + w2 * tc2.x;
    result.y = w0 * tc0.y + w1 * tc1.y + w2 * tc2.y;
    return result;
}

// Sample a texture
__device__ float4 sampleTexture(CUtexObject tex, float2 uv) {
    return tex2D<float4>(tex, uv.x, uv.y);
}

extern "C" {
__constant__ Params params;
}

// Ray generation shader
extern "C" __global__ void __raygen__renderFrame() {
    const uint3 idx = optixGetLaunchIndex();
    const uint3 dim = optixGetLaunchDimensions();
    unsigned int x = idx.x;
    unsigned int y = idx.y;
    unsigned int width = dim.x;
    unsigned int height = dim.y;
    if (params.width > 0) width = params.width;
    if (params.height > 0) height = params.height;
    if (x >= width || y >= height) {
        return;
    }
    float3 color = make_float3(0.1f, 0.1f, 0.1f);
    if (params.traversable) {
        const float3 ray_origin = params.camera_pos;
        // Flip the y coordinate to make Y+ point up in camera space.
        // Standard NDC has y from -1 (bottom) to 1 (top).
        const float2 screen_pos = make_float2(
            (float)x / width * 2.0f - 1.0f,
            -((float)y / height * 2.0f - 1.0f) // Flip Y coordinate
        );
        float3 dir = make_float3(
            params.camera_u.x * screen_pos.x + params.camera_v.x * screen_pos.y + params.camera_w.x,
            params.camera_u.y * screen_pos.x + params.camera_v.y * screen_pos.y + params.camera_w.y,
            params.camera_u.z * screen_pos.x + params.camera_v.z * screen_pos.y + params.camera_w.z
        );
        // The Z- forward convention is handled in the camera basis calculation in setupLaunchParams.
        // This is controlled by the camera "up" and "lookAt" vectors.
        float length = sqrtf(dir.x*dir.x + dir.y*dir.y + dir.z*dir.z);
        const float3 ray_direction = make_float3(dir.x/length, dir.y/length, dir.z/length);
        unsigned int p0 = 0;
        unsigned int p1 = 0;
        unsigned int p2 = 0;
        unsigned int sbtOffset = 0;
        optixTrace(
            params.traversable,
            ray_origin,
            ray_direction,
            0.0f,                           // tmin
            1e16f,                          // tmax
            0.0f,                           // ray time
            OptixVisibilityMask(1),
            OPTIX_RAY_FLAG_DISABLE_ANYHIT,
            sbtOffset,
            1,                              // SBT stride
            0,                              // miss SBT index
            p0, p1, p2
        );
        color.x = __uint_as_float(p0);
        color.y = __uint_as_float(p1);
        color.z = __uint_as_float(p2);
    }
    float3* output = (float3*)params.image;
    if (params.output_buffer != 0) {
        unsigned int pixel_idx = y * width + x;
        if (x < width && y < height) {
            output[pixel_idx] = color;
        }
    }
}

// Miss shader
extern "C" __global__ void __miss__environment() {
    float3 color = make_float3(0.1f, 0.1f, 0.2f);
    optixSetPayload_0(__float_as_uint(color.x));
    optixSetPayload_1(__float_as_uint(color.y));
    optixSetPayload_2(__float_as_uint(color.z));
}

// Get the normal at the hit point
__device__ float3 getNormal(const HitGroupData* data, const float2& barycentrics) {
    if (!data->has_normals || !data->normals) {
        // Return a default normal if normals are not available
        return make_float3(0.0f, 1.0f, 0.0f);
    }
    const int primitiveIndex = optixGetPrimitiveIndex();
    int3 indices;
    if (data->indices) {
        indices = *((int3*)(data->indices) + primitiveIndex);
    } else {
        indices.x = 3 * primitiveIndex;
        indices.y = 3 * primitiveIndex + 1;
        indices.z = 3 * primitiveIndex + 2;
    }
    float3* normals = (float3*)(data->normals);
    float3 n0 = normals[indices.x];
    float3 n1 = normals[indices.y];
    float3 n2 = normals[indices.z];
    float w0 = 1.0f - barycentrics.x - barycentrics.y;
    float w1 = barycentrics.x;
    float w2 = barycentrics.y;
    float3 result = make_float3(
        w0 * n0.x + w1 * n1.x + w2 * n2.x,
        w0 * n0.y + w1 * n1.y + w2 * n2.y,
        w0 * n0.z + w1 * n1.z + w2 * n2.z
    );
    return normalize(result);
}

// Simple lighting calculation
__device__ float3 calculateLighting(
    const float3& position,
    const float3& normal,
    const float3& view_dir,
    const float3& albedo,
    float metallic,
    float roughness
) {
    // Fixed light position for simplicity
    float3 light_dir = normalize(make_float3(1.0f, 1.0f, 1.0f));
    float3 light_color = make_float3(1.0f, 1.0f, 1.0f);
    float light_intensity = 1.5f;
    // Ambient lighting
    float ambient_strength = 0.2f;
    float3 ambient = ambient_strength * albedo;
    // Diffuse lighting (Lambert model)
    float diff = max(dot(normal, light_dir), 0.0f);
    float3 diffuse = diff * light_color * albedo;
    // Specular lighting (Blinn-Phong approximation)
    float3 halfway_dir = normalize(light_dir + view_dir);
    float spec = pow(max(dot(normal, halfway_dir), 0.0f), 32.0f);
    float specular_strength = 0.5f * (1.0f - roughness);
    float3 specular = specular_strength * spec * light_color;
    // Combine components
    float3 result = ambient + (diffuse + specular) * light_intensity;
    // Ensure values are in a valid range
    result.x = min(result.x, 1.0f);
    result.y = min(result.y, 1.0f);
    result.z = min(result.z, 1.0f);
    return result;
}

// Closest hit shader
extern "C" __global__ void __closesthit__radiance() {
    const HitGroupData* data = (const HitGroupData*)optixGetSbtDataPointer();
    const float2 barycentrics = optixGetTriangleBarycentrics();
    float2 texCoords = getTexCoords(data, barycentrics);
    float3 normal = getNormal(data, barycentrics);
    // Material properties
    float3 albedo = data->material.albedo;
    float metallic = data->material.metallic;
    float roughness = data->material.roughness;
    // Sample textures if available
    if (data->albedo_texture) {
        float4 texColor = sampleTexture(data->albedo_texture, texCoords);
        albedo = make_float3(texColor.x, texColor.y, texColor.z);
    }
    if (data->metallic_roughness_texture) {
        float4 mrTexColor = sampleTexture(data->metallic_roughness_texture, texCoords);
        // In most PBR textures:
        // - Metallic is in the blue channel
        // - Roughness is in the green channel
        metallic = mrTexColor.z;
        roughness = mrTexColor.y;
    }
    if (data->normal_texture && data->normals) { // Just check for normals directly
        float4 normalTexColor = sampleTexture(data->normal_texture, texCoords);
        float3 tangentSpaceNormal = make_float3(
            normalTexColor.x * 2.0f - 1.0f,
            normalTexColor.y * 2.0f - 1.0f,
            normalTexColor.z * 2.0f - 1.0f
        );
        // For visualization, we just use the RGB directly for now.
        // In a real shader, we would transform this using a TBN matrix.
        normal = normalize(tangentSpaceNormal);
    }
    // Get the hit position and view direction for the lighting calculation
    float3 ray_dir = optixGetWorldRayDirection();
    float ray_t = optixGetRayTmax();
    float3 position = optixGetWorldRayOrigin() + ray_t * ray_dir;
    float3 view_dir = -ray_dir; // Direction from hit point to camera
    // Choose a visualization mode - uncomment the one you want to use
    // Option 1: Apply lighting to the material (recommended)
    float3 color = calculateLighting(position, normal, view_dir, albedo, metallic, roughness);
    // Option 2: Display only albedo (base color)
    // float3 color = albedo;
    // Option 3: Display normals mapped from [-1,1] to [0,1]
    // float3 color = make_float3(normal.x * 0.5f + 0.5f, normal.y * 0.5f + 0.5f, normal.z * 0.5f + 0.5f);
    // Option 4: Display PBR properties in RGB channels
    // float3 color = make_float3(roughness, roughness, metallic);
    // Option 5: Mix albedo with metallic and roughness for debugging
    // color = albedo;
    // color.z = color.z * 0.5f + metallic * 0.5f; // Add metallic to blue channel
    // color.y = color.y * 0.5f + roughness * 0.5f; // Add roughness to green channel
    optixSetPayload_0(__float_as_uint(color.x));
    optixSetPayload_1(__float_as_uint(color.y));
    optixSetPayload_2(__float_as_uint(color.z));
}
```
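The closest-hit shader above leaves the normal-map sample in tangent space. A minimal sketch of the missing TBN step, reusing the vector helpers defined at the top of the listing (Gram-Schmidt variant; tangent handedness handling is omitted, and fetching the interpolated tangent is assumed to mirror getNormal):

```cpp
// Transforms a tangent-space normal-map sample into world space using a
// TBN basis built from the interpolated geometric normal and tangent.
__device__ float3 applyNormalMap(const float3& n, const float3& t, const float3& sample) {
    // Re-orthogonalize the tangent against the normal (Gram-Schmidt)
    float3 tangent = normalize(t - n * dot(n, t));
    // Derive the bitangent; a full pipeline would respect the stored handedness
    float3 bitangent = cross(n, tangent);
    // world = T * s.x + B * s.y + N * s.z
    return normalize(tangent * sample.x + bitangent * sample.y + n * sample.z);
}
```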
Milestone 5 (Target Date: 5.05)
- **Local Windows build** (spent 6h): MSVC segfaults when compiling with OpenVR, so the project remains runnable on Linux only.
- **C++ camera-code optimisation & code cleanup** (spent 5h):

```cpp
optix_renderer::CameraParameters setupVRCamera(const CameraSetupParams& params) {
    optix_renderer::CameraParameters cam;
    // Convert matrices to Eigen
    Eigen::Map<const Eigen::Matrix<float, 3, 4, Eigen::RowMajor>> headPoseNp(params.headPoseMatrix.data());
    Eigen::Map<const Eigen::Matrix<float, 3, 4, Eigen::RowMajor>> eyeToHeadNp(params.eyeToHeadMatrix.data());
    Eigen::Map<const Eigen::Matrix<float, 4, 4, Eigen::RowMajor>> projNp(params.projMatrix.data());
    // Calculate position
    Eigen::Vector3f position;
    if (params.lockPosition) {
        position = Eigen::Map<const Eigen::Vector3f>(params.basePos.data());
    } else {
        Eigen::Vector3f eyeOffset = eyeToHeadNp.block<3,1>(0,3);
        Eigen::Matrix4f hp4 = Eigen::Matrix4f::Identity();
        hp4.block<3,4>(0,0) = headPoseNp;
        Eigen::Vector4f homogeneous;
        homogeneous << eyeOffset, 1.0f;
        position = (hp4 * homogeneous).head<3>();
        position.z() *= -1.0f;
    }
    // Calculate orientation using the rotation matrix
    Eigen::Matrix3f rotMatrix = headPoseNp.block<3,3>(0,0);
    Eigen::Vector3f euler = rotMatrix.eulerAngles(0,1,2); // xyz order
    // Invert pitch and yaw
    euler[0] = -euler[0]; // pitch
    euler[1] = -euler[1]; // yaw
    // Create a rotation matrix from the adjusted angles
    Eigen::Matrix3f adjustedRot;
    adjustedRot = Eigen::AngleAxisf(euler[2], Eigen::Vector3f::UnitZ()) *
                  Eigen::AngleAxisf(euler[1], Eigen::Vector3f::UnitY()) *
                  Eigen::AngleAxisf(euler[0], Eigen::Vector3f::UnitX());
    // Calculate camera basis vectors
    Eigen::Vector3f right = adjustedRot * Eigen::Vector3f(1,0,0);
    Eigen::Vector3f up = adjustedRot * Eigen::Vector3f(0,1,0);
    Eigen::Vector3f forward = adjustedRot * Eigen::Vector3f(0,0,-1);
    // Calculate projection parameters
    float tanHalfW = 1.0f / std::abs(projNp(0,0));
    float tanHalfH = 1.0f / std::abs(projNp(1,1));
    float offX = projNp(0,2);
    float offY = projNp(1,2);
    // Calculate camera vectors
    Eigen::Vector3f cameraU = right;
    Eigen::Vector3f cameraV = up;
    Eigen::Vector3f cameraW = position -
                              tanHalfW * (offX + 1.0f) * right -
                              tanHalfH * (offY + 1.0f) * up -
                              forward;
    // Copy results to camera parameters
    cam.position = {position[0], position[1], position[2]};
    cam.u = {cameraU[0], cameraU[1], cameraU[2]};
    cam.v = {cameraV[0], cameraV[1], cameraV[2]};
    cam.w = {cameraW[0], cameraW[1], cameraW[2]};
    return cam;
}
```
Python bindings:
```cpp
py::class_<ors::CameraSetupParams>(m, "CameraSetupParams")
    .def(py::init<>())
    .def_readwrite("eye_matrix", &ors::CameraSetupParams::eyeMatrix)
    .def_readwrite("eye_to_head_matrix", &ors::CameraSetupParams::eyeToHeadMatrix)
    .def_readwrite("head_pose_matrix", &ors::CameraSetupParams::headPoseMatrix)
    .def_readwrite("proj_matrix", &ors::CameraSetupParams::projMatrix)
    .def_readwrite("base_pos", &ors::CameraSetupParams::basePos)
    .def_readwrite("lock_position", &ors::CameraSetupParams::lockPosition);
```
Milestone 6 (Target Date: 19.05)
- **Documentation cleanup** — 5 h.
- **Dynamic geometry rebuild** (2h)
Python bindings:
.def("set_scene_changed", &Renderer::setSceneChanged)
.def("is_scene_changed", &Renderer::isSceneChanged)
.def("reset_scene_changed", &Renderer::resetSceneChanged);
Building and rebuilding the acceleration structure:
```cpp
bool buildInstanceAccelerationStructure() {
    if (!m_deviceContext->setDevice()) {
        std::cerr << "Failed to set device context for acceleration structure build" << std::endl;
        return false;
    }
    // Calculate the number of instances
    const size_t numInstances = m_instances.size();
    if (numInstances == 0) {
        return true; // Nothing to do
    }
    // Create the array of OptixInstance for the IAS.
    // Only visible instances are added, instead of pre-allocating
    // and potentially skipping entries.
    std::vector<OptixInstance> optixInstances;
    // Fill in the instances with transforms and references to the GAS
    for (size_t i = 0; i < numInstances; i++) {
        auto& instance = m_instances[i];
        // Skip if the instance is not visible
        if (!instance->isVisible()) {
            continue;
        }
        // Create a new instance and add it to the vector
        OptixInstance newInstance = {};
        optixInstances.push_back(newInstance);
        // Get a reference to the newly added instance
        auto& optixInstance = optixInstances.back();
        // Get the resource IDs
        GeometryHandle geometryId = instance->getGeometryHandle();
        MaterialHandle materialId = instance->getMaterialHandle();
        // Get the device geometry data
        //std::cout << "Configuring instance " << i << " with geometry ID: " << geometryId << std::endl;
        auto geomData = m_deviceContext->getGeometryData(geometryId);
        if (!geomData) {
            std::cerr << "Error: Geometry data not found for instance " << i << std::endl;
            continue;
        }
        // Skip any geometry with a null traversable handle
        if (geomData->traversable == 0) {
            std::cerr << "Error: Geometry " << i << " has a null traversable handle" << std::endl;
            continue;
        }
        // Get the transform matrix from the instance
        torch::Tensor transform = instance->getTransform();
        // Ensure the transform tensor is contiguous in memory
        if (!transform.is_contiguous()) {
            transform = transform.contiguous();
        }
        float* transformPtr = (float*)transform.data_ptr();
        // No transpose - use the row-major layout directly: copy the first
        // 3 rows of the 4x4 matrix, matching OptixInstance's 3x4 transform.
        optixInstance.transform[0]  = transformPtr[0];  // Row 0, Col 0
        optixInstance.transform[1]  = transformPtr[1];  // Row 0, Col 1
        optixInstance.transform[2]  = transformPtr[2];  // Row 0, Col 2
        optixInstance.transform[3]  = transformPtr[3];  // Row 0, Col 3 (translation X)
        optixInstance.transform[4]  = transformPtr[4];  // Row 1, Col 0
        optixInstance.transform[5]  = transformPtr[5];  // Row 1, Col 1
        optixInstance.transform[6]  = transformPtr[6];  // Row 1, Col 2
        optixInstance.transform[7]  = transformPtr[7];  // Row 1, Col 3 (translation Y)
        optixInstance.transform[8]  = transformPtr[8];  // Row 2, Col 0
        optixInstance.transform[9]  = transformPtr[9];  // Row 2, Col 1
        optixInstance.transform[10] = transformPtr[10]; // Row 2, Col 2
        optixInstance.transform[11] = transformPtr[11]; // Row 2, Col 3 (translation Z)
        // Set instance ID for hit group selection
        optixInstance.instanceId = static_cast<unsigned int>(i);
        // Set the SBT offset for material selection.
        // Use the instance index as SBT offset to select different materials:
        // Instance 0 = Cube = Blue, Instance 1 = Floor = Green
        optixInstance.sbtOffset = static_cast<unsigned int>(i);
        // Set the visibility mask (all visible by default)
        optixInstance.visibilityMask = 255;
        // Set the traversable handle to the geometry's GAS
        optixInstance.traversableHandle = geomData->traversable;
        // Set bitfield flags
        optixInstance.flags = OPTIX_INSTANCE_FLAG_NONE;
    }
    // Check if we have any instances left after filtering for visibility
    if (optixInstances.empty()) {
        std::cerr << "WARNING: No visible instances to render!" << std::endl;
        // Set empty traversable
        m_ias = 0;
        return true;
    }
    //std::cout << "Building IAS with " << optixInstances.size() << " visible instances" << std::endl;
    // Create a device buffer for the instance array - use the CUDA driver API consistently
    size_t instanceBufferSize = sizeof(OptixInstance) * optixInstances.size();
    CUdeviceptr d_instances;
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_instances, instanceBufferSize, m_deviceContext->getStream()));
    CUDA_DRIVER_CHECK(cuMemcpyHtoDAsync(
        d_instances,
        optixInstances.data(),
        instanceBufferSize,
        m_deviceContext->getStream()
    ));
    // Set up the build input
    OptixBuildInput buildInput = {};
    buildInput.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
    buildInput.instanceArray.instances = d_instances;
    buildInput.instanceArray.numInstances = static_cast<unsigned int>(optixInstances.size());
    // Set up the acceleration structure build options
    OptixAccelBuildOptions accelOptions = {};
    accelOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE | OPTIX_BUILD_FLAG_ALLOW_COMPACTION
                            | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE;
    accelOptions.operation = OPTIX_BUILD_OPERATION_BUILD;
    // Compute buffer sizes
    OptixAccelBufferSizes bufferSizes;
    OPTIX_CHECK(optixAccelComputeMemoryUsage(
        m_deviceContext->getContext(),
        &accelOptions,
        &buildInput,
        1, // one build input
        &bufferSizes
    ));
    // Allocate buffers - use the CUDA driver API consistently
    CUdeviceptr d_tempBuffer;
    CUdeviceptr d_iasOutputBuffer;
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_tempBuffer, bufferSizes.tempSizeInBytes, m_deviceContext->getStream()));
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_iasOutputBuffer, bufferSizes.outputSizeInBytes, m_deviceContext->getStream()));
    // Debug: check valid input
    //std::cout << "Before IAS build: Instances buffer is " << (d_instances ? "valid" : "null") << std::endl;
    //std::cout << "Before IAS build: Temp buffer is " << (d_tempBuffer ? "valid" : "null") << std::endl;
    //std::cout << "Before IAS build: Output buffer is " << (d_iasOutputBuffer ? "valid" : "null") << std::endl;
    // Build the acceleration structure
    OptixTraversableHandle tempHandle = 0;
    OPTIX_CHECK(optixAccelBuild(
        m_deviceContext->getContext(),
        0, // CUDA stream
        &accelOptions,
        &buildInput,
        1, // one build input
        d_tempBuffer,
        bufferSizes.tempSizeInBytes,
        d_iasOutputBuffer,
        bufferSizes.outputSizeInBytes,
        &tempHandle, // output to a temp variable first
        nullptr,     // no emitted properties
        0            // no emitted properties
    ));
    // Store the handle
    m_ias = tempHandle;
    //std::cout << "IAS build complete, traversable handle: " << m_ias << std::endl;
    // Clean up the temporary buffer - use the CUDA driver API for consistency
    CUDA_DRIVER_CHECK(cuMemFreeAsync(d_tempBuffer, m_deviceContext->getStream()));
    // Store the buffers we need to keep - free the old buffers first
    if (m_d_iasOutputBuffer) {
        CUDA_DRIVER_CHECK(cuMemFreeAsync(m_d_iasOutputBuffer, m_deviceContext->getStream()));
    }
    if (m_d_instances) {
        CUDA_DRIVER_CHECK(cuMemFreeAsync(m_d_instances, m_deviceContext->getStream()));
    }
    m_d_iasOutputBuffer = d_iasOutputBuffer;
    m_d_instances = d_instances;
    m_numInstances = optixInstances.size();
    return true;
}
```
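Since the build above passes OPTIX_BUILD_FLAG_ALLOW_UPDATE, pure transform changes could take the cheaper refit path instead of a full rebuild. A hedged sketch of that variant, reusing the members and macros from the function above (it assumes m_d_instances has already been re-uploaded with the new transforms and the original output buffer is still alive):

```cpp
// Refit the existing IAS in place after only instance transforms changed.
bool updateInstanceAccelerationStructure() {
    OptixAccelBuildOptions accelOptions = {};
    accelOptions.buildFlags = OPTIX_BUILD_FLAG_ALLOW_UPDATE | OPTIX_BUILD_FLAG_ALLOW_COMPACTION
                            | OPTIX_BUILD_FLAG_PREFER_FAST_TRACE; // must match the original build
    accelOptions.operation = OPTIX_BUILD_OPERATION_UPDATE;        // refit, not rebuild

    OptixBuildInput buildInput = {};
    buildInput.type = OPTIX_BUILD_INPUT_TYPE_INSTANCES;
    buildInput.instanceArray.instances = m_d_instances;
    buildInput.instanceArray.numInstances = static_cast<unsigned int>(m_numInstances);

    OptixAccelBufferSizes bufferSizes;
    OPTIX_CHECK(optixAccelComputeMemoryUsage(
        m_deviceContext->getContext(), &accelOptions, &buildInput, 1, &bufferSizes));

    // Updates need only the (smaller) tempUpdateSizeInBytes scratch buffer.
    CUdeviceptr d_temp;
    CUDA_DRIVER_CHECK(cuMemAllocAsync(&d_temp, bufferSizes.tempUpdateSizeInBytes,
                                      m_deviceContext->getStream()));
    OPTIX_CHECK(optixAccelBuild(
        m_deviceContext->getContext(), m_deviceContext->getStream(),
        &accelOptions, &buildInput, 1,
        d_temp, bufferSizes.tempUpdateSizeInBytes,
        m_d_iasOutputBuffer, bufferSizes.outputSizeInBytes, // update in place
        &m_ias, nullptr, 0));
    CUDA_DRIVER_CHECK(cuMemFreeAsync(d_temp, m_deviceContext->getStream()));
    return true;
}
```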
At the conclusion of the project, the final build, repository link, and a short final result video will be provided.