Projecting Trackable Thermal Patterns for Dynamic Computer Vision
(CVPR '24)
Mark Sheinin, Aswin Sankaranarayanan, and Srinivasa Narasimhan
The problem: Pattern recognition requires, well, *patterns*
It is no coincidence that "Pattern Recognition" is in the name of the most prominent computer vision conference (i.e., CVPR). The ability to detect and recognize the same pattern across different views is fundamental to dynamic vision tasks, namely tasks where the observer (i.e., the camera) must navigate and map a given environment. But what if the environment doesn't have enough good texture to facilitate this task?
Sure, you can always find a better room for your NeRF or Gaussian-splatting demo, but a navigating and mapping robot must operate in whatever environment it is given, including long dark roads at night, pipelines, featureless hallways, and textureless objects it must scan.
[Images: environments and objects lacking 'good' texture for feature extraction and matching (generated with deepai.org)]
But what if we could give our robots the ability to 'paint' auxiliary patterns on textureless environment regions to improve their navigation and mapping?
The solution: Painting light with light
Well, we cannot equip our robots with buckets of paint. But we can use something very similar to a device that already exists on many of our robots -- like autonomous cars -- and that is the lidar. Lidars measure depth by scanning a laser across surfaces in the environment. When the laser hits a surface, some of the light energy is absorbed by the material, causing a slight increase in the material's surface temperature. While invisible to the naked eye, this temperature increase can be seen by a thermal camera, which images in the infrared domain. So, a robot can use its laser to paint tailor-made heat patterns and 'see' them with a thermal camera, using them to facilitate navigation and mapping.

[Imaging system schematic and demo: a rotating object with pattern projection. Our laser 'paints' a dot pattern that 'sticks' to the object's surface; we track the heat points to generate point matches between frames, and feed the matches to COLMAP to recover the camera motion and a sparse 3D model of the object.]

But wait, don't heat patterns evaporate with time?
Indeed they do, and that is a problem: a surface pattern imaged now will appear different when imaged in future frames. This brightness inconsistency is incompatible with various off-the-shelf dynamic vision algorithms and may degrade their performance (as shown below). So, how do we fix this problem?
Answer: We train a neural network to reverse the brightness inconsistency, as explained next.

A frame taken at time t+dt differs from a frame taken at time t in two ways: (a) points that existed at time t have undergone heat diffusion, making them appear smoother and dimmer in frame t+dt, and (b) new points were projected onto the surface, appearing in frame t+dt but not in frame t. Accordingly, our network performs two functions: it takes the later frame t+dt as input, removes the newly added points that appear only in frame t+dt, and 'undiffuses' the existing points in frame t+dt to match their appearance in frame t. The network also outputs the newly added points, which is useful for tracking initialization (details in the paper).

[Illustration: at time t+1 with no new points, heat diffusion causes the surface points to appear smoother and dimmer than at time t; with new points, the system projects fresh heat points to constantly reinforce the heat pattern. Legend: points existing in both frames vs. new points not existing in frame t.]

The effect of the UNET correction can be seen in the example below.

[Example: the frame at time t+1 is passed through the UNET to produce a corrected frame matching the frame at time t. Shown on a scene object imaged as thermal video with pattern projection -- notice how the heat points rapidly diffuse after projection. COLMAP outputs and recovered 3D shapes are compared without and with heat diffusion correction.]
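To build intuition for why existing points look smoother and dimmer in later frames, here is a minimal NumPy sketch (not the paper's actual model) that evolves a single projected heat dot with an explicit finite-difference step of the 2D heat equation; the step size `alpha` and grid size are illustrative assumptions.

```python
import numpy as np

def diffuse(T, alpha=0.2, steps=10):
    """Explicit finite-difference steps of the 2D heat equation
    (periodic boundaries). Models how a projected heat dot
    smooths out and dims between frames."""
    T = T.copy()
    for _ in range(steps):
        lap = (np.roll(T, 1, 0) + np.roll(T, -1, 0)
               + np.roll(T, 1, 1) + np.roll(T, -1, 1) - 4 * T)
        T += alpha * lap
    return T

# frame at time t: one freshly projected heat dot
frame_t = np.zeros((21, 21))
frame_t[10, 10] = 1.0

# frame at time t+dt: the same dot after diffusion
frame_tdt = diffuse(frame_t)

print(frame_tdt.max())  # peak is much dimmer than 1.0
print(frame_tdt.sum())  # total heat is conserved by diffusion
```

The 'undiffusion' the network learns is the inverse of this forward process: sharpening and re-brightening the dots so a later frame matches an earlier one.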
Additional applications
Our system is useful for various dynamic vision tasks, such as:

Object tracking
[We project and track a thermal pattern on a textureless planar object, then superimpose a picture on the plane's surface.]

Optical flow
[We project a thermal pattern on a rotating textureless planar object and compute optical flow between consecutive frame pairs, with no explicit point tracking.]

Indoor localization
[We put the system on a cart, directed at the floor using a mirror, then make a loop around the office desks. Tracking the projected thermal pattern yields the recovered camera (cart) trajectory.]
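Several of these applications boil down to associating heat dots across frames. As a sketch of that step (our own illustration, not code from the paper), the following matches dot centroids between two frames by nearest neighbour under a small-motion assumption; the coordinates and `max_dist` threshold are made up for the example.

```python
import numpy as np

def match_points(pts_t, pts_tdt, max_dist=3.0):
    """Nearest-neighbour matching of heat-dot centroids between
    frames, assuming small inter-frame motion.
    Returns (i, j) index pairs into pts_t and pts_tdt."""
    matches = []
    for i, p in enumerate(pts_t):
        d = np.linalg.norm(pts_tdt - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= max_dist:
            matches.append((i, j))
    return matches

pts_t = np.array([[10.0, 10.0], [30.0, 12.0], [50.0, 40.0]])
pts_tdt = pts_t + np.array([1.0, 0.5])          # small camera motion
pts_tdt = np.vstack([pts_tdt, [[80.0, 80.0]]])  # a newly projected dot

print(match_points(pts_t, pts_tdt))  # → [(0, 0), (1, 1), (2, 2)]
```

The newly projected dot has no partner within `max_dist`, so it stays unmatched; such unmatched detections are exactly the new points used to initialize fresh tracks.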
Great, now tell me when it doesn't work (limitations)
Our method relies on the laser's ability to heat up surface spots rapidly. However, some materials are not amenable to such an operation. To heat up a surface point 'well,' the surface material must have a low albedo at the laser's wavelength, high emissivity, and low thermal conductivity (so that the absorbed heat does not diffuse away too quickly). Therefore, materials such as glass and metals will show degraded performance with our method. Also, unlike visible-light SfM, our patterns cannot facilitate loop closure across temporally distant frames, because the patterns evaporate entirely after some time.
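A back-of-envelope calculation shows why absorption and material properties matter. All numbers below are illustrative assumptions (not from the paper): if a spot absorbs 1 mJ of laser energy into a thin surface layer of a plastic-like material, the lumped-capacitance temperature rise is energy divided by (mass × specific heat).

```python
# Back-of-envelope temperature rise of a laser-heated surface spot.
# All values are illustrative assumptions, not measurements from the paper.
absorbed_energy = 1e-3   # J, laser energy absorbed by the spot
spot_area = 1e-6         # m^2 (a 1 mm^2 spot)
depth = 1e-4             # m, assumed heated layer thickness
density = 1000.0         # kg/m^3 (plastic-like material)
specific_heat = 1500.0   # J/(kg K)

mass = density * spot_area * depth          # kg of heated material
delta_T = absorbed_energy / (mass * specific_heat)
print(f"{delta_T:.1f} K")  # ≈ 6.7 K
```

A high-albedo or highly conductive material (glass, metal) absorbs less energy or spreads it away immediately, shrinking this temperature rise below what the thermal camera can reliably track.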
BibTeX
@inproceedings{Sheinin:2024,
title = {Projecting Trackable Thermal Patterns for Dynamic Computer Vision},
author={Sheinin, Mark and Sankaranarayanan, Aswin and Narasimhan, Srinivasa G.},
booktitle={Proc. IEEE/CVF CVPR},
year={2024},
}