An ALTernative (to the depth sensing and 3D object scanning) vital signs, sleep, and motor activity monitoring application for Intel RealSense cameras.
Principles of the ALT technology operation and an example implementation using Microsoft Kinect, Raspberry Pi, and Pi NoIR camera are described here. Below, we describe some results obtained using ALT systems based on different Intel RealSense cameras and also outline some features and alternative implementations of the ALT technology that were not covered before.
Intel RealSense technology [1] enables depth perception, 3D imaging, and feature tracking for virtual reality, robotics, and other applications. Light emitters of RealSense cameras can be used [2] as light source elements for the artificial light texture (ALT) technology [2, 3, 4, 5, 6]. ALT allows obtaining heart rate and respiration rate data for a person in a non-contact fashion. The person’s skin does not have to be exposed to the camera for ALT to work [2-6]. ALT does not use depth data to obtain the vital signs information [2-6].
ALT can use RealSense cameras themselves or use another camera such as, for example, a Raspberry Pi NoIR camera [7] for video stream capture [2-6]. The figures below show ALT data obtained using infrared (IR) video streams of Intel RealSense R200 [8] and F200 [9] cameras, and video stream of a Raspberry Pi NoIR camera. R200 and F200 cameras used their own light projectors for ALT, and R200 camera’s projector was used with the Pi NoIR camera.
Note that R200 and F200 are early-generation RealSense cameras that, as of Dec. 2021, appear to be no longer manufactured by Intel (production seems to have stopped long before that date). ALT can work with different types of static light patterns generated by various devices such as Microsoft Kinect [2-4, 10] and Intel RealSense R200, D415, D435, D435i, and D455 cameras [5, 8, 16; also, see below]. ALT can also work with dynamically projected patterns such as, e.g., the ones generated by the F200 camera [see below; also, see reference 5].
As we have discussed before [10], one of the possible implementations of the ALT technology includes obtaining ‘Sum of Absolute Differences’ (SAD) [11] values generated by a video encoder for the video frames captured by a video camera [12].
As an alternative to using data generated by a video encoder, the calculation of the SAD values can be incorporated into the video data processing part of the ALT software in other ways:
As a possible implementation, the SAD-generating code can iterate over the pixels of a given captured video frame. For each pixel, it calculates the difference between a numeric value of a certain kind associated with that pixel in the video frame data (e.g. the value corresponding to the pixel’s grayscale level) and the numeric value of the same kind associated with the corresponding pixel of another captured video frame. Two pixels belonging to different video frames can, for example, be designated as corresponding to one another if their pixel row and pixel column numbers are the same. The code then calculates the absolute value of the difference and adds it to the sum of the absolute values accumulated in the previous steps of the iteration. The sum of the absolute difference values (the SAD value) thus calculated for a given video frame is analogous to the sSAD value obtained from the data generated by a video encoder [see 10, 12].
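The per-frame calculation described above can be sketched in a few lines of Python. This is a minimal illustration only, not the ALT software itself: it assumes NumPy, 8-bit grayscale frames of equal size, and pixel correspondence by identical row and column. The function names `frame_sad` and `sad_series` are hypothetical.

```python
import numpy as np

def frame_sad(frame, reference):
    """Sum of absolute differences between two equally sized grayscale frames."""
    if frame.shape != reference.shape:
        raise ValueError("frames must have the same shape")
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    diff = frame.astype(np.int64) - reference.astype(np.int64)
    return int(np.abs(diff).sum())

def sad_series(frames):
    """SAD of each frame against the frame immediately preceding it."""
    return [frame_sad(cur, prev) for prev, cur in zip(frames, frames[1:])]

# Tiny worked example with two 2x2 frames.
a = np.array([[10, 200], [0, 5]], dtype=np.uint8)
b = np.array([[12, 190], [3, 5]], dtype=np.uint8)
print(frame_sad(b, a))  # 2 + 10 + 3 + 0 = 15
```

The vectorized subtraction replaces the explicit pixel loop but computes the same sum. One SAD value per captured frame then forms the raw data series.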
The video linked below demonstrates real-time non-contact heartbeats and respiration monitoring using an ALT system which includes an Intel RealSense D415 camera and a Microsoft Surface tablet/laptop. The plot in the video shows raw real-time ALT data calculated according to the implementation described above. The heartbeats signal amplitude during the inhale/exhale is smaller compared to that in between the respiration cycles. One can say that respiration modulates the heartbeats signal amplitude.
The SAD value calculated for a captured video frame is a simple metric of the similarity between that video frame and another video frame (called the ‘reference’ video frame) whose data were used in the calculation of the SAD value. The SAD value is the “Manhattan distance” [13] between the two video frames calculated using numeric values associated with the video frame pixels.
Similarly, the SAD value generated by a video encoder (e.g. an H.264 one) for a macroblock [14] of a video frame, which we have used in another implementation of the ALT technology [10, 12], is a measure of similarity between the macroblock and the corresponding macroblock of another video frame (the ‘reference’ video frame): it is the “Manhattan distance” [13] between these two macroblocks.
Similarly to the SAD value calculation procedure described above, the SAD value generated by a video encoder for a macroblock of a video frame can be obtained by calculating, for each pixel of the macroblock, the absolute difference between a numeric value associated with the pixel and a numeric value associated with a pixel of the corresponding macroblock of the reference video frame, and then summing these absolute difference values.
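As a rough illustration of the macroblock-level calculation, the sketch below computes one SAD value per block and then sums them over the frame. It is a simplification and not how any particular encoder works: it assumes NumPy, compares co-located blocks only (no motion search), and requires frame dimensions that are multiples of the block size. The name `macroblock_sads` and the default 16-pixel block are assumptions made for the example.

```python
import numpy as np

def macroblock_sads(frame, reference, block=16):
    """Per-block SAD for co-located square blocks of two grayscale frames."""
    h, w = frame.shape
    if frame.shape != reference.shape:
        raise ValueError("frames must have the same shape")
    if h % block or w % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    diff = np.abs(frame.astype(np.int64) - reference.astype(np.int64))
    # View the frame as (block row, row in block, block column, column in block)
    # and sum over the two in-block axes to get one value per block.
    return diff.reshape(h // block, block, w // block, block).sum(axis=(1, 3))

# 4x4 frame split into four 2x2 blocks, compared against an all-zero reference.
frame = np.arange(16, dtype=np.uint8).reshape(4, 4)
reference = np.zeros((4, 4), dtype=np.uint8)
sads = macroblock_sads(frame, reference, block=2)
print(sads.tolist())    # [[10, 18], [42, 50]]
print(int(sads.sum()))  # 120, the whole-frame sum of the per-block values
```

Summing the per-block values gives a single number per frame, which is the quantity the text compares with the whole-frame SAD.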
For an ALT implementation, the numeric value associated with a pixel of a video frame can be taken directly from the video frame data for the pixel (e.g. pixel’s grayscale level) or obtained as a result of calculations using one or more of the video frame data values for the pixel and/or other pixels of the video frame (e.g. an average of the grayscale level values for the pixel and all of its neighboring pixels in the video frame).
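The neighborhood-average example mentioned above could look like the following sketch. It assumes NumPy and a 3x3 neighborhood, and it averages only over the neighbors that actually exist, so border pixels use fewer values. The function name `neighborhood_mean` is hypothetical.

```python
import numpy as np

def neighborhood_mean(frame):
    """Average of each pixel and its existing neighbors in a 3x3 window."""
    h, w = frame.shape
    # Zero-pad the values and a matching mask of ones; the mask counts how
    # many real pixels fall inside each window.
    values = np.pad(frame.astype(np.float64), 1, mode="constant")
    mask = np.pad(np.ones((h, w), dtype=np.float64), 1, mode="constant")
    total = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            total += values[dy:dy + h, dx:dx + w]
            count += mask[dy:dy + h, dx:dx + w]
    return total / count
```

The array returned here can be used in place of the raw grayscale levels when computing absolute differences between frames.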
Note that the corresponding macroblocks can generally have different positions within the video frames. The two pixels used in the absolute value calculation can also have different positions within the video frames in an ALT implementation, regardless of whether it uses data generated by a video encoder or not.
Although for any captured video frame we typically use the one immediately preceding it as the ‘reference’ video frame in the SAD values calculations, nothing precludes us from using any one of the captured video frames as the reference video frame for any other captured video frame for the ALT data generation purposes.
Respiration, heartbeats and/or other movements of a person’s body cause additional variations of the “Manhattan distance” between the consecutive video frames compared to the case when there are no body movements (and/or body-caused movements) in the scene captured by the video camera. Thus, the calculated SAD values contain information about the respiration and/or heartbeats and/or other movements of a person over the time period covered by the captured video frames.
As we have discussed earlier [10], applying the ‘artificial light texture’ to the person’s body and/or to the objects surrounding the person can greatly increase the illumination contrast in the scene observed by a video camera. This can significantly enhance (e.g., by orders of magnitude) the variations in the “Manhattan distance” between the video frames that are associated with the respiration and/or heartbeats of the person, compared to the case when the ‘artificial light texture’ is absent (e.g. when the ALT-generating light emitter is switched off) and otherwise identical data collection and data processing steps are performed.
Provided that video frames are captured at equal time intervals, the SAD value calculated for a video frame can be viewed as proportional to the integral rate of change of the numeric values which are associated with the video frame pixels and used in the SAD value calculations.
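The statement above can be written out as follows. The notation is ours and is not taken from the original text: v_n(i, j) is the numeric value associated with pixel (i, j) of frame n, Δt is the constant interval between frames, and the immediately preceding frame is assumed to be the reference.

```latex
\mathrm{SAD}_n
  = \sum_{i,j} \left| v_n(i,j) - v_{n-1}(i,j) \right|
  = \Delta t \sum_{i,j} \left| \frac{v_n(i,j) - v_{n-1}(i,j)}{\Delta t} \right|
  \approx \Delta t \sum_{i,j} \left| \frac{\partial v(i,j,t)}{\partial t} \right|_{t = t_n}
```

With Δt fixed, the SAD value differs from the summed absolute rate of change only by a constant factor.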
The raw ALT data are shown in the figures below by black lines connecting the data points (the calculated SAD values for the captured video frames shown by black dots). The orange and blue lines are 0.2 s and 1 s long (in the equivalent number of data samples) moving averages of the raw data shown to highlight the heartbeat and respiration processes captured in the ALT data, respectively. Numeric values for the heart rate and respiration rate can be obtained, for example, via Fourier data analysis [15, 10].
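A hedged sketch of the two post-processing steps mentioned above (moving averages and a Fourier-based rate estimate) is given below. It assumes NumPy and a known, constant frame rate `fps`. The function names and the frequency bands are illustrative assumptions rather than values from the ALT software, and a single FFT over the whole record is used instead of the short-time transform of reference [15].

```python
import numpy as np

def moving_average(x, fps, window_s):
    """Moving average over a window given in seconds (e.g. 0.2 or 1.0)."""
    n = max(1, int(round(window_s * fps)))
    kernel = np.ones(n) / n
    # mode="same" keeps the output aligned with the input; values near the
    # ends are attenuated because the window extends past the data there.
    return np.convolve(np.asarray(x, dtype=np.float64), kernel, mode="same")

def dominant_rate_per_minute(x, fps, lo_hz, hi_hz):
    """Frequency of the largest spectral peak inside [lo_hz, hi_hz], per minute."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()  # remove the constant offset before the transform
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    if not band.any():
        raise ValueError("no frequency bins fall inside the requested band")
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

# Synthetic 20 s record at 30 frames per second with two known components.
fps = 30
t = np.arange(20 * fps) / fps
signal = 5.0 * np.sin(2 * np.pi * 0.25 * t) + np.sin(2 * np.pi * 1.2 * t)
slow = dominant_rate_per_minute(signal, fps, 0.1, 0.5)  # about 15 per minute
fast = dominant_rate_per_minute(signal, fps, 0.7, 3.0)  # about 72 per minute
```

The frequency resolution is the reciprocal of the record length (0.05 Hz for 20 s), so longer records give finer rate estimates at the cost of slower updates.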
Note that, similarly to the “Pi Camera + Kinect” ALT system [10], the distance between the F200 or R200 camera and the person can affect how pronounced the heartbeat signal will be during the respiration events (e.g. inhale/exhale sequence). Generally, the closer the camera gets to the person, the less pronounced the heartbeat signal component in the ALT data becomes during respiration events. Note also that at a large enough distance between the camera and the person there can be virtually no discernible pulse or respiration signal in the ALT data. Adjustments of the camera’s position can be made, for example, based on observing visualizations of the collected ALT data in real time.
Snapshots of the scene captured by the cameras are shown to the left of the corresponding data plots.
Note that the ALT software executed the same video frame processing algorithm described above to generate the data for both the R200 and the F200 cameras (Figures 1 and 2).
Figure 1. ALT data obtained using the light emitter and IR video stream of an R200 Intel RealSense camera. A snapshot of the scene taken from the R200 IR video stream is shown on the left. A person is sitting in an armchair at about 3 feet distance from the R200 camera.
Figure 2. ALT data obtained using the light emitter and IR video stream of an F200 Intel RealSense camera. A snapshot of the scene taken from the F200 IR video stream is shown on the left. A person is sitting on a chair at about 3 feet distance from the F200 camera.
In the case of dynamically projected patterns, as we have shown above using the example of an F200 Intel RealSense device, the light emitter of the F200 creates a non-uniform illumination distribution in the scene, and this distribution, as captured by the infrared camera of the F200, forms the ‘artificial light texture’. Body movements, including the ones associated with heartbeats and respiration, lead to changes in the captured distribution that would have been absent if there were no motion in the scene.
Figure 3. ALT data obtained using the light emitter of an R200 Intel RealSense camera and the video stream of a Raspberry Pi NoIR camera. A view of the scene taken from the Pi NoIR camera’s video stream is shown on the left. A person is resting in an armchair at about 3 feet distance from the Pi NoIR and R200 cameras. R200 camera’s emitter provided most of the illumination for the scene. We have used the code [12] from our previous “Pi Camera + Kinect” example [10] to generate the ALT data shown in this Figure.
As the ALT technology implementations described above and before [10] demonstrate, the ALT technology does not rely on any particular kind of light pattern (statically and/or dynamically projected). The ALT technology also does not use depth information encoded in the light patterns projected by the depth sensing devices such as Kinect or RealSense cameras in order to obtain vital signs information.
References (NOTE: some links can become outdated):
[1] http://www.intel.com/content/www/us/en/architecture-and-technology/realsense-overview.html
[2] “Non-contact real-time monitoring of heart and respiration rates using Artificial Light Texture”, https://www.linkedin.com/pulse/use-artificial-light-texture-non-contact-real-time-heart-misharin
[3] “What has happened while we slept?”, https://www.linkedin.com/pulse/what-has-happened-while-we-slept-alexander-misharin
[4] “When your heart beats”, https://www.linkedin.com/pulse/when-your-heart-beats-alexander-misharin
[5] “ALT pulse and respiration monitoring using Intel RealSense cameras”, https://www.linkedin.com/pulse/alt-pulse-respiration-monitoring-using-intel-cameras-misharin
[6] “The Artificial Light Texture (ALT) technology for vital signs (heartbeats, respiration), sleep, and motor activity monitoring”, https://github.com/lvetech/ALT
[8] “Introducing the Intel® RealSense™ R200 Camera (world facing)”, https://software.intel.com/en-us/articles/realsense-r200-camera
[9] “Can Your Webcam Do This? - Exploring the Intel® RealSense™ 3D Camera (F200)”, https://software.intel.com/en-us/blogs/2015/01/26/can-your-webcam-do-this
[11] “Sum of absolute differences”, https://en.wikipedia.org/wiki/Sum_of_absolute_differences
[12] https://github.com/lvetech/ALT/blob/master/code/simple-ALT-raw.py
[13] “Taxicab geometry”, https://en.wikipedia.org/wiki/Taxicab_geometry
[14] “Macroblock”, https://en.wikipedia.org/wiki/Macroblock
[15] “Short-time Fourier transform”, https://en.wikipedia.org/wiki/Short-time_Fourier_transform
ALT by Alexander Misharin, LVE Technologies LLC is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. The full text of the CC BY-NC-SA 4.0 license can be found at https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode. Contact LVE Technologies LLC if you would like to obtain a commercial license: info(at)lvetechnologies.com