For the most part, our effort has been focused on single-exposure image enhancement; however, we are beginning to use recurrent models to improve quality when video information is available.
Nonetheless, it's kinda a neat idea, so I tried testing the feasibility of it. I set up a recent flagship phone that claims to have 960fps super-slow-motion video capture next to another phone with a strobe app at 12Hz with a short delay in between pulses.
There are definitely a few frames where the LED is at an intermediate brightness; however, working out the exact timing between the flash and the camera well enough to synchronize them may prove difficult.
As for over-saturated images having more signal... although the PSNR calculation may give you a better number, in practice, a region that is over-saturated is just a blob of 1s on the image (assuming float64 pixel values of 0-1) and there is no information there to extract. With a black level near but not at 0, we've found there is often more information hidden in the 'dark noise' than can be discerned by the human eye alone.
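To make the clipping argument concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: the scene values, the gains, the 0.02 black level, and the `expose`/`recover` helpers are made up, and the model is noise-free with no quantization, so it overstates how cleanly dark detail comes back from a real sensor.

```python
def expose(scene, gain, black_level=0.02):
    """Simulate a capture: scale the scene, add the black level, clip to 0-1."""
    return [min(1.0, max(0.0, black_level + gain * v)) for v in scene]


def recover(pixels, gain, black_level=0.02):
    """Undo the black level and gain to estimate the original scene values."""
    return [(p - black_level) / gain for p in pixels]


scene = [0.2, 0.4, 0.6, 0.8]       # made-up relative scene brightness

over = expose(scene, gain=10.0)    # every value lands above 1.0 and clips
under = expose(scene, gain=0.05)   # values sit just above the black level

print(over)                                              # [1.0, 1.0, 1.0, 1.0]
print([round(v, 3) for v in recover(under, gain=0.05)])  # [0.2, 0.4, 0.6, 0.8]
```

The over-exposed capture maps four different scene values to the same 1.0, so no processing can tell them apart afterwards; the under-exposed one keeps them distinct, and what limits recovery there is noise and bit depth rather than clipping.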
Wow, cool, you actually tested it! And an effective test too.
Stepping back and forth through the frames (using mpv), the flash clearly enhances several spots of localized brightness where contrast pops out into clear relief.
The effect is clearest at the very bottom of the image which goes from "shadow blob" to "adequately discernible", but I think the area just above that (the 3rd vertical quarter of the image) is most interesting; the detail visible in frames 24-29 (immediately before 00:00:01 / 30.030fps) is excellent, and that's with the flash LED at peak brightness.
Flash synchronization would be effectively impossible to achieve (the camera would need to stream LED status information inside each frame), and even with "LED is on" information available it may provide no net gain. The exact point at which the hardware says "LED is off" will not necessarily correspond to the exact moment the light decays to zero (at 1/960 s ≈ 1.042 milliseconds per frame, the video suggests the light takes about 2 frames, or ~2.08 milliseconds, to decay), and that decay will never be the same twice, since the flash sends light outwards into arbitrarily different environments. I can't help but wonder if calibration references for everything from Vantablack to mirrors would be needed... for each camera sensor... and then there would be the problem of figuring out which reference(s?) to select.
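For anyone who wants to redo the timing arithmetic, a quick check in Python. The 960 fps rate and the 2-frame decay are the figures from this thread (and the 960 fps is only what the phone claims); the variable names are just for illustration.

```python
# Frame timing at the claimed capture rate (figures taken from this thread).
CAPTURE_FPS = 960    # claimed super-slow-motion rate
DECAY_FRAMES = 2     # frames the LED appears to take to go dark in the video

frame_ms = 1000 / CAPTURE_FPS        # duration of one captured frame, in ms
decay_ms = DECAY_FRAMES * frame_ms   # apparent LED decay time, in ms

print(f"{frame_ms:.4f} ms per frame")   # 1.0417 ms per frame
print(f"{decay_ms:.2f} ms to decay")    # 2.08 ms to decay
```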
Staring at the video frames some more, two ideas come to mind: (1) analyzing all the frames to identify areas of significant difference in brightness, then (2) for each (perhaps nonrectangular) region of difference, figuring out the "best" source frame for that specific region. As an example, I'd generally use frame 13 for most of the image, and frame 44 or so (out of many, many possible candidates) for the bits that, as you say, become float64 1.00 :). Obviously a nontrivial amount of normalization would then be needed.
I'm not aware of how you'd do either of these neurally :) but the idea for (1) came from https://en.wikipedia.org/wiki/Seam_carving (although just basic edge detection may be more correct for this scenario), while the idea for (2) came from https://github.com/google/butteraugli, which "estimates the psychovisual similarity of two images"; perhaps there's something out there that can identify "best contrast"? I'm not sure.
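A very rough, non-neural sketch of what (1) and (2) could look like, in plain Python. To be clear about what's swapped in: this uses neither seam carving nor butteraugli. It cuts the image into square tiles (not the nonrectangular regions I had in mind), uses the spread of per-tile mean brightness across frames for (1), and scores "best" for (2) as standard deviation discounted by the fraction of saturated pixels. The 0.98 saturation cutoff, the 0.05 change threshold, and all function names are assumptions, and the normalization step is not attempted.

```python
from statistics import pstdev

SATURATED = 0.98  # assumed cutoff for "blown out" pixels on a 0-1 scale


def tile_pixels(frame, y0, x0, size):
    """Return the pixel values of one square tile as a flat list."""
    return [v for row in frame[y0:y0 + size] for v in row[x0:x0 + size]]


def tile_range(frames, y0, x0, size):
    """Step 1: how much the tile's mean brightness varies across frames."""
    tiles = (tile_pixels(f, y0, x0, size) for f in frames)
    means = [sum(p) / len(p) for p in tiles]
    return max(means) - min(means)


def tile_score(frame, y0, x0, size):
    """Step 2: contrast (std dev), discounted by the saturated fraction."""
    pixels = tile_pixels(frame, y0, x0, size)
    clipped = sum(v >= SATURATED for v in pixels) / len(pixels)
    return pstdev(pixels) * (1.0 - clipped)


def best_frame_per_tile(frames, size, min_range=0.05):
    """Map each tile origin (y, x) to the index of its best source frame.

    Tiles whose brightness barely changes keep frame 0 as their source.
    """
    height, width = len(frames[0]), len(frames[0][0])
    choice = {}
    for y0 in range(0, height, size):
        for x0 in range(0, width, size):
            if tile_range(frames, y0, x0, size) < min_range:
                choice[(y0, x0)] = 0
                continue
            scores = [tile_score(f, y0, x0, size) for f in frames]
            choice[(y0, x0)] = scores.index(max(scores))
    return choice


# Two made-up 2x4 grayscale frames: flash off, then flash on.
flash_off = [[0.01, 0.02, 0.4, 0.6],
             [0.01, 0.02, 0.5, 0.7]]
flash_on = [[0.3, 0.5, 1.0, 1.0],
            [0.4, 0.6, 1.0, 1.0]]

print(best_frame_per_tile([flash_off, flash_on], size=2))
# {(0, 0): 1, (0, 2): 0}
```

In the toy data the dark left tile is best taken from the flash-on frame, while the right tile, which the flash blows out to 1.0, is best taken from the flash-off frame. Plain lists keep the sketch dependency-free; on real 960fps frames you'd want array operations and overlapping or blended regions to avoid visible tile seams.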
Trivial aside: I wondered why mpv kept saying "Inserting rotation filter." and also why the frame numbers appeared sideways. Then I realized the video has rotation metadata in it, presumably so the device doesn't need to do landscape-to-portrait frame buffering at 960fps (heh). I then realized the left-to-right rolling shutter effect I was seeing was actually a bottom-to-top rolling shutter. I... think that's unusual? I'm curious - after Googling then reading (or, more accurately, digging signal out of) https://www.androidauthority.com/real-960fps-super-slow-moti... - was the device an Xperia 1?
(And just to write it down for future reference: --vf 'drawtext=fontcolor=white:fontsize=100:text="%{n}"' adds frame numbers to mpv. Yay.)
https://www.dropbox.com/s/ha51ntucl3klkcb/cell_flash_960fps....