Shallow DoF for Streaming

A VMI experiment conducted in 2011 demonstrated that video images with a shallow depth of field required around 10% less bandwidth than those with a large depth of field, which is good for streaming.

We set out to test this theory.  This article describes the experiment and concludes that shooting with a large sensor camera rather than a small sensor camera can save around 10% of bandwidth: the shallow depth of field captures the subject in fine detail while the background is defocused.

https://youtu.be/ebO0XIMHF60

The Power of Pictures

I first saw true HD pictures years ago at IBC.  OpTex had a camera focusing on some beautifully lit fruit. The image on the screen was extraordinary, hypnotising.  Leap forward a decade and Amsterdam brought Europe its first taste of Ultra High Definition Television – not 720p, not 1080i, but 4320p.  The enveloping cinematic pictures and the embrace of 22.2 channel sound were overwhelming.  My colleague was quite emotional, and for me, despite being in this high-tech business bazaar, this was a memorable moment.  This is the power of television.

No streamed video has yet had the same impact on me.  The limitations of bandwidth mean that today, video over IP is a strangled, muffled affair.

To give it credit, streaming is in its infancy.  H.264 was standardised only in 2003.  YouTube only began in 2005.  The iPlayer’s history goes back less than 3 years.  And on reflection, these services have made major leaps in that time, rolling out 720p and 1080p, adaptive streaming, 16:9 and even 3D.  So maybe my dissatisfaction is a typical 21st century need for instant gratification.

Sadly the video production community can’t do much to change the country’s broadband infrastructure.  Nor do I know of any magic codec that will be able to deliver real HDTV at 1Mb/s.

My old house is a squash and a squeeze

It’s an inescapable fact that video production creates large data files, even if modern cameras squeeze the data as much as broadcasters will allow.  Camera compressors essentially save one frame – the key – and intelligently encode information that changes from frame to frame, rather than capturing all the information from every frame.  When confronted with fast edits, complex and changing backgrounds or moving cameras, almost every pixel of these intermediate frames must also be saved.  But for a talking head, only a few pixels will change, so the file will compress well.  Failing to understand compression has certainly filled the Internet with videos that are nothing more than a pixellated mess.
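To make the key-frame idea concrete, here is a minimal sketch in Python.  It is not how any real camera codec works internally; it simply counts how many pixels differ from the key frame in two made-up scenes.  The frame size, the pixel values and the helper names (make_frame, changed_pixels) are all assumptions chosen for illustration.

```python
import random

WIDTH, HEIGHT = 64, 36  # a tiny hypothetical frame, not a real video format


def make_frame(width, height, value=128):
    """Build a flat list of mid-grey pixels standing in for one video frame."""
    return [value] * (width * height)


def changed_pixels(prev, curr):
    """Count the pixels whose value differs between two equally sized frames."""
    return sum(1 for a, b in zip(prev, curr) if a != b)


rng = random.Random(0)
key_frame = make_frame(WIDTH, HEIGHT)

# Talking head: only a small 8x8 patch (think of the mouth) changes.
talking_head = list(key_frame)
for y in range(20, 28):
    for x in range(28, 36):
        talking_head[y * WIDTH + x] = 200

# Busy scene: every pixel takes a new value (0-127, so never the old 128).
busy_scene = [rng.randrange(128) for _ in key_frame]

head_changes = changed_pixels(key_frame, talking_head)
busy_changes = changed_pixels(key_frame, busy_scene)

print(f"talking head: {head_changes} of {WIDTH * HEIGHT} pixels changed")
print(f"busy scene:   {busy_changes} of {WIDTH * HEIGHT} pixels changed")
```

The fewer pixels that change between frames, the less the compressor has to store for each intermediate frame, which is why the talking head is so much kinder to a codec than the busy scene.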

Production specialists have developed a number of effective ways of shooting that produce  streaming-friendly files.  The wise Christina Fox (www.urbanfox.tv) writes on her website:  “For minimal file size, the ideal subject is probably a talking head against a bland background in a noise free studio.”  She advises that the rich, vibrant fast action scenes a television broadcaster can get away with are “enough to make a codec melt.” By taking Christina’s advice, the camera can be coaxed to capture more of the important data – the subject – and less of the backdrop.  The effect is to create videos where the focus of the film is a sharp, strong and well defined image that stands out clearly against a plain background.  With the codec untroubled by irrelevant detail behind the subject, there will be more ‘bits’ for the all-important subject of the video.

A camera revolution… for the Internet?

Perhaps we can now go a step further…

Large sensor camcorders have caused an upheaval in the video, TV and film worlds.  The Canon 5D, ARRI’s Alexa, Sony’s F3 and Panasonic’s AF-101 are all changing the way programme makers approach their work.  Camera operators are able to use the shallow depth of field and low light performance of these models to create a stunning cinematic look that had traditionally been reserved for 35mm film cameras and the most exclusive digital models.  Some will argue that this technique is becoming over-used and others would contend that maximising the ‘bokeh’ of a scene is alien to how the eye sees.  However, it is widely adopted both as an artistic approach and as a way of fixing the viewer’s attention on the subject matter for everything from news interviews to natural history films.

So what has this got to do with streaming?  “We’ve had this notion for a while now, that the shallow depth of field that these new cameras can produce may just be what streaming has been waiting for.  A tight focus on the subject with the other elements of the frame blurred should, in theory, produce beautiful, smooth-playing files,” argues Barry Bassett, MD of VMI Camera Hire.

Richard Payne, camera specialist with Holdan, concurs: “We’ve seen lots of examples of great films being posted on the Internet using the 101 and the Alexa.  But we were keen to find out whether sensor size makes a tangible difference when it comes to streaming video.  Or is the picture quality improvement actually down to the production values of the shoot and superior encoding?”

Testing Times

In controlled conditions selected to mimic a typical chaotic backdrop (and to escape the blustery weather), the tests were performed: two cameras, fixed to tripods, recording the same format, to the same solid state field recorder.  The Panasonic AG-AF101’s 4/3rds MOS sensor (an area of 225mm²) was compared against the Panasonic AJ-HPX-171E’s 1/3″ CCD (an area of 17.3mm²).  The AF101 was equipped first with an Angenieux 15-40 Optimo Zoom T2.6 and then a Voigtlander 25mm f0.95 Nokton Micro Four Thirds Lens.  To ensure a level playing field, a Panasonic HP-G20 HD Recorder was used to capture the images in the same AVC-Intra format from the cameras’ SDI ports.
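For a sense of scale, the two sensor areas quoted above can be reproduced with a little arithmetic.  The sketch below assumes the nominal dimensions usually quoted for the two formats (17.3 x 13.0mm for Four Thirds and 4.8 x 3.6mm for 1/3 inch); these are assumptions, not measurements of the cameras used in the test, and the active area of a particular camera may differ.

```python
import math

# Nominal sensor dimensions in millimetres (assumed, see the note above).
FOUR_THIRDS = (17.3, 13.0)
ONE_THIRD_INCH = (4.8, 3.6)


def area(sensor):
    """Sensor area in square millimetres."""
    width, height = sensor
    return width * height


def diagonal(sensor):
    """Sensor diagonal in millimetres."""
    width, height = sensor
    return math.hypot(width, height)


area_ratio = area(FOUR_THIRDS) / area(ONE_THIRD_INCH)
diagonal_ratio = diagonal(FOUR_THIRDS) / diagonal(ONE_THIRD_INCH)

print(f"4/3rds area:    {area(FOUR_THIRDS):.1f} mm^2")
print(f"1/3 inch area:  {area(ONE_THIRD_INCH):.1f} mm^2")
print(f"area ratio:     {area_ratio:.1f}x")
print(f"diagonal ratio: {diagonal_ratio:.1f}x")
```

The diagonal ratio is the more useful number for depth of field: as a rough rule of thumb, for the same framing and the same f-number, the larger format gives a shallower depth of field by about that factor.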

I should note that while my attention was on the test, the camera operators around me seemed utterly transfixed by the images that the AF101 captured through the Voigtlander at f0.95.  “Breathtaking”, “stunning”, “cinematic”: yet again, the power of a beautiful image proved distracting.

Back to the experiment: the cameras shot our scene (a semi-professional juggler), each with its iris fully open.  The images were fed via HD-SDI to a Panasonic 17″ HD studio LCD monitor.

Out of the Camera

On screen the background of the images from the HPX-171 was relatively sharp; using the Angenieux, the AF101 achieved wonderfully rich pictures with a modest blurring of the backdrop; pictures from the AF101 paired with the Voigtlander were multi-dimensional, sharp and with a totally defocused background.  A longer lens would probably have been a better choice for the test – a Zeiss 50mm ZF F1.4 would have created an even greater blur behind the subject and at 50mm the blurring achieved by the Angenieux would have been more pronounced.
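The suggestion that a longer lens would blur the background more can be checked with a simple thin-lens estimate.  The sketch below is a textbook approximation, not a measurement from the shoot: it assumes an ideal thin lens, a background far behind the subject, and a subject distance of 3m, which is a hypothetical figure because the article does not record the real one.

```python
def background_blur_mm(focal_mm, f_number, subject_mm):
    """Blur-disc diameter on the sensor for a very distant background.

    Thin-lens approximation: aperture diameter multiplied by the
    magnification of the in-focus subject.
    """
    aperture_mm = focal_mm / f_number
    magnification = focal_mm / (subject_mm - focal_mm)
    return aperture_mm * magnification


SUBJECT_MM = 3000.0  # hypothetical subject distance, not from the test

# Same camera position for both lenses (the 50mm frames the subject tighter).
nokton_25 = background_blur_mm(25, 0.95, SUBJECT_MM)
zeiss_50 = background_blur_mm(50, 1.4, SUBJECT_MM)

# Camera moved back so the 50mm gives the same subject magnification.
matched_distance = 50 + 2 * (SUBJECT_MM - 25)
zeiss_50_matched = background_blur_mm(50, 1.4, matched_distance)

print(f"25mm f0.95:                {nokton_25:.3f} mm")
print(f"50mm f1.4, same position:  {zeiss_50:.3f} mm")
print(f"50mm f1.4, same framing:   {zeiss_50_matched:.3f} mm")
```

Under these assumptions the 50mm lens produces the larger blur disc in both cases, although the advantage shrinks once the camera is moved back to keep the same framing.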

Predictions

If our eyes were not deceiving us, the HPX-171 would have been capturing far more information across the whole frame – a sharp subject and a largely in-focus background.  The codec would have been working hard.  The blurred scene behind the juggler, produced by the AF101 with the Angenieux, should have allowed the codec to render the focused subject more accurately.  The filmic look from the AF-101/Voigtlander should have had an even more telling effect on the codec, creating the best-looking compressed pictures of all.  To find out how the full resolution files would cope under different levels of compression, Rhozet’s Carbon Coder was called into action to render the files as MP4.  The files were encoded at 2.5Mb/s (around YouTube’s peak setting) and at 5Mb/s.  While the smaller files were uploaded to YouTube, the larger files were hosted by the professional streaming outfit, Sharpstream.  The data was also encoded with variable bitrate and the files compared.

The Reality

First things first: compressing the videos with a variable bitrate showed that on average the AF101’s shallower depth of field reduced file sizes by 10%.  This is clearly good news for streaming over public networks.
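To show what a 10% reduction in file size means for delivery, here is a small worked example.  The file sizes and the 60-second duration are hypothetical numbers picked to match the 2.5Mb/s setting and the 10% figure; they are not the measured results of the test, and the function names are only illustrative.

```python
def percent_saving(baseline_bytes, candidate_bytes):
    """Reduction in size of the candidate relative to the baseline, in percent."""
    return 100.0 * (baseline_bytes - candidate_bytes) / baseline_bytes


def average_bitrate_mbps(size_bytes, duration_s):
    """Average bitrate in megabits per second for a file of a given duration."""
    return size_bytes * 8 / duration_s / 1_000_000


DURATION_S = 60                 # hypothetical clip length
deep_dof_bytes = 18_750_000     # hypothetical: small sensor, sharp background
shallow_dof_bytes = 16_875_000  # hypothetical: large sensor, blurred background

saving = percent_saving(deep_dof_bytes, shallow_dof_bytes)
deep_rate = average_bitrate_mbps(deep_dof_bytes, DURATION_S)
shallow_rate = average_bitrate_mbps(shallow_dof_bytes, DURATION_S)

print(f"saving: {saving:.1f}%")
print(f"average bitrate: {deep_rate:.2f} Mb/s vs {shallow_rate:.2f} Mb/s")
```

With a variable bitrate encode, a smaller file for the same duration means a lower average bitrate, which is exactly the bandwidth that a viewer’s connection has to sustain.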

Low Compression – High Quality

All the higher quality files residing on the Sharpstream servers were fairly faithful to the originals in their lack of pixellation, handling of shadows and colours.  While the movement captured was even and judder-free, there was clear blurring of the juggling balls.  Although not as pleasant overall, YouTube’s 1080p setting shows the same result.  These lower levels of compression seem to have coped well with video from all three camera set-ups.

High Compression – Low Quality

This is where things start to become interesting.  At YouTube’s 240p settings, banding and pixellation become significant factors.  Under any form of scrutiny, with the camera set to shoot a wide angle, none of the three is acceptable on anything larger than a smartphone screen.

However, the close-up shots that emphasise the difference in the depth of field begin to tell a story.  Let’s state, for the avoidance of doubt, that the pictures at this level are all horrible: blocky and smeared.  Putting this to one side, it is clear that the shallowest focus image delivers better compressed pictures.

1. AF-101 at T.95

2. AF-101 at T2.4

3. HPX-171 at T2.4

______________________________________________________________________________

4. AF-101 at T.95

5. AF-101 at T2.4

6. HPX-171 at T2.4

The stripes on the shirt fall apart with the HPX-171’s small sensor: jagged, blotchy and unnatural.  The AF-101’s chip captures a more realistic image that does not distract from the main attraction on screen.

______________________________________________________________________________

7. AF-101 at T.95

8. AF-101 at T2.4

9. HPX-171 at T2.4

__________________________________________________

Close-up Shots

https://youtu.be/mpudwB7eMRU
https://youtu.be/ebO0XIMHF60
https://youtu.be/k5bvdj-fsb0

Wide Shots

https://youtu.be/DrM3lbRifgI
https://youtu.be/DLUgCfuQUIw
https://youtu.be/gDu3NBn0Jlo

To achieve this advantage, the compressor has been able to deal with the periphery of the 4/3rds camera’s image in a more efficient way.  The highly defocused background from the camera has not tested the codec; in fact, it has freed the codec to concentrate on the subject.  To put it simply, it has put the bits where the viewer wants them.  Conversely, the 171’s greater depth of field has forced the codec to render the background’s unnecessary detail.  The result is that foreground and background detail are treated equally: too much irrelevant image data at the expense of the data we really need.
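The principle that a defocused background costs the codec fewer bits can be illustrated with a toy experiment.  The sketch below is only an analogy: it uses Python’s general-purpose zlib compressor as a stand-in for a video codec (it is lossless and knows nothing about pictures), a strip of random pixel values as the “sharp” background, and a simple moving average as the “defocus”.  None of this comes from the test itself.

```python
import random
import zlib


def box_blur(pixels, radius):
    """Moving average over a strip of pixels; a crude stand-in for defocus."""
    n = len(pixels)
    blurred = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = pixels[lo:hi]
        blurred.append(sum(window) // len(window))
    return blurred


def compressed_size(pixels):
    """Size in bytes after zlib compression (stand-in for a real codec)."""
    return len(zlib.compress(bytes(pixels), 9))


rng = random.Random(1)

# Hypothetical busy background: 20,000 pixels of fine, random detail.
sharp_background = [rng.randrange(256) for _ in range(20_000)]
soft_background = box_blur(sharp_background, radius=8)

sharp_size = compressed_size(sharp_background)
soft_size = compressed_size(soft_background)

print(f"sharp background:   {sharp_size} bytes")
print(f"blurred background: {soft_size} bytes")
```

The blurred strip compresses to fewer bytes because the averaging removes fine detail.  A real video codec working to a fixed bitrate can spend the bits it saves there on the subject instead.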

Conclusion

In challenging conditions, depth of field makes a marked difference to the watchability of compressed video.  Had the rain not fallen on the day of the test, the crew would have relocated to a park for shooting in front of moving greenery.  We can confidently expect, on the basis of the tests we ran, that the large sensor’s advantages would have been even greater in these more challenging conditions.  An unpredictable, moving background creates highly complex data files that codecs cannot intelligently compress.  A very shallow depth of field and heavily blurred greenery would clearly be a considerable advantage when the files were compressed.

TV and video production is all about compromise – size, cost and quality issues are daily considerations.  For every requirement there is the perfect solution as well as the clever workaround.  In video streaming, the perfect answer would be to shoot as you would like to according to creative needs, before delivering a 10Mb/s programme to the viewers.  The reality is that the country’s rather iffy broadband infrastructure and strangled mobile phone networks mean that 1-2Mb/s delivery is more realistic.  Producing video for the web therefore requires programme makers to adapt their shooting style, their environment and the level of on-screen action.  And for the highest quality, a large sensor camera with a very fast lens has got to be a consideration.

Until fibre runs beneath our streets, it’s unlikely that video streaming will draw the emotional response of an Ultra High Definition broadcast.  But step by step, the production world can explore techniques and technologies to give the viewers the TV-like experience they are starting to expect.

In summary, a shallow depth of field saves around 10% of bandwidth, but this depends very largely on just how shallow the depth of field is and how busy the background is!
