The realities of making VR and AR streaming a reality

Adrian Pennington

360 video is a stepping stone to the merger of physical and digital worlds, the virtual and the real. All it takes is a lot of processing power and bandwidth, writes Adrian Pennington

From a technology and business standpoint, live 360-degree video streaming remains one of the acute challenges on the path to a fully immersive, premium TV experience on head-mounted displays (HMD) and flat screens. This is nothing new. Delivering high-resolution video (whether 4K today, 8K tomorrow, or 12K in future) on a mass scale has been an industry struggle for years.

HMDs are a long way from penetrating the mass market, production equipment and workflows are yet to be standardised, editorial grammar is a work in progress and budgets for content emanate largely from marketing.

“2017 will be the year where immersive VR will disappear as 3D did a few years ago, or it will be adopted by fans,” forecasts Carlo DeMarchis, chief product and marketing officer at sports media services company Deltatre.

Unleashing 360-degree video hinges on the ability to monetise it for the widest possible audience. “Given the exorbitant cost involved with producing high-res content, this cannot be practically achieved by implementing legacy adaptive bitrate technologies,” says Alain Nochimowski, EVP of Innovation at Viaccess-Orca.

As an example, standard DASH-based streaming of 4K-captured 360-degree video content not only results in much lower resolutions within a user’s field of view, it may require up to 20Mbps bandwidth. “Who would want to watch a 90-minute soccer game in VR 360 if the ball can barely be seen?” poses Nochimowski.
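Nochimowski's point about resolution can be made concrete with a little arithmetic. A "4K" equirectangular frame spreads its pixels across the full sphere, so only a fraction falls inside the headset's viewport. The 90-degree field of view below is an illustrative assumption, not a figure from the article:

```python
# Rough estimate of the resolution a viewer actually sees when a full
# 360-degree equirectangular frame is streamed at "4K" (3840x1920).
# The 90-degree field of view is an assumption for illustration.

def fov_resolution(frame_w, frame_h, fov_h_deg, fov_v_deg):
    """Pixels of the equirectangular frame that fall inside the FOV."""
    px_w = frame_w * fov_h_deg / 360.0   # frame spans 360 deg horizontally
    px_h = frame_h * fov_v_deg / 180.0   # and 180 deg vertically
    return int(px_w), int(px_h)

w, h = fov_resolution(3840, 1920, 90, 90)
print(w, h)  # 960 960 -- roughly SD quality in the viewer's window
```

Under these assumptions the viewer's window resolves to roughly 960x960 pixels, which is why a distant ball "can barely be seen" despite a 4K source.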

The technical challenges associated with VR video are not trivial. Most people agree that frame rates have to be higher, colour needs to be 10bit (HDR), compression artefacts need reducing and resolution has to be greater.

The lowest usable quality VR video is streamed at 1440p/30fps, needing at least a 10Mbps ‘up’ from the event and ideally 6Mbps download to the consumer’s device. For live streams with multiple camera rigs and points of view, simply multiply the 'up' requirement by the number of rigs. This can quickly turn into a very high bandwidth requirement.
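The multi-rig scaling described above is worth sketching as a back-of-envelope uplink budget, using the 10Mbps-per-rig figure quoted; the rig counts are illustrative assumptions:

```python
# Back-of-envelope uplink budget for a multi-rig live VR shoot, using the
# 10 Mbps 'up' per 1440p/30fps rig quoted above. Rig counts are
# illustrative assumptions.

UP_PER_RIG_MBPS = 10

def uplink_required(num_rigs, per_rig_mbps=UP_PER_RIG_MBPS):
    """Total uplink needed from the venue: one stream per rig."""
    return num_rigs * per_rig_mbps

for rigs in (1, 4, 8):
    print(f"{rigs} rig(s) -> {uplink_required(rigs)} Mbps up from the venue")
```

Eight rigs already demand an 80Mbps sustained uplink from the venue, which few event connections offer.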

“Data and bandwidth constraints are substantial,” admits DeMarchis. “However, we can expect this to be solved in the medium term as compression techniques and network topologies advance.”

Several sports and sports broadcasters have partnered with VR production specialists to trial and commercialise 360 live events. Deltatre is using Nokia’s OZO VR system for clients including UEFA; Fox Sports is hosting live VR streams of US Open Golf, NASCAR and more with NextVR. Sky has focused mostly on VOD sports content such as coverage of boxer Anthony Joshua’s bouts but has a stake in VR developer Jaunt.

Some technical partners offer an end-to-end VR platform, which critics might call proprietary and which therefore carries a risk of lock-in. Others prefer to offer best-of-breed tools. In either case, the most crucial element is the encoding solution that optimises bandwidth.

Viaccess-Orca’s Virtual Arena uses tiling technology to increase bandwidth efficiency while “significantly improving streamed video resolution in the user’s field of view,” Nochimowski says. “What’s more, it paves the way to video monetisation through advanced content protection and video/advertising analytics.”
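The core idea behind viewport-adaptive tiling can be sketched in a few lines: split the sphere into tiles and fetch only the tiles near the viewer's gaze at high bitrate. The tile grid, bitrates and field-of-view width below are illustrative assumptions, not Viaccess-Orca's actual parameters:

```python
# Minimal sketch of viewport-adaptive tiling: only tiles inside (or
# adjacent to) the viewer's field of view are fetched at high bitrate;
# the rest use a low-bitrate fallback. All numbers are illustrative.

TILES = 8                      # tiles around the 360-degree horizon
HIGH_KBPS, LOW_KBPS = 2000, 300

def tile_qualities(yaw_deg, fov_deg=90):
    """Return per-tile bitrates given the viewer's yaw angle."""
    tile_width = 360 / TILES
    rates = []
    for i in range(TILES):
        center = i * tile_width + tile_width / 2
        # shortest angular distance from gaze direction to tile centre
        delta = abs((center - yaw_deg + 180) % 360 - 180)
        in_view = delta <= fov_deg / 2 + tile_width / 2
        rates.append(HIGH_KBPS if in_view else LOW_KBPS)
    return rates

rates = tile_qualities(yaw_deg=0)
print(sum(rates), "kbps total vs", HIGH_KBPS * TILES, "if every tile were high")
```

In this toy configuration only half the tiles stream at full quality, cutting the total bitrate from 16,000kbps to 9,200kbps while the pixels in the viewer's field of view stay sharp.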

VR specialist Focal Point VR tested live 360 coverage of Champions Tennis from the Royal Albert Hall in December, streaming 6K VR video from Blackmagic Design cameras using its own technology that packs the stream into a standard 4K video format.

“Packing allows us to have native effective 6K resolution in the main areas of interest such as the field of play while areas of less interest – such as the view behind the VR viewer’s head – receive fewer pixels,” explains head of production Paul James.

This requires an up connection of 18Mbps with the viewer needing better than 10Mbps down (and ideally close to 18Mbps) to receive the full resolution. “Below that we transcode the stream so viewers can still enjoy the VR stream down to around 2Mbps,” explains FPVR CEO Jonathan Newth.
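The pixel arithmetic behind packing is straightforward: a 4K transport frame holds far fewer pixels than a native 6K equirectangular frame, so the trick is to spend the budget where viewers look. The region split and downsample factor below are illustrative assumptions, not Focal Point VR's actual scheme:

```python
# Why packing a 6K sphere into a 4K frame can work: keep native pixel
# density in the area of interest and downsample the rest. The 25% front
# region and 0.2 rear scale are illustrative assumptions only.

CONTAINER = 3840 * 2160      # ~8.3 Mpx available in a 4K transport frame
SOURCE_6K = 6144 * 3072      # ~18.9 Mpx in a 6K equirectangular capture

def packed_budget(front_frac=0.25, rear_scale=0.2):
    """Pixels consumed if the front of the sphere keeps native density
    and the remainder is downsampled by rear_scale."""
    front = SOURCE_6K * front_frac                  # full resolution
    rear = SOURCE_6K * (1 - front_frac) * rear_scale
    return front + rear

used = packed_budget()
print(f"packed: {used / 1e6:.1f} Mpx fits in a {CONTAINER / 1e6:.1f} Mpx container")
```

Under these toy numbers the packed sphere needs about 7.5 million pixels, which fits inside the roughly 8.3 million pixels a standard 4K frame provides, so the stream can ride existing 4K encoders and CDNs.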

By the end of this year FPVR aims to support up to 16K 360 streaming (“close to retina resolution”) with down speeds of less than 20Mbps. “Without aggressive optimisation this would require greater than 300Mbps, which is generally not available and certainly not commercially viable,” says Newth.

Production issues

Rapidly evolving technology, although essential to 360-degree production, can make editorial and planning very difficult.

Most cameras are either not designed to be used for VR (such as any GoPro rig), or are prototypes (therefore unreliable), or very high-end and out of the financial reach of most filmmakers.

“Fundamentally, the tools necessary to produce a live virtual reality video are provided by different companies meaning there is no common workflow,” says Newth.

Stitching software, for example, is essential to VR production: standalone software, plugins and hacks are available from many different providers, all competing for market share. These tools and techniques need to work in real-time to be useful for live streamed VR.

Producers also remain unsure about the importance of 3D (stereo) VR video. “Many have avoided it because of its high production overhead, others because it is hard to do well,” suggests Newth, although FPVR believes stereo 360 is essential to the quality of the experience.

Who is in control?

The jury’s out on whether audiences prefer live VR as a more familiar directed (editorialised) experience or want to control the gaze themselves. Sky and Deltatre are separately exploring a solution in which user-selectable directed experiences, 360 video and traditional feeds are available within the same app.

Others think live VR appeals to millennials keen for greater control over the experience. FPVR enables real time, gaze control camera selection and viewer selectable resolution/bitrates to allow for bandwidth variation.

“You lose some advantages of editing and action replay and the ability to zoom in and out of a picture,” says Newth. “But what you get in return is a feeling of authenticity and presence.”

Other storytelling difficulties include avoiding motion sickness for viewers watching fast action; Sky’s tests flagged up the problem of putting VR cameras on F1 cars as they corner at 150mph.

“Each event has its own set of specificities that need to be reflected in the production work,” says Nochimowski. “In other words, the production rules for basketball differ greatly from soccer.”

VR offers great intimacy, but rig positions at football or NFL venues suffer from being a long way from the action. Basketball can be much more effective because of the proximity to the court. Indeed, the NBA, partnered with NextVR, has made weekly live paid VR streams available via the association’s League Pass program.

“Beware of placing a rig courtside and another behind the net because of the jump in a viewer’s eyeline when viewing angle is switched,” warns DeMarchis. He advises insertion of a blank frame when swapping between camera angles of different heights “to tell the brain to expect a change”.

An issue directly related to getting return feeds from the camera back out to the cloud is that the majority of camera positions are cabled, and therefore limited in movement. No robust wireless solution for live VR has yet been invented.

The weight of the current crop of HMDs is also problematic and a contributing factor in the preference for videos of two to five minutes. As designs develop as anticipated and VR gear becomes as light and user-friendly as a mobile phone, users are more likely to want to spend longer in the experience.

Toward mixed reality

The ability to add contextual overlays (stats, advertising) on top of 360 video is just a glimpse at what an AR-enabled experience could look like.

There are two distinct flavours of AR: mobile/tablets and the more immersive experience of headsets such as Microsoft Hololens. The latter have a very limited installed base and substantially lower resolutions than their VR counterparts, making them unsuitable for video-based applications as it stands, according to Newth. The mobile/tablet form of AR, typified by Pokemon GO, doesn't naturally support live VR video, which is best enjoyed as an immersive experience.

Closely related to AR is Mixed Reality (MR) which merges real and virtual worlds to produce new environments and visualizations. Ultimately, physical and digital objects will interact in MR in real time.

“Imagine a VR live stream of the Hollywood premiere of Avatar 2,” says Newth. “The remote viewer has the chance to get close to the stars with the perfect red carpet view, but we could also add CGI characters from the movie into the scene, sharing the space with the flesh and blood stars.”

Integrating high-quality CGI into video in real time was pioneered by mega-budget feature films like Avatar as an aid to production. Similar technology has now made its way into live TV.

The X Factor producer Fremantle Media and Norway’s The Future Group (TFG) have developed entertainment format Lost in Time which combines live action filmed on a green screen with audience participation and real-time graphics.

By genlocking live studio footage with virtual images rendered in Epic’s Unreal games engine, viewers at home can participate in the game alongside contestants, using mobile and VR apps, live.

Most of the 60 minutes of each show, airing in Norway from March, is pre-recorded, but through the companion app viewers can interact with live elements incorporated into the broadcast.

“Nothing has failed yet but we decided to remove one element of risk which is live production,” says Bård Anders Kasin, co-founder, TFG. “However, this is possible and will likely happen from season two.”

VR/AR/MR is rapidly moving toward a world where the boundary between the digital and the physical is eroding. The next step is to add human-like senses and artificial intelligence to VR headsets to unlock even more immersive applications.

Later this year, Intel will debut consumer technology that does just that. Project Alloy is a wearable computer that features an Intel seventh generation Core processor, a fisheye lens, two RealSense cameras and other body worn sensors.

The system will be linked to a VR rig developed by Voke, a live VR specialist Intel acquired in November. Voke’s rigs are typically loaded with 12 (or more) cameras that capture 360 video and, crucially, information about scene depth. Intel says the system processes 40 to 50GB/s of data from an event to the viewer in real time.

But Intel is going further. To create an emotional connection with merged reality experiences, Alloy headsets use RealSense cameras to record a viewer’s spatial and contextual awareness (basically so you can move without colliding into real-world stuff). Achieving this at high fidelity requires a superfast data capture, which Intel pins at more than 50 million 3D points per second.
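Intel's 50-million-points-per-second figure implies a hefty raw data rate. The sketch below assumes, purely for illustration, three 32-bit float coordinates per point; real RealSense pipelines may pack points quite differently:

```python
# Rough data rate implied by Intel's figure of 50 million 3D points per
# second, assuming (as an illustration only) three float32 coordinates
# per point; actual point formats may differ.

POINTS_PER_SEC = 50_000_000
BYTES_PER_POINT = 3 * 4          # x, y, z as 32-bit floats

rate_mb_s = POINTS_PER_SEC * BYTES_PER_POINT / 1e6
print(f"~{rate_mb_s:.0f} MB/s of raw point data")  # ~600 MB/s
```

Even under this minimal encoding, the headset has to move on the order of 600MB of spatial data every second, which makes clear why Intel treats the capture path as a hard engineering problem.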

At this year’s Super Bowl, Intel and Fox Labs produced panoramic POV replays from an array of 38 5K cameras ringing Houston's NRG Stadium. With the entire field, including the players, digitised, the viewer could theoretically watch from any position and any point of view, with an enhanced ability to interact. It is exciting to imagine viewers with Alloy headsets, perhaps as soon as next year, being able to experience the point of view of a player in the Super Bowl.

“This is a game changer for the entire category of virtual and augmented reality,” believes Achin Bhowmik, Intel’s VP, Perceptual Computing Group. “You choose the experience, and you get to navigate real-world content in new ways.”

Bhowmik goes on to point out that it took billions of years of evolution to develop sophisticated human perception comprising 3D vision, binaural hearing, smell and taste connected to a powerful brain with incredible processing capabilities.

“It took only a decade for digital devices to sense like humans, due to the rapid pace of perceptual computing innovation,” he says. “The ability to learn and adapt to what devices sense is right around the corner.”

 
