Live Streaming on iOS

In our work, we often need to implement real-time or near real-time video streaming from iOS devices. Common examples are using iOS devices as surveillance cameras or building streaming applications like Periscope. The problem statement often imposes additional requirements: playback of the stream on another device, smooth playback in a browser or in the VLC player, low latency (near real-time streaming), low resource consumption (long battery life), no need for a dedicated media server, and so on.

Such a problem is not new in itself, but no single straightforward solution to it has emerged yet. More accurately, there is a whole range of possible solutions: HLS, fragmented MP4 delivered over HTTP, RTSP/RTP streaming, MPEG-DASH, Microsoft Smooth Streaming, Adobe HDS, and others.

Each approach has its benefits and downsides. For example, in our earlier but still relevant post How to Live Stream Video as You Shoot It in iOS, we wrote about on-device HLS stream generation. It lets you do without a media server and upload content directly to a CDN (e.g., Amazon S3 + CloudFront). But since such an HLS-based approach has inherent flaws (see below), this time we would like to discuss two better options: on-device generation of an fMP4 stream and RTP streaming based on a local RTSP server.

HTTP Live Streaming

HLS emerged in 2009 and quickly became quite popular. This was facilitated by full support for HLS in the Apple ecosystem and by its fairly clear structure: there is a master index file (or playlist) containing a list of links to video stream segments (or chunks) that are continuously added during live streaming. In HLS you can also define multiple alternate video streams for clients with different bitrate requirements (low/high bandwidth, etc.). However, this two-tier system, with its need to keep updating the playlist, has become a limitation of HLS. HLS has shown poor performance in real-time broadcasts due to the following issues:

  1. The stream is cut into small chunks (several seconds each, as recommended), so the inherent latency may already be too high for real time.
  2. A standard guideline for video players is to buffer at least 3 chunks, so the latency from item 1 is effectively tripled at the player level.
  3. Moreover, to learn of new chunks in the stream, the player has to continually re-fetch the index file; this is yet another source of latency (otherwise the player would simply not know what to show next).
  4. Each chunk is an MPEG-TS file, which adds substantial overhead on top of the actual media content.

An obvious step to reduce latency in HLS would be to reduce the chunk size to the minimum (1 second). However, with such small file sizes the stream becomes prone to network instability. Hence, under typical real-world network conditions it is almost impossible to achieve smooth playback: the player spends more resources re-requesting the master file and the next chunk than displaying the video itself.

To resolve these issues, several similar approaches were offered in 2011-2012: MPEG-DASH by MPEG, Smooth Streaming by Microsoft, and HDS by Adobe. However, none of them became the de facto standard (although of the three MPEG-DASH is a full-scale ISO standard), as they share the same underlying shortcomings and, moreover, the Adobe and Microsoft solutions require special server-side support. Here you can find a table comparing these formats.

Fragmented MP4

As time went on, another approach became popular for simple broadcasts: fMP4 (fragmented MP4). It is a minor (but essential) extension of MP4, a well-known format that was almost ubiquitous even at that time, so broad support for fMP4 naturally followed shortly after.

The whole difference between a regular MP4 and fMP4 lies in the arrangement of the elements that describe the video and audio streams. In a conventional MP4 file these elements are located at the end of the file, while in fMP4 they are placed at the beginning. Since MP4 can natively contain multiple streams divided into separate data chunks, this simple change makes the file “infinite” from the player’s point of view.

This very property of fMP4 is used for live streaming. First, the player reads the description of the video or audio stream, then waits for the data and plays them back as they arrive. If you generate the footage chunks on the fly, the player will be able to play back a real-time stream with no additional effort.

And it does work! Of course, certain issues may arise when implementing an fMP4 generator. To discuss them, let’s take our DemoFMP4 app as an example.

fMP4 Live Streaming on iOS

This is a demo app that takes frames from the camera and sends them in the fMP4 format to connected viewers. To connect to the device, a viewer requests a virtual “MP4” file automatically generated at the following address: http://<ip-address>:7000/index.mp4.
For this, the app launches a lightweight GCDWebServer that listens on port 7000 and serves requests to download index.mp4.
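A rough sketch of this HTTP side, assuming the GCDWebServer API and a hypothetical nextFragment() helper that returns the next fMP4 fragment from the app’s packager, might look like this:

```swift
import GCDWebServer

// Serve a virtual, never-ending index.mp4 as a chunked (streamed) HTTP response.
// nextFragment() is a hypothetical helper returning the next fMP4 fragment
// produced by the packager (backed by the ring buffers described below).
let server = GCDWebServer()
server.addHandler(forMethod: "GET", path: "/index.mp4", request: GCDWebServerRequest.self) { _ in
    return GCDWebServerStreamedResponse(contentType: "video/mp4", asyncStreamBlock: { completion in
        // Called repeatedly by the server; push each fragment as soon as it is ready.
        completion(nextFragment(), nil)
    })
}
server.start(withPort: 7000, bonjourName: nil)
```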

Here you may encounter several issues.

  1. First of all, video cameras output “raw” frames that cannot be sent to the player as-is: you have to compress and format them properly first. Fortunately, starting with iOS 8.0, Apple opened software access to hardware video compression capable of generating H.264 chunks on the fly. For this purpose, the VTCompressionSessionCreate and VTCompressionSessionEncodeFrame families of functions from the VideoToolbox framework are used (see the sketch after this list).
    1. Audio is compressed in a similar way, by the AudioConverterNewSpecific / AudioConverterFillComplexBuffer functions of AudioToolbox. This results in data chunks in the AAC format.
  2. Second, the camera outputs frames at a comparatively high rate. To avoid losing them, we keep the frames in CBCircularData ring buffers and send them to compression as the buffers fill up. The same ring buffers are used to generate the chunked response, so the app never keeps more than a preset number of frames in memory (otherwise an infinite broadcast would require infinite memory to be allocated).
  3. Third, for fMP4 to work properly, you have to correctly set the initial stream data (including SPS/PPS) in the MP4 file’s moov box. To do this, the app scans the H.264 blocks generated by the hardware encoder, finds the next key frame, and extracts the SPS/PPS values. Then, when generating the moov header, it uses them to properly time the stream in the player. Hence, from the player’s perspective the file is always shown “from the very beginning”.
  4. There is another problem: MP4 has its own data layout requirements and can only include properly formatted H.264 blocks. We solved this by using a wonderful library called Bento4, which helps repack the H.264 blocks into properly formatted MP4 atoms on the fly.
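For illustration, a minimal VideoToolbox encoding sketch (not DemoFMP4’s exact code; the resolution is arbitrary and error handling is omitted) could look like this:

```swift
import VideoToolbox
import CoreMedia
import CoreVideo

// Create a hardware H.264 encoder session. Every compressed frame arrives in the
// output callback as a CMSampleBuffer, ready to be repacked into fMP4 fragments.
var session: VTCompressionSession?
VTCompressionSessionCreate(allocator: kCFAllocatorDefault,
                           width: 1280, height: 720,             // arbitrary resolution
                           codecType: kCMVideoCodecType_H264,
                           encoderSpecification: nil,
                           imageBufferAttributes: nil,
                           compressedDataAllocator: nil,
                           outputCallback: { _, _, status, _, sampleBuffer in
                               guard status == noErr, let sampleBuffer = sampleBuffer else { return }
                               // Hand the compressed frame to the fMP4 packager here. The SPS/PPS
                               // needed for the moov header can be read from the sample buffer's
                               // format description (CMVideoFormatDescriptionGetH264ParameterSetAtIndex).
                               _ = sampleBuffer
                           },
                           refcon: nil,
                           compressionSessionOut: &session)

// Called for every raw camera frame (e.g., from AVCaptureVideoDataOutput's delegate).
func encode(_ pixelBuffer: CVPixelBuffer, at pts: CMTime) {
    guard let session = session else { return }
    VTCompressionSessionEncodeFrame(session,
                                    imageBuffer: pixelBuffer,
                                    presentationTimeStamp: pts,
                                    duration: .invalid,
                                    frameProperties: nil,
                                    sourceFrameRefcon: nil,
                                    infoFlagsOut: nil)
}
```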

[Figure: live streaming flowchart]
So we have built an app capable of delivering the stream from the device camera in near real time by sending an “infinite MP4 file” to any standard HTTP client. This approach results in a fairly small latency of 1-2 seconds. Given the broad support for fMP4 and the ease of organizing such a broadcast (no dedicated server is needed), such client-side generation of fMP4 is a simple and reliable solution.

True Real-Time Live Streaming

But what if we need “true real time”, as in Skype, for example? Unfortunately, fMP4 is not the best fit here. Although there is no master index file (as in HLS) and the video chunks are small, there are still chunks inside the MP4 file. Until such a chunk has been downloaded completely, the client cannot display its frames, so a slight latency remains.

RTSP Live Streaming on iOS

Hence, for true real time, meaning that the player gets a frame almost immediately after it has been produced by the camera, another format designed natively for streaming is a better fit. We mean RTSP, which is also well supported by players (for instance, it is easily played back by the well-known VLC player) and is relatively easy to implement. To explain, let’s look at our sample app for RTSP-based streaming, DemoRTSP.

Unlike HLS and fMP4, which exchange data over HTTP, RTSP uses its own format on top of “bare sockets.” RTSP also uses two data channels: a control channel to exchange control data between the client and the server, and a delivery channel that carries only the compressed data from the server. This slightly complicates the exchange model, but it minimizes latency, as the client receives the data as soon as the server sends them over the network. There is simply no intermediary in RTSP!

DemoRTSP is a minimal implementation of such an exchange. At launch, the app starts listening on the service port (554) for connections from client players. On connection, DemoRTSP sends a response containing a simple description of the codecs used to compress video and audio (for iOS this is the standard H.264/AAC pair) and the data port from which the compressed frames are delivered. The client then connects to the data port and plays back everything it receives from the server. In this model, the server never waits for anything, and every compressed frame is sent to the player immediately, ensuring minimal latency.
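This is not the DemoRTSP code itself, but a rough sketch of the control-channel side of such a server, using the Network framework for brevity; the SDP body, CSeq handling, and the SETUP/PLAY steps that negotiate the RTP data channel are heavily simplified:

```swift
import Network

// Listen for RTSP clients and answer a DESCRIBE request with an SDP body
// announcing the H.264/AAC streams (SETUP/PLAY and the RTP data channel
// would follow in a real server). Error handling is omitted.
let listener = try! NWListener(using: .tcp, on: 554)
listener.newConnectionHandler = { connection in
    connection.start(queue: .main)
    connection.receive(minimumIncompleteLength: 1, maximumLength: 4096) { data, _, _, _ in
        guard let data = data, let request = String(data: data, encoding: .utf8) else { return }
        if request.hasPrefix("DESCRIBE") {
            // Minimal SDP: one H.264 video track and one AAC audio track.
            let sdp = "v=0\r\nm=video 0 RTP/AVP 96\r\na=rtpmap:96 H264/90000\r\n"
                + "m=audio 0 RTP/AVP 97\r\na=rtpmap:97 mpeg4-generic/44100/2\r\n"
            let reply = "RTSP/1.0 200 OK\r\nCSeq: 2\r\nContent-Type: application/sdp\r\n"  // a real server echoes the client's CSeq
                + "Content-Length: \(sdp.utf8.count)\r\n\r\n" + sdp
            connection.send(content: reply.data(using: .utf8), completion: .idempotent)
        }
    }
}
listener.start(queue: .main)
```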

It must be said that on modern iOS devices compression and networking are not that resource-consuming. Therefore, apart from simple streaming, additional processing of the stream before transmission is possible. For example, it is easy to overlay text, apply video effects, or otherwise automatically modify the video as the app needs. For this purpose, either post-processing of the camera buffer or a more efficient OpenGL-based approach can be used. Let us discuss this in more detail.

Live Video Effects on iOS

In our DemoRTSP app we used a simple approach based on PBJVision, but in your apps you can use more advanced solutions based on GPUImage, which let you apply a chain of OpenGL effects to the video stream and bring the added latency almost to zero.

In DemoRTSP you can find an example of a blur overlay on the stream.
Who said Prisma-like video generation on the fly is impossible?!
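As an illustration of the camera-buffer post-processing route (not the DemoRTSP code itself, which relies on PBJVision), here is a rough Core Image sketch that blurs each camera frame before it is handed to the H.264 encoder:

```swift
import CoreImage
import CoreVideo

// Reuse a single CIContext; creating one per frame would be expensive.
let ciContext = CIContext()

// Blur a camera frame in place before it goes to the H.264 encoder.
// For simplicity we render back into the source buffer; a production pipeline
// would render into a separate buffer taken from a CVPixelBufferPool.
func blurred(_ pixelBuffer: CVPixelBuffer) -> CVPixelBuffer {
    let input = CIImage(cvPixelBuffer: pixelBuffer)
    let blur = CIFilter(name: "CIGaussianBlur")!
    blur.setValue(input, forKey: kCIInputImageKey)
    blur.setValue(6.0, forKey: kCIInputRadiusKey)
    ciContext.render(blur.outputImage!.cropped(to: input.extent), to: pixelBuffer)
    return pixelBuffer
}
```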

You can go even further and use the fastest image-processing path currently available in iOS, Metal, which has come to replace OpenGL in recent iOS versions. For instance, the MetalVideoCapture sample shows how to use the CVMetalTextureCache API to pass the camera capture into a Metal render pass.
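The core of that technique looks roughly like the sketch below: a texture cache wraps each CVPixelBuffer as an MTLTexture with no extra copy (the pixel format is an assumption, error handling is omitted):

```swift
import Metal
import CoreVideo

// Wrap camera pixel buffers as Metal textures without copying, via a texture cache.
let device = MTLCreateSystemDefaultDevice()!
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

func makeTexture(from pixelBuffer: CVPixelBuffer) -> MTLTexture? {
    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache!, pixelBuffer, nil,
                                              .bgra8Unorm,                    // assumes a BGRA camera output
                                              CVPixelBufferGetWidth(pixelBuffer),
                                              CVPixelBufferGetHeight(pixelBuffer),
                                              0, &cvTexture)
    // The resulting MTLTexture can be bound directly in a Metal render pass.
    return cvTexture.flatMap { CVMetalTextureGetTexture($0) }
}
```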

Besides Metal, the latest iOS versions brought a few other interesting features. One of them is ReplayKit, which lets you stream the device’s screen with just a couple of lines of code. This framework is quite promising, but at the moment it does not let you “meddle” with frame creation, cannot capture from the camera (only from the screen!) and is limited to the built-in capture features. For now it is not intended for full-scale, controlled streaming and looks more like something “reserved for future use.” The same applies to another popular new development, support for the Swift programming language in iOS. Unfortunately, it does not integrate well with the low-level features needed for video and data handling, so it is unlikely to be applicable to such tasks anytime soon, except for the UI layer.
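For reference, the “couple of lines” of ReplayKit mentioned above amount to roughly this (a sketch of plain screen recording; the app never gets access to the individual frames):

```swift
import ReplayKit

// Start system screen recording; ReplayKit captures and handles the frames itself.
RPScreenRecorder.shared().startRecording { error in
    if let error = error {
        print("ReplayKit failed to start: \(error)")
    }
}
```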

Links

  1. iOS Live Streaming
  2. MP4/fMP4
  3. HLS vs MPEG-DASH
  4. Metal app
  5. ReplayKit
  6. DemoFMP4 and DemoRTSP
  7. VideoToolbox
  8. AudioToolbox
  9. Bento4
  10. GCDWebServer
  11. PBJVision
