Live Streaming on iOS

In our work, we often need to implement real-time or near-real-time video streaming from iOS devices. Common examples are using iOS devices as surveillance cameras or building streaming applications like Periscope. The problem statement often imposes additional requirements, such as playback of the stream on another device, in a browser or in VLC player; low latency (near-real-time streaming); low resource consumption (long battery life); no need for a dedicated media server; etc.

The problem itself is not new, but no single straightforward solution has emerged for it yet. More precisely, there is a whole range of possible solutions.

Each approach has its benefits and downsides. For example, in our earlier but still relevant post How to Live Stream Video as You Shoot It in iOS, we wrote about on-device HLS stream generation. It lets you do without a media server and upload content directly to a CDN (e.g., Amazon S3 + CloudFront). But because the HLS-based approach has inherent flaws (see below), this time we would like to discuss two better options: on-device generation of an fMP4 stream, and RTP streaming based on a local RTSP server.

HTTP Live Streaming

HLS emerged in 2009 and quickly became relatively popular. This was helped by full support for HLS across the Apple ecosystem and by its clear structure: an index file (playlist) contains links to the video stream segments (chunks), and new links are continuously appended during a live broadcast. HLS also lets you define multiple alternate streams for clients with different bandwidth (low/high bitrate, etc.). However, this two-tier design, with its constantly updated playlist, became a limitation. HLS has shown poor performance in real-time broadcasts because of the following issues:

  1. The stream is cut into chunks (several seconds each, as recommended), so the inherent latency may be too high for real-time use.
  2. Moreover, a standard guideline for video players is to buffer at least 3 chunks, so the latency from item 1 is effectively tripled at the player level.
  3. In addition, to learn about new chunks the player has to re-fetch the index file continually (otherwise it would not know what to show next), which is another source of latency.
  4. Each chunk is an MPEG-TS file that adds substantial container overhead on top of the actual media content.
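To make the two-tier structure concrete, here is a minimal Python sketch that renders the index file (in HLS terms, a live media playlist) as a sliding window over the most recent segments. The segment file names and durations are hypothetical. The tags are the standard ones from the HLS specification (RFC 8216). Note the absence of `#EXT-X-ENDLIST`: that is what forces the player to keep re-fetching the playlist, as described in item 3 above.

```python
import math


def live_playlist(first_seq: int, durations: list[float],
                  uri_pattern: str = "segment{}.ts") -> str:
    """Render a sliding-window live HLS media playlist.

    first_seq   -- media sequence number of the oldest segment still listed
    durations   -- durations (seconds) of the segments currently in the window
    uri_pattern -- hypothetical naming scheme for the segment files
    """
    # EXT-X-TARGETDURATION must be an integer no smaller than any rounded
    # segment duration; rounding the maximum up satisfies that rule.
    target = math.ceil(max(durations))
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{first_seq}",
    ]
    for offset, duration in enumerate(durations):
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri_pattern.format(first_seq + offset))
    # No #EXT-X-ENDLIST tag: the stream is live, so the player has to
    # re-download this file periodically to learn about new segments.
    return "\n".join(lines) + "\n"


print(live_playlist(42, [4.0, 4.0, 3.5]))
```

A player that loads this file requests `segment42.ts` through `segment44.ts`. It then has to reload the playlist to discover `segment45.ts`, by which time the server has usually dropped `segment42.ts` from the window.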

An obvious way to reduce latency in HLS is to shrink the chunks to the minimum (about 1 second). However, with such small files the stream becomes vulnerable to network instability. Under average real-world conditions, smooth playback is almost impossible: the player spends more resources re-requesting the playlist and the next chunk than displaying the video itself.
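The trade-off can be put into rough numbers. The Python sketch below uses a deliberately simplified model, which is our assumption and not a measurement. The player buffers three segments before starting. It reloads the playlist about once per segment duration, so it notices each new segment on average half a segment late. Each segment costs one playlist request plus one segment request. Encoding, upload, network and CDN delays are ignored.

```python
def hls_estimate(segment_s: float, buffered_segments: int = 3) -> tuple[float, float]:
    """Rough (latency in seconds, HTTP requests per second) for a live HLS viewer.

    Simplified model (an assumption, not a measurement):
    * the player buffers `buffered_segments` whole segments before playback;
    * the playlist is reloaded about once per segment duration, so a new
      segment is noticed on average half a segment late;
    * each segment costs one playlist request plus one segment request.
    """
    latency = buffered_segments * segment_s + segment_s / 2
    requests_per_s = 2 / segment_s
    return latency, requests_per_s


for segment in (6.0, 2.0, 1.0):
    latency, rps = hls_estimate(segment)
    print(f"{segment:.0f} s segments: ~{latency:.1f} s latency, {rps:.2f} requests/s")
```

Under this model, 6-second segments give about 21 s of latency at one request every 3 s. Going down to 1-second segments cuts latency to about 3.5 s but costs two HTTP requests every second, and each of them is a chance to stall on a flaky network.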

To address these issues, similar approaches were introduced: MPEG-DASH by MPEG, Smooth Streaming by Microsoft, and HDS by Adobe. However, none of them became the de facto standard (although MPEG-DASH is a full-fledged ISO standard), as they shared the same underlying shortcomings. Moreover, the Adobe and Microsoft solutions required special server-side support. Here you can find the table comparing these formats.

Fragmented MP4

As time went on, another approach became popular for simple broadcasts: fMP4 (fragmented MP4). It was a minor but essential extension of MP4, a well-known format that was already almost ubiquitous, so broad support for fMP4 followed shortly after.

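The difference between a regular MP4 and fMP4 lies in how the elements (boxes) describing the video and audio streams are arranged. In a conventional MP4 file, the moov box holds the stream descriptions together with the sample index for the whole file, and it is typically written at the end, once recording has finished. In fMP4, a moov box with just the stream descriptions comes first. The media follows as a series of fragments, each carrying its own small index (moof) and its data (mdat). Since MP4 could already contain multiple streams divided into separate data chunks, this simple change makes the file effectively “infinite” for the player.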
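The box layout can be inspected with a few lines of code. The Python sketch below builds placeholder boxes and walks the top level of a file. Each box has an 8-byte header: a 32-bit big-endian size plus a four-character type, as in the ISO base media file format. The boxes carry no real media, so the result is not playable. The example only shows the fMP4 ordering: `ftyp`, `moov`, then repeating `moof`/`mdat` pairs.

```python
import struct


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build an ISO BMFF box: 32-bit big-endian size, four-character type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


def top_level_boxes(data: bytes) -> list[tuple[str, int]]:
    """List (type, size) of every top-level box in an MP4 or fMP4 byte string."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, raw_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit 'largesize' follows the type field
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # the box runs to the end of the data
            size = len(data) - offset
        if size < header:
            raise ValueError(f"malformed box at offset {offset}")
        boxes.append((raw_type.decode("latin-1"), size))
        offset += size
    return boxes


# Placeholder fMP4 layout built from empty boxes; not a playable file.
fmp4 = (make_box("ftyp", b"iso5" + struct.pack(">I", 512))
        + make_box("moov", make_box("mvex"))  # mvex signals that movie fragments follow
        + make_box("moof") + make_box("mdat", bytes(16))
        + make_box("moof") + make_box("mdat", bytes(16)))
print(top_level_boxes(fmp4))
```

Running the same walker over an ordinary camera recording would typically list `ftyp`, `mdat` and only then `moov`. That is why such a file cannot be played until it has been fully written.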

This fMP4 property is exactly what live streaming relies on. The player first reads the description of the video and audio streams, then waits for data and plays it back as it arrives. If you generate the footage fragments on the fly, the player can play back a real-time stream with no additional effort.

And it does work! Of course, certain issues may arise when implementing an fMP4 generator. To discuss them, let’s take our DemoFMP4 app as an example.

fMP4 Live Streaming on iOS

This demo app takes frames from the camera and sends them in fMP4 format to connected viewers. To connect to the device, viewers request a virtual “MP4” file generated on the fly at the following address: http://<ip-address>:7000/index.mp4.
For this, the app launches a lightweight GCDWebServer instance that listens on port 7000 and serves requests for index.mp4.
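GCDWebServer is an Objective-C library, so the following is only an analogue of the same idea. It is a Python sketch using the standard library's `http.server` that serves an endless fMP4 body with HTTP/1.1 chunked transfer encoding, in the spirit of the chunked response DemoFMP4 generates. Port 7000 and the `/index.mp4` path come from the text. `init_segment` and `fragments` are hypothetical stand-ins for the encoder and muxer output.

```python
from http.server import BaseHTTPRequestHandler
from itertools import chain
from typing import Callable, Iterable, Iterator


def chunked(parts: Iterable[bytes]) -> Iterator[bytes]:
    """Frame byte strings with HTTP/1.1 chunked transfer encoding."""
    for part in parts:
        if part:  # an empty chunk would signal the end of the body, so skip it
            yield b"%X\r\n" % len(part) + part + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk; an endless live stream never gets here


def make_handler(init_segment: bytes,
                 fragments: Callable[[], Iterable[bytes]]) -> type:
    """Build a request handler that serves an endless fMP4 body at /index.mp4.

    init_segment -- hypothetical ftyp + moov bytes describing the streams
    fragments    -- hypothetical callable yielding moof + mdat pairs as they are encoded
    """
    class LiveMP4Handler(BaseHTTPRequestHandler):
        protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

        def do_GET(self) -> None:
            if self.path != "/index.mp4":
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "video/mp4")
            self.send_header("Transfer-Encoding", "chunked")
            self.end_headers()
            try:
                # Stream description first, then each fragment as soon as it exists.
                for piece in chunked(chain([init_segment], fragments())):
                    self.wfile.write(piece)
                    self.wfile.flush()
            except (BrokenPipeError, ConnectionResetError):
                pass  # the viewer disconnected; stop sending

    return LiveMP4Handler
```

To try it, pass the handler to a standard library server, e.g. `ThreadingHTTPServer(("", 7000), make_handler(init, produce_fragments)).serve_forever()`, where `init` and `produce_fragments` are placeholders for your own encoder output. Because the body has no declared length, the player simply keeps reading for as long as fragments keep coming.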

Here you may encounter several issues.

  1. First of all, video cameras output “raw” frames that cannot be sent to the player as is: you must compress and format them first. Fortunately, starting with iOS 8.0, Apple opened programmatic access to hardware video compression capable of generating H.264 data on the fly. For this purpose, the VTCompressionSessionCreate and VTCompressionSessionEncodeFrame families of functions from the VideoToolbox framework are used.
    1. Audio is compressed in a similar way, using the AudioConverterNewSpecific / AudioConverterFillComplexBuffer functions of AudioToolbox. This produces data chunks in AAC format.
  2. Second, the camera outputs frames at a comparatively high rate. To avoid losing them, we keep the frames in CBCircularData ring buffers and send them to compression as the buffers fill up. Ring buffers are also used to generate a chunked response, so the app never keeps more than a preset number of frames in memory (otherwise an infinite broadcast would require infinite memory).
  3. Third, for fMP4 to work properly, you must correctly set the initial stream data (including SPS/PPS) in the MP4 file’s moov block. To do this, the app scans the H.264 blocks generated by the hardware encoder, finds the next key frame and extracts the SPS/PPS values. Then, when generating the moov header, it uses them to set up the stream correctly in the player. Hence, from the player’s perspective, the file is always shown “from the very beginning”.
  4. There is another problem: MP4 has its own data requirements and can include only properly formatted H.264 blocks. We solved this by using Bento4, a wonderful library that repacks the H.264 blocks into properly formatted MP4 atoms on the fly.
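Items 3 and 4 above boil down to byte-level work on the H.264 stream. The Python sketch below assumes the encoded video is available as an Annex B byte stream, with NAL units separated by `00 00 01` or `00 00 00 01` start codes. That is an assumption for illustration: depending on how the encoder output is read, the units may already be length-prefixed. The sketch shows how SPS and PPS can be located by NAL unit type, and how the remaining units can be repacked into the length-prefixed form that MP4 samples use. In DemoFMP4 the actual MP4 packaging is done by Bento4; none of this is Bento4's API.

```python
import struct
from typing import Optional

NAL_IDR, NAL_SPS, NAL_PPS = 5, 7, 8  # nal_unit_type values defined by H.264


def split_annexb(stream: bytes) -> list[bytes]:
    """Split an Annex B byte stream into NAL units, with start codes removed."""
    starts = []
    pos = stream.find(b"\x00\x00\x01")
    while pos != -1:
        starts.append(pos + 3)  # the NAL unit begins right after the start code
        pos = stream.find(b"\x00\x00\x01", pos + 3)
    units = []
    for i, begin in enumerate(starts):
        end = starts[i + 1] - 3 if i + 1 < len(starts) else len(stream)
        # A 4-byte start code leaves an extra zero at the end of the previous
        # slice; H.264 forbids a NAL unit ending in 0x00, so strip trailing zeros.
        unit = stream[begin:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
    return units


def nal_type(nal: bytes) -> int:
    """nal_unit_type is stored in the low 5 bits of the first NAL byte."""
    return nal[0] & 0x1F


def parameter_sets(units: list[bytes]) -> tuple[Optional[bytes], Optional[bytes]]:
    """Return the first SPS and PPS found, or None where one is missing."""
    sps = next((u for u in units if nal_type(u) == NAL_SPS), None)
    pps = next((u for u in units if nal_type(u) == NAL_PPS), None)
    return sps, pps


def to_length_prefixed(units: list[bytes]) -> bytes:
    """Repack NAL units as MP4 samples commonly store them: 4-byte big-endian length + data.

    SPS/PPS are left out because an MP4 file normally carries them in the
    avcC record inside moov rather than in every sample.
    """
    return b"".join(struct.pack(">I", len(u)) + u
                    for u in units
                    if nal_type(u) not in (NAL_SPS, NAL_PPS))


# Tiny hand-made stream: SPS, PPS, then the start of an IDR slice (dummy bytes).
stream = (b"\x00\x00\x00\x01\x67\x42\x00\x1e"
          b"\x00\x00\x00\x01\x68\xce\x3c\x80"
          b"\x00\x00\x01\x65\x88\x84")
units = split_annexb(stream)
sps, pps = parameter_sets(units)
has_key_frame = any(nal_type(u) == NAL_IDR for u in units)
sample = to_length_prefixed(units)
```

Waiting for an IDR unit before emitting the header mirrors the "find the next key frame" step. A player joining the stream can only start decoding from a key frame, and it needs the SPS/PPS for that frame.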

So we have built an app that streams video from the device camera in near real time by sending an “infinite MP4 file” to any standard HTTP client. This approach results in a fairly small latency of 1-2 seconds. Given the broad support for fMP4 and how easy such a broadcast is to set up (no dedicated server is needed), on-device fMP4 generation is a simple and reliable solution.

True Real-Time Live Streaming

But what if we need “true real time”, as in Skype, for example? Unfortunately, fMP4 is not the best fit for this. Although there is no index file to re-fetch (as in HLS) and the video fragments are small, the MP4 file still consists of fragments. Until a fragment has been downloaded completely, the client cannot show its frames, so a slight latency appears.

RTSP Live Streaming on iOS

Hence, for true real time, meaning that the player gets a frame almost immediately after the camera has produced it, a format designed natively for streaming is better suited. We mean RTSP, which is also well supported by players (for instance, the well-known VLC player plays it back easily) and is relatively easy to implement. To explain, let’s look at DemoRTSP, our sample app for RTSP-based streaming.

Unlike HLS and fMP4, which exchange data over HTTP, RTSP uses its own protocol on top of plain sockets. RTSP also uses two channels: a control channel for exchanging control messages between the client and the server, and a delivery channel (RTP) that carries only the compressed data from the server. This slightly complicates the exchange model, but it minimizes latency, because the client receives data as soon as the server sends it over the network. There is simply no intermediary in RTSP!
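On the delivery channel, each compressed frame travels as one or more RTP packets. The Python sketch below follows RFC 3550 (RTP) and RFC 6184 (the H.264 RTP payload format). It builds the fixed 12-byte RTP header and splits large NAL units into FU-A fragments so that each packet fits a typical network MTU. The payload type 96, the 1400-byte payload limit and the SSRC values are illustrative assumptions, not values taken from DemoRTSP. H.264 RTP timestamps use a 90 kHz clock, so a frame captured at t seconds gets the timestamp round(t * 90000).

```python
import struct

RTP_VERSION = 2
FU_A = 28  # NAL unit type used for fragmentation units (RFC 6184)


def rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
               marker: bool = False) -> bytes:
    """Fixed 12-byte RTP header with no padding, extension, or CSRC list."""
    first = RTP_VERSION << 6
    second = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack(">BBHII", first, second, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def fragment_h264(nal: bytes, max_payload: int = 1400) -> list[bytes]:
    """Split one H.264 NAL unit into RTP payloads (single NAL unit or FU-A mode)."""
    if len(nal) <= max_payload:
        return [nal]  # fits into one packet unchanged
    indicator = (nal[0] & 0xE0) | FU_A  # keep the F and NRI bits of the original header
    original_type = nal[0] & 0x1F
    body = nal[1:]  # the receiver rebuilds the original header byte
    step = max_payload - 2  # room left after the FU indicator and FU header
    payloads = []
    for offset in range(0, len(body), step):
        start = 0x80 if offset == 0 else 0
        end = 0x40 if offset + step >= len(body) else 0
        fu_header = start | end | original_type
        payloads.append(bytes([indicator, fu_header]) + body[offset:offset + step])
    return payloads


def packetize(nal: bytes, seq: int, timestamp: int, ssrc: int,
              payload_type: int = 96, last_in_frame: bool = True,
              max_payload: int = 1400) -> list[bytes]:
    """Wrap the fragments of one NAL unit into complete RTP packets."""
    pieces = fragment_h264(nal, max_payload)
    packets = []
    for i, piece in enumerate(pieces):
        # The marker bit flags the last packet of a video frame (access unit).
        marker = last_in_frame and i == len(pieces) - 1
        packets.append(rtp_header(payload_type, seq + i, timestamp, ssrc, marker) + piece)
    return packets
```

Each packet can be written to the client's data socket as soon as it is built. Nothing is held back to complete a file or a fragment, which is where the latency advantage over fMP4 comes from.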

DemoRTSP implements a minimal version of this exchange. At launch, the app starts listening on the service port (554) for connections from client players. When a client connects, DemoRTSP responds with a short description indicating the codecs used to compress video and audio (on iOS, the standard H.264/AAC pair) and the data port for delivering compressed frames. The client then connects to the data port and plays back everything it receives from the server. In this model, the server never waits for anything, and all compressed frames are sent to the player immediately, which keeps latency to a minimum.
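The article does not show the exact description DemoRTSP sends. For reference, in standard RTSP (RFC 2326) the session description is an SDP document returned in reply to the client's DESCRIBE request. The media port in it is usually 0, and the actual transport ports are agreed in the SETUP exchange. The Python sketch below builds such a description for an H.264 + AAC pair. The track IDs, payload types 96/97, session name and 44.1 kHz mono audio are assumptions, and the SPS/PPS bytes would come from the encoder.

```python
import base64


def aac_config_hex(sample_rate_index: int, channels: int, object_type: int = 2) -> str:
    """AudioSpecificConfig as hex: 5-bit object type, 4-bit rate index, 4-bit channels, 3 zero bits."""
    value = (object_type << 11) | (sample_rate_index << 7) | (channels << 3)
    return f"{value:04x}"


def build_sdp(host: str, sps: bytes, pps: bytes, sample_rate: int = 44100,
              sample_rate_index: int = 4, channels: int = 1) -> str:
    """Session description for one H.264 video track and one AAC-LC audio track."""
    sprop = ",".join(base64.b64encode(p).decode("ascii") for p in (sps, pps))
    profile_level_id = sps[1:4].hex()  # profile_idc, constraint flags, level_idc
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {host}",
        "s=Live",
        "c=IN IP4 0.0.0.0",
        "t=0 0",
        # Video: dynamic payload type 96, H.264 over RTP with a 90 kHz clock.
        "m=video 0 RTP/AVP 96",
        "a=rtpmap:96 H264/90000",
        f"a=fmtp:96 packetization-mode=1;profile-level-id={profile_level_id};"
        f"sprop-parameter-sets={sprop}",
        "a=control:trackID=0",
        # Audio: AAC carried as MPEG4-GENERIC (RFC 3640), clocked at the sample rate.
        "m=audio 0 RTP/AVP 97",
        f"a=rtpmap:97 MPEG4-GENERIC/{sample_rate}/{channels}",
        "a=fmtp:97 streamtype=5;profile-level-id=1;mode=AAC-hbr;sizelength=13;"
        f"indexlength=3;indexdeltalength=3;config={aac_config_hex(sample_rate_index, channels)}",
        "a=control:trackID=1",
    ]
    return "\r\n".join(lines) + "\r\n"  # SDP lines end with CRLF
```

Sample-rate index 4 corresponds to 44.1 kHz in the AAC sampling-frequency table, which gives the familiar `config=1208` for mono AAC-LC.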

It should be noted that on modern iOS devices, compression and networking are not very resource-intensive. Therefore, in addition to plain streaming, the stream can be processed further before transmission. For example, it is easy to overlay text, apply video effects, or otherwise modify the video as the app needs. For this, you can either post-process the camera buffer or use a more efficient OpenGL-based approach. Let us discuss this in more detail.

Live Video Effects on iOS

In our DemoRTSP app we used a simple approach based on PBJVision, but in your own apps you can use more advanced solutions based on GPUImage, which lets you apply a chain of OpenGL effects to the video stream while keeping the added latency close to zero.

In DemoRTSP you can find an example of a blur overlay on the stream.
Who said Prisma-like video generation on the fly is impossible?!

You can go even further and use the fastest way of image processing currently available in iOS, Metal, which came as a replacement for OpenGL in recent iOS versions. For instance, the MetalVideoCapture sample shows how to use the CVMetalTextureCache API to pass camera frames to a Metal render pass.

Besides Metal, recent iOS versions came with a few other interesting features. One of them is ReplayKit, which lets you stream the device’s screen with just a couple of lines of code. This framework is fascinating, but at the moment it does not let you “meddle” with frame creation, cannot capture from the camera (only from the screen!) and is limited to its built-in capturing features. Currently, it is not intended for full-scale controlled streaming and looks more like something “reserved for future use.” The same applies to another popular new development, support for the Swift programming language in iOS. Unfortunately, Swift does not integrate well with the low-level APIs needed for video and data handling, so it is unlikely to be applicable to such tasks anytime soon, except for the user interface.


How to Optimize Waveform Rendering in iOS

In this article, we share our experience of finding the optimal way to visualize a waveform in an iOS app, choosing between CAShapeLayer and drawRect(), and certain nuances of using Swift. The post is aimed at those who build complex custom UI components for sound processing, but also at any iOS developer wishing to broaden their horizons.

Faster Than Real Time Stabilization of Smartphone Videos

Casual videos shot with a smartphone often suffer from shakiness. Software-based video stabilization methods have been developing actively in recent years and can make such videos look much more professional and fluid. We have just released Deshake, our video stabilizer app for iOS, which can process video faster than real time. In this article, we overview the available video stabilization techniques and discuss some of the engineering challenges we faced during development.

How to remove shakiness from a smartphone video

Technology offers a variety of ways to capture special moments, all in the palm of our hands, using our mobile devices. At the same time, our devices offer features that would have been unthinkable a mere ten years ago. One such feature is anti-shake for videos. While most smartphones offer some form of video stabilisation, manufacturers have not yet perfected it.

One app story

For quite a long time I was sure that most of my university skills could hardly be used in my business. Just try to think of any use for the military/KGB-ideology kind of stuff taught for 5 years, mixed with development courses on rare devices. No, I didn’t think so either.

I like to think, though, that the first part has helped me deal with complex situations. Besides, the ideological stuff put me in a lot of very funny situations, and they say laughter extends your lifespan, so… The surprise was that my knowledge of developing very specific devices turned out to be very helpful. Even better, that knowledge led to the creation of a new app.

How to make a great app preview video

If you are reading this, you most likely know that Apple released iOS 8, which allows app developers to publish short 15-30 second preview videos of their apps. As you know, the App Store is very crowded, so making a great app preview video is something you’ll want to do as soon as possible!

In this article I will guide you through the process of making an app preview video as well as share some lessons I learnt while making a video for the Together app using Adobe Premiere Pro CC. The process took almost a whole day and I’m pretty sure that this post will save you a lot of time and make your video better!

Mobile Apps UX optimisation with Google Analytics

Probably one of the most popular terms in the mobile business is MVP. Numerous evangelists, advocates, consultants and mentors work with startups and help them apply the Lean Startup ethos. They push them towards building an MVP and moving forward with incremental updates. The main goal of this activity is to find product-market fit and understand the financial characteristics. Simply put, these are preliminary steps toward building a real business plan and scaling the business. If you scale an unprofitable business model, you scale losses. There is a lot of material on this topic, but not much information on how to actually do it or how to get actionable data. That is not a problem, though: I’ll show you how to do it using Google Analytics, which is both simple and free. To make it more fun and interesting, I’ll use the PhotoSuerte app as a case study.

Video Delivery Analytics with Google Analytics

Many startups and new initiatives feature high-quality videos related to their business, and they often want to optimize the user experience around that content. This great and growing demand has brought many cloud infrastructure providers onto the scene. In this article, I tell our story of choosing the best infrastructure providers for our projects around the world. Our secret is measuring everything and making data-driven decisions, as there seem to be no silver bullets around.

Google Analytics for Pirates

Pirate Metrics is a simple growth hacking (business development) framework by Dave McClure. In this article, I provide a detailed guide for startup founders on how we use Google Analytics to track Pirate Metrics. The framework is based on the assumption that every startup needs to move customers through five key stages: Acquisition, Activation, Retention, Referral and Revenue (AARRR).

For us, Google Analytics is the best tool for gathering actionable data across all our apps and websites. But its flexibility and power come at the cost of complexity. It is often not obvious how to get the insight you need, even when you fully understand how to aggregate the necessary information. Here I will provide some very useful recipes that you can start using while reading the article. Most of the data is based on our flagship Together app, a mobile video editor with organizing capabilities. In the most common use case, you import a recently shot set of videos, splice them into one short video story, add a music soundtrack to set the right mood, and impress your friends and family with a nice video story. Download it and have some fun.

Data Syncing in Core Data Based iOS Apps


In this post, we would like to discuss issues you may face while implementing data syncing in iOS applications, and how you can tackle them with various approaches and tools. We approach the topic from different angles, aiming for a general overview. Still, our main focus is on applications whose data model is sophisticated enough to need a structured database and the Core Data framework.
