Live Streaming on iOS

In our work, we often need to implement video streaming from iOS devices in real time or near real time. Common examples are using iOS devices as surveillance cameras or building streaming applications like Periscope. The problem statement often imposes additional requirements, such as playback of the stream on another device, in a browser or in VLC player; low latency (near-real-time streaming); low resource consumption (long battery life); no need for a dedicated media server, etc.

Such a problem is not new in itself, but no straightforward solution has been found for it yet. To put it more precisely, there is a whole range of possible solutions.

Each approach has its benefits and downsides. For example, in our earlier but still relevant post “How to Live Stream Video as You Shoot It in iOS,” we wrote about on-device HLS stream generation. It lets you do without a media server and upload content directly to a CDN (e.g., Amazon S3 + CloudFront). But since such an HLS-based approach has inherent flaws (see below), this time we would like to discuss some better options: on-device generation of an fMP4 stream, and RTP streaming based on a local RTSP server.

HTTP Live Streaming

HLS emerged in 2009 and quickly became relatively popular. This was helped by full support for HLS across the Apple ecosystem and by its clear structure: an index file (playlist) lists links to the segments (chunks) of the video stream, and new segments are appended to it continuously during a live broadcast. HLS also lets you define multiple alternate streams for clients with different bandwidth (low/high, etc.). However, this two-tier system, with its need to keep updating the playlist, has become a limitation for HLS. HLS performs poorly in real-time broadcasts due to the following issues:

  1. The stream is cut into small chunks (several seconds each, as recommended); hence, the latency inherent in a single chunk may already be too high for real time.
  2. Moreover, a standard guideline for video players is to buffer at least 3 chunks before starting playback. Hence, the latency from item 1 is tripled at the player level.
  3. In addition, to learn about new chunks the player has to keep re-fetching the index file (otherwise it would simply not know what to show next); this is another source of latency.
  4. Each chunk is an MPEG-TS file, which adds substantial overhead on top of the media content itself.

An obvious way to reduce latency in HLS is to cut the chunk duration to the minimum (1 second). However, with such small files the stream becomes vulnerable to network instability. Hence, under typical real-world network conditions it is almost impossible to achieve smooth playback: the player spends more resources re-requesting the playlist and the next chunk than displaying the video itself.
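To make the arithmetic behind items 1-3 concrete, here is a rough back-of-the-envelope estimate in C. It is a simplified model under stated assumptions, not a measurement: playback starts a fixed number of segments behind the live edge, a new segment may go unnoticed for up to one playlist refresh interval, and encoding and network transfer times are ignored. The function name and parameters are hypothetical, for illustration only.

```c
/* Rough lower bound on HLS latency, in seconds (illustrative model only).
 * Assumptions: playback starts buffered_segments segments behind the live
 * edge, and a new segment may go unnoticed for up to one playlist refresh
 * interval. Encoding and network transfer time are ignored. */
double hls_latency_estimate(double segment_seconds,
                            int buffered_segments,
                            double playlist_refresh_seconds)
{
    /* Items 1 and 2: each buffered segment adds one segment duration. */
    double buffering = segment_seconds * (double)buffered_segments;

    /* Item 3: worst-case delay before a re-fetched playlist reveals the
     * newest segment. */
    double discovery = playlist_refresh_seconds;

    return buffering + discovery;
}
```

For example, with 6-second segments, 3 buffered segments and a refresh every 6 seconds, hls_latency_estimate(6.0, 3, 6.0) gives 24 seconds; 1-second segments bring it down to hls_latency_estimate(1.0, 3, 1.0) = 4 seconds, at the cost of the network sensitivity described above.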

Similar approaches were offered by MPEG (MPEG-DASH), Microsoft (Smooth Streaming) and Adobe (HDS). However, none of them became the de facto standard (although, of the three, MPEG-DASH is a full-fledged ISO standard), as they shared the same underlying shortcomings and, moreover, Adobe’s and Microsoft’s solutions required special server-side support. Here you can find a table comparing these formats.

Fragmented MP4

Over time, another approach became popular for simple broadcasts: fMP4 (fragmented MP4). It was a minor (but essential) extension of MP4, a well-known format that was almost ubiquitous even then, so broad support for fMP4 naturally followed shortly after.

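The difference between a regular MP4 and fMP4 lies in how the elements describing the video and audio streams are arranged. A conventional MP4 file has a single moov box that indexes every sample; a recorder can only complete it once recording ends, so it is often written at the end of the file. In fMP4, a short moov box describing the streams goes at the beginning of the file, and the media then follows as a sequence of fragments, each a moof box (describing its samples) followed by an mdat box (holding them). Since MP4 could already contain multiple streams divided into separate data chunks, this simple change makes the file “infinite” for the player.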

It is exactly this property of fMP4 that is used for live streaming. The player first reads the description of the video and audio streams, then waits for the data and plays it back as it arrives. If you generate the footage chunks on the fly, the player automatically plays back a real-time stream with no additional effort.
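Every MP4 element is a “box” (atom) that starts with a 4-byte big-endian size followed by a 4-character type, which is what lets a player consume the file incrementally. The C sketch below walks the top-level boxes of a buffer; for a fragmented file you would expect ftyp and moov first and then repeating moof/mdat pairs. It is a minimal illustration, not Bento4 or DemoFMP4 code: the function name is hypothetical, and only 32-bit box sizes are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian integer (MP4 stores box sizes this way). */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* List the top-level boxes in buf, copying each 4-character type
 * (e.g. "ftyp", "moov", "moof", "mdat") into types[i] as a C string.
 * Returns the number of complete boxes found, or -1 if a header is
 * malformed. An incomplete trailing box (for example, a fragment that is
 * still arriving over the network) simply ends the walk.
 * Boxes with 64-bit sizes (size == 1) or "to end of file" sizes
 * (size == 0) are treated as malformed to keep the sketch short. */
int mp4_list_boxes(const uint8_t *buf, size_t len,
                   char types[][5], int max_boxes)
{
    size_t off = 0;
    int count = 0;

    while (count < max_boxes && len - off >= 8) {
        uint32_t size = read_be32(buf + off);
        if (size < 8)
            return -1;              /* size 0, size 1 or corrupt header */
        if (size > len - off)
            break;                  /* box not fully received yet */
        memcpy(types[count], buf + off + 4, 4);
        types[count][4] = '\0';
        count++;
        off += size;
    }
    return count;
}
```

Because the walk stops cleanly at a partially received box, a reader can call it again once more bytes arrive, which mirrors how a player treats the “infinite” file.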

And it does work! Of course, certain issues may arise when implementing an fMP4 generator. To discuss them, let’s take our DemoFMP4 app as an example.

fMP4 Live Streaming on iOS

DemoFMP4 takes frames from the camera and sends them in fMP4 format to connected viewers. To connect to the device, viewers request a virtual “MP4” file generated on the fly at the following address: http://<ip-address>:7000/index.mp4.
For this, the app launches a lightweight GCDWebServer that listens on port 7000 and serves requests for index.mp4.
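Since the length of an “infinite” file is not known in advance, a natural way to serve it over HTTP/1.1 is chunked transfer encoding, which the post refers to as the “chunked response”. A server library such as GCDWebServer normally produces this framing for you; the C sketch below only shows what goes over the wire for each piece of the MP4 stream. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Frame len bytes of payload as one HTTP/1.1 chunk (RFC 9112, section 7.1):
 * the length in hexadecimal, CRLF, the payload, CRLF.
 * Returns the number of bytes written to out, or 0 if out is too small.
 * Passing len == 0 produces the terminating chunk "0\r\n\r\n", which ends
 * the response; an "infinite" live stream never sends it. */
size_t http_frame_chunk(const unsigned char *data, size_t len,
                        unsigned char *out, size_t out_cap)
{
    char header[24];
    int hlen = snprintf(header, sizeof header, "%zx\r\n", len);
    if (hlen < 0 || (size_t)hlen >= sizeof header)
        return 0;

    size_t total = (size_t)hlen + len + 2;   /* header + payload + CRLF */
    if (total > out_cap)
        return 0;

    memcpy(out, header, (size_t)hlen);
    if (len > 0)
        memcpy(out + hlen, data, len);
    memcpy(out + hlen + len, "\r\n", 2);
    return total;
}
```

The response headers would carry "Transfer-Encoding: chunked" instead of a Content-Length, and each new MP4 fragment is then written as one or more such chunks as soon as it is ready.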

Here you may encounter several issues.

  1. First of all, the camera outputs “raw” frames that cannot be sent to the player as is: you have to compress and format them first. Fortunately, starting with iOS 8.0, Apple opened programmatic access to hardware video compression, which can generate H.264 chunks on the fly. This is done with the VTCompressionSessionCreate and VTCompressionSessionEncodeFrame families of functions from the VideoToolbox framework.
    1. Audio is compressed in a similar way, with the AudioConverterNewSpecific / AudioConverterFillComplexBuffer functions of AudioToolbox. This produces data chunks in the AAC format.
  2. Second, the camera outputs frames at a comparatively high rate. To avoid losing them, we keep the frames in CBCircularData ring buffers and send them to compression as the buffers fill up. Ring buffers are also used to generate the chunked response, so the app never keeps more than a preset number of frames in memory (otherwise an infinite broadcast would require infinite memory).
  3. Third, for fMP4 to work properly, you have to set the initial stream data, including the SPS/PPS (sequence and picture parameter sets), correctly in the MP4 file’s moov box. To do this, the app searches the H.264 blocks produced by the hardware encoder for the next key frame and extracts the SPS/PPS values. It then uses them when generating the moov header so that the player can decode and time the stream correctly. Hence, from the player’s perspective, the file is always played “from the very beginning”.
  4. There is another problem: MP4 has its own data requirements and can only contain properly formatted H.264 blocks. We solved this by integrating Bento4, a wonderful library that repacks H.264 blocks into properly formatted MP4 atoms on the fly.
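Step 3 boils down to walking the NAL units of each encoded frame and checking their types: 5 is an IDR slice (key frame), 7 is an SPS and 8 is a PPS. The C sketch below assumes the length-prefixed layout (each NAL unit preceded by a 4-byte big-endian length) that VideoToolbox typically produces; a stream with Annex B start codes would need a different scanner. The struct and function names are hypothetical. Note that on iOS the parameter sets are usually also available from the sample buffer’s format description via CoreMedia’s CMVideoFormatDescriptionGetH264ParameterSetAtIndex, which may be simpler than scanning frame data.

```c
#include <stddef.h>
#include <stdint.h>

/* H.264 NAL unit types used here (ITU-T H.264, Table 7-1). */
enum { NAL_TYPE_IDR = 5, NAL_TYPE_SPS = 7, NAL_TYPE_PPS = 8 };

/* What was found in one encoded frame; pointers refer into the input. */
typedef struct {
    const uint8_t *sps; size_t sps_len;
    const uint8_t *pps; size_t pps_len;
    int is_key_frame;              /* nonzero if an IDR slice was found */
} h264_frame_info;

/* Scan one frame whose NAL units are each preceded by a 4-byte big-endian
 * length. Returns 0 on success, -1 if the buffer is malformed. */
int h264_scan_frame(const uint8_t *buf, size_t len, h264_frame_info *info)
{
    size_t off = 0;
    info->sps = info->pps = NULL;
    info->sps_len = info->pps_len = 0;
    info->is_key_frame = 0;

    while (len - off >= 4) {
        size_t nal_len = ((size_t)buf[off] << 24) | ((size_t)buf[off + 1] << 16) |
                         ((size_t)buf[off + 2] << 8) | (size_t)buf[off + 3];
        off += 4;
        if (nal_len == 0 || nal_len > len - off)
            return -1;             /* empty or truncated NAL unit */

        const uint8_t *nal = buf + off;
        int type = nal[0] & 0x1F;  /* low 5 bits of the NAL header byte */
        if (type == NAL_TYPE_SPS) {
            info->sps = nal; info->sps_len = nal_len;
        } else if (type == NAL_TYPE_PPS) {
            info->pps = nal; info->pps_len = nal_len;
        } else if (type == NAL_TYPE_IDR) {
            info->is_key_frame = 1;
        }
        off += nal_len;
    }
    return off == len ? 0 : -1;    /* leftover bytes mean a bad length field */
}
```

A generator along the lines of step 3 would skip frames until is_key_frame is set and the SPS/PPS are known, then emit the moov header and start sending fragments from that frame.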

So we have built an app that streams from the device camera in near real time by sending an “infinite MP4 file” to any standard HTTP client. This approach results in a fairly small latency of 1-2 seconds. Given the broad support for fMP4 and how easy it is to organize such a broadcast (no dedicated server is needed), on-device fMP4 generation is a simple and reliable solution.

True Real-Time Live Streaming

But what if we need “true real time”, as in Skype, for example? Unfortunately, fMP4 is not the best fit for this. Although there is no index file to re-fetch (as in HLS) and the chunks are small, the MP4 file still consists of chunks. Until a chunk has been downloaded completely, the client cannot show its frames, so a slight latency appears.

RTSP Live Streaming on iOS

Hence, for true real time, meaning that the player gets a frame almost immediately after the camera has produced it, a protocol natively designed for streaming is better suited. We mean RTSP, which is also well supported by players (for instance, the well-known VLC player plays it back easily) and is relatively easy to implement. To explain, let’s look at our sample app for RTSP-based streaming, DemoRTSP.

Unlike HLS and fMP4, which exchange data over HTTP, RTSP uses its own protocol on top of “bare sockets.” RTSP also uses two channels: a control channel for exchanging control data between the client and server, and a delivery channel that carries only the compressed data from the server. This slightly complicates the exchange model but minimizes latency, as the client receives data as soon as the server sends it over the network. There is simply no intermediary in RTSP!

DemoRTSP implements the minimum needed for such an exchange. At launch, the app starts listening on the service port (554) for connections from client players. When a client connects, DemoRTSP sends a response with a short description of the codecs used to compress video and audio (on iOS, the standard H.264/AAC pair) and of the data port on which the server delivers compressed frames. The client then connects to that data port and plays back everything it receives from the server. In this model, the server never waits for anything, and all compressed frames are sent to the player immediately, which keeps latency to a minimum.
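On the delivery channel, the compressed frames travel in RTP packets, each of which starts with the fixed 12-byte header defined in RFC 3550. The C sketch below builds that header; it is not the DemoRTSP code, and the function name is hypothetical. In the example, payload type 96 is a dynamic type the server would announce in its session description; for H.264, the timestamp runs on a 90 kHz clock and the marker bit is set on the last packet of a frame (RFC 6184). Splitting large frames across several packets (fragmentation units) is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the fixed 12-byte RTP header (RFC 3550, section 5.1) into out:
 * version 2, no padding, no header extension, no CSRC list.
 * Returns the number of bytes written. */
size_t rtp_write_header(uint8_t out[12], int marker, uint8_t payload_type,
                        uint16_t seq, uint32_t timestamp, uint32_t ssrc)
{
    out[0] = 0x80;                                   /* V=2, P=0, X=0, CC=0 */
    out[1] = (uint8_t)((marker ? 0x80 : 0x00) | (payload_type & 0x7F));
    out[2] = (uint8_t)(seq >> 8);                    /* sequence number */
    out[3] = (uint8_t)(seq & 0xFF);
    out[4] = (uint8_t)(timestamp >> 24);             /* media timestamp */
    out[5] = (uint8_t)(timestamp >> 16);
    out[6] = (uint8_t)(timestamp >> 8);
    out[7] = (uint8_t)timestamp;
    out[8] = (uint8_t)(ssrc >> 24);                  /* stream source id */
    out[9] = (uint8_t)(ssrc >> 16);
    out[10] = (uint8_t)(ssrc >> 8);
    out[11] = (uint8_t)ssrc;
    return 12;
}
```

For example, a frame captured one second after the first carries a timestamp 90000 higher, while the sequence number simply increases by one per packet, which lets the client detect lost or reordered packets.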

It should be said that on modern iOS devices compression and networking are not very resource-intensive. Therefore, besides plain streaming, there is room for additional processing of the stream before transmission. For example, it is easy to overlay text, apply video effects, or otherwise modify the video automatically as the app needs. For this purpose, you can either post-process the camera buffer or use a more efficient OpenGL-based approach. Let us discuss this in more detail.

Live Video Effects on iOS

In DemoRTSP we used a simple approach based on PBJVision, but in your apps you can use more advanced solutions based on GPUImage, which lets you apply a chain of OpenGL effects to the video stream while adding almost no latency.

In DemoRTSP you can find an example of a blur overlay on the stream.
Who said Prisma-like video generation on the fly is impossible?!
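To illustrate what post-processing the camera buffer means at the pixel level, here is a plain C 3x3 box blur over one 8-bit plane, such as the luma plane of a camera frame. This is our own minimal sketch, not PBJVision or GPUImage code, and the function name is hypothetical; on a real device the same computation is far cheaper as an OpenGL or Metal shader, which is why the GPU-based approaches are preferred.

```c
#include <stddef.h>
#include <stdint.h>

/* 3x3 box blur of one 8-bit image plane (for example, the luma plane of
 * a camera frame). stride is the number of bytes per row, which may be
 * larger than width because of row padding. src and dst must be separate
 * buffers. Neighbours outside the image are clamped to the nearest edge
 * pixel, so the output has the same size as the input. */
void box_blur_3x3(const uint8_t *src, uint8_t *dst,
                  int width, int height, size_t stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int dy = -1; dy <= 1; dy++) {
                int yy = y + dy;
                if (yy < 0) yy = 0;
                if (yy >= height) yy = height - 1;
                for (int dx = -1; dx <= 1; dx++) {
                    int xx = x + dx;
                    if (xx < 0) xx = 0;
                    if (xx >= width) xx = width - 1;
                    sum += src[(size_t)yy * stride + (size_t)xx];
                }
            }
            /* Mean of the 9 samples, rounded to the nearest integer. */
            dst[(size_t)y * stride + (size_t)x] = (uint8_t)((sum + 4) / 9);
        }
    }
}
```

Applying it to only part of the frame (say, a band behind a text caption) gives a simple blurred overlay, and running it several times approximates a wider blur.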

You can go even further and use the fastest way of image processing currently available on iOS: Metal, which recent iOS versions offer as a replacement for OpenGL. For instance, the MetalVideoCapture sample shows how to use the CVMetalTextureCache API to pass camera frames to a Metal render pass.

Besides Metal, the latest iOS versions came with a few other interesting features. One of them is ReplayKit, which lets you stream the device’s screen with just a couple of lines of code. This framework is pretty fascinating, but at the moment it does not let you “meddle” with frame creation, cannot capture from the camera (only from the screen!), and is limited to its built-in capture features. Currently, this framework is not intended for full-scale controlled streaming and looks more like something “reserved for future use.” The same applies to another popular new development, the Swift programming language on iOS. Unfortunately, it does not integrate well with the low-level APIs needed for video and data handling, so it is unlikely to be used for such tasks anytime soon, except for the user interface.



How to make a great app preview video

If you are reading this, you most likely know that Apple released iOS 8, which allows app developers to publish short 15-30-second preview videos of their apps. As you know, the App Store is very crowded, so making a great app preview video is something you’ll want to do as soon as possible!

In this article, I will guide you through the process of making an app preview video and share some lessons I learnt while making a video for the Together app using Adobe Premiere Pro CC. The process took almost a whole day, and I’m pretty sure that this post will save you a lot of time and make your video better!

Video Delivery Analytics with Google Analytics

Many startups and new initiatives feature high-quality videos related to their business, and they often want to optimize the user experience around that content. This large and growing demand has brought many cloud infrastructure providers onto the scene. In this article, I tell the story of how we chose the best infrastructure providers for our projects around the world. Our secret is measuring everything and making data-driven decisions, as there seem to be no silver bullets.

Doğan TV Holding deploys DENIVIP Video Load Balancer

Moscow, Russia – January 20, 2014 – DENIVIP Media, a leading provider of video load balancing technologies, today announced the deployment of its Intelligent Video Load Balancing system for Doğan TV Holding. The tightly integrated solution delivered by DENIVIP Media enables NETD, Turkey’s leading digital entertainment service, to use its in-house infrastructure efficiently, manage third-party CDN providers and ensure the best quality of content delivery to its users.

Video content delivery is the cornerstone of any Internet broadcasting service. It affects the most important aspects of user experience quality as well as broadcaster expenditures. The bigger the broadcaster, the more important efficient video content delivery becomes. DENIVIP Video Load Balancer addresses both issues: it lets broadcasters deliver particular content to a particular viewer in the most effective way and, at the same time, at lower cost, by choosing between in-house infrastructure and CDN providers.

One of the most important features of DENIVIP Video Load Balancer is the ability to route a user’s video player to the delivery point where the requested content is already cached. This so-called cache-aware load balancing is very important when you need to minimize the delay between the moment the play button is pressed and the moment playback actually begins.

Doğan Media Group experiences strong fluctuations in video load over time, so it was very important to properly handle possible significant overloads during special events or viral surges in demand. DENIVIP Video Load Balancer helps distribute traffic not only within the internal infrastructure but also across a set of external CDN providers, making it simple to offload usage peaks to them.

“We were mastering intelligence of video load distribution over unstable public internet segments and broad geography for more than 5 years. I’m thrilled that our technology is serving millions of Internet video viewers in Turkey and beyond,” said Denis Bulichenko, CEO of DENIVIP Media.

“We were looking for a very sophisticated solution meeting our needs with reasonable licensing terms. It was a pleasure to discuss best practices of video load distribution and adopt the best ones in our project,” said Ziya Ozgur, OTT Headend Manager.

About Doğan TV Holding

Doğan Yayın Holding (DYH) is active in a wide range of fields including newspaper, magazine and book publishing, television and radio broadcasting and production, as well as the Internet, digital world, print and distribution. Content providers of the Group include newspapers, magazines, publishing houses, television channels, radio stations, as well as music and production companies. The Group’s service providers are made up of distribution, production, digital platform, news agency, Internet and printing companies, as well as a factoring company. For more information, please visit

About DENIVIP Media

DENIVIP Media is the leading supplier of scalable solutions for multiscreen content delivery. Founded in 2008 and headquartered in Moscow, Russia, the company pioneered the use of intelligent software load balancers to power video content delivery over IP networks. Providing unmatched solutions for leading media companies worldwide, DENIVIP Media helps pay TV operators, content programmers, film studios and sports broadcasters bring video to any screen the most effective way. To learn more, please visit or and follow @DENIVIPMedia on Twitter.

Making a Distributed Storage System

Hardly any project today can avoid storing a large number of media objects (video chunks, photos, music, etc.).
In our projects, we often needed our storage system to be highly reliable, as loss of content often results in service interruption (I think this is true for most projects). Moreover, as the service grows, characteristics such as performance, scalability and manageability become key.

To enable content storage, you can use various distributed file systems, but each of them obviously has its own upsides and downsides, so choosing the optimal file system is by no means a trivial task. Recently, we solved such a task for our Together project, an innovative mobile video content platform that handles user-generated content just like Netflix handles movies. We then started using the platform in our other services, such as PhotoSuerte to store photos and Veranda to store short videos. In most of our projects, the platform has proven efficient. In this post, we would like to tell you how to create a distributed data storage system for a video platform.

Alternative Audio Tracks in HTML5 Video


A major advantage of HTTP Live Streaming, still unbeaten by any other standard, is that you can use any technology to compose multimedia content. You don’t need expensive media servers to handle media streams: you can easily delegate media composition to simple PHP / Python / Ruby / Node.js modules. As a long-time Flash lover, I was somewhat prejudiced against Apple’s video delivery standard. But as a developer, I use it every day to solve tasks that were almost unattainable with the previous stack (or required expensive software). Adobe HDS playlists have been really hard to deal with: just think of the binary data in f4m playlists. They take much more time to develop and debug. MPEG-DASH is also far from intuitive.

In this post, we are going to discuss how to add an alternative audio track to your video. Although HTTP Live Streaming simplifies this task, it has some limitations, so you need certain hacks on the client side. In our Together project, we had to implement alternate sound tracks for user videos. Luckily, we use HTTP Live Streaming throughout the system.

Playlist Player for iOS


A player is a core part of your video application. If you have ever tried to build a player into your application, you probably know that it takes some time to set up and customize. We would venture to say that, most of the time, you need a player to run playlists rather than individual videos. In this post, we would like to tell you about the open-source component we use to play back videos in our Together project. Please welcome our playlist player, DVPlaylistPlayer.

How to Design a Video Platform


In this post, we are going to dwell on the process of designing the video platform for our Together project. From the technical viewpoint, this project is remarkable in that it covers the entire content lifecycle, from creation on mobile devices to distribution and viewing. While designing the platform, we aimed for flexibility and cost efficiency. With the new video platform, you can receive, store and share videos. All video management tasks were implemented on top of Apple HLS.



While working on the Together project, I was assigned the task of enabling Apple HLS video playback on the Flash platform. Delivering video content in a single format (HLS in this case) is usually very easy and offers many benefits. For video playback, Flash has the open-source OSMF framework, which can easily be extended with various plugins. But there is one problem: the framework does not support HLS at all. Adobe promoted RTMP first, and only later offered HTTP Dynamic Streaming (HDS) as an alternative to Apple HLS. In this post, we’ll cover a free HLS plugin we developed to play HLS in OSMF-based video players.