Live Streaming on iOS

In our work, we often need to implement real-time or near real-time video streaming from iOS devices. Common examples are using an iOS device as a surveillance camera or building streaming applications like Periscope. The problem statement often imposes additional requirements: playback of the stream on another device, in the browser, or in the VLC player; low latency (near real-time streaming); low resource consumption (long battery life); no dedicated media server; and so on.

The problem itself is not new, but there is still no single straightforward solution. More accurately, there is a whole range of possible approaches.

Each approach has its benefits and downsides. For example, in our earlier (but still relevant) post How to Live Stream Video as You Shoot It in iOS, we wrote about on-device HLS stream generation. It lets you do without a media server and upload the content straight to a CDN (e.g., Amazon S3 + CloudFront). But since this HLS-based approach has its inherent flaws (see below), this time we would like to discuss two better options: on-device generation of an fMP4 stream and RTP streaming served by a local RTSP server.

HTTP Live Streaming

HLS emerged in 2009 and quickly became quite popular. This was helped by full support for HLS across the Apple ecosystem and by its fairly clear structure: an index file (or playlist) contains a list of links to video stream segments (or chunks) that are continuously appended during a live broadcast; a sketch of such a playlist is shown after the list below. HLS also lets you define multiple alternate video streams for clients with different bandwidth requirements. However, this two-tier system, with its need to keep updating the playlist, has become a limitation. HLS performs poorly for real-time broadcasts due to the following issues:

  1. The stream is cut into chunks of several seconds each (as recommended), so the inherent latency may already be too high for real-time use.
  2. A standard guideline for video players is to buffer at least 3 chunks, so the latency from item 1 is roughly tripled at the player level.
  3. To learn about new chunks in the stream, the player has to continually re-fetch the index file; this is yet another source of latency (otherwise the player would simply not know what to show next).
  4. Each chunk is an MPEG-TS file, which adds substantial container overhead on top of the actual media content.
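
For illustration, here is roughly what a live HLS media playlist looks like (the segment names and durations are made up); the player has to keep re-requesting this file to learn about new chunks:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:117
#EXTINF:4.0,
segment117.ts
#EXTINF:4.0,
segment118.ts
#EXTINF:4.0,
segment119.ts
```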

An obvious step to reduce latency in HLS is to shrink the chunk duration to the minimum (about 1 second). However, with such small chunks the stream becomes very sensitive to network instability, and under typical real-world conditions it is almost impossible to achieve smooth playback: the player spends more effort re-requesting the index file and the next chunk than actually displaying video.

To address these issues, similar approaches were proposed in 2011-2012: MPEG-DASH by MPEG, Smooth Streaming by Microsoft, and HDS by Adobe. However, none of them became the de facto standard (although of the three, MPEG-DASH is a full-fledged ISO standard), as they shared similar underlying shortcomings and, moreover, Adobe's and Microsoft's solutions required special server-side support. Here you can find a table comparing these formats.

Fragmented MP4

As time went on, another approach became popular for simple broadcasts: fMP4 (fragmented MP4). It is a minor (but essential) extension of MP4, a well-known format that was already nearly ubiquitous, so broad support for fMP4 naturally arrived shortly after.

The difference between a regular MP4 and fMP4 lies in the arrangement of the elements that describe the video and audio streams. In a conventional MP4 file these elements are typically located at the end of the file, whereas in fMP4 they are placed at the beginning. Since MP4 can natively contain multiple streams divided into separate data fragments, this simple change effectively makes the file "infinite" from the player's point of view.

It is exactly this fMP4 property that is used for live streaming. The player first reads the description of the video and audio streams, then simply waits for the data and plays it back as it arrives. If you generate the media fragments on the fly, the player will play back a real-time stream with no additional effort.
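
Schematically, such an "infinite" fMP4 byte stream looks roughly like this (box names are from the ISO BMFF spec; the fragment boundaries are illustrative):

```
ftyp              file type
moov              stream description (codecs, SPS/PPS, timescale), sent once up front
moof + mdat       fragment 1 (a few compressed frames)
moof + mdat       fragment 2
moof + mdat       ... and so on, for as long as the broadcast runs
```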

And it does work! Of course, certain issues arise when implementing an fMP4 generator. To discuss them, let's take our DemoFMP4 app as an example.

fMP4 Live Streaming on iOS

This is a demo app that takes frames from the camera and sends them in fMP4 format to connected viewers. To connect to the device, a viewer requests a virtual "MP4" file that is generated on the fly at the following address: http://<ip-address>:7000/index.mp4.
For this, the app launches a lightweight GCDWebServer that listens on port 7000 and serves requests for index.mp4.
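
A minimal sketch of how this endpoint can be wired up with GCDWebServer is shown below (in Swift for brevity; the nextFMP4Chunk() helper is hypothetical, and the exact GCDWebServerStreamedResponse initializer may differ between library versions):

```swift
import GCDWebServer

// Hypothetical source fed by the encoder's ring buffers: returns the next chunk of
// bytes to send (ftyp + moov first, then moof/mdat pairs), blocking until data is ready.
func nextFMP4Chunk() -> Data { /* hook this up to CBCircularData */ return Data() }

let server = GCDWebServer()

// Serve the "infinite" MP4: push fMP4 fragments to the client as they are produced.
server.addHandler(forMethod: "GET",
                  path: "/index.mp4",
                  request: GCDWebServerRequest.self) { _ -> GCDWebServerResponse? in
    return GCDWebServerStreamedResponse(contentType: "video/mp4",
                                        asyncStreamBlock: { completion in
        completion(nextFMP4Chunk(), nil)   // an empty Data would end the response
    })
}

server.start(withPort: 7000, bonjourName: nil)
```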

Here you may encounter several issues.

  1. First of all, the camera outputs "raw" frames that cannot simply be forwarded to the player: they have to be compressed and packaged first. Fortunately, starting with iOS 8.0, Apple opened programmatic access to the hardware video encoder, which can produce H.264 data on the fly. For this purpose, the VTCompressionSessionCreate and VTCompressionSessionEncodeFrame family of functions from the VideoToolbox framework is used (see the sketch after this list).
    1. Audio is compressed in a similar way, using the AudioConverterNewSpecific / AudioConverterFillComplexBuffer functions from AudioToolbox; the result is AAC data chunks.
  2. Second, the camera outputs frames at a fairly high rate. So as not to lose them, we keep the frames in CBCircularData ring buffers and send them to the encoder as the buffers fill up. The same ring buffers are used to generate the chunked HTTP response, so the app never keeps more than a preset number of frames in memory (otherwise an infinite broadcast would require infinite memory).
  3. Third, for fMP4 to work properly you have to put the correct initial stream data (including the SPS/PPS parameter sets) into the MP4 file's moov box. To do this, the app scans the H.264 output of the hardware encoder, finds the next key frame and extracts the SPS/PPS from it. When generating the moov header it uses them to set up the stream correctly, so from the player's perspective the file always starts "from the very beginning".
  4. Finally, MP4 has its own container requirements and can only carry properly formatted H.264 data. We solved this by bringing in a great library called Bento4, which repacks the H.264 blocks into properly formatted MP4 atoms on the fly.
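
To make items 1 and 3 more concrete, here is a minimal Swift sketch of setting up the hardware encoder and pulling the SPS/PPS out of an encoded key frame (error handling, the ring buffers and the Bento4 muxing step are omitted; the callback name is our own):

```swift
import CoreMedia
import VideoToolbox

var session: VTCompressionSession?

// VideoToolbox calls this for every encoded H.264 sample buffer.
let outputCallback: VTCompressionOutputCallback = { _, _, status, _, sampleBuffer in
    guard status == noErr,
          let sampleBuffer = sampleBuffer,
          let format = CMSampleBufferGetFormatDescription(sampleBuffer) else { return }

    // SPS (parameter set 0) and PPS (parameter set 1) are needed for the moov header.
    var paramSet: UnsafePointer<UInt8>?
    var paramSetSize = 0
    var paramSetCount = 0
    CMVideoFormatDescriptionGetH264ParameterSetAtIndex(
        format, parameterSetIndex: 0,
        parameterSetPointerOut: &paramSet, parameterSetSizeOut: &paramSetSize,
        parameterSetCountOut: &paramSetCount, nalUnitHeaderLengthOut: nil)
    // ...store SPS/PPS once, push the encoded NAL units into the ring buffer for muxing...
}

VTCompressionSessionCreate(
    allocator: kCFAllocatorDefault, width: 1280, height: 720,
    codecType: kCMVideoCodecType_H264, encoderSpecification: nil,
    imageBufferAttributes: nil, compressedDataAllocator: nil,
    outputCallback: outputCallback, refcon: nil, compressionSessionOut: &session)

if let session = session {
    VTSessionSetProperty(session, key: kVTCompressionPropertyKey_RealTime, value: kCFBooleanTrue)
    // For every CVPixelBuffer captured from the camera:
    // VTCompressionSessionEncodeFrame(session, imageBuffer: pixelBuffer,
    //     presentationTimeStamp: pts, duration: .invalid,
    //     frameProperties: nil, sourceFrameRefcon: nil, infoFlagsOut: nil)
}
```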

[Figure: live-streaming flowchart]
So we have built an app that delivers the stream from the device camera in near real time by sending an "infinite MP4 file" to any standard HTTP client; the resulting latency is a fairly small 1-2 seconds. Given the broad support for fMP4 and how easy it is to set up such a broadcast (no dedicated media server is needed), client-side generation of fMP4 is a simple and reliable solution.

True Real-Time Live Streaming

But what if we need "true real-time", as in Skype, for example? Unfortunately, fMP4 is not the best fit for this. Although there is no master index file (as in HLS) and the video fragments are small, the MP4 file is still divided into fragments. Until a fragment has been downloaded completely, the client cannot display its frames, so a slight latency remains.

RTSP Live Streaming on iOS

Hence, for true real-time, meaning that the player gets a frame almost immediately after the camera has produced it, a protocol natively designed for streaming is a better fit. We mean RTSP, which is well supported by players (for instance, it plays back easily in the well-known VLC player) and is relatively easy to implement. To explain, let's look at our sample app for RTSP-based streaming, DemoRTSP.

Unlike HLS and fMP4, which exchange data over HTTP, RTSP uses its own protocol on top of "bare sockets." RTSP also uses two channels: a control channel for exchanging control messages between the client and the server, and a delivery channel that carries only the compressed media data (RTP) from the server. This slightly complicates the exchange model, but it minimizes latency, because the client receives data as soon as the server puts it on the network. There is simply no intermediary in RTSP!

DemoRTSP is a minimal implementation of such an exchange. At launch, the app starts listening on the standard RTSP port (554) for connections from client players. On connection, DemoRTSP sends a response describing the codecs used to compress video and audio (for iOS this is the standard H.264/AAC pair) and the data port from which compressed frames will be delivered. The client then connects to that data port and plays back everything it receives. In this model the server never waits for anything: every compressed frame is sent to the player immediately, ensuring minimal latency.
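
For reference, a full RTSP session follows a standard handshake roughly like this (DemoRTSP implements a simplified variant of this exchange; the address, ports and track name below are illustrative):

```
C -> S:  DESCRIBE rtsp://192.168.1.10:554/live RTSP/1.0        ask what the stream contains
S -> C:  200 OK + SDP body (H.264 video track, AAC audio track)
C -> S:  SETUP rtsp://192.168.1.10:554/live/track1 RTSP/1.0     negotiate the data transport
S -> C:  200 OK, Transport: RTP/AVP;client_port=5000-5001
C -> S:  PLAY rtsp://192.168.1.10:554/live RTSP/1.0
S -> C:  200 OK, then RTP packets start flowing on the data channel
```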

It should be said that on modern iOS devices compression and networking are not particularly resource-hungry. So, besides plain streaming, additional processing of the stream before transmission is possible: it is easy to overlay text, apply video effects, or otherwise alter the video as the app requires. For this, you can either post-process the camera buffer directly or use a more efficient OpenGL-based approach. Let's discuss this in more detail.

Live Video Effects on iOS

In our DemoRTSP app we used a simple approach based on PBJVision, but in your apps you can use more advanced solutions based on GPUImage, which lets you apply a chain of OpenGL effects to the video stream and keep the added latency close to zero.
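
As an illustration of the GPUImage route, a simple filter chain might look roughly like this in Swift (class and property names are from the GPUImage library; in a streaming app the last target would feed the encoder rather than a preview view):

```swift
import GPUImage
import AVFoundation
import UIKit

// Capture from the back camera and push every frame through a Gaussian blur filter.
let camera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.hd1280x720.rawValue,
                                 cameraPosition: .back)

let blur = GPUImageGaussianBlurFilter()
blur.blurRadiusInPixels = 4.0

// Here the chain ends in an on-screen preview; replace it with whatever feeds H.264 encoding.
let preview = GPUImageView(frame: .zero)
camera.addTarget(blur)
blur.addTarget(preview)
camera.startCameraCapture()
```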

In DemoRTSP you can find an example of a blur effect applied to the stream.
Who said Prisma-like video generation on the fly is impossible?!

You can go even further and use the fastest image-processing option currently available on iOS, Metal, which arrived in recent iOS versions as a replacement for OpenGL. For instance, in the MetalVideoCapture sample you can see how to use the CVMetalTextureCache API to feed the camera capture into a Metal render pass.
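
The core of that technique is wrapping each camera CVPixelBuffer in a Metal texture without copying, roughly like this (a stripped-down sketch; pixelBuffer would come from the capture output callback, and error handling is omitted):

```swift
import CoreVideo
import Metal

let device = MTLCreateSystemDefaultDevice()!

// One texture cache for the whole capture session.
var textureCache: CVMetalTextureCache?
CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)

// Call this for every captured frame to get a zero-copy Metal texture.
func makeTexture(from pixelBuffer: CVPixelBuffer) -> MTLTexture? {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    var cvTexture: CVMetalTexture?
    CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache!,
                                              pixelBuffer, nil, .bgra8Unorm,
                                              width, height, 0, &cvTexture)
    // The resulting MTLTexture can be bound directly in a render or compute pass.
    return cvTexture.flatMap { CVMetalTextureGetTexture($0) }
}
```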

Besides Metal, the latest iOS versions brought a few other interesting features. One of them is ReplayKit, which lets you stream the device's screen with just a couple of lines of code. The framework is fascinating, but at the moment it does not let you "meddle" with frame creation, cannot capture from the camera (only from the screen!), and is limited to its built-in capturing features. For now it is not intended for full-scale, controlled streaming and feels more like something "reserved for future use." The same applies to another popular development, the new Swift programming language. Unfortunately, it does not yet integrate well with the low-level APIs needed for video and data handling, so it is unlikely to be applicable to such tasks anytime soon, except at the UI level.

Links

  1. iOS Live Streaming
  2. MP4/fMP4
  3. HLS vs MPEG-DASH
  4. Metal app
  5. ReplayKit
  6. DemoFMP4 and DemoRTSP
  7. VideoToolbox
  8. AudioToolbox
  9. Bento4
  10. GCDWebServer
  11. PBJVision
