Video support for Loom: We’re tackling it, and here’s how…
At Loom we’ve been hard at work on one of the most exciting features we’ve ever released: video support. Streaming videos to all your devices, directly from where they’re stored in the cloud, means all your videos are at hand without taking up any local storage space.
But we’re not just excited about the feature itself; we’re excited about how we’re executing it. Why? Because we’re not just implementing video support, we’re doing it in the most user-friendly way currently possible. In other words, we don’t want Loom to merely support video streaming; we want it to be damn good at it.
We’re offering this story not just for our curious users, but also for any other startups considering tackling these hurdles themselves.
First, a few background details…
Video and movie files are becoming an increasingly important form of media. YouTube revolutionized the consumption of videos and paved the way for anyone to easily broadcast their videos to a massive viewership. More recently, short, quickly made video clips have become incredibly popular, thanks to the likes of Snapchat, Vine and Instagram. The invention of camera phones gave consumers the ability to shoot these videos anywhere. Naturally, users want to access and view these videos anywhere, anytime. We recognized this need and wanted to solve it. But we learned very quickly here at Loom that how video support is actually accomplished is, to put it briefly, a whole other story.
Currently, a company can stream videos either by encoding “on the fly” as a video is requested, or by encoding every video after it’s been uploaded. And they can either do the encoding themselves or outsource the process to third-party services.
At Loom, we decided to build it ourselves. Coming to this decision wasn’t easy, and we actually attempted to outsource our transcoding first. But in the end, we ended up creating our own solution; more on that later. Let’s first look at the obstacles we faced, and overcame.
Up until about five months ago, it would have been impossible (or at least incredibly complex) to build your own software for handling personal videos. Public encoding software was immature, and support for streaming requirements simply didn’t exist yet. This was a stumbling block because several requirements need to be met before you can play videos on a device. For instance, to stream video on iOS, Apple requires a specific protocol for downstreaming video called HTTP Live Streaming (HLS). Meeting this requirement and others like it can be daunting, to say the least. And that is just the beginning. Handling the entire processing of videos stored in the cloud is far from easy. Video files need to be prepped and formatted correctly before they can be easily downstreamed. There are several fairly significant challenges.
Challenge One: On-demand and real-time live transcoding and segmenting of incoming videos
Currently, server technology that follows the HLS protocol involves too much latency to support quick live transcoding and downstreaming of video files. HLS was built to be served by normal web servers, not specialized video streaming servers. On normal web servers, HLS is usually easy to implement and scale, because these servers normally have the necessary infrastructure. But it serves videos in segments of a certain length, and, if you follow Apple’s recommendations, you can experience as much as 30 seconds of latency, an unrealistic amount of time for a user to wait after pressing ‘play’. To be fair, you could shave a few seconds off the recommendations, but that still doesn’t get you near an optimal speed. And no matter how fast the Internet gets in the near future, it’s simply inefficient to interact with original video files in the cloud, transcoding them on the fly. Even if server technology advances, this approach will still create a degree of needless lag for the user.
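To make that 30-second figure concrete, here’s a quick back-of-the-envelope sketch. It assumes Apple’s classic guidance of roughly 10-second segments and a player that buffers about three segments before starting playback; the helper name and exact numbers are illustrative, not a measurement of any particular server:

```python
# Rough worst-case startup latency for classic HLS: the player typically
# downloads several full segments before it begins playback, so latency
# scales with segment duration times the segment buffer depth.
SEGMENT_DURATION_S = 10   # ~Apple's classic recommended segment length (illustrative)
SEGMENTS_BUFFERED = 3     # segments a player commonly buffers before playing

def hls_startup_latency(segment_duration_s: float, segments_buffered: int) -> float:
    """Seconds between pressing 'play' and playback starting, worst case."""
    return segment_duration_s * segments_buffered

print(hls_startup_latency(SEGMENT_DURATION_S, SEGMENTS_BUFFERED))  # 30 seconds
```

Shortening segments helps, but as the article notes, it only shaves seconds off rather than eliminating the wait.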
Challenge Two: Finding the right bandwidth stream versions
Finding the right bandwidth stream versions was our second challenge. In many cases, the video quality a device records is too high for it to stream back to the device, and is sometimes higher than the optimal resolution for the size of the screen. The iPhone 5, for example, records 1080p HD video, and it’s difficult to send that much data over the Internet for most viewing purposes. Connection quality and bandwidth are too unreliable or insufficient, especially on mobile devices.
Apple recommends that developers have 6 to 7 compressed versions available for video streaming, with each version increasing bitrate by a factor of 1.5 for a gradual gain in quality. Essentially, versions need to be prepared for the slowest devices on slow connections, with increasingly better-quality versions as device and connection quality improve.
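That 1.5× progression makes the bitrate ladder easy to compute. A minimal sketch, where the 64 kbit/s starting point is our own illustrative floor rather than a figure from Apple:

```python
def bitrate_ladder(base_kbps: float, versions: int, factor: float = 1.5) -> list:
    """Return target bitrates (kbit/s), each `factor` times the previous one."""
    return [round(base_kbps * factor ** i) for i in range(versions)]

# Seven rungs starting from an illustrative 64 kbit/s floor:
# 64, 96, 144, 216, 324, 486, 729 kbit/s
print(bitrate_ladder(64, 7))
```

Each rung targets a plausible device/connection tier, from a phone on a bad cell connection up to a tablet on fast Wi-Fi.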
Challenge Three: Balancing cost of storing video versions, segment processing, and viewing experience
This is the real conundrum: balancing cost and quality. Although it would be ideal to have six compressed versions of each video readily available for downstreaming, this gets expensive. While the price of cloud storage is certainly a big part of that equation, in this case encoding and decoding each video carries an even weightier cost. To further complicate the situation, some videos created and stored in a user’s library will be accessed and viewed relatively regularly; others might be viewed infrequently, if at all. So it seemed an unnecessary cost to segment-process six versions of every original video file, at least until we had a better idea of the usage rates of the videos. At the same time, it’s critically important that, right off the bat, there are enough versions to create a good experience for the user, whichever video they choose.
We looked around for third-party encoding providers and quickly tested most of them. We decided against using a third party for two reasons. First, it seemed we could do better cost-wise. Second, we didn’t want to rely on a system and infrastructure we had no direct access to or control over. We wanted to improve on the inherent latencies, and we would have had no real, hands-on ability to do so. There are maybe only a handful of companies out there right now doing what we’re trying to do: enabling a user to carry and stream all their videos from any of their devices quickly and efficiently. We couldn’t find quite the right fit. As a result, we kept looking for another solution and elected to explore open source software, to see if one might emerge.
We decided to try our hand at using FFmpeg, an open-source framework for processing multimedia data (which is also used by some third-party encoding companies) that provides real-time transcoding, enabling video downstreaming. This allowed us to encode the multiple versions ourselves and keep control over lag time and cost.
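As a sketch of what producing one HLS rendition with FFmpeg can look like: the flags below are standard ffmpeg options (H.264 video, AAC audio, the `hls` muxer), but the file names, bitrates and the helper function are hypothetical illustrations, not Loom’s actual pipeline:

```python
import shlex

def hls_transcode_cmd(src: str, out_playlist: str, bitrate_kbps: int,
                      width: int, height: int, segment_seconds: int = 6) -> list:
    """Build (but don't run) an ffmpeg command that transcodes `src` into one
    H.264/AAC HLS rendition at the given bitrate and resolution."""
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libx264", "-b:v", f"{bitrate_kbps}k",   # H.264 at target bitrate
        "-vf", f"scale={width}:{height}",                # downscale to the rendition size
        "-c:a", "aac", "-b:a", "96k",                    # compressed audio track
        "-hls_time", str(segment_seconds),               # target segment duration
        "-hls_list_size", "0",                           # keep all segments in the playlist
        "-f", "hls", out_playlist,                       # emit .m3u8 + .ts segments
    ]

cmd = hls_transcode_cmd("input.mov", "out/720p.m3u8", 1800, 1280, 720)
print(shlex.join(cmd))
```

Running one such command per rung of the bitrate ladder yields the full set of pre-encoded versions.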
The result was our decision to encode every video as it goes into the cloud. It definitely costs more to do it this way; as we mentioned, it’s the processing of each video that carries the real cost. But this method provides a faster viewing experience than encoding on the fly: users can skip ahead or backward very quickly, and downloading starts within seconds. Our top priority is a good user experience. We want people to be able to fast-forward, jump and skip through their videos, and we want playback to start as fast as possible when they press ‘play.’ If you want that, you have to transcode the video in full before the user presses the play button.
From these smaller, compressed versions, we create a ‘playlist’ with multiple streams for each video. Following the HLS protocol, when a user requests a video to play from their device, we start by streaming the lowest-quality version first, so the video can start instantly. In the background, the higher-quality versions buffer, and the player moves up the playlist, switching to the optimal version for the Internet speed and the screen resolution of the device in use. As connectivity improves, so does the quality.
Streams for most common use cases:
- Low bandwidth (approx. 68kbit/s)
- 3G, Edge (320×240, approx. 320kbit/s)
- Slow Wifi/4G (640×480, approx. 750kbit/s)
- Fast Wifi/LTE (1280×720, approx. 1800kbit/s)
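Per the HLS spec, these streams are advertised to the player through a master playlist that lists each variant’s bandwidth and resolution. A minimal sketch of generating one for the four streams above (the rendition URIs and helper function are hypothetical):

```python
def master_playlist(variants):
    """Render an HLS master playlist from (bandwidth_bits_per_s,
    'WIDTHxHEIGHT' or None, uri) tuples."""
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in variants:
        attrs = f"BANDWIDTH={bandwidth}"
        if resolution:
            attrs += f",RESOLUTION={resolution}"
        lines.append(f"#EXT-X-STREAM-INF:{attrs}")
        lines.append(uri)
    return "\n".join(lines) + "\n"

# The four streams listed above, bandwidths converted to bits per second.
VARIANTS = [
    (68_000,    None,       "low/index.m3u8"),
    (320_000,   "320x240",  "3g/index.m3u8"),
    (750_000,   "640x480",  "wifi/index.m3u8"),
    (1_800_000, "1280x720", "lte/index.m3u8"),
]
print(master_playlist(VARIANTS))
```

The player reads this file, starts on the lowest-bandwidth entry, and switches variants as it measures its actual throughput.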
If this sounds familiar, you’re right: services like Netflix use the same approach. The end result is like a simple, personal and private version of Netflix, optimized for the best possible mobile user experience.
Why this was our best option…
It wasn’t easy, but once we realized this was the best solution, we knew we owed it to our users to deliver. It’s harder work, but more cost-effective overall, and the end result is a better user experience. It’s also better for quality control: we don’t rely on a third-party server to deliver for us. And most importantly, this method doesn’t require any work on the part of the user. They get fast, reliable streaming of their videos in the palm of their hands, without any hassle. Under normal to good conditions, streaming on Loom starts in under two seconds, and scrubbing works instantly for a video of any length.
What the future holds…
In the future, we can explore ways to increase video streaming quality or become even more cost-efficient, perhaps by intelligently limiting the encoding of particular videos: for example, encoding the first half of each video and only beginning to encode the second half when the user hits play. Or by analyzing user patterns to predict which videos to encode, based on things like length, type, creation date, scenes or recent user behavior. Of course, there is always the possibility of reducing cost by switching server providers or renting storage space in larger volume.
But the point is we can explore ways to become better as we go. What continues to be our most important goal is providing an excellent user experience immediately, and maintaining that excellence moving forward. We’re excited for the future and hope to only continue improving – always putting our users first. Thanks for reading!