Media Server Pro Guide – Part 2

Are you thinking about becoming a professional media server operator or user, but aren't sure which skills you need? I asked this as an open question to a Facebook group of Media Server Professionals, and the response was overwhelming. This is the second in a series of blogs inspired by their valuable recommendations on traits, tools, media server approach, content and technologies to know, with my own experience thrown into the pot, too. Keep an eye out for a full-blown eBook on the subject later on!

“Dear fellow Media Server Professionals, I am writing an article about things you should know or learn as a fresh media server operator. Like a small best-practice guide, both on soft skills and hardware to know.  

I would highly appreciate your input on this. What would an experienced professional like yourself say to a newbie?”

This second blog looks at media server philosophy and homes in on audio and video. Let's get going…

    • Media server philosophy
    • Media server content
    • Codecs & containers
    • Audio
    • Uncompressed audio
    • Lossless compression of audio
    • Lossy compression of audio
    • Audio bitrate
    • Video
    • Video formats (codecs)
    • Most commonly used video codecs in media servers
    • Video bitrates
    • Video frame rates

Media Server philosophy

This is perhaps one of the most intriguing parts of this topic, as it touches on the foundation of your (future) career as a media server operator/professional. As I wrote in the previous post, you need to consider the differences between the servers in the market. While some are built for specific purposes with associated features, others are more generalist tools.

Some media server systems can be bought as software only, others only as hardware servers, and some as a combination of the two. Some systems run on Mac, others on Windows.

One system might be perfectly tailored for live events, another for projection mapping, and yet another for integration with lighting systems, high-end uncompressed video playback, pre-visualization, AR or VR. Other systems might not be best-in-class in one specific application, but can be best-in-class overall.

In addition, you need to look at the market size and availability for the different media servers. Ask yourself: “Will I get more work by opting for system A versus system B?” A newly released media server solution will quite naturally have a smaller install base, and not as much deployment with rental houses, compared to a media server system that has been in the market “forever”.

Perhaps a smart choice would be to go for a well-established system and, later on, broaden your skills with a nifty up-and-coming system with an underdog attitude.

Media Server Content

Content is what your media server plays back (sound, video, images) and there is a plethora of file formats in the market. Let’s start by having a look at codecs and containers in general, while noting the difference.

Codecs & Containers

A container is what most of us talk about when it comes to video formats. The container defines the structure and content of the file, which can be audio, video and other metadata such as subtitles, or menus and menu structure.  QuickTime and MP4 are examples of containers.

A codec, on the other hand, is how a specific stream (e.g. video) is encoded for later playback (when it is decoded). The codec is the algorithm used to compress the media file. The purpose of the encoding can be to compress the stream and reduce file size, or to make the video playable on devices other than the one on which it was created (format conversion). Examples of codecs are HAP (video), H.264 (video) and MP3 (audio).

A container can support many different codecs.

If you want to dig deeper into codecs and containers, read our previous blog posts on the topic:

Or have a peek at the Wikipedia pages for codecs and containers – there’s a lot to read.


Audio

Audio is an integral part of any event or installation, and understanding audio is, of course, a necessary skill set for any media server professional or operator.

In brief, you could say there are three major format types of audio: uncompressed, compressed with lossy compression, and compressed audio formats with lossless compression.

Uncompressed audio

In this case, the audio is completely unaltered from its original state. Uncompressed files are quite large and take up a lot of disk space. The most common uncompressed audio formats are WAV, AIFF, AU, PCM and BWF – most media servers seem to prefer WAV files.

Lossless compression of audio

Here the audio files are compressed without losing any information (the original uncompressed file can be fully reconstructed from the compressed version). The benefit of lossless compression is that sound files are reduced in size. The most common formats: FLAC, WavPack, ALAC and Monkey’s Audio.

Lossy compression of audio

For the smallest files, you need to use lossy compression. This is achieved by removing some of the audio information and simplifying the data in the files. The files can end up very small, but the price is lower audio quality (depending on the level of compression).

More details on these three types of audio may be found here:
Common audio formats: Which one to use

Audio bitrate

Audio bitrate is measured in kilobits per second (kbps) and is the amount of data (bits) encoded or decoded per second. Higher bitrates produce better audio quality, but also require more bandwidth and storage space. With that said, I have never heard of a project where the size of the audio files was the showstopper (video file sizes, on the other hand…)
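To get a feel for the numbers, here is a small back-of-the-envelope sketch in Python. The figures for CD-quality audio (44.1 kHz, 16-bit, stereo) and a 320 kbps MP3 are standard values; the function names are my own:

```python
def pcm_bitrate_kbps(sample_rate, bit_depth, channels):
    """Bitrate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000

def file_size_mb(bitrate_kbps, seconds):
    """Approximate file size in megabytes for a given bitrate and duration."""
    return bitrate_kbps * 1000 * seconds / 8 / 1e6

cd = pcm_bitrate_kbps(44_100, 16, 2)  # CD-quality stereo WAV
print(cd)                             # 1411.2 kbps
print(file_size_mb(cd, 300))          # a 5-minute track: ~52.9 MB
print(file_size_mb(320, 300))         # same track as a 320 kbps MP3: 12 MB
```

Even the "heavy" uncompressed option is tiny compared with typical video files, which is why audio size rarely becomes a problem.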

If you want to dig further into digital audio, here are a few links:


Video

No visuals – no fun (as you can read in this blog: Video has never been better – make it work for you). Video and image playback are core features of any media server, and of course an area you need to dive into. As seen earlier, both video and images/still photos have their own codecs and formats, with many differences and plenty to learn. Let’s start with video, then images, before we take it to the next level with 3D files…

Video Formats (codecs)

Your media server will most certainly support a wide range of video codecs. Each of the codecs will have its particular strengths and weaknesses. You will find people who absolutely love Apple ProRes and you will find people who hate Apple ProRes with a passion. It’s one thing to have your own personal favorite and workflow preferences, but another to know how well (or not) your media server supports the different codecs.

Some media server manufacturers will list the video codecs they support, alongside data on how many layers of video can be played simultaneously at different resolutions, etc. As always, marketing data can differ from real-life data. Rely on what you can actually get the server to do, rather than what can be achieved in a theoretical setting.

Lastly, some media server manufacturers have developed their own (preferred) codec for their systems. Make sure you have support for these special codecs in your workflow (content production) to avoid potential bottlenecks.

Most commonly used video codecs in media servers

Some of the most widely used video codecs in media servers are MPEG, H.264, HAP, Apple ProRes and QuickTime Animation. Most of these codecs have different settings/parameters to adjust the level of compression/video quality. Let’s take a more in-depth look at one of them: the HAP family.


HAP

The HAP family of codecs consists of four different codecs: HAP, HAP Q, HAP Alpha and HAP Q Alpha. From the developers of HAP, we can see the following differences between these four flavors of HAP:

  • HAP has the lowest data-rate and reasonable image quality.
  • HAP Alpha has the same image quality as HAP, and supports an alpha channel.
  • HAP Q has improved image quality, at the expense of larger file sizes.
  • HAP Q Alpha has improved image quality and an alpha channel, at the expense of larger file sizes.

The HAP website explains:  “Some encoders allow for encoding with an optional specified 'chunk' size to optimize for ultra-high resolution video on a particular hardware system. This setting should typically only be used if you are reaching a CPU performance bottleneck during playback. As a general guide, for HD footage or smaller you can set the chunk size to 1 and for 4k or larger footage the number of chunks should never exceed the number of CPU cores on the computer used for playback.”
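The quoted guidance can be condensed into a tiny helper. This is purely my own illustration of the rule of thumb – the function name, the HD threshold and the return values are not part of any HAP tool or API:

```python
import os

def suggested_hap_chunks(width, height, cores=None):
    """Illustrative sketch of the HAP chunk guidance quoted above:
    chunk count 1 for HD footage or smaller; for larger footage,
    never exceed the core count of the playback machine."""
    if cores is None:
        cores = os.cpu_count() or 1
    if width <= 1920 and height <= 1080:
        return 1
    return cores

print(suggested_hap_chunks(1920, 1080, cores=8))  # HD: 1 chunk
print(suggested_hap_chunks(3840, 2160, cores=8))  # 4K: up to 8 chunks
```

In practice, follow your media server vendor's own encoding recommendations rather than a generic rule like this.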

They finish off with a recommendation to consult the documentation of your media server to see how HAP should be encoded before importing it to the media server, or even better – perhaps the media server can do the conversion for you.

HAP used to be supported natively by Adobe, but after an upgrade of the Adobe software they no longer supported some 32-bit components and native HAP encoding was lost.

Apple ProRes

ProRes is a codec developed by Apple. In their marketing they write “Apple ProRes codecs provide an unparalleled combination of multistream, real-time editing performance, impressive image quality, and reduced storage rates. Apple ProRes codecs take full advantage of multicore processing and feature fast, reduced-resolution decoding modes.”

The ProRes family has the following versions, and you can read more about all of them at the Apple Support page for ProRes:

  • Apple ProRes 4:2:2 Proxy
  • Apple ProRes 4:2:2 LT
  • Apple ProRes 4:2:2
  • Apple ProRes 4:2:2 HQ
  • Apple ProRes 4:4:4:4
  • Apple ProRes 4:4:4:4 XQ

In addition, Apple has announced ProRes RAW and ProRes RAW HQ, giving you more control of the processing of the content.

Video Bitrates

Understanding how video bitrate is related to video quality and file size is important and follows the same principles as audio bitrates.  Video bitrates do not need to be constant (CBR – constant bitrate) for the entire video as some codecs support variable bitrate (VBR).

Like audio, the video bitrate is the amount of data (bits) processed in a given time, measured in kilobits (kbps), megabits (Mbps) or gigabits (Gbps) per second. With the exception of some global players, we are not at the terabit (Tbps) level yet.
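The same arithmetic used for audio applies to video, just with much bigger numbers. A quick sketch – the example bitrates are rough, illustrative figures for 1080p material, not vendor specifications:

```python
def storage_gb_per_hour(bitrate_mbps):
    """Storage needed for one hour of footage at a given average bitrate."""
    return bitrate_mbps * 1e6 * 3600 / 8 / 1e9

# Illustrative (assumed) average bitrates:
for name, mbps in [("H.264 1080p", 20), ("ProRes 422 HQ 1080p", 220)]:
    print(f"{name}: {storage_gb_per_hour(mbps):.1f} GB/hour")
```

At 20 Mbps an hour of footage is about 9 GB; at 220 Mbps it is roughly 99 GB – which is why video, not audio, is where storage and bandwidth planning matters.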

(Just as an aside: Akamai, a Content Delivery Network (CDN), set a record on December 11th 2018 with peak traffic higher than 72 (!) Tbps. Now THAT is a lot of data!)

If you want to dig further into bitrates, there are interesting articles on the topic.

Video Frame Rates

The frame rate, or frames per second (fps), tells you how often consecutive images (frames) are shown on a display. The term is used in film, video cameras, computer systems and TV sets. For TV and computer monitors, it is more commonly referred to as the frame frequency, and in that case it can be expressed in hertz rather than fps.

Wikipedia has a long and interesting article about frame rates, which also covers the historical reason behind the most common frame rates. Another source of information is the Techsmith site and their “Frame Rate: A beginner’s guide.” 

Short version here: The most commonly used frame rates are 50 and 60 Hz. These numbers came from the mains frequency of the electric grids when analog TV broadcast was developed: 60 Hz is the standard in the US and Canada, and 50 Hz in most of the rest of the world.

Film was traditionally shot at 24 fps, and to convert that to 50 or 60 you need to play some frames more than once (a process called pulldown). To convert 24 frames per second into 60, every odd frame is repeated, playing twice, and every even frame is tripled.
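That repeat pattern – the classic 2:3 pulldown cadence – can be sketched in a few lines of Python (the function name is my own):

```python
def pulldown_24_to_60(frames):
    """2:3 pulldown: repeat odd-numbered frames twice and
    even-numbered frames three times, turning 24 fps into 60 fps."""
    out = []
    for i, frame in enumerate(frames, start=1):
        repeats = 2 if i % 2 else 3
        out.extend([frame] * repeats)
    return out

second_of_film = list(range(1, 25))        # one second of 24 fps film
print(len(pulldown_24_to_60(second_of_film)))  # 60 frames -> one second at 60 fps
```

Twelve odd frames played twice plus twelve even frames played three times gives exactly 24 + 36 = 60 frames.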

Today most displays run at 60, 120, 240, 300 (or even higher) frames per second/hertz – numbers that allow the most common frame rates, such as 24, 25 or 30 fps, to be evenly multiplied. Most new cameras also shoot at far higher frame rates than 24 fps.

Obviously, the number of frames significantly affects the required space and read speed of the storage systems – which is quite natural: if you go from 25 to 50 fps, you double up.

But (there is always a but, isn’t there?) why do you see frame rates such as 29.97 or 59.94? That is because 30 or 60 fps is an approximation of the real frame rate in the US, as defined in the NTSC standard. With black-and-white TV, the numbers were exactly 30 and 60, but when color was added to the TV signal an issue became apparent: the color carrier signal interfered with the sound carrier, ruining the image quality. A quick fix was to reduce the frame rate by about 0.03 fps, so the two signals no longer interfered with each other.
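The exact relationship is a scaling factor of 1000/1001 applied to the nominal rate, which is where 29.97 and 59.94 (and film's 23.976) come from:

```python
def ntsc_rate(nominal_fps):
    """Actual NTSC frame rate: the nominal rate scaled by 1000/1001."""
    return nominal_fps * 1000 / 1001

print(round(ntsc_rate(30), 2))   # 29.97
print(round(ntsc_rate(60), 2))   # 59.94
print(round(ntsc_rate(24), 3))   # 23.976
```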

This “quick” fix means we are still stuck with these exact fps numbers. This has an impact on frames and frame drops which will be covered later.

As a media server operator, you will need to learn to live with the fact that not all content you receive for your event or installation will have the exact same frame rates. And you need to learn how to handle multiple frame rate videos in a single system.

Conclusion (so far)

This was just the start of the technical part of our guide to becoming a media server professional. In the next blog, we will cover video bit depth (color depth), chroma subsampling, video data rates and storage/bandwidth calculations, frame drops, frame accuracy and more.

I also want to give my warmest regards to the awesome people who commented on my original post and inspired these articles. Without your input, this would not have happened! Thank you very much to:

Patrick Campbell, Ian McClain, Ola Fredenlund, Matt Ardine, Marek Papke, Eric Gazzillo, Axel Sundbotten, Joe Bleasdale, Parker Langvardt, Alex Mysterio Mueller, Christopher John Bolton, Andy Bates, David Gillett, Charlie Cooper, Tom Bass, Fred Lang, Nhoj Yelnif, Hugh Davies-Webb, Marcus Bayer, Arran Vj-Air, Manny Conde , Joel Adria, Alex Oliszewski, Ruben Laine, Jan Huewel, Majid Younis, Ernst Ziller, Marco Pastuovic, Geoffrey Platt, Ted Pallas, Dale Rehbein, Michael Kohler, Joe Dunkley, John Bulver, Jack Banks, Stuart McGowan, Todd Neville Scrutchfield