How digital television works
This article is intended as a general overview of the technical aspects of digital television distribution. It covers satellite, terrestrial, cable and internet distribution of audiovisual information.
Understanding the system as a whole means covering quite a few concepts, and that requires some technical terms: these will be explained as straightforwardly as possible. The concepts fall into some basic categories:
- digital representation
- analogue to digital conversion
- data compression
- transmission and error correction
- reception and storage
Concept 1: digital representation
Digital, at the most basic level, is the use of just two values. Either the value is 0 or 1. This is unlike most real-world things that can take a wide range of values.
For many millennia human civilisation was quite able to get along using ten digits, no doubt inspired by the collection of fingers and thumbs found to hand. The Romans used letters for numbers (III for three, VII for seven); the ancient Egyptians used sections of an eye symbol. It was only several hundred years ago that 'nothing', written with the familiar ring symbol, came into common use in Europe. This zero allowed just ten basic symbols to represent any number, with each digit's position denoting units, tens, hundreds and so on.
For humans, there is little to be gained by using binary. Although it is quite easy to understand, it is of little everyday use. However, the concept provides computers with their awesome calculation and storage powers.
Just as the number 23 actually means 'two lots of ten' plus 'three lots of one', and 15 means 'one lot of ten' plus 'five lots of one', in binary each column counts lots of (from right to left) 1, 2, 4, 8, 16, 32, 64 and so on, each column's value being twice that of the column to its right. So:
23=16+4+2+1, in binary 10111
15=8+4+2+1, in binary 01111
This would be of limited interest, but making numbers so simple allows very powerful arrays of transistors to process them. In the earliest days this was just four bits at a time (0 to 15), then eight bits (0 to 255), later sixteen bits (0 to 65,535), then 32 bits (0 to 4,294,967,295) and now 64 bits (0 to 18,446,744,073,709,551,615).
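The column idea can be checked with a short Python sketch: it finds each binary digit by repeatedly halving a number and keeping the remainder, and it shows where the ranges above come from (with every column set to 1, n bits hold 2^n - 1). The names `to_binary` and `largest_value` are made up for this illustration; Python's built-in `format(n, "b")` performs the same conversion.

```python
def to_binary(n: int, width: int = 5) -> str:
    """Write a non-negative whole number in binary by repeatedly halving it."""
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder gives the next column, right to left
        n //= 2
    # Reverse so the biggest column comes first, then pad with leading zeros.
    return "".join(reversed(digits)).rjust(width, "0")


def largest_value(bits: int) -> int:
    """The largest number that `bits` binary columns can hold: every column set to 1."""
    return 2 ** bits - 1


print(to_binary(23), to_binary(15))  # 10111 01111
for bits in (4, 8, 16, 32, 64):
    print(f"{bits} bits: 0 to {largest_value(bits):,}")
```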
Representing data in binary form is therefore desirable, as it allows powerful, reliable computers to perform tasks that would be impractical otherwise. This is because, it turns out, it is much more practicable and cost-effective to make something very simple run very fast.
More than just counting numbers can be stored using binary digits: they can be used for other kinds of data. In the 'ASCII' standard, the capital letter A is stored as 01000001.
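The ASCII example can be reproduced in a couple of lines of Python: `ord` gives the character's code, and formatting it as eight binary digits gives the stored bit pattern.

```python
letter = "A"
code = ord(letter)            # the ASCII code for capital A
bits = format(code, "08b")    # eight binary digits, padded with leading zeros
print(code, bits)             # 65 01000001

# Going the other way: read the bits as a number, then look up the character.
print(chr(int(bits, 2)))      # A
```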
Concept 2: analogue to digital conversion
The above examples have all used positive whole numbers (known as integers), but the real world is not always like that. Whilst there are plenty of things we can count (sheep, beans, lamb chops, tins of beans) there are many that we cannot: temperature, distance, weight or brightness.
If you got a group of people together and measured their heights, you would find two things: first, that you would have a wide range of values; and second, that none of them would be exactly a whole number, even if you measured in, say, millimetres. The latter would come down to two factors: how carefully you worked out the value, and how accurate your measuring equipment is.
You might decide to measure everyone with a laser measure and write each value down to the nearest millimetre, rounding up or down. Making this kind of decision turns analogue values (anyone can be any height) into counting values. This process is known as quantization.
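Quantization in the height example amounts to rounding each reading to the nearest step. The readings below are hypothetical; the point is that every value becomes a whole number of millimetres, and the error introduced is never more than half a step.

```python
# Hypothetical laser-measure readings, in millimetres, before rounding.
heights_mm = [1723.4, 1688.9, 1801.6, 1655.2]

# Quantization: replace each analogue reading with the nearest whole millimetre.
quantized = [round(h) for h in heights_mm]
print(quantized)  # [1723, 1689, 1802, 1655]

# The rounding error is never more than half a step (0.5 mm here).
errors = [abs(h - q) for h, q in zip(heights_mm, quantized)]
print(max(errors) <= 0.5)  # True
```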
Turning analogue values into counting values is at the heart of the first process used in digital audiovisual processing: analogue to digital conversion (ADC).
The next element to add is time. By setting a fast and accurate timer, we can use the ADC process to produce a stream of values. A simple form of this takes a mono sound signal and, about 44,000 times a second, makes a value from the current signal level (audio CDs use exactly 44,100 samples a second for each stereo channel).
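A minimal sketch of this timed ADC process, assuming a pure test tone stands in for the microphone signal: every 1/44,000th of a second the 'analogue' level is measured and quantized to a 16-bit whole number. The tone frequency, amplitude and function name are illustrative choices, not part of any standard.

```python
import math

SAMPLE_RATE = 44_000             # samples per second, as in the text
BITS = 16                        # bits per sample (the depth used by audio CDs)
MAX_LEVEL = 2 ** (BITS - 1) - 1  # 32767: the largest positive 16-bit value


def sample_tone(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> list[int]:
    """Measure a pure tone at regular intervals and quantize each measurement."""
    count = round(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(count):
        t = n / SAMPLE_RATE                                      # time of the n-th sample
        level = amplitude * math.sin(2 * math.pi * freq_hz * t)  # the 'analogue' value
        samples.append(round(level * MAX_LEVEL))                 # quantized to a whole number
    return samples


samples = sample_tone(1000, 0.001)  # one millisecond of a 1 kHz tone
print(len(samples))                 # 44
print(samples[:4])
```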
By storing this data and then using the reverse process, digital to analogue conversion (DAC), the original sound is recreated almost perfectly. If you have ever listened to a compact disc (CD), you will be familiar with how well this system works.
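There are limitations: only frequencies below half of the 'sample rate' can be coded this way. At 44,000 samples a second, that means sounds up to about 22,000 Hz, just above the upper limit of human hearing.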
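Why half the sample rate? A tone above that limit produces exactly the same samples as a lower 'impostor' tone, so the receiver cannot tell them apart (this effect is called aliasing). The sketch below compares a 30,000 Hz cosine with a 14,000 Hz one, both sampled 44,000 times a second; the choice of frequencies is just an example.

```python
import math

SAMPLE_RATE = 44_000
print(SAMPLE_RATE / 2)  # 22000.0: the highest frequency that can be coded


def sampled_cosine(freq_hz, count=50):
    """The values an ADC would record for a pure tone of the given frequency."""
    return [math.cos(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]


too_high = sampled_cosine(30_000)                # above half the sample rate
impostor = sampled_cosine(SAMPLE_RATE - 30_000)  # a 14,000 Hz tone
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, impostor))
print(same)  # True: once sampled, the 30 kHz tone is indistinguishable from 14 kHz
```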
Encoding by time ('temporal encoding') is not the only option. A digital picture also uses quantized values, in this case to represent the picture elements (pixels) that were analogue in the real world. In this digital system the values represent red, green and blue levels, arranged in a matrix (a grid).
It is therefore also possible to digitize a moving image. This involves taking 'samples' of a 'digital still' many times a second: usually 24 (for movies), 25 (UK and the EU) or 30 (USA) times a second.
Concept 3: data compression
However, this generates an awful lot of data: a standard definition television picture (720x576) at 25 frames per second (25fps) with 24 bits per pixel (that is 8 bits per colour), plus the stereo audio generates:
(720 x 576 x 24 x 25) + (44,000 x 16 x 2) = 248,832,000 + 1,408,000 = 250,240,000 bits per second
Here we follow the convention of calling 1024 bits one kilobit, and 1024 kilobits one megabit (transmission rates are often quoted using 1000 instead). Using this example, we can see that we would need to transfer 238.6 megabits per second for a digital TV picture. At about thirty times the speed of the fastest broadband connections available when this was written, this is an impracticable amount of data.
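The same arithmetic in Python, so the figures can be checked or changed (for example, for a different picture size). Both the 1024-based megabit used in the text and the 1,000,000-bit megabit common for transmission rates are shown.

```python
WIDTH, HEIGHT, FPS, BITS_PER_PIXEL = 720, 576, 25, 24
AUDIO_RATE, AUDIO_BITS, CHANNELS = 44_000, 16, 2

video_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FPS   # 248,832,000
audio_bps = AUDIO_RATE * AUDIO_BITS * CHANNELS      # 1,408,000
total_bps = video_bps + audio_bps                   # 250,240,000

megabits = total_bps / (1024 * 1024)                # using the 1024-based megabit
print(f"{total_bps:,} bit/s = {megabits:.1f} Mb/s")  # 250,240,000 bit/s = 238.6 Mb/s
print(f"{total_bps / 1e6:.1f} Mb/s with 1,000,000-bit megabits")  # 250.2 Mb/s with 1,000,000-bit megabits
```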
To save space, we need to compress this data. There are two forms of data compression: lossless and lossy.
Lossless compression takes the original data, applies one or more systems of mathematical analysis to it and (hopefully) produces less data, which can then be stored. If that stored data is put through the reverse process, the exact original data is re-created, bit for bit.
This principle is used by file formats such as ZIP, RAR and SIT, which are used to transfer big files between desktop computers.
However, there is a small downside to this type of compression: it is impossible to guarantee the level of compression achieved, as it all depends on the source data. Sometimes you may get very little output data, and sometimes you get as much as you started with (or even slightly more). On the other hand, you can attempt to compress and decompress any type of data using lossless compression: the algorithms do not need to know anything about what the data represents.
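Python's standard `zlib` module (the same 'deflate' method used inside ZIP files) shows both properties: decompression always returns the exact original, but how much is saved depends entirely on the data. Highly repetitive input shrinks dramatically, while random input does not shrink at all. The sample data here is invented for the demonstration.

```python
import os
import zlib

repetitive = b"talking head " * 1000   # 13,000 bytes of highly repetitive data
random_data = os.urandom(13_000)       # 13,000 bytes of random data

for name, data in (("repetitive", repetitive), ("random", random_data)):
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data  # lossless: the exact original comes back
    print(f"{name}: {len(data)} -> {len(packed)} bytes")
```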
If the data is to be broadcast (or, say, streamed on-line) then there is a need to ensure that the amount of data is always reduced, so the compressed data can be transmitted in real time using the available bandwidth.
This calls for the second type of data compression, called 'lossy' compression.
Lossy compression techniques are not general-purpose. They rely on knowing two things: the form of the data being represented, and a little about the 'target device' for the data, namely human beings.
For example, the retina of the human eye has 'rods' and 'cones' packed together. The 'cones', concentrated in the centre, allow us to perceive colour: there are three kinds, responding roughly to red, green and blue light. The 'rods' lie mostly away from the centre and are very sensitive to light, but see only in monochrome. The human brain combines the monochrome and colour information into full-colour pictures.
Knowing this about the human eye provides the simplest form of lossy compression. The original image is converted from red, green, blue format into three corresponding values: hue, saturation and lightness. The first is the colour, the second the amount of that colour and the last the brightness. (MPEG-2 itself uses a closely related scheme called YCbCr: one brightness value, 'luma', plus two colour-difference values.)
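This means that we can now discard some of this data, and we humans will still perceive the image as the same: because our eyes are less sensitive to fine detail in colour than in brightness, the colour components can be stored at a lower resolution than the brightness.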
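Python's standard `colorsys` module can illustrate the conversion between red/green/blue and hue/lightness/saturation. Note that `colorsys` works on values from 0.0 to 1.0 and returns them in the order (hue, lightness, saturation). This sketch shows the HLS representation described in the text, not the YCbCr conversion that an MPEG-2 encoder actually performs; the pixel colour is an arbitrary example.

```python
import colorsys

# colorsys expects values from 0.0 to 1.0 and returns (hue, lightness, saturation).
r, g, b = 200 / 255, 120 / 255, 40 / 255  # a hypothetical orange pixel
h, l, s = colorsys.rgb_to_hls(r, g, b)
print(f"hue={h:.3f} lightness={l:.3f} saturation={s:.3f}")

# Converting back recovers the original red, green and blue levels.
back = colorsys.hls_to_rgb(h, l, s)
print([round(v * 255) for v in back])  # [200, 120, 40]
```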
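The next stage is to take the three image components (hue, saturation and lightness) and break each of them down into 'chessboards' of 8x8 values. Keeping one component at full resolution and reducing the other two to half the resolution in each direction, from our original image we will have: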
- 720x576 (one full-resolution component) → 90 x 72 = 6,480 chessboards
- 360x288 (each of two half-resolution components) → 45 x 36 = 1,620 chessboards, x 2 = 3,240
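The chessboard counts can be checked with a few lines of Python, assuming (as the figures imply) one component at 720x576 and two at 360x288. The same arithmetic shows that halving the colour resolution in both directions halves the total number of values to be compressed.

```python
BOARD = 8  # each chessboard is 8x8 values


def chessboards(width: int, height: int) -> int:
    """How many 8x8 chessboards tile a picture component of this size."""
    return (width // BOARD) * (height // BOARD)


full = chessboards(720, 576)  # the full-resolution component
half = chessboards(360, 288)  # each of the two half-resolution components
total = full + 2 * half
print(full, half, total)      # 6480 1620 9720

# Halving the colour resolution in both directions halves the total number of values.
values_before = 3 * 720 * 576
values_after = 720 * 576 + 2 * 360 * 288
print(values_after / values_before)  # 0.5
```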
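Each of these 9,720 chessboards is an 8x8 matrix of values, ready for compression. There are several stages (this is a simplified description; real MPEG-2 encoders use a mathematical transform called the discrete cosine transform in place of the simple averaging):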
- first, the 'average' value for the whole chessboard is calculated
- next, each value on the board is recalculated by subtracting the average value from it
- then each of these new values (which could be positive or negative) is divided by a 'compression factor' and rounded to a whole number, so that small values become zero
- then the values are read from the chessboard in a special zig-zag pattern
- finally, the zig-zag values are 'run length encoded': because many of the values from the zig-zag 'walk' will be zeros, this achieves good data compression.
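The simplified chessboard steps can be sketched in Python as follows. This follows the document's own simplified scheme (average, subtract, divide and round, zig-zag, run-length encode), not the discrete cosine transform that real MPEG-2 encoders use; the function names and the (zeros skipped, value) pair format for the run-length code are assumptions made for this illustration.

```python
N = 8  # chessboards are 8x8

# Zig-zag reading order: walk the diagonals from the top-left corner, alternating direction.
ZIGZAG = sorted(
    ((r, c) for r in range(N) for c in range(N)),
    key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
)


def run_length_encode(values):
    """Store (number of zeros skipped, next non-zero value) pairs; trailing zeros are dropped."""
    pairs, zeros = [], 0
    for v in values:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    return pairs


def compress_board(board, factor):
    """Average, subtract, divide and round, read in zig-zag order, run-length encode."""
    average = round(sum(map(sum, board)) / (N * N))
    scaled = [[round((v - average) / factor) for v in row] for row in board]
    return average, run_length_encode(scaled[r][c] for r, c in ZIGZAG)


def decompress_board(average, pairs, factor):
    """Reverse the steps; detail lost in the rounding does not come back."""
    values = []
    for zeros, v in pairs:
        values.extend([0] * zeros)
        values.append(v)
    values.extend([0] * (N * N - len(values)))  # restore the dropped trailing zeros
    board = [[0] * N for _ in range(N)]
    for (r, c), v in zip(ZIGZAG, values):
        board[r][c] = average + v * factor
    return board


gradient = [[r * 8 + c for c in range(N)] for r in range(N)]  # a hypothetical smooth board
for factor in (1, 50):
    average, pairs = compress_board(gradient, factor)
    print(factor, len(pairs))  # a larger factor leaves fewer pairs to send
```

With a factor of 1 the board is rebuilt exactly; with a factor of 50 far fewer pairs are sent, and the rebuilt board keeps only a coarse version of the gradient.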
When this data is eventually used to recreate the image, the higher the compression factor, the less detail there will be in the recreated image. A very large factor could reduce each chessboard to just its 'average value' in every square. A low factor will preserve almost all the original detail.
However, choosing the compression factor is awkward, because a fixed amount of output data is needed for transmission. Too much data would not fit in the capacity available for broadcast, but too little data would result in first a blurry and then a blocky image.
Concept 4: Temporal compression
The next compression technique has the marvellous name 'temporal compression'. Under normal circumstances, some or all of one frame of a TV picture will be identical to the previous one. By comparing consecutive frames and identifying those parts that have not changed, the compression system can simply skip these sections. If the picture is mainly static (such as a 'talking head' like a newsreader), the only data that needs to be transmitted is the small sections that have changed.
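A minimal sketch of skipping unchanged sections, assuming frames are simple grids of numbers split into fixed-size blocks: the encoder sends only blocks that differ from the previous frame, and the receiver pastes them over its copy of that frame. Real MPEG-2 encoding is considerably more elaborate; the functions and the tiny 16x16 frames here are purely illustrative.

```python
def split_blocks(frame, size):
    """Yield ((top, left), pixels) for each size x size block of a frame."""
    for top in range(0, len(frame), size):
        for left in range(0, len(frame[0]), size):
            pixels = tuple(tuple(row[left:left + size]) for row in frame[top:top + size])
            yield (top, left), pixels


def encode_changes(previous, current, size=8):
    """Keep only the blocks of `current` that differ from the same place in `previous`."""
    old = dict(split_blocks(previous, size))
    return {pos: px for pos, px in split_blocks(current, size) if old[pos] != px}


def apply_changes(previous, changes):
    """Receiver side: start from the previous frame and paste in the changed blocks."""
    frame = [list(row) for row in previous]
    for (top, left), pixels in changes.items():
        for dy, row in enumerate(pixels):
            frame[top + dy][left:left + len(row)] = row
    return frame


previous = [[0] * 16 for _ in range(16)]  # a hypothetical 16x16 'static' picture
current = [row[:] for row in previous]
current[3][12] = 255                      # one pixel changes (say, a blink)
changes = encode_changes(previous, current)
print(list(changes))                      # [(0, 8)]: only one of four blocks is sent
print(apply_changes(previous, changes) == current)  # True
```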
The only drawback to such a system is that a frame that depends on a previous one cannot be displayed if the previous one was not received, and the viewer does not want to wait for several seconds when 'flicking' between TV channels, or for the picture to 'unjam' after a momentary reception break. For this reason, complete self-contained frames are also sent at regular intervals, typically around every half second.
In many situations a considerable portion of the picture does not change in content between frames, but moves slightly. Dealing with this is the final stage of the MPEG-2 compression system, and the most computationally intense. Having identified those sections of the picture that have remained static between frames, the encoder has to identify which parts of the image have moved, and where they have moved to.
This is a very complicated task! There is an almost infinite number of possible movements. For example, a camera covering a football match may pan horizontally, but a camera following a cricket ball's trajectory has many options.
TV channels can have scrolling graphics, fades and wipes; material can wobble or shake. Objects can move around the screen like a tennis ball. And this can all happen at the same time.
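One simple way of finding where a section has moved, known as full-search block matching, is to try every nearby position in the previous frame and keep the one that differs least (measured here as the sum of absolute differences). This sketch assumes small grey-level frames, a 4x4 block and a search range of 3 pixels, all chosen for illustration; it also shows why the task is so costly, since every candidate position means comparing every pixel of the block.

```python
def sad(a, ay, ax, b, by, bx, size):
    """Sum of absolute differences between a size x size block of `a` and one of `b`."""
    return sum(abs(a[ay + i][ax + j] - b[by + i][bx + j])
               for i in range(size) for j in range(size))


def find_motion(previous, current, y, x, size=4, search=3):
    """Full search: where in `previous` did the block at (y, x) of `current` come from?

    Returns ((dy, dx), cost): the offset from the block's current position to its
    best match in the previous frame, and how different that match still is.
    """
    height, width = len(previous), len(previous[0])
    best, best_cost = (0, 0), sad(current, y, x, previous, y, x, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            py, px = y + dy, x + dx
            if 0 <= py <= height - size and 0 <= px <= width - size:
                cost = sad(current, y, x, previous, py, px, size)
                if cost < best_cost:
                    best, best_cost = (dy, dx), cost
    return best, best_cost


# Hypothetical frames: a 4x4 'object' moves two pixels to the right.
previous = [[0] * 16 for _ in range(16)]
current = [[0] * 16 for _ in range(16)]
for i in range(4):
    for j in range(4):
        previous[4 + i][4 + j] = 10 + 4 * i + j
        current[4 + i][6 + j] = 10 + 4 * i + j
print(find_motion(previous, current, 4, 6))  # ((0, -2), 0)
```

A cost of 0 means the block can be sent simply as "copy the block two pixels to the left in the previous frame", with no picture detail at all.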
The better the encoding software and the more powerful the hardware, the more motion can be detected. The better the detection, the less data capacity is required to describe the moving image, and the more can be allocated to accurately reproducing the detail of those sections that have changed.
You may wonder how effective all this computing is. Using these techniques in combination will reduce the initial 238Mb/s (megabits per second) to as low as 2Mb/s, with higher-quality results at 5Mb/s: a compression ratio of from about 1:50 to 1:120!
Concept 5: Statistical multiplexing and opportunistic data
This effect can be enhanced by using more techniques! On Freeview, for example, each transmission multiplex carries either 18Mb/s or 24Mb/s. By dynamically co-ordinating the 'compression factors' of a number of TV channels together, using 'statistical multiplexing', one or two more channels can be fitted onto the multiplex.
And if there is any capacity left at any time, this is allocated to the interactive text services (for example BBCi) as 'opportunistic data'.
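The idea of sharing a multiplex can be sketched as a simple allocation rule. This is a deliberately simplified assumption, not how any real statistical multiplexer is specified: each channel states how many Mb/s it would like at this moment (the figures below are hypothetical), everyone is scaled back in proportion when the total exceeds the multiplex, and any capacity left over becomes opportunistic data.

```python
def allocate(capacity_mbps, demands_mbps):
    """Share a multiplex between channels according to how much each needs right now.

    If everything fits, each channel gets what it asked for and the rest is spare
    ('opportunistic') capacity; otherwise every channel is scaled back in proportion.
    """
    total = sum(demands_mbps.values())
    if total <= capacity_mbps:
        return dict(demands_mbps), capacity_mbps - total
    scale = capacity_mbps / total
    return {ch: need * scale for ch, need in demands_mbps.items()}, 0.0


# Hypothetical moment-by-moment demands: a news studio is easy, sport is hard.
quiet = {"news": 3.0, "film": 4.5, "sport": 6.0}
busy = {"news": 4.0, "film": 8.0, "sport": 12.0}
print(allocate(24.0, quiet))  # ({'news': 3.0, 'film': 4.5, 'sport': 6.0}, 10.5)
print(allocate(18.0, busy))   # ({'news': 3.0, 'film': 6.0, 'sport': 9.0}, 0.0)
```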
Concept 6: Audio compression
By comparison, the audio data compression is simple!
The "MP3" encoding of sound is, strictly, "MPEG Audio Layer III" (defined in MPEG-1 and extended in MPEG-2); TV broadcasts use related MPEG audio techniques. These use mathematical functions related to the Fourier transform to convert each small section of sound into a number of component waveforms. When these waveforms are recombined, the original sound can be heard.
The audio compression simply prioritizes the information in the sections of sound that humans can hear, and reduces or removes sound information that cannot be heard. As this changes from sample to sample, the compression routines optimize for each one. This produces a constant stream of bits at a given rate which is included alongside the picture information in the "multiplex" (see below).
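The principle of splitting sound into component waveforms and discarding the ones that matter least can be shown with a toy Python sketch. It uses a plain discrete Fourier transform (slower than a fast Fourier transform but giving the same result) and a simple "drop anything much quieter than the loudest component" rule. Real MPEG audio encoders use filter banks and a detailed psychoacoustic model of hearing instead, so treat this only as an illustration of the idea; the signal and threshold are made up.

```python
import cmath
import math


def dft(samples):
    """Break a short section of sound into its component waveforms (a plain DFT)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft(coeffs):
    """Recombine component waveforms into sound samples."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def drop_quiet_components(samples, threshold=0.05):
    """Discard components weaker than `threshold` times the strongest one."""
    coeffs = dft(samples)
    loudest = max(abs(c) for c in coeffs)
    kept = [c if abs(c) >= threshold * loudest else 0 for c in coeffs]
    return inverse_dft(kept), sum(1 for c in kept if c != 0)


# Hypothetical section of sound: a loud tone plus a much quieter one.
N = 64
loud = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
quiet = [0.01 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
mixed = [a + b for a, b in zip(loud, quiet)]

rebuilt, kept = drop_quiet_components(mixed)
print(kept)                                                   # 2
print(max(abs(r - a) for r, a in zip(rebuilt, loud)) < 1e-9)  # True
```

Only two of the 64 components survive (the loud tone appears as a matching pair), yet the rebuilt section differs from the original mix only by the very quiet tone.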
Concept 7: The "transport stream"
It is worth taking a moment to consider the multiplexing process a little more. As we have seen above, the video and audio are highly processed and result in a stream of bits, and there can be many simultaneous audio and video streams to be transmitted together.
The concept of a multiplex has nothing to do with a large cinema, but is a mathematical concept. The actual implementation is quite complex, but the concept is not difficult.
At the "multiplexing" end of the system, there are a number of "data pipes" that carry audio, video and other forms of data. The "other forms" can be the "now and next" information, a full Electronic Programme Guide, subtitles, or the text and still images for an MHEG-5 system (such as BBCi or Digital Teletext).
The encoder takes a little data from each "data pipe" in turn. This amount of data, called a "packet", is the same size for each incoming stream (188 bytes in the MPEG transport stream). Before the packet is sent to be broadcast, it is "addressed" with a number that identifies the data pipe it came from (its packet identifier, or PID).
At the receiver, these packets are received in turn. Whilst it is perfectly possible to decode all the original data pipes, this is not normally required as the user will normally only be able to view one video and listen to one audio channel at a time.
This "demultiplexing" process therefore allows most of the data to be discarded by the receiver, with only one selected video, one selected audio and one selected text being used by the rest of the receiver's circuitry.
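The multiplexing and demultiplexing described above can be sketched in a few lines. The packet size, pipe numbers and contents are hypothetical simplifications: real MPEG transport stream packets are 188 bytes with a 4-byte header carrying a 13-bit PID, and a short final chunk would be padded rather than sent short as it is here.

```python
from itertools import zip_longest

PAYLOAD = 4  # bytes per packet in this toy; real transport stream packets are 188 bytes


def packetize(pid, data):
    """Cut one data pipe into packets of at most PAYLOAD bytes, each addressed with its pipe number."""
    return [(pid, data[i:i + PAYLOAD]) for i in range(0, len(data), PAYLOAD)]


def multiplex(pipes):
    """Take a packet from each data pipe in turn until all pipes are empty."""
    queues = [packetize(pid, data) for pid, data in pipes.items()]
    stream = []
    for turn in zip_longest(*queues):
        stream.extend(packet for packet in turn if packet is not None)
    return stream


def demultiplex(stream, wanted_pid):
    """Receiver: keep only the packets addressed to the wanted pipe, discard the rest."""
    return b"".join(payload for pid, payload in stream if pid == wanted_pid)


# Hypothetical pipe numbers and contents.
pipes = {0x100: b"VIDEOVIDEOVIDEO", 0x101: b"AUDIO", 0x102: b"EPG"}
stream = multiplex(pipes)
print([pid for pid, _ in stream[:3]])  # [256, 257, 258]
print(demultiplex(stream, 0x101))      # b'AUDIO'
```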
In practice, the receiver will also demultiplex and store information that comes from a number of special "data pipes" provided by the broadcaster. This will include EPG information, and a directory of the services included in the broadcast.
For example, this includes service information tables such as the Network Information Table (NIT); between them, these tables list the names of the channels provided, and the packet identifiers for the video (VPID) and audio (APID) of each. This type of information is broadcast on a constant loop, as it is required when a tuner is scanning for channels during set-up, and it allows for the allocation of "logical channel numbers": the numbers you type into the remote control to view a channel.
Channels persist in the NIT when they are off-air, allowing channels that broadcast part time to still be discovered. Radio stations simply have no VPID, with radio and part-time channels relying on an automatically started text service to provide some vision.
As a final note, the term "statistical multiplexing" refers to the multiplexer. In contrast to "time division multiplexing", where each of the incoming data pipes is processed in a "round robin" fashion, each in turn, the "statistical multiplex" processes each pipe in turn but allows "extra goes" for those with the most, or most critical, data: priority is given to video and audio, with the text and EPG services being the least important.
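The difference between the two kinds of multiplexer can be sketched as one "round" of each. The queue lengths and the `extra_goes` table below are hypothetical; a real statistical multiplexer decides these allowances dynamically, from how much data each pipe has waiting and how important it is.

```python
def time_division_round(waiting):
    """Plain round robin: every pipe with data gets exactly one packet per round."""
    sent = []
    for pipe, count in list(waiting.items()):
        if count > 0:
            sent.append(pipe)
            waiting[pipe] = count - 1
    return sent


def statistical_round(waiting, extra_goes):
    """Every pipe still gets a turn, but busy or critical pipes get extra goes."""
    sent = []
    for pipe, count in list(waiting.items()):
        goes = min(count, 1 + extra_goes.get(pipe, 0))
        sent.extend([pipe] * goes)
        waiting[pipe] = count - goes
    return sent


# Hypothetical queue lengths (packets waiting) and priorities.
print(time_division_round({"video": 10, "audio": 3, "epg": 5}))
# ['video', 'audio', 'epg']
print(statistical_round({"video": 10, "audio": 3, "epg": 5}, {"video": 3, "audio": 1}))
# ['video', 'video', 'video', 'video', 'audio', 'audio', 'epg']
```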
Concept 8: Transmission and error correction
Following all the processes above, we have a single data stream. There are three main ways this is broadcast:
- via satellite
- via terrestrial transmitters
- via cable TV
Broadcasting is one-to-many, one-way and synchronous (every receiver gets the same data at the same time). This differs considerably from most digital computer systems, which are usually one-to-one (either client-server or peer-to-peer), bi-directional and (usually) asynchronous. It is for this reason that it has been quite hard to provide TV services on the internet.
To transmit so much data perfectly via satellite, cable and terrestrial means is quite a challenge. Even the most advanced analogue TV with the best connections, dish or aerial will not provide a perfect image 100% of the time.
The digital terrestrial TV transmission system, COFDM (Coded Orthogonal Frequency Division Multiplexing), assumes that the path between the transmitter and the receiver will be less than perfect, and uses a number of further techniques.
The first is "forward error correction" (FEC). This is vital because the transmissions are one-way, so the receiver cannot ask for corrupt data to be resent. The simplest way of providing FEC would be to broadcast every bit several times: with three copies, the receiver can take a majority vote to correct any single corrupted copy. As inefficient as this may sound, adding redundancy of this general kind is what is actually done, although cleverer mathematical techniques mean far less extra data is needed. The FEC used for DVB-T and DVB-S (terrestrial and satellite) is usually quoted as a 'code rate' such as "5/6" or "3/4", meaning that 5 out of every 6, or 3 out of every 4, transmitted bits carry data: the data is effectively sent 1.2 times, or about 1.33 times. A code rate of "1/2" corresponds to sending everything twice.
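The simplest form of FEC, repeating every bit and taking a majority vote, can be sketched as below, together with the arithmetic behind code rates. This repetition code is only an illustration; real DVB systems use far more efficient codes.

```python
from fractions import Fraction


def encode_repeat(bits, copies=3):
    """Simplest possible FEC: transmit every bit `copies` times."""
    return [b for b in bits for _ in range(copies)]


def decode_repeat(received, copies=3):
    """Majority vote over each group of copies corrects a minority of flipped bits."""
    decoded = []
    for i in range(0, len(received), copies):
        group = received[i:i + copies]
        decoded.append(1 if sum(group) > len(group) // 2 else 0)
    return decoded


def transmitted_per_data_bit(code_rate: Fraction) -> Fraction:
    """A code rate of k/n puts k data bits into every n transmitted bits."""
    return 1 / code_rate


data = [1, 0, 1, 1]
sent = encode_repeat(data)
sent[4] ^= 1                        # interference flips one copy of the second bit
print(decode_repeat(sent) == data)  # True
for rate in (Fraction(1, 2), Fraction(3, 4), Fraction(5, 6)):
    print(rate, "->", float(transmitted_per_data_bit(rate)))
```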
Concept 9: COFDM
The next system used is the COFDM itself.
Having added the FEC to the multiplex data, the COFDM transmitter now takes this and splits it across thousands of 'sub-carriers' within an ordinary TV channel's frequency band (DVB-T uses 1,705 sub-carriers in its '2k' mode or 6,817 in its '8k' mode). Each sub-carrier carries a few bits at a time, using one of several schemes: QPSK (Quadrature Phase Shift Keying) has 2x2=4 possible states and carries 2 bits; 16-QAM (quadrature amplitude modulation) has 4x4=16 states and carries 4 bits; and 64-QAM has 8x8=64 states and carries 6 bits. Newer standards such as DVB-T2 also use 16x16=256-QAM, carrying 8 bits.
The more states each sub-carrier uses, the more data can be carried by the transmission. However, increasing the number of states means they are all "spaced closer together", so noise and interference can more easily make one state look like its neighbour. In practice, the denser schemes therefore need a stronger, cleaner signal to be received reliably.
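To illustrate why denser schemes are more fragile, here is a toy 16-QAM mapper and demapper. Each group of 4 bits picks one of 16 points (2 bits for each axis, on levels -3, -1, 1, 3), and the receiver decides which point a noisy symbol is closest to. The Gray-coded bit-to-level table is an assumption for illustration, not the exact DVB-T mapping table.

```python
# Gray-coded levels: 2 bits pick one of four levels, and neighbouring levels differ by one bit.
LEVELS = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
BITS_FOR = {level: bits for bits, level in LEVELS.items()}


def map_16qam(bits):
    """Turn each group of 4 bits into one of 16 points: 2 bits for I (real), 2 for Q (imaginary)."""
    return [complex(LEVELS[(bits[i], bits[i + 1])], LEVELS[(bits[i + 2], bits[i + 3])])
            for i in range(0, len(bits), 4)]


def nearest(x):
    """The allowed level closest to a (possibly noisy) received value."""
    return min(BITS_FOR, key=lambda level: abs(x - level))


def demap_16qam(symbols):
    """Decide which point each received symbol was meant to be, and recover its bits."""
    bits = []
    for s in symbols:
        bits.extend(BITS_FOR[nearest(s.real)] + BITS_FOR[nearest(s.imag)])
    return bits


data = [1, 0, 1, 1, 0, 0, 0, 1]
symbols = map_16qam(data)
print(symbols)                                      # [(3+1j), (-3-1j)]
noisy = [s + complex(0.4, -0.3) for s in symbols]   # small noise: still decoded correctly
print(demap_16qam(noisy) == data)                   # True
print(demap_16qam([symbols[0] - 1.2]) == data[:4])  # False: noise pushed it to the neighbouring point
```

Because neighbouring points are only 2 units apart, any noise of more than 1 unit on either axis causes an error; 64-QAM packs its points closer still for the same signal power.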
To deal with echoes (copies of the signal reflected from hills and buildings, arriving slightly late), each burst of data sent on the sub-carriers, called a 'symbol', is preceded by a short period carrying no new data, called the "guard interval". Echoes of the previous symbol die away during this interval instead of corrupting the next one, and errors caused by interference from analogue transmitters, other digital transmitters or anywhere else can be corrected by the FEC.
Thus, packing more data onto each sub-carrier provides more data capacity, as does shortening the guard interval. But both reduce the reliability of the service.
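These trade-offs can be put into numbers. The sketch below computes the net bit rate of a DVB-T multiplex, assuming the DVB-T '8k' mode in an 8 MHz channel: 6,048 of the 6,817 sub-carriers carry data, the useful part of each symbol lasts 896 microseconds, and an outer Reed-Solomon code keeps 188 of every 204 bytes. With a 1/32 guard interval, 64-QAM at code rate 2/3 and 16-QAM at code rate 3/4 give roughly the 24Mb/s and 18Mb/s multiplexes mentioned earlier.

```python
from fractions import Fraction

# DVB-T "8k" mode in an 8 MHz channel.
DATA_CARRIERS = 6048          # of the 6,817 sub-carriers, these carry payload data
USEFUL_SYMBOL_S = 896e-6      # useful part of each COFDM symbol, in seconds
RS_RATE = Fraction(188, 204)  # outer Reed-Solomon code: 188 data bytes in every 204


def dvbt_bitrate(bits_per_carrier, code_rate, guard_fraction):
    """Net bit rate (bits per second) of a DVB-T multiplex."""
    symbol_s = USEFUL_SYMBOL_S * (1 + guard_fraction)  # the guard interval lengthens each symbol
    bits_per_symbol = DATA_CARRIERS * bits_per_carrier * code_rate * RS_RATE
    return float(bits_per_symbol) / symbol_s


print(f"{dvbt_bitrate(6, Fraction(2, 3), Fraction(1, 32)) / 1e6:.2f} Mb/s")  # 24.13 Mb/s (64-QAM)
print(f"{dvbt_bitrate(4, Fraction(3, 4), Fraction(1, 32)) / 1e6:.2f} Mb/s")  # 18.10 Mb/s (16-QAM)
print(f"{dvbt_bitrate(6, Fraction(2, 3), Fraction(1, 4)) / 1e6:.2f} Mb/s")   # longer guard: less capacity
```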
Concept 10: Reception and storage
The receiver simply has to do all these processes in reverse, so it:
- decodes the COFDM sub-carriers
- uses the FEC to regenerate the multiplex bit stream
- extracts (demultiplexes) the required audio, video, text, subtitle, EPG and information data pipes
- decodes the encoded audio back to sound
- decodes the encoded video back to moving images
- uses the other data for the appropriate service
One useful feature of this system is that the information taken from the multiplex can be stored on a local hard disk drive. The rest of the decoding process can then be completed at any later time to replay the video and audio.
As storage is the most basic of computer processes (no computationally complex encoding is needed), the cost of digital video recorders (also known as Personal Video Recorders, or PVRs) is very low. In addition, as the relevant part of the digital broadcast is stored exactly, replay on these devices is a perfect replay. This compares favourably with analogue recordings on clumsy video tape, which are imperfect to start with and degrade with time and use.