Multimedia data traffic occupies more than 70% of Internet traffic and is still growing. On-demand video is already a major video content platform, and private broadcast is becoming more popular. In addition, virtual reality (VR) and augmented reality (AR) data traffic is increasing very fast. Providing good-quality multimedia service requires a huge amount of resource, because users' service experience is usually proportional to the video rates they can receive. Moreover, variation of the bandwidth also affects the users' experience, while more and more users access the network through wireless links, such as Long Term Evolution (LTE) and Wi-Fi, to view multimedia data on their mobile devices. Therefore, better spectral efficiency during wireless transmission and video rate adaptation that provides better quality to users are in great demand. The multicast system is one of the technologies that can improve spectral efficiency drastically, and Dynamic Adaptive Streaming over HTTP (DASH) is one of the most popular video rate adaptation platforms. In this paper, we investigate state-of-the-art video multicast technologies. LTE supports the multicast service through the evolved Multimedia Broadcast Multicast Service (eMBMS) system, and there are different algorithms to perform video multicast along with adaptive video quality control. The algorithms include procedures to decide the video rates, resource allocations, and user groupings. Moreover, we propose a novel approach to improve the quality of experience for DASH-VR video multicast systems.
Keywords: VR; video multicast; LTE; eMBMS
1 Introduction
The enhanced capabilities of mobile devices and the improved capacities of wireless networks have led to a massive growth in mobile video consumption. A recent report [1] shows that video traffic occupies more than 70% of the whole Internet traffic at peak time, and moreover, half of the video consumers use mobile devices. In addition, virtual reality (VR) and augmented reality (AR) applications are getting more popular, and users can enjoy diverse experiences with them. However, VR/AR applications need more data than conventional video streaming services. As multimedia data traffic increases over wireless networks, efficient utilization of the wireless resources becomes more important for serving more users. Moreover, the wireless channel condition frequently varies with channel environments and user behaviors. MPEG's Dynamic Adaptive Streaming over HTTP (MPEG-DASH) [2] was thus proposed as an effective video streaming platform, which enables adaptive rate selection based on the channel conditions. DASH can provide a superior video experience by giving clients a chance to receive the video quality matching their channel condition and buffer status, resulting in better quality of experience (QoE). Most Internet video service providers, such as Netflix and YouTube, support DASH in their video streaming platforms. DASH has been extended for VR video streaming (DASH-VR), and it supports tiled video rate adaptation and reconstruction of VR videos. In overloaded scenarios, for example, when many people watch popular live videos such as sports events, bandwidth can easily be used up and many people will suffer from delay or low video quality. To overcome this problem, video multicasting can be utilized. LTE allows up to 60% of its spectrum to be used for multicasting or broadcasting, standardized as the evolved Multimedia Broadcast Multicast Service (eMBMS) [3].
The multicast channel (MCH), which delivers eMBMS data, cannot benefit from Hybrid Automatic Repeat-Request (HARQ) retransmission, since the MCH transfers data in radio link control (RLC) unacknowledged mode (UM) [3]; a single user's channel condition cannot represent all users' channel conditions. Besides, it is very inefficient to retransmit many lost packets to user equipment (UE) with poor channel conditions, as this consumes further bandwidth. This situation worsens QoE because the system cannot transmit the appropriate video representations to the subscribed users, resulting in a very high packet loss rate, or in users failing to get a video of sufficient quality even when their channel condition is very good. To overcome this problem in DASH multicasting, the File Delivery over Unidirectional Transport (FLUTE) protocol [4] was introduced by the Internet Engineering Task Force (IETF) for unidirectional data transfer over the Internet. To combat packet loss, FLUTE adds redundant packets that help recover lost packets, which is done by forward error correction (FEC) [5]. Moreover, if FEC is not enough to recover all the lost packets, DASH clients can request packet recovery through reliable TCP unicast transmission [6].
Combining FLUTE and LTE eMBMS makes DASH multicasting possible with adaptive video quality control; however, it introduces more complexity to the system. Since there are multiple copies of the video with different rates, the system has to choose which video rates to schedule based on the channel information and the users' requests. FLUTE sessions have to add redundant APP-layer FEC packets to protect the video data without losing efficiency. Moreover, resources for each FLUTE session must be allocated in the orthogonal frequency-division multiple access (OFDMA) frames, and PHY-layer modulation and coding schemes (MCS) for the chosen resource blocks also need to be selected for reliable communications. The complexity of finding the optimal solution increases exponentially with the number of users and/or the number of videos. Moreover, the channel condition changes frequently, which makes it even more difficult to optimize the whole system in real time. DASH- or scalable video coding (SVC)-based multicasting algorithms have been introduced to efficiently solve this problem and give more users better video quality [7]. Park et al. [8] show that the total utility can be improved and more users can watch better video by using DASH multicast over LTE. Their algorithm allocates one video representation per multicast video session; therefore, each resource allocation to a video session corresponds to a single video quality. However, in the case of tiled VR videos, multiple tiles share the resource, and many combinations of tiles with different representations can be allocated in a single multicast video session. Therefore, the video quality depends not only on the allocated resource but also on the tile-based rate-selection algorithm. In this paper, a new approach is proposed to allocate DASH-VR video on LTE eMBMS systems.
The remainder of this paper is organized as follows. Section 2 summarizes the related works. Section 3 introduces the existing DASH multicasting systems and algorithms. Section 4 proposes a new approach to DASH-VR multicast systems, followed by the conclusion in Section 5.
2 Related Works
2.1 LTE eMBMS
LTE supports multicasting of video streams through the eMBMS [3] system (Fig. 1), in which the broadcast multicast service center (BMSC) is responsible for managing multicast sessions. It provides membership, session and transmission, proxy and transport, service announcement, security, and content synchronization. An MBMS gateway (MBMS-GW) distributes the video data to eNBs and performs session control signaling towards the mobility management entity (MME). Multi-cell/multicast coordination entities (MCEs) are part of eNBs and provide admission control; they allocate the radio resource for multicast sessions and decide the MCS. Multiple video multicasting sessions can thus be created, and users can subscribe to several of those sessions at the same time.
The physical layer of the LTE downlink is based on OFDMA technology, and the basic resource unit in the LTE system is a physical resource block (RB), which has 180 kHz bandwidth with 12 subcarriers and 7 symbols [9]. Within an RB, the same MCS is applied to all subcarriers. Therefore, once the MCS of an RB is fixed, there is a corresponding number of bits that one RB can carry, which is
c_MCS = 12 subcarriers × 7 symbols × efficiency.    (1)
Table 1 shows the MCS along with the efficiency [10] for various channel quality indication (CQI) indices. In this paper, we use the CQI index as an MCS index for notational convenience. Using the information in the table, we can find the expected data rate once we know how many RBs are allocated to the FLUTE sessions.
2.2 File Delivery over Unidirectional Transport
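As a concrete sketch, the per-RB capacity of Eq. (1) and the resulting session data rate can be computed as follows. The efficiency values and the 0.5 ms slot duration are illustrative assumptions; the exact efficiency figures are those listed in Table 1 [10].

```python
# Bits one resource block (RB) can carry at a given CQI/MCS, per Eq. (1):
# c_MCS = 12 subcarriers x 7 symbols x efficiency (bits per symbol).
# Efficiency values below are illustrative; see Table 1 / [10] for exact ones.
CQI_EFFICIENCY = {1: 0.1523, 7: 1.4766, 15: 5.5547}

def bits_per_rb(cqi: int) -> float:
    """Capacity of one RB (12 subcarriers x 7 symbols) at the given CQI index."""
    return 12 * 7 * CQI_EFFICIENCY[cqi]

def session_rate_bps(n_rb: int, cqi: int, slots_per_second: int = 2000) -> float:
    """Expected data rate of a FLUTE session allocated n_rb RBs per slot,
    assuming 0.5 ms slots (2000 slots per second)."""
    return n_rb * bits_per_rb(cqi) * slots_per_second
```

With these assumptions, ten RBs at the highest CQI deliver roughly 9.3 Mbit/s, while the same allocation at CQI 1 delivers only a fraction of that, which is why the MCS choice dominates the achievable video rate.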
The FLUTE protocol was proposed by the IETF [4] to multicast a file over networks using User Datagram Protocol (UDP)-based protocols, with application-layer forward error correction (AL-FEC) provided to protect the file from packet losses. Additional file repair procedures are allowed through HTTP file repair requests. A file repair response message consists of an HTTP header and a file repair payload; the header can also indicate that point-to-multipoint repair, instead of point-to-point repair, is used.
2.3 Application?Layer Forward Error Correction
The radio channel conditions vary among all the users receiving the multicast service. Therefore, the block error rate of the users that receive the video service delivered with a single MCS may have a great variance. In order to increase the robustness and reliability of multicast transmissions, FEC redundancy packets are incorporated at the APP?layer [11].
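The block-code bookkeeping behind this APP-layer protection can be sketched in a few lines. This is an idealized (MDS) model in which any K of the N packets suffice; a practical Raptor code needs a small reception overhead on top of K.

```python
def fec_params(n_total: int, k_source: int) -> dict:
    """An FEC block has n_total packets, k_source of which are source packets.
    The code rate is K/N; an ideal decoder tolerates t = N - K erasures."""
    assert 0 < k_source <= n_total
    return {"code_rate": k_source / n_total, "t": n_total - k_source}

def recoverable(received: int, k_source: int) -> bool:
    """Ideally, any k_source received packets recover the whole block."""
    return received >= k_source
```

A lower code rate (more redundancy) tolerates more losses at the cost of bandwidth, which is exactly the trade-off the FEC code rate selection in later sections navigates.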
The solution proposed by 3GPP to deliver video streaming over eMBMS uses the FLUTE protocol with UDP transport to send video segments with the corresponding APP-layer FEC over multicast. An FEC block consists of N packets, including K source packets and N−K redundancy packets, resulting in the encoding rate K/N. The FEC decoder can ideally recover the original K source packets from any K out of the N received packets, with correction capability t = N−K. The Reed-Solomon (RS) code [12] is a well-known FEC code which operates on non-binary symbols and has the ideal correction capability. However, the RS code has a high decoding complexity because of its non-binary operations, which is not suitable for high-definition (HD) video streaming applications. The Raptor code [13] is a more attractive solution for HD video streaming services due to its flexible parameter selection and linear decoding cost. The correction capability of a Raptor code is t = N−(1+ε)K, where ε is the reception overhead. The correction capability of the Raptor code is thus sub-optimal; however, a standardized Raptor code can closely achieve the ideal correction capability with negligible ε. Therefore, it is used in our scheme. In this paper, the FEC block size is fixed as N, and the number of source blocks K_m is determined to choose the appropriate FEC code rate K_m/N for the m-th FLUTE session. Fig. 2 shows an example of FEC blocks for two multicasting groups sharing the resource with different FEC code rates. n_RB,1 and n_RB,2 are the numbers of resource blocks allocated in an OFDMA frame for groups 1 and 2, respectively. There are N OFDMA frames in total; K1 and K2 of them carry video data for the respective groups, and the others carry redundant data.
2.4 Tiled VR Video-Streaming Systems
In a tiled video scheme, the 360-degree video is divided into smaller tiles, which can be encoded independently [14]. There can be multiple copies of the same tile with different representation qualities, and these tiles are transmitted through the wireless channel. DASH extends its standard to cover tiled 360-degree videos, i.e., DASH-VR [15]. DASH-VR includes a virtual reality video descriptor (VRD) and a spatial relationship descriptor (SRD), in addition to the media presentation descriptor (MPD), to describe the projection types and spatial relationships among tiles. The VRD contains the projection format and orientation information, the SRD includes the region-wise quality of rectangular videos within the projected frame, and the MPD includes the sizes of video chunks, the locations of video files, and the codec information. As clients join the multicast system, the MPD, VRD, and SRD are provided to each client, and the client can reconstruct the VR video from the received tiles based on the received descriptor information.
A DASH multicast system [16] was introduced to efficiently utilize the limited resource and provide better videos to the users. The DASH multicast system allocates multiple copies of the same video with different quality to satisfy more users, but it inevitably generates redundant data that decreases the spectral efficiency. Especially in the case of VR videos, most of the area is not visible to users. Therefore, more redundant data than for conventional video is transmitted if we directly use DASH multicast for VR video dissemination. To be more efficient, redundant data should be removed, and tiled video [17] allows us to flexibly remove, or allocate fewer bits to, the redundant parts of the video. For example, the necessary parts of the video are transmitted as multiple copies with different quality to satisfy users with good channel quality, while the parts with a lower probability of view are transmitted just once with a single quality to save spectrum.
The most popular and promising technology for controlling the regional quality of a video is the use of tiled videos, which has been used for panoramic interactive video [18], since an interactive video can change its view and users cannot see the whole video at once. The VR video is divided into smaller rectangular videos (tiles), and each tile is encoded independently using legacy video encoders. Every tile has multiple copies with different encoding rates. Different representations of the tiles are transmitted as the users' viewports and the network channel conditions change. There are simple rate allocation algorithms for tiled videos: Binary, Thumbnail, and Pyramid [19]. Binary allocates the higher representations to the visible tiles, while non-visible tiles get the lowest representations to save bandwidth. It is the most efficient way to allocate the bits, but users can easily see the lowest quality when they move their viewport, since the network needs time to respond to viewport changes. Thumbnail allocates the minimum bits of the lowest representations for the whole video as a background video, and the remaining bits are allocated to the visible tiles with better representations. However, users can still see the lowest-quality background video when they move their viewport faster than the network latency. The Pyramid algorithm allocates the best representations to the visible tiles and gradually lowers the representations as the tiles are located farther from the viewport. However, these rate allocation algorithms are not network-aware and are not flexible enough to provide the best quality to users under variable network channel conditions and viewport movement.
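The Binary and Pyramid heuristics above can be sketched as follows; the tile indexing and the representation scale (0 = lowest) are hypothetical conventions for illustration, not taken from [19].

```python
def binary_alloc(tiles, visible, max_rep):
    """Binary: visible tiles get the highest representation, others the lowest."""
    return {t: (max_rep if t in visible else 0) for t in tiles}

def pyramid_alloc(tiles, dist, max_rep):
    """Pyramid: best representation at the viewport, lowered with tile distance.
    dist[t] is the (hypothetical) tile distance from the viewport center."""
    return {t: max(0, max_rep - dist[t]) for t in tiles}
```

Binary spends nothing outside the viewport and so degrades worst on fast head motion; Pyramid pays a graded insurance premium on surrounding tiles instead.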
Alface et al. [20] propose a rate-selection algorithm that provides the best quality to users, with a higher representation in the viewport and lower representations in the other tiles. The algorithm allocates the video rates to the tiles based on utility-over-cost ratios, where the utility includes the video bitrate and the probability of view. Since it allocates the best representations to tiles in order to maximize the total utility as long as there is available resource, the algorithm can achieve the best utility performance compared to other existing solutions.
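A utility-over-cost greedy selection in the spirit of [20] can be sketched as follows. The option tuples and the one-representation-per-tile rule are simplifying assumptions for illustration, not the authors' exact formulation.

```python
def greedy_rate_selection(options, budget):
    """Pick at most one representation per tile, highest utility/cost ratio
    first, while the total cost stays within the resource budget.
    options: iterable of (tile, representation, utility, cost) tuples."""
    chosen, spent = {}, 0.0
    for tile, rep, utility, cost in sorted(options, key=lambda o: o[2] / o[3], reverse=True):
        if tile not in chosen and spent + cost <= budget:
            chosen[tile] = rep
            spent += cost
    return chosen
```

The ratio ordering is what lets a high-probability-of-view tile win an upgrade over a cheaper but rarely viewed one.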
However, none of the existing algorithms are directly applicable to the multicasting scenario. A new approach to perform the VR video multicasting is proposed in this paper.
3 DASH Multicast
Heuristic algorithms have been introduced to solve the video multicasting problem, and they are differentiated by the type of video source. SVC [7] videos were originally used for video multicasting systems because of their layer-dependent characteristics. More enhancement video layers can be combined with the base layer to create higher-quality video for the users who have good channel conditions, while users with poorer channel quality can only receive fewer enhancement video layers coded with a more reliable but less efficient MCS. For DASH systems, videos are usually encoded at multiple different video rates and stored at the server as small chunks, and they are transmitted to the clients who request them. Therefore, different video representations are independent of each other and can be scheduled independently for multicasting. DASH can also transmit SVC-type video sources; but, in this paper, for notational convenience, DASH denotes only multiple video rates without dependencies among representations, and SVC denotes layered video with dependencies among layers. There have been studies on multicasting videos over wireless networks. Chen et al. [21] consider fair and optimal resource allocation for LTE multicast. They also consider unicast for some users with lower SNR, without considering FEC for packet protection. Belda et al. [22] introduce a hybrid FLUTE/DASH video delivery system, which can multicast the video through FLUTE sessions and send repair requests through the unicast channel to recover lost packets. They fix the FEC code rates of the FLUTE sessions and provide simulation results to show that hybrid video delivery can improve the video quality compared to video delivery systems using unicast only. Nonetheless, in their approach, the FEC code rates and the resource allocation for multiple FLUTE sessions are not jointly optimized, which may create repair requests through the unicast channel.
Our research starts with the assumption that if we can jointly find the optimal resource allocation and FEC code rate selection, we can transmit the videos without repair requests, which would call for additional bandwidth. Our goal in this research is to jointly find the optimal resource allocations, MCS, and FEC code rates for multiple FLUTE sessions so as to efficiently serve the DASH clients in LTE networks without unicast channels for repairing the lost packets.
SVC-based video multicasting algorithms have been previously proposed, e.g., the Conservative Multicasting Scheme (CMS) [23], Opportunistic Layered Multicasting (OLM) [24], Multicast Subgrouping for Multi-Layer (MSML) video applications [25], the Median Quality Scheme (MQS) [26], the Median User Scheme (MUS) [27], etc. These heuristic algorithms describe how to divide multiple users into several multicast groups (each group corresponds to one SVC layer) and how to select the proper resource, MCS, and FEC code rates for the different groups based on the channel quality feedback from the users within the same group. More specifically, CMS [23] first allocates each sub-channel to a group of users in the multicast session based on their sub-channel gains; then a greedy algorithm is adopted for resource allocation to achieve proportional fairness among sessions. OLM [24] can choose a more aggressive MCS to achieve higher spectral efficiency and protect lost packets by using FEC for each group. On the other hand, MSML [25] utilizes frequency diversity to achieve better throughput than the other schemes. For example, a user with a very low average SNR can still have some RBs with high channel gains, and MSML utilizes these RBs to schedule the lower video layers. Since MSML can choose the best RBs for each multicast group, it can select a more efficient MCS than those chosen by conservative schemes constrained by the users with the lowest channel quality, such as the less spectrally efficient CMS. MUS and MQS choose the subgroups based on the number of users and the quality of the channel, respectively; they can achieve better spectral efficiency than CMS, but less than OLM and MSML. SVC multicasting and DASH multicasting differ in how users receive the video. Fig. 3 shows the difference between SVC and DASH multicasting systems.
Users receiving SVC-type videos must receive the base layer and can improve the quality of the video by receiving multiple enhancement layers; therefore, these users need to subscribe to multiple multicast sessions. In contrast, the sources of DASH multicasting systems are independent videos, so users only need to select one multicast session and receive a single video representation to see the video. Since DASH multicasting inevitably generates some redundant data, some resource usage may be wasted. However, Park et al. [8] propose the optimal DASH multicast (ODM) algorithms and show that optimal resource allocation, video rate selection, and user grouping can take advantage of multicasting. Therefore, DASH multicasting methods can achieve better utility and provide better video quality to the users. Fig. 4 shows the utility performance (a, c) and the spectral efficiency (b, d) when DASH- and SVC-type video sources are used, respectively. It can be seen that the proposed ODM achieves the best utility and spectral efficiency performance compared to OLM, MSML, and fixed FEC code rate methods.
4 DASH?VR Multicast
VR is getting more popular these days, and more people can enjoy realistic experiences with VR systems [28]. VR allows people to look around a virtual world and feel as if they were actually in the environment, and VR gaming can provide a more exciting experience to gamers. However, it is more challenging to satisfy users with the quality of VR videos, because VR videos need a much higher resolution than conventional videos. Users cannot see the whole video at the same time; they can only focus on the area they want to see, which is usually only about 20% of the whole video [29]. Therefore, 4 to 6 times higher resolution is required for VR videos to provide the same experience as conventional videos. On the other hand, this fact allows bandwidth savings, because 80% of the video is unseen by the user at any given time. In the ideal case, we could save 80% of the bandwidth; in practice, we still need to transmit redundant areas of the video because it is difficult to predict how a user's viewport will change.
The original DASH system lets the clients do the video rate adaptation, but individual rate adaptation is difficult in multicast systems because the users grouped together share the same spectrum resource and receive the same video rate even though they have different channel quality. Rather than doing individual rate adaptation, the server can adaptively choose the video rates of the tiles to maximize the expected total utility of the users. Feedback about the users' viewports can be used to decide which tiles should have better video rates to satisfy more users. Another way to decide which tiles are more important than others is to analyze the video at the server side: the server can analyze the video content first and then decide which part may be of higher interest to users. The saliency [30] of the video is one useful indicator for finding the important areas of the video, so we can give more bits to the areas with higher saliency to satisfy more users. Saliency detection algorithms usually find the areas that have high contrast or active movement in the video [31], because those areas usually have richer or more appealing information, such as important texture or moving objects. By using the saliency information, the server can allocate more bits to the areas with higher saliency scores to make them clearer. There are two possible ways to do VR video multicasting; multicasting is characterized by grouping the users to share the same resource. First, the users with the same view can be grouped into a multicast group, so that the number of multicast groups equals the number of views [32]. This can save some resource by sharing the same view among many users, but it cannot take advantage of the multicast scheme when users have different channel quality.
All the multicast groups will then suffer from the users with very bad channel quality. Moreover, all the users eventually need to receive all the tiles, because there is latency between the server and the client which is difficult to overcome. Second, users can be grouped by their channel quality. This grouping strategy helps to select a more efficient MCS and AL-FEC code rate to allocate better video. As the number of users who can join the group with better video increases, the total utility also improves. Therefore, we have designed the multicast system based on the second scheme, which groups the users by their channel quality.
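Grouping users by channel quality, as in the second scheme, can be sketched with CQI thresholds; the threshold values and group indexing here are hypothetical.

```python
def group_users_by_cqi(user_cqi, thresholds):
    """Partition users into multicast groups by reported CQI.
    thresholds: ascending CQI cut-offs; a user joins group i, where i is the
    number of thresholds its CQI meets, so higher-indexed groups can decode
    the sessions coded with more aggressive MCS."""
    groups = [[] for _ in range(len(thresholds) + 1)]
    for user, cqi in user_cqi.items():
        groups[sum(cqi >= t for t in thresholds)].append(user)
    return groups
```

Each group then maps to the set of multicast sessions whose MCS and AL-FEC its worst member can still decode.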
The clients in a DASH-VR multicast system request video chunks from the server based on the MPD, SRD, and VRD information, and the DASH server starts to deliver the tiled-video data. The BM-SC creates the multiple video sessions that deliver the tiled videos with multiple video representations; it is also responsible for adding AL-FEC redundancy blocks for lost packet recovery. Multiple video multicast sessions are created to deliver multiple VR videos and multiple video representations to different user groups; a video session can contain a single tile or multiple tiles. The MBMS-GW passes the video data to the eNBs, and the MCE allocates the resource for the video sessions and assigns the proper MCS for that resource. Users participate in video sessions, and the users who can participate in multiple video sessions have the chance to choose better representations. The eNB receives CQI feedback from the UEs to help allocate RBs and choose the AL-FEC code rate and MCS for the multicast sessions.
The difference between a multicast session and a multicast group is that a multicast session denotes a video session that uses the radio resource controlled by the MCE, while a multicast group denotes a set of users grouped by their channel conditions and subscribing to the same video. Note that users can subscribe to multiple multicast sessions at the same time; therefore, the number of multicast sessions and the number of multicast groups are not necessarily the same. The multicast groups are arranged based on the channel condition, and the user groups with high channel quality can take advantage of subscribing to multiple multicast sessions. We can consider two different ways to create video multicast sessions. One is per-tile multicasting (PTM), which considers the tiles as independent videos: each tile has its own resource, and every UE subscribes to all necessary sessions to regenerate the VR video. It needs to create as many multicast sessions as the number of tiles times the number of representations for a single VR video content. All possible video representations of all tiles are available to the users based on their channel quality, and the users regenerate the VR video with the best-quality tiles they can decode. For example, if there are T tiles and M representations for each tile, a total of T×M multicast sessions can be created, and the MCS, AL-FEC, and resources for all multicast sessions have to be determined to maximize the total utility. The search space for finding the optimal solution is M^T. Each user selects one representation for each tile and subscribes to T multicast sessions to regenerate the VR video. This generates too much control signaling, and the complexity of the solution increases with the number of multicast sessions.
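The PTM session and search-space counts can be made explicit; with T tiles and M representations per tile:

```python
def ptm_sessions(T: int, M: int) -> int:
    """PTM creates one multicast session per (tile, representation) pair."""
    return T * M

def ptm_search_space(T: int, M: int) -> int:
    """One representation must be chosen for each of T tiles: M**T combinations."""
    return M ** T
```

Even a modest tiling of T = 8 and M = 4 already yields 4^8 = 65 536 candidate configurations, which illustrates why optimizing PTM quickly becomes intractable.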
The other is multi-session multicasting (MSM), which creates the same number of multicast sessions as the number of user groups. Each multicast session includes multiple tiles with different quality. Fig. 5 shows an example of an MSM system with 3 groups and 3 multicast sessions. Fig. 5a shows the tiled video encoded with multiple representations: every tile has multiple copies with different representations (qualities), generated by a legacy video encoder with different quantization parameters (QP). Higher representations indicate better quality and need more bandwidth to be transmitted. Fig. 5b shows the rate selection results for the multiple multicast sessions. The first multicast session carries all the tiles at lower representations to guarantee that every user requesting the VR video receives at least a low-quality video. The second and third multicast sessions do not need to carry all the tiles; they allocate higher representations to improve the quality of the tiles for the users with better channel quality. Therefore, they are allocated wireless resource with more efficient MCS and AL-FEC code rates (Fig. 5c). The users can subscribe to multiple multicast sessions at the same time, but their channel quality should be good enough to decode the data packets assigned the corresponding MCS and AL-FEC. In Fig. 5e, user group 1 can only receive the data in multicast session 1, while user group 2 can receive multicast sessions 1 and 2, and user group 3 can receive all three multicast sessions. Therefore, user groups 2 and 3 have the chance to choose better representations from the multiple representations they can receive. Since an MSM multicast session includes multiple tiles, MSM creates fewer multicast sessions than PTM. Another advantage of MSM is that it can use the existing rate selection algorithms introduced in Section 2.4.
The rate selection algorithm [21] can then be used to allocate the tiles of different representations subject to the bit rate constraint of each multicast session. The bit rate constraints of the multicast sessions are determined by the resource allocated to the multicast sessions, the MCS, and the AL-FEC code rates.
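The per-session budget and a toy MSM allocation can be sketched as follows. Both the budget formula and the upgrade rule are minimal illustrative assumptions, not the exact algorithm used in the cited works.

```python
def session_budget_bits(n_rb: int, rb_capacity: float, fec_rate: float) -> float:
    """Usable video bits in a session: allocated RB capacity scaled by the
    AL-FEC code rate K/N (only K of the N packets carry source data)."""
    return n_rb * rb_capacity * fec_rate

def msm_fill(session_budgets, tile_sizes):
    """Toy MSM allocation: session i tries to carry each tile at representation i
    (capped at the highest available), skipping tiles that exceed its budget.
    tile_sizes[t] is a list of bit sizes by ascending representation quality."""
    plans = []
    for i, budget in enumerate(session_budgets):
        plan, spent = {}, 0
        for t, reps in tile_sizes.items():
            r = min(i, len(reps) - 1)
            if spent + reps[r] <= budget:
                plan[t] = r
                spent += reps[r]
        plans.append(plan)
    return plans
```

Session 1 (index 0) carries every tile at the base quality, matching the guarantee described for Fig. 5b, while later sessions spend their larger budgets on upgrades for the better-channel groups.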
5 Conclusions
In this paper, we presented an overview of several wireless video multicasting systems and algorithms. SVC-based video multicasting systems were introduced first, but DASH is getting more popular. DASH multicasting can take advantage of allocating multiple copies with different quality to allow users to select the appropriate video quality, which improves the utility performance of the system. Wireless multicasting systems, such as LTE eMBMS, can deliver VR video more efficiently when combined with tiled-video rate adaptation. We propose the MSM system, which allocates multiple tiles in a single multicast session and generates multiple multicast sessions to provide sets of tiles with different representations. The proposed tiled video multicasting scheme uses the limited wireless resource more efficiently than other VR multicasting schemes. The optimal resource allocation, MCS, and AL-FEC code rate selection for multicast sessions to improve DASH-VR multicasting systems are left as future work.
VR; video multicast; LTE; eMBMS
1 Introduction
he enhanced capabilities of mobile devices and the improved capacities of wireless networks have led to a massive growth in mobile video consumption. A recent report [1] shows that the video traffic occupies more than 70% of the whole Internet traffic in peak time, and moreover, half of the video consumers use mobile devices. Moreover, virtual reality (VR) and augmented reality (AR) applications are getting more popular and users can enjoy diverse experience with them. However, VR/AR applications need more data than conventional video streaming services. As multimedia data traffic is increasing over wireless networks, efficient utilization of the wireless resources is getting more important for serving more users. Moreover, wireless channel condition frequently varies with channel environments and user behaviors. MPEG’s Dynamic Adaptive Streaming over HTTP (MPEG?DASH) [2] is thus proposed as an effective video streaming platform, which enables the adaptive rate selection based on the channel conditions. DASH can provide superior video experience by giving clients a chance to receive the video quality based on their channel condition and buffer status, resulting in better quality of experience (QoE). Most of Internet video service providers, such as Netflix and Youtube, support DASH in their video streaming platforms. DASH is extended for VR video streaming (DASH?VR) and it supports tiled video rate adaptations and reconstruction of VR videos. With the overloaded scenarios, for example, many people watch the popular live videos such as sports events, bandwidth can be easily used up and many people will suffer from delay or low video quality. To overcome the problem, video multicasting can be utilized. LTE allows using their spectrum for multicasting or broadcasting up to 60% of the spectrum and it is standardized as evolved Multimedia Broadcast Multicast Service (eMBMS) [3]. 
The multicast channel (MCH), which delivers eMBMS data, cannot benefit from Hybrid Automatic Repeat-Request (HARQ) or retransmission, since the MCH transfers data in radio link control (RLC) unacknowledged mode (UM) [3]: a single user's channel condition cannot represent all users' channel conditions. Besides, it is very inefficient to retransmit many lost packets to user equipment (UE) with poor channel conditions, which consumes further bandwidth. This degrades QoE because the system cannot transmit appropriate video representations to the subscribed users, resulting in a very high packet loss rate, or in users failing to receive sufficient video quality even when their channel condition is very good. To overcome this problem for DASH multicasting, the File Delivery over Unidirectional Transport (FLUTE) [4] protocol was introduced by the Internet Engineering Task Force (IETF) for unidirectional data transfer over the Internet. To combat packet loss, FLUTE adds redundant packets that help recover lost packets through forward error correction (FEC) [5]. Moreover, if FEC is not enough to recover all lost packets, DASH clients can request packet recovery through reliable TCP unicast transmission [6].
Combining FLUTE with LTE eMBMS makes DASH multicasting possible with adaptive video quality control; however, it also adds complexity to the system. Since there are multiple copies of the video with different rates, the system has to choose which video rates to schedule based on the channel information and users' requests. FLUTE sessions have to add redundant APP-layer FEC packets to protect the video data without losing efficiency. Moreover, resources for each FLUTE session must be allocated in the orthogonal frequency-division multiple access (OFDMA) frames, and PHY-layer modulation and coding schemes (MCS) for the chosen resource blocks also need to be selected for reliable communication. The complexity of finding the optimal solution increases exponentially with the number of users and/or videos. Moreover, the channel condition changes frequently, which makes it even harder to optimize the whole system in real time. DASH- or scalable video coding (SVC)-based multicasting algorithms have been introduced to efficiently solve this problem and give more users better video quality [7]. Park et al. [8] show that the total utility can be improved and more users can watch better video by using DASH multicast over LTE. Their algorithm allocates one video representation per multicast video session; therefore, the video quality corresponds directly to the resource allocated to the session. However, in the case of tiled VR videos, multiple tiles share the resource, and many combinations of tiles with different representations can be allocated in a single multicast video session. Therefore, the video quality depends not only on the allocated resource but also on the tile-based rate-selection algorithm. In this paper, a new approach is proposed to allocate DASH-VR video on LTE eMBMS systems.
The remainder of this paper is organized as follows. Section 2 summarizes the related works. Section 3 introduces existing DASH multicasting systems and algorithms. Section 4 proposes a new approach to DASH-VR multicast systems, followed by the conclusion in Section 5.
2 Related Works
2.1 LTE eMBMS
LTE supports multicasting of video streams through the eMBMS [3] system (Fig. 1), in which the broadcast multicast service center (BM-SC) is responsible for managing multicast sessions. It provides membership, session and transmission, proxy and transport, service announcement, security, and content synchronization. An MBMS gateway (MBMS-GW) distributes the video data to eNBs and performs session control signaling towards the mobility management entity (MME). Multi-cell/multicast coordination entities (MCEs) are part of eNBs and provide admission control; they allocate the radio resources for multicast sessions and decide the MCS. Multiple video multicast sessions can thus be created, and users can subscribe to several of those sessions at the same time.
The physical layer of the LTE downlink is based on OFDMA technology, and the basic resource unit in the LTE system is a physical resource block (RB), which has a 180 kHz bandwidth with 12 subcarriers and 7 symbols [9]. Within an RB, the same MCS is applied to all subcarriers. Therefore, once the MCS of an RB is defined, there is a corresponding number of bits that one RB can carry, which is
c_MCS = 12 subcarriers × 7 symbols × efficiency. (1)
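As a concrete illustration, Eq. (1) can be sketched in code. The efficiency values below follow the usual LTE CQI table, but both the exact numbers and the helper names are illustrative assumptions and should be checked against Table 1 and [10].

```python
# Illustrative sketch of Eq. (1): bits carried by one resource block for a
# given CQI index, c_MCS = 12 subcarriers x 7 symbols x efficiency.
# The efficiency values (bits per symbol) are assumptions taken from the
# common LTE CQI table, not normative values.

CQI_EFFICIENCY = {
    1: 0.1523, 2: 0.2344, 3: 0.3770, 4: 0.6016, 5: 0.8770,
    6: 1.1758, 7: 1.4766, 8: 1.9141, 9: 2.4063, 10: 2.7305,
    11: 3.3223, 12: 3.9023, 13: 4.5234, 14: 5.1152, 15: 5.5547,
}

def bits_per_rb(cqi: int) -> float:
    """c_MCS: bits one RB can carry = 12 subcarriers x 7 symbols x efficiency."""
    return 12 * 7 * CQI_EFFICIENCY[cqi]
```

With the CQI known, multiplying `bits_per_rb` by the number of RBs allocated to a FLUTE session gives that session's expected capacity per scheduling unit.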
Table 1 shows the MCS along with the efficiency [10] for various channel quality indication (CQI) indices. In this paper, we use the CQI index as an MCS index for notational convenience. Using the information in the table, we can find the expected data rate when we know how many RBs are allocated to the FLUTE sessions.
2.2 File Delivery over Unidirectional Transport
The FLUTE protocol was proposed by the IETF [4] to multicast a file over networks using User Datagram Protocol (UDP)-based protocols, with application-layer forward error correction (AL-FEC) protecting the file from packet losses. Additional file repair procedures are allowed through HTTP file repair requests. A file repair response message consists of an HTTP header and a file repair payload; the header can also inform the client that point-to-multipoint repair, instead of point-to-point repair, is used.
2.3 Application-Layer Forward Error Correction
The radio channel conditions vary among the users receiving the multicast service. Therefore, the block error rates of users receiving a video service delivered with a single MCS may have a great variance. To increase the robustness and reliability of multicast transmissions, FEC redundancy packets are incorporated at the APP layer [11].
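The APP-layer FEC block model can be sketched as follows, using the N-packet block with K source and N−K repair packets detailed in the next paragraph; the epsilon value, the loss model, and all function names are illustrative assumptions.

```python
import math

# Sketch of APP-layer FEC block parameters: a block of N packets carries
# K source packets and N-K repair packets (code rate K/N). A Raptor-like
# decoder is assumed to recover the sources from any (1+eps)*K received
# packets; eps (reception overhead) = 0.02 is an assumption.

def correction_capability(n: int, k: int, eps: float = 0.02) -> int:
    """t = N - (1+eps)K: the number of losses the block tolerates (eps=0 is ideal)."""
    return n - math.ceil((1 + eps) * k)

def max_source_packets(n: int, loss_rate: float, eps: float = 0.02) -> int:
    """Largest K (i.e., highest code rate K/N) that still tolerates the
    expected number of lost packets, loss_rate * N."""
    losses = math.ceil(loss_rate * n)
    # require n - (1+eps)*k >= losses  =>  k <= (n - losses) / (1 + eps)
    return math.floor((n - losses) / (1 + eps))
```

For example, with N = 256 and a 10% expected packet loss, `max_source_packets` picks the highest K whose correction capability still covers the expected losses.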
The solution proposed by 3GPP to deliver video streaming over eMBMS uses the FLUTE protocol with UDP transport to send video segments with the corresponding APP-layer FEC over multicast. An FEC block consists of N packets, including K source packets and N−K redundancy packets, resulting in the encoding rate K/N. The FEC decoder can ideally recover the original K source packets from any K out of N received packets, giving the correction capability t = N−K. The Reed-Solomon (RS) code [12] is a well-known FEC code which operates on non-binary symbols and has the ideal correction capability. However, the RS code has a high decoding complexity because of its non-binary operations, which is not suitable for high-definition (HD) video streaming applications. The Raptor code [13] is a more attractive solution for HD video streaming services due to its flexible parameter selection and linear decoding cost. The correction capability of a Raptor code is t = N − (1+ε)K, where ε is the reception overhead. The correction capability of the Raptor code is thus sub-optimal; however, a standardized Raptor code can closely achieve the ideal correction capability with negligible ε. Therefore, it is used in our scheme. In this paper, the FEC block size is fixed as N, and the number of source blocks Km is determined to choose the appropriate FEC code rate Km/N for the m-th FLUTE session. Fig. 2 shows an example of an FEC block with two multicast groups sharing the resource with different FEC code rates. nRB,1 and nRB,2 are the numbers of resource blocks allocated in an OFDMA frame for groups 1 and 2, respectively. There are N OFDMA frames in total; K1 and K2 of them carry video data and the others carry redundant data.
2.4 Tiled VR Video-Streaming Systems
In a tiled video scheme, the 360-degree video is divided into smaller tiles, which can be encoded independently [14]. There can be multiple copies of the same tile with different representation qualities, and these tiles are transmitted through the wireless channel. DASH extends its standard to cover tiled 360-degree videos, i.e., DASH-VR [15]. DASH-VR includes a virtual reality video descriptor (VRD) and a spatial relationship descriptor (SRD), in addition to the media presentation descriptor (MPD), to describe the projection types and spatial relationships among tiles. The VRD contains the projection format and orientation information, the SRD includes the region-wise quality of rectangular videos within the projected frame, and the MPD includes the size of video chunks, the locations of video files, and the codec information. As clients join the multicast system, the MPD, VRD, and SRD are provided to them, and each client can reconstruct the VR video from the received tiles based on the received descriptor information.
A DASH multicast system [16] has been introduced to efficiently utilize the limited resource and provide better video to users. The DASH multicast system allocates multiple copies of the same video with different quality to satisfy more users, but it inevitably generates redundant data that decreases the spectral efficiency. In the case of VR videos especially, most of the area is not visible to users, so even more redundant data than for conventional video is transmitted if we directly use DASH multicast for VR-video dissemination. To be more efficient, redundant data should be removed, and tiled video [17] allows flexibly removing, or allocating fewer bits to, the redundant parts of the video. For example, the necessary parts of the video are transmitted as multiple copies with different qualities to satisfy users with good channel quality, while the parts with a lower probability of view are transmitted just once with a single quality to save spectrum.
The most popular and promising technology for controlling the regional quality of a video is the use of tiled videos, which have been used for panoramic interactive video [18], since an interactive video can change its view and users cannot see the whole video at once. The VR video is divided into smaller rectangular videos (tiles), and each is encoded independently using legacy video encoders. Every tile has multiple copies with different encoding rates. Different representations of tiles are transmitted as users' viewports and network channel conditions change. There are simple rate allocation algorithms for tiled videos: Binary, Thumbnail, and Pyramid [19]. Binary allocates higher representations to the visible tiles, while non-visible tiles get the lowest representations to save bandwidth. It is the most efficient way to allocate bits, but users can easily see the lowest quality when they move their viewport, since the network has latency in responding to viewport changes. Thumbnail allocates the minimum bits of the lowest representation for the whole video as a background, and the remaining bits are allocated to visible tiles with better representations. However, users can still see the lowest-quality background video when they move the viewport faster than the network latency. The Pyramid algorithm allocates the best representations to visible tiles and gradually lowers the representations as tiles are located farther from the viewport. However, these rate allocation algorithms are not network-aware and not flexible enough to provide the best quality to users under variable network channel conditions and viewport movement.
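The three heuristics can be sketched as follows. The tile coordinates, the representation index convention (0 = lowest, r_max = highest), and the use of Chebyshev distance for Pyramid's notion of "far from the viewport" are all illustrative assumptions.

```python
# Sketch of the Binary, Thumbnail, and Pyramid tile-rate heuristics.
# Tiles are (row, col) coordinates; representations run from 0 (lowest)
# to r_max (highest). Each function returns, per tile, the set of
# representations to transmit.

def chebyshev(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def binary(tiles, viewport, r_max):
    """Highest representation for visible tiles, lowest for the rest."""
    return {t: {r_max} if t in viewport else {0} for t in tiles}

def thumbnail(tiles, viewport, r_max):
    """Lowest representation everywhere as background, plus the highest
    representation for visible tiles."""
    return {t: {0, r_max} if t in viewport else {0} for t in tiles}

def pyramid(tiles, viewport, r_max):
    """Representations degrade with distance from the viewport."""
    return {t: {max(r_max - min(chebyshev(t, v) for v in viewport), 0)}
            for t in tiles}
```

On a 4x4 grid with viewport tile (1, 1) and r_max = 3, Binary sends only representation 3 for (1, 1) and representation 0 elsewhere, while Pyramid steps the quality down one level per ring of distance from the viewport.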
Alface et al. [20] propose a rate-selection algorithm that provides the best quality to users by assigning a higher representation in the viewport and lower representations to the other tiles. The algorithm allocates video rates to tiles based on utility-over-cost ratios, where the utility includes the video bitrate and the probability of view. Since it allocates the best representations to tiles so as to maximize the total utility as long as resource is available, the algorithm achieves the best utility performance compared to other existing solutions.
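A minimal greedy sketch in the spirit of this utility-over-cost selection is shown below. The utility model (view probability times bitrate gain) and the data layout are assumptions for illustration; the actual utility function in [20] may differ.

```python
# Greedy utility-over-cost tile rate selection (sketch). Every tile starts
# at its lowest representation; while the bit budget allows, upgrade the
# tile whose next representation has the best utility/cost ratio.

def greedy_rate_selection(tiles, budget):
    """tiles: {name: {"p_view": float, "rates": [bitrates, ascending]}}
    budget: total bit budget. Returns the chosen representation index per tile."""
    choice = {t: 0 for t in tiles}
    spent = sum(info["rates"][0] for info in tiles.values())  # base-layer cost
    while True:
        best, best_ratio, best_cost = None, 0.0, 0
        for t, info in tiles.items():
            r = choice[t]
            if r + 1 < len(info["rates"]):
                cost = info["rates"][r + 1] - info["rates"][r]
                utility = info["p_view"] * cost  # assumed utility model
                if spent + cost <= budget and utility / cost > best_ratio:
                    best, best_ratio, best_cost = t, utility / cost, cost
        if best is None:
            return choice
        choice[best] += 1
        spent += best_cost
```

Under this utility model the ratio reduces to the view probability, so tiles more likely to be seen are upgraded first until the budget is exhausted.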
However, none of the existing algorithms is directly applicable to the multicasting scenario. A new approach to VR video multicasting is proposed in this paper.
3 DASH Multicast
Heuristic algorithms have been introduced to solve the video multicasting problem, differentiated by the type of video source. SVC [7] videos were originally used in video multicasting systems because of their layer-dependent characteristics: enhancement video layers can be combined with the base layer to create higher-quality video for users with good channel conditions, while users with poorer channel quality receive fewer enhancement layers, coded with a more reliable but less efficient MCS. For DASH systems, videos are usually encoded at multiple different rates and stored at the server as small chunks, which are transmitted to the clients who request them. Therefore, different video representations are independent of each other and can be scheduled independently for multicasting. DASH can also carry SVC video sources, but in this paper, for notational convenience, DASH denotes multiple video rates without dependencies among representations, and SVC denotes layered video with dependencies among layers. There have been several studies on multicasting videos over wireless networks. Chen et al. [21] consider fair and optimal resource allocation for LTE multicast; they also consider unicast for some users with lower SNR, without FEC for packet protection. Belda et al. [22] introduce a hybrid FLUTE/DASH video delivery system, which multicasts the video through FLUTE sessions and sends repair requests through the unicast channel to recover lost packets. They fix the FEC code rates of the FLUTE sessions and provide simulation results showing that hybrid video delivery improves the video quality compared to delivery using unicast only. Nonetheless, in their approach, FEC code rates and resource allocation for multiple FLUTE sessions are not jointly optimized, which may generate repair requests through the unicast channel.
Our research starts from the assumption that if we can jointly find the optimal resource allocation and FEC code rate selection, we can transmit the videos without repair requests, which call for additional bandwidth. Our goal is to jointly find the optimal resource allocations, MCS, and FEC code rates for multiple FLUTE sessions so as to efficiently serve DASH clients in LTE networks without unicast channels for repairing lost packets.
SVC-based video multicasting algorithms have been previously proposed, e.g., the Conservative Multicasting Scheme (CMS) [23], Opportunistic Layered Multicasting (OLM) [24], Multicast Subgrouping for Multi-Layer (MSML) video applications [25], the Median Quality Scheme (MQS) [26], and the Median User Scheme (MUS) [27]. These heuristic algorithms describe how to divide users into several multicast groups (each group corresponding to one SVC layer) and how to select the proper resource, MCS, and FEC code rates for the different groups based on the channel quality feedback from users within the same group. More specifically, CMS [23] first allocates each sub-channel to a group of users in the multicast session based on their sub-channel gains, and then adopts a greedy algorithm for resource allocation to achieve proportional fairness among sessions. OLM [24] can choose a more aggressive MCS to achieve higher spectral efficiency and protects lost packets using FEC for each group. MSML [25], on the other hand, utilizes frequency diversity to achieve better throughput than other schemes. For example, a user with a very low average SNR may still have some RBs with high channel gains, and MSML uses these RBs to schedule the lower video layers. Since MSML can choose the best RBs for each multicast group, it can select a more efficient MCS than conservative schemes constrained by the users with the lowest channel quality, such as the less spectrally efficient CMS. MUS and MQS choose the subgroups based on the number of users and the channel quality, respectively; they achieve better spectral efficiency than CMS, but less than OLM and MSML. SVC multicasting and DASH multicasting differ in how users receive the video. Fig. 3 shows the difference between SVC and DASH multicasting systems.
Users receiving SVC video must receive the base layer and can improve the video quality by receiving multiple enhancement layers; therefore, they need to subscribe to multiple multicast sessions. In contrast, the sources of DASH multicasting systems are independent videos, so users only need to select one multicast session and receive a single video representation. Since DASH multicasting inevitably generates some redundant data, resources could be wasted. However, Park et al. [8] propose the optimal DASH multicast (ODM) algorithm and show that optimal resource allocation, video rate selection, and user grouping can exploit the advantages of multicasting. Therefore, DASH multicasting methods can achieve better utility and provide better video quality to users. Fig. 4 shows the utility performance (a, c) and the spectral efficiency (b, d) when DASH and SVC video sources are used, respectively. The proposed ODM achieves the best utility and spectral efficiency, compared to OLM, MSML, and fixed FEC code rate methods.
4 DASH-VR Multicast
VR is getting more popular these days, and more people can enjoy realistic experiences with VR systems [28]. VR allows people to look around the virtual world and feel as if they are actually in the environment, and VR gaming can provide a more exciting experience to gamers. However, it is more challenging to satisfy users with the quality of VR videos, because VR videos need much higher resolution than conventional videos. Users cannot see the whole video at the same time; they can only focus on the area that they want to see, which is usually only 20% of the whole video [29]. Therefore, 4-6 times higher resolution is required for VR videos to provide the same experience as conventional videos. On the other hand, this fact allows bandwidth savings, because 80% of the video is unseen by the user at any given time. In an ideal case, we could save 80% of the bandwidth; in practice, we still need to transmit redundant areas of the video because it is difficult to predict how a user's viewport will change.
The original DASH system lets clients perform video rate adaptation, but individual rate adaptation is difficult in multicast systems because users grouped together share the same spectrum resource and receive the same video rate even though they have different channel quality. Rather than performing individual rate adaptation, the server can adaptively choose the video rates of the tiles to maximize the expected total utility of the users. Feedback about users' viewports can be used to decide which tiles should get better video rates to satisfy more users. Another way to decide which tiles are more important is to analyze the video at the server side: the server analyzes the video content first and then decides which parts may attract higher user interest. Saliency [30] is one of the useful indicators for finding the important areas of a video. Saliency detection algorithms usually find areas with high contrast or active movement [31], because those areas tend to carry richer or more appealing information, such as important texture or moving objects. Using the saliency information, the server can allocate more bits to areas with higher saliency scores to make them clearer. There are two possible ways to perform VR video multicasting, both based on grouping users to share the same resource. First, users with the same view can be grouped into one multicast group, so the number of multicast groups equals the number of views [32]. This saves some resource by sharing the same view among many users, but cannot exploit the multicast scheme when users have different channel quality.
All multicast groups will then suffer because of users with very bad channel quality. Moreover, all users eventually need to receive all the tiles, because the latency between server and client is difficult to overcome. Second, users can be grouped by their channel quality. This grouping strategy helps select a more efficient MCS and AL-FEC code rate to deliver better video. As the number of users who can join a group with better video increases, the total utility also improves. Therefore, we design the multicast system based on the second scheme, which groups users by their channel quality.
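The second grouping strategy can be sketched as follows. The contiguous equal-size split is an assumption (the grouping rule itself is a design choice left open here), and the group's decodable MCS is taken as the minimum CQI its members report.

```python
# Sketch of grouping users by channel quality: sort users by reported CQI
# and cut the sorted list into contiguous groups, so each group's MCS can
# be chosen for its worst member. The equal-size split is an assumption.

def group_by_cqi(user_cqis, n_groups):
    """user_cqis: {user: cqi}. Returns [(members, group_cqi), ...] where
    group_cqi is the minimum CQI in the group (decodable by every member)."""
    ordered = sorted(user_cqis, key=user_cqis.get)
    size = -(-len(ordered) // n_groups)  # ceiling division
    groups = []
    for i in range(0, len(ordered), size):
        members = ordered[i:i + size]
        groups.append((members, min(user_cqis[u] for u in members)))
    return groups
```

Groups later in the returned list have better channel quality and can therefore be served with a more aggressive MCS and a higher AL-FEC code rate.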
Clients in a DASH-VR multicast system request video chunks from the server based on the MPD, SRD, and VRD information, and the DASH server starts to deliver the tiled-video data. The BM-SC creates the multiple video sessions that deliver the tiled videos with multiple video representations; it is also responsible for adding AL-FEC redundant blocks for lost packet recovery. Multiple video multicast sessions are created to deliver multiple VR videos and multiple video representations to different user groups. A video session can contain a single tile or multiple tiles. The MBMS-GW passes the video data to the eNBs, and the MCE allocates the resources for video sessions and assigns the proper MCS to the resources. Users participate in video sessions, and users who can participate in multiple video sessions have the chance to choose better representations. The eNB receives CQI feedback from the UEs to help allocate RBs and choose the AL-FEC code rate and MCS for the multicast sessions.
The difference between a multicast session and a multicast group is that a multicast session denotes a video session using the radio resource controlled by the MCE, while a multicast group denotes a set of users grouped by their channel conditions and subscribing to the same video. Note that users can subscribe to multiple multicast sessions at the same time; therefore, the number of multicast sessions and the number of multicast groups are not necessarily the same. The multicast groups are arranged based on channel condition, and user groups with high channel quality can take advantage of subscribing to multiple multicast sessions. We can consider two different ways to create video multicast sessions. One is per-tile multicasting (PTM), which treats the tiles as independent videos: each tile has its own resource, and every UE subscribes to all the sessions needed to regenerate the VR video. It creates as many multicast sessions as the number of tiles times the number of representations for a single VR-video content. All possible representations of all tiles are available to users based on their channel quality, and the users regenerate the VR video from the tiles with the best quality they can decode. For example, if there are T tiles and M representations per tile, T×M multicast sessions can be created in total, and the MCS, AL-FEC, and resources for all multicast sessions have to be determined to maximize the total utility. The search space for finding the optimal solution is M^T. Each user selects one representation per tile and subscribes to T multicast sessions to regenerate the VR video. PTM generates too much control signaling, and the complexity of the solution increases with the number of multicast sessions.
The other is multi-session multicasting (MSM), which creates the same number of multicast sessions as the number of user groups; each multicast session includes multiple tiles with different quality. Fig. 5 shows an example of an MSM system with 3 groups and 3 multicast sessions. Fig. 5a shows the tiled video encoded with multiple representations. Every tile has multiple copies with different representations (qualities), generated by a legacy video encoder with different quantization parameters (QP); higher representations indicate better quality and need more bandwidth to be transmitted. Fig. 5b shows the rate selection results for multiple multicast sessions. The first multicast session contains all the tiles at lower representations to guarantee that every user requesting the VR video receives at least a low-quality video. The second and third multicast sessions do not need to contain all the tiles; they allocate higher representations to improve the tile quality for users with better channel quality, and are therefore allocated on the wireless resource with more efficient MCS and AL-FEC code rates (Fig. 5c). Users can subscribe to multiple multicast sessions at the same time, provided their channel quality is good enough to decode the data packets with the assigned MCS and AL-FEC. In Fig. 5e, user group 1 can only receive the data in multicast session 1, user group 2 can receive multicast sessions 1 and 2, and user group 3 can receive all three multicast sessions. Therefore, user groups 2 and 3 can choose better representations among the multiple representations they receive. Since each MSM multicast session includes multiple tiles, MSM creates fewer multicast sessions than PTM. Another advantage of MSM is that it can use the existing rate selection algorithms introduced in Section 2.4.
The rate selection algorithm [21] can be used to allocate tiles of different representations under the bit rate constraint of each multicast session. The bit rate constraints of the multicast sessions are determined by the resources allocated to them, the MCS, and the AL-FEC code rates.
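A hypothetical sketch of MSM session construction under these per-session bit-rate constraints is shown below. The priority ordering, the one-step upgrade rule, and all names are illustrative assumptions, not the paper's algorithm.

```python
# Sketch of multi-session multicasting (MSM): session 1 carries every tile
# at the lowest representation; each later session (serving a better user
# group) carries single-step upgrades for the highest-priority tiles that
# fit within its bit budget, which is fixed by the session's RBs, MCS, and
# AL-FEC code rate.

def build_msm_sessions(tiles, budgets):
    """tiles: {name: {"priority": float, "rates": [bitrates, ascending]}}
    budgets: bit budgets for sessions 2..G (session 1 is assumed to fit all
    base representations). Returns one {tile: representation index} per session."""
    current = {t: 0 for t in tiles}          # representations users have so far
    sessions = [dict(current)]               # session 1: all tiles, lowest rep
    for budget in budgets:
        session, spent = {}, 0
        for t in sorted(tiles, key=lambda x: -tiles[x]["priority"]):
            r = current[t]
            if r + 1 < len(tiles[t]["rates"]):
                cost = tiles[t]["rates"][r + 1]
                if spent + cost <= budget:
                    session[t] = r + 1
                    spent += cost
        current.update(session)
        sessions.append(session)
    return sessions
```

A user group subscribing to sessions 1 through g then decodes each tile at the highest representation any of those sessions carries for it, matching the layered subscription pattern of Fig. 5.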
5 Conclusions
In this paper, we presented an overview of wireless video multicasting systems and algorithms. SVC-based video multicasting systems were introduced first, but DASH is getting more popular. DASH multicasting can allocate multiple copies of a video with different quality to allow users to select an appropriate video quality, which improves the utility performance of the system. Wireless multicasting systems such as LTE eMBMS can deliver VR video more efficiently when combined with tiled-video rate adaptation. We propose the MSM system, which allocates multiple tiles in a single multicast session and generates multiple multicast sessions to provide sets of tiles with different representations. The proposed tiled video multicasting scheme uses the limited wireless resource more efficiently than other VR multicasting schemes. The optimal resource allocation, MCS, and AL-FEC code rate selection for multicast sessions to improve DASH-VR multicasting systems remain as future work.