{"id":3878,"date":"2022-12-20T17:28:19","date_gmt":"2022-12-20T20:28:19","guid":{"rendered":"http:\/\/lode.uno\/linux-man\/index.php\/2022\/12\/20\/tchfsc-man7\/"},"modified":"2022-12-20T17:28:19","modified_gmt":"2022-12-20T20:28:19","slug":"tchfsc-man7","status":"publish","type":"post","link":"https:\/\/lode.uno\/linux-man\/2022\/12\/20\/tchfsc-man7\/","title":{"rendered":"TC&minus;HFSC (man7)"},"content":{"rendered":"<h1 align=\"center\">TC\u2212HFSC<\/h1>\n<p> <a href=\"#NAME\">NAME<\/a><br \/> <a href=\"#HISTORY &#038; INTRODUCTION\">HISTORY &#038; INTRODUCTION<\/a><br \/> <a href=\"#ABBREVIATIONS\">ABBREVIATIONS<\/a><br \/> <a href=\"#BASICS OF HFSC\">BASICS OF HFSC<\/a><br \/> <a href=\"#REALTIME CRITERION\">REALTIME CRITERION<\/a><br \/> <a href=\"#LINKSHARING CRITERION\">LINKSHARING CRITERION<\/a><br \/> <a href=\"#UPPERLIMIT CRITERION\">UPPERLIMIT CRITERION<\/a><br \/> <a href=\"#SEPARATE LS \/ RT SCs\">SEPARATE LS \/ RT SCs<\/a><br \/> <a href=\"#CORNER CASES\">CORNER CASES<\/a><br \/> <a href=\"#LINUX AND TIMER RESOLUTION\">LINUX AND TIMER RESOLUTION<\/a><br \/> <a href=\"#CAVEAT: RANDOM ONLINE EXAMPLES\">CAVEAT: RANDOM ONLINE EXAMPLES<\/a><br \/> <a href=\"#LAYER2 ADAPTATION\">LAYER2 ADAPTATION<\/a><br \/> <a href=\"#SEE ALSO\">SEE ALSO<\/a><br \/> <a href=\"#AUTHOR\">AUTHOR<\/a> <\/p>\n<hr>\n<h2>NAME <a name=\"NAME\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">tc-hfsc \u2212 Hierarchical Fair Service Curve<\/p>\n<h2>HISTORY &#038; INTRODUCTION <a name=\"HISTORY &#038; INTRODUCTION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">HFSC (Hierarchical Fair Service Curve) is a network packet scheduling algorithm that was first presented at SIGCOMM\u201997. Developed as part of ALTQ (ALTernative Queuing) on NetBSD, it quickly found its way to other BSD systems, and then a few years ago became part of the Linux kernel. 
Still, it\u2019s not the most popular scheduling algorithm \u2212 especially compared to HTB \u2212 and it\u2019s not well documented for the end user. This introduction aims to explain how HFSC works without using too much math (although some math will be inevitable).<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">In short HFSC aims to:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"3%\">\n<p><b>1)<\/b><\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"77%\">\n<p>guarantee precise bandwidth and delay allocation for all leaf classes (realtime criterion)<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"3%\">\n<p><b>2)<\/b><\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"77%\">\n<p>allocate excess bandwidth fairly as specified by class hierarchy (linkshare &#038; upperlimit criterion)<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"3%\">\n<p><b>3)<\/b><\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"77%\">\n<p>minimize any discrepancy between the service curve and the actual amount of service provided during linksharing<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:11%; margin-top: 1em\">The main &#8220;selling&#8221; point of HFSC is feature <b>(1)<\/b>, which is achieved by using nonlinear service curves (more about what that actually is later). This is particularly useful in VoIP or games, where not only a guarantee of consistent bandwidth is important, but also limiting the initial delay of a data stream. 
Note that it matters only for leaf classes (where the actual queues are) \u2212 thus class hierarchy is ignored in the realtime case.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Feature <b>(2)<\/b> is, well, obvious \u2212 any algorithm featuring class hierarchy (such as HTB or CBQ) strives to achieve that. HFSC does that well, although you might end up with unusual situations if you define service curves carelessly \u2212 see section CORNER CASES for examples.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Feature <b>(3)<\/b> is mentioned due to the nature of the problem. There may be situations where it\u2019s either not possible to guarantee service of all curves at the same time, and\/or it\u2019s impossible to do so fairly. Both will be explained later. Note that this is mainly related to interior (aka aggregate) classes, as the leaves are already handled by <b>(1)<\/b>. Still, it\u2019s perfectly possible to create a leaf class without realtime service, and in such a case the caveats will naturally extend to leaf classes as well.<\/p>\n<h2>ABBREVIATIONS <a name=\"ABBREVIATIONS\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">For the remaining part of the document, we\u2019ll use the following shortcuts:<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">RT \u2212 realtime <br \/> LS \u2212 linkshare <br \/> UL \u2212 upperlimit <br \/> SC \u2212 service curve<\/p>\n<h2>BASICS OF HFSC <a name=\"BASICS OF HFSC\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">To understand how HFSC works, we must first introduce a service curve. Overall, it\u2019s a nondecreasing function of some time unit, returning the amount of service (an allowed or allocated amount of bandwidth) at some specific point in time. 
The purpose of it should be intuitively obvious: if a class was allowed to transfer not less than the amount specified by its service curve, then the service curve is not violated.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Still, we need a more elaborate criterion than just the above (although in the most generic case it can be reduced to it). The criterion has to take two things into account:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p style=\"margin-top: 1em\">\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">idling periods<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p>\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p>the ability to &#8220;look back&#8221;, so that if the service curve seems violated during the current active period, it may turn out not to be violated once we count the excess bandwidth received during earlier active period(s)<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:11%; margin-top: 1em\">Let\u2019s define the criterion as follows:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"5%\">\n<p style=\"margin-top: 1em\"><b>(1)<\/b><\/p>\n<\/td>\n<td width=\"1%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">For each t1, there must exist t0 in set B, so that S(t1\u2212t0)\u00a0<=\u00a0w(t0,t1)<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:11%; margin-top: 1em\">Here \u2019w\u2019 denotes the amount of service received during some time period between t0 and t1. B is the set of all times where a session becomes active after an idling period (further denoted as \u2019becoming backlogged\u2019). 
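The backlog-compensation idea behind criterion (1) can be sketched numerically. This is a toy model only (Python used purely for illustration): `criterion_ok`, `S`, `B` and `w` are hypothetical stand-ins for the service curve, the backlog-start set and the received-service function, not part of any HFSC implementation.

```python
# Toy check of criterion (1): the curve is satisfied at t1 if there is a
# backlog start t0 in B with S(t1 - t0) <= w(t0, t1).
def criterion_ok(S, B, w, t1):
    """S: service curve (seconds -> bits); B: backlog-start times;
    w(t0, t1): service actually received during [t0, t1]."""
    return any(S(t1 - t0) <= w(t0, t1) for t0 in B if t0 <= t1)

S = lambda dt: 1_000_000 * dt          # linear 1 Mbit/s curve (bits owed)
received = {(0.0, 2.0): 2_500_000,     # 2.5 Mbit over [0 s, 2 s]: excess
            (1.5, 2.0): 300_000}       # only 0.3 Mbit over [1.5 s, 2 s]
w = lambda t0, t1: received[(t0, t1)]

# Looking only at the most recent backlogged period the curve seems
# violated; counting the earlier period's excess, it is not.
only_last = criterion_ok(S, [1.5], w, 2.0)
with_history = criterion_ok(S, [0.0, 1.5], w, 2.0)
```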
For a clearer picture, imagine two situations:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"3%\">\n<p style=\"margin-top: 1em\"><b>a)<\/b><\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">our session was active during two periods, with a small time gap between them<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"3%\">\n<p><b>b)<\/b><\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"77%\">\n<p>as in (a), but with a larger gap<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:11%; margin-top: 1em\">Consider <b>(a)<\/b>: if the service received during both periods meets <b>(1)<\/b>, then all is well. But what if it doesn\u2019t do so during the 2nd period? If the amount of service received during the 1st period is larger than the service curve, then it might compensate for the smaller service during the 2nd period <i>and<\/i> the gap \u2212 if the gap is small enough.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">If the gap is larger <b>(b)<\/b> \u2212 then it\u2019s less likely to happen (unless the excess bandwidth allocated during the 1st part was really large). Still, the larger the gap, the less relevant what happened in the past (e.g. 10 minutes ago) becomes \u2212 what matters is the current traffic that has just started.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">From HFSC\u2019s perspective, the more interesting question is the following: when should we start transferring packets, so that a class\u2019s service curve is not violated? Or, rephrasing it: what amount of service X() should a session have received by time t, so that the service curve is not violated? The function X(), defined below, is the basic building block of HFSC, used in the eligible, deadline, virtual\u2212time and fit\u2212time curves. 
Of course, X() is based on equation <b>(1)<\/b> and is defined recursively:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p style=\"margin-top: 1em\">\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">At the beginning of the 1st backlogged period, the function X is initialized to the generic service curve assigned to the class<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p>\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p>At any subsequent backlogged period, X() is:<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:23%;\"><b>min(X() from previous period ; w(t0)+S(t\u2212t0) for t>=t0),<\/b> <br \/> &#8230; where t0 denotes the beginning of the current backlogged period.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">HFSC uses either linear or two\u2212piece linear service curves. In the case of linear or two\u2212piece linear convex functions (first slope < second slope), min() in X\u2019s definition reduces to the 2nd argument. But in the case of two\u2212piece concave functions, the 1st argument might quickly become the smaller one for some t>=t0. Note that for some backlogged period, X() is defined only from that period\u2019s beginning. We also define X^(\u22121)(w) as the smallest t>=t0 for which X(t)\u00a0=\u00a0w. 
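The recursive min() update above can be sketched for two-piece linear curves. This is an illustrative model (the helper names `two_piece` and `updated_X` are made up, and units are bits and seconds): it shows that for a concave curve the previous period's X() can stay the smaller argument, while for a convex curve min() reduces to the second argument.

```python
# S(dt) = m1*dt for dt < d, then m1*d + m2*(dt - d).
def two_piece(m1, d, m2):
    return lambda dt: m1 * dt if dt < d else m1 * d + m2 * (dt - d)

def updated_X(prev_X, w_t0, S, t0):
    """X() at a new backlogged period starting at t0, where w_t0 bits were
    received so far: min(previous X ; w(t0) + S(t - t0)) for t >= t0."""
    return lambda t: min(prev_X(t), w_t0 + S(t - t0))

S_concave = two_piece(7e6, 0.1, 3e6)  # 7 Mbit/s for 100 ms, then 3 Mbit/s
S_convex = two_piece(2e6, 0.1, 7e6)   # 2 Mbit/s for 100 ms, then 7 Mbit/s

# First backlogged period starts at t = 0, so X is the curve itself; a new
# period starts at t0 = 0.5 s with some amount of service received so far.
X_cc = updated_X(S_concave, 1.8e6, S_concave, 0.5)
X_cv = updated_X(S_convex, S_convex(0.5), S_convex, 0.5)
```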
We have to define it this way, as X() is usually not an injection.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">The above generic X() can be one of the following:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"5%\">\n<p style=\"margin-top: 1em\">E()<\/p>\n<\/td>\n<td width=\"1%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">In realtime criterion, selects packets eligible for sending. If none are eligible, HFSC will use linkshare criterion. Eligible time \u2019et\u2019 is calculated with reference to packets\u2019 heads ( et\u00a0=\u00a0E^(\u22121)(w) ). It\u2019s based on RT service curve, <i>but in case of a convex curve, uses its 2nd slope only.<\/i><\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"5%\">\n<p>D()<\/p>\n<\/td>\n<td width=\"1%\"><\/td>\n<td width=\"77%\">\n<p>In realtime criterion, selects the most suitable packet from the ones chosen by E(). Deadline time \u2019dt\u2019 corresponds to packets\u2019 tails (dt\u00a0=\u00a0D^(\u22121)(w+l), where \u2019l\u2019 is packet\u2019s length). Based on RT service curve.<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"5%\">\n<p>V()<\/p>\n<\/td>\n<td width=\"1%\"><\/td>\n<td width=\"77%\">\n<p>In linkshare criterion, arbitrates which packet to send next. Note that V() is function of a virtual time \u2212 see <b>LINKSHARE CRITERION<\/b> section for details. Virtual time \u2019vt\u2019 corresponds to packets\u2019 heads (vt\u00a0=\u00a0V^(\u22121)(w)). Based on LS service curve.<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"5%\">\n<p>F()<\/p>\n<\/td>\n<td width=\"1%\"><\/td>\n<td width=\"77%\">\n<p>An extension to linkshare criterion, used to limit at which speed linkshare criterion is allowed to dequeue. 
Fit\u2212time \u2019ft\u2019 corresponds to packets\u2019 heads as well (ft\u00a0=\u00a0F^(\u22121)(w)). Based on UL service curve.<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:11%; margin-top: 1em\">Be sure to make clean distinction between session\u2019s RT, LS and UL service curves and the above &#8220;utility&#8221; functions.<\/p>\n<h2>REALTIME CRITERION <a name=\"REALTIME CRITERION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">RT criterion <i>ignores class hierarchy<\/i> and guarantees precise bandwidth and delay allocation. We say that a packet is eligible for sending, when the current real time is later than the eligible time of the packet. From all eligible packets, the one most suited for sending is the one with the shortest deadline time. This sounds simple, but consider the following example:<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Interface 10Mbit, two classes, both with two\u2212piece linear service curves:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p style=\"margin-top: 1em\">\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">1st class \u2212 2Mbit for 100ms, then 7Mbit (convex \u2212 1st slope < 2nd slope)<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p>\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p>2nd class \u2212 7Mbit for 100ms, then 2Mbit (concave \u2212 1st slope > 2nd slope)<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:11%; margin-top: 1em\">Assume for a moment, that we only use D() for both finding eligible packets, and choosing the most fitting one, thus eligible time would be computed as D^(\u22121)(w) and deadline time would be computed as D^(\u22121)(w+l). 
If the 2nd class starts sending packets 1 second after the 1st class, it\u2019s of course impossible to guarantee 14Mbit, as the interface capability is only 10Mbit. The only workaround in this scenario is to allow the 1st class to send the packets earlier than would normally be allowed. That\u2019s where a separate E() comes to help. Putting all the math aside (see the HFSC paper for details), E() for an RT concave service curve is just like D(), but for an RT convex service curve it\u2019s constructed using <i>only<\/i> the RT service curve\u2019s 2nd slope (in our example <br \/> 7Mbit).<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">The effect of such E() \u2212 packets will be sent earlier, and at the same time D() <i>will<\/i> be updated \u2212 so the current deadline time calculated from it will be later. Thus, when the 2nd class starts sending packets later, both the 1st and the 2nd class will be eligible, but the 2nd session\u2019s deadline time will be smaller and its packets will be sent first. When the 1st class becomes idle at some later point, the 2nd class will be able to &#8220;buffer&#8221; up again for a later active period of the 1st class.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">A short remark \u2212 in a situation where the total amount of bandwidth available on the interface is larger than the total of the allocated realtime parts (imagine a 10 Mbit interface, but 1Mbit\/2Mbit and 2Mbit\/1Mbit classes), the sole speed of the interface could suffice to guarantee the times.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">An important part of the RT criterion is that, apart from updating its D() and E(), the V() used by the LS criterion is updated as well. Generally the RT criterion is secondary to the LS one, and is used <i>only<\/i> if there\u2019s a risk of violating precise realtime requirements. Still, the &#8220;participation&#8221; in bandwidth distributed by the LS criterion is there, so V() has to be updated along the way. 
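The eligible/deadline computation from this example can be sketched as follows. It is a simplification (units in bits and seconds; `inverse` and `times` are hypothetical helpers), and modeling the convex class's eligible curve as a single line with the 2nd slope is one plausible reading of the E() construction described above, not the kernel's exact code.

```python
# Invert a two-piece linear curve: smallest dt with S(dt) == w.
def inverse(m1, d, m2):
    def inv(w):
        if w <= m1 * d:
            return w / m1
        return d + (w - m1 * d) / m2
    return inv

def times(m1, d, m2, w, l):
    """Eligible time E^(-1)(w) and deadline time D^(-1)(w + l) for a class
    with a two-piece RT curve, w bits served, packet of l bits at the head."""
    D_inv = inverse(m1, d, m2)
    # Convex curve (m1 < m2): eligible curve built from the 2nd slope only.
    E_inv = (lambda v: v / m2) if m1 < m2 else D_inv
    return E_inv(w), D_inv(w + l)

# Convex 2Mbit-for-100ms-then-7Mbit class, 150000 bits already served,
# 12000-bit packet at the head: it becomes eligible well before its deadline.
et, dt = times(2e6, 0.1, 7e6, w=150000, l=12000)
```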
The LS criterion can then properly compensate for the non\u2212ideal fair sharing situation caused by RT scheduling. If you use a UL service curve, its F() will be updated as well (the UL service curve is an extension to the LS one \u2212 see the <b>UPPERLIMIT CRITERION<\/b> section).<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Anyway \u2212 careless specification of LS and RT service curves can lead to potentially undesired situations (see CORNER CASES for examples). This wasn\u2019t the case in the HFSC paper, where LS and RT service curves couldn\u2019t be specified separately.<\/p>\n<h2>LINKSHARING CRITERION <a name=\"LINKSHARING CRITERION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">The LS criterion\u2019s task is to distribute bandwidth according to the specified class hierarchy. Contrary to the RT criterion, there are no comparisons between current real time and virtual time \u2212 the decision is based solely on a direct comparison of the virtual times of all active subclasses \u2212 the one with the smallest vt wins and gets scheduled. One immediate conclusion from this fact is that absolute values don\u2019t matter \u2212 only the ratios between them (so for example, two children classes with simple linear 1Mbit service curves will get the same treatment from the LS criterion\u2019s perspective as if they were 5Mbit). The other conclusion is that in a perfectly fluid system with linear curves, all virtual times across the whole class hierarchy would be equal.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Why is V() defined in terms of virtual time (and what is it)?<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Imagine an example: class A with two children \u2212 A1 and A2, both with, let\u2019s say, 10Mbit SCs. If A2 is idle, A1 receives all the bandwidth of A (and updates its V() in the process). When A2 becomes active, A1\u2019s virtual time is already <i>far<\/i> later than A2\u2019s. 
Considering the type of decision made by the LS criterion, A1 would then not get scheduled for a long time. We can work around this situation by adjusting the virtual time of the class becoming active \u2212 we do that by bringing such time &#8220;up to date&#8221;. HFSC uses the mean of the smallest and the biggest virtual time of the currently active children fit for sending. As it\u2019s not real time anymore (excluding the trivial case where all classes become active at the same time and never become idle), it\u2019s called virtual time.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Such an approach has its price, though. The problem is analogous to what was presented in the previous section and is caused by the non\u2212linearity of service curves:<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"><\/td>\n<td width=\"3%\">\n<p style=\"margin-top: 1em\">1)<\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"83%\">\n<p style=\"margin-top: 1em\">either it\u2019s impossible to guarantee service curves and satisfy fairness during certain time periods:<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:17%; margin-top: 1em\">Recall the example from the RT section, slightly modified (with 3Mbit slopes instead of 2Mbit ones):<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p style=\"margin-top: 1em\">\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p style=\"margin-top: 1em\">1st class \u2212 3Mbit for 100ms, then 7Mbit (convex \u2212 1st slope < 2nd slope)<\/p>\n<\/td>\n<\/tr>\n<tr valign=\"top\" align=\"left\">\n<td width=\"17%\"><\/td>\n<td width=\"1%\">\n<p>\u2022<\/p>\n<\/td>\n<td width=\"5%\"><\/td>\n<td width=\"77%\">\n<p>2nd class \u2212 7Mbit for 100ms, then 3Mbit (concave \u2212 1st slope > 2nd 
slope)<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:17%; margin-top: 1em\">They sum up nicely to 10Mbit \u2212 the interface\u2019s capacity. But if we wanted to use only LS for guarantees and fairness \u2212 it simply won\u2019t work. In the LS context, only V() is used to decide which class to schedule. If the 2nd class becomes active when the 1st one is in its second slope, the fairness will be preserved \u2212 the ratio will be 1:1 (7Mbit:7Mbit), but LS itself is of course unable to guarantee the absolute values themselves \u2212 as it would have to go beyond what the interface is capable of.<\/p>\n<table width=\"100%\" border=\"0\" rules=\"none\" frame=\"void\" cellspacing=\"0\" cellpadding=\"0\">\n<tr valign=\"top\" align=\"left\">\n<td width=\"11%\"><\/td>\n<td width=\"3%\">\n<p style=\"margin-top: 1em\">2)<\/p>\n<\/td>\n<td width=\"3%\"><\/td>\n<td width=\"83%\">\n<p style=\"margin-top: 1em\">and\/or it\u2019s impossible to guarantee the service curves of all classes at the same time [fairly or not]:<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p style=\"margin-left:17%; margin-top: 1em\">This is similar to the above case, but a bit more subtle. We will consider two subtrees, arbitrated by their common (root here) parent:<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">R (root) &#8211;\u00a010Mbit<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">A \u2212 7Mbit, then 3Mbit <br \/> A1 \u2212 5Mbit, then 2Mbit <br \/> A2 \u2212 2Mbit, then 1Mbit<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">B \u2212 3Mbit, then 7Mbit<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">R arbitrates between the left subtree (A) and the right one (B). Assume that A2 and B are constantly backlogged, and at some later point A1 becomes backlogged (when all other classes are in their 2nd linear part).<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">What happens now? 
B (the choice made by R) will <i>always<\/i> get 7 Mbit, as R is only (obviously) concerned with the ratio between its direct children. Thus the A subtree gets 3Mbit, but its children would want (at the point when A1 became backlogged) 5Mbit + 1Mbit. That\u2019s of course impossible, as they can only get 3Mbit due to the interface limitation.<\/p>\n<p style=\"margin-left:17%; margin-top: 1em\">In the left subtree we have the same situation as previously (a fair split between A1 and A2, but violated guarantees), but in the whole tree there\u2019s no fairness (B got 7Mbit, but A1 and A2 have to fit together into 3Mbit) and there are no guarantees for all classes (only B got what it wanted). Even if we violated fairness in the A subtree and set A2\u2019s service curve to 0, A1 would still not get the required bandwidth.<\/p>\n<h2>UPPERLIMIT CRITERION <a name=\"UPPERLIMIT CRITERION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">The UL criterion is an extension to the LS one that permits sending packets only if the current real time is later than the fit\u2212time (\u2019ft\u2019). So the modified LS criterion becomes: choose the smallest virtual time from all active children, such that fit\u2212time < current real time also holds. Fit\u2212time is calculated from F(), which is based on the UL service curve. As you can see, its role is somewhat similar to that of E() used in the RT criterion. Also, for obvious reasons, you can\u2019t specify a UL service curve without an LS one.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">The main purpose of the UL service curve is to limit HFSC to the bandwidth available on the upstream router (think of an adsl home modem\/router, and a linux server as NAT\/firewall\/etc. with a 100Mbit+ connection to the mentioned modem\/router). Typically, it\u2019s used to create a single class directly under the root, setting a linear UL service curve to the available bandwidth \u2212 and then creating your class structure from that class downwards. 
Of course, you\u2019re free to add a UL service curve (linear or not) to any class with LS criterion.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">An important part about the UL service curve is that whenever at some point in time a class doesn\u2019t qualify for linksharing due to its fit\u2212time, the next time it does qualify it will update its virtual time to the smallest virtual time of all active children fit for linksharing. This way, one of the main things the LS criterion tries to achieve \u2212 equality of all virtual times across whole hierarchy \u2212 is preserved (in perfectly fluid system with only linear curves, all virtual times would be equal).<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Without that, \u2019vt\u2019 would lag behind other virtual times, and could cause problems. Consider an interface with a capacity of 10Mbit, and the following leaf classes (just in case you\u2019re skipping this text quickly \u2212 this example shows behavior that <b><i>doesn\u2019t happen<\/i><\/b>):<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">A \u2212 ls 5.0Mbit <br \/> B \u2212 ls 2.5Mbit <br \/> C \u2212 ls 2.5Mbit, ul 2.5Mbit<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">If B was idle, while A and C were constantly backlogged, A and C would normally (as far as LS criterion is concerned) divide bandwidth in 2:1 ratio. But due to UL service curve in place, C would get at most 2.5Mbit, and A would get the remaining 7.5Mbit. The longer the backlogged period, the more the virtual times of A and C would drift apart. 
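The division described above can be reproduced with a simplified fluid model. This is a sketch only: `ls_with_ul` is a made-up helper and progressive filling is not HFSC's actual packet-level mechanism, but it yields the same steady-state rates as the example.

```python
# Fluid-model sketch: split link bandwidth by LS weights, re-distributing
# whatever a UL cap cuts off to the classes that are still uncapped.
def ls_with_ul(link, classes):
    """classes: {name: (ls_weight, ul_cap or None)} -> {name: rate}."""
    alloc = {name: 0.0 for name in classes}
    active = set(classes)
    left = link
    while active and left > 1e-9:
        total_w = sum(classes[n][0] for n in active)
        for n in list(active):
            share = left * classes[n][0] / total_w
            cap = classes[n][1]
            if cap is not None and alloc[n] + share >= cap:
                share = cap - alloc[n]      # UL: class can't take more
                active.discard(n)
            alloc[n] += share
        left = link - sum(alloc.values())
    return alloc

# A - ls 5.0Mbit; C - ls 2.5Mbit, ul 2.5Mbit; B idle; 10Mbit interface.
# LS alone would split 2:1 (~6.67 and ~3.33); the UL cap turns it into
# 7.5Mbit for A and 2.5Mbit for C.
rates = ls_with_ul(10.0, {"A": (5.0, None), "C": (2.5, 2.5)})
```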
If B became backlogged at some later point in time, its virtual time would be set to (A\u2019s\u00a0vt\u00a0+\u00a0C\u2019s\u00a0vt)\/2, thus blocking A from sending any traffic until B\u2019s virtual time catches up with A.<\/p>\n<h2>SEPARATE LS \/ RT SCs <a name=\"SEPARATE LS \/ RT SCs\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">Another difference from the original HFSC paper is that RT and LS SCs can be specified separately. Moreover, leaf classes are allowed to have only either RT SC or LS SC. For interior classes, only LS SCs make sense: any RT SC will be ignored.<\/p>\n<h2>CORNER CASES <a name=\"CORNER CASES\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">Separate service curves for LS and RT criteria can lead to certain traps that come from &#8220;fighting&#8221; between ideal linksharing and enforced realtime guarantees. Those situations didn\u2019t exist in original HFSC paper, where specifying separate LS \/ RT service curves was not discussed.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Consider an interface with a 10Mbit capacity, with the following leaf classes:<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">A \u2212 ls 5.0Mbit, rt 8Mbit <br \/> B \u2212 ls 2.5Mbit <br \/> C \u2212 ls 2.5Mbit<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Imagine A and C are constantly backlogged. As B is idle, A and C would divide bandwidth in 2:1 ratio, considering LS service curve (so in theory \u2212 6.66 and 3.33). Alas RT criterion takes priority, so A will get 8Mbit and LS will be able to compensate class C for only 2 Mbit \u2212 this will cause discrepancy between virtual times of A and C.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Assume this situation lasts for a long time with no idle periods, and suddenly B becomes active. B\u2019s virtual time will be updated to (A\u2019s\u00a0vt\u00a0+\u00a0C\u2019s\u00a0vt)\/2, effectively landing in the middle between A\u2019s and C\u2019s virtual time. 
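The virtual-time arithmetic here can be illustrated with made-up numbers (a sketch; the "virtual time" units are arbitrary and `waking_vt` is a hypothetical helper): the waking class lands at the mean, and since the smallest vt always wins, the lagging class gets served first.

```python
# A class waking up joins at the mean of the smallest and largest virtual
# times of the active siblings; LS then always serves the smallest vt.
def waking_vt(active_vts):
    return (min(active_vts) + max(active_vts)) / 2.0

# Illustrative numbers: A's vt was pushed far ahead (its RT curve let it
# send at 8Mbit), C lagged behind; B wakes up and lands halfway between.
vt_A, vt_C = 90.0, 30.0
vt_B = waking_vt([vt_A, vt_C])

# Scheduling order by smallest vt: C first, then B, then A - so B cannot
# send until C's virtual time has caught up with it.
order = [name for name, _ in
         sorted([("A", vt_A), ("B", vt_B), ("C", vt_C)], key=lambda p: p[1])]
```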
The effect \u2212 B, having no RT guarantees, will be punished and will not be allowed to transfer until C\u2019s virtual time catches up.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">If the interface had a higher capacity, for example 100Mbit, this example would behave perfectly fine though.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Let\u2019s look a bit closer at the above example \u2212 it &#8220;cleverly&#8221; invalidates one of the basic things LS criterion tries to achieve \u2212 equality of all virtual times across class hierarchy. Leaf classes without RT service curves are literally left to their own fate (governed by messed up virtual times).<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Also, it doesn\u2019t make much sense. Class A will always be guaranteed up to 8Mbit, and this is more than any absolute bandwidth that could happen from its LS criterion (excluding trivial case of only A being active). If the bandwidth taken by A is smaller than absolute value from LS criterion, the unused part will be automatically assigned to other active classes (as A has idling periods in such case). The only &#8220;advantage&#8221; is, that even in case of low bandwidth on average, bursts would be handled at the speed defined by RT criterion. Still, if extra speed is needed (e.g. 
due to latency), nonlinear service curves should be used in such a case.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">In other words: the LS criterion is meaningless in the above example.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">You can quickly &#8220;work around&#8221; it by making sure each leaf class has an RT service curve assigned (thus guaranteeing all of them will get some bandwidth), but that doesn\u2019t make it any more valid.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Keep in mind &#8211; if you use nonlinear curves and the irregularities explained above happen <i>only<\/i> in the first segment, then there\u2019s little wrong with &#8220;overusing&#8221; the RT curve a bit:<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">A \u2212 ls 5.0Mbit, rt 9Mbit\/30ms, then 1Mbit <br \/> B \u2212 ls 2.5Mbit <br \/> C \u2212 ls 2.5Mbit<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Here, the vt of A will &#8220;spike&#8221; in the initial period, but then A will never get more than 1Mbit until B &#038; C catch up. Then everything will be back to normal.<\/p>\n<h2>LINUX AND TIMER RESOLUTION <a name=\"LINUX AND TIMER RESOLUTION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">In certain situations, the scheduler can throttle itself and set up a so\u2212called watchdog to wake up the dequeue function some time later. In the case of HFSC this happens when, for example, no packet is eligible for scheduling, and a UL service curve is used to limit the speed at which the LS criterion is allowed to dequeue packets. 
It\u2019s called throttling, and its accuracy depends on how the kernel is compiled.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">There are three important options in modern kernels, as far as timer resolution goes: \u2019tickless system\u2019, \u2019high resolution timer support\u2019 and \u2019timer frequency\u2019.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">If you have \u2019tickless system\u2019 enabled, then the timer interrupt will trigger as slowly as possible, but each time a scheduler throttles itself (or any other part of the kernel needs better accuracy), the rate will be increased as needed \/ possible. The ceiling is either \u2019timer frequency\u2019 if \u2019high resolution timer support\u2019 is not available or not compiled in, or it\u2019s hardware dependent and can go <i>far<\/i> beyond the highest \u2019timer frequency\u2019 setting available.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">If \u2019tickless system\u2019 is not enabled, the timer will trigger at a fixed rate specified by \u2019timer frequency\u2019 \u2212 regardless of whether high resolution timers are available or not.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">It is important to keep those settings in mind, as in a scenario like no tickless, no HR timers and a frequency set to 100Hz, the throttling accuracy would be 10ms. It doesn\u2019t automatically mean you would be limited to ~0.8Mbit\/s (assuming packets of ~1KB) \u2212 as long as your queues are prepared to cover for the timer inaccuracy. Of course, in the case of e.g. locally generated UDP traffic, an appropriate socket size is needed as well. 
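The ~0.8Mbit/s figure follows from simple arithmetic, sketched below (`throttled_rate` is a hypothetical helper; real throughput depends on how many packets the queue can burst per wakeup):

```python
# Worst case: every dequeue waits for a timer tick, one burst per wakeup.
def throttled_rate(hz, packet_bytes, packets_per_wakeup=1):
    """Achievable rate in bit/s with hz wakeups per second."""
    return hz * packets_per_wakeup * packet_bytes * 8

rate = throttled_rate(hz=100, packet_bytes=1024)   # one ~1KB packet/tick
# Covering a 10Mbit RT curve at HZ=100 needs ~13-packet bursts per tick.
burst = throttled_rate(hz=100, packet_bytes=1024, packets_per_wakeup=13)
```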
A short example to make it more understandable (assume hardcore anti\u2212scheduling settings \u2212 HZ=100, no HR timers, no tickless):<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">tc qdisc add dev eth0 root handle 1:0 hfsc default 1 <br \/> tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 10Mbit<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Assuming packets of ~1KB size and HZ=100, that averages to ~0.8Mbit \u2212 anything beyond it (e.g. the above example, with a specified rate over 10x larger) will require appropriate queuing and will cause bursts every ~10 ms. As you can imagine, any of HFSC\u2019s RT guarantees will be seriously invalidated by that. The aforementioned example is mainly important if you deal with old hardware \u2212 which is particularly popular for home server chores. Even then, you can easily set HZ=1000 and have very accurate scheduling for typical ADSL speeds.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Anything modern (APIC or even HPET MSI based timers + \u2019tickless system\u2019) will provide enough accuracy for superb 1Gbit scheduling. For example, on one of my cheap dual-core AMD boards I have the following settings:<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">tc qdisc add dev eth0 parent root handle 1:0 hfsc default 1 <br \/> tc class add dev eth0 parent 1:0 classid 1:1 hfsc rt m2 300mbit<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">And a simple:<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">nc \u2212u dst.host.com 54321 &lt;\/dev\/zero <br \/> nc \u2212l \u2212p 54321 &gt;\/dev\/null<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">&#8230;will yield the following effects over a period of ~10 seconds (taken from \/proc\/interrupts):<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">319: 42124229 0 HPET_MSI\u2212edge hpet2 (before) <br \/> 319: 42436214 0 HPET_MSI\u2212edge hpet2 (after 10s.)<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">That\u2019s roughly 31000\/s. 
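<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">(As a sanity check \u2212 purely illustrative \u2212 that rate follows directly from the two counter snapshots quoted above:)<\/p>

```shell
# Difference of the two hpet2 counter readings above, divided by the
# ~10 second interval between them, giving interrupts per second.
awk 'BEGIN { print int((42436214 - 42124229) / 10) }'
# prints: 31198
```

<p style=\"margin-left:11%; margin-top: 1em\">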
Now compare that with the HZ=1000 setting. The obvious drawback is that CPU load can be rather high from servicing that many timer interrupts. The example with a 300Mbit RT service curve on a 1Gbit link is particularly ugly, as it requires a lot of throttling with minuscule delays.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Also note that it\u2019s just an example showing the capabilities of current hardware. The above example (essentially a 300Mbit TBF emulator) is pointless on an internal interface to begin with: you will pretty much always want a regular LS service curve there, and in such a scenario HFSC simply doesn\u2019t throttle at all.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">300Mbit RT service curve (selected columns from mpstat \u2212P ALL 1):<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">10:56:43 PM CPU %sys %irq %soft %idle <br \/> 10:56:44 PM all 20.10 6.53 34.67 37.19 <br \/> 10:56:44 PM 0 35.00 0.00 63.00 0.00 <br \/> 10:56:44 PM 1 4.95 12.87 6.93 73.27<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">So, in the rare case you need those speeds with only an RT service curve, or with a UL service curve: remember the drawbacks.<\/p>\n<h2>CAVEAT: RANDOM ONLINE EXAMPLES <a name=\"CAVEAT: RANDOM ONLINE EXAMPLES\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">For reasons unknown (though easily guessed), many examples you can google love to overuse the UL criterion and stuff it into every node possible. This makes no sense and works against what HFSC tries to do (and does pretty damn well). Use UL where it makes sense: on the uppermost node to match the upstream router\u2019s uplink capacity. Or in special cases, such as testing (limiting a certain subtree to some speed), or customers that must never get more than a certain speed. 
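<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">For the &#8220;limit a certain subtree&#8221; case, a single inner class with matching LS and UL curves is enough \u2212 a sketch only, with the classid and rate made up for illustration:<\/p>

```shell
# Hypothetical subtree cap: everything attached under class 1:20 still
# shares bandwidth fairly among its children (ls), but the subtree as a
# whole never exceeds 5Mbit (ul). Classid and rate are illustrative.
tc class add dev eth0 parent 1:0 classid 1:20 hfsc ls m2 5Mbit ul m2 5Mbit
```

<p style=\"margin-left:11%; margin-top: 1em\">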
In the last case you can usually achieve the same by just using an RT criterion, without LS+UL on leaf nodes.<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">As for the router case &#8211; remember it\u2019s good to differentiate between &#8220;traffic to the router&#8221; (remote console, web config, etc.) and &#8220;outgoing traffic&#8221;, so for example:<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">tc qdisc add dev eth0 root handle 1:0 hfsc default 0x8002 <br \/> tc class add dev eth0 parent 1:0 classid 1:999 hfsc rt m2 50Mbit <br \/> tc class add dev eth0 parent 1:0 classid 1:1 hfsc ls m2 2Mbit ul m2 2Mbit<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">&#8230; with the &#8220;internet&#8221; tree under 1:1 and the &#8220;router itself&#8221; under 1:999.<\/p>\n<h2>LAYER2 ADAPTATION <a name=\"LAYER2 ADAPTATION\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">Please refer to <b>tc\u2212stab<\/b>(8).<\/p>\n<h2>SEE ALSO <a name=\"SEE ALSO\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\"><b>tc<\/b>(8), <b>tc\u2212hfsc<\/b>(8), <b>tc\u2212stab<\/b>(8)<\/p>\n<p style=\"margin-left:11%; margin-top: 1em\">Please direct bug reports and patches to: &lt;netdev@vger.kernel.org&gt;<\/p>\n<h2>AUTHOR <a name=\"AUTHOR\"><\/a> <\/h2>\n<p style=\"margin-left:11%; margin-top: 1em\">Manpage created by Michal Soltys (soltys@ziu.info)<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>  tc-hfsc \u2212 Hierarchical Fair Service Curve 
<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[971],"tags":[973,972,389],"class_list":["post-3878","post","type-post","status-publish","format-standard","hentry","category-7-miscelanea","tag-973","tag-man7","tag-tc-hfsc"],"gutentor_comment":0,"_links":{"self":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/posts\/3878","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/comments?post=3878"}],"version-history":[{"count":0,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/posts\/3878\/revisions"}],"wp:attachment":[{"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/media?parent=3878"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/categories?post=3878"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lode.uno\/linux-man\/wp-json\/wp\/v2\/tags?post=3878"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}