# Paper Review: Active Queue Management with Non-Linear Packet Dropping Function

As promised in the previous article, this post reviews Reference 2, Active Queue Management with Non-Linear Packet Dropping Function, by D. Augustyn, A. Domanski, and J. Domanska, published in HET-NETs 2010. The paper proposes a change in the structure of the packet drop probability function, which takes the average queue length in a buffer as its input. I mentioned previously that choosing a linear function of the average queue length is somewhat arbitrary, since we’re designing a control mechanism here, and this paper attempts to define a new form of this packet drop probability function.

In summary, the best lesson one can take from this paper is that publication in a journal or conference proceedings does not guarantee that the paper withstands scrutiny. The paper is linked above for the interested reader to peruse himself, and to investigate the claims.

## Summary

The paper intended to give a new function to calculate the probability of proactively dropping a packet in a queue in order to prevent a full buffer. It seemed to be presented as an alternative to RED, described in my previous article. The authors define this new function, then set up a simulation in order to examine the effects.

## When Orthogonal is Abused

The authors describe using a finite linear combination of orthogonal basis polynomials defined on a finite interval as the underlying mathematical structure.

First, we should discuss what we mean by orthogonal in the context of functions. Orthogonal is most commonly understood in terms of vectors, and when we’re in two dimensions, orthogonal becomes our familiar perpendicular.

### Orthogonality

Beginning with the familiar notion of perpendicular, we can generalize this to understand orthogonality. The geometric interpretation of two vectors being perpendicular is that the angle between them is $90^{\circ}$. Once we leave two and three dimensions (or jump to the space of polynomials, as we’ll do soon), the concept of an angle isn’t as helpful.

Another way to define perpendicular is through an operation known as the dot product. Suppose we take two 2D vectors, $\mathbf{x}$ and $\mathbf{y}$. Each vector will have coordinates: $\mathbf{x} = (x_{1},x_{2})$ and $\mathbf{y} = (y_{1}, y_{2})$. The dot product is a special type of multiplication defined on vectors, and denoted $\cdot$:

$$\mathbf{x}\cdot\mathbf{y} = x_{1}y_{1} + x_{2}y_{2}$$

The dot product can be described in words as the sum of the component-wise multiplication of the coordinates of the two vectors.

Now, we can say that two vectors are perpendicular if their dot product is 0. That is, $\mathbf{x}$ and $\mathbf{y}$ are perpendicular if $\mathbf{x}\cdot\mathbf{y} = 0$. (This article gives a nice overview of how we move from the algebraic to the geometric definition of perpendicular and orthogonal.)

Remember, the geometric picture of perpendicular stops being helpful once we get into higher dimensions. Orthogonal is the more general notion: two vectors (of any dimension, as long as both have the same number of components) are orthogonal if their dot product equals zero.
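The definition carries over to any number of components without change. A minimal Python sketch (the helper names `dot` and `is_orthogonal` are just illustrative):

```python
def dot(x, y):
    """Sum of the component-wise products of two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return sum(xi * yi for xi, yi in zip(x, y))


def is_orthogonal(x, y):
    """Two vectors are orthogonal when their dot product is zero."""
    return dot(x, y) == 0


print(dot((1, 2), (-2, 1)))                        # 0: perpendicular in 2D
print(dot((1, 2), (3, 4)))                         # 11: not perpendicular
print(is_orthogonal((1, 0, 2, -1), (2, 5, 0, 2)))  # True: orthogonal in 4D
```

The exact comparison with zero is fine for integer components; with floating-point components a small tolerance would be the safer choice.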

### From Dot Product to Inner Product

The dot product is defined on vectors, and it is a different kind of product from the ordinary multiplication of numbers we know. In fact, we can generalize the notion of a dot product to something called an inner product, which can be defined on many spaces other than just vectors. We can define operations and products however we like, but for our definition to qualify as an inner product (denoted $\langle \cdot, \cdot\rangle$), it must meet certain criteria (for real spaces: symmetry, linearity in each argument, and positive definiteness).

For instance, on the set of real-valued functions with domain $[a,b]$, we define the inner product of two functions $f(x)$ and $g(x)$ as

$$\langle f, g\rangle := \int_{a}^{b}f(x)g(x)dx$$

The concept of orthogonality generalizes to an inner product as well. If the inner product of two functions is zero (as defined above), we say the functions are orthogonal.
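As a quick worked example (the functions and intervals are chosen here purely for illustration, not taken from the paper), take $f(x) = 1$ and $g(x) = x$:

$$\begin{aligned}\text{on } [-1,1]:\quad \langle 1, x\rangle &= \int_{-1}^{1}x\,dx = 0\\\text{on } [0,1]:\quad \langle 1, x\rangle &= \int_{0}^{1}x\,dx = \frac{1}{2}\end{aligned}$$

The same pair of functions is orthogonal on $[-1,1]$ but not on $[0,1]$, so orthogonality of functions always depends on the interval over which the inner product is taken.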

### Back to the paper

The authors claim to be using a set of orthogonal polynomials to define their drop probability function, and they give the structure of such functions. For a domain $[a,b]$, and for $\phi_{j}$ in the set of polynomials, they define $\phi_{j} = (x-a)^{j-1}(b-x)$. So, for example, $\phi_{1} = (b-x)$, and $\phi_{5} = (x-a)^{4}(b-x)$.

Now, in order to be an orthogonal basis for a space (see footnote 1), the set of functions that are claimed to form the basis must be pairwise orthogonal. That is, I need to be able to take the inner product of any two distinct functions $\phi_{i}$ and $\phi_{j}$ and get 0. If that isn’t true for even one pair, then the set is not orthogonal.

As it turns out, if we take the inner product of any two functions in the basis set over the domain given, we find that there are no pairs that are orthogonal. To check this in general, we compute the integral

$$\int_{a}^{b}(x-a)^{i-1}(b-x)\cdot (x-a)^{j-1}(b-x)dx$$

The computation is simple polynomial integration, and can be done either by hand or with your software of choice (Mathematica, for example). What we find is that the functions defined this way are never orthogonal, for any pair, yet the paper claims they are.
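For reference, the integral has a closed form. The following is a sketch of the computation, assuming $a < b$ and $i, j \geq 1$, and using the substitution $t = (x-a)/(b-a)$:

$$\int_{a}^{b}(x-a)^{i+j-2}(b-x)^{2}dx = (b-a)^{i+j+1}\int_{0}^{1}t^{i+j-2}(1-t)^{2}dt = \frac{2(b-a)^{i+j+1}}{(i+j-1)(i+j)(i+j+1)}$$

For $i = j = 1$ this gives $(b-a)^{3}/3$, which matches integrating $(b-x)^{2}$ directly. Every factor on the right-hand side is positive, so no choice of $i$ and $j$ makes the inner product vanish.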

Applying this to the particular situation of designing a drop probability function, they give the following for average queue length thresholds $T_{\min}$ and $T_{\max}$:

$$p(x,a_{1},a_{2}) = \left\{\begin{array}{lr}0, &x < T_{\min}\\\phi_{0}(x) + a_{1}\phi_{1}(x) + a_{2}\phi_{2}(x),&T_{\min}\leq x \leq T_{\max}\\1,&x > T_{\max}\end{array}\right.$$

where the basis functions are

$$\begin{aligned}\phi_{0}(x) &= p_{m}\frac{x-T_{\min}}{T_{\max}-T_{\min}}\\\phi_{1}(x) &= (x-T_{\min})(T_{\max}-x)\\\phi_{2}(x) &= (x-T_{\min})^{2}(T_{\max}-x)\end{aligned}$$
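To make the definition concrete, here is a minimal Python sketch of the piecewise function as written above. The function name, argument names, and all numeric values are hypothetical placeholders chosen for illustration; the paper's actual parameter values are not reproduced here.

```python
def drop_probability(x, a1, a2, t_min, t_max, p_m):
    """Piecewise drop probability p(x, a1, a2) for average queue length x.

    t_min and t_max are the thresholds; p_m is the constant in phi_0.
    """
    if x < t_min:
        return 0.0
    if x > t_max:
        return 1.0
    phi0 = p_m * (x - t_min) / (t_max - t_min)  # RED's linear ramp
    phi1 = (x - t_min) * (t_max - x)            # vanishes at both thresholds
    phi2 = (x - t_min) ** 2 * (t_max - x)       # vanishes at both thresholds
    return phi0 + a1 * phi1 + a2 * phi2


# Hypothetical values, for illustration only.
T_MIN, T_MAX, P_M = 100, 200, 0.1

# With a1 = a2 = 0 the function is exactly RED's linear ramp.
print(drop_probability(150, 0.0, 0.0, T_MIN, T_MAX, P_M))

# Nonzero (made-up) coefficients bend the curve between the thresholds.
print(drop_probability(150, -1e-5, 1e-7, T_MIN, T_MAX, P_M))
```

Because $\phi_{1}$ and $\phi_{2}$ are both zero at $T_{\min}$ and $T_{\max}$, the coefficients only change the shape of the curve between the thresholds, not its endpoint values. Note also that nothing in the formula itself keeps the result inside $[0,1]$ for arbitrary $a_{1}$ and $a_{2}$.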

The reader will recognize $\phi_{0}$ as the original drop probability function from the RED algorithm. Despite the authors’ claim, these functions are not orthogonal, and a simple check like the one above verifies it.
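That check can be sketched in a few lines of Python using exact rational arithmetic. The thresholds $T_{\min} = 0$, $T_{\max} = 1$ and the value $p_{m} = 1/10$ are assumptions made only to keep the polynomials simple; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def integrate(p, a, b):
    """Exact definite integral of the polynomial p over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))


def inner(p, q, a, b):
    """Inner product <p, q> = integral of p(x) q(x) over [a, b]."""
    return integrate(poly_mul(p, q), a, b)


# Assumed values: T_min = 0, T_max = 1, p_m = 1/10.
# The coefficient lists below are only valid for these thresholds.
T_MIN, T_MAX, P_M = 0, 1, Fraction(1, 10)
basis = {
    "phi0": [0, P_M],        # p_m * x
    "phi1": [0, 1, -1],      # x (1 - x)
    "phi2": [0, 0, 1, -1],   # x^2 (1 - x)
}

gram = {(m, n): inner(basis[m], basis[n], T_MIN, T_MAX)
        for m, n in combinations(basis, 2)}
for pair, value in gram.items():
    print(pair, value)
# ('phi0', 'phi1') 1/120
# ('phi0', 'phi2') 1/200
# ('phi1', 'phi2') 1/60
```

All three pairwise inner products are strictly positive under these assumed values, so none of the pairs is orthogonal.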

## Other issues

Another issue is that these mysterious coefficients $a_{1}$ and $a_{2}$ need to be determined. How, you ask? The authors do not say, other than to note that one can define “a functional” that depends implicitly on the unknown $p(x,a_{1},a_{2})$ and can be minimized to find the optimal values for those coefficients. They write that this mysterious functional (see footnote 2) can be based on either the average queue length or average waiting time, yet provide no details whatsoever as to the functional they have chosen for this purpose. They provide a figure with a sample function, but give no further details as to how it was obtained.

One other issue I have with their methodology concerns the order of estimation. For those familiar with the many ways to estimate unknown functions, from Taylor series, to splines, to Fourier series, we know that (under suitable conditions) a function is exactly equal to an infinite sum of such basis functions. Any finite sum is an estimate, and the number of terms needed for a desired accuracy may change with the function being estimated.

For instance, if I want to use Taylor series (a linear combination of polynomials) to estimate a really icky function, how many terms should I include to ensure my accuracy at a point is within 0.001 of the real value? It depends on the function (see footnote 3).
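A small Python sketch illustrates the point. The two functions, the evaluation point $x = 0.5$, and the helper name `terms_needed` are choices made here for illustration; both series are expanded about 0.

```python
import math


def terms_needed(term, target, tol=1e-3, max_terms=1000):
    """Count the series terms needed before the partial sum is within tol of target.

    term(n) returns the n-th term of the series, n = 0, 1, 2, ...
    """
    partial = 0.0
    for n in range(max_terms):
        partial += term(n)
        if abs(partial - target) < tol:
            return n + 1
    raise ValueError("tolerance not reached within max_terms")


x = 0.5

# exp(x) = sum of x^n / n!
exp_terms = terms_needed(lambda n: x ** n / math.factorial(n), math.exp(x))

# 1 / (1 - x) = sum of x^n, valid for |x| < 1
geo_terms = terms_needed(lambda n: x ** n, 1 / (1 - x))

print(exp_terms, geo_terms)  # 5 11
```

At the same point and for the same tolerance, the exponential needs 5 terms while $1/(1-x)$ needs 11, so a fixed rule such as "second order is enough" cannot be justified without looking at the function in question.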

The authors simply claim that a second order term is appropriate in this case. The issue I take with that is that these queue management drop probability functions are designed. We’re not estimating a function describing a phenomenon, we are designing a control policy to seek some sort of optimal behavior. Fundamentally, the authors posing this as an estimation problem of some unknown drop probability function is incorrect. This isn’t a law of nature we seek to observe; it’s a control policy we seek to design and implement and optimize. Using language that implies the authors are estimating some “true function” is misleading.

Regarding the simulation itself, since the design was arbitrary, and not based on sound mathematical principles, I cannot give any real comment on the simulations and the results. The authors briefly discuss and cite some papers that explore the behavior of network traffic, and claim to take this into account in their simulations. On those, I cannot comment yet.

## Conclusion

Always verify a paper for yourself, and don’t accept anything at face value. Research and technical publications should be completely transparent as to choices and methodologies (and obviously free of mathematical inaccuracies), and derivations and proofs should be present when necessary, even if in an appendix.

#### Footnotes

1. This discussion on basis functions can get very deep, and the linear algebra can get heavy-ish. Right now I’m really focused on pointing out errors in the paper, so don’t worry too much about the rigorous details of estimating a function by a basis set.
2. I did not make a typo. A functional is a mapping that takes functions as input.
3. Many engineers use a first or second order approximation as a rule of thumb, though it really should still be verified first that higher order terms contribute a negligible amount.