### Browsed byCategory: Uncategorized

Exploiting Chemistry for Better Packet Flow Management 1: Introduction

## Exploiting Chemistry for Better Packet Flow Management 1: Introduction

Perhaps the phrase “don’t reinvent the wheel” is overused. However, many newer disciplines, particularly in the technology sector, seem to insist on it. One thing physical engineers learned long ago was to study the world around them, work with it, and emulate it in their designs. Network engineering should be no different. In a technical report from 2011, authors Thomas Meyer and Christian Tschudin from the University of Basel describe a highly elegant natural flow management method [11] that exploits much of the hard work done in the chemistry sector in chemical kinetics. They describe a scheduling approach that creates an artificial chemistry as an analogue of a queueing network, and uses the Law of Mass Action to schedule events naturally. The analysis of such networks utilizing their implementation is simplified regardless of the depth one wishes to go into. In addition, they show a congestion control algorithm based on this framework is TCP fair and give evidence of actual implementation (a relief to practitioners).

This report will discuss their paper at length with a goal of covering not only their work, but the underlying ideas. Since the paper requires knowledge of chemical kinetics, probability, queueing theory, and networking, it should be of interest to specialists in these disciplines, but a comprehensive discussion of the fundamentals glossed over in the paper would make dissemination more likely.

Note: the full pdf of the report is attached to each section of this series at the bottom.

## Overall Review of the Paper

The paper is well written, readable, and quite clear despite the breadth of scope it contains. It’s a major benefit to the authors and readers that their theoretical model has been tested in a proof-of-concept network. The work shows much promise for the networking space as a whole.

## Overview of the Paper and Chemistry Basics

Just as in chemistry and physics, packet flow in a network has microscopic behavior controlled by various protocols, and macro-level dynamics. We see this in queueing theory as well–we can study (typically in steady-state to help us out, but in transient state as well) the stochastic behavior of a queue, but find in many cases that even simple attempts to scale the analysis up to networks (such as retaining memorylessness) can become overwhelming. What ends up happening in many applied cases is a shift to an expression of the macro-level properties of the network in terms of average flow. The cost of such smoothing is an unpreparedness to model and thus deal effectively with erratic behavior. This leads to overprovisioning and other undesirable and costly design choices to mitigate those risks.

Meyer and Tschudin have adapted decades of work in the chemical and physical literature to take advantage of the Law of Mass Action, designing an artifical chemistry that takes an unconventional non-work-conserving approach to scheduling. Non-work-conserving queues add a delay to packets and have tended to be avoided for various reasons, typically efficiency. Put simply, they guarantee a constant wait time of a packet regardless of the number of packets in a queue by varying the processing rate with fill level of the queue. The more packets in queue, the faster the server processes those packets.

### Law of Mass Action in Chemistry

If we have some chemical reaction with reactants $A_{1}$, $A_{2}$,$\ldots$, $A_{n}$ and products $B_{1}$, $B_{2}$, $\ldots$, $B_{m}$, and the reaction is only forward1, then we may express the reaction as

$$A_{1} + A_{2} + \ldots + A_{n} \longrightarrow_{k} B_{1} + B_{2} + \ldots + B_{n}$$

where $k$ is the rate constant. In a simple reaction $A \to P$, with $P$ as the product, we can see the rate expressed nicely in a very basic differential equation form[9]:

$$-\frac{\text{d}c_{A}}{\text{d}t} = k\cdot c_{A}$$

This should actually look somewhat similar to problems seem in basic calculus courses as well. The rate of change of draining the reactant is a direct function of the current concentration.

The reaction rate $r_{f}$ of a forward reaction is proportional to the concentrations of the reactants:

$$r_{f} = k_{f}c_{A_{1}}c_{A_{1}}\cdots c_{A_{N}}$$

for a set of reactants $\{A_{i}\}$.

### The Queuing Analogue and Assumptions

Meyer and Tschudin[11] express the networking version of these chemical reactions in a very natural way. Packets are molecules. A molecular species is a queue, so molecules of species $X$ go into queue $X$. The molecular species is a temporary buffer that stores particular packets types until they are consumed by a reaction (processed by some server in the queueing space). FIFO (first-in-first-out) discipline is assumed.

The figure above from the technical report shows how a small system of reactions looks in the chemical space and the queuing space. Where analysis and scheduling can get complicated is in the coupled nature of the two reactions. The servers both drain packets from queue $Y$, so they are required to coordinate their actions in some way. It’s important to note here that this equivalence rests on treating the queuing system as M/M/1 queues with a slightly modified birth-death process representation.

Typically, in an M/M/1 queue, the mean service rate is constant. That is, the service rate is independent of the state the birth-death process is in. However, if we model the Law of Mass Action using a birth-death process, we’d see that the rate of service (analogously, the reaction rate) changes depending on the fill-level of the queue (or concentration of the reactant). We’ll investigate this further in the next sections, discussing their formal analysis.

## Related Work and Precedent

The authors noted that adding packet delay is not unheard of in the networking space. Delay Frame Queuing[12] utilizes non-work-conserving transmission at edge nodes in an ATM network in order to guarantee upper bounds on delay and jitter for virtual circuits. Three other researchers in 2008 (Kamimura et al) proposed a Constant Delay Queuing policy that assigns a constant delay to each packet of a particular priority stream and forward other best-effort packets during the delay[8].

## Continuation

The next article will discuss the formal mathematical model of an artificial packet chemistry.

## References

1. Dittrich, P., Ziegler, J., and Banzhaf, W. Artificial chemistries – a review. Artificial Life 7(2001), 225–275.
1. Feinburg, M. Complex balancing in general kinetic systems. Archive for Rational Mechanics and Analysis 49 (1972).
2. Gadgil, C., Lee, C., and Othmer, H. A stochastic analysis of first-order reaction networks. Bulletin of Mathematical Biology 67 (2005), 901–946.
3. Gibson, M., and Bruck, J. Effcient stochastic simulation of chemical systems with many species and many channels. Journal of Physical Chemistry 104 (2000), 1876–1889.
4. Gillespie, D. The chemical langevin equation. Journal of Chemical Physics 113 (2000).
5. Gillespie, D. The chemical langevin and fokker-planck equations for the reversible isomerizationreaction. Journal of Physical Chemistry 106 (2002), 5063–5071.
6. Horn, F. On a connexion between stability and graphs in chemical kinetics. Proceedings of the RoyalSociety of London 334 (1973), 299–330.
7. Kamimura, K., Hoshino, H., and Shishikui, Y. Constant delay queuing for jitter-sensitive iptvdistribution on home network. IEEE Global Telecommunications Conference (2008).
8. Laidler, K. Chemical Kinetics. McGraw-Hill, 1950.
9. McQuarrie, D. Stochastic approach to chemical kinetics. Journal of Applied Probability 4 (1967), 413–478.
10.  Meyer, T., and Tschudin, C. Flow management in packet networks through interacting queues and law-of-mass-action-scheduling. Technical report, University of Basel.
11.  Pocher, H. L., Leung, V., and Gilles, D. An application- and management-based approach to atm scheduling. Telecommunication Systems 12 (1999), 103–122.
12. Tschudin, C. Fraglets- a metabolistic execution model for communication protocols. Proceedings of the 2nd annual symposium on autonomous intelligent networks and systems (2003).

The Hathlor Classification System

## The Hathlor Classification System

Many researchers have their own libraries, and The Math Citadel is no different. Both Jason and I have spent many hours buried in the shelves of bookstores new and used, the stacks of university library shelves, and the rows of books in public libraries across four states now. During this time, we’ve amassed our own collection of new, used, out-of-print, and rare math and engineering texts. Some we acquired as part of our formal studies, most are acquired tangentially, and sometimes haphazardly.

As we’re both very interested in organization and cataloguing in general, we noticed the different methods libraries tend to use to shelve their vast collections, particularly the technical books. Most university libraries we’ve visited utilize the more modern Library of Congress classification system, while public libraries tend to be a mix of the Dewey Decimal System and the Library of Congress system, varying by library.

What we ultimately realized in our time exploring these shelves is that both of these methods for cataloguing books are insufficient for our needs. One of our biggest desires in a classification system was a way to determine some of the content of the text in a database or card catalogue listing, without necessarily having to pull the book from the shelf (which may be in storage). One other shortcoming we noticed is that neither system accounts for a “continuum-style” classification. As an example, many abstract algebra texts contain chapters on number theory, but some do not. Perhaps two different abstract algebra texts have different focus topics; one may focus more heavily on group theory, while another spends much of its pages discussing advanced matrix theory. The titles do not always reflect this, nor do the classification codes used by either the Dewey Decimal System nor the Library of Congress system.

So, we invented our own. Born of months of discussion and trial, we finally settled on a new one that takes the best aspects of both original systems, and also expands to include additional information. We’ll begin by briefly describing the two original classification systems, then describe our new Hathlor system. Our system is customizable to suit anyone’s needs, and we’re excited to share a passion project of ours.

## The Dewey Decimal System

This system is the one most are familiar with from school libraries. It was created by Melvil Dewey in 1876, and introduced a relative index that allowed for library expansion. Wikipedia has a great account of the history and development of the system, so we’ll not dive too deeply here; we’ll merely describe it. The system separates books by discipline, ten major classes overall, each divided into ten sections. It’s a hierarchical classification system that endeavors to classify a book as specifically as possible by adding classes after the decimal points.

Wikipedia gives an example of a book on Finsler geometry:

1. The book’s broadest class is 500-Natural Sciences and Mathematics
2. Next, it’s specifically a mathematics text (not physics or biology), which is the first section, so we are at 510
3. Within mathematics, its topic is geometry, the 6th topic listen, thus 516
4. Now we may move to decimal places to further classify as specifically as we like. This text is an analytic geometry, the third subtopic in geometry. Thus, we may now list the text as 516.3
5. We wish to further classify, so among analytic geometries, this one discusses metric differential geometries, the 7th subdivision of analytic geometry. If the library classes the books to two decimal places, the code is 516.37
6. We may even wish to subdivide metric differential geometry and move to three decimal places, finishing with 516.375–its final classification code.

### Pros

This system allows very fine granularity and specificity in classification; one could decide to divide further and move to four, five, or more decimal places. It provides a total ordering, so shelving is sensical.

### Cons

One thing I personally don’t like about this system is the length the codes can reach without really offering tons more information. It breaks single topics down into very fine divisions if desired, but one would need to know the possible subdivisions to look at the code of a book and discern all that information. One other issue is that the book is restricted to one and only one class. Many mathematics and engineering texts don’t fit neatly into one class, division, or subdivision. A book on algebraic statistics would span abstract algebra and statistics, yet under the Dewey Decimal system, I’m not only forced to classify it under one or the other, but the “other” I don’t use disappears, and the text can get lost. A statistician might not always just browse the algebra shelf, yet this text might be of interest to him.

## Library of Congress

The LCC was developed by the Library of Congress, and is the most common system deployed in academic and research libraries in the United States. Herbert Putnam invented the system in 1897. For additional history, check here. This system has a larger number of broad classes, and uses letters to denote general classes and narrower topics. There are 21 broad classes, using all letters of the alphabet excepting I, O, W, X, and Y. Each broad class is divided into subclasses, though the number of subclasses may vary. For example, Class B (Philosophy, Psychology, and Religion) is divided into 15 subclasses, one of which is BV-Practical Theology. Class Q (Science) is divided into 12 subclasses, one of which is QA-Mathematics.

Underneath these broad classes is a narrower topic of 4 digits, then a cutter number that represents the author, corporation, or title, and finally the last line notes the year of publication.

As an example, from the University of Mississippi library

Title: Price Control under Fair Trade Legislation

Author: Ewald T. Grether

1. This text falls under HF (Social Sciences → Commerce).
2. The library codes it 5415, signaling Business → Marketing → General Works
3. Next, the cutter number for Grether is .G67, and finally, the year of publication is 1939.

In summary, the LCC call number would look like this:

HF
5415
.G67
1939

### Pros:

There are more classes than the Dewey Decimal system.

### Cons:

We thoroughly dislike this system for a number of reasons. The primary concern is that looking at the call number on the spine of the book tells very little useful information, and gives some information that is irrelevant to evaluation of the content, such as the cutter number and publication year. It also shares the same issue as the Dewey Decimal system in that multiple subjects contained in a text are lost by forcing the text into one class.

In both of these systems, it’s difficult to determine where some more multidisciplinary or multitopic texts might be found without knowing the exact name of the text. It would be impossible to use either of these systems to locate some text that discusses graph theory and algorithms, but also mentions applications to chemistry. It is with this motivation that we devised our new system.

## The Hathlor Classification Codes

The main goal of the new system was to allow a user without intimate familiarity with the minutest details of the scheme, but with knowledge of the specific subjects, to discern information from the spine or given code of a text. Neither LCC nor DDC give this benefit. We describe our system and reasoning here.

### Hierarchical Yet Lateral

We have divided material in the library into a hierarchical set of groups. The largest group a text can be a member of is the Subject. Our library currently contains 5 subjects, each given a sensical one or two-letter code:

SubjectCode
ChemistryC
Computer ScienceCS
EngineeringE
MathematicsM
PhysicsP

Within each Subject are a number of Topics, where each topic has a three letter code. Again, these codes were designed sensibly, so that one can infer the topic from the three letter code (as opposed to needing to know that QA is Mathematics in the LCC system, for example). To illustrate, the table below shows how we divided Mathematics into ten Topics

TopicCode
Applied and Engineering Mathematics AEM
AlgebraALG
AnalysisANA
Differential/Integral EquationsDIE
Discrete MathematicDSC
FundamentalsFUN
GeometryGEO
Number TheoryNUT
Statistics/Probability TheorySPT
TopologyTOP

Finally, within each Topic are a varying number of Subtopics. Texts can contain many Subtopics, and in the spirit of the MAC address, we give a variable length binary code that indicates which Subtopics are contained. For example, the Topic of Fundamentals contains four Subtopics: Calculus in bit 1, Logic in bit 2, Set Theory in bit 3, and Trigonometry in bit 4.

A general code will look like this:

Subject – Topic.XXXX

where the single-letter subject code is given, the three letter topic code is given, and the X is an indicator of whether or not a particular subtopic is contained in the work. Note that the number of X’s may vary by topic.

Example: Strang’s Linear Algebra and its Applications is a basic linear algebra text covering fundamental matrix theory, Gaussian elimination, linear systems, orthogonal projections, determinants, eigenvalues and eigenvectors, and positive-definite matrices. Subtopics in Algebra are Abstract Algebra, Category Theory, Linear Algebra, and Matrix Theory, in that order (alphabetically).

To code this book, we note that its subject is Mathematics, with topic Algebra, and containing linear algebra and matrix theory, but not Abstract Algebra, or Category Theory.

Thus, the Hathlor code for the book is

M-ALG.0011

### Cross-Topic Texts

Many texts in mathematics contain material that spans topics. Our codes reflect this, and may be extended to include those additional topics in descending order of importance or inclusion to the work, separating the Topic.Subtopic codes by a colon. One may do this as many times as necessary to encapsulate all appropriate topics. Thus, we see that now the code may take the form of

Subject-PrimaryTopic.XXXX:SecondaryTopic.XX:…

Example: Apostol’s Linear Algebra: A First Course with Applications to Differential Equations is primarily a linear algebra text, but does have a significant discussion of differential equations. Thus, the subject is still clearly Mathematics, but now we have a primary topic of Algebra and a secondary topic of Differential/Integral Equations. Differential/Integral Equations has three subtopics: integral equations, ordinary differential equations, and partial differential equations. Apostol only addresses the second of these in his text.

Thus, the Hathlor code for this book is

M-ALG.0010:DIE.010

As a note, one can have as many topics indicated as necessary. Joyner’s Adventures in Group Theory is an engaging, multi-topic text that touches on many fundamental areas of mathematics, presenting abstract algebra matrix theory, logic, set theory, and a tiny bit of graph theory through the lens of recreational mathematics. Its Hathlor code is M-ALG.1001:FUN.0110:DSC.0100.

### Final Generalization: Cross-Subject Texts

We also noticed in reading many technical works that authors may span entire subjects. Some texts in analysis focus only on the pure mathematics, and others discuss applications of analysis to physics or electrical engineering. The Hathlor codes account for this as well, giving the book a primary subject and as many subsequent subjects as necessary. One finishes the entire subject-topic-subtopic classification prior to moving onto the next, noting a new subject by the set <> of symbols. The final general Hathlor code thus takes on the following form:

PrimarySubject-PrimaryTopic.XXX:SecondaryTopic.XXXX:…<>SecondarySubject.PrimaryTopic.XXXX:…

Example: Hohn’s Applied Boolean Algebra discusses Boolean algebra and some of its applications, particularly in electrical engineering and relay circuits.The text is primarily a mathematical treatment, but it would be foolish not to note its electrical engineering motivations. Thus is spans the subjects of mathematics and engineering. Its Hathlor code is

M-DSC.1000<>E-ELE.1

### Shelving: Simple Lexicographic Order

Shelving books by the Hathlor classification code is done in lexicographic order, moving left to right, maintaining a simple, unambiguous shelving system. Thus, all chemistry books are shelved before all computer science books. Within each subject, we move by alphabetical order by topic according to the three letter codes. All (Primary) Algebra (ALG) books come before all (primary) Analysis (ANA) books. Within each topic, we shelve by the binary indicators. Thus, M-ALG.1000 comes before M-ALG.01001. If the text contains secondary/tertiary topics, we file those after those with no subsequent topics, the same way that “can” precedes “candle” in a dictionary. The secondary topics are filed in alphabetical order, and so forth. The texts that span subjects then follow those that do not span subjects, still shelved by their primary subject.

## Why Reinvent the Wheel?

As mentioned before, we felt that the current systems in use both omit useful information regarding the content of the works and add extra information a user doesn’t typically care about, such as the LCC’s cutter number. In addition, a researcher or browser may simply have a general idea of the types of things he would like a text to contain, but neither the DDC nor the LCC provides a simple way to search for such things. Ours provides a way to search via a simple regular expression query, returning a set of texts previously unknown to the user that fit the subjects, topics, and subtopics he seeks, particularly books that contain all he seeks.

Though neither of us have formally worked in a library, nor to either of us hold any formal degrees in library science, we both have a deep and abiding passion for reading, mathematics, organization, and classification. Beyond that, our collection now spans over 220 books, so we definitely needed a better way to shelve our library. We both think it’s a pity that very few technical libraries outside of universities exist anymore, particularly in the private sector. Company libraries for engineers and scientists such as the (highly underrated and unknown) library at NASA Ames Research Center should be cared for, revitalized, and restarted. We’re happy to help with that.

–Rachel Traylor and Jason Hathcock

Dialogue: What do We Mean by Predictive Analytics?

## Dialogue: What do We Mean by Predictive Analytics?

Predictive analytics is a phrase that gets used often as a feature in many tech offerings. Predicting when a problem is likely to occur allows either a human or an automated system to take some action to mitigate a potential issue before it occurs and has potentially catastrophic effects. If we take data storage, for example, the worst thing that can happen on your storage array is the loss of data. Data loss is permanent (unless you have a backup), so when it’s gone, it’s gone1

Mitigating problems proactively involves modeling system behavior (whatever that system happens to be) according to some metric or set of metrics, then evaluating new data as it arrives and deciding if this new data is “abnormal” or indicative of a potential problem. This means there are three steps –modeling the behavior of the system or aspect of the system you are trying to monitor, determining what is normal (and by complement, abnormal), and then deciding how you will handle an abnormal data point/event/other metric.

These three steps, while simple to articulate, are three very difficult and distinct problems that each involve a whole host of considerations that are themselves interrelated. For instance2:

• How do we decide which metrics to use in our modeling?
• How do we verify that this model is an accurate representation of system behavior?
• Once we have a model, how do we define unusual or anomalous behavior
• Once we define anomalous behavior, how do we decide our courses of action? Do we act on any “weird” point that crosses some threshold, or should we see the threshold crossed repeatedly, or something else?

Proactively mitigating system issues is a well-justified desire of many companies, because it increases system reliability. I watched Starwind present their Virtual Tape Library at Storage Field Day 15 , and they, like many other companies, strive to create a way to detect impending failure patterns and take preventative measures before a catastrophic failure. The presentation is only two hours long, and covered their entire architecture, not just the specific feature regarding failure pattern detection, so we were unable to take the time to discuss the specifics of Starwind’s Proactive Support, as they call it.

Detecting any kind of pattern in data is difficult, especially a failure pattern. There are always tradeoffs. If we set our tolerance for what we consider “normal behavior” to be too low, we risk alerting on potential issues too often. When this happens, alerts get ignored, and real problems are assumed to be just another “false alarm.” On the other hand, if we set the tolerance for what we consider normal too high, we run the risk of not detecting an issue at all.

At this point, I’d like to open dialogue in the comments, particularly because these subjects are deep; in many cases so deep an entire two-hour presentation can be devoted to just a few aspects of this very large challenge. How do we balance these tradeoffs? How do we decide whether an unusual data point or set of points is really something bad? Is it possible if we are generating too many “false alarms”, that our original behavior model is off?