I came across a situation recently where the first stage of a multi-stage batch process provided statistics on the input data. The statistics vary with the input. Subsequent processing stages would benefit from adapting to those statistics (in the Adaptive Learning[.doc] sense, not Adaptive Modeling) by allocating system resources accordingly. Static configuration values for buffers and threads are easily implemented. One could (and I did) test various permutations with a perl script and a lot of time. The missing details, however, are the variability of the data (not known until the app is deployed in the field) and the current runtime environment (CPU utilization, number of active database connections, etc.). All of this dynamic data could be used to determine an optimal (or near-optimal) configuration.
I began articulating the known variables with the intent of modeling the inputs and feeding them to a Markov Decision Process. I'll come clean and admit that being able to use the term sequential stochastic optimization in conversation around the office was a big plus. An MDP is essentially a dynamic program in which, for each policy, the resulting state variables form a Markov process (a stochastic process with the property that the conditional probability of a transition to any state depends only on the current state, not on previous states).
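The Markov property is easy to see in code. Here is a toy sketch (the states and probabilities are hypothetical, not the batch process above): the next state is drawn from a distribution that depends only on the current state, never on the path taken to reach it.

```python
import random

# Hypothetical transition probabilities: current state -> {next state: probability}
transitions = {
    "idle":    {"idle": 0.6, "busy": 0.4},
    "busy":    {"busy": 0.7, "idle": 0.2, "blocked": 0.1},
    "blocked": {"busy": 0.8, "blocked": 0.2},
}

def step(state, rng):
    """Sample the next state; any history before `state` is irrelevant."""
    nxt, probs = zip(*transitions[state].items())
    return rng.choices(nxt, weights=probs)[0]

def walk(start, n, seed=42):
    """Generate a chain of n transitions from `start`."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n):
        chain.append(step(chain[-1], rng))
    return chain
```

Every transition in the resulting chain is legal under the table above, and the sampling code never looks further back than the current state.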
The role of MDPs is clear for a given class of problem. Consider the following:
Decisions that humans and computers make on all levels usually have two types of impacts: (i) they cost or save time, money, or other resources, or they bring revenues; and (ii) they have an impact on the future by influencing the dynamics. In many situations, decisions with the largest immediate profit may not be good in view of future events.
The goal is to reduce processing time, but increasing the value of one control variable may have a negative impact depending on the value of another. An MDP uses a probability model to select the next state. Each choice is graded by a reward function. Over time, an MDP will select the values that yield the highest overall reward.
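As a concrete sketch of that trade-off, here is value iteration on a deliberately tiny, hypothetical MDP: the states are coarse buffer-size settings, the actions grow, shrink, or keep the buffer, and the reward is negative processing time (with "medium" assumed to be the sweet spot). None of these numbers come from the actual batch process; they only illustrate how a reward function steers the choice of next state.

```python
states = ["small", "medium", "large"]   # coarse buffer-size settings (hypothetical)
actions = ["shrink", "keep", "grow"]

# P[s][a] -> list of (next_state, probability); transitions depend only on s.
P = {
    "small":  {"shrink": [("small", 1.0)],
               "keep":   [("small", 1.0)],
               "grow":   [("medium", 0.9), ("small", 0.1)]},
    "medium": {"shrink": [("small", 0.9), ("medium", 0.1)],
               "keep":   [("medium", 1.0)],
               "grow":   [("large", 0.9), ("medium", 0.1)]},
    "large":  {"shrink": [("medium", 0.9), ("large", 0.1)],
               "keep":   [("large", 1.0)],
               "grow":   [("large", 1.0)]},
}

# Reward = negative processing time for the state entered; "medium" is assumed best.
R = {"small": -3.0, "medium": -1.0, "large": -2.0}

def value_iteration(gamma=0.9, eps=1e-6):
    """Compute state values, then the greedy (highest overall reward) policy."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (R[s2] + gamma * V[s2]) for s2, p in P[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {
        s: max(actions,
               key=lambda a: sum(p * (R[s2] + gamma * V[s2]) for s2, p in P[s][a]))
        for s in states
    }
    return V, policy
```

Running it, the greedy policy converges on growing a small buffer, keeping a medium one, and shrinking a large one: the immediate cost of a transition is weighed against the discounted value of the state it leads to, which is exactly the "largest immediate profit may not be good" point from the quote above.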
Will they wait?
The issue that ultimately blocks me on this is: how do you support a product with time-critical functions when execution time is left to a non-deterministic process? Even if the solution you ultimately arrive at is better than a static configuration, will they wait long enough?
Actually, I didn't really intend to write about Markov at all. I wanted to ask the question:
Is close enough good enough?
The real topic is about automating semantic interoperability between systems. More on which later.
Another interesting find: A Markov Decision Process Model for Airline Meal Provisioning