Care in applying Analytics for Prediction and for Management Decisions

  • September 19, 2014
  • Ultramax Corporation
  • Feature

By Carlos W. Moreno, Ultramax

Imagine a database of historical variable values organized in sets whose members are highly related to each other. Each set is an “event”, even if the values did not occur at the same time (e.g., the amount of fertilizer used and the production yield). Such data is collected in many endeavors, including ‘big data’.

Imagine that we classify the variables into two sets: predictor and predicted. Various technologies have different names for them.

Problem: given the values of the predictor variables in some specific event, predict the values of the predicted variables. Depending on the accuracy of the predictions (what can be explained and what cannot), this can give significant insights into what is happening, especially when the information is nicely presented.

The division of data into predictor and predicted does not require or imply a cause-and-effect relationship, only that the predictor values are known before the predicted values are. One may wish to predict the value of a ‘cause’ (for example, given the yield, estimate the amount of fertilizer used). Note: the timing implied in the term ‘predict’ relates to our knowledge of reality, not to when the value occurs. In this sense, ‘predict’ is equivalent to estimate, appraise, evaluate, reckon, assess.

Clearly, analytics (of data relationships) has something to contribute when the data is created by a rather constant mechanism. The ‘mechanism’ is often a rather complex web of cause-and-effect relationships, i.e., of transfer functions. The values almost always include variations that we interpret as random (often from unknown effects).

Requirement: “Constancy of Mechanism”. This is an important requirement to which one frequently pays little attention, and the failure of which (a matter of degree) leads to errors.
The analytics of creating a prediction function (a singular noun even for multiple inputs and multiple outputs) based on data is model-fitting: finding the coefficients for the (mathematical) model structures selected. There are several technologies to achieve this. One technology, Ultramax®, which supports several levels of insight and of intelligent decision making, provided the nomenclature used in this review.

So far in this review, the input/output relationship in the prediction function quantifies ‘correlations’ only, or ‘how things happen’, but not necessarily “cause-and-effect” or “causation” effects.

The principle at work is stated in this quote from Ultramax’s Blue Book:

“One can construct prediction models based on data created by a system/mechanism. These models quantify correlation relationships between various data (an alternate way to create prediction models is based on first-principles).

As is well known, correlation by itself does not identify causation. And causation -- i.e. cause-and-effect relationships -- is necessary for managing outcomes, for decision making; and thus for optimization.

Constructing cause-and-effect prediction models based on data requires that the decision (adjusted) inputs be independent variables. Adjusted inputs are independent when their values are determined solely by the decision or selection by the decision maker; in other words, their values are not changed by other mechanisms. This characteristic is easy to corroborate during execution. In addition, for the uncontrolled inputs to maintain their role representing imposed conditions that also affect outcomes, the uncontrolled inputs should be independent of adjusted inputs. This characteristic requires understanding of the process by the engineers.
The independence requirements are obviously much simpler to ascertain in planning and while making system adjustments, than in assessing such a requirement using historical data.”

Thus, the prediction function described above is suitable for an observer who either does not affect how data is created, or who does not use the prediction information to change how he/she affects data creation. More specifically, it is important to realize that the prediction function, so far and by itself, does NOT support this course of action: using the prediction function to find a set of predictor values that predict more desirable outcomes, and then starting a new event in which the predictor variables are made to have those values. A person who can change causes is, of course, not just an observer but a decision maker intervening in how data is created: a manager.

The prediction function could be close to a cause-and-effect relationship if one adds significant first-principles insights. For instance, we know that earth quality affects crop yield, and it is reasonable to expect that modifying the earth to have the desired predictor characteristics will result in better crops. But this approach may not be reliable, or at least not sustainable: we also know that the plants themselves affect the earth in many ways.

Gradient of the prediction function: For any one predicted variable, the gradient (the first partial derivatives at the predictor values, which could be the coefficients themselves, as in linear functions) represents “the quantitative relationship between changes in predictor values and changes in predicted values”. The gradient most frequently varies depending on the values of the predictors, a characteristic of a non-linear reality. The gradient is a vector for each input, with one value for each output. It also represents the most basic human understanding of the impact of variations in predictors on variations in predicted variables.
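As a minimal sketch of the gradient idea (an illustration only, not the method of any particular product), the following Python fragment fits an assumed model structure to synthetic, noise-free data and evaluates the first partial derivatives at two points. The names `fertilizer`, `water` and `crop_yield`, the data and the model structure are all illustrative assumptions.

```python
import numpy as np

def features(f, w):
    """Assumed model structure: constant, two linear terms, f^2 and an f*w interaction."""
    f = np.asarray(f, dtype=float)
    w = np.asarray(w, dtype=float)
    return np.column_stack([np.ones_like(f), f, w, f**2, f * w])

# Synthetic, noise-free events on a 4 x 4 grid (hypothetical predictors).
f_grid, w_grid = np.meshgrid(np.arange(4.0), np.arange(4.0))
fertilizer, water = f_grid.ravel(), w_grid.ravel()
crop_yield = (10 + 4 * fertilizer + 2 * water
              - 0.5 * fertilizer**2 + 0.3 * fertilizer * water)

# Model fitting: find the coefficients of the assumed structure by least squares.
coef, *_ = np.linalg.lstsq(features(fertilizer, water), crop_yield, rcond=None)

def gradient(f0, w0):
    """First partial derivatives of the fitted prediction function at (f0, w0)."""
    _, b_f, b_w, b_ff, b_fw = coef
    return np.array([b_f + 2 * b_ff * f0 + b_fw * w0, b_w + b_fw * f0])

# The gradient depends on where it is evaluated (non-linear model).
for point in [(1.0, 1.0), (3.0, 1.0)]:
    print(point, gradient(*point))
```

With this synthetic data the partial derivative with respect to fertilizer shrinks as the fertilizer value grows, which is the non-linear behavior described above.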
This ‘feeling’ or ‘knowledge’ of gradients leads naturally to considering the possibility of modifying outcomes, that is, of managing, to achieve more desirable results. The next section reviews what else is necessary to achieve this objective.

Correlation of inputs in the database: Predictor values in the database are also likely to have various levels of correlation among themselves. Extreme correlation, or “confounding”, makes finding the true gradient of each individual predictor difficult or even impossible (regardless of the analytics used). If the software supports the syndrome, the prediction function gradients are some balance among the true gradients of the correlated predictors. However, this syndrome does not affect prediction accuracy.

As usual, predictions are limited to where the prediction function is (most) valid: the region spanned by the predictor values plus minor extrapolations, i.e., the region of experience in the database.

So, the prediction function provides insights or predictions for an observer who does not intervene in how data is being generated. The insights may lead to formulating hypotheses that can be tested with future experiments. Additionally, the prediction function could be a useful quantitative representation of a cause-and-effect relationship that supports management action (decision making) if we add further knowledge about the mechanism / process, as explained in the next section.

Prediction from observations, coupled with relevant first-principles knowledge, enables quantifying patterns of behavior. This process is sharpened when running experiments where the decision maker controls (adjusts) certain predictors, as described below.

One step up: management actions, decisions

Clearly, to manage results, to act, the manager / engineer needs to employ cause-and-effect relationships (transformations) that he/she can initiate, maintain and depend on to get desired results. This is “to make things happen”.
Framework for learning how to make safe and higher impact decisions from the database (more detailed explanations follow):

A.    Setup: The variables in the database of events are classified into three groups:

  • Decision inputs (predictor variables) which, as part of the mechanism that creates the data, were changed or adjusted by a manager, a decision maker.  These are actionable steps, decision points (as further defined below in section B).
  • Uncontrolled inputs (predictor variables, if any) whose changes in value are closely related to changes in factors that (may) affect results in important ways, and whose values are determined elsewhere – the manager cannot readily change them.  That is, they represent the conditions under which one has to work.  
  • Other uncontrolled inputs may be unknown, or just without data.  Ideally, they only create minor variations on the results, which means that one can ignore them.
  • Outputs / outcomes / results / effects (predicted variables) from the input values in the cause-and-effect relationship, the response of the mechanism.

The prediction function will predict output values depending on the values of the inputs.

B.    Requirement: Independence.  The prediction function is valid for new decisions when:

  • Nothing alters the decision input values other than the decision maker; they are independent of all other variables.  This assures that it is an action step, a decision point.  Example: a setpoint value for the Regulatory Control System to implement.
  • The uncontrolled input values (known or not) are indeed imposed externally and are not affected by the adjustment of decision inputs: the uncontrolled inputs are independent of the decision inputs.  This assures that the prediction function is valid for alternate decision values.  Example: characteristics of raw materials.  

Note, as in Prediction only, interdependency among uncontrolled inputs is not a barrier (as long as the software supports the syndrome).    
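The claim that correlated inputs leave prediction accuracy intact while blurring individual gradients can be illustrated numerically. The sketch below is an assumption-laden toy: two synthetic inputs `x1` and `x2` that are almost copies of each other, a linear model fitted by ordinary least squares, and "true" gradients of 1 and 3 chosen arbitrarily. None of it comes from a real database.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; all data below is synthetic
n = 200
x1 = rng.uniform(0.0, 10.0, n)
x2 = x1 + rng.normal(0.0, 0.01, n)                 # nearly a copy of x1: confounded
y = 1.0 * x1 + 3.0 * x2 + rng.normal(0.0, 0.1, n)  # assumed true gradients: 1 and 3

X = np.column_stack([np.ones(n), x1, x2])

def fit(rows):
    """Least-squares coefficients (intercept, x1, x2) using only the given events."""
    coef, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
    return coef

# Fit the same model structure on two disjoint halves of the database.
fits = [fit(np.arange(0, n, 2)), fit(np.arange(1, n, 2))]

# Prediction error of each fit over the whole database.
rmses = [float(np.sqrt(np.mean((X @ c - y) ** 2))) for c in fits]

for c, rmse in zip(fits, rmses):
    print(f"b1={c[1]:.2f}  b2={c[2]:.2f}  b1+b2={c[1] + c[2]:.3f}  RMSE={rmse:.3f}")
```

In a run like this the individual coefficients `b1` and `b2` would typically differ noticeably between the two fits, while their sum and the prediction error stay stable: the fitted gradients are some balance among the true ones, and predictions inside the region of experience are barely affected.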

C.    Requirement: Constancy of Mechanism.  

Requirements A and B are assessed by the user; the assessment cannot be done from the database alone.  Violation of requirement C could conceivably be detected if the events in the database are in time order, or if a time-stamp is included.  Macro-knowledge of what did happen may be more relevant (e.g., an instance of major maintenance or process redesign).

Requirement B assures that the adjusted inputs are not forcefully, physically mixed with other inputs, that their effects are separate (independent), and thus that the prediction function does not depend on any such inherent ‘mixing’.  Then, when the manager intervenes by changing a decision input value, the prediction function remains representative, valid and reliable.

Gradients of the prediction function hopefully match the user’s experience and knowledge of first principles, which is what engineers and scientists know best.  If there is a disparity, it is worth re-assessing the situation and the analysis.
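The remark that a violation of Constancy of Mechanism could conceivably be detected from time-ordered events can be sketched as follows. This is a crude heuristic under stated assumptions, not an established test: a straight-line model is fitted on the earlier events, and the later events are flagged if their average residual is large compared with the scatter of the earlier residuals. The function name `mechanism_shift`, the threshold of 3 and the synthetic data are all hypothetical.

```python
import numpy as np

def mechanism_shift(x, y, n_early, threshold=3.0):
    """Fit a straight line on the first n_early events, then check later residuals.

    Returns (mean_late_residual, early_residual_std, shift_flag). The rule
    |mean late residual| > threshold * early residual std is only a heuristic.
    """
    slope, intercept = np.polyfit(x[:n_early], y[:n_early], deg=1)
    residuals = y - (intercept + slope * x)
    early_std = residuals[:n_early].std(ddof=2)  # two fitted coefficients
    late_mean = residuals[n_early:].mean()
    return late_mean, early_std, bool(abs(late_mean) > threshold * early_std)

# Synthetic, time-ordered events (fixed seed).
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 5.0, 100)
steady = 2.0 + 3.0 * x + rng.normal(0.0, 0.1, 100)
shifted = steady.copy()
shifted[60:] += 2.0  # hypothetical change of mechanism after event 60

print("steady: ", mechanism_shift(x, steady, 60))
print("shifted:", mechanism_shift(x, shifted, 60))
```

Such a check only catches changes large enough to stand out from the noise; as the text notes, knowledge of what actually happened to the process is usually the better guide.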

Physical independence enables, but does not assure, that an adjusted input is not numerically confounded.  Various adjusted inputs could be highly correlated in the database, even though independent, if that is how the mechanism / process was managed in the past (i.e., it was a conceptual relationship, not a physical one).  As for prediction only, highly correlated inputs do not affect prediction accuracy, but in the prediction function individual gradients will be confused with the gradients of other inputs.  This syndrome, of course, applies to all analytical methods, even those using Bayesian techniques.

These three requirements apply in properly run Design of Experiments (DOE), Sequential Empirical Optimization (SEO) and Neural Networks (NN) (in order of invention).  To understand gradients better, and further to learn about curvatures in the response, high correlation between decision inputs is avoided when creating the databases for the analysis.

If the three requirements are fulfilled, optimizing a decision based on the prediction function means finding the set of input adjustment values (decisions) such that the predicted outcomes are best according to the user’s criterion.

If uncontrolled inputs are included in the analysis, the optimum would be ‘dynamic’, depending on the uncontrolled input values.  The effects of uncontrolled inputs not included in the analysis will contribute to noise (unexplained variations in outputs), and with respect to them the advice will be ‘robust’.  Some solutions find and implement those adjustment values automatically, such as in optimizing supervisory control.

If no uncontrolled inputs are included (even those known), the optimum would be ‘static’: the best fixed advice for the distribution of the ignored uncontrolled inputs when generating the database (hopefully representing the future distribution according to the Constancy of Mechanism requirement).
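The difference between a ‘dynamic’ and a ‘static’ optimum can be shown with a small sketch. It assumes one decision input `x`, one uncontrolled input `u`, a quadratic model structure, a single output to be maximized, and a synthetic mechanism in which the best `x` happens to equal the current `u`. All of these are illustrative assumptions.

```python
import numpy as np

# Synthetic database on a balanced grid: decision input x, uncontrolled input u.
x_grid, u_grid = np.meshgrid(np.arange(5.0), np.array([1.0, 2.0, 3.0]))
x, u = x_grid.ravel(), u_grid.ravel()
y = 20.0 - (x - u) ** 2  # assumed mechanism: the best x equals the current u

# Prediction function that includes the uncontrolled input (full quadratic).
full = np.column_stack([np.ones_like(x), x, u, x**2, x * u, u**2])
b, *_ = np.linalg.lstsq(full, y, rcond=None)

def dynamic_optimum(u_now):
    """Decision maximizing the predicted output for a known u (requires b[3] < 0)."""
    return -(b[1] + b[4] * u_now) / (2.0 * b[3])

# Prediction function that ignores the uncontrolled input.
reduced = np.column_stack([np.ones_like(x), x, x**2])
c, *_ = np.linalg.lstsq(reduced, y, rcond=None)
static_optimum = -c[1] / (2.0 * c[2])  # one fixed piece of advice

for u_now in (1.0, 3.0):
    print(f"u={u_now}: dynamic optimum x={dynamic_optimum(u_now):.2f}")
print(f"static optimum x={static_optimum:.2f}")
```

Here the dynamic advice follows `u`, while the static advice is a single value suited to the distribution of `u` seen in the database, with the ignored effect of `u` showing up as unexplained variation in the output.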
Note: Some adjusted inputs can be optimized even though they are modified by a mechanism.  One case is the adjustments made by existing supervisory control logic responding to conditions (to uncontrolled inputs).  Then optimization is achieved by adding a ‘bias’ to the adjustments recommended by the control logic, this bias being the decision point.

Assessing the fulfillment of these requirements for managing and decision making requires process / mechanism knowledge.  This is the basic message conveyed by this review.  In addition, analytical results should be consistent with the manager’s understanding; otherwise something is failing: the analytics, the understanding, or the data.  Of course, cherry-picking data to match understanding leads to erroneous or unsupported conclusions.

A second step up: Decisions on uncontrolled inputs

Let us assume that a set of ‘uncontrolled inputs’ at one level of management can be determined on purpose by a higher (or earlier, or different) management group; that is, for that group the set of uncontrolled inputs are actionable steps, decision points.  Let us call that set “higher-level decisions”.  The uncontrolled inputs that remain uncontrolled are simply called the ‘other uncontrolled inputs’.

Requirements for learning to make safe and higher impact higher-level decisions from the database (applying the details given in the previous section):

A.    Setup: In the definition of variables for the lower level management, the set of uncontrolled inputs that become higher-level decision points – an actionable step – are changed to adjusted inputs.

B.    Requirement: The prediction function for alternate decisions is valid when:

  • Each new decision point is physically independent of all other variables.  This assures that it is an action step, a decision point.  
  • The (other) uncontrolled inputs are independent of the decisions.  This assures that the prediction function is valid for alternate decision values.

C.    Requirement: Constancy of Mechanism.  

These three requirements may be difficult for the lower-level management group to assess.

If the three requirements are met, the following is valid in the region of decision values covered by the data in the database (the region of experience, where the prediction model is most accurate):

  • IF the decision input values do not have very high correlation, the gradient indicates the effect of changing decision values on output values (at least for a small change in the decisions when the prediction function is non-linear). 
  • The prediction function remains valid for tentative decision values (even if decision inputs are highly correlated, even if spanning only a sub-space).  

This enables finding a reliable optimum estimate in that region.  This optimum includes also the optimal decision values for the lower-level management (which are a consequence of the higher-level decision values).  Note: The true optimum may be outside that region, but without reliable estimates.
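Restricting the search to the region of experience, and noticing when the best estimate sits on its boundary, can be sketched as below. The sketch assumes a single decision input, a quadratic model structure, an output to be maximized, and a simple grid search; the function name `optimum_in_region` and the data are hypothetical.

```python
import numpy as np

def optimum_in_region(x, y, n_candidates=301):
    """Best predicted decision within [min(x), max(x)] for a fitted quadratic.

    Returns (best_x, at_edge). at_edge=True hints that the true optimum may
    lie outside the region of experience.
    """
    predict = np.poly1d(np.polyfit(x, y, deg=2))
    candidates = np.linspace(x.min(), x.max(), n_candidates)
    best_x = candidates[np.argmax(predict(candidates))]
    at_edge = bool(np.isclose(best_x, x.min()) or np.isclose(best_x, x.max()))
    return float(best_x), at_edge

# Synthetic decision values covering the region of experience [0, 3].
x = np.linspace(0.0, 3.0, 13)
print(optimum_in_region(x, 10.0 - (x - 1.5) ** 2))  # true optimum inside the region
print(optimum_in_region(x, 10.0 - (x - 5.0) ** 2))  # true optimum outside the region
```

In the second case the search can only report the boundary of the data, which is the signal that more experience in that direction is needed before the estimate can be trusted further out.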

An optimal decision would be:

  • ‘Dynamic’, depending on the known values of the remaining uncontrolled inputs (if any); and
  • robust ‘Static’ for the unknown values of the remaining uncontrolled inputs (if any)

This optimum would be a reasonable set of decisions to make.  If the optimum appears at the edge of the region spanned by the input data (the region of experience), it is an indication that the true optimum may be outside of the region.  Further, if this decision is implemented and the database is enlarged with the new data, then the region of adjustment values may become larger and closer to the true optimum; this is a sub-step of an SEO cycle.

Sometimes the ‘actionable step’ is not the adjustment of decision inputs, but the selection among different available ‘packs’.  An example is the Purchasing Department selecting ore from various mines.  Each pack has its own set of key factors, characteristics that affect outcomes through the cause-and-effect transfer function.

In the case of “selection” the requirements are a little different:

  • The engineer should know, measure and include ALL important key characteristics in the database, or factors closely correlated with them.  If there are unknown key characteristics which are not always physically correlated to the known ones, then one will miss taking advantage of that bit of knowledge, and that will add to “noise” in the outputs.
  • The key factors (causes) should be independent from the remaining uncontrolled inputs (known and unknown).  
  • In turn, the remaining uncontrolled inputs are independent of the key factors.  

Then the prediction function remains valid even when intervening by selecting new packs not yet experienced.  Also, the cause-and-effect gradient shows the partial effect of each key factor, barring high data correlation.

The decision-making analysis consists of predicting outcomes for each pack, using its key factors and the other uncontrolled inputs (if any) as predictors, and judging the packs by the merits of the predicted values of the process.
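The selection analysis can be sketched in a few lines. The example assumes two key factors of the ore (`grade` and `moisture`), a linear model structure, a single output to be maximized, no other uncontrolled inputs, and three hypothetical packs named `mine_A`, `mine_B` and `mine_C`. None of these names or numbers come from real data.

```python
import numpy as np

# Synthetic database of past events: two key factors and one output.
g_grid, m_grid = np.meshgrid(np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 2.0]))
grade, moisture = g_grid.ravel(), m_grid.ravel()
output = 5.0 + 2.0 * grade - 1.0 * moisture  # assumed mechanism, noise-free

# Fit an assumed linear prediction function on the key factors.
X = np.column_stack([np.ones_like(grade), grade, moisture])
coef, *_ = np.linalg.lstsq(X, output, rcond=None)

# Hypothetical packs on offer, each described by (grade, moisture),
# all inside the region of experience of the database.
packs = {"mine_A": (1.5, 0.5), "mine_B": (2.5, 1.0), "mine_C": (2.0, 2.0)}

# Predict the outcome for each pack and pick the best according to the criterion.
predicted = {name: float(coef @ np.array([1.0, g, m])) for name, (g, m) in packs.items()}
best_pack = max(predicted, key=predicted.get)

for name, value in predicted.items():
    print(f"{name}: predicted output {value:.2f}")
print("selected:", best_pack)
```

With several outputs, the single `max` would be replaced by whatever criterion the user applies to the predicted values.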
