Product Design Engineering: Spreadsheet Modeling and Decision Analysis

Ref: Spreadsheet Modeling & Decision Analysis, by Cliff T.Ragsdale

Modeling and Solving LP problems in a Spreadsheet
Effective Spreadsheet Design Guidelines:

Organize the data, and then build the model around the data: After the data is arranged in a visually appealing manner, logical locations for decisions variables, constraints, and the objective function tend to naturally suggest themselves. This also tends to enhance the reliability, auditability, and maintainability of the model
Do not embed numeric constants in formulas: Numeric constants should be placed in individual cells and labeled appropriately. This enhances the reliability and modifiability of the model
Things which are logically related (for example, LHS and RHS of constraints) should be arranged in close physical proximity to one another and in the same columnar or row orientation: This enhances reliability and auditability of the model
A design that results in formulas that can be copied is probably better than one that does not: A model with formulas that can be copied to complete a series of calculations in a range is less prone to error (more reliable) and tends to be more understandable (auditable). Once a user understands the first formula in a range, he/she understands all the formulas in a range
Column or row totals should be in close proximity to the columns or rows being totaled: Spreadsheet users often expect numbers at the end of a column or row to represent a total or some other summary measure involving the data in the column or row. Numbers at the ends of columns or rows that do not represent totals can easily be misinterpreted (reducing auditability)
The English-reading human eye scans left to right, top to bottom: This fact should be considered and reflected in the spreadsheet design to enhance the auditability of the model
Use color, shading, borders, and protection to distinguish changeable parameters from other elements of the model: This enhances the reliability and modifiability of the model
Use text boxes and cell comments to document various elements of the model: These devices can be used to provide greater detail about a model or particular cells in a model than labels on a spreadsheet may allow

Make vs. Buy Decisions:

LP is particularly well-suited to problems where scarce or limited resources must be allocated or used in an optimal manner. Numerous examples of these types of problems occur in manufacturing organizations.

For example, LP might be used to determine how the various components of a job should be assigned to multipurpose machines in order to minimize the time it takes to complete the job. As another example, a company might receive an order for several items that it cannot fill entirely with its own production capacity. In such a case, the company must determine which items to produce and which items to subcontract (or buy) from an outside supplier.

An Investment Problem:
There are numerous problems in the area of finance to which various optimization techniques can be applied. These problems often involve attempting to maximize the return on an investment while meeting certain cash flow requirements and risk constraints. Alternatively, we may want to minimize the risk on an investment while maintaining a certain level of return.

Transportation problem:
Many transportation and logistics problems businesses face fall into a category of problems known as network flow problems.

A blending problem:
Many business problems involve determining an optimal mix of ingredients. For example, major oil companies must determine the least costly mix of different crude oils and other chemicals to blend together to produce a certain grade of gasoline. Lawn care companies must determine the least costly mix of chemicals and other products to blend together to produce different types of fertilizer.

A production and inventory planning problem:
One of the most fundamental problems facing manufacturing companies is that of planning their production and inventory levels. This process considers demand forecasts and resource constraints for the next several time periods and determines production and inventory levels for each of these time periods so as to meet the anticipated demand in the most economical way.

A multiperiod cash flow problem:
Numerous business problems involve decisions that have a ripple effect on future decisions. Many financial decisions involve multiple time periods because the amount of money invested or spent at one point in time directly affects the amount of money available in subsequent time periods. In these types of multiperiod problems, it can be difficult to account for the consequence of a current decision on future time periods without and LP model.

Data Envelopment Analysis:
Managers are often interested in determining how efficiently various units within a company operate. Similarly, investment analysts may be interested in comparing the efficiency of several competing companies within an industry. Data Envelopment Analysis (DEA) is an LP-based methodology for performing this type of analysis. DEA determines how efficiently an operating unit (or company) converts inputs to outputs when compared with other units.
=====================================================

Sensitivity Analysis and the Simplex Method:
Formulating and solving an LP model does not necessarily mean that the original decision problem has been solved. After solving a model, a number of questions often arise about the optimal solution to the LP model. In particular, we might be interested in how sensitive the optimal solution is to changes in various coefficients of the LP model.

Businesses rarely know with certainty what costs will be incurred or the exact amount of resources that will be consumed or available in a given situation or time period. Thus, optimal solutions obtained using models that assume all relevant factors are known with certainty might be viewed with skepticism by management. Sensitivity analysis can help overcome this skepticism and provide a better picture of how the solution to a problem will change if different factors in the model change. Sensitivity analysis also can help answer a number of practical managerial questions that might arise about the solution to an LP problem.

All the coefficients in LP model represent numeric constants. So, when we formulate and solve an LP problem, we implicitly assume that we can specify the exact values for these coefficients. However, in the real world, these coefficients might change from day to day or minute to minute. For example, the price a company chargers for its products can change on a daily, weekly, or monthly basis. Similarly, if a skilled machinist calls in sick, a manufacturer might have less capacity to produce items on a given machine than was originally planned.

Realizing that such uncertainties exist, a manager should consider how sensitive an LP model's solution is to changes or estimation errors that might occur in:

The objective function coefficient
The constraint coefficients
The RHS values for the constraints

A manager also might ask a number of "What if?" questions about these values. For example, what if the cost of a product increases by 7%? What if a reduction in setup time allows for additional capacity on a given machine? What if a worker's suggestion results in a product requiring only two hours of labor rather than three.

Sensitivity analysis addresses these issues by assessing the sensitivity of the solution to uncertainty or estimation errors in the model coefficients, as well as the solution's sensitivity to changes in model coefficients that might occur because of human intervention.

The Answer Report:

This report summarizes the solution to the problem, and is fairly self-explanatory. The first section of the report summarizes the original and final (optimal) values of the set cell. The next section summarizes the original and final (optimal) values of the adjustable (or changing) cells representing the decision variables.

The final section of this report provides information about the constraints. In particular, the Cell Value column shows the final (optimal) value assumed by each constraint cell. Note that these values correspond to the final value assumed by the LHS formula of each constraint. The formula column indicates the upper or lower bounds that apply to each constraint cell. The status column indicates which constraints are binding and which are nonbinding. A constraint is binding if it is satisfied as a strict equality in the optimal solution; otherwise, it is nonbinding.

Finally, the values in the Slack column indicate the differences between the LHS and RHS of each constraint. By definition, binding constraints have zero slack and nonbinding constraints have some positive level of slack.

The slack values for the non-negativity conditions indicate the amounts by which the decision variables exceed their respective lowers bounds of zero.

The Answer Report does not provide any information that could not be derived from the solution in the spreadsheet model. However, the format of this report gives a convenient summary of the solution that can be incorporated easily into a word-processing documents as part of a written report to management.

The Sensitivity Report:

This report summarizes information about the variable cells and constraints for our model. This information is useful in evaluating how sensitive the optimal solution is to changes in various coefficients in the model.

The original objective function coefficients associated with the variable cells are listed in the Objective Coefficient column. The next two columns show the allowable increases and decreases in these values.

The shadow price for a constraint indicates the amount by which the objective function value changes given a unit increase in the RHS value of the constraint, assuming all other coefficients remain constant. If a shadow price is positive, a unit increases in the RHS value of the associated constraint results in an increase is the optimal objective function value. If a shadow price is negative, a unit increase in the RHS value of the associated constraint results in a decrease in the optimal objective function value.

Key Points:
Our discussion of Solver's sensitivity report highlights some key points concerning shadow prices and their relationship to reduced costs. These key points are summarized as:

The shadow prices of resources equate the marginal value of the resources consumed with the marginal benefit of the goods being produced
Resources in excess supply have a shadow price (or marginal value) of zero
The reduced cost of a product is the difference between its marginal profit and the marginal value of the resources it consumes
Products whose marginal profits are less than the marginal value of the goods required for their production will not be produced in an optimal solution

A warning about degeneracy:
The solution to an LP problem sometimes exhibits a mathematical anomaly known as degeneracy. The solution to an LP problem is degenerate if the RHS values of any of the constraints have an allowable increase or allowable decrease of zero. The presence of degeneracy impacts our interpretation of the values on the sensitivity report.
So before interpreting the results on a sensitivity report, you should always first check to see if the solution is degenerate because this has important ramifications on how the numbers on the report should be interpreted.

The Limits Report:
This report lists the optimal value of the set cell. It then summarizes the optimal values for each variable cell and indicates what values the set cell assumes if each variable cell is set to its upper or lower limits. The values in the Lower Limits column indicate the smallest value each variable cell can assume while the values of all other variable cells remain constant and all the constraints are satisfied. The values in the Upper Limits column indicate the largest value each variable cell can assume while the values of all other variable cells remain constant and all the constraints are satisfied.

The sensitivity assistant add-in (optional):
Excel add-in called Sensitivity Assistant that provides two additional tools for performing sensitivity analysis on an ad hoc basis: Spider Tables and Solver Tables.
A Spider Table summarizes the optimal value for one output cell as individual changes are made to various input cells. A Solver Table summarizes the optimal value of multiple cells as changes are made to a single input cell.

The simplex method considers only the extreme points of the feasible region, and is an efficient way of solving LP problems. In this method, slack variables are first introduced to convert all constraints to equal to constraints. The simplex method systematically moves to better and better corner point solutions until no adjacent extreme point provides an improved objective function value.
===================================================

Network Modeling:
A number of practical decision problems in business fall into a category known as network flow problem. These problems share a common characteristic - they can be described or displayed in a graphical form known as a network. There are several types of network flow problems:

Transshipment problems
Shortest path problems
Maximal flow problems
Transportation/assignment problems
Generalized network flow problems

These problems can be considered and solve as LP problems. There is a different type of network problem known as the minimum spanning tree problem.

The transshipment problem:
Let's begin our study of network flow problems by considering the transshipment problem. Most of the other types of network flow problems all can be viewed as simple variations of the transshipment problem. So, once you understand how to formulate and solve transshipment problems, the other types of problems will be easy to solve.

Characteristics of Network Flow Problems:
All network flow problems can be represented as a collection of nodes connected by arcs. The circles are called nodes in the terminology of network flow problems, and the lines connecting the nodes are called arcs. The arcs in a network indicate the valid paths, routes, or connections between the nodes in a network flow problem. When the lines connecting the nodes in a network are arrows that indicate a direction, the arcs in the network are called directed arcs.
The notion of supply nodes (or sending nodes) and demand nodes (or receiving nodes) is another common element of network flow problems.
Transshipment nodes can both send to and receive from other nodes in the network. The net supply or demand for each node in the network is indicated by a positive or negative number next to each node. Positive numbers represent the demand at a given node, and negative numbers represent the supply available at a node.
The objective in most network flow problems is to minimize the total cost, distance, or penalty that must be incurred to solve the problem. Such problems are known as minimum cost network flow problems.
Just as the number of arcs in the network determines the number of variables in the LP formulation of a network flow problem, the number of nodes determines the number of constraints. In particular, there must be one constraint for each node. A simple set of rules, known as the Balance-of-Flow Rules, applies to constructing the constraints for minimum cost network flow problems. These rules are summarized as follows:

Total Supply > Total Demand            Inflow - Outflow >= Supply or Demand
Total Supply < Total Demand            Inflow - Outflow <= Supply or Demand
Total Supply = Total Demand            Inflow - Outflow = Supply or Demand

The Shortest Path Problem:
In many decision problems, we need to determine the shortest (or least costly) route or path through a network from a starting node to an ending node. For example, many cities are developing computerized models of their highways and streets to help emergency vehicles identify the quickest route to a given location.
Each street intersection represents a potential node in a network, and the streets connecting the intersections represent arcs.

The Equipment Replacement Problem:
The equipment replacement problem is a common type of business problem that can be modeled as a shortest path problem. This type of problem involves determining the least costly schedule for replacing equipment over a specified length of time.

Summary of Shortest Path Problems:

You can model any shortest path problem as a transshipment problem by assigning a supply of 1 to the starting node, a demand of 1 to the ending node, and a demand of 0 to all other nodes in the network. Because in the examples where involved only a small number of paths through each of networks, it might have been easier to solve these problems simply by enumerating the paths and calculating the total distance of each one. However, in a problem with many nodes and arcs, an automated LP model is preferable to a manual solution approach.

Transportation/assignment Problems:
This type of network differs from the earlier network flow problems because it contains no transshipment nodes. Each node is either a sending node or a receiving node. The lack of transshipment nodes is the key feature that distinguishes transportation/assignment problems from other types of network flow problems.

Generalized Network Flow Problems:
There are numerous examples of network flow problems in which a gain or loss occurs on flows across arcs. For instance, if oil or gas is shipped through a leaky pipeline, the amount of oil or gas arriving at the intended destination will be less than the amount originally placed in the pipeline. Similar loss-of-flow examples occur as a result of evaporation of liquid, spoilage of foods and other perishable items, or imperfections in raw materials entering production processes that result in a certain amount of scrap.
Many financial cash flow problems can be modeled as network flow problems in which flow gains occur in the form of interest or dividends as money flows through various investments.

Generalized Network Flow Problems and Feasibility:
In generalized network flow problems, the gains and/or losses associated with flows across each arc effectively increase and/or decrease the supply available in the network. So we are not able to satisfy all of the demand due to the loss of material that occurs in the production process.
The point being made here is that with generalized network flow problems, you cannot always tell before solving the problem if the total supply is adequate to meet the total demand.
As a result, you cannot always know which balance-of-flow rule to apply. When the issue is unclear, it is safest to first assume that all the demand can be met and (according to the balance-of-flow rule) use constraints of the form: (Inflow - Outflow >= Supply or Demand). If the resulting problem is infeasible ( and there are no errors in the model), then we know all the demand cannot be satisfied and (according to the balance-of-flow rule) use constraints of the form: (Inflow - Outflow <= Supply or Demand). In this later case, the solution will identify the least costly way of using the available supply to meet as much of the demand as possible.

Important Modeling Point:
For generalized network flow problems, the gains and/or losses associated with flows across each arc effectively increase and/or decrease the supply available in the network. As a result, it is sometimes difficult to tell in advance whether the total supply is actually adequate to meet the total demand in a generalized network flow problem. When in doubt, it is best to assume the total supply is capable of satisfying the total demand and use Solver to prove (or refute) this assumptions.

Maximal Flow Problems:
The maximal flow problem (or max flow problem) is a type of network flow problem in which the goal is to determine the maximum amount of flow that can occur in the network. In a maximal flow problem, the amount of flow that can occur over each arc is limited by some capacity restriction. This type of network might be used to model the flow of oil in a pipeline (in which the amount of oil that can flow through a pipe in a unit of time is limited by the diameter of the pipe). Traffic engineers also use this type of network to determine the maximum number of cars that can travel through a collection of streets with different capacities imposed by the number of lanes in the streets and speed limits.
The max flow problem appears to be very different from the network flow models described earlier because it does not include specific supplies or demands for the nodes.

Special Modeling Considerations:
A number of special conditions can arise in network flow problems that require a bit of creativity to model accurately. For example, it is easy to impose minimum or maximum flow restrictions on individual arcs in the networks by placing appropriate lower and upper bounds on the corresponding decision variables. However, in some network flow problems, minimum or maximum flow requirements may apply to the total flow emanating from a given node. Unfortunately, these constraints do not conform to the balance-of-flow rule and would require us to impose side constraints on the model.
The additional nodes and arcs added are sometimes referred to as dummy nodes and dummy arcs. But using a dummy node in this manner allows us to model and solve the problem accurately. Dummy nodes and arcs can be helpful in modeling a variety of situations that naturally occur in network problems.

Minimal Spanning Tree Problems:
Another type of network problem is known as the minimal spanning tree problem. This type of problem cannot be solved as an LP problem, but is solved easily using a simple manual algorithm.
For a network with n nodes, a spanning tree is a set of (n-1) arcs that connects all the nodes and contains no loops. A minimum spanning tree problem involves determining the set of arcs that connects all the nodes in a network while minimizing the total length (or cost) of the selected arcs.

An algorithm for the Minimal Spanning Tree Problem:
You can apply a simple algorithm to solve minimal spanning tree problems. The steps to this algorithm are:

Select any node. Call this the current subnetwork
Add to the current sub-network the cheapest arc that connects any node within the current sub-network to any node not in the current sub-network. (Ties for the cheapest arc can be broken arbitrarily.) Call this the current sub-network
If all the nodes are in the sub-network, stop; this is the optimal solution. Otherwise, return to step 2

====================================================

Integer Linear Programming:
When some or all of the decision variables in an LP problem are restricted to assuming only integer values, the resulting problem is referred to as an integer linear programming (ILP) problem. Many practical problems need integer solutions. For example, when scheduling workers, a company needs to determine the optimal number of employees to assign to each shift. If we formulate this problem as an LP problem, its optimal solution could involve allocating fractional numbers of workers (for example, 7.33 workers) to different shifts; but this is not an integer feasible solution. Similarly, if an airline is trying to decide how many 767s, 757s, and A-300s to purchase for its fleet, it must obtain an integer solution because the airline cannot buy fractions of planes.

An integrality condition indicates that some (or all) of the variables in the formulation must assume only integer values. We refer to such variables as the integer variables in a problem. In contrast, variables that are not required to assume strictly integer values are referred to as continuous variables.

Relaxation:

One approach to finding the optimal integer solution to a problem is to relax, or ignore, the integrality conditions and solve the problem as if it were a standard LP problem where all the variables are assumed to be continuous. This model is sometimes referred to as the LP relaxation of the original ILP problem.

The feasible region of the LP relaxation of an ILP problem always encompasses all the feasible solutions to the original ILP problem. Although the relaxed feasible region might include additional non-integer solutions, it will not include any integer solutions that are not feasible solutions to the original ILP.

As a general rule, the optimal solution to the LP relaxation of an ILP problem is not guaranteed to produce an integer solution. In such cases, other techniques must be applied to find the optimal integer solution for the problem being solved. (There are some exceptions to this rule. In particular, the network flow problems often can be viewed as ILP problems. The LP relaxation of network flow problems will always have integer solutions if the supplies and/or demands at each node are integers and the problem is solved using the simplex method.

Bounds:

Before discussing how to solve ILP problems, an important point must be made about the relationship between the optimal solution to an ILP problem and the optimal solution to its LP relaxation: The objective function value for the optimal solution to the ILP problem can never be better than the objective function value for the optimal solution to its LP relaxation.

Key Concept:

For maximization problems, the objective function value at the optimal solution to the LP relaxation represents an upper bound on the optimal objective function value of the original ILP problem. For minimization problems, the objective function value at the optimal solution to the LP relaxation represents a lower bound on the optimal objective function value of the original ILP problem.

Rounding:

The solution to the LP relaxation of an ILP problem might satisfy the ILP problem's integrality conditions and, therefore, represent the optimal integer solution to the problem. But what should we do if this is not the case (as usually happen)? One technique that frequently is applied involves rounding the relaxed LP solution.

When the solution to the LP relaxation of an ILP problem does not result in an integer solution, it is tempting to think that simply rounding this solution will generate the optimal integer solution. Unfortunately, this is not the case.

Because rounding up does not always work, perhaps we should round down, or truncate, the values for the decision variables identified in the LP relaxation.

Another problem with rounding down is that even if it results in a feasible integer solution to the problem, there is no guarantee that it is the optimal integer solution.

Simply rounding the solution to the LP relaxation of an ILP problem is not guaranteed to provide the optimal integer solution. Therefore, we need another way to find the optimal integer solution o an ILP problem. Various procedures have been developed for this purpose. The most effective and widely used of these procedures is the branch-and-bound (B&B) algorithm. The B&B algorithm theoretically allows us to to solve any ILP problem by solving a series of LP problems called candidate problems.

Stopping Rules:

Finding the optimal solution for simple ILP problems can sometimes require the evaluation of hundreds of candidate problems. More complex problems can require the evaluation of thousands of candidate problems, which can be a very time-consuming task even for the fastest computers. For this reason, many ILP packages allow you to specify a suboptimally tolerance of X% (where X is some numeric value), which tells the B&B algorithm to stop when it finds an integer solution that is no more than X% worse than the optimal integer solution. This is another area where obtaining upper or lower bounds on the optimal integer solution can be helpful.
Specifying suboptimality tolerances can be helpful if you are willing to settle for a good but suboptimal solution to a difficult ILP problem. However, most B&B packages employ some sort of default suboptimality tolerance and, therefore, might produce a suboptimal solution to the ILP problem without indicating that a better solution might exist.

Solving ILP problems using solver:
None of the parameters indicate that the cells representing the decision variables must assume integer values. To communicate this to Solver, we need to add constraints to the problem by clicking the Add button.
Because we want these cells to assume only integer values, we need to select the int option from the drop-down menu.
By default, Solver uses a suboptimality tolerance factor of 5%. To ensure that Solver finds the best possible solution to an ILP problem, we must change its suboptimality tolerance factor by clicking the Options button in the Solver parameters dialog box and then click the Integer Options button in the Solver Options dialog box. The tolerance option represents Solver's suboptimality tolerance value. To make sure Solver finds the best possible solution to an ILP problem, we must change this setting from its default value of 0.05 to 0.

Other ILP problems:
The ability to constrain certain variables to assume only integer values enables us to model a number of important conditions more accurately. For example, we have not considered the impact of quantity discounts, setup or lump-sum costs, or batch size restrictions on a given decision problem. Without ILP techniques, we could not model these decision issues.

An Employee scheduling problem:
Anyone responsible for creating work schedules for a number of employees can appreciate the difficulties in this task. It can be very difficult to develop a feasible schedule, much less an optimal schedule. Trying to ensure that a sufficient number of workers is available when needed is a complicated task when you must consider multiple shifts, rest breaks, and lunch or dinner breaks. However, some sophisticated LP models have been devised to solve these problems.

Binary Variables:
Some LP problems naturally evolve into ILP problems when we realize that we need to obtain integer solutions. To impose integrality conditions we change the continuous variables in the model into general integer variables, or variables that could assume any integer value (provided that the constraints of the problem are not violated). In many other situations, we might want to use binary integer variables (or binary variables), which can assume only two integer values: 0 and 1. Binary variables can be useful in a number of practical modeling situations, as following cases.

A capital budgeting problem:
In a capital budgeting problem, a decision maker is presented with several potential projects or investment alternatives and must determine which projects or investments to choose. The projects or investments typically require different amounts of various resources (for example, money, equipment, personnel) and generate different cash flows to the company. The cash flows for each project or investment are converted to a net present value (NPV). The problem is to determine which set of projects or investments to select in order to achieve the maximum possible NPV.

Setting up the Binary Variables:
In our formulation of this problem, we assume that each decision variable is a binary variable. We must include this assumption in the formal statement of our model by adding the constraints: (All Xi must be binary).

Important Software Note:

Solver provides a feature that allows you to indicate that certain variable cells represent binary variables. When you are adding constraints, the drop-down list box used to indicate the type of constraint ("<=", ">=", "=", "int") includes another option labeled "bin" for binary.

Binary Variables and Logical conditions:
Binary variables can be used to model a number of logical conditions that might apply in a variety of problems. For example, in company X, several of the projects under consideration (for example, projects 1, 3, and 6) might represent alternative approaches for producing a certain part for a product. The company might want to limit the solution to include no more than one of these three alternatives. The following type of constraint accomplishes this restriction: (X1 + X3 + X6 <= 1)

Because X1, X3, and X6 represent binary variables, no more than one of them can assume the value 1 and still satisfy the previous constraint. If we want to ensure that the solution includes exactly one of these alternatives, we could include the following constraint in our model: (X1 + X3 + X6 = 1)

As an example of another type of logical condition, suppose that project 4 involves a cellular communications technology that will not be available to the company unless it undertakes project5. In other words, the company cannot select project4 unless it also selects project5. This type of relationship can be imposed on the solution with the constraint: (X4 - X5 <= 0)

As these examples illustrate, you can model certain logical conditions using binary variables.

The Fixed-Charge Problem:
In most of the LP problems we formulate objective functions to maximize profits or minimize costs. In each of these cases, we associate a per-unit cost or per-unit profit with each decision variable to create the objective function. However, in some situations, the decision to produce a product results in a lump-sum cost, or fixed-charge, in addition to a per-unit cost or profit. These type of problems are known as fixed-charge or fixed-cost problems.
The following are some examples of fixed-costs:

The cost to lease, rent, or purchase a piece of equipment or a vehicle that will be required if a particular action is taken
The setup cost required to prepare a machine or production line to produce a different type of product
The cost to construct a new production line or facility that will be required if a particular decision is made
The cost of hiring additional personnel that will be required if a particular decision is made

The fixed-costs are new costs that will be incurred if a particular action or decision is made. In this respect, fixed costs are different from sunk costs, which are costs that will be incurred regardless of what decision is made. Sunk costs are irrelevant for decision-making purposes because, by definition, decisions do not influence these costs. On the other hand, fixed costs are important factors in decision making because the decision determines whether or not these costs will be incurred.

Defining the Constraints and determining values for "Big M":
First we must ensure that the required relationship between the Xi and Yi variables is enforced. In particular, the value of the Yi variables can be determined from the Xi variables. Therefore, we need constraints to establish this link between the value of the Yi variables and the Xi variables. These linking constraints are represented by: (X1 <= M1Y1 , X2 <= M2Y2 , X3 <= M3Y3)

In each of these constraints, the Mi is a numeric constant that represents an upper bound on the optimal value of the Xi. Let's assume that all the Mi are arbitrarily large numbers; for example, Mi = 10,000. Then, each constraint sets up a link between the value of the Xi and the Yi.
The Mi values used in the linking constraints are sometimes referred to as Big M values because they can be assigned arbitrarily large values. However, for reasons these types of problems are much easier to solve if the Mi values are kept as small as possible.

Problems involving binary integer variables are often easier to solve than problems involving general integer variables. Second, good near-optimal solutions often can be obtained by rounding, so this approach is not necessarily ineffective - and could be the only practical solution available for large ILPs that are difficult to solve.

Minimum Order/Purchase Size:
Many investment, production, and distribution problems have minimum purchase amounts or minimum production lot size requirements that must be met. For example, a particular investment opportunity might require a minimum investment of $25,000. Or, a supplier of a given part used in a production process might require a minimum order of 10 units. Similarly, many manufacturing companies have a policy of not producing any units of a given item unless a certain minimum lot size will be produced.

Quantity Discounts:
In all of the LP problems we have assumed that the profit or cost coefficient in the objective function were constant. However, for example, as the production of products increases, quantity discounts might be obtained on component parts that would cause the profit margin on these items to increase.

A Contract Award Problem:
Other conditions often arise in decision problems that can be modeled effectively using binary variables. One of these condition is contract award problem. For example, company X wants to construct four buildings. Each building project requires large amounts of cement to be delivered to the building sites. At the company request, three cement companies have submitted bids for supplying the cement for these projects. The company can contract with more than one supplier to meet the cement requirements for a given project. The problem is to determine what amounts to purchase from each supplier to meet the demands for each project at the least total cost.

The Branch & Bound Algorithm:
A special procedure, known as the branch-and-bound (B&B) algorithm is required to solve ILPs. Although we can easily indicate the presence of integer variables in a model, it usually requires quite a bit of effort on Solver's part to actually solve an ILP problem using the B&B algorithm.
The B&B algorithm starts by relaxing all the integrality conditions in an ILP and solving the resulting LP problem. If we are lucky, the optimal solution to the relaxed LP problem might happen to satisfy the original integrality conditions. If this occurs, then we are done - the optimal solution to the LP relaxation is also the optimal solution to the ILP. However, it is more likely that the optimal solution to the LP will violate one or more of the original integrality conditions.
Part of the difficulty is that none of the corner points of the relaxed feasible region are integer feasible (other than the origin). We know that the optimal solution to an LP problem will occur at a corner point of its feasible region but, in this case, none of those corner points (except the origin) correspond to integer solutions. Thus, we need to modify the problem so that the integer feasible solutions to the problem occur at corner points of the relaxed feasible region. This is accomplished by branching.

Branching:
Any integer variable in an ILP that assumes a fractional value in the optimal solution to the relaxed problem can be designated as a branching variable.

The Branch & Bound Algorithm:

Relax all the integrality conditions in ILP and solve the resulting LP problem. If the optimal solution to the relaxed LP problem happens to satisfy the original integrality conditions, stop - this is the optimal integer solution. Otherwise, proceed to step 2
If the problem being solved is a maximization problem, let Zbest = - infinity. If it is a minimization problem, let Zbest = +infinity. (In general, Zbest represents the objective function value of the best known integer solution as the algorithm proceeds.)
Let Xj represent one of the variables that violated the integrality conditions in the solution to the problem that was solved most recently, and let bj represent its non-integer value. Let INT(bj) represent the largest integer that is less than bj. Create two new candidate problems: one by appending the constraint Xj <= INT(bj) to the most recently solved LP problem, and the other by appending the constraint Xj >= INT(bj) + 1 to the most recently solved LP problem. Place both of these new LP problems in a list of candidate problems to be solved.
If the list of candidate problems is empty, proceed to step 9. Otherwise, remove a candidate problem from the list, relax any integrality conditions in the problem, and solve it
If there is no solution to the current candidate problem (that is, it is infeasible), proceed to step 4. Otherwise, let Zcp denote the optimal objective function value for the current candidate problem
If Zcp is not better than Zbest (for a maximization problem Zcp <= Zbest or for a minimization problem Zcp >= Zbest), proceed to step 4
If the solution to the current candidate problem does not satisfy the original integrality conditions, proceed to step 3
If the solution to the current candidate problem does satisfy the original integrality conditions, a better integer solution has been found. Thus, let Zbest = Zcp and save the solution obtained for this candidate problem. Then go back to step 4
Stop. The optimal solution has been found and has an objective function value given by the current value of Zbest

Summary:

In some cases, acceptable integer solutions to ILP problems can be obtained by rounding the solution to the LP relaxation of the problem. However, this procedure can lead to suboptimal solutions, which might still be viable if you can show that the solution obtained by rounding is within an acceptable distance from the optimal integer solution. This approach might be the only practical way to obtain integer solutions for some ILP problems.

The B&B algorithm is a powerful technique for solving ILP problems. A great deal of skill and creativity are involved in formulating ILPs so that they can be solved efficiently using the B&B technique. Binary variables can be useful in overcoming a number of the simplifying assumptions often made in the formulation of LP models. Here again, quite a bit of creativity might be required on the part of the model builder to identify the constraints to implement various logical conditions in a given problem.
====================================================

Goal Programming and Multiple Objective Optimization:
Two other modeling techniques that are sometimes helpful in solving optimization problems. The first technique - goal programming - involves solving problems containing not one specific objective function, but rather a collection of goals that we would like to achieve. A goal can be viewed as a constraint with a flexible, or soft, RHS value.

The second technique - multiple objective optimization - is closely related to goal programming and applies to problems containing more than one objective function. In business and government, different groups of people frequently pursue different objectives. Therefore, it is quite possible that a variety of objective functions can be proposed for the same optimization problem.

Both techniques require an iterative solution procedure in which the decision maker investigates a variety of solutions to find one that is most satisfactory. Thus, unlike the LP and ILP procedures presented earlier, we cannot formulate a multiple objective or goal programming problem and solve one optimization problem to identify the optimal solution. In these problems, we might need to solve several variations of the problem before we find an acceptable solution.

Goal Programming:

The optimization techniques have always assumed that the constraints in the model are hard constraints, or constraints that cannot be violated.

Hard constraints are appropriate in many situations; however, these constraints might be too restrictive in other situations. For example, when you buy a new car, you probably have in mind a maximum purchase price that you do not want to exceed. We might call this your goal. However, you will probably find a way to spend more than this amount if it is impossible to acquire the car you really want for your goal amount. So, the goal you have in mind is not a hard constraint that cannot be violated. We might view it more accurately as a soft constraint representing a target you would like to achieve.

Numerous managerial decision-making problems can be modeled more accurately using goals rather than hard constraints. Often, such problems do not have one explicit objective function to be maximized or minimized over a constraint set but, instead, can be stated as a collection of goals that might also include hard constraints. These types of problems are known as goal programming (GP) problems.

The technique of linear programming can help a decision maker analyze and solve a GP problem. We must determine if we can find a solution that exactly meets all of the goals in this problem and, if not, what trade-offs can be made among the goals to determine an acceptable solution. We can formulate an LP model for this GP problem to help us make this determination.

Defining the Goal Constraints:
The RHS value of each goal constraint is the target value for the goal because it represents the level of achievement that the decision maker wants to obtain for the goal. The variables -di and +di are called deviational variables because they represent the amount by which each goal deviates from its target value. The -di represents the amount by which each goal's target value is underachieved, and the +di represents the amount by which each goal's target value is overachieved.

Defining the Hard Constraints:
Not all of the constraints in a GP problem have to be goal constraints. A GP problem can also include one or more hard constraints typically found in LP problems. It is also possible to change a soft constraint into a hard constraint during the analysis of a GP problem.

GP Objective Functions:
Although it is fairly easy to formulate the constraints for a GP problem, identifying an appropriate objective function can be quite tricky and usually requires some mental effort.
The objective in a GP problem is to determine a solution that achieves all the goals as closely as possible. The ideal solution to any GP problems is one in which each goal is achieved exactly at the level specified by its target value. (In such an ideal solution, all the deviational variables in all the goal constraints would equal 0.) Often, it is not possible to achieve the ideal solution because some goals might conflict with others. In such a case, we want to find a solution that deviates as little as possible from the ideal solution. One possible objective is to minimize the sum of the deviations.

The problem with the sum of the deviations is that it can be mix of different units (7 rooms + 1,500 dollars = 1,507 units of what?). One solution to this problem is to modify the objective function so that it measures the sum of percentage deviations from the various goals. This is accomplished by minimizing the sum of the percentage deviations. Where ti represents the target value for goal i: MIN: Sum (-di + di)/ti

Note that the percentage deviation objective can be used only if all the target values for all the goals are non-zero; otherwise a division by zero error will occur. Another potential criticism of the previous objective functions concerns how they evaluate deviations.

It would be nice to provide the decision maker a way to represent which deviations are desirable and undesirable in the objective function. One solution to the previous criticisms is to allow the decision maker to assign weights to the deviational variables in the objective function of a GP problem to better reflect the importance and desirability of deviations from the various goals. So, a more useful type of objective function for a GP problem is: Minimize the weighted sum of the deviations: MIN: Sum(-widi + widi).

Or, Minimize the weighted sum of the percentage deviations: MIN: Sum(-widi + widi)/ti
In these weighted objective functions, the -wi and +wi represent numeric constants that can be assigned values to weight the various deviational variables in the problem.
A variable that represents a highly undesirable deviation from a particular goal is assigned a relatively large weight - making it highly undesirable for that variable to assume a value larger than 0. A variable that represents a neutral or desirable deviation from a particular goal is assigned a weight of 0 or some value lower than 0 to reflect that it is acceptable or even desirable for the variable to assume a value greater than 0.
However, you need to follow an iterative procedure in which you try a particular set of weights, solve the problem, analyze the solution, and then refine the weights and solve the problem again. You might need to repeat this process many times to find a solution that is the most desirable to the decision maker.

Trade-offs: The Nature of GP
Depending on the preferences of the decision maker, we could continue to fine-tune the weights in the problem until we reach a solution that is most satisfactory to the decision maker. The nature of GP involves making trade-offs among the various goals until a solution is found that gives the decision maker the greatest level of satisfaction. Thus, unlike the other applications of LP, the use of LP in GP does not indicate immediately the best possible solution to the problem (unless the decision maker initially specifies an appropriately weighted objective function).
Rather, it provides a method by which a decision maker can explore a variety of possible solutions and try to find the solution that comes closest to satisfying the goals under consideration.

Comments about Goal Programming:
Some additional comments should be made before we leave the topic of GP. First, it is important to note that different GP solutions cannot be compared simply on the basis of their optimal objective function values. The user changes the weights in the objective functions from iteration to iteration; therefore, comparing their values is not appropriate because they measure different things. The objective function in a GP problem serves more of a mechanical purpose, allowing us to explore possible solutions. Thus, we should compare the solutions that are produced - not the objective function values.

Second, in some GP problems, one or more goals might be viewed as being infinitely more important that the other goals. In this case, we could assign arbitrarily large weights to deviations from these goals to ensure that undesirable deviations from them never occur. This is sometimes referred to as preemptive GP because certain goals preempt others in order of importance. If the target values for these goals can be achieved, the use of preemptive weights effectively makes these goals hard constraints that should never be violated.

Third, we can place hard constraints on the amount by which we can deviate from a goal.

Fourth, the concept of deviational variables is not limited to GP. These types of variables can be used in other problems that are quite different from GP problems. So, understanding deviational variables can prove useful in other types of mathematical programming situations.

Finally, another type of objective function, called the MINIMAX objective, is sometimes helpful in GP when you want to minimize the maximum deviation from any goal. To implement the MINIMAX objective, we must create one additional constraint for each deviational variable. Q is the MINIMAX variable. The objective is to minimize the value of Q, stated as: MIN: Q

Because the variable Q must be greater than or equal to the values of all the deviational variables, and because we are trying to minimize it, Q will always be set equal to the maximum value of the deviational variables. At the same time, this objective function tries to find a solution where the maximum deviational variable (and the value of Q) is as small as possible. Therefore, this technique allows us to minimize the maximum deviation from all the goals).

Summary of Goal Programming:

Identify the decision variables in the problem
Identify any hard constraints in the problem and formulate them in the usual way
State the goals of the problem along with their target values
Create constraints using the decision variables that would achieve the goals exactly
Transform the above constraints into goal constraints by including deviational variables
Determine which deviational variables represent undesirable deviations from the goals
Formulate an objective that penalizes the undesirable deviations
Identify appropriate weights for the objective
Solve the problem
Inspect the solution to the problem. If the solution is unacceptable, return to step 8 and revise the weights as needed

Multiple Objective Optimization:
We now consider how to solve LP problems involving multiple objective functions. These problems are called called multiple objective linear programming (MOLP) problems.

Most of the LP and ILP problems involve one objective function. These objective functions typically sought to maximize profits or minimize costs. However, another objective function could be formulated for most of these problems. For example, if a production process creates a toxic pollutant that is dangerous to the environment, a company might want to minimize this toxic by-product. However, this objective is likely to be in direct conflict with the company's other objective of maximizing profits. Problems with multiple objectives require analyzing the trade-offs among the different objectives.
MOLP problems can be viewed as special types of GP problems where, as part of solving the problem, we must also determine target values for each goal or objective. Analyzing these problems effectively also requires that we use the MINIMAX objective.

Determining Target Values for the Objectives:
An LP problem can have only one objective function, so how can we include three objectives in our spreadsheet model? The objectives in this problem can be stated as the following goals if we have appropriate values for t1, t2, and t3: For example:

Goal 1: The total production cost should be approximately t1
Goal 2: The gallons of toxic water produced should be approximately t2
Goal 3: The number of life-threatening accidents should be approximately t3

Unfortunately, the problem did not provide explicit values for t1, t2, and t3. However, if we solve our model to find the solution that minimizes the first objective (total production cost), the optimal value of this objective function would be a reasonable value to use as t1 in the first goal. Similarly, if we solve the problem two more times minimizing the second and third objectives, respectively, the optimal objective function values for these solutions would provide reasonable values to use as t2 and t3 in the second and third goals. We could then view our MOLP problem in the format of a GP problem.

Summarizing the Target Solutions:
To improve the value of one objective, we must sacrifice the value of the others. This characteristic is common to most MOLP problems. Thus, the purpose of MOLP (and of GP) is to study the trade-offs among the objectives in order to find a solution that is the most desirable to the decision maker.

Determining a GP Objective:
Now that we have target values for the objectives in our problem, we can formulate a weighted GP objective to allow the decision maker to explore possible solutions. Several GP objectives and illustrated the use of an objective that minimizes the weighted percentage deviation from the goals' target values. So the percentage deviation from this goal may be computed as: (Actual value - Target value)/Target value

These percentage deviation calculations are all linear functions of the decision variables. Thus, if we form an objective function as a weighted combination of these percentage deviation functions, we obtain the following linear objective function:

MIN: w1(percentage deviation from the from the first goal) + w2(percentage deviation from the second goal + w3(percentage deviation from the third goal) + . . .

Thus, to explore the non-extreme feasible solutions to this GP problem (or any other GP problem with hard constraints), we need to use a different type of objective function.

The MINIMAX Objective:
As it turns out, the MINIMAX objective, can be used to explore the points on the edge of the feasible region - in addition to corner points. To illustrate this, let's attempt to minimize the maximum weighted percentage deviation from the target values for the goals. We obtain the following linear objective function:

MIN: The maximum of w1(percentage deviation from the from the first goal), w2(percentage deviation from the second goal, w3(percentage deviation from the third goal), . . .
We implement this objective by establishing a MINIMAX variable Q that we minimize with the objective: MIN: Q

Thus, as we minimize Q, we are also minimizing the weighted percentage deviations from the target values for each of our goals. In this way, the maximum weighted deviation from any of the goals is minimized - or we have MINImized the MAXimum deviation (hence the term MINIMAX).
By adjusting the weights, the decision maker can explore a variety of solutions that do not necessarily occur at the corner points of the original feasible region to the problem.

Comments on MOLP:
One advantage of using the MINIMAX objective to analyze MOLP problems is that the solutions generated are always Pareto optimal. That is, given any solution generated using this approach, we can be certain that no other feasible solution allows an increase in any objective without decreasing at least one other objective.
Although the MINIMAX objective is helpful in the analysis of MOLPs, its usefulness is not limited to these problems. Like deviational variables, the MINIMAX technique can prove useful in other types of mathematical programming situations. We know that the actual value for any goal could never be less than its derived target value and we use the following formula to calculate the percentage deviation for each goal constraint:

(Actual value - Target value)/Target value

For goals derived from maximization objectives, we know that the actual value of the goal can never be greater than its derived target value and the percentage deviation for such goals should be calculated as:

(Target value - Actual value)/Target value

If the target value of a goal is zero, it is not possible to use weighted percentage deviations in the solution to the MOLP (because division by zero is not permissible). In this case, you can simply use weighted deviations.

Summary of Multiple Objective Optimization:

Identify the decision variables in the problem
Identify the objectives in the problem and formulate them in the usual way
Identify the constraints in the problem and formulate them in the usual way
Solve the problem once for each of the objectives identified in step 2 to determine the optimal value of each objective
Restate the objectives as goals using the optimal objective values identified in step 4 as the target values
For each goal, create a deviation function that measures the amount by which any given solution fails to meet the goal (either as an absolute or a percentage)
For each of the deviation functions identified in step 6, assign a weight to the deviation function and create a constraint that requires the value of the weighted deviation function to be less than the MINIMAX variable Q
Solve the resulting problem with the objective of minimizing Q
Inspect the solution to the problem. If the solution is unacceptable, adjust the weights in step 7 and return to step 8

Summary:
Two separate but closely related issues in optimization are GP and MOLP. GP provides a way of analyzing potential solutions to a decision problem that involves soft constraints. Soft constraints can be stated as goals with target values. These goals can be translated into constraints through the use of deviational variables, which measure the amount by which a given solution deviates from a particular goal. The objective in GP problems is to minimize some weighted function of the deviational variables. By adjusting the weights on the deviational variables, a variety of potential solutions can be analyzed.

MOLP provides a way to analyze LP problems involving multiple objectives that conflict with one another. Although an MOLP problem is somewhat different from a standard GP problem, the objectives can be restated as goals after identifying appropriate target values for the objectives. The MINIMAX objective is helpful in analyzing the possible solutions to an MOLP problem.

Solving a GP or MOLP problem is not as simple as solving a single LP problem. Rather, a sequence of problems must be solved to allow the decision maker to analyze the trade-offs among the various goals and objectives at different possible solutions. Thus, both of these procedures are highly iterative and interactive.
=====================================================

Nonlinear Programming and Evolutionary Optimization (Part I):
Other types of optimization problems involve objective functions and constraints that cannot be modeled adequately using linear functions of the decision variables. These types of problems are called nonlinear programming (NLP) problems.

The process of formulating an NLP problem is virtually the same as formulating an LP problem. In each case, you must identify the appropriate decision variables and formulate an appropriate objective function and constraints using these variables. The process of implementing and solving NLP problems in a spreadsheet is also similar to that for LP problems. However, the mechanics (that is, mathematical procedures) involved in solving NLP problems are very different.

The Nature of NLP Problems:

The main difference between an LP and NLP problem is that NLPs can have a nonlinear objective function and/or one or more nonlinear constraints.

Various hypothetical NLP problems:

A problem with a linear objective function and a nonlinear feasible region
A problem with a nonlinear objective function and a linear constraint set: A nonlinear objective can cause the optimal solution to the NLP problem to occur at a solution that is not a corner point of the feasible region - even if all the constraints are linear
A problem with a nonlinear objective and a nonlinear constraint set: The optimal solution to this NLP problem occurs at a solution that is not a corner point of the feasible region
A problem with a nonlinear objective and a linear constraint set. The optimal solution to this problem occurs at a point in the interior of the feasible region

An optimal solution to an LP problem always occurs at a corner point of its feasible region, but this is not true of NLP problems. The optimal solution to some NLP problems might not occur on the boundary of the feasible region at all, but at some point in the interior of the feasible region. Therefore, the strategy of searching the corner points of the feasible region employed by the simplex method to solve LP problems will not work with NLP problems.

Solution Strategies for NLP Problems:
The solution procedure Solver uses to solve NLP problems is called the generalized reduced gradient (GRG) algorithm. NLP algorithms begin at any feasible solution to the NLP problem. This initial feasible solution is called the starting point. The algorithm then attempts to move from the starting point in a direction through the feasible region that causes the objective value to improve. Some amount of movement (or a step size) in the selected feasible direction is then taken resulting in a new, and better, feasible solution to the problem. The algorithm next attempts to identify another feasible direction in which to move to obtain further improvements in the objective function value. If such a direction exists, the algorithm determines a new step size and moves in that direction to a new and better feasible solution. This process continues until the algorithm reaches a point at which there is no feasible direction in which to move that results in an improvement in the objective function. When no further possibility for improvement exists (or the potential for further improvement becomes arbitrarily small), the algorithm terminates.

Local vs. Global Optimal Solutions:

NLP solution algorithms terminate whenever they detect that no feasible direction exists in which it can move to produce a better objective function value (or when the amount of potential improvement becomes arbitrarily small). In such a situation, the current solution is a local optimal solution - a solution that is better than any other feasible solution in its immediate, or local, vicinity. However, a given local optimal solution might not be the best possible, or global optimal, solution to a problem. Another local optimal solution in some other area of the feasible region could be the best possible solution to the problem.

Two important points about the GRG and all other NLP algorithms:

NLP algorithms can terminate at a local optimal solution that might not be the global optimal solution to the problem
The local optimal solution at which an NLP algorithm terminates depends on the initial starting point

However, many NLP problems have a single local optimal solution that, by definition, must also be the global optimal solution. So in many problems, NLP algorithms will locate the global optimal solution. But, as a general rule, we will not know whether the solution obtained for an NLP problem is the global optimal solution. Therefore, it is usually a good idea to try starting NLP algorithms from different points to determine if the problem has different local optimal solutions. This procedure often reveals the global optimal solution.

A note about "optimal" solutions:

When solving an NLP problem, Solver normally stops when the first of three numerical tests is satisfied, causing one of the following three completion messages to appear:

"Solver found a solution. All constraints and optimality conditions are satisfied." This means Solver found a local optimal solution, but does not guarantee that the solution is the global optimal solution. You should run Solver from several different starting points to increase the chances that you find the global optimal solution to your problem
"Solver has converged to the current solution. All constraints are satisfied." This means the objective function value changed very slowly for the last few iterations. If you suspect the solution is not a local optimal solution, your problem may be poorly scaled. The convergence option in the Solver Options dialog box can be reduced to avoid convergence at suboptimal solutions
"Solver cannot improve the current solution. All constraints are satisfied." This rare message means that your model is degenerate and the Solver is cycling. Degeneracy can often be eliminated by removing redundant constraints in a model

A note about starting points:

Solver sometimes has trouble solving an NLP problem if it starts at the null starting point, where all the decision variables are set equal to 0 - even if this solution is feasible. Therefore, when solving an NLP problem, it is best to specify a non-null starting solution whenever possible.

Economic Order Quantity Models:

The economic order quantity (EOQ) problem is one of the most common business problems for which nonlinear optimization can be used. This problem is encountered when a manager must determine the optimal number of units of a product to purchase whenever an order is placed. The basic model for an EOQ problem makes the following assumptions:

Demand for (or use of) the product is fairly constant throughout the year
Each new order is delivered in full when the inventory level reaches 0

The key issue in an EOQ problem is to determine the optimal quantity to order whenever an order is placed for an item.
In the basic EOQ model, the total annual cost of stocking a product is computed as the sum of the actual purchase cost of the product, plus the fixed cost of placing orders, plus the cost of holding (or carrying) the product in inventory. The goal in this type of problem is to find the EOQ that minimizes the total cost.
The total annual cost of acquiring products that meet the stated assumptions is represented by:

Total annual cost = DC + (D/Q)S + (Q/2)Ci

Where:
D = annual demand for the item
C = unit purchase cost for the item
S = fixed cost of placing an order
i = cost of holding one unit in inventory for a year (expressed as a percentage of C)
Q = order quantity, or quantity ordered each time an order is placed

A note about the assume linear model option:

When solving an NLP problem, it is important not to select the standard simplex LP option because this option causes Solver to attempt to apply the simplex method to the problem. When this option is selected, Solver conducts a number of internal tests to verify that the model is truly linear in the objective and constraints. If this option is selected and Solver's tests indicate that the model is not linear, a dialog box appears indicating that the conditions for linearity are not satisfied.

Comments on the EOQ model:

There is another way to determine the optimal order quantity using the simple EOQ model. Using calculus, it can be shown that the optimal value of Q is represented by:

Q = (2DS/Ci)^1/2

Although the previous EOQ formula has its uses, we often must impose financial or storage space restrictions when determining optimal order quantities. The previous formula does not explicitly allow for such restrictions, but it is easy to impose these types of restrictions using Solver.

Location Problems:

A number of decision problems involve determining the location of facilities or service centers. Examples might include determining the optimal location of manufacturing plants, warehouses, fire stations, or ambulance centers. The objective in these types of problems is often to determine a location that minimizes the distance between two or more service points. Problems involving this type of distance measure are nonlinear.You might recall from basic algebra that the straight line (or Euclidean) distance between two points (X1, Y1) and (X2, Y2) on a standard X-Y graph is defined as:

Distance = [(X1 - X2)^2 + (Y1 - Y2)^2]^1/2

Some comments about the solution to location problems:

When solving location problems, it is possible that the location indicated by the optimal solution may simply not work. For instance, the land at the optimal location may not be for sale, the "land" at this location may be a lake, the land may be zoned for residential purposes only, etc. However, solutions to location problems often give decision makers a good idea about where to start looking for suitable property to acquire for the problem at hand. It may also be possible to add constraints to location problems that eliminate certain areas from consideration if they are inappropriate or unavailable.

Nonlinear network flow problem:

The constraints in network flow models have a special structure in which the flow into a node must be balanced with the flow out of the same node. Numerous decision problems exist in which the balance of flow restrictions must be maintained while optimizing a nonlinear objective function. There are a number of other areas to which this type of nonlinear network flow model can be applied. Analysis are often interested in determining the "weakest link" in a telecommunication network or production system. The same type of problem could be solved to determine the least reliable path through these types of networks.

Project selection problem:
In a project selection problem we want to select the combination of projects that produce the greatest net present value (NPV) subject to various resource restrictions. In these types of problems, there is often some uncertainty about whether a selected project can actually be completed successfully, and this success might be influenced by the amount of resources devoted to the project. When you have a choice between using linear constraints and nonlinear constraints, it is almost always better to use the linear constraints.

Optimizing existing financial spreadsheet models:
In optimization, we have always constructed an algebraic model of a decision problem and then implemented this model in a spreadsheet for solution and analysis. However, we can apply optimization techniques to virtually any existing spreadsheet model. Many existing spreadsheets involve financial calculations that are inherently nonlinear.

Comments on optimizing existing spreadsheets:

One difficulty in optimizing an existing spreadsheet model is determining whether the underlying algebraic model is linear or nonlinear. This is important in determining whether a global optimal solution to the problem has been obtained. If we instruct Solver to assume that the model is linear, it conducts a series of numerical tests to determine whether this assumption is appropriate. If Solver detects that the model is not linear, a message is displayed to that effect and we need to re-solve the model as an NLP. So when applying optimization techniques to an existing spreadsheet model, it is a good idea to instruct Solver to assume that the model is linear. If Solver can find a solution under this assumption, we can be confident that it is the global optimal solution. If Solver detects that the model is nonlinear, we must be aware that any solution obtained might represent a local optimal solution as opposed to the global optimal solution.

I this case, we might re-solve the model several times from different starting points to see if better local optimal solutions exist for the problem. (Note that if a problem is poorly scaled, Solver's linearity tests will sometimes indicate that the model is not linear when, in fact it is.)

As you develop your skills and intuition about spreadsheet optimization, you might be inclined to skip the step of writing out algebraic formulations of your models. For straightforward problems, this might be appropriate. However, in more complex problems, this can lead to undesirable results. For example, how it can be tempting to implement the binary variables in a fixed-charge problem using an IF() function in a spreadsheet. Unfortunately, this causes Solver to view the problem as an NLP rather than as a mixed-integer LP problem. So, how you implement the model for a problem can impact whether Solver finds the global optimal solution. As the model builder, you must understand what kind of model you have and implement it in the most appropriate way. Writing out the algebraic formulation of the model often helps to ensure that you thoroughly understand the model you are attempting to solve.

The Portfolio selection problem:
One of the most famous applications of NLP involves determining the optimal mix of investments to hold in a portfolio in order to minimize the risk of the portfolio while achieving a desired level of return. One way to measure the risk inherent in an individual investment is the variance (or, alternatively, the standard deviation) of the returns it has generated over a period of time. One of the key objectives in portfolio selection is to smooth out the variation in the return on a portfolio by choosing investments whose returns tend to vary in opposite directions. That is, we'd like to choose investments that have a negative covariance or negative correlation so that when one investment generates a lower-than-average return, another investment in our portfolio offsets this with a higher-than-average return. This tends to make the variance of the return of the portfolio smaller than that of any individual investment.

Portfolio theory stipulates that for each possible level of investment return, there is a portfolio that minimizes risk, and accepting any greater level of risk at that level of return is inefficient. Alternatively, for each level of investment risk, there is a portfolio that maximizes the return, and accepting any lower level of return at this level of risk is also inefficient.

Whether you attempt to minimize risk subject to a certain required rate of return, or maximize the return subject to a given level of risk, the solutions obtained may still be inefficient. The optimal trade-off between risk and return for a given portfolio problem can be summarized by a graph of the efficient frontier, which plots the minimal portfolio risk associated with each possible level of return.

This graph is helpful not only in identifying the maximum level of risk that should be accepted for each possible level of return, but also in identifying where further increases in expected returns incur much greater amounts of risk.

Handling conflicting objectives in portfolio problems:
There are two different conflicting objectives that can be applied to portfolio selection problems: minimizing risk (portfolio variance) and maximizing expected returns. One way of dealing with these conflicting objectives is to solve the following problem:

MAX: (1 - r)(Expected Portfolio Return) - r(Portfolio Variance)
Subject to: Sum Pi = 1
Pi >= 0 for all i

Here, the Pi again represent the percentages of money we should invest in each stock in the portfolio and r is a constant between 0 and 1 representing the investor's aversion to risk (or the risk aversion value). When r = 1 (indicating maximum risk aversion), the objective function attempts to minimize the portfolio variance. Conversely, when r = 0 (indicating a total disregard of risk), the objective attempts to maximize the expected portfolio return.

For values of r between 0 and 1, Solver will always attempt to keep the expected return as large as possible and the portfolio variance as small as possible. As the value of the parameter r increases, more and more weight is placed on the importance of making the portfolio variance as small as possible (or minimizing risk). Thus, a risk averse investor should prefer solutions with relatively large values of r. By solving a series of problems, each time adjusting the value of r, an investor can select a portfolio that provides the greatest utility, or the optimum balance of risk and return for their own attitudes toward risk and return. Alternatively, if an investor feels minimizing risk is twice as important as maximizing returns, we can solve the problem with r = 0.667 (and(1-r)=0.333) to reflect the investor's attitudes toward risk and return.

Sensitivity Analysis:

One advantage of using the simplex method to solve LP problems is that it provides expanded sensitivity information. A certain amount of sensitivity information is also available when using nonlinear optimization methods to solve linear or nonlinear problems. If an LP problem has alternative optimal solutions, the simplex method and the nonlinear optimizer will not necessarily identify the same optimal solution. Another similarity between the two Sensitivity Reports is apparent if we compare the values in the Reduced Cost and Shadow Price columns with the values in the Reduced Gradient and Lagrange Multiplier columns. The reduced cost for each variable is the same as the reduced gradient for each variable. Similarly, the shadow price for each constraint is the same as the Lagrange multiplier for each constraint. This is not simply a coincidence.

Lagrange Multipliers:
The shadow price of a constraint represents the marginal value of an additional unit of the resource represented by the constraint - or the amount by which the objective function would improve if the RHS of the constraint is loosened by one unit. This same interpretation applies in a more approximate sense to Lagrange multipliers. The main difference between shadow prices and Lagrange multipliers involves the range of RHS values over which the shadow price or Lagrange multiplier remains valid.
After solving an LP problem using the simplex method, we can identify the allowable increase and decrease in a constraint's RHS value over which the shadow price of the constraint remains valid. We can do his because the objective function and constraints in an LP problem are all linear, making the impact of changes in a constraint's RHS value on the objective function value relatively easy to compute. However, in NLP problems, we have no general way to determine such ranges for the RHS values of the constraints. So, when using Solver's nonlinear optimizer to solve an optimization problem, we cannot easily determine the range of RHS values over which a constraint's Lagrange multiplier will remain valid. The Lagrange multipliers can be used only to estimate the approximate impact on the objective function of changing a constraint's RHS value by small amounts.

Reduced Gradients:
The reduced cost of a variable that assumes its simple lower (or upper) bound in the optimal solution generally represents the amount by which the objective function would be reduced (or improved) if this variable were allowed to increase by one unit. Again, this same interpretation applies in a more approximate sense to reduced gradient values. In particular, nonzero reduced gradient values indicate the approximate impact on the objective function value of very small changes in the value of a given variable.
An LP problem can be viewed as a special type of nonlinear programming problem where the objective function and constraints are linear.
====================================================

Nonlinear Programming and Evolutionary Optimization (Part II)

Solver options for solving NLPs:

Although we can represent an LP problem by a highly structured, and relatively simple, objective function and set of constraints, the objective function and constraints in an NLP problem can be virtually any mathematical function. Thus, it is not uncommon to encounter difficulties while trying to solve NLP problems.

Solver provides several options for controlling how it solves NLPs. These options - Estimates, Derivatives, and Search - are located at the bottom of the Solver Options dialog box. The default settings for these options work well for many problems. However, if you have difficulty solving an NLP, you might try changing these options to Solver to use a different search strategy. A complete description of these options would require an in-depth understanding of calculus.
As Solver searches for an optimal solution to an NLP, it terminates if the relative change in the objective function value for several iterations is smaller that the convergence factor. If you think Solver is stopping too quickly as it converges on an optimal solution, you should reduce the convergence setting.
The Estimates option determines how Solver estimates the values of the decision variables while searching for improved solutions. The default setting, Tangent, estimates values using a linear extrapolation technique, whereas the alternate setting, Quadratic, uses a nonlinear extrapolation technique.
The Derivatives option determines how Solver estimates derivatives. When using the default setting, Solver obtains estimates of first derivatives at a point by perturbing the point once in a forward direction and computing the rise over the run.
With the Central setting, Solver obtains estimates of first derivatives by perturbing away from a point in both a backward and forward direction and computes the rise over the run between the two points. The Central setting requires twice as many recalculations as the Forward option but can improve the estimates of the derivatives, yielding better search directions and often fewer iterations. However, the difference in accuracy is usually not worth the extra effort, hence the default is Forward.
The Search option determines how Solver chooses a search direction along which to seek a feasible point with an improved objective value. The default setting, Newton, causes Solver to use the Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton method to identify search directions. The Conjugate setting instructs Solver to use the conjugate gradient method.
Scaling problems often affect how easily Solver can solve a problem. Thus, selecting the Use Automatic Scaling option is also a possible remedy to try if Solver encounters difficulty in solving an NLP.

Evolutionary Algorithms:
In recent years, one of the most interesting and exciting developments in the field of optimization has centered on research into the area of Evolutionary (or genetic) algorithms. Inspired by ideas from Darwin's theory of evolution, researchers interested in mathematical optimization have devised heuristic search techniques that mimic processes in biological reproduction and apply the principle of 'survival of the fittest' to create general purpose optimization engines.
In a nutshell, genetic algorithms (GAs) start with a set of chromosomes (numeric vectors) representing possible solutions to an optimization problem. The individual components (numeric values) within a chromosome are referred to as genes. New chromosomes are created by crossover and mutation. Crossover is the probabilistic exchange of values between solution vectors. Mutation is the random replacement of values in a solution vector. Chromosomes are then evaluated according to a fitness (or objective) function with the fittest surviving into the next generation. The result is a gene pool that evolves over time to produce better and better solutions to a problem.
The fitness of each new chromosomes is then calculated and compared against the fitness of the corresponding chromosomes in the original population with the most fit chromosome surviving into the next population. Various procedures can be used to implement the crossover, mutation, and survival of the fittest.

To a certain extent, Solver's evolutionary algorithm picks up where its nonlinear GRG algorithm leaves off. For nonlinear problems, the solution Solver generates depends on the starting point and may be a local rather than global optimum solution. Also, Solver tends to have difficulty solving problems with discontinuities and unsmooth landscapes, which are typical of spreadsheet models employing logical IF() and/or Lookup tables. Although the evolutionary algorithm cannot completely avoid the possibility of becoming trapped at a local optimal solution, its use of a randomized initial gene pool and probabilistic crossover and mutation operators make this occurrence less likely. Moreover, the evolutionary algorithm can operate on virtually any spreadsheet model - even those containing IF() functions, Lookup tables, and custom macro functions.

Beating the Market:
Solver can be used to create investment portfolios that provide a certain level of return at minimum risk. However, some portfolio managers have a different investment objective in mind; namely, maximizing the probability of beating the market's return.
The IF() functions create discontinuities in the objective, causing Solver's GRG algorithm to be fairly ineffective. Therefore, the problem can be solved using Solver's Evolutionary algorithm in a effective way.

The Traveling Salesperson Problem:
The Traveling Salesperson Problem (TSP) is one of the most famous problems in the field of management science. This problem can be described succinctly as follows:

A salesperson wants to find the least costly (or shortest) route for visiting clients in n different cities, visiting each city exactly once before returning home.

Although this problem is very simple to state, it becomes extremely difficult to solve as the number of cities increase. In general, for an n-city TSP, there are (n - 1)! possible routes (or tours) the salesman can take (where (n - 1)! = (n - 1) x (n - 2) x (n - 3) x . . . x 2 x 1.
Because TSPs are so difficult, heuristic solution techniques (like genetic algorithms) are often used to solve these problems.

Although it is unlikely that many traveling salespersons really care about solving this type of problem, there are numerous other examples of practical business problems that can be described in the general form of a TSP.
If you imagine the drill bit representing a salesperson and each of the required hole locations as representing cities the drill bit must visit, it is easy to see that this is a TSP.
Because the solution to a TSP requires n cities to be visited exactly once, the length of the optimal tour will not change regardless of which city is selected as the starting point. There may be alternate optimal tours, but they will all have the same objective function value. So by selecting a starting point for a TSP, we reduce the number of possible solutions in the TSP from n! to (n - 1)! which, becomes quite significant as n increases.

A permutation is simply a rearrangement of the elements of a set. The Premium Solver supports a special types of constraint for changing cells known as the "alldifferent" constraint. The alldifferent constraint can be applied to a contiguous range of N changing cells and instructs Solver to only use a permutation of the set of integers from 1 to N in those cells. The "alldifferent" constraint used in combination with Solver's evolutionary optimizer allows us to model and solve a number of very challenging but practical business problems involving the optimal sequencing of jobs or activities.

In general, the function INDEX(range, row number, column number) returns the value in the specified row number and column number of the given range.

A note on Premium Solver's "alldifferent" Constraint: Premium Solver's alldifferent constraint can be applied to a contiguous range of N changing cells and instructs Solver to only use a permutation of the set of integers from 1 to N in those cells. Currently, you may not place any other bounds or restrictions on the cells covered by an alldifferent constraint. Thus, if you need to determine the optimal permutation of a set of integers from, for example, 21 to 28 you can:

Apply the "alldifferent" constraint to a set of eight changing cells (so Solver will generate permutations of 1 to 8 in these cells), and,
Place formulas in another set of eight cells that add 20 to the values Solver generates for the "alldifferent" changing cells

It is important to remember that Solver's evolutionary algorithm randomly generates the initial population of solutions and uses probabilistic crossover and mutation operations. Indeed, with large TSP type problems, if you run Solver's evolutionary optimizer several times, it will likely locate better and better solutions to the problem until the global optimal solution is found. Such is the nature of heuristic optimization techniques!
Evolutionary algorithms are one of the most exciting developments in the field of optimization in recent years. Solver's evolutionary search capabilities will undoubtedly continue to be refined and improved in future software releases.

Summary:
The steps involved in formulating and solving an NLP problem are not very different from those required to solve an LP problem - the decision variables are identified and an objective function and any constraints are stated in terms of the decision variables. Because the objective function and constraints in an NLP problem might be nonlinear, the calculations involved in solving NLP problems are different from those included in the simplex method, which is used most often to solve LP problems. NLP problems sometimes have several local optimal solutions. Thus, finding the global optimal solution to a difficult NLP problem might require re-solving the model several times using different initial starting points. A good starting point can be a critical factor in the successful use of nonlinear programming.
Evolutionary (or genetic) algorithms use random search techniques and the principle of survival of the fittest to solve difficult optimization problems for which linear and nonlinear optimization techniques are not suitable. Research into genetic algorithms is ongoing, but this promises to become a very useful and powerful optimization tool for business.
=====================================================

Regression Analysis (Part I):
Regression analysis is a modeling technique for analyzing the relationship between a continuous (real-valued) dependent variable Y and one or more independent variables X1, X2, . . . , Xk. The goal in regression analysis is to identify a function that describes, as closely as possible, the relationship between these variables so that we can predict what value the dependent variable will assume given specific values for the independent variables. Regression analysis provides the tool for making predictions.

In order to identify a function that describes the relationship between X and Y, we first need to collect sample data to analyze. Then, the data are displayed in scatter diagram. If the data positioned in a linear shape there is a strong linear relationship between X and Y.

Because of the scattering of points, this type of graph is called a scatter diagram or scatter plot.

Creating a Scatter Plot:

To create a scatter plot follow the procedure as below:

Select the data in relevant cells (X and Y)
Click the Insert menu
Click Chart
Click XY (Scatter)

Excel's Chart Wizard then prompts you to make a number of selections concerning the type of chart you want and how it should be labeled and formatted. After Excel creates a basic chart, you can customize it in many ways. Double-click a chart element to display a dialog box with options for modifying the appearance of the element.

Regression Models:

We will formalize the somewhat imprecise nature of a statistical relationship by adding an error term to what is otherwise a functional relationship. That is, in regression analysis, we consider models of the form:

Y = f(X1, X2, . . ., Xk) + e (Equation 1)

Where e represents a random disturbance, or error, term. The above equation is a regression model. The regression model in the equation indicates that for any values assumed by the independent variables X1, . . ., Xk there is a probability distribution that describes the possible values that can be assumed by the dependent variable Y.

The probability distributions denote the unsystematic variation in the dependent variable Y at different levels of the independent variable. This represents random variation in the dependent variable (represented by e in above equation) that cannot be accounted for by the independent variable.

If we want to predict what value the dependent variable Y would assume at some level of the independent variable, the best estimate we could make is given by the regression function. That is, our best estimate of the value Y will assume at a given level of the independent variable X1 is the mean (or average) of the distribution of values for Y at that level of X1.

The actual value assumed by the dependent variable is likely to be somewhat different from our estimate because there is some random, unsystematic variation in the dependent variable that cannot be accounted for by our regression function. The difference between the actual value of Y and our predicted value of Y would, on average, tend toward 0. For this reason, we can assume that the error term e in the equation has an average, or expected, value of 0 if the probability distributions for the dependent variable Y at the various levels of the independent variable are normally distributed.

Simple Linear Regression Analysis:

The following simple linear regression model might be an appropriate choice for describing the relationship between X and Y:

Yi = (beta0) + (beta1)X1i + ei (Equation 2) or Yi = b0 + b1X1i + ei

In above equation, Yi denotes the dependent variable value for the ith observation, X1i denotes the independent variable value associated with Yi, and ei is an error term indicating that Yi might not always equal beta0 + beta1X1i.

The parameter b0 represents a constant value (sometimes referred to as the Y-intercept because it represents the point where the line goes through the Y-axis) and b1 represents the slope of the line (that is, the amount by which the line rises or falls per unit increase in X1).beta0 and beta1 are sometimes referred to as population parameters. By taking a sample of Y values at selected levels of X1 we can estimate the values of the population parameters. We will identify the estimated values of beta0 and beta1 as b0 and b1, respectively. The remaining problem is to determine the best values of b0 and b1 from our sample data.

Defining "Best Fit":
An infinite number of values could be assigned to b0 and b1. So, searching for the exact values for b0 and b1 to produce the line that best fits our data might seem like searching for a needle in a haystack - and it is certainly not something we want to do manually. To have the computer estimate the values for b0 and b1 that produce the line that best fits our data, we must give it some guidance and define what we mean by the best fit.
We will use the symbol yi to denote our estimated, or fitted, value of Yi, which is defined as:

yi = b0 + b1X1i (Equation 3)

We want to find the values for b0 and b1 that make all the estimated sales values (yi) as close as possible to the actual sales values (Yi). In most regression problems, it is impossible to find a function that fits the data perfectly because most data sets contain some amount of unsystematic variation.
Although we are unlikely to find values for b0 and b1 that will allow us to fit our data perfectly, we will try to find values that make the differences between the estimated values for the dependent variable and the actual values for the dependent variable (Yi - yi) as small as possible. We refer to the difference Yi - yi as the estimation error for observation i because it measures how far away the estimated value yi is from the actual value Yi. The estimation errors in a regression problem are also referred to as residuals.
Although different criteria can be used to determine the best values for b0 and b1, the most widely used method determines the values that minimize the sum of squared estimation errors - or error sum of squares (ESS) for short. That is, we will attempt to find values for b0 and b1 that minimize:

ESS = Sum of squared estimation errors = Sum [Yi - (b0 + b1X1i)]^2

Thus, minimizing ESS seems to be a good objective to use in searching for the best values of b0 and b1. Because regression analysis finds the values of the parameter estimates that minimize the sum of squared estimation errors, it is sometimes referred to as the method of least squares.

Solving the problem using the Regression Tool:
In addition to Solver, Excel provides another tool for solving regression problems that is easier to use and provides more information about a regression problem.
Before you can use the regression tool in Excel, you need to make sure that the Analysis ToolPak add-in is available. You can do this by completing the following steps:

Click the Tools menu
Click Add-Ins
Click Analysis Toolpak

After ensuring that the Analysis Toolpak is available, you can access the regression tool by completing the following steps:

Click the Tools menu
Click Data Analysis
Click Regression

After you choose the Regression command, the Regression dialog box appears. This dialog box presents many options and selections; at this point, we will focus on only three options: the Y-range, the X-range, and the Output-Range. The Y-range corresponds to the range in the spreadsheet containing the sample observations for the dependent variable. The X-range corresponds to the range in the spreadsheet containing the sample observations for the independent variable. Thus we can calculate the parameter estimates for a regression function using either Solver or the regression tool. The advantage of the regression tool is that it does not require us to set up any special formulas or cells in the spreadsheet, and it produces additional statistical results about the problem under study.

Evaluating the Fit:

We can also use the TREND( ) function in Excel to compute the yi (estimated values) values. This TREND( ) function computes the least squares linear regression line using a Y-range and an X-range. It then uses this regression function to estimate the value of Y using the value of X. Thus, using the TREND( ) function, we don't have to worry about typing the wrong values for the estimated intercept or slope.

To insert this estimated trend line on the existing scatter plot:

Right-click on any of the data points in the scatter plot to select the series of data
Select Add Trendline
On the Type card, click Linear
On the Options card, select the "Display equation on chart" and "Display R-squared value on chart."
Click OK

A Note on the TREND( ) Function:
The TREND( ) function can be used to calculate the estimated values for linear regression models. The format of the TREND( ) function is as follows:

TREND(Y-range, X-range, X-value for prediction)

where Y-range is the range in the spreadsheet containing the dependent Y variable, X-range is the range in the spreadsheet containing the independent X variable(s), and X-value for prediction is a cell (or cells) containing the values for the independent X variable(s) for which we want an estimated value of Y. The TREND( ) function has an advantage over the regression tool in that it is dynamically updated whenever any inputs to the function change. However, it does not provide the statistical information provided by the regression tool. It is best to use these two different approaches to doing regression in conjunction with one another.

The R Square (R^2) Statistic:
The value labeled "R square" provides a goodness-of-fit measure. This value represents the R^2 statistic (also referred to as the coefficient of determination. This statistic ranges in values from 0 to 1 (0<=R^2<=1) and indicates the proportion of the total variation in the dependent variable Y around its mean (average) that is accounted for by the independent variable(s) in the estimated regression function.
The total variation in the dependent variable Y around its mean (Y) is described by a measure known as the total sum of squares (TSS), which is defined as: TSS = Sum (Yi - Y)^2
The TSS equals the sum of the squared differences between each observation Yi in the sample and the average value of Y, denoted in above equation Y.

The total sum of squares (TSS) can be decomposed into the following two parts:

TSS = ESS + RSS

ESS is the quantity that is minimized in least squares regression. ESS represents the amount of variation in Y around its mean that the regression function cannot account for, or the amount of variation in the dependent that is unexplained by the regression function. Therefore, the regression sum of squares (RSS) represents the amount of variation in Y around its mean that the regression function can account for, or the amount of variation in the dependent variable that is explained by the regression function.
Now consider the following definitions of the R^2 statistic:

R^2 = (RSS/TSS) = 1 - (ESS/TSS)

If ESS = 0 (which can occur only if the regression function fits the data perfectly), then TSS = RSS and, therefore, R^2 = 1. On the other hand, if RSS = 0 (which means that the regression function was unable to explain any of the variation in the behavior of the dependent variable Y), then TSS = ESS and R^2 = 0. So, the closer the R^2 statistic is to the value 1, the better the estimated regression function fits the data.
R^2 statistic is approximately 0.969. This indicates that approximately 96.9% of the total variation in our dependent variable around its mean has been accounted for by the independent variable in our estimated regression function. Because this value is fairly close to the maximum possible R^2 value (1), this statistic indicates that the regression function we have estimated fits our data well.
The multiple R statistic represents the strength of the linear relationship between actual and estimated values for the dependent variable. As with the R^2 statistic, the multiple R statistic varies between 0 and 1 with values near 1 indicating a good fit.

Making Predictions:
Using the estimated regression in equation 3 we can make predictions about the Y variable for different levels of advertising expenditures.

The Standard Error:
A measure of the accuracy of the prediction obtained from a regression model is given by the standard deviation of the estimation errors - also known as the standard error, Se.

The standard error measures the amount of scatter, or variation, in the actual data around the fitted regression function. The standard error is useful in evaluating the level of uncertainty in predictions we make with a regression model. As a very rough rule-of-thumb, there is approximately a 68% chance of the actual level of sales falling within +_1 standard error of the predicted value Yi. Alternatively, the chance of the actual level of sales falling within +_2 standard errors of the predicted value Yi is approximately 95%.

A note about Extrapolation:
Predictions made using an estimated regression function might have little or no validity for values of the independent variable. Thus, we cannot assume that our model will give accurate estimates of dependent variable (Y) levels at independent variable significantly above or below this range of values, because the relationship between X and Y might be quite different outside this range.

Assumptions for the Statistical Tests:
The methods for constructing confidence intervals are based on important assumptions concerning the simple linear regression model presented in equation 2 above. We assumed that the error terms ei are independent, normally distributed random variables with expected (or mean) values of 0 and constant variances. Thus, the statistical procedures for constructing intervals and performing t-test apply only when these assumptions are true for a given set of data. As long as these assumptions are not seriously violated, the procedures described offer good approximations of the desired confidence intervals and t-test. Various diagnostic checks can be performed on the residuals (Yi - yi) to see whether or not our assumptions concerning the properties of the error terms are valid. Excel also provides basic diagnostics that can be helpful in determining whether assumptions about the error terms are violated. Regression dialog box provides two options for producing graphs that highlight serious violations of the error term assumptions. These options are Residual Plots and Normal Probability Plots.

If the error terms in equation 2 above are normally distributed random variables, the dependent variable in equation 2 is a normally distributed random variable prior to sampling. Thus one way to evaluate whether we can assume that the error terms are normally distributed is to determine if we can assume that the dependent variable is normally distributed. The normal probability plot provides an easy way to evaluate whether the sample values on the dependent variable are consistent with the normality assumption. A plot that is approximately linear supprots the assumption of normality.
If the residual plot shows a systematic tendency for the residuals to be positive or negative, this indicates that the function chosen to model the systematic variation between the dependent and independent variables is inadequate and that another functional form would be more appropriate.
If the residual plot indicates that the magnitude of the residuals is increasing (or decreasing) as the value of the independent variable increases, we would question the validity of the assumption of constant error variances. In some cases, a simple transformation of the dependent variable can correct the problem of non-constant error variances.

Results:
The t-statistic and p-value provide another way of testing whether beta1 = 0. According to statistical theory, if beta1 = 0, then the ratio of b1 to its standard error should follow a t-distribution with n-2 degrees of freedom. Thus, the t-statistic for testing if beta1 = 0 is: t-statistic = (b1/standard error of b1)
The p-value indicates the probability of obtaining an outcome that is more extreme than the observed test statistic value if beta1 = 0. In this case, the p-value is 0, indicating that there is virtually no chance that we will obtain an outcome as large as the observed value for b1 if the true value of beta1 is 0.
The t-statistic, p-value, and confidence interval for the intercept beta0 would be interpreted in the same way as demonstrated for beta1. The p-value for beta0 indicates that we have a for example 13.689% chance of obtaining an outcome more extreme than the observed value of b0 if the true value of beta0 is 0.

Analysis of Variance:
The analysis of variance (ANOVA) results provide another way of testing whether or not beta1 = 0. The values in the MS column in the ANOVA table represent values known as the mean squared regression (MSR) and mean squared error (MSE), respectively.

The value Significance F is similar to the p-values described earlier, and indicates the probability of obtaining a value in excess of the observed value for the F-statistic if beta1 = 0. In this case, the significance of F is 0, indicating that there is virtually no chance that we would have obtained the observed value for b1 if the true value of beta1 is 0. Therefore, we conclude that the true value of beta1 is not equal to 0.

The F-statistic tests whether or not all of the betai for all of the independent variables in a regression model are all simultaneously equal to 0. A simple linear regression model contains only one independent variable. In this case, the tests involving the F-statistic and the t-statistic are equivalent.

A note about statistical tests:
Regardless of the form of the distribution of the error terms, least squares regression can always be used to fit regression curves to data in order to predict the value the dependent variable will assume for a given level of the independent variables. Many decision makers never bother to look at residual plots or to construct confidence intervals for parameters in the regression models for the predictions they make. However, the accuracy of predictions made using regression models depends on how well the regression function fits the data. At the very least, we should always check to see how well a regression function fits a given data set. We can do so using residual plots, graphs of the actual data versus the estimated values, and the R^2 statistic.
=====================================================

Regression Analysis (Part II)
Introduction to multiple regression:

We have seen that regression analysis involves identifying a function that relates the systematic changes in a continuous dependent variable to the values of one or more independent variables. That is, our goal in regression analysis is to identify an appropriate representation of the function f(.) in:

Y = f(X1, X2, . . ., Xk) + e (Equation 1)

In some situations, a business person is far more likely to encounter situations involving more than one (or multiple) independent variables. We'll now consider how multiple regression analysis can be applied to these situations. We'll focus our attention on the multiple linear regression function represented by:

Yi = b0 + b1X1i + b2X2i + . . . + bkXki (Equation 2)

Yi represents the estimated value for the ith observation in our sample whose actual value is Yi. The symbols X1i, X2i, . . ., Xki represent the observed values of the independent variables associated with observation i. With two independent variables, we fit a plane to our data. With three or more independent variables, we fit a hyperplane to our data. It is difficult to visualize or draw graphs in more than three dimensions, so we cannot actually see what a hyperplane looks like. However, just as a plane is a generalization of a straight line into three dimensions, a hyperplane is a generalization of a plane into more than three dimensions.

Regardless of the number of independent variables, the goal in multiple regression analysis is the same as the goal in a problem with a single independent variable. That is, we want to find the values for b0, b1, . . ., bk in equation 2 that minimize the sum of squared estimation errors represented by:

ESS = Sum (Yi - yi)^2

We can use the method of least squares to determine the values for b0, b1, . . ., bk that minimize ESS.

To determine if the multiple linear regression function in equation 2 is appropriate for the data, we should first construct scatter plots between the dependent variable and each independent variable. These graphs seem to indicate a linear relationship between each independent variable and the dependent variable. Thus, we have reason to believe that a multiple linear regression function would be appropriate for these data.

Selecting the Model:

The best model is often the simplest model that accurately reflects the relevant characteristics of the problem being studied. This is particularly true in multiple regression models. The fact that a particular problem might involve numerous independent variables does not necessarily mean that all of the variables should be included in the regression function. If the data used to build a regression model represents a sample from a larger population data, it is possible to over-analyze or overfit the data in the sample. That is, if we look too closely at a sample of data, we are likely to discover characteristics of the sample that are not representative (or which do not generalize) to the population from which the sample was drawn. This can lead to erroneous conclusions about the population being sampled. To avoid the problem of overfitting when building a multiple regression model, we should attempt to identify the simplest regression function that adequately accounts for the behavior of the dependent of the variable we are studying.

Before issuing the Regression command again, we would need to rearrange the data in the spreadsheet so that the values for X1 and X3 are located next to each other in one contiguous block. The regression tool in Excel (and in most other spreadsheet software packages) requires that the X-range be represented by one contiguous block of cells.

Important Software Note:

When using the regression tool, the values for the independent variables must be listed in adjacent columns in the spreadsheet and cannot be separated by any intervening columns. That is, the Input X-range option in the Regression dialog box must always specify a contiguous block of numbers.

Inflating R^2:

Adding other independent variables to the simple linear regression model cause the R^2 statistic to increase. This should not be surprising. As it turns out, the value of R^2 can never decreases as a result of adding an independent variable to a regression function.

When you add any independent variable to a regression function, the value of the R^2 statistic can never decrease and will usually increase at least a little. Therefore, we can make the R^2 statistic arbitrarily large simply by including enough independent variables in the regression function - regardless of whether or not the new independent variables are related at all to the dependent variable.

The Adjusted-R^2 Statistic:

The value of the R^2 statistic can be inflated artificially by including independent variables in a regression function that have little or no logical connection with the dependent variable. Thus, another goodness-of-fit measure, known as the adjusted-R^2 statistic (denoted by Ra^2), has been suggested which accounts for the number of independent variables included in a regression model. The adjusted-R^2 statistic is defined as:

Ra^2 = 1 - (ESS/TSS)[(n - 1)/(n - k - 1)]

where n represents the number of observations in the sample, and k represents the number of independent variables in the model. As variables are added to a regression model, the ratio of ESS to TSS in above equation will decrease (because ESS decreases and TSS remains constant), but the ratio of (n - 1) to (n - k - 1) will increase (because (n - 1) remains constant and (n - k - 1) decreases). Thus, if we add a variable to the model that does not reduce ESS enough to compensate for the increase in k, the adjusted-R^2 value will decrease. The adjusted-R^2 value can be used as a rule-of-thumb to help us decide if an additional independent variable enhances the predictive ability of a model or if it simply inflates the R^2 statistic artificially. However, using the adjusted-R^2 statistic in this way is not foolproof and requires a good bit of judgment on the part of the person performing the analysis. The model with the highest adjusted-R^2 always has the smallest standard error. For this reason, the adjusted-R^2 statistic is sometimes the sole criterion used to select which multiple regression model to use in a given problem.

Multicollinearity:

The term multicollinearity is used to describe the situation when the independent variables in a regression model are correlated among themselves. Multicollinearity tends to increase the uncertainty associated with the parameters estimates (bi) in a regression model and should be avoided whenever possible.

Making prediction in multiple regression model:

In the case of a multiple regression model, the techniques used to construct prediction intervals require a basic knowledge of matrix algebra. The interested reader should consult advanced texts on multiple regression analysis for a description of how to construct more accurate prediction intervals using multiple regression models.

Binary independent variables:

For example, The appraiser might want to include other independent variables in her analysis. Some of these, such as age of the roof, could be measured numerically and be included as an independent variable. But how would we create variables to represent the presence of a swimming pool or the condition of the roof?

The presence of a swimming pool can be included in the analysis with a binary independent variable coded as:

Xpi = 1, if house i has a pool - Xpi = 0, otherwise

The condition of the roof could also be modeled with binary variables. Here, however, we might need more than one binary variable to model all the possible conditions. If some qualitative variable can assume p possible values, we need (p - 1) binary variables to model the possible outcomes.

For example, suppose that the condition of the roof could be rated as good, average, or poor. There are three possible values for the variable representing the condition of the roof; therefore, we need two binary variables to model these outcomes. Thus, we need only two binary variables to represent three possible roof conditions. These binary variables are coded as:

Xri = 1, if the roof of house i is in good condition - Xri = 0, otherwise

X(r+1)i = 1, if the roof of house i is in average condition- X(r+1)i = 0, otherwise

We can use binary variables as independent variables in regression analysis to model a variety of conditions that are likely to occur. In each case, the binary variables would be placed in the X-range of the spreadsheet and appropriate bi values would be calculated by the regression tool.

Statistical tests for the population parameters:

Statistical tests for the population parameters in a multiple regression model are performed in much the same way as for the simple regression model. The F-statistic tests whether or not all of the betai for all of the independent variables are all simultaneously equal to 0 (B1 = B2 = . . . = BK = 0). Significance of F indicates the probability of this condition being true for the data under consideration. Each t-statistic can be used to test whether or not the associated population parameter Bi = 0 given all the other independent variables in the model. Furthermore, the R^2 and adjusted-R^2 statistics are purely descriptive in nature and do not depend in any way on the assumptions about the distribution of the error terms.

Polynomial Regression:
Business problems exist where there is not a linear relationship between the dependent and independent variables. Rather, more of a curvilinear relationship exists between these variables. Does this mean that linear regression analysis cannot be used with these data? Not at all. We need to use the following type of regression function:

Yi = b0 + b1X1i + b2(X^2)1i

Where Yi represents the estimated value of the ith object in our model, and X1i represents the total square value. Notice that the second independent variable in above equation is the first independent variable squared (X^21).

Expressing nonlinear relationships using linear models:
We can use least squares regression to estimate the optimal values for b0, b1, and b2. Note that if we define a new independent variable as X2i = X^21, then the regression function in above equation is equivalent to:

Yi = b0 + b1X1i + b2X2i

As long as a regression function is linear with respect to its parameters, we can use Excel's regression analysis tool use to find the least squares estimates for the parameters.
To fit the regression function in above equation to our data, we must create a second independent variable to represent the values of X2i. Because the X-range for the Regression command must be represented as one contiguous block, we inserted a new column between the square footage and selling price columns and placed the values of X2i in this column. To create the curve in the scatter plot follow as below:

Right-click on any of the data points in the scatter plot to select the series of data
Click Add Trendline
On the Type card, click Polynomial and use an Order value of 2
On the Options card, select Display Equation On Chart and Display R-squared Value on Chart
Click OK

As you might imagine, we could continue to add higher order terms to the model and further increase the value of the R^2 statistic. Here again the adjusted-R^2 statistic could help us select a model that provides a good fit to our data without overfitting the data.

Summary of Nonlinear Regression:
Regression analysis can be used not only in fitting straight lines or hyperplanes to linear data, but also in fitting other types of curved surfaces to nonlinear data. It is important to prepare scatter plots of each independent variable against the dependent variable in a regression problem to see if the relationship between the variables is linear or nonlinear. Relatively simple nonlinear relationships can often be accounted for by including squared or cubed terms in the model. In more complicated cases, sophisticated transformations of the dependent or independent variables might be required.

Summary:

Regression analysis is a statistical technique that can be used to identify and analyze the relationship between one or more independent variables and a continuous dependent variable. The goal in regression analysis is to identify a function of the independent variables that adequately accounts for the behavior of the dependent variable. The method of least squares provides a way to determine the best values for the parameters in a regression model for a given sample of data. After identifying such a function, it can be used to predict what value the dependent variable will assume given specific values for the independent variables. Various statistical techniques are available for evaluating how well a given regression function fits a data set and for determining which independent variables are most helpful in explaining the behavior of the dependent variable. Simple transformation of the independent variables can allow this type of model to fit both linear and nonlinear data sets.
=====================================================

Discriminant Analysis:
Discriminant Analysis (DA) is a statistical technique that uses the information available in a set of independent variables to predict the value of a discrete, or categorical, dependent variable. Typically, the dependent variable in a DA problem is coded as a series of integer values representing various groups to which the observations in a sample belong. The goal of DA is to develop a rule for predicting to what group a new observation is most likely to belong based on the values the independent variables assume. To gain an understanding of the purpose and value of DA, consider the following business situations where DA could be useful.

Credit scoring: The credit manager of a mortgage company classifies the loans it has made into two groups: those resulting in default and those that are current. For each loan, the manager has data describing the income, assets, liabilities, credit history, and employment history of the person who received the loan. The manager wants to use this information to develop a rule for predicting whether or not a new loan applicant will default if granted a loan.
Insurance rating: An automotive insurance company uses claims data from the past five years to classify its current policyholders into three categories: high risk, moderate risk, and low risk. The company has data describing each policyholder's age, marital status, number of children, educational level, employment record, and number of traffic citations received during the past five years. The company wants to analyze how the three groups differ with regard to these characteristics and use this information to predict into which risk category a new insurance applicant is likely to fall.

DA differs from most other predictive statistical methods (for example, regression analysis) because the dependent variable is discrete, or categorical, rather than continuous. For instance, in the first example given previously, the credit manager wants to predict whether a loan applicant will: (1) default or (2) repay the loan. Similarly, in the second example, the company wants to predict into which risk category a new client is most likely to fall: (1) high risk, (2) moderate risk, or (3) low risk. In each example, we can arbitrarily assign a number (1, 2, 3, ...) to each group represented in the problem, and our goal is to predict to which group (1, 2, 3, ...) a new observation is most likely to belong. However, the regression approach does not work in problems involving more than two groups.

Group Locations and Centroids:

The data set can be displayed graphically. The ovals outline the typical observations for group 1 and group 2. The points labeled "C1" and "C2" represent the centroids for group 1 and group 2, respectively. A group's centroid is the set of averages for the independent variables for the group. The centroid indicates where a group is centered, or located.

We might expect that the more the centroids for the groups in a DA problem differ, the easier it is to distinguish between the groups. However, if the two groups' centroids were both located at the same point, it would be difficult to tell which group an observation belonged to based on only its location, because all observations would be scattered around the same centroid point.

Calculating Discriminant Scores:
In the following regression equations, the dependent variable Y represents the estimated value of the Group variable, and X1 and X2 correspond to values for the independent variables representing the Mechanical and Verbal test scores, respectively: Y(estimated) = b0 + b1X1i + b2X2i

Regardless of the number of independent variables in a problem, the purpose of the regression equation is to combine the information available in the variables into a single-valued estimate of the Group variable. This estimated value of the Group variable is referred to as a discriminant score and is denoted by Y (estimated).

The Classification Rule:
Because the discriminant scores are not equal to our group values of 1 and 2, we need a method for translating these scores into group membership predictions.
The following classification rule can be used for this purpose:

Classification Rule: If an observation's discriminant score is less than or equal to some cutoff value, then assign it to group 1; otherwise, assign it to group 2.

The remaining problem is to determine an appropriate cutoff value. We can do so in a number of ways. One possibility is to select the cutoff value that minimizes the number of misclassifications in the analysis sample.
The distributions of discriminant scores for group 1 and group 2 are centered at Y1(estimated) and Y2(estimated) respectively. The area where the tails of the distributions overlap corresponds to observations that are most likely to be classified incorrectly. Generally, DA attempts to minimize this area of overlap. With these data conditions, it is easy to see that the midpoint between the average discriminant score for each group is likely to be the best choice for a cutoff value because this point minimizes the probability of misclassification for future observations.

This midpoint value often provides good classification results for new observations. The cutoff value is the average of the estimates of Y1 and Y2: Cutoff value = (Y1 + Y2)/2

Refining the Cutoff Value:
Because the tails of the distributions overlap, we can be fairly certain that some observations will be classified incorrectly using the classification rule stated earlier. For example, our classification rule might predict that certain job applicants will be satisfactory employees when, in fact, they will be unsatisfactory. Hiring such employees could be a costly mistake. Alternatively, our classification rule might predict incorrectly that certain applicants will be unsatisfactory when, in fact, they would be very good employees. Not hiring these applicants would represent a lost opportunity (or opportunity cost) for the company. However, these costs of misclassification are not taken into account by the midpoint cutoff value.

We can use an alternative method for specifying what cutoff value to use in our classification rule. Suppose that we define the costs of misclassification and prior probabilities as:

C(1|2) = the cost of classifying an observation into group 1 when it belongs to group 2
C(2|1) = the cost of classifying an observation into group 2 when it belongs to group 1
P1 = the prior probability that a new observation belongs to group 1
P2 = the prior probability that a new observation belongs to group 2

A general method for determining the cutoff value for the classification rule is represented by:

Cutoff value = (Y1 + Y2)/2 + S^2Yp/(Y1 - Y2) x LN (P2C(1|2)/P1C(2|1)

Where LN(.) represents the natural logarithm, and

S^2Yp = [(n1 - 1)S^2Y1 + (n2 - 1)S^2Y2]/(n1 + n2 - 2)

Where ni represents the number of observations in group i and S^2Yi represents the sample variance of the discriminant scores for group i.

Classification Accuracy:
Our ultimate goal in DA is to predict to which group a new observation (or job applicant) belongs. Because we already know which group each observation in our analysis sample belongs to, it might seem pointless to try to classify these observations. Yet, by doing so we can get some idea of how accurate our classification rule is likely to be with other new observations whose true group memberships are unknown.

The k - Group DA problem:

It might seem as if the regression approach for two-group DA could be extended easily to problems with more than two groups. For example, suppose that we have a problem with three groups (labeled A, B, and C) and one independent variable X. We could set up a dependent variable Y and and assign it a value of 1, 2, or 3 to indicate observations belonging to group A, B, or C, respectively. We might attempt to model the behavior of this dependent variable as:

Yi = b0 + b1Xi

We could then apply regression analysis to estimate values for b0 and b1 so that we could calculate the discriminant scores Yi.

In general, the regression function in above equation cannot be used for DA problems with more than two groups because the relationship between the dependent and independent variables might not be linear. Even if we could make the relationship linear with some appropriate coding for the dependent variable, it can be very difficult to discern what this coding should be if the problem involves more than one independent variable.

Multiple Discriminant Analysis:
To distinguish the above approach from the two-group regression approach, we refer to it as multiple discriminant analysis (MDA). Although the MDA approach is different from the regression technique for the two-group example, the basic idea is the same. Note that most of the observations in each group cluster around the group's centroid. If we don't already know to which group an observation belongs, we could classify the observation based on its proximity to the centroids. That is, to determine to which group a new observation belongs, we could plot it on graph and visually judge the distance between it and each of the centroids. A logical classification rule is to assign observations to the group represented by the closest centroid. This is the approach taken in MDA, although at a somewhat more sophisticated level.

Distance Measures:
The problem with the visual approach is that some observations might be "too close to call," appearing to be equally close to several centroids. We need a more formal measure of the distance between each observation and each centroid. You might recall from high school algebra that the Euclidean (straight-line) distance between two points (A1, B1) and (A2, B2) in two dimensions can be measured by:

Distance = [(A1 - A2)^2 + (B1 - B2)^2]^1/2

We can use this formula in MDA to measure the distance from a given observation to the centroid of each group, and then assign the observation to the group it is closest to. From a statistical viewpoint, this distance measure in above equation is somewhat weak because it ignores the variances of the independent variables.

If we let Dij represent the distance from observation i to the centroid for group j, we can define this distance as:

Dij = [Sum(Xik - Xjk)^2/S^2jk]^1/2

Where Xik represents the value of observation i on the kth independent variable, Xjk represents the average value of group j on the kth independent variable, and S^2jk represents the sample variance of group j on the kth independent variable.

MDA Classification:
There are numerous variations on the distance measure in above equation. One of the most popular variations - known as the Mahalanobis distance measure refines the calculation in above equation to account for differences in the covariances between the independent variables. An add-in called DA.xla was written to make it easy to perform MDA in Excel using Mahalanobis distance measures. This add-in calculates the Mahalanobis distance from each observation to the centroid for each of the possible groups. It then assigns each observations to the group it is closest to based on these distance measures.

Installing and Using the DA.xla Add-in:
To use the DA.xla add-in, simply open the file called DA.xla found on your data disk. This automatically adds a new option labeled Discriminant Analysis to your Tools menu. This new option remains on the Tools menu until you exit Excel. If you want to make the DA.xla add-in a more permanent addition to your Tools menu:

Click the Tools menu
Select Add-Ins
Click the Browse button
Locate the file DA.xla and click OK
Click OK on the Add-Ins dialog box

The Discriminant Analysis option will now be listed on your Tools menu whenever you start Excel. You can eliminate this option by deselecting it in the Add-Ins dialog box.

Summary:
DA is a statistical technique that can be used to develop a rule for predicting in which of two or more groups an observation belongs. DA is similar to regression analysis in that it uses a set of independent variables to predict the value of a dependent variable. However, in DA the dependent variable is coded in a discrete, or categorical, manner to represent the difference groups under consideration.
When the problems involves only two groups, regression analysis can be used to develop a classification rule. However, when the problem involves more than two groups, the regression approach is no longer appropriate and the more general k-group approach to DA is required. The k-group DA approach can be used with problems involving two or more groups. DA is a powerful technique with a deep and rich theoretical basis.
====================================================

Time Series Forecasting (Part I):
A time series is a set of observations on a quantitative variable collected over time. For example, every night the evening news reports the closing value of the Dow Jones Industrial Average. These closing values represent a series of values for a quantitative variable over time-or a time series. Most businesses keep track of a number of time series variables. Examples might include daily, weekly, monthly, or quarterly figures on sales, costs, profits, inventory, back orders, customer counts, and so on.

Businesses often are interested in forecasting future values of a time series variable. For example, if we could accurately predict future closing values of the Dow Jones Industrial Average, we could become very wealthy investing in the stock market by "buying low and selling high." In constructing business plans, most companies make some attempt to forecast the expected levels of sales, costs, profits, inventory, back orders, customer counts, and so on. These types of forecasts often are required inputs to the other types of modeling techniques.

If we don't know which causal independent variables are influencing a particular time series variable, we cannot build a regression model. And even if we do have some idea about which causal variables are affecting a time series, there might not be any data available for those variables. Finally, even if the estimated regression function fits the data well, we might have to forecast the values of the causal independent variables in order to estimate future values of the dependent (time series) variable. Forecasting the causal independent variables might be more difficult than forecasting the original time series variable.

On the Importance of Forecasting:

You do not plan to ship goods across the ocean, or to assemble merchandise for sale, or to borrow money without first trying to determine what the future may hold in store. Ensuring that the materials you order are delivered on time, seeing to it that the items you plan to sell are produced on schedule, and getting your sales facilities in place all must be planned before that moment when the customers show up and lay their money on the counter. The successful business executive is a forecaster first: purchasing, producing, marketing, pricing, and organizing all follow.

Time Series Methods:
In many business planning situations, it is difficult, undesirable, or even impossible to forecast time series data using a causal regression model. However, if we can discover some sort of systematic variation in the past behavior of the time series variable, we can attempt to construct a model of this behavior to help us forecast its future behavior. For example, we might find a long-term upward (or downward) trend in the time series that might be expected to continue in the future. Or, we might discover some predictable seasonal fluctuations in the data that could help us make estimates about the future. As you may have surmised, time series forecasting is based largely on the maxim that history tends to repeat itself.
Techniques that analyze the past behavior of a time series variable to predict the future are sometimes referred to as extrapolation models. The general form of an extrapolation model is:

Yt+1 = f(Yt, Yt-1, Yt-2, ...)

where Yt+1 represents the predicted value for the time series variable in time period t+1, Yt represents the actual value of the time series variable in time period t, Yt-1 represents the actual value of the time series variable in time period t-1, and so on. The goal of an extrapolation model is to identify a function f( ) for above equation that produces accurate forecasts of future values of the time series variable. Different time series styles are explained as follows:

There are several techniques that are appropriate for stationary time series, where there is no significant upward or downward trend in the data over time
There are also techniques for handling non-stationary time series, where there is some upward or downward trend in the data over time
There are also techniques for modeling seasonal patterns in both stationary and non-stationary time series data

Measuring accuracy:

Many methods are available for modeling time series data. In most cases, it is impossible to know in advance which method will be the most effective for a given set of data. Thus, a common approach to time series analysis involves trying several modeling techniques on a given data set and evaluating how well they explain the past behavior of the time series variable. We can evaluate these techniques by constructing line graphs that show the actual data versus the values predicted by the various modeling techniques. More formal quantitative measures of the accuracy (or "goodness of fit") of time series modeling techniques also exist. Four common accuracy measures are the mean absolute deviation (MAD), the mean absolute percent error (MAPE), the mean square error (MSE), and the root mean square error (RMSE).

These quantities measure the differences between the actual values in the time series and the predicted, or fitted, values generated by the forecasting technique. The MSE and RMSE measures are closely related to the sum of square estimation errors criterion.

After collecting the data for a time series variable, the next step in building a time series model is to inspect the data plotted over time.

Moving Averages:

The moving average technique is probably the easiest extrapolation method for stationary data to use and understand. With this technique, the predicted value of the time series in period t+1 (denoted by Yt+1) is simply the average of the k previous observations in the series; that is:

Yt+1 = (Yt + Yt-1 + Yt-k+1)/k

Creating a Line Graph:
To create a scatter plot follow the below procedure:

Select the relevant data cells
Click the Insert menu
Click Chart
Click XY (Scatter)

Excel's Chart Wizard then prompts you to make a number of selections concerning how the chart should be labeled and formatted. After Excel creates a basic chart, you can customize it in many ways. Double-clicking a chart element displays a dialog box with options for modifying the appearance of the element.

The value k in above equation determines how many previous observations will be included in the moving average. No general method exists for determining what value of k will be best for a particular time series. Therefore, we must try several values of k to see which gives the best results.

We generate the moving average forecasts using the AVERAGE( ) function.

The moving average technique tends to average out the peaks and valleys occurring in the original data. Thus, the moving average technique is sometimes referred to as a smoothing method. The larger the value of k (or the more past data points are averaged together), the smoother the moving average prediction will be.
We can evaluate the relative accuracy of the two moving average forecasting functions by comparing the MSE values for these two techniques (2-Month Moving Avg and 4-Month Moving Avg).

Note that the SUMXMY2( ) function calculates the sum of squared differences between corresponding values in two different ranges. The COUNT( ) function returns the number of values in a range. The MSE value describes the overall fit of the forecasting technique to the historical data.

Because we want to forecast future observations, we might be interested in how well the forecasting function performed on the most recent data. We can determine this by calculating other MSE values using only the most recent data. Note, however, that there is no guarantee that the forecasting technique that has been most accurate recently will continue to be most accurate in the future.

Weighted Moving Averages:
One drawback of the moving average technique is that all the past data used in calculating the average are weighted equally. We can often obtain a more accurate forecast by assigning weights to the data. The weighted moving average technique is a simple variation on the moving average technique that allows for weights to be assigned to the data being averaged. In the weighted moving average technique, the forecasting function is representing by:

Yt+1 = w1Yt + w2Yt-1 + . . . + wkYt-k+1

where 0<=wi<=1 and Sum(wi)= 1

Although the weighted moving average offers greater flexibility than the moving average, it is also a bit more complicated. In addition to determining a value for k, we must also determine values for the weights wi in above equation. However, for a given value of k, we can use Solver to determine the values for wi that minimize the MSE.

Exponential Smoothing:
Exponential Smoothing is another averaging technique for stationary data that allows weights to be assigned to past data. Exponential smoothing models assume the following form: (a = alpha)

Yt+1 = Yt + a(Yt - yt)

The equation indicates that the predicted value for time period t+1 (Yt+1) is equal to the predicted value for the previous period (yt) plus an adjustment for the error made in predicting the previous period's value (a(Yt - yt)). The parameter a in above equation can assume any value between 0 and 1.

It can be shown that the exponential smoothing formula in above equation is equivalent to:

Yt+1 = aYt + a(1 - a)^2Yt-2 + ... + a(1 - a)^nYt-n + . . .

The forecast Yt+1 in exponential smoothing is a weighted combination of all previous values in the time series where the most recent observation Yt receives the heaviest weight (a), the next most recent observation Yt-1 receives the next heaviest weight (a(1 - a)), and so on. In an exponential smoothing model, small values of alpha tend to produce sluggish forecasts that do not react quickly to changes in the data. A value of alpha near 1 produces a forecast that reacts more quickly to changes in the data.
We can use Solver to determine the optimal value for alpha when building an exponential smoothing forecasting model for a particular data set. So when using exponential smoothing, the forecast for all future time periods equal the same value. This is consistent with the underlying idea of a stationary time series. If a time series is stationary (or has no trend), it is reasonable to assume that the forecast of the next time period and all future time periods should equal the same value.
We can use the Solver parameters and options to identify the value for alpha that minimizes the MSE. Again, this is a nonlinear optimization problem because the MSE represents a nonlinear objective function.

A Word of Caution About Forecasting:
Although we can use the moving average or exponential smoothing technique to forecast a value for any future time period, as the forecast horizon lengthens, our confidence in the accuracy of the forecast diminishes because there is no guarantee that the historical patterns on which the model is based will continue indefinitely into the future.

Seasonality:
Many times series variables exhibit seasonality, or a regular, repeating pattern in the data. For example, in time series data for monthly fuel oil sales, we would expect to see regular jumps in the data during the winter months each year. Similarly, monthly or quarterly sales data for suntan lotion would likely show consistent peaks during the summer and valleys during the winter.
Two different types of seasonal effects are common in time series data: additive effects and multiplicative effects. Additive seasonal effects tend to be on the same order of magnitude each time a given season is encountered. Multiplicative seasonal effects tend to have an increasing effect each time a given season is encountered.

Stationary data with additive seasonal effects:
The following model is useful for modeling stationary time series data with additive seasonal effects.

Yt+n = Et + St+n-p

Et = alpha(Yt - St-p) + (1 - alpha)Et-1

St = beta(Yt - Et) + (1 - beta)St-p
(0<=alpha<=1) and (0<=beta<=1)

In this model, Et represents the expected level of the time series in period t, and St represents the seasonal factor for period t. The constant p represents the number of seasonal periods in the data. Thus, for quarterly data p=4 and for monthly data p=12. We can use Solver parameters to determine the values of alpha and beta that minimize the MSE for the problem.

A note on initializing forecasting models:
It is important to note that the other methods can be used to initialize the base level (Et) and seasonality (St) values used in the previous model. For instance, we could have used Solver to determine optimal (minimum MSE) values for the level and seasonality parameters along with the smoothing constants alpha and beta. However, even if Solver is used to determine "optimal" initial values, there is no guarantee that the resulting forecasts will be any more accurate than if the initial values were determined using an alternative technique. When the data set being modeled is large, minor differences in the initial values are likely to have little impact on your forecasts. But as the size of the data set decreases, the impact of difference in the initial values becomes more pronounced.

Stationary data with multiplicative seasonal effects:
A slight modification to the previous model makes it appropriate for modeling stationary time series data with multiplicative seasonal effects. In particular, the forecasting function becomes:

Yt+n = Et x St+n-p
Et = alpha(Yt/St-p) + (1 - alpha)Et-1
St = beta(Yt/Et) + (1 - beta)St-p
alpha and beta between 0 and 1.

In this model, Et again represents the expected level of the time series in period t, and St represents the seasonal factor for period t. The constant p represents the number of seasonal periods in the data.

Trend Models:
Stationary time series data in which there is no significant trend in the data over time. However, it is not unusual for time series data to exhibit some type of upward or downward trend over time. Trend is the long-term sweep or general direction of movement in a time series. It reflects the net influence of long-term factors that affects the time series in a fairly consistent and gradual way over time. In other words, the trend reflects changes in the data that occur with the passage of time. Because the moving average, weighted moving average, and exponential smoothing techniques use some average of the previous values to forecast future values, they consistently underestimate the actual values if there is an upward trend in the data. For example, consider the time series data given by 2, 4, 6, 8, 10, 12, 14, 16, and 19. These data shows a clear upward trend leading us to expect that the next value in the time series should be 20. However, the forecasting techniques discussed up to this point would forecast that the next value in the series is less than or equal to 18 because no weighted average of the given data could exceed 18.

Double moving average:
As its name implies, the double moving average technique involves taking the average of averages. Let Mt be the moving average for the past k time periods (including t):

Mt = (Yt + Yt-1 + ... + Yt-k+1)/k

The double moving average Dt for the last k time periods (including period t) is the average of the moving averages:

Dt = (Mt + Mt-1 + ... + Mt-k+1)/k

The double moving average forecasting function is then given by:

Yt+n = Et + nTt
Et = 2Mt - Dt

Tt = 2(Mt - Dt)/(k-1)

Note that Et represents the estimated level of the time series at period t and Tt represents the estimated trend. Thus, at period t, the forecast n periods into the future would be Et + nTt as indicated in above equation.
=====================================================

Time Series Forecasting (Part II)

Double Exponential smoothing (Holt's Method):

Double exponential smoothing (also known as Holt's method) is often an effective forecasting tool for time series data that exhibit a linear trend. After observing the value of the time series at period t (Yt), Holt's method computes an estimate of the base, or expected, level of the time series (Et), and the expected rate of increase or decrease (trend) per period (Tt). The forecasting function in Holt's method is represented by:

Yt+n = Et + nTt (equation 1)

Et = alpha.Yt + (1 - alpha)(Et-1 + Tt-1) (equation 2)

Tt = beta.(Et - Et-1) + (1 - beta)Tt-1 (equation 3)

We can use the forecasting function in above equation to obtain forecasts n time periods into the future where n = 1, 2, 3, and so on. The forecast for time period t+n (or Yt+n) is the base level at time period t (given by Et) plus the expected influence of the trend during the next n time periods (given by nTt).

The smoothing parameters alpha and beta in above equation can assume any value between 0 and 1.

Although Holt's method might appear to be more complicated, it is a simple three-step process:

Compute the base level Et for time period t using equation 2
Compute the expected trend value Tt for time period t using equation 3
Compute the final forecast Yt+n for time period t+n using equation 1

Holt-Winter's method for additive seasonal effects:

In addition to having an upward or downward trend, non-stationary data may also exhibit seasonal effects. Here again, the seasonal effects may be additive or multiplicative in nature. Holt-Winter's method is another forecasting technique that we can apply to time series exhibiting trend and seasonality. To determine Holt-Winter's method for additive seasonal effects, let p represent the number of seasons in the time series (for quarterly data, p=4; for monthly data, p=12). The forecasting function is then given by:

Yt+n = Et + nTt + St+n-p

Et = alpha(Yt - St-p) + (1 - alpha)(Et-1 + Tt-1)

Tt = beta.(Et - Et-1) + (1 - beta)Tt-1

St = gama(Yt - Et) + (1 - gama) St-p

The expected base level of the time series in time period t (Et) is updated in second equation, which takes a weighted average of the following two values:

Et-1 + Tt-1, which represents the expected base level of the time series at time period t before observing the actual value at time period t (given by Yt)
Yt - St-p, which represents the deseasonalized estimate of the base level of the time series at time period t after observing Yt

The estimated seasonal adjustment factor for each time period is calculated using fourth equation, which takes a weighted average of the following two quantities:

St-p, which represents the most recent seasonal index for the season in which time period t occurs
Yt - Et, which represents an estimate of the seasonality associated with time period t after observing Yt

Holt-Winter's method is basically a four-step process:

Compute the base level Et for time period t using second equation
Compute the estimated trend value Tt for time period t using third equation
Compute the estimated seasonal factor St for time period t using fourth equation
Compute the final forecast Yt+n for time period t+n using first equation

Holt-Winter's method for multiplicative seasonal effects:

Seasonal effects in the data may be becoming more pronounced over time. As a result, it may be more appropriate to model this data with the Holt-Winter's method for multiplicative seasonal effects. Fortunately, this technique is very similar to Holt-Winter's method for additive seasonal effects.

To determine Holt-Winter's method for multiplicative seasonal effects, we again let p represent the number of seasons in the time series (for quarterly data, p=4; for monthly data, p=12). The forecasting function is then given by:

Yt+n = (Et + nTt)St+n-p

Et = alpha(Yt/St-p) + (1 - alpha)(Et-1 + Tt-1)

Tt = beta.(Et - Et-1) + (1 - beta)Tt-1

St = gama(Yt/Et) + (1 - gama) St-p

The expected base level of the time series in time period t (Et) is updated in second equation, which takes a weighted average of the following two values:

(Et-1 + Tt-1), which represents the expected base level of the time series at time period t before observing the actual value at time period t (given by Yt)
(Yt/St-p), which represents the deseasonalized estimate of the base level of the time series at time period t after observing Yt

The estimated seasonal adjustment factor for each time period is calculated using fourth equation:

St-p, which represents the most recent seasonal index for the season in which time period t occurs
(Yt/Et), which represents an estimate of the seasonality associated with time period t after observing Yt

Modeling time series trends using regression:
We can build a regression model of a time series if data are available for one or more independent variables that account for the systematic movements in the time series. However, even if no independent variables have a causal relationship with the time series, some independent variables might have a predictive relationship with the time series. A predictor variable does not have a cause-and-effect relationship with the time series. Yet the behavior of a predictor variable might be correlated with that of the time series in a way that helps us forecast future values of the time series.
Trend is the long-term sweep or general direction of movement in a time series that reflects changes in the data over time. The mere passage of time does not cause the trend in the time series. But like the consistent passage of time, the trend of a time series reflects the steady upward or downward movement in the general direction of the series. Thus, time itself might represent a predictor variable that could be useful in accounting for the trend in a time series.

Linear trend model:
To see how we might use time as an independent variable, consider the following linear regression model:

Yt = B0 + B1X1t + et (equation 1)

where X1t = t. That is, the independent variable X1i represents the time period t (X11 = 1, X12 = 2, X13 = 3, and so on).
The regression model in above equation assumes that the systematic variation in the time series (Yt) can be described by the regression function B0 + B1X1t (which is a linear function of time). The error term et represents the unsystematic, or random, variation in the time series not accounted for by our model.
Thus, if we use ordinary least squares to estimate the parameters in above equation, our best estimate of Yt for any time period t is:

Yt = b0 + b1X1t

The above equation represents the equation of the line passing through the time series that minimizes the sum of squared differences between the actual values (Yt) and the estimated values (Yt). We might interpret this line to represent the linear trend in the data.

As the forecast horizon lengthens, our confidence in the accuracy of the forecasts diminishes because there is no guarantee that the historical trends on which the model is based will continue indefinitely into the future.

A note on the TREND( ) Function:
The TREND( ) function can be used to calculate the estimated values for linear regression models. The format of the TREND( ) function is as follows:

TREND(Y-range, X-range, X-value for prediction)

where Y-range is the in the spreadsheet containing the dependent Y variable, X-range is the range in the spreadsheet containing the independent X variable(s), and X-value for prediction is a cell (or cells) containing the values for the independent X variable(s) for which we want an estimated value of Y. The TREND( ) function has an advantage over the regression tool in that it is dynamically updated whenever any inputs to the function change. However, it does not provide the statistical information provided by the regression tool. It is best to use these two different approaches to doing regression in conjunction with one another.

Quadratic Trend Model:
Although the graph of the estimated linear regression function accounts for the upward trend in the data, the actual values do not appear to be scattered randomly around the trend line, as was assumed by our regression model in equation 1. An observation is more likely to be substantially below the line or only slightly above the line. This suggests that the linear trend model might not be appropriate for this kind of data.
As an alternative, we might try fitting a curved trend line to the data using the following quadratic model:

Yt = B0 + B1X1t + B2X2t + et

where X1t = t and X2t = t^2. The resulting estimated regression function for this model is:

Yt = b0 + b1X1t + b2 X2t

To estimate the quadratic trend function, we must add a column to the spreadsheet to represents the additional independent variable X2t = t^2.
As the forecast horizon lengthens, our confidence in the accuracy of the forecasts diminishes because there is no guarantee that the historical trends on which the model is based will continue indefinitely into the future.

Modeling seasonality with regression models:
The goal of any forecasting procedure is to develop a model that accounts for as much of the systematic variation in the past behavior of a time series as possible. The assumption is that a model that accurately explains what happened in the past will be useful in predicting what will happen in the future. Forecasts of future values for this time series would be more accurate if they reflected these systematic seasonal effects.

Adjusting trend predictions with seasonal indices:
A simple and effective way of modeling multiplicative seasonal effects in a time series is to develop seasonal indices that reflect the average percentage by which observations in each season differ projected trend values.
Thus, if we can determine seasonal indices representing the average amount by which the observations in a given quarter fall above or below the trend line, we could multiply our trend projections by these amounts and increase the accuracy of our forecasts.

Computing seasonal indices:
The goal in developing seasonal indices is to determine the average percentages by which observations in each season differ from the values projected for them using the trend model.

Forecasting with seasonal indices:
We can use the seasonal indices to adjust trend projections of future time periods for the expected effects of seasonality.

Summary of the calculation and use of seasonal indices:

Create a trend model and calculate the estimated value (yt) for each observation in the sample
For each observation, calculate the ratio of the actual value to the predicted trend value: Yt/yt. (For additive seasonal effects, compute the difference: Yt - yt.)
For each season, compute the average of the values calculated in step 2. These are the seasonal indices
Multiply any forecast produced by the trend model by the appropriate seasonal index calculated in step 3. (For additive seasonal effects, add the appropriate seasonal index to the forecast.)

Seasonal regression models:
An indicator variable is a binary variable that assumes a value of 0 or 1 to indicate whether or not a certain condition is true. To model additive seasonal effects in a time series, we might set up several indicator variables to indicate which season each observation represents. In general, if there are p seasons, we need p-1 indicator variables in our model. For example, data are collected on a quarterly basis. Because there are four seasons to model (p=4), we need three indicator variables, which we define as:

X3t = 1, if Yt is from quarter 1 - 0, otherwise

X4t = 1, if Yt is from quarter 2 - 0, otherwise

X5t = 1, if Yt is from quarter 3 - 0, otherwise

Notice that the definitions of X3t, X4t, and X5t assign a unique coding for the variables to each quarter in our data. These codings are summarized as:

Quarter 1: (X3t = 1, X4t = 0, X5t = 0)
Quarter 2: (X3t = 0, X4t = 1, X5t = 0)
Quarter 3: (X3t = 0, X4t = 0, X5t = 1)
Quarter 4: (X3t = 0, X4t = 0, X5t = 0)

Together, the values of X3t, X4t, and X5t indicates in which quarter observation Yt occurs.

The seasonal model:

We might expect that the following regression function would be appropriate for the time series data:

Yt = B0 + B1X1t + B2X2t + B3X3t + B4X4t + B5X5t + et

where X1t = t and X2t = t^2. This regression model combines the variables that account for a quadratic trend in the data with additional indicator variables discussed earlier to account for any additive systematic seasonal differences.

Crystal Ball predictor:
Given the complexity of some of the forecasting techniques described earlier, you have probably thought it would be nice if there was an Excel add-in to assist in carrying out some of these time series techniques. Fortunately, there is an Excel add-in called Crystal Ball Predictor (or CB Predictor) that automatically implements many of the techniques described earlier. CB Predictor is part of the Crystal Ball 2000 suite of Excel add-ins. Crystal Ball 2000 is a suite of Excel add-ins created and distributed by Decisioneering, Inc.

Using CB Predictor:
When Crystal Ball 2000 is installed and loaded, it inserts several new menu options in Excel. CB Predictor can be launched from the new CBTools menu option. CB Predictor provides a very easy-to-use interface that walks you through the process of finding an appropriate time series model for a given set of data.
The first dialog card, prompts you to enter the range in the spreadsheet containing the time series data you want to analyze. The next dialog card in CB Predictor allows you to indicate the type of data you have (e.g., hourly, daily, weekly, monthly, quarterly) and type of seasonality that might be present in the data. Note that if you select the "no seasonality" option, CB Predictor will not attempt to use any of the seasonal techniques to model the data.
The next dialog card in CB Predictor presents the "Method Gallery." This dialog card shows examples of the various time series techniques available in CB Predictor and allows you to select the ones you want to try. Descriptions of each method can be displayed by double-clicking on any of the example graphs on this card.
If you click the Preview button, CB Predictor fits all selected techniques to your data and determines which technique is best. (By default, CB Predictor uses the RMSE statistic to determine which technique is best.
It then displays the graph showing the actual data vs. fitted values for the best technique. You can also easily show the results for any of the other techniques simply by changing the selection in the "Method" drop-down list box. Finally, the Results dialog card allows you to indicate the number of periods into the future that you want to forecast. Because any forecast is uncertain, CB Predictor also gives you the option to select confidence intervals for the predictions to create a lower and upper bounds on the forecasts likely accuracy in the future. Various options are also given to indicate where the forecast should be written in your worksheet. CB Predictor will also produce various reports and charts summarizing the results of the analysis. Clicking the Run button causes CB Predictor to carry out the indicated actions.
The Methods table provides a summary of the relative performance of the difference time series technique on the given data set.
The Durbin-Watson (DW) statistic describes the amount of auto-correlation in a time series. Auto-correlation refers to the degree to which a previous data point in a time series is correlated to the next data point. The DW statistic ranges in value from 0 to 4. A DW values less than 1 indicates positive auto-correlation, whereas DW values greater than 3 indicate negative auto-correlation. DW values near 2 indicate no significant auto-correlation in the data.

Thiel's U statistic indicates how well each forecasting method does compared with a naive forecast (such as using the last period's value as the forecast of the next period). Thiel's U will equal 1 if a forecasting technique is essentially no better than using a naive forecast. Thiel's U values less than 1 indicate a technique is better than using a naive forecast. Values greater than 1 indicate a technique is actually worse than using a naive forecast.

Combining Forecasts:
Given the number and variety of forecasting techniques available, it can be a challenge to settle on a single method to use in predicting future values of a time series variable. Indeed, the state-of-the-art research in time series forecasting suggests that we should not use a single forecasting method. Rather, we can obtain more accurate forecasts by combining the forecasts from several methods into a composite forecast.
For example, suppose that we used three methods to build forecasting models of the same time series variable. We denote the predicted value for time period t using each of these methods as F1t, F2t, and F3t, respectively. One simple approach to combining these forecasts into a composite forecast Yt might involve taking a linear combination of the individual forecasts as:

Yt = b0 + b1F1t + b2F2t + b3F3t

We could determine the values for the bi using Solver or least squares regression to minimize the MSE between the combined forecast Yt and the actual data.
The combined forecast Yt in above equation will be at least as accurate as any of the individual forecasting techniques. To see this, suppose that F1t is the most accurate of the individual forecasting techniques. If b1 = 1 and b0 = b2 = b3 = 0, then our combined forecast would be Yt = F1t. Thus, b0, b2, and b3 would be assigned nonzero values only if this helps to reduce the MSE and produce more accurate predictions.

Adding independent variables to a regression model can never decrease the value of the R^2 statistic. Therefore, it is important to ensure that each independent variable in a multiple regression model accounts for a significant portion of the variation in the dependent variable and does not simply inflate the value of R^2. Similarly, combining forecasts can never increase the value of the MSE. Thus, when combining forecasts, we must ensure that each forecasting technique plays a significant role in accounting for the behavior of the dependent time series variable. The adjusted-R^2 statistic can also be applied to the problem of selecting forecasting techniques to combine in time series analysis.

Summary:
Because time series vary in nature (for example, with and without trend, with and without seasonality), it is helpful to be aware of the different forecasting techniques and the types of problems for which they are intended. There are many other time series modeling techniques besides those discussed earlier. In modeling time series data, it is often useful to try several techniques and then compare them based on measures of forecast accuracy, including a graphical inspection of how well the model fits the historical data. Crystal Ball Predictor is an Excel add-in for time series forecasting that makes it very easy to apply a number of time series techniques to a data set and compare their accuracy. If no one procedure is clearly better than the others, it might be wise to combine the forecasts from the different procedures using a weighted average or some other method.
====================================================

Introduction to Simulation Using Crystal Ball (Part I):
The calculations in a spreadsheet can be viewed as a mathematical model that defines a functional relationship between various input variables (or independent variables) and one or more bottom-line performance measures (or dependent variables). The following equation express this relationship:

Y = f(X1, X2, X3, . . ., Xk)

In many spreadsheets, the values of various input cells are determined by the person using the spreadsheet. These inputs cells correspond to the independent variables X1, X2, . . ., Xk in the previous equation. Various formulas (represented by f( ) above) are entered in other cells of the spreadsheet to transform the values of the input cells into some bottom-line output (denoted by Y above).

Simulation is a technique that is helpful in analyzing models in which the value to be assumed by one or more independent variables is uncertain.

Random Variables an Risk:

In order to compute a value for the bottom-line performance measure of a spreadsheet model, each input cell must be assigned a specific value so that all the related calculations can be performed. However, some uncertainty often exists regarding the value that should be assumed by one or more independent variables (or input cells) in the spreadsheet. This is particularly true in spreadsheet models that represent future conditions. A random variables is any variable whose value cannot be predicted or set with certainty. Thus, many input variables in a spreadsheet model represent random variables whose actual values cannot be predicted with certainty.

For example, projections of the cost of raw materials, future interest rates, future numbers of employees, and expected product demand are random variables because their true values are unknown and will be determined in the future.

If we cannot say with certainty what value one or more input variables in a model will assume, we also cannot say with certainty what value the dependent variable will assume. This uncertainty associated with the value of the dependent variable introduces an element of risk to the decision-making problem.

When such a decision is made, some chance exists that the decision will not produce the intended results. This chance, or uncertainty, represents an element of risk in the decision-making problem.
The term "risk" also implies the potential for loss. The fact that a decision's outcome is uncertain does not mean that the decision is particularly risky.

For example, whenever, we put money into a soft drink machine, there is a chance that the machine will take our money and not deliver the product. However, most of us would not consider this risk to be particularly great. From past experiance, we know that the chance of not receiving the product is small. But even if the machine takes our money and does not deliver the product, most of us would not consider this to be a tremendous loss. Thus, the amount of risk involved in a given decision-making situation is a function of the uncertainty in the outcome of the decision and the magnitude of the potential loss. A proper assessment of the risk present in a decision-making situation should address both of these issues.

Why analyze Risk?

Many spreadsheets built by business people contain estimated values for the uncertain inpuut variables in their models. If a manager cannot say with certainty what value a particular cell in a spreadsheet will assume, this cell most likely represents a random variable. Ordinarily, the manager will attempt to make an informed guess about the values such cells will assume. The manager hopes that inserting the expected, or most likely, values for all the uncertain cells in a spreadsheet will provide the most likely value for the cell containing the bottom-line performance measure (Y). The bottom with this type of analysis is that it tells the decision maker nothing about the variability of the performance measure.

For example, in analyzing a particular investment opportunity, we might determine that the expected return on a $1,000 investment is $10,000 within two years. But how much variability exists in the possible outcomes? If all the potential outcomes are scattered closely around $10,000 (say from $9,000 to $11,000), then the investment opportunity might still be attractive. If, on the other hand, the potential outcomes are scattered widely around (say from -$30,000 up to +$50,000), then the investment opportunity might be unattractive. Although these two scenarios might have the same expected or average value, the risks involved are quite different. Thus, even if we can determine the expected outcome of a decision using a spreadsheet, it is just as important, if not more so, to consider the risk involved in the decision.

Methods of Risk Analysis:

Several techniques are available to help managers analyze risk. Three of the most common are:

Best-case/worst-case analysis
What-if analysis
Simulation

Of these methods, simulation is the most powerful. Although the other techniques might not be completely effective in risk analysis, they are probably used more often than simulation by most managers in business today. This is largely due to the fact that most managers are unaware of the spreadsheet's ability to perform simulation and of the benefits provided by this technique.

Best-Case/Worst-Case Analysis:

If we don't know what value a particular cell in a spreadsheet will assume, we could enter a number that we think is the most likely value for the uncertain cell. If we enter such numbers for all the uncertain cells in the spreadsheet, we can easily calculate the most likely value of the bottom-line performance measure. This is also called the base-case scenario. However, this scenario gives us no information about how far away the actual outcome might be from this expected, or most likely, value.

One simple solution to this problem is to calculate the value of the bottom-line performance measure using the best-case, or most optimistic, and worst-case, or most pessimistic, values for the uncertain input cells.
These additional scenarios show the range of possible values that might be assumed by the bottom-line performance measure. As indicated in the earlier example about the $1,000 investment, knowing the range of possible outcomes is very helpful in assessing the risk involved in different alternatives. However, simply knowing the best-case and worst-case outcomes tells us nothing about the distribution of possible values within this range, nor does it tell us the probability of either scenario occurring.

The appeal of best-case/worst-case analysis is that it is easy to do. Its weakness is that it tells us nothing about the shape of the distribution associated with the bottom-line performance measure. Knowing the shape of the distribution of the bottom-line performance measure can be critically important in helping us answer a number of managerial questions.

What-If Analysis:

Prior to the introduction of electronic spreadsheets in the early 1980s, the use of best-case/worst-case analysis was often the only feasible way for a manager to analyze the risk associated with a decision. This process was extremely time-consuming, error prone, and tedious, using only a piece of paper, pencil, and calculator to recalculate the performance measure of a model using different values for the uncertain inputs. The arrivl of personal computers and electronic spreadsheets made it much easier for a manager to play out a large number of scenarios in addition to the best and worst cases - which is the essence of what0if analysis.

In what-if analysis, a manager changes the values of the uncertain input variables to see what happens to the bottom-line performance measure. By making a series of such changes, a manager can gain some insight into how sensitive the performance measure is to changes to the input variables. Although many managers perform this type of manual what-if analysis, it has three major flows.

First, if the values selected for the independent variables are based only on the manager's judgement, the resulting sample values of the performance measure are likely to be biased.

To select values for the uncertain variables that correctly reflect their random variations, the values must be randomly selected from a distribution, or pool, of values that reflects the appropriate range of possible values, as well as the appropriate relative frequencies of these variables.

Second, hundreds or thousands of what-if scenarios might be required to create a valid representation of the underlying variability in the bottom-line performance measure. No one would want to perform these scenarios manually nor would anyone be able to make sense of the resulting stream of numbers that would flash on the screen.

The third problem with what-if analysis is that the insight the manager might gain from playing out various scenarios is of little value when recommending a decision to top management. What-if analysis simply does not supply the manager with the tangible evidence (facts and figures) needed to justify why a given decision was made or recommended. Additionally, what-if analysis does not address the problem identified in our earlier discussion of best-case/worst-case analysis - it does not allow us to estimate the distribution of the performance measure in a formal enough manner. Thus, what-if analysis is a step in the right direction, but it is not quite a large enough step to allow managers to analyze risk effectively in the decisions they face.

Simulation:

Simulation is a technique that measures and describes various characteristics of the bottom-line performance measure of a model when one or more values for the independent variables are uncertain. If any independent variables in a model are random variables, the dependent variable (Y) also represents a random variable. The objective in simulation is to describe the distribution and characteristics of the possible values of the bottom-line performance measure Y, given the possible values and behavior of the independent variables X1, X2, . . ., Xk.

The idea behind simulation is similar to the notion of playing out many what-if scenarios. The difference is that the process of assigning values to the cells in the spreadsheet that represent random variables is automated so that: (1) the values are assigned in a non-biased way, and (2) the spreadsheet user is relieved of the burden of determining these values. With simulation, we repeatedly and randomly generate sample values for each uncertain input variable (X1, X2, . . ., Xk) in our model and then compute the resulting value of our bottom-line performance measure (Y). We can then use the sample values of Y to estimate the true distribution and other characteristics of the performance measure Y.

For example, we can use the sample observations to construct a frequency distribution of the performance measure, to estimate the range of values over which the performance measure might vary, to estimate its mean and variance, and to estimate the probability that the actual value of the performance measure will be greater than (or less than) a particular value. All these measures provide greater insight into the risk associated with a given decision than a single value calculated based on the expected values for the uncertain independent variables.

On Uncertainty and Decision-Making:
"Uncertainty is the most difficult thing about decision-making. In the face of uncertainty, some people react with paralysis, or they do exhaustive research to avoid making a decision. The best decision-making happens when the mental environment is focused. In a physical environment, you focus on something physical. In tennis, that might be the spinning seams of the ball. In a mental environment, you focus on the facts at hand. That fined-tuned focus doesn't leave room for fears and doubts to enter. Doubts knock at the door of our consciousness, but you don't have to have them in for tea and crumpets." - Timothy Gallwey, author of The Inner Game of Tennis and The Inner Game of Work.

Spreadsheet simulation using Crystal Ball:

To perform simulation in a spreadsheet, we must first place a random number generator (RNG) formula in each cell that represents a random, or uncertain, independent variable. Each RNG provides a sample observation from an appropriate distribution that represents the range and frequency of possible values for the variable. Once the RNGs are in place, new sample values are provided automatically each time the spreadsheet is recalculated. We can recalculate the spreadsheet n times, where n is the desired number of replications or scenarios, and the value of the bottom-line performance measure will be stored after each replication. We can analyze these stored observations to gain insights into the behavior and characteristics of the performance measure.

In particular, the spreadsheet add-in package Crystal Ball is designed specifically to make spreadsheet simulation a simple process. The Crystal Ball software provides the following capabilities, which are not otherwise available while working in Excel: additional functions that are helpful in generating the random numbers needed in simulation; additional commands that are helpful in setting up and running the simulation; and graphical and statistical summaries of the simulation data.

Starting Crystal Ball:

You can load Crystal Ball from within Excel as follows:

Click Tools
Click Add-Ins
Click the Crystal Ball option
VClick OK

This loads the memory resident portion of Crystal Ball and causes the Cell, Run, and CBTools menus to appear on the main menu bar. You can display the custom toolbar in Excel as follows:

Click View
Click Toolbars
Check the Crystal Ball option
Click OK

Random Number Generators:

The first step in spreadsheet simulation is to place an RNG formula in each cell that contains an uncertain value. Each of these formulas will generate (or return) a number that represents a randomly selected value from a distribution, or pool, of values. The distributions that these samples are taken from should be representative of the underlying pool of values expected to occur in each uncertain cell.

Crystal Ball provides several functions that can be used to create the RNGs required for simulating a model. For example, if we think that the behavior of an uncertain cell could be modeled as a normally distributed random variable with a mean of 125 and standard deviation of 10, then we could enter the formula = CB.Normal(125,10) in this cell.

Similarly, a cell in our spreadsheet might have a 30% chance of assuming the value 10, a 50% chance of assuming the value 20, and a 20% chance of assuming the value 30. To model the behavior of this random variable using the CB.Custom. The arguments, or parameters, required by the RNG functions allow us to generate random numbers from distributions with a wide variety of shapes.

Software Note:

A listing of all the available Crystal Ball RNG functions is available in Excel. To view this listing:

Click Insert, Function
Select the Crystal Ball function category

Discrete vs. Continuous Random Variables:

An important distinction exists between the discrete RNG and continuous RNG. In particular, the RNGs generate discrete outcomes and also continuous outcomes. That is, some of the RNGs can return only a distinct set of individual values, whereas the other RNGs can return any value from an infinite set of values. The distinction between discrete and continuous random variables is very important.

For example, the number of defective tires on a new car is a discrete random variable because it can assume only one of five distinct values: 0, 1, 2, 3, or 4. On the other hand, the amount of fuel in a new car is a continuous random variable because it can assume any value between 0 and the maximum capacity of the fuel tank. Thus, when selecting an RNG for an uncertain variable in a model, it is important to consider whether the variable can assume discrete or continuous values.
=====================================================

Introduction to Simulation Using Crystal Ball (Part II):
Preparing the model for simulation:

We must first select appropriate RNGs for the uncertain variables in the model. If available, historical data on the uncertain variables could be analyzed to determine appropriate RNGs for these variables. (Crystal Ball's Batch Fit tool can be used for this purpose.) If past data are not available, or if we have some reason to expect the future behavior of a variable to be significantly different from the past, then we must use judgement in selecting appropriate RNGs to model the random behavior of the uncertain variables.

After entering the appropriate RNGs, each time we press the recalculate key (the function key [F9]), the RNGs automatically select new values for all the cells in the spreadsheet that represent uncertain (or random) variables. Similarly, with each recalculation, a new value for the bottom-line performance measure appears. By default, Crystal Ball displays the mean (or average) value of the distribution for cells containing RNG functions. To make Crystal Ball display new values for your RNG cells whenever you manually recalculate the spreadsheet, click Cell, Cell Preferences, and deselect the check box labeled "Set to Distribution Mean."

Alternate RNG Entry:

Crystal Ball also offers an alternate way of entering RNGs in spreadsheet models. If you select a worksheet cell that contains a value and then click the Define Assumption icon on the Crystal Ball toolbar (or click Cells, Define Assumption), the Distribution Gallery appears. After selecting one of the distributions, another dialog box appears prompting you to enter various parameters for the distribution you selected. Although this feature can be helpful in visualizing distributions, RNGs entered in this manner are stored internally in Crystal Ball rather than as functions in the individual cells. You should be aware that certain capabilities of Crystal Ball (e.g., the Sensitivity Chart and Correlation Matrix tool) require assumption cells to be defined using the Distribution Gallery.

Running the Simulation:

The next step in performing the simulation involves recalculating the spreadsheet several hundred or several thousand times and recording the resulting values generated for the output cell, or bottom-line performance measure. Fortunately, Crystal Ball can do this for us if we indicate how many times we want it to recalculate (or replicate) the model and which cell(s) in the spreadsheet we want to track. We can use the Define Forecast button on the Crystal Ball toolbar to indicate the output cell (or cells) that we want Crystal Ball to track during the simulation. In some simulations, we might want to analyze several performance measures. In such a case, we can follow the preceding procedure repeatedly to select additional output cells to track.

Software Note:

When you define assumption, decision, or forecast cells, Crystal Ball automatically changes the background color of the cells to distinguish these cells from others on the spreadsheet. You can change the default colors Crystal Ball uses (or suppress this behavior entirely) by clicking Cell followed by Cell Preferences.

Determining the Sample Size:

We cannot analyze all of the infinite possibilities. But by taking a large enough sample from this infinite population, we can make reasonably accurate estimates about the characteristics of the underlying infinite population of values. The larger the sample we take (that is, the more replications we do), the more accurate our final results will be. But, performing many replications takes time, so we must make a trade-off in terms of estimation accuracy versus convenience. Thus, there is no simple answer to the question of how many replications to perform, but, as a bare minimum you should always perform at least 100 replications, and more as time permits or accuracy demands.

Running the Simulation:

We now need to instruct Crystal Ball to perform the simulation by clicking the Start icon on the Crystal Ball toolbar (or by clicking Run, Run). If you select the "When Stopped" option, Crystal Ball will not update the screen after each replication and will perform the replication more quickly.

Data Analysis:

The objective of performing simulation is to estimate various characteristics of the performance measure resulting from uncertainty in some or all of the input variables. After performing the replications, Crystal Ball summarizes the output data. The summary statistics can be shown by clicking View, Statistics.

The best case and the worst case:

Decision makers usually want to know the best-case and worst-case scenarios to get an idea of the range of possible outcomes they might face. The information is available from the simulation results by the Range Minimum and Range Maximum values.

Viewing the distribution of the output cells:

The best- and worst-case scenarios are the most extreme outcomes, and might not be likely to occur. To determine the likelihood of these outcomes requires that we know somrthing about the shape of the distribution of our bottom-line perfromance measure. To view a graph of the distribution of the results of the 500 replications generated for the output cell in the model, click View, Frequency Chart.

Incorporating graphs and statistics into a spreadsheet:

At times, you will want to save some of the graphs or statistics created by Crystal Ball. You can do so by selecting the Create Report option from the Run menu. This option displays the dialog box (Create Report). This dialog box provides options for saving different results to Excel. Crystal Ball automatically saves all the selected graphs and statistics in a worksheet in a new workbook.
Similarly, you might want to save the actual replication data generated during the simulation to a separate file for further analysis. You can do so by selecting the Extract Data option from the Run menu. This option displays Extract Data dialog box. Here again, Crystal Ball automatically saves all the selected data items in a worksheet in a new workbook.

The Uncertainty of sampling:

Unfortunately, we do not have the time or computer resources to determine the true characteristics (or parameters) of the population. The best we can do is take a sample from the population and, based on the sample, make estimates about the true characteristics of the underlying population. Our estimates will differ depending on the sample we choose and the size of the sample. So, the mean of the sample we take is probably not equal to the true mean we would observe if we could analyze the entire population of values for the performance measure.

There is some element of uncertainty surrounding the statistical estimates resulting from simulation because we are using a sample to make inferences about the population. Fortunately, there are ways of measuring and describing the amount of uncertainty present in some of the estimates we make about the population under study. This is typically done by constructing confidence intervals for the population parameters being estimated.

The benefits of simulation:

The goal of modeling is to give us greater insight into a problem to help us make more informed decisions. The results of our simulation analysis do give us greater insight into the example problem. In particular, we now have some idea of the best- and worst-case total outcomes. We have a better idea of the distribution and variability of the possible outcomes, and a more precise idea about where the mean of the distribution is located. We also now have a way of determining how likely it is for the actual outcome to fall above or below some value. Thus, in addition to our greater insight and understanding of the problem, we also have solid empirical evidence (the facts and figures) to support our recommendations.
The behavior of a performance measure gives a manager a useful tool in determining the optimal value for one or more controllable parameters in a decision problem.

Applying simulation in personal financial planning:
A recent article in the Wall Street Journal highlighted the importance of simulation in evaluating the risk in personal financial investments. In the face of widely divergent possibilities, the number crunching involved in simulation can bring some peace of mind. Steven Haas of Randolph, N.J., was not sure whether he had saved enough to accept an early retirement package from AT@T at age 53. After his financial advisor used simulation to determine he had a 95% probability that his money would last until he reached age 110, Mr. Haas took the package and retired. Mr. Hass reported that by using simulation, "I found that even under significantly negative scenarios we could live comfortably. It relieved a lot of anxiety."

Reservation Management Problem:

Businesses that allow customers to make reservations for services (such as airlines, hotels, and car rental companies) know that some percentage of the reservations made will not be used for one reason or another, leaving these companies with a difficult decision problem. If they accept reservations for only the number of customers that can actually be served, then a portion of the company's assets will be underutilized when some customers with reservations fail to arrive. On the other hand, if they overbook (or accept more reservations than can be handled), then at time, more customers will arrive than can actually be served. This typically results in additional financial costs to the company and often generates ill-will among those customers who cannot be served. Simulation might be used to help a company determine the optimal number of reservations to accept.

Inventory Control Example:

According to the Wall Street Journal, U.S. businesses recently had a combined inventory worth $884.77 billion dollars. Because so much money is tied up in inventories, businesses face many important decisions regarding the management of these assets. Frequently asked questions regarding inventory include:

What's the best level of inventory for a business to maintain?
When should goods be reordered (or manufactured)?
How much safety stock should be held in inventory?

The study of inventory control principles is split into two distinct areas - one assumes that demand is known (or deterministic), and the other assumes that demand is random (or stochastic). If demand is known, various formulas can be derived that provide answers to the previous questions (EOQ model). However, when demand for a product is uncertain or random, answers to the previous questions cannot be expressed in terms of a simple formula. In these situations, the technique of simulation proves to be a useful tool.

Project Selection Example:

Solver can be used in project selection problems in which the payoff for each project is assumed to be known with certainty. In many cases, a great deal of uncertainty exists with respect to the ultimate payoff that will be received if a particular project is undertaken. In these situations, Crystal Ball and OptQuest offer an alternate means for deciding which project(s) to undertake.

Summary:

Many of the input cells in a spreadsheet represent random variables whose values cannot be determined with certainty. Any uncertainty in the input cells flows through the spreadsheet model to create a related uncertainty in the value of the output cell(s). Decisions made on the basis of these uncertain values involve some degree of risk.

Various methods of risk analysis are available, including best-case/worst-case analysis, what-if analysis, and simulation. Of these three methods, simulation is the only technique that provides hard evidence (facts and figures) that can be used objectively in making decisions. To simulate a model, RNGs are used to select representative values for each uncertain independent variable in the model. This process is repeated over and over to generate a sample of representative values for the dependent variable(s) in the model. The variability and distribution of the sample values for the dependent variable(s) can then be analyzed to gain insight into the possible outcomes that might occur. OptQuest also can be used in determining the optimal value of controllable parameters or decision variables in simulation models.
====================================================

Queuing Theory:
Sometimes, it seems as if we spend most of our lives waiting in lines. We wait in lines at grocery stores, banks, airports, hotels, restaurants, theaters, theme parks, post offices, and traffic lights. At home, we are likely to spend time waiting in an "electronic line" if we use the telephone to order merchandise from mail-order firms, or to call the customer service number of most computer hardware or software companies.

Some reports indicate that Americans spend 37 billions hours a year waiting in lines. Much of this time represents a loss of a limited resource (time) that can never be recovered. Add the frustration and irritation many people experience while waiting in lines and it is easy to see why businesses should be interested in reducing or eliminating the amount of time their customers spend waiting in lines.

Waiting lines do not always contain people. In a manufacturing company, for example, subassemblies often wait in a line at machining centers to have the next operation performed on them. At a video rental store, returned videos often wait to be placed on shelves so they can be rented again. Electronic messages on the Internet sometimes wait at intermediate computing centers before they are sent to their final destinations. Costs could be reduced, or customer service improved, by reducing the amount of time that the subassemblies, videos, or electronic messages spend waiting in line.

In management science terminology, the term queuing theory represents the body of knowledge dealing with waiting lines. Queuing theory was conceived in the early 1900s when a Danish telephone engineer named A. K. Erlang began studying the congestion and waiting times occurring in the completion of telephone calls. Since then, a number of quantitative models have been developed to help business people understand waiting lines and make better decisions about how to manage them.

The purpose of Queuing Models:
Most queuing problems focus on determining the level of service that a company should provide. For example, grocery stores must determine how many cash registers to operate at a given time of day so that customers do not have to wait too long to check out. Banks must determine how many tellers to schedule at various times of day to maintain an acceptable level of service. Companies that lease copying machines must determine the number of technicians to employ so that repairs can be made in a timely manner.

In many queuing problems, management has some control over the level of service provided. In the examples just mentioned, customer waiting times could be kept to a minimum by employing a large number of servers (in the form of cashiers, tellers, and technicians). However, this can be expensive, or actually wasteful, if an excessive number of idle servers is maintained. On the other hand, employing a small number of servers keeps the cost of providing service low, but is likely to result in longer customer waiting times and greater customer dissatisfaction. Thus, a trade-off exists between the cost of providing service and the cost of having dissatisfied customers if service is lacking.

As service levels increase, the cost of providing service also increases, but the cost of customer dissatisfaction decreases (as does the length of time customers must wait for service). As service levels decreases, the cost of providing service also decreases, but the cost of customer dissatisfaction increases. The objective in many queuing problems is to find the optimal service level that achieves an acceptable ba;ance between the cost of providing service and customer satisfaction.

Queuing system Configurations:

The queuing systems we encounter in everyday life are configured in a variety of ways. Typically, there are three configurations.

The first configuration represents a single-queue, single-server system. In this configuration, customers enter the system and wait in line on a first-in, first-out (FIFO) basis until they receive service; then they exit the system. This type of queuing system is employed at most Wendy's and Taco Bell restaurants. You might also encounter this type of queuing system at some automatic teller machines (ATMs).

The second configuration represents a single-queue, multi-server system. Here again, customers enter the system and join a FIFO queue. Upon reaching the front of the line, a customer is serviced by the next available server. There could be more or fewer servers depending on the problem at hand. This type of queuing system is found at most airport check-in counters, post offices, and banks.

The third configuration represents a collection of single-queue, single-server systems. In this type of arrangement, when customers arrive, they must choose one of the queues and then wait in that line to receive service. This type of system is found at most grocery stores and most Burger King and McDonald's restaurants. The results presented for the first type of configuration can sometimes be generalized to analyze the third configuration also.

Characteristics of Queuing systems:

To create and analyze mathematical models of the queuing configurations we must make some assumptions about the way in which customers arrive to the system and the amount of time it takes for them to receive service.

Arrival Rate:
In most queuing systems, customers (or jobs in a manufacturing environment) arrive in a somewhat random fashion. That is, the number of arrivals that occurs in a given time period represents a random variable. It is often appropriate to model the arrival process in a queuing system as a Poisson random variable. To use the Poisson probability distribution, we must specify a value for the arrival rate, denoted as LANDA, representing the average number of arrivals per time period.

If the number of arrivals in a given period of time follows a Poisson distribution with mean LANDA, it can be shown that the interarrival times follows an exponential probability distribution with mean 1/LANDA.

The exponential distribution plays a key role in queuing models. It is one of the few probability distributions that exhibits the memoryless (or lack of memory) property. An arrival process is memoryless if the time until the next arrival occurs does not depend on how much time has elapsed since the last arrival. The Russian mathematician Markov was the first to recognize the memoryless property of certain random variables. Therefore, the memoryless property is also sometimes referred to as the Markov or Markovian property.

It is assumed that all the queuing models follow a Poisson distribution (or, equivalently, that interarrival times follow an exponential distribution). To use these models, it is important to verify that this assumption is valid for the queuing system being modeled. One way to verify that arrivals can be approximated by the Poisson distribution is to collect data on the number of arrivals occurring per time period for several hours, days, or weeks. The average number of arrivals per time period can be calculated from these data and used as an estimate of LANDA. A histogram of the actual data can be constructed and compared to a histogram of the actual probabilities expected of a Poisson random variable with mean LANDA. If the histograms are similar, it is reasonable to assume that the arrival process is approximately Poisson. (Additional goodness-of-fit tests can be found in most texts on queuing and simulation.)

Service Rate:

A customer who arrives at a service facility spends some amount of time (possibly 0) waiting in line for service to begin. We refer to this time as queue time. Service time is the amount of time a customer spends at a service facility once the actual performance of service begins. (So service time does not include queue time.)

It is often appropriate to model the service times in a queuing system as an exponential random variable. To use the exponential probability distribution for this purpose, we must specify a value for the service rate, denoted by MIUO, representing the average number of customers (or jobs) that can be served per time period. The average service time per customer is 1/MIUO time periods ( and the variance of the service time per customer is (1/MIUO)^2 time periods). Because the exponential distribution is continuous, the probability of an exponential random variable equaling any specific value is zero. Thus, probabilities associated with an exponential random variable must be defined in terms of intervals.

One way to verify that the service rate can be modeled using the exponential distribution is to collect data on the service times occurring per time period for several hours, days, or weeks. The average number of customers serviced per time period can be calculated from these data and used as an estimate of the service rate MIUO. Using actual data, a relative frequency distribution of the service times falling within various intervals can be constructed and compared to the distribution of the actual probabilities expected for each interval for an exponential random variable with a service rate of MIUO. If the distributions are similar, it is reasonable to assume that the distribution of service times is approximately exponential.

Kendall Notation:

Given the variety of queuing models that exist, a system known as Kendall notation was developed to allow the key characteristics of a specific queuing model to be described in an efficient manner. With Kendall notation, simple queuing models can be described by three characteristics in the following general format:

1/2/3

The first characteristic identifies the nature of the arrival process using the following standard abbreviations:

M = Markovian interarrival times (following an exponential distribution)

D = Deterministic interarrival times (not random)

The second characteristic identifies the nature of the service times using the following standard abbreviations:

M = Markovian service times (following an exponential distribution)

G = General service times (following a nonexponential distribution)

D = Deterministic service times (not random)

Finally, the third characteristic indicates the number of servers available. So, using Kendall notation, an M/M/1 queue refers to a queuing model in which the time between arrivals follows an exponential distribution, the service times follow an exponential distribution, and there is one server. An M/G/3 queue refers to a model in which the interarrival times are assumed to be exponential, the service times follow some general distribution, and three servers are present. An expected version of Kendall notation involves specifying six (rather than three) queue characteristics.

Queuing Models:

Numerous queuing models are available to evaluate different combinations of arrival distributions, service time distributions, and other queuing characteristics.

The M/M/s Model:
The M/M/s model is appropriate for analyzing queuing problems where the following assumptions are met:

There are s servers, where s is a positive integer
Arrivals follow a Poisson distribution and occur at an average rate of LANDA per time period
Each server provides service at an average rate of MIUO per time period, and actual service times follow an exponential distribution
Arrivals wait in a single FIFO queue and are serviced by the first available server
LANDA<s.MIUO

The final assumption indicates that the total service capacity of the system, (s.MIUO), must be strictly greater than the rate at which arrivals occur LANDA. If the arrival rate exceeds the system's total service capacity, the system would fill up over time and the queue would become infinitely long.

The M/M/s model with finite queue length:

The results for the M/M/s models assume that the size or capacity of the waiting area is infinite, so that all arrivals to the system join the queue and wait for service. In some situations, however, the size or capacity of the waiting area might be restricted - in other words, there might be a finite queue length. A balk refers to an arrival that does not join the queue because the queue is full or too long.

The M/M/s model with finite population:
The previous queuing models assume that the customers (or calls) arriving at the queuing system come from a population of potential customers that is infinite, or extremely large. Under this assumption, the mean arrival rate, LANDA, remains constant regardless of the number of calls in the system.

In some queuing problems, however, the possible number of arriving customers id finite. In other words, these queuing models have a finite arrival (or calling) population. In such a model, the average arrival rate for the system changes depending on the number of customers in the queue. The M/M/s model for finite arrival populations is appropriate for analyzing queuing problems where the following assumptions are met:

There are s servers, where s is a positive integer
There are N potential customers in the arrival population
The arrival pattern of each customer follows a Poisson distribution with a mean arrival rate of LANDA per time period
Each server provides service at an average rate of MIUO per time period, and actual service times follow an exponential distribution
Arrivals wait in a single FIFO queue and are serviced by the first available server

Note that the average arrival rate for this model (LANDA) is defined in terms of the rate at which each customer arrives. One of the most common applications for the M/M/s model with a finite arrival population is the machine repair problem.

The M/G/1 Model:

All the models presented so far assume that service times follow an exponential distribution. Random service times from an exponential distribution can assume any positive value. However, in some situations, this assumption is unrealistic. For example, consider the time required to change the oil in a car at an auto service center. This service probably requires at least 10 minutes and might require up to 30, 45, or even 60 minutes, depending on the service being performed. The M/G/1 queuing model enables us to analyze queuing problems in which service times cannot be modeled accurately using an exponential distribution.
The M/G/1 queuing model is quite remarkable because it can be used to compute the operating characteristics for any one-server queuing system where arrivals follow a Poisson distribution and the mean MIUO and standard deviation of the service times are known.

The M/D/1 Model:
The M/G/1 model can be used when service times are random with known mean and standard deviation. However, service times might not be random in some queuing systems. For example, in a manufacturing environment, it is not unusual to have a queue of material or subassemblies waiting to be serviced by a certain machine. The machine time required to perform the service might be very predictable - such as exactly 10 seconds of machine time per piece. Similarly, an automatic car wash might spend exactly the same amount of time on each car it services. The M/D/1 model can be used in these types of situations in which the service times are deterministic (not random).
The results for an M/D/1 model can be obtained using the M/G/1 model by setting the standard deviation of the service time to 0. Setting standard deviation = 0 indicates that no variability exists in the service times and, therefore, the service time for each unit is equal to the average service time MIUO.

Simulating Queues and the Steady-State assumption:
Queuing theory is one of the oldest and most well-researched areas of management science. Discussions of other types of queuing models can be found in advanced texts on management science and in texts devoted solely to queuing theory. However, keep in mind that the technique of simulation can also be used to analyze virtually any queuing problem you might encounter. Indeed, not all queuing models have closed-form equations to describe their operating characteristics. So, simulation is often the only means available for analyzing complex queuing systems where customers balk (don't join a queue upon arrival), renege (leave a queue before being served), or jockey (switch from one queue to another).

At the beginning of each day, most queuing systems start in an "empty and idle" condition and go through a transient period as business activity gradually builds up to reach the normal, or steady-state, level of operation. The queuing models presented describe only the behavior of the system in its steady-state level of operation. A queuing system can have different levels of steady-state operations at different times throughout the day. For example, a restaurant might have one steady-state level of operation for breakfast, and different steady-state levels at lunch and dinner. So, before using the models, it is important to identify the arrival rate and service rate for the specific steady-state level of operation you want to study. If an analysis of the transient phase is needed or if you want to model the operation of the system across different steady-state levels, you should use simulation.

Summary:
Waiting lines, or queues, are a common occurrence in many types of businesses. The study of the operating characteristics of waiting lines is known as queuing theory. Numerous mathematical models are available to represent and study the behavior of different types of queues. These models have different assumptions about the nature of the arrival process to the queuing system, the allowable size and nature of the queuing discipline, and the service process within the system. For many models, closed-form equations have been developed to describe various operating characteristics of the system. When closed-form solutions are not possible, the technique of simulation must be used to analyze the behavior of the system.
=====================================================

Project Management (Part I):
At some point, almost every manager assumes responsibility for the completion of some type of project. The project might be relatively simple - such as planning a company picnic or producing an employee newsletter - or it might be more complex - such as planning the launch of a space shuttle, designing and implementing a large computer information system, or constructing a multistory office building. Successfully completing a project of any size requires a thorough analysis of the physical and logical interdependencies among the tasks involved and accurate estimates of the time and resources required to accomplish these tasks. Keeping track of all these details for even a relatively small project can be overwhelming.

Two techniques that were developed to help managers plan, organize, and control projects: Critical Path Method (CPM) and the Program Evaluation and Review Technique (PERT). Both techniques were developed concurrently but independently during the 1950s. CPM was developed by representatives of DuPont and Sperry-Rand to assist in the management of industrial projects in which the time required to perform each activity in the project could be estimated with a high degree of accuracy. The focus of CPM is to determine when a project should be completed and to schedule when each activity in the project must begin in order to keep the project on schedule. Thus, PERT was designed for projects in which the time required to perform each activity is essentially a random variable. PERT focuses on estimating the probability of completing a project by a given deadline.

Over the years, the two techniques have blended together, so that most practitioners today refer to them collectively as PERT/CPM or CPM/PERT. The fact that these techniques have stood the test of time and are still widely used is a tribute to their usefulness and simplicity. Indeed, the basic ideas behind PERT/CPM provide the underpinnings for a number of project management software tools.

One of the key ideas in both CPM and PERT is that any project can be broken down into component activities that require different amounts of time and that must be accomplished in a specific order. The major difference between CPM and PERT involves how the time element of the activities is determined. However, both CPM and PERT require a detailed network of the project that clearly indicates each of the main activities in the project and their precedence relationships.

Creating the Project Network:

The activities in a project can be represented as a network in one of two ways. A network is a set of nodes that are connected in various ways by directed arcs. Thus, if we let the nodes in a network represent the activities in a project, we are employing the Activity-On-Node (AON) network design. The nodes in the network correspond to each project activity. The arcs in this network indicate the precedence relationships between the activities (or nodes).

The second way to represent a project as a network is the Activity-On-Arc (AOA) design, in which the arcs represent the project's activities. The nodes in an AOA network represent the start and finish points (or milestone) for each activity. An AOA network does not allow multiple arcs with common start and finish nodes.

When a project does not have a unique start or finish activity. In such cases, it is necessary to create artificial start/or finish activities. These activities are artificial because they require no time to complete and merely serve to give a project a unique start and finish point. Thus, by using artificial start and finish activities, we can ensure that the network for any project has a unique start and finish point.

CPM: an overview:
After creating a network representation of a project, the next step in the CPM technique is to determine the earliest time that each activity in the network can start and finish. We determine these times by making what is called a forward pass through the network. Ultimately, this analysis determines the earliest time that the project itself can be completed. Next, we make a backward pass through the network to determine the latest time that each activity can start and finish without delaying the completion of the project. One of the primary goals in CPM is to determine the critical path through the network. Therefore, the earliest time that the project can be completed is the time required to complete the set of activities along the longest path through the network. The critical path is the longest path through a project network.

Any delays in the start or finish times of the activities on the critical path (also known as critical activities) delay the completion of the project. A project manager should always identify the critical activities in a project to focus attention on these activities and to avoid delays on these activities when possible.

In conducting the forward pass through the network, we will use the activity times to determine the earliest possible start time and the earliest possible finish time for each activity. The following notations represent these quantities:

ti = amount of time required to perform activity i

ESTi = earliest start time for activity i

EFTi = earliest finish time for activity i

During the backward pass, we will determine the latest time that each activity can be started and finished without delaying the project. These quantities are represented as:

LSTi = Latest start time for activity i

LFTi = Latest finish time for activity i

In general, the earliest time at which an activity can finish is the earliest time at which it can start plus the expected time required to perform the activity. That is, for any activity i:

EFTi = ESTi + ti

Key points: The Forward Pass:

The earliest start time for the initial activity in a project is time zero
The earliest start time of an activity is equal to the latest (or maximum) early finish time of the activities directly preceding it
The earliest finish time of an activity is equal to its earliest start time plus the time required to perform the activity

The backward pass:

In general, the latest time at which an activity can start without delaying a project is the latest time by which it must be finished minus the time required to perform the activity. That is, for any activity i:

LSTi = LFTi - ti

Key points: The Backward Pass:

The latest finish time for the final activity in a project is equal to its earliest finish time as determined by the forward pass
The latest finish time for any other activity is equal to the earliest (or minimum) late start time of the activities directly following (or succeeding) it
The latest start time of an activity is equal to its latest finish time minus the time required to perform the activity

Determining the critical path:

One of the key objectives of CPM is to determine the critical path through a project network. The critical path consists of the set of activities that, if delayed in any way, would cause a delay in the completion of the entire project. The activities on the critical path can be identified easily from the results of the forward pass and backward pass. Specifically, the activities whose latest start times equal their earliest start times make up the critical path (or, equivalently, whose latest finish times equal their earliest finish times).

If any of the activities in this path do not start by their earliest start time, the overall time required to complete the project will increase - unless management intervenes in some way. The noncritical activities in a project are distinguished by the presence of slack. Slack is the amount of time by which the start of an activity can be delayed without delaying the project. Critical activities have zero slack and noncritical activities have slack values that are strictly positive. The amount of slack for any activity i is defined as:

Slack for activity i = LSTi - ESTi

or, equivalently, as:

Slack for activity i = LFTi - EFTi

A note on Slack:

Slack represents the amount of time by which the start of an activity can be delayed without delaying the entire project, assuming that all predecessor activites start at their earliest start times. If any activity on a noncritical path starts late, the slack available along the rest of the noncritical path is reduced. For this reason, it is safer to focus on the latest start times of each activity (rather than on slack) because if all activities start by their latest start times, the project should not be delayed.

Key points: Determining the Critical Path:

Critical activities have zero slack and cannot be delayed without delaying the completion of the project
Noncritical activities have some positive amount of slack that represents the amount of time by which the start times of these activities can be delayed without delaying the completion of the entire project, assuming that all predecessor activities start at their earliest start times

Project Management using spreadsheet:

We can use a spreadsheet in a variety of ways to manage a project. We can use a spreadsheet to perform all of the calculations required to determine the earliest and latest start and finish times for the project activities in the problem.

Calculating the earliest start times and the latest finish times is a bit tricky but can be accomplished easily using array formulas. An array formula can perform multiple calculations using a range of cells and then return either a single result or multiple results. You create array formulas in the same way that you create other formulas, except that you press [Ctrl] + [Shift] + [Enter] to enter the formula.

The array formulas create circular references in the workbook. A cicular reference occurs when the value in a cell depends on the value in another cell that, in turn, is dependent on the value in the original cell. Usually, a circular reference in a workbook indicates that a formula contains an error - and Excel displays a dialog box telling you so! However, there are occasions when a circular reference is exactly what is needed to accomplish a particular task. This is such an occasion. So, to tell Excel that we intend to use circular references:

Click Tools, Options
Click the Calculation tab
Select the Iteration option
Click OK

Now, each cell must implement the logic of the forward pass to calculate the earliest start times. For each activity, this involves determining the maximum earliest finish time (EFT) for the activities that precede it.

A Gantt chart is a popular technique for portraying the schedule of a project's activities over time. This type of chart is useful in helping a project manager see when activities can begin and end, and in keeping track of which activities are underway at any point in time.
===================================================

Project Crashing (Part II):

With the CPM technique, we can use the times required to perform each activity to determine the least amount of time required to complete the entire project. However, the time required to complete an activity can often be shortened, or crashed. For example, suppose that the amount of time it takes to paint a house is normally five days. But, by hiring more painters or asking the existing painters to work overtime, it might be possible to paint the same house in two days. Thus, activities can have normal times and crash times.

The normal time of an activity represents the ordinary amount of time required to complete the activity with the usual amount of resources and effort. The crash time of an activity represents the least amount of time in which an activity can be accomplished if extra effort and resources are applied to the activity. The extra effort and resources required to crash certain activities usually increase the cost of performing these activities.

In managing projects, we often need to evaluate the trade-offs between the cost of crashing certain activities and the benefit of accomplishing these activities in less than normal time. For example, assume that it takes 46 working days to complete the construction of the house. But what if the buyer wants the house built in 35 days? We need some way to determine if this can be done and, if so, how much additional cost will be involved. When a critical activity is delayed, the completion of the project will be delayed unless management chooses to crash some of the remaining critical activities.

An LP Approach to Crashing:

We can use Solver to help determine the least costly way of crashing a project to meet certain deadlines. Although it is tempting to apply Solver directly to the spreadsheet model, that model contains numerous nonlinearities that would force us to use Solver's evolutionary algorithm. Although this can work, the solution process is slow and the results may be local rather than global optimal solutions. As it turns out, we can also solve project crashing problems using an LP formulation that is easy for Solver to solve and guarantees we obtain global optimal solutions. To crash project through LP approach we first must determine the normal time, normal cost, crash time, and crash cost for each project activity.

Using LP to identify earliest and latest start times:

With minor modifications, the LP model used here for crashing projects can also be used to determine earliest start times and latest times for all the activities in the project. To do this, first solve for the earliest start times by changing the objective to minimize the sum of all the activity start times (MIN: SumTi) and do not allow crashing (set all Ci = 0). Next, solve for the latest start times by changing the objective to maximize the sum of all the activity start times (MAX: SumTi), force TM to equal its earliest start time identified in the previous step (set TM = 42), and do not allow crashing (set all Ci = 0).

Determining a Least costly crash schedule:
Another use of a crashing model is to determine the least costly way to complete a project at some time earlier than would be possible using normal activity times. For example, if the home buyer insists on the house being built within 35 days, we might want to determine the least costly way of crashing the schedule to meet this deadline. We can solve this type of problem easily using the existing model. In this case, we would simply add a constraint to hold the project completion time at 35 days and attempt to minimize the total crash cost.

Crashing as an MOLP:
Two objectives can be pursued in crashing model. On the one hand, we might want to minimize the finish time of the project. On the other hand, we might want to minimize the cost of crashing the project. These two objectives are in conflict with one another because reducing the completion time of the project increases the crash costs. Thus, we could use the techniques for MOLPs.
Another way to study the cost/time trade-offs is to re-solve the problem several times to determine the minimum crash cost for each possible completion time. A graph (Graph of the relationship between crash time and completion time) showing the minimum crash cost for each possible completion time is useful in evaluating the trade-offs involved in crashing. This type of graph gives us a clear picture of the cost/time trade-offs involved in crashing a project.

Certainty vs. Uncertainty:

Throughout our discussion of CPM, we assumed that the times required to complete project activities were known with certainty or could, at least, be estimated with a high degree of accuracy. This assumption does not hold for all projects. For example, consider the schedule of activities required to design and build the first space shuttle. Because no one had ever built a space shuttle before, no historical data were available for estimating how long it would take to perform many of the creative tasks required by this project. Even in projects in which the same or similar tasks have been performed before, the amount of time required to perform them might be different. PERT was developed as an attempt to deal with uncertainty in activity times.

In recent years, a number of problems have surfaced regarding the PERT technique, causing many to question the wisdom of using this technique at all.

PERT: an overview

PERT differs from CPM in that it assumes the times required to perform the project activities are not known with certainty. PERT assumes that the activity times are random variables that have some mean, or expected, value and some variance. However, rather than specifying a mean and variance for each activity, we must give three time estimates for each activity in the project. Specifically, for each activity, PERT requires estimates of:

ai = estimate of the duration of activity i assuming the most favorable conditions

bi = estimate of the duration of activity i assuming the least favorable conditions

mi = estimate of the most likely duration of activity i

We can think of ai, bi, and mi as representing, respectively, the best case, worst case, and most likely times required to perform activity i. PERT uses these values to calculate the expected duration of each activity as:

ti = (ai + 4mi + bi)/6

and estimates the variance of each activity's duration as:

vi = (bi - ai)^2/36

The preceding formulas are based on the assumption that the times required to perform each activity are random variables that follow the beta probability distribution. The beta distribution can be used to describe a wide variety of random variables, and is a reasonable choice for describing the behavior of most activity times when their true distributions are unknown.

After calculating the expected times for each activity, the next step in PERT is to identify the critical path for the project. Because PERT assumes that the times required to perform the activities along a path are random variables, the time required to complete a given path is also a random variable with some mean and variance.

The expected (or mean) time required to complete any path in the network is simply the sum of the expected times (the ti) of the activities on the path. If we assume that the individual activity times in a project are independent of one another (as does PERT), we may also calculate the variance of the completion time for any path as the sum of the variances (the vi) of the activities on the path. Because all the paths through a project network must be completed in order for the project to be completed, PERT deems the path with the largest expected completion time to be the most critical - the critical path.

The expected completion time and variance for the critical path are used in PERT to estimate the probability of completing the project by various dates and to assist in negotiating and setting deadlines for the project's completion. Many projects impose financial penalties for each day, week, or month a project is late; therefore, it is often important to identify a deadline for a project that management is confident will not be exceeded. Thus, in addition to identifying a critical path for management to scrutinize, PERT also claims to offer assistance in estimating and setting deadlines for a project.

The problems with PERT:

The PERT technique presents a number of problems that should cause us to approach it with skepticism. First, PERT assumes that the time required to perform project activities are random variables that are independent of one another. Although this assumption makes it easy to complete the calculations in PERT, it is probably not a realistic assumption. For example, if one activity along the critical path (or any other path) runs a bit over its expected time, a diligent project manager will make sure that one or more of the subsequent activities run a bit under their expected times to catch up on the schedule. To the extent that these over- and under-runs offset one another, the expected completion time of the project should not be affected seriously. However, the variance of the completion times of the paths through the network will be reduced as a result.

The more serious problem in PERT involves the identification of the critical path. PERT identifies the critical path as the path with the longest expected completion time. Thus, an unsuspecting project manager might focus on the activities on the critical path, believing that they are the activities most likely to delay the completion of the project. In reality, activities not on the critical path can pose a greater risk of delaying the project.

Implications:

Activities not on PERT's critical path might, in fact, be more critical to the competition of a project than the activities PERT identifies as critical. Indeed, every activity in a project has some probability of being critical. For some activities, this probability might be near 0; for others, it might be closer to 1. A project manager should focus on the activities with the highest probability of being critical - regardless of whether they fall on the critical path. Unfortunately, PERT does little to help us identify the project activities that are truly the most critical.

Simulating Project Networks:

Although the PERT technique has serious drawbacks, it does highlight an important point - activity times in a project are likely to be somewhat random. Although CPM ignores this point entirely, by calculating the earliest and latest start times for each activity, it at least shows which activities are likely to be critical if the actual activity times deviate from their expected values.

The best way to evaluate the impact of variability in activity times on the critical path and the completion time of a project involves the technique of simulation.

To simulate a project, we need to generate random times from appropriate probability distributions for each project activity. For the given set of random activity times, we can determine the critical path and the time required to complete the project. If we repeat this process many times, we can determine the frequency with which the project activities fall on the critical path. We could also analyze the resulting distribution project completion times to estimate more accurately the probability of completing the project within various time periods.

Generating a Random Activity Times:

To simulate the duration of a project, we first must identify probability distributions that describe the behavior of the random activity times. Because the three time estimates for each activity correspond to the best-case, most-likely-case, and worst-case scenarios for each activity, we might assume that the random behavior of the activity times can be approximated by a triangular probability distribution. Notice that the shape of the triangular distribution varies depending on the parameters a, m, and b, which correspond (respectively) to the best-case, most-likely-case, and worst-case time estimates for a given activity.

Microsoft Project:

Although spreadsheets can be used to track and manage projects, specialized project management software can greatly simplify much of this task. One of the most popular project management tools today is Microsoft Project (MS Project).

The first step to managing a project in MS Project is to enter the name and duration of each activity or task. The next step is to indicate the precedence relations between the various tasks. After the tasks, durations, and precedence relations are entered, MS Project can easily create several different views of a project. For instance, you can select View, PERT Chart to create the chart. (Although MS Project uses the term PERT Chart to refer to the type of chart, this chart is not based on the PERT technique described earlier).

Another useful view of a project is obtained by selecting View, Tracking Gantt. The resulting chart allows a project manager to keep track of when each task actually begins and track its progress toward completion.

MS Project offers many other features to assist with allocating and tracking costs and assigning resources to various activities in a project.

Summary:

When the activity times in a project can be estimated with a reasonable amount of certainty, CPM can be applied to determine the earliest start and finish times for each activity and the earliest time for completing the entire project. CPM can also be used to determine the latest possible times that activities can start and finish without delaying the completion of the project. The slack in a project helps to distinguish between critical and noncritical activities. LP techniques can be used to analyze CPM networks and determine optimal srash schedules for projects that must be completed ahead of schedule.

When activity times cannot be estimated with certainty, they must be modeled as random variables. Although the PERT technique was designed to address these types of problems, it has some serious theoretical and practical shortcomings. However, simulation can be used to provide a more complete and accurate analysis of activity networks containing random activity times.
=====================================================

Decision Analysis (Part I):
Models do not make decisions - people do. Although the insight and understanding gained by modeling problems can be helpful, decision making often remains a difficult task. The two primary causes for this difficulty are uncertainty regarding the future and conflicting values or objectives.

For example, suppose that when you graduate from college you receive job offers from two companies. One company (company A) is in a relatively new industry that offers potential for spectacular growth - or rapid bankruptcy. The salary offered by this company is somewhat lower than you would like, but would increase rapidly if the company grows. This company is located in the city that is home to your favorite professional sports team and close to your friends and family.

The other job offer is from an established company (company B) that is known for its financial strength and long-term commitment to its employees. It has offered you a stating salary that is 10 percent more than you asked, but you suspect it would take longer for you to advance in this organization. Also, if you work for this company, you would have to move to a distant part of the country that offers few of the cultural and sporting activities that you enjoy.

Which offer would you accept? Or would you reject both offers and continue looking for employment with other companies? For many, this might be a difficult decision. If you accept the job with company A, you might be promoted twice within a year - or you could be unemployed in six months. With company B, you can be reasonably sure of having a secure job for the foreseeable future. But if you accept the job with company B and then company A grows rapidly, you might regret not accepting the position with company A. Thus, the uncertainty associated with the future of company A makes this decision difficult.

To further complicate the decision, company A offers a more desirable location than company B, but the starting salary with company A is lower. How can you assess the trade-offs between starting salary, location, job security, and potential for advancement in order to make a good decision? There is no easy answer to this question. There are number of techniques that can help you structure and analyze difficult decision problems in a logical manner which will be discussed later.

Good decisions vs. Good outcomes:

The goal of decision analysis is to help individuals make good decisions. But good decisions do not always result in good outcomes. For example, suppose that after carefully considering all the factors involved in the two job offers, you decide to accept the position with company B. After working for this company for nine months, it suddenly announces that, in an effort to cut costs, it is closing the office in which you work and eliminating your job. Did you make a bad decision? Probably not. Unforeseeable circumstances beyond your control caused you to experience a bad outcome, but it would be unfair to say that you made a bad decision. Good decisions sometimes result in bad outcomes.

Even when a good decision is made, luck often plays a role in determining whether a good or bad outcome occurs. However, using a structured approach to make decisions should give us enhanced insight and sharper intuition about the decision problems we face.

As a result, it is reasonable to expect good outcomes to occur more frequently when using a structured approach to decision making than if we make decisions in a more haphazard manner.

Characteristics of decision problems:

Although all decision problems are somewhat different, they share certain characteristics. For example, a decision must involve at least two alternatives for addressing or solving a problem. An alternative is a course of action intended to solve a problem. The job selection example described earlier involves three alternatives: you could accept the offer from company A, accept the offer from company B, or reject both offers and continue searching for a better one.

Alternatives are evaluated on the basis of the value they add to one or more decision criteria. The criteria in a decision problem represent various factors that are important to the decision maker and influenced by the alternatives. For example, the criteria used to evaluate the job offer alternatives might include starting salary, expected salary growth, desirability of job location, opportunity for promotion and career advancement, and so on. The impact of the alternatives on the criteria is of primary importance to the decision maker. Note that not all criteria can be expressed in terms of monetary value, making comparisons of the alternatives more difficult.

Finally, the values assumed by the various decision criteria under each alternative depend on the different states of nature that can occur. The state of nature in a decision problem correspond to future events that are not under the decision maker's control. For example, company A could experience spectacular growth, or it might go bankrupt. Each of these contingencies represents a possible state of nature for the problem. Many other states of nature are possible for the company; for example, it could grow slowly, or not grow at all. Thus, an infinite number of possible states of nature could exist in this, and many other, decision problems. However, in decision analysis, we often use a relatively small, discrete set of representative states of nature to summarize the future events that might occur.

The Payoff Matrix:

A common way of analyzing a decision problem is to construct a pay off matrix. A payoff matrix is a table that summarizes the final outcome (or pay off) for each decision alternative under each possible state of nature. To construct a payoff matrix, we need to identify each decision alternative and each possible state of nature.

Decision Rules:

Several decision rules can be used to help a decision maker choose the best alternative. No one of these decision rules works best in all situations and each has some weaknesses. However, these rules help to enhance our insight and sharpen our intuition about decision problems so that we can make more informed decisions.

Non-probabilistic methods:

The decision rules can be divided into two categories: those that assume that probabilities of occurrence can be assigned to the states of nature in a decision problem (probabilistic methods), and those that do not (non-probabilistic methods).

The Maximax Decision Rule:

This type of reasoning is reflected in the maximax decision rule, which determines the maximum payoff for each alternative and then selects the alternative associated with the largest payoff. In some situations, the maximax decision rule leads to poor decisions.

The maximin Decision Rule:

A more conservative approach to decision making is given by the maximin decision rule, which pessimistically assumes that nature will always be "against us" regardless of the decision we make. This decision rule can be used to hedge against the worst possible outcome of a decision. To apply the maximin decision rule, we first determine the minimum possible payoff for each alternative and then select the alternative with the largest minimum payoff (or the maximum of the minimum payoffs - hence the term "maximin"). The maximin decision rule can also lead to poor decision making.

The Minimax regret decision rule:

Another way of approaching decision problems involves the concept of regret, or opportunity loss.

Some decision makers are troubled that the addition of a new alternative, which is not selected as the final decision, can change the relative preferences of the original alternatives. For example, suppose that a person prefers apples to oranges, but would prefer oranges if given the options of apples, oranges, and bananas. This person's reasoning is somewhat inconsistent or incoherent. But such reversals in preferences are a natural consequence of the minimax regret decision rule.

Probabilistic methods:

Probabilistic decision rules can be used if the states of nature in a decision problem can be assigned probabilities that represent their likelihood of occurrence. For decision problems that occur more than once, it is often possible to estimate these probabilities from historical data. However, many decision problems represent one-time decisions for which historical data for estimating probabilities are unlikely to exist.

In these cases, probabilities are often assigned subjectively based on interviews with one or more domain experts. Highly structured interviewing techniques exist to solicit probability estimates that are reasonably accurate and free of the unconscious biases that may impact an expert's opinions.

Expected Monetary Value:

The expected monetary value decision rule selects the decision alternative with the largest expected monetary value (EMV). The EMV of alternative i in a decision problem is defined as:

EMVi = Sum(rij.pj)

where:

rij = the payoff for alternative i under the jth state of nature
pj = the probability of the jth state of nature

The EMV for a given decision alternative indicates the average payoff we would receive if we encounter the identical decision problem repeatedly and always select this alternative. Selecting the alternative with the highest EMV makes sense in situations where the identical decision problem will be faced repeatedly and we can "play the averages." However, this decision rule can be very risky in decision problems encountered only once. A technique known as the utility theory - that allows us to account for this type of risk in our decision making.

Expected Regret:
We can also use the probability of the states of nature to compute the expected regret, or expected opportunity loss (EOL), for each alternative in a decision problem.
The decision with the smallest EOL will also have the largest EMV. Thus, the EMV and EOL decision rules always result in the selection of the same decision alternative. The expected monetary value (EMV) and expected opportunity loss (EOL) decision rules always result in the selection of the same decision alternative.

Sensitivity Analysis:
When using probabilistic decision rules, one should always consider how sensitive the recommended decision is to the estimated probabilities. We can answer this by building a data table that summarizes the EMVs for each alternative as we vary the probabilities.

The expected value of perfect information:
One of the primary difficulties in decision making is that we usually do not know which state of nature will occur. Estimates of the probability of each state of nature can be used to calculate the EMV of various decision alternatives. However, probabilities do not tell us which state of nature will occur - they only indicate the likelihood of the various states of nature.

The expected value of perfect information (EVPI) is the expected value obtained with perfect information minus the expected value obtained without perfect information (which is given by the maximum EMV); that is:

Expected value of perfect information = Expected value with perfect information - Maximum EMV

The minimum EOL in a decision problem will always equal the EVPI. The expected value of perfect information (EVPI) is equivalent to the minimum expected opportunity loss (EOL).

Decision Trees:
Although some decision problems can be represented and analyzed effectively using payoff tables, we can also represent decision problems in a graphical form known as a decision tree. A decision tree is composed of a collection of nodes (represented by circles and squares) interconnected by branches (represented by lines). A square node is called a decision node because it represents a decision. Branches emanating from a decision node represent the difficult alternatives for a particular decision.
The circular nodes in a decision tree are called event nodes because they represent uncertain events. The branches emanating from event nodes (called event branches) correspond to the possible states of nature or the possible outcomes of an uncertain event. The value next to each branch from the event nodes indicates the cash flow that will occur for that decision/event combination.
The various branches in a decision tree end at the small black dots called leaves. Because each leaf corresponds to one way in which the decision problem can terminate, leaves are also referred to as terminal nodes.

Rolling back a decision tree:
After computing the payoffs at each leaf, we can apply any of the decision rules described earlier. For example, we could identify the maximum possible payoff for each decision and apply the maximax decision rule. However, decision trees are used most often to implement the EMV decision rule - that is, to identify the decision with the largest EMV. We can apply a process known as rolling back to a decision tree to determine the decision with the largest EMV. To roll back this decision tree, we start with the payoffs and work our way from right to left, back through the decision tree, computing the expected values for each node.
At a decision node, we always select the alternative that leads to the best EMV. The optimal alternative at a decision node is sometimes indicated by "pruning" the suboptimal branches. The pruned branches are indicated by the double vertical lines (| |) on the suboptimal alternatives emanating from node 0.
=======================================================

Decision Analysis (Part II)
Using Treeplan:

A spreadsheet add-in called TreePlan can help us create and analyze decision trees in Excel. To attach the TreePlan add-in, choose the Open command from the File menu and open the file named treeplan.xla. To create a decision tree using TreePlan, open a new workbook, then involve TreePlan by choosing the TreePlan command from the Tools menu (or by pressing [Ctrl][t]). In response, TreePlan displays the dialog box. If you click the New Tree button, TreePlan creates a tree diagram with one initial decision node and two decision branches. This initial tree diagram is inserted in the spreadsheet near the cell that is active when TreePlan is invoked. TreePlan automatically labels the branches in the tree as Decision 1 and Decision 2.

Adding Branches:

To add a new decision branch to tree:

Click the decision node
Press [Ctrl][t] to invoke TreePlan

It is important to understand that TreePlan is context sensitive - that is, the dialog box that appears when you invoke TreePlan depends on which cell is selected when TreePlan is invoked. To add a branch to the currently selected decision node, click the Add branch option, and then click OK. To add another decision branch to the tree, we can follow the same procedure:

Click the decision node
Press [Ctrl][t] to invoke TreePlan
Click Add branch
Click OK

Adding Event Nodes:

To add an event node:

Select the terminal node for the branch
Press [Ctrl[[t] to invoke TreePlan

TreePlan provides a built-in option that allows you to copy a section, or subtree, of a decision tree to another part of the tree. It is important to copy subtrees using this command so that TreePlan can update the appropriate formulas in the spreadsheet. To create a copy of the event node:

Select the event node you want to copy
Press [Ctrl][t] to invoke TreePlan
Click Copy subtree
Click OK

To paste a copy of this subtree into the decision tree:

Select the target cell location
Press [Ctrl][t] to invoke TreePlan
Click Paste subtree
Click OK

Multistage Decision Problems:

Our discussion of decision analysis has considered only single-stage decision problems - that is, problems in which a single decision must be made. However, most decisions that we face lead to other decisions. As a simple example, consider the decision of whether or not to go out to dinner. If you decide to go out to dinner, you must then decide how much to spend, where to go, and how to get there. Thus, before you actually decide to go out to dinner, you will probably consider the other issues and decisions that must be made if you choose that alternative. These type of problems are called multistage decision problems. The EMV decision rule is most appropriately applied when we face a decision that will be made repeatedly and the results of bad outcomes can be balanced or averaged with good outcomes.

Risk Profiles:

When using decision trees to analyze one-time decision problems, it is particularly helpful to develop a risk profile to make sure the decision maker understands all the possible outcomes that might occur. A risk profile is simply a graph or tree that shows the chances associated with possible outcomes.

A risk profile is an effective tool for breaking an EMV into its component parts and communicating information about the actual outcomes that can occur as the result of various decisions.

Strategy Tables:

A strategy table is another risk analysis tool that allows a decision maker to analyze how the optimal decision strategy changes in response to simultaneous changes in probability estimates.

Utility Theory:

Although the EMV decision rule is widely used, sometimes the decision alternative with the highest EMV is not the most desirable or most preferred alternative by the decision maker. The EMVs of different decision alternatives do not necessarily reflect the relative attractiveness of the alternatives to a particular decision maker. Utility theory provides a way to incorporate the decision maker's attitudes and preferences toward risk and return in the decision-analysis process so that the most desirable decision alternative is identified.

Utility Functions:

Utility theory assumes that every decision maker uses a utility function that translates each of the possible payoffs in a decision problem into a non-monetary measure known as a utility. The utility of a payoff represents the total worth, value, or desirability of the outcome of a decision alternative to the decision maker. Different decision makers have different attitudes and preferences toward risk and return. Those who are "risk neutral" tend to make decisions using the maximum EMV decision rule. However, some decision makers are risk avoiders (or "risk averse"), and others look for risk (or are "risk seekers"). The utility functions typically associated with these three types of decision makers: Risk averse, Risk neutral, and Risk seeking. A "risk averse" decision maker assigns the largest relative utility to any payoff but has a diminishing marginal utility for increased payoffs (that is, every additional dollar in payoff results in smaller increases in utility).

The "risk seeking" decision maker assigns the smallest utility to any payoff but has an increasing marginal utility for increased payoffs (that is, every additional dollar in payoff results in larger increases in utility). The "risk neutral" decision maker (who follows the EMV decision rule) falls in between these two extremes and has a constant marginal utility for increased payoffs (that is, every additional dollar in payoff results in the same amount of increase in utility).

Constructing utility functions:

Assuming that decision makers use utility functions (perhaps at a subconscious level) to make decisions, how can we determine what a given decision maker's utility function looks like? One approach involves assigning a utility value of 0 to the worst outcome in a decision problem and a utility value of 1 to the best outcome. All other payoffs are assigned utility values between 0 and 1. (Although it is convenient to use endpoint values of 0 and 1, we can use any values provided that the utility value assigned to the worst payoff is less than the utility value assigned to the best payoff. Risk premium, refers to the EMV that a decision maker is willing to give up (or pay) in order to avoid a risky decision.

Using utilities to make decisions:

After determining the utility value of each possible monetary payoff, we can apply the standard tools of decision analysis to determine the alternative that provides the highest expected utility. We do so using utility values in place of monetary values in payoff tables or decision trees. By using utilities, decision makers can identify the alternative that is most attractive given their personal attitudes about risk and return.

The Exponential utility function:

In a complicated decision problem with numerous possible payoff values, it might be difficult and time-consuming for a decision maker to determine the different values for p* that are required to determine the utility for each payoff. However, if the decision maker is "risk averse," the exponential utility function can be used as an approximation of the decision maker's actual utility function. The general form of the exponential utility function is:

U(x) = 1 - e^(-x/R)

In this formula, e is the base of the natural logarithm (e = 2.718281 . . . ) and R is a parameter that controls the shape of the utility function according to a decision maker's risk tolerance.

Note that as R increases, the shape of the utility curve become flatter (or less "risk averse"). Also note that as x becomes large, U(x) approaches 1; when x = 0, then U(x) = 0; and if x is less than 0, then U(x)<0.

To use the exponential utility function, we must determine a reasonable value for the risk tolerance parameter R. One method for doing so involves determining the maximum value of Y for which the decision maker is willing to participate in a game of chance with the following possible outcomes:

Win $Y with probability 0.5

Lose $Y/2 with probability 0.5

The maximum value of Y for which the decision maker would accept this gamble should give us a reasonable estimate of R. Note that a decision maker willing to accept this gamble only at very small values of Y is "risk averse," whereas a decision maker willing to play for larger values of Y is less "risk averse."

Incorporating utilities in TreePlan:

The TreePlan add-in provides a simple way to use the exponential utility function to model "risk averse" decision preferences in a decision tree. To use the exponential utility function, we first construct a decision tree in the usual way. We then determine the risk tolerance value of R for the decision maker. Note that the value of R should be expressed in the same units as the payoffs in the decision tree. TreePlan requires that we enter the value of R in a cell named "RT" (short for Risk Tolerance). (This cell must be outside of the rectangular region containing the decision tree).

Multicriteria decision making:
A decision maker often uses more than one criterion or objective to evaluate the alternatives in a decision problem. Sometimes, these criteria conflict with one another. For example, consider again the criteria of risk and return. Most decision makers desire high levels of return and low levels of risk. But high returns are usually accompanied by high risks, and low levels of return are associated with low risk levels. In making investment decisions, a decision maker must assess the trade-offs between risk and return to identify the decision that achieves the most satisfying balance of these two criteria. Utility theory represents one approach to assessing the trade-offs between the criteria of risk and return. Many other types of decision problems involve multiple conflicting criteria. For example, in choosing between two or more different job offers, you must evaluate the alternatives on the basis of starting salary, opportunity for advancement, job security, location, and so on. If you purchase a video camcorder, you must evaluate a number of different models based on the manufacturer's reputation, price, warranty, size, weight, zoom capability, lighting requirements, and a host of other features. If you must decide whom to hire to fill a vacancy in your organization, you will likely have to evaluate a number of candidates on the basis of education, experience, references, and personality.

The multicriteria scoring model:

The multicriteria scoring model is a simple procedure in which we score (or rate) each alternative in a decision problem based on each criterion. The score for alternative j on criterion i is denoted by sij. Weights (denoted by wi) are assigned to each criterion indicating its relative importance to the decision maker. For each alternative, we then compute a weighted average score as:

Weighted average score for alternative j = Sum (wi.sij)

We then select the alternative with the largest weighted average score. Next, the decision maker specifies weights that indicate the relative importance of each criterion. Again, this is done subjectively.

Radar charts provide an effective way of graphically summarizing numerous alternatives in a multicriteria scoring model. The radar chart's ability to graphically portray the differences in the alternatives can be quite helpful - particularly for decision makers that do not relate well to tables of numbers.

The analytic hierarchy process:

Sometimes, a decision maker finds it difficult to subjectively determine the criterion scores and weights needed in the multicriteria scoring model. In this case, the analytic hierarchy process (AHP) can be helpful. AHP provides a more structured approach for determining the scores and weights for the multicriteria scoring model.

Pairwise Comparisons:

The first step in AHP is to create a pairwise comparison matrix for each alternative on each criterion.

Normalizing the comparisons:

The next step in AHP is to normalize the matrix of pairwise comparisons. To do this, we first calculate the sum of each column in the pairwise comparison matrix. We then divide each entry in the matrix by its column sum.

Consistency:

In applying AHP, the decision maker should be consistent in the preference ratings given in the pairwise comparison matrix. Thus, before using the scores derived from the normalized comparison matrix, the preferences indicated in the original pairwise comparison matrix should be checked for consistency.
If the decision maker is perfectly consistent in stating preferences, each consistency measure will equal the number of alternatives in the problem. It is difficult for a decision maker to be perfectly consistent in stating preferences between a large number of pairwise comparisons. Provided that the amount of inconsistency is not excessive, the scores obtained from the normalized matrix will be reasonably accurate. To determine whether the inconsistency is excessive, we compute the following quantities:

Consistency Index (CI) = (Landa - n)/(n - 1)

Consistency Ratio (CR) = CI/RI

Where:

Landa = the average consistency measure for all alternatives

n = the number of alternatives

RI = the appropriate random index - (n = 2, RI=0.00) - (n=3, RI=0.58) - (n=4, RI=0.90) - (n=5, RI=1.12) - (n=6, RI=1.24) - (n=7, RI=1.32) - (n=8, RI=1.41)

If the pairwise comparison matrix is perfectly consistent, then Landa = n and the consistency ration is 0. The values of RI give the average value of CI if all the entries in the pairwise comparison matrix were chosen at random, given that all the diagonal entries equal 1 and Pij = 1/Pij. If CR <= 0.10, the degree of consistency in the pairwise comparison matrix is satisfactory. However, if CR > 0.10, serious inconsistencies might exist and AHP might not yield meaningful results.

Obtaining criterion weights:

Before we can use values in a scoring model, we must also determine weights that indicate the relative importance of the three criteria to the three criteria maker. The pairwise comparison process is used to generate scores for the alternative on each criteria can also be used to generate criterion weights.

Implementing the scoring model:

The last step in AHP is to calculate the weighted average scores for each decision alternative.

Summary:

First, a payoff table can be used to summarize the alternatives in a single-stage decision problem. No one decision rule works best in all situations, but together, the rules help to highlight different aspects of a problem and can help develop and sharpen a decision maker's insight and intuition about a problem so that better decisions can be made. When probabilities of occurrence can be estimated for the alternatives in a problem, the EMV decision rule is the most commonly used technique.

Decision trees are particularly helpful in expressing multistage decision problems in which a series of decisions must be considered. Each terminal node in s decision tree is associated with the net payoff that results from each possible sequence of decisions. A rollback technique determines the alternative that results in the highest EMV. Because different decision makers derive different levels of value from the same monetary payoff.

The multicriteria scoring model requires the decision maker to assign a score for each alternative on each criterion. Weights are then assigned to represent the relative importance of the criteria, and a weighted average score is computed for each alternative. The alternative with the highest score is the recommended alternative. AHP provides a structured approach to determining the scores and weights used in a multicriteria scoring model if the decision maker has difficulty specifying these values.

====================================================

Ref: Spreadsheet Modeling & Decision Analysis, by Cliff T.Ragsdale

Product Design Engineering

Pages

Saturday, November 27, 2010

Spreadsheet Modeling and Decision Analysis

No comments:

Post a Comment

Get new posts by email: