Wednesday, August 19, 2020

Pivot Table - Excel

Excel mini-workshop:

In this section, we will see how spreadsheets are utilized to do simple data analysis. To perform this practice you need to have access to Microsoft excel.

To start this practice, please download the file that contains the data by clicking here.  Make sure you know the folder that you are going to save the file. Now Let's start!

When you open the file you should see a page like the one below. Every Excel file can have multiple tabs(or sheets). As you see, in this excel file, there are 6 tabs which are SKU Master, Store 312, Store 323, Store 415, Store 521, and Store 632.

  • SKU Master - information on each of SKU
  • Store 312 - daily transactional records for sales at store 312
  • Store 323 - daily transactional records for sales at store 323
  • Store 415 - daily transactional records for sales at store 415
  • Store 521 - daily transactional records for sales at store 521
  • Store 632 - daily transactional records for sales at store 632
Now click on tab 312 to see the transactional data for store 312. The questions for this practice that we are interested to find answers for are:
  1. Which specific week had the highest sales of MA Excellent Products in dollars for store 312?
  2. Are the sales (in units) between the different SKUs from MA Excellent Products correlated for store 312? 
  3. How does the profit margin for salesof MA Excellent Products in store 312 change over the time period?
  4. TinyCo is thinking of running a one-day promotion each week for MA Excellent Products– which day of the week makes the most sense for store 312?
  5. How do these SKUs of MA Excellent Products behave differently in other stores?
Before I let you start working on these questions, let's discuss a few concepts and some functions in excel.

Pivot tables: 
As we explained in the videos, the pivot table is a tool that helps us to summarize and analyze the data. In the following we have shown how to use a pivot table using an example.

Example: Let's find the SKU in store 312 that has the maximum unit sales for 2/21/2015.

Step 1: let's make sure we are in tab 312 since we are only interested in store 312
Step 2: Select the entire data that we want to summarize.


Step3: Click on the insert tab at the top.
Step 4: In the most right hand side of the insert tab, there is an icon looks like. Click on it.



Step 5: A window opens up which asks you to confirm the region of the data that you want to analyze and the cell/tab that you want to put your table. Go ahead and choose the the location of your pivot table. The default is new worksheet and we can leave it as is for now. Click OK so the table gets created.


Step 6: Now we should all have a pivot table in front of us. This table is empty and we need to construct the table so that it summarize the data the way we want. You should be able to see a pivot table fields on the right that has 4 boxes, Filter, Columns, Rows, and Values. Using these boxes that we can create a desirable table.


Back to the question, we want to summarize the data so it helps us identify the SKU that has the maximum sales for 2/21/2015 of store 312. There are many different ways to summarize the data so we can find our answer. Here I show one of them.
Let's create table that shows distinct SKU's in rows for 2/21/2015 and for each SKU show the total sales of each.

Step 7: To start we will grab the SKU field name from field name and drop it in rows.


The resulting table should look like something like this:


Now we can see distinct list of SKU's in rows. Since we want to see total of sales for each SKU, we select Unit Sales from field name but this time drop it in values. The values box is the only box that you can do operations on data such as summation, extraction, maximization, etc. Both Columns and Rows boxes are only for summarizing and grouping the data and you cannot perform any operations on the fields in them.

Step 8: After dropping Unit Sales in Values box, we also want to make sure we are adding the values and not any other operations. For this purpose, right click on the Unit Sales in the Values box and choose Field Settings.


In the field setting we want to make sure we are choosing "sum" as our desirable operations.


Click OK and continue. Right now you should all have a table look like below which shows total unit sales by SKU. But wait, aren't we missing something?


Let's review our question again, "Find the SKU with maximum unit sales for 2/21/2015 for store 312". Sure we have created a table that has total unit sales by SKU, but are those numbers only for store 312 and 2/21/2015?
Since the selected data for Pivot Table is from tab 312 then all the numbers in the table are for 312. However, in the 312 tab, there are many transactional sales by different dates. So how can we filter the table only for a specific date? Thanks to Excel this can be done pretty easy. Like before, we just need to select Date and drop it into Filter box. The filter box is designed to include only part of the selected data that we want, in this case on the transactions from 2/21/2015.

Step 9: After dropping Dates in Filters field, we should have a table look like this:


If your date is not grouped as desired, you can activate the grouping manually:


Last step here is to click on the filter above the Pivot Table and uncheck select all option. Then scroll down and choose 2/21/2015.


The result should look like the following:


As we can see the answer to our question is SKU 8000520021.

Correlation: the degree in which two or more variables changes are related to one another.

An example could be age and health related expenses. One might observe that as population ages, their health related expenses goes up. This relationship does not have to be in the same direction, it could be opposite as well. Example could be the higher the temperatures goes the less layers people wear.

Profit margin: it is the percentage difference between revenue and cost over revenue.
========================================================
Creating Histogram in EXCEL 2016:
So a histogram is simply a graphical representation of the distribution by mutually exclusive and collectively exhaustive bins or intervals.


=====================================================
How To Calculate Confidence Intervals In Excel:

=====================================================