Uploaded by Juan Nicolás Castillo usaquen

Session 1
Overview & Introduction
Course Objectives
Data → Information → Decisions
• Find, extract, organize and describe data
• Quantify possible relationships and uncertainty
• Develop spreadsheet models to analyze data and
evaluate risk
• Optimize decisions and justify a course of action
Data Analysis Shows Up Everywhere
• Behavioral targeting and customer
segmentation
• Donor development
• Constructing a portfolio of investments
• Demand forecasting
• Planning and allocating resources
A General Decision Making Framework
1. Define the business problem
2. Collect and organize the relevant data
3. Examine the relationships among different
factors and the extent of uncertainty
4. Develop an evaluation model
5. Evaluate potential solutions
6. Recommend a course of action
Example 1: Political Advertising
Expenditures in 2008 Election
2012 Election
“The 2012 presidential election is shaping up to be a
multibillion-dollar contest. President Obama’s re-election
committee is expected to raise at least $1 billion, and
Republicans have high hopes that their nominee will reach
the 10-figure level as well…
‘He [President Obama] would be a Top 100 advertiser,’ says
Brad Adgate, the senior vice president for research at
Horizon Media, a New York ad agency. ‘You know, it's what
Home Depot spends, about a billion dollars a year.’”
NPR, 2/18/2011
2010 Electoral Votes
538 votes, 270 to win
Advertising
Decisions
How Do We Win the Voter?
Example 2: Performing Arts Centers
• Decisions to be made:
– Performance schedules
– Pricing subscriptions and seating tiers
– Ticket bundling
– Fundraising campaigns
– Advertising mix
How Do We Build the Audience?
Example 3: Salesforce Management
• How many (and which) employees need to be
working at a particular time?
• Which tasks should be completed by which
employees?
• Compensation packages to attract and retain
top talent?
Marketing to Consumers
Course Goals
• Conduct appropriate analysis of marketing
data
• Increase familiarity with Excel
• Build useful tools to solve business problems
QUESTIONS?
Organizing Data
• Data are often organized into a data table
– Rows: “cases”, “records”, or “observations”
– Columns: variables
Linking Data with a Relational
Database
Variable Types
• Categorical
– No natural numerical meaning
– May appear in a data table as a number
– Arithmetic makes no sense
– Examples?
• Quantitative
– Natural numerical meaning
– Already a number
– Some arithmetic makes sense
– Has an appropriate unit
– Examples?
TV Advertising
• Suppose the viewer shares for one hour of
television are as follows:

               ABC    CBS    CW    FOX    NBC    Other TV
Overall        4%     13%    5%    8%     7%     63%

• Is this sufficient?

               ABC    CBS    CW    FOX    NBC    Other TV
Demo Group 1   7%     8%     8%    4%     8%     65%
Demo Group 2   1%     18%    2%    12%    6%     61%
Overall        4%     13%    5%    8%     7%     63%
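One way to see why the overall row alone may not be sufficient: it is an average that can hide large differences between groups. The sketch below is in Python rather than Excel and assumes the two demographic groups are the same size. The slides do not state the group sizes; equal weights are an assumption that happens to reproduce the Overall row.

```python
# Viewer shares (%) by network, copied from the table above.
networks = ["ABC", "CBS", "CW", "FOX", "NBC", "Other TV"]
group1 = [7, 8, 8, 4, 8, 65]
group2 = [1, 18, 2, 12, 6, 61]

# Assumption (not stated in the slides): both groups are the same size.
weight1 = 0.5

# The overall share is the size-weighted average of the group shares.
overall = [weight1 * a + (1 - weight1) * b for a, b in zip(group1, group2)]

for name, a, b, o in zip(networks, group1, group2, overall):
    print(f"{name:<9} group 1 {a:>3}%   group 2 {b:>3}%   overall {o:>3.0f}%")
```

CBS, for example, has a 13% overall share, but that combines 8% in one group with 18% in the other, which matters when an ad is aimed at only one group.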
The Motion Picture Industry
• 2.4MM workers
• Contributes $180B to the economy
• Community as a whole pays more than $15B
in state and federal taxes
Virtual Stock Market for Movies
Examining Categorical Variables in the
Motion Picture Industry
• What kind of movies do studios produce and which succeed?
• Movie data compiled for 2001-2005
– Title
– Adaptation of (graphic) novel
– Basis in other media (e.g., TV show, video game, spin-off from another
movie)
– Tickets sold
– Gross revenue
– Production budget
– Marketing budget
– Release date
– Genre
– Studio
– MPAA ratings
Methods to Examine Categorical Data
• Frequency Tables
• Pie Charts
• Bar/column charts
• Contingency tables (cross-tabs)
• Side-by-side and segmented bar
charts
Frequency and Relative Frequency Tables
Movie Studio              Movies Released in 2005    % of Releases
Universal                 18                         9.05%
20th Century Fox          17                         8.54%
Warner Bros.              17                         8.54%
Buena Vista               15                         7.54%
Sony Pictures             15                         7.54%
Lionsgate                 13                         6.53%
Paramount Pictures        12                         6.03%
Sony Pictures Classics    10                         5.03%
New Line                  9                          4.52%
Dreamworks SKG            7                          3.52%
Miramax                   7                          3.52%
Miramax/Dimension         5                          2.51%
Focus Features            5                          2.51%
Fox Searchlight           5                          2.51%
Other                     44                         22.11%
TOTAL                     199                        100%
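The relative frequency column is each studio's count divided by the total number of releases. A minimal Python sketch of that calculation, using the counts from the table (the variable names are illustrative):

```python
# Movies released in 2005 by studio, copied from the frequency table.
counts = {
    "Universal": 18,
    "20th Century Fox": 17,
    "Warner Bros.": 17,
    "Buena Vista": 15,
    "Sony Pictures": 15,
    "Lionsgate": 13,
    "Paramount Pictures": 12,
    "Sony Pictures Classics": 10,
    "New Line": 9,
    "Dreamworks SKG": 7,
    "Miramax": 7,
    "Miramax/Dimension": 5,
    "Focus Features": 5,
    "Fox Searchlight": 5,
    "Other": 44,
}

total = sum(counts.values())

# Relative frequency = count / total.
relative = {studio: n / total for studio, n in counts.items()}

for studio, share in relative.items():
    print(f"{studio:<24}{counts[studio]:>4}{share:>9.2%}")
print(f"{'TOTAL':<24}{total:>4}{sum(relative.values()):>9.2%}")
```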
How Many Movies Do Studios
Release?
• From a frequency table, we can generate the
cumulative distribution
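A cumulative distribution is a running total of the frequency table. The Python sketch below assumes the studios are ordered from most to fewest releases, as in the 2005 table, with "Other" last:

```python
from itertools import accumulate

# 2005 release counts in table order (largest studio first, "Other" last).
counts = [18, 17, 17, 15, 15, 13, 12, 10, 9, 7, 7, 5, 5, 5, 44]
total = sum(counts)

# Running total of releases, then the same as a share of all releases.
cumulative_counts = list(accumulate(counts))
cumulative_share = [c / total for c in cumulative_counts]

for rank, (c, share) in enumerate(zip(cumulative_counts, cumulative_share), start=1):
    print(f"top {rank:>2} rows: {c:>3} releases ({share:.1%})")
```

Read this way, the five largest studios account for 82 of the 199 releases, about 41%.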
Visualizing Data: Bar Charts
• It’s often easier to look at a bar chart than at a
frequency table
Visualizing Data: Pie Charts
• Differences compared to bar chart and cumulative distribution?
Which is more useful?
Caveats About Bar and Pie Charts
• These figures are only appropriate if
observations fall into only one of the
categories
– Pie charts should add to 100%
– These visual representations focus on a single
categorical variable; they can be generalized to
analyze combinations
Examining Relationships Among
Categorical Variables
• Contingency tables (cross-tabs) let us examine
patterns among multiple categorical variables
• Do studios release the same types of movies?
– Studio and genre
– Studio and rating
Studio and MPAA Ratings
• There are 351 movies released by these 4 studios in
our data
• Of the 351, 23 were rated G (marginal distribution)
• In our data, 20 G-rated movies were released by Buena
Vista (individual cell)
Studio and MPAA Ratings
• Contingency tables can be formatted to show
what fraction each cell is of the total
Studio and MPAA Ratings
• We want to know the ratings of movies that
different studios produce
• Can format the contingency table to display what
proportion of each studio’s movies have different
ratings
Studio and MPAA Ratings
• Which studios make the most movies of different
MPAA ratings?
• The contingency table can show us what
percentage of R rated movies are made by each
studio (conditional distribution)
Displaying Conditional Distributions
• Can also display as a segmented bar chart
• If the conditional distribution is the same across different
categories, the two variables are said to be independent
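A small Python sketch of marginal and conditional distributions and the independence check. The counts are invented for illustration and are not the course's studio data; "Studio A" and "Studio B" are hypothetical names.

```python
import math

# Hypothetical contingency table (invented counts, NOT the course data).
table = {
    "Studio A": {"PG-13": 30, "R": 20},
    "Studio B": {"PG-13": 12, "R": 8},
}
ratings = ["PG-13", "R"]
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of rating: column totals over the grand total.
marginal = {
    r: sum(row[r] for row in table.values()) / grand_total for r in ratings
}

# Conditional distribution of rating given studio: each row over its row total.
conditional = {
    studio: {r: row[r] / sum(row.values()) for r in ratings}
    for studio, row in table.items()
}

# The variables look independent if every conditional distribution
# matches the marginal distribution.
independent = all(
    math.isclose(conditional[studio][r], marginal[r])
    for studio in table
    for r in ratings
)

print(marginal)
print(conditional)
print("independent:", independent)
```

In this toy table both studios release 60% PG-13 and 40% R movies, so the conditional distributions coincide. With real data the match is rarely exact, so the comparison is a judgment about how different the distributions are.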
Recap of Techniques
• Frequency tables, bar chart, pie chart
– Useful for looking at the distribution of a single
categorical variable
• Contingency tables, side-by-side and segmented
bar charts
– Useful for examining potential relationships among
categorical variables
• Present your results in a way that is consistent
with what you need to know
Analyzing Quantitative Variables
• Numerical methods
– Descriptive statistics
– Correlation
• Visual methods
– Histograms
– Box plots
– Time series plots
– Scatterplots
“Risk” and “Return”
• Investment performance
• Customer valuation
• Product demand
• Employee performance
Returns at Verizon
• The histograms show the frequency with
which different returns occur
Steps in Constructing Histograms
• Decide the width of each “bin”
– This will depend on the range you observe in the
data
• Determine how many observations fall into
each bin
• Decide which bin observations on the border
fall into
– Typically assigned to the higher bin
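The steps above can be sketched in Python. The returns below are made-up whole-number percentages, not Verizon data, and the helper name `histogram` is illustrative. Flooring the bin index puts a border value into the higher bin.

```python
import math

def histogram(values, start, width):
    """Count observations per bin; each bin covers [lower, lower + width)."""
    counts = {}
    for v in values:
        # floor() sends a value sitting exactly on a border to the higher bin.
        index = math.floor((v - start) / width)
        lower = start + index * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical daily returns in percent (illustrative only).
returns = [-3, -1, 0, 0, 1, 2, 2, 4]

bins = histogram(returns, start=-4, width=2)
for lower, n in bins.items():
    print(f"[{lower:>2}, {lower + 2:>2}): {'#' * n}")
```

The two returns of exactly 0 fall on the border between the bins starting at -2 and 0, and are counted in the bin starting at 0.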
Common Descriptive Statistics for
Quantitative Data
• Measures of central tendency
– Mean – the average value
– Median – the “middle” value
• Measures of dispersion
– Standard deviation
– Variance
– Range
– Inter-quartile Range (IQR)
Excel Formulas for Measuring Central
Tendencies
• Mean: =average(data_range)
$$\bar{y} = \frac{\sum_{i=1}^{N} y_i}{N}$$
• Median: =median(data_range)
– Unlike the mean, the median is not sensitive to
extreme values
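A short Python sketch of how one extreme value moves the mean but barely moves the median. The numbers are hypothetical customer values chosen for illustration; `statistics.mean` and `statistics.median` play the role of Excel's `=average` and `=median`.

```python
from statistics import mean, median

# Hypothetical customer values in dollars (illustrative only).
values = [600, 700, 800, 900, 1000]
with_extreme = values + [10000]  # add one very large customer

print("mean:  ", mean(values), "->", round(mean(with_extreme), 2))
print("median:", median(values), "->", median(with_extreme))
```

A mean well above the median, as here after the extreme value is added, is a typical sign of a right-skewed distribution.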
Examining a Customer Portfolio
• From an analysis of subscribers to a large US telecom
provider
• Mean CLV ~ $1200; Median CLV ~ $800
– A distribution that tails off to the right is right-skewed
– Who do you go after?
Excel Formulas for Measuring
Dispersion
• Variance: =var(data_range)
$$s^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N - 1}$$
• Standard deviation: =stdev(data_range)
$$s = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N - 1}}$$
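A Python sketch that computes the sample variance and standard deviation directly from the formulas (dividing by N - 1) and compares them with the standard library's `statistics.variance` and `statistics.stdev`, which use the same divisor. The data values are illustrative.

```python
import math
from statistics import stdev, variance

# Illustrative data (not from the course).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
y_bar = sum(data) / n

# Sample variance: squared deviations from the mean, divided by N - 1.
s2 = sum((y - y_bar) ** 2 for y in data) / (n - 1)
# Standard deviation: the square root of the variance.
s = math.sqrt(s2)

print(f"by hand:    variance {s2:.4f}, std dev {s:.4f}")
print(f"statistics: variance {variance(data):.4f}, std dev {stdev(data):.4f}")
```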
Excel Formulas for Measuring
Dispersion
• Range = largest value – smallest value
=max(data_range)-min(data_range)
• IQR = range containing the middle 50% of the
data
=percentile(data_range,.75)-percentile(data_range,.25)
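A Python sketch of the range and IQR. It assumes that `statistics.quantiles` with `method="inclusive"` interpolates between data points the same way Excel's `percentile` does; treat that equivalence as an assumption to check against your own spreadsheet. The data are illustrative.

```python
from statistics import quantiles

# Illustrative data (not from the course).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Quartiles: n=4 returns the three cut points (25th, 50th, 75th percentiles).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# IQR: width of the interval holding the middle 50% of the data.
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```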
Dealing with Outliers
• Outliers are observations that stand apart
from the majority of observations
– Can heavily influence our analysis and conclusions
– Might be errors
– Should be noted in any conclusions drawn from
the data
Temporal Data
• Time series plots can be used to see temporal
patterns in the data
Stock Performance
Time Series and Forecasting
[Figure: sales plotted over time, showing trend and cycle components]
Assessing Relationships between
Quantitative Variables
• Scatterplots let us examine the association between two
variables
– Consider the direction, form, and extent of dispersion
• Daily returns of Verizon vs. S&P 500 in 2010
Assessing Relationships between
Quantitative Variables
• The correlation assesses the strength of the
linear relationship between two quantitative
variables
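=correl(data_X,data_Y)
$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma(X) \times \sigma(Y)} = \frac{\frac{1}{n-1} \sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n-1} \sum (X_i - \bar{X})^2} \times \sqrt{\frac{1}{n-1} \sum (Y_i - \bar{Y})^2}}$$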
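A Python sketch that follows the correlation formula term by term: sample covariance divided by the product of the two sample standard deviations. The function name `correlation` and the data are illustrative; in Excel the same number comes from `=correl`.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: Cov(X, Y) / (sd(X) * sd(Y)), all with n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

# Illustrative data (not the Verizon or S&P 500 returns).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print(f"r = {r:.3f}")
```

The n - 1 factors cancel between numerator and denominator, so r always lies between -1 and +1 regardless of sample size.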
Correlation Analysis in Finance
• Correlation matrix of daily returns for Verizon,
Comcast, and AT&T
Some Examples of Correlation
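[Figure: three scatterplots of Y against X, with r = +0.4, r = +0.9, and r = -0.7]
Correlation can be misleading. Beware of...
[Figure: two scatterplots of Y against X, labeled r = 0.8 (“outliers!”) and r = 0 (“non-linearities!”)]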
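Both warnings can be reproduced with a few made-up points. The Python sketch below uses invented data and an illustrative helper `pearson_r`; it is not the data behind the slide's plots.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Non-linearity: Y is completely determined by X (Y = X squared), yet r = 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

# Outlier: four points with a weak negative association...
base_x, base_y = [1, 2, 3, 4], [2, 1, 2, 1]
r_without = pearson_r(base_x, base_y)
# ...plus a single extreme point that dominates the calculation.
r_with = pearson_r(base_x + [20], base_y + [20])

print(f"curve:           r = {r_curve:.2f}")
print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

Plotting the data before trusting r would catch both cases: the curve is obvious in a scatterplot, and so is the lone extreme point.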
Recap of Techniques
• Examining individual variables
– Descriptive statistics
– Histograms
– Time series plots
• Examining potential relationships among
multiple variables
– Correlation (quantitative vs. quantitative data)
– Scatterplots (quantitative vs. quantitative data)