One Crank or Two?

Data Science and Intro Stat
Kari Lock Morgan
Assistant Professor of Statistics, Penn State University
ECOTS, May 2018

Data Science in Intro
How should intro stat adapt in an era of abundant availability and use of data?

Data Science and Intro Stat
[Diagram: three overlapping areas labeled Computing, Statistics (Concepts, Methods, Theory), and Domain knowledge. Annotations: "Data Science?", "as needed to make sense of data", "Intro Stat?", "simple".]

How to Adapt?
  • Focus on making sense of data!

Focus on Making Sense of Data
[Formula slide: symbols not recoverable from the source text]

How to Adapt?
  • Focus on making sense of data! What kind?!?

Data Collection
Classical statistics: Design (randomness), Inference!
  • Ask a question
  • Collect (small) data to answer it
Data science? Inference?
  • Obtain available (big) data
  • See what it tells you

Data Quality vs Quantity
Which provides a better (MSE) estimate?
  a) A simple random sample of n = 100
  b) A non-random sample of n = 50 million (!) (say, from the US population of 320 million) with a relatively small correlation of 0.05 between x and the probability of inclusion
The small random sample!!!

Meng, X.L. (2016). Discussion of "Perils and potentials of self-selected entry to epidemiological studies and surveys", Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(2), 319-376.
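The comparison above can be checked with a small simulation. This is only a sketch under stated assumptions, not Meng's own calculation: the population is scaled down to 20,000 standard-normal values (instead of 320 million), the large sample keeps the same sampling fraction of 50/320, and the 0.05 is treated as the correlation between x and the 0/1 inclusion indicator. The function name `simulate_mse` and all of its parameters are made up for illustration.

```python
import random
import statistics


def simulate_mse(pop_size=20_000, frac=50 / 320, rho=0.05,
                 n_srs=100, reps=300, seed=1):
    """Return (MSE of SRS mean, MSE of large non-random sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population: standard normal values (an assumption).
    pop = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]
    truth = statistics.fmean(pop)
    sd = statistics.pstdev(pop)

    # Inclusion probability rises linearly with x. The slope is chosen so
    # the correlation between x and the inclusion indicator is about rho.
    slope = rho * (frac * (1 - frac)) ** 0.5 / sd
    probs = [min(1.0, max(0.0, frac + slope * (x - truth))) for x in pop]

    sq_err_srs, sq_err_big = [], []
    for _ in range(reps):
        # a) simple random sample of size n_srs
        srs = rng.sample(pop, n_srs)
        # b) large non-random sample: each unit enters independently with
        #    its own probability, so the sample size varies a little
        big = [x for x, p in zip(pop, probs) if rng.random() < p]
        sq_err_srs.append((statistics.fmean(srs) - truth) ** 2)
        sq_err_big.append((statistics.fmean(big) - truth) ** 2)

    return statistics.fmean(sq_err_srs), statistics.fmean(sq_err_big)


if __name__ == "__main__":
    mse_srs, mse_big = simulate_mse()
    print(f"MSE, random sample of 100:    {mse_srs:.4f}")
    print(f"MSE, large non-random sample: {mse_big:.4f}")
```

Under these assumptions, theory puts the random sample's MSE near σ²/100 = 0.01σ², while the large sample's squared bias alone is about ρ²(1 − f)/f · σ² = 0.0025 × 5.4 = 0.0135σ², so the 100 randomly chosen values should come out ahead.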

Data Quality over Quantity
  • For population inference, a small random sample beats a large biased sample
  • For causality, a small randomized experiment beats a large observational study
  • (Statistics beats data science?)
  • Design (randomness) remains important
  • Inference remains important!
      How far might the estimate be from the truth?
      Is the effect more than might be seen by chance?

But…
  • Random sampling/assignment is hard!
  • Non-random data are EVERYWHERE!!!
  • For intro stat to remain relevant, we have to acknowledge and embrace the abundance of available data.
  • AP Stat theme 2 (of 4): "Data must be collected according to a well-developed plan if valid information is to be obtained."

How to Adapt?
  • Focus on making sense of data!
  • Keep some design and inference
  • Do more with available data

Design and Inference
  • Random sampling and assignment
  • Inferential concepts: sampling variability, interval estimation, hypothesis testing
  • How can we cover this more efficiently? Simulation-based inference: more for less
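As one illustration of what simulation-based inference can look like, here is a minimal sketch of a bootstrap percentile interval and a randomization test for a difference in means. The talk does not prescribe any particular code; the function names, the default of 2000 resamples, and the data in the usage lines are all assumptions made for this example.

```python
import random
import statistics


def bootstrap_interval(data, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean (interval estimation)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(data, k=n))
                   for _ in range(reps))
    lo = means[int((1 - level) / 2 * reps)]
    hi = means[int((1 + level) / 2 * reps) - 1]
    return lo, hi


def randomization_p_value(group_a, group_b, reps=2000, seed=0):
    """Two-sided randomization test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        # Reassign group labels at random, as if the groups did not matter.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (extreme + 1) / (reps + 1)


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    treatment = [12.1, 14.3, 11.8, 15.0, 13.6, 14.9, 12.7, 13.3]
    control = [10.2, 11.9, 10.8, 12.4, 11.1, 9.7, 11.5, 10.9]
    print("95% interval for treatment mean:", bootstrap_interval(treatment))
    print("randomization p-value:", randomization_p_value(treatment, control))
```

Both functions answer the two inferential questions with the same resampling idea: the interval addresses how far the estimate might be from the truth, and the p-value addresses whether the effect is more than might be seen by chance.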

What can we cut? Let's prioritize the good stuff!

Available Data
  • Acknowledge that not all data come from question -> design -> inference
  • Data quality and limitations (e.g. sampling bias, confounding, missing data)
  • Inferential cautions (e.g. multiple testing, sample size, non-random)
  • Multivariable thinking
  • Highlight the abundance, diversity, and omnipresence of data

One Way to Start
  • www.gapminder.org/tools/
  • www.gapminder.org/data/

How to Adapt?
  • Focus on making sense of data!
  • Keep some design and inference
  • Do more with available data
  • Emphasize the overlap

Emphasize Overlap

  • EDA, especially data visualization
  • Choice of graph/stat/parameter/method
  • Modeling
  • Interpretation and communication
  • Context, background, real conclusions
  • Technology

Technology in Intro Stat
Use technology in a way that
  • engages students
  • eliminates tedious work
  • excites students
  • enhances conceptual understanding
  • empowers students to make sense of data
  • extendable or easy?

Data Science in Intro
Let's think about how to keep intro stat relevant in an era of data science!
My opinion:
  • Focus on making sense of data
  • Acknowledge that not all data analysis is question -> purposeful design -> inference
  • But that the above remains valuable!
  • Emphasize the overlap
What do you all think?!?
www.tricider.com/brainstorming/3R3ZmK3a02l
[email protected]
