Description

Statistics and probability control your life. I don't just mean what YouTube's algorithm recommends you watch next, or the chance of meeting your future significant other in class or at a bar. Human behavior, single-cell organisms, earthquakes, the stock market, whether it will snow in the first week of December, and countless other phenomena are probabilistic and statistical. Even the deepest, most fundamental structure of the universe is governed by probability and statistics.

You need to understand statistics.

Nearly all areas of human civilization are incorporating code and numerical computations. This means that many jobs and areas of study are based on applications of statistical and machine-learning techniques in programming languages like Python and MATLAB. This is often called 'data science' and is an increasingly important topic. Statistics and machine learning are also fundamental to artificial intelligence (AI) and business intelligence.

If you want to make yourself a future-proof employee, employer, or researcher in any technical field -- from data science to engineering to deep learning modeling -- you'll need to know statistics and machine learning. And you'll need to know how to implement concepts like probability theory and confidence intervals, k-means clustering and PCA, Spearman correlation and logistic regression, in languages like Python or MATLAB.
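As a taste of the kind of implementation work described above, here is a minimal Python sketch of two of the named concepts: a Spearman rank correlation and a 95% confidence interval. It assumes NumPy and SciPy are installed; the simulated data and variable names are illustrative, not course material.

```python
import numpy as np
from scipy import stats

# Simulate two noisily related variables (hypothetical data)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(size=100)

# Spearman rank correlation: a nonparametric measure of monotonic association
rho, p = stats.spearmanr(x, y)

# 95% confidence interval on the mean of x, using the t-distribution
ci = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=stats.sem(x))
print(f"rho = {rho:.3f}, p = {p:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The course covers both the formulas behind these one-liners and how to interpret their output.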

There are five reasons why you should take this course:

  • This course covers everything you need to understand the fundamentals of statistics, machine learning, and data science, from bar plots to ANOVAs, regression to k-means, t-test to non-parametric permutation testing.

  • After completing this course, you will be able to understand a wide range of statistical and machine-learning analyses, even specific advanced methods that aren't taught here. That's because you will learn the foundations upon which advanced methods are built.

  • This course balances mathematical rigor with intuitive explanations and hands-on explorations in code.

  • Enrolling in the course gives you access to the Q&A, in which I actively participate every day.

  • I've been studying, developing, and teaching statistics for over 20 years, and I think math is, like, really cool.

What you need to know before taking this course:

  • High-school level maths. This is an applications-oriented course, so I don't go into a lot of detail about proofs, derivations, or calculus.

  • Basic coding skills in Python or MATLAB. This is necessary only if you want to follow along with the code. You can successfully complete this course without writing a single line of code! But participating in the coding exercises will help you learn the material. The MATLAB code relies on the Statistics and Machine Learning toolbox (you can use Octave if you don't have MATLAB or the statistics toolbox). Python code is written in Jupyter notebooks.

  • I recommend taking my free course called "Statistics literacy for non-statisticians". It's 90 minutes long and will give you a bird's-eye view of the main topics in statistics that I cover in much more detail here in this course. The free short course is not required for this course, but it complements this course nicely. And you can get through the whole thing in about an hour if you watch it at 1.5x speed!

  • You do not need any previous experience with statistics, machine learning, deep learning, or data science. That's why you're here!

Is this course up to date?

Yes, I maintain all of my courses regularly. I add new lectures to keep the course "alive," and I add new lectures (or sometimes re-film existing lectures) to explain maths concepts better if students find a topic confusing or if I made a mistake in the lecture (rare, but it happens!).

You can check the "Last updated" text at the top of this page to see when I last worked on improving this course!

What if you have questions about the material?

This course has a Q&A (question and answer) section where you can post your questions about the course material (about the maths, statistics, coding, or machine learning aspects). I try to answer all questions within a day. You can also see all other questions and answers, which really improves how much you can learn! And you can contribute to the Q&A by posting to ongoing discussions.

And, you can also post your code for feedback or just to show off -- I love it when students actually write better code than me! (Ahem, doesn't happen so often.)

What should you do now?

First of all, congrats on reading this far; that means you are seriously interested in learning statistics and machine learning. Watch the preview videos, check out the reviews, and, when you're ready, invest in your brain by learning from this course!

What you will learn

Descriptive statistics (mean, variance, etc.)

Inferential statistics

T-tests, correlation, ANOVA, regression, clustering

The math behind the "black box" statistical methods

How to implement statistical methods in code

How to interpret statistics correctly and avoid common misunderstandings

Coding techniques in Python and MATLAB/Octave

Machine learning methods like clustering, predictive analysis, classification, and data cleaning
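To make the first of the learning goals above concrete, descriptive statistics and z-score standardization fit in a few lines of standard-library Python. This toy example, with made-up data, is my own sketch and not part of the course materials:

```python
import statistics

data = [4.0, 8.0, 6.0, 5.0, 3.0, 7.0, 9.0, 5.0]

mean = statistics.mean(data)       # central tendency
var = statistics.variance(data)    # sample variance (n-1 denominator)
sd = statistics.stdev(data)        # sample standard deviation

# z-score standardization: center at 0, rescale to unit standard deviation
z = [(x - mean) / sd for x in data]

print(f"mean = {mean}, variance = {var}")
```

The course builds from these basics up to the inferential and machine-learning methods in the rest of the list.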

Requirements

  • Good work ethic and motivation to learn.
  • Previous background in statistics or machine learning is not necessary.
  • Python -OR- MATLAB with the Statistics toolbox (or Octave).
  • Some coding familiarity for the optional code exercises.
  • No textbooks necessary! All materials are provided inside the course.

Course content

19 sections

Introductions

5 lectures
[Important] Getting the most out of this course
04:28
About using MATLAB or Python
04:09
Statistics guessing game!
08:47
Using the Q&A forum
05:16
(optional) Entering time-stamped notes in the Udemy video player
01:52

Math prerequisites

8 lectures
Should you memorize statistical formulas?
03:12
Arithmetic and exponents
04:02
Scientific notation
05:53
Summation notation
04:21
Absolute value
03:04
Natural exponent and logarithm
08:00
The logistic function
08:58
Rank and tied-rank
06:30

IMPORTANT: Download course materials

1 lecture
Download materials for the entire course!
04:40

What are (is?) data?

7 lectures
Is "data" singular or plural?!?!!?!
01:53
Where do data come from and what do they mean?
06:09
Types of data: categorical, numerical, etc
14:56
Code: representing types of data on computers
08:58
Sample vs. population data
12:02
Samples, case reports, and anecdotes
05:31
The ethics of making up data
06:57

Visualizing data

14 lectures
Bar plots
11:37
Code: bar plots
16:59
Box-and-whisker plots
05:41
Code: box plots
08:41
"Unsupervised learning": Boxplots of normal and uniform noise
02:31
Histograms
11:16
Code: histograms
16:40
"Unsupervised learning": Histogram proportion
02:22
Pie charts
05:59
Code: pie charts
13:22
When to use lines instead of bars
06:11
Linear vs. logarithmic axis scaling
09:04
Code: line plots
07:24
"Unsupervised learning": log-scaled plots
01:44

Descriptive statistics

25 lectures
Descriptive vs. inferential statistics
04:31
Accuracy, precision, resolution
07:28
Data distributions
11:26
Code: data from different distributions
32:08
"Unsupervised learning": histograms of distributions
01:57
The beauty and simplicity of Normal
05:29
Measures of central tendency (mean)
12:47
Measures of central tendency (median, mode)
12:17
Code: computing central tendency
13:57
"Unsupervised learning": central tendencies with outliers
03:07
Measures of dispersion (variance, standard deviation)
17:48
Code: Computing dispersion
26:33
Interquartile range (IQR)
04:53
Code: IQR
15:58
QQ plots
07:20
Code: QQ plots
15:34
Statistical "moments"
08:23
Histograms part 2: Number of bins
10:00
Code: Histogram bins
12:24
Violin plots
03:19
Code: violin plots
10:09
"Unsupervised learning": asymmetric violin plots
02:31
Shannon entropy
11:02
Code: entropy
20:15
"Unsupervised learning": entropy and number of bins
01:26

Data normalizations and outliers

18 lectures
Garbage in, garbage out (GIGO)
04:10
Z-score standardization
09:25
Code: z-score
12:50
Min-max scaling
05:06
Code: min-max scaling
08:16
"Unsupervised learning": Invert the min-max scaling
02:35
What are outliers and why are they dangerous?
14:26
Removing outliers: z-score method
09:26
The modified z-score method
04:03
Code: z-score for outlier removal
22:30
"Unsupervised learning": z vs. modified-z
02:38
Multivariate outlier detection
09:26
Code: Euclidean distance for outlier removal
09:01
Removing outliers by data trimming
05:47
Code: Data trimming to remove outliers
11:03
Non-parametric solutions to outliers
04:40
Nonlinear data transformations
13:46
An outlier lecture on personal accountability
03:03

Probability theory

24 lectures
What is probability?
12:17
Probability vs. proportion
09:25
Computing probabilities
10:28
Code: compute probabilities
14:34
Probability and odds
04:58
"Unsupervised learning": probabilities of odds-space
02:30
Probability mass vs. density
13:06
Code: compute probability mass functions
11:37
Cumulative distribution functions
13:46
Code: cdfs and pdfs
10:10
"Unsupervised learning": cdf's for various distributions
02:25
Creating sample estimate distributions
18:31
Monte Carlo sampling
02:53
Sampling variability, noise, and other annoyances
08:41
Code: sampling variability
26:15
Expected value
10:09
Conditional probability
12:45
Code: conditional probabilities
20:12
Tree diagrams for conditional probabilities
06:24
The Law of Large Numbers
09:50
Code: Law of Large Numbers in action
19:23
The Central Limit Theorem
10:34
Code: the CLT in action
16:21
"Unsupervised learning": Averaging pairs of numbers
02:09

Hypothesis testing

12 lectures
IVs, DVs, models, and other stats lingo
16:45
What is an hypothesis and how do you specify one?
15:08
Sample distributions under null and alternative hypotheses
10:38
P-values: definition, tails, and misinterpretations
18:54
P-z combinations that you should memorize
06:51
Degrees of freedom
12:21
Type 1 and Type 2 errors
14:18
Parametric vs. non-parametric tests
09:12
Multiple comparisons and Bonferroni correction
12:36
Statistical vs. theoretical vs. clinical significance
06:51
Cross-validation
11:30
Statistical significance vs. classification accuracy
11:12

The t-test family

14 lectures
Purpose and interpretation of the t-test
13:13
One-sample t-test
08:08
Code: One-sample t-test
20:46
"Unsupervised learning": The role of variance
02:50
Two-samples t-test
13:06
Code: Two-samples t-test
22:09
"Unsupervised learning": Importance of N for t-test
04:45
Wilcoxon signed-rank (nonparametric t-test)
07:35
Code: Signed-rank test
18:33
Mann-Whitney U test (nonparametric t-test)
06:03
Code: Mann-Whitney U test
05:21
Permutation testing for t-test significance
11:25
Code: permutation testing
25:26
"Unsupervised learning": How many permutations?
05:21

Confidence intervals on parameters

7 lectures
What are confidence intervals and why do we need them?
08:45
Computing confidence intervals via formula
06:43
Code: compute confidence intervals by formula
17:11
Confidence intervals via bootstrapping (resampling)
08:58
Code: bootstrapping confidence intervals
14:32
"Unsupervised learning:" Confidence intervals for variance
01:25
Misconceptions about confidence intervals
06:22

Correlation

22 lectures
Motivation and description of correlation
18:19
Covariance and correlation: formulas
14:09
Code: correlation coefficient
27:49
Code: Simulate data with specified correlation
13:50
Correlation matrix
09:34
Code: correlation matrix
20:25
"Unsupervised learning": average correlation matrices
02:51
"Unsupervised learning": correlation to covariance matrix
04:16
Partial correlation
10:23
Code: partial correlation
19:55
The problem with Pearson
06:43
Nonparametric correlation: Spearman rank
07:17
Fisher-Z transformation for correlations
06:54
Code: Spearman correlation and Fisher-Z
07:40
"Unsupervised learning": Spearman correlation
01:28
"Unsupervised learning": confidence interval on correlation
02:25
Kendall's correlation for ordinal data
10:32
Code: Kendall correlation
18:09
"Unsupervised learning": Does Kendall vs. Pearson matter?
02:38
The subgroups correlation paradox
04:41
Cosine similarity
05:26
Code: Cosine similarity vs. Pearson correlation
21:19

Analysis of Variance (ANOVA)

11 lectures
ANOVA intro, part 1
17:51
ANOVA intro, part 2
19:56
Sum of squares
18:13
The F-test and the ANOVA table
07:28
The omnibus F-test and post-hoc comparisons
12:38
The two-way ANOVA
20:54
One-way ANOVA example
13:24
Code: One-way ANOVA (independent samples)
16:34
Code: One-way repeated-measures ANOVA
12:17
Two-way ANOVA example
11:17
Code: Two-way mixed ANOVA
14:28

Regression

18 lectures
Introduction to GLM / regression
19:53
Least-squares solution to the GLM
09:46
Evaluating regression models: R2 and F
16:17
Simple regression
13:17
Code: simple regression
09:12
"Unsupervised learning": Compute R2 and F
01:05
Multiple regression
13:01
Standardizing regression coefficients
12:18
Code: Multiple regression
18:42
Polynomial regression models
08:56
Code: polynomial modeling
15:46
"Unsupervised learning": Polynomial design matrix
00:51
Logistic regression
16:55
Code: Logistic regression
09:27
Under- and over-fitting
16:18
"Unsupervised learning": Overfit data
01:56
Comparing "nested" models
12:25
What to do about missing data
06:36

Statistical power and sample sizes

3 lectures
What is statistical power and why is it important?
10:05
Estimating statistical power and sample size
11:22
Compute power and sample size using G*Power
04:10

Clustering and dimension-reduction

14 lectures
K-means clustering
13:46
Code: k-means clustering
22:11
"Unsupervised learning:" K-means and normalization
01:53
"Unsupervised learning:" K-means on a Gauss blur
01:26
Clustering via dbscan
14:18
Code: dbscan
33:03
"Unsupervised learning": dbscan vs. k-means
03:04
K-nearest neighbor classification
06:20
Code: KNN
11:48
Principal components analysis (PCA)
16:34
Code: PCA
17:31
"Unsupervised learning:" K-means on PC data
01:35
Independent components analysis (ICA)
12:45
Code: ICA
12:40

Signal detection theory

9 lectures
The two perspectives of the world
05:29
d-prime
12:30
Code: d-prime
15:02
Response bias
08:02
Code: Response bias
04:15
F-score
22:01
Receiver operating characteristics (ROC)
07:34
Code: ROC curves
08:10
"Unsupervised learning": Make this plot look nicer!
01:33

A real-world data journey

10 lectures
Note about the code for this section
00:05
Introduction
04:22
MATLAB: Import and clean the marriage data
16:36
MATLAB: Import the divorce data
08:17
MATLAB: More data visualizations
06:32
MATLAB: Inferential statistics
10:45
Python: Import and clean the marriage data
20:37
Python: Import the divorce data
12:51
Python: Inferential statistics
11:24
Take-home messages
05:43

Bonus section

2 lectures
About deep learning
00:23
Bonus content
01:03
