3.27 statistics

Usage:

include statistics

import statistics as ...

The Pyret statistics library provides functions that calculate common statistical values of data sets, along with functions for statistical modeling of numerical data.

Every function in this library is available on the statistics module object. For example, if you used import statistics as S, you would write S.median to access median below. If you used include, then you can refer to identifiers without writing S. as a prefix.
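Here is a short illustration of the qualified style, reusing inputs like those documented below. S is simply the name chosen in the import line, and any other name would work the same way.

```pyret
# Qualified import: every statistics function is reached through the name S.
import statistics as S

check:
  S.median([list: 3, 1, 2]) is 2
  S.mean([list: 1, 2, 3]) is 2
end
```

With `include statistics` instead, the same checks would be written `median([list: 3, 1, 2])` and `mean([list: 1, 2, 3])`.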

3.27.1 Basic Statistical Values

mean :: (l :: List<Number>) -> Number

Calculates the arithmetic mean, also known as the average, of the numbers in l. This is simply the sum of all the values in the list, divided by its length.
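The definition above can be sketched in a few lines using Pyret's built-in fold. The function my-mean is a hypothetical name written for illustration and is not the library's source. In particular, on an empty list it fails with Pyret's own division-by-zero error instead of the library's "Empty List" message.

```pyret
# Hypothetical sketch of the arithmetic mean (my-mean); not the library's code.
fun my-mean(l :: List<Number>) -> Number:
  # add up every value, starting from 0
  total = fold(lam(acc, x): acc + x end, 0, l)
  # divide by the number of values (an empty list divides by zero)
  total / l.length()
end
```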

Examples:

check:
  mean([list: ]) raises "Empty List"
  mean([list: 1]) is 1
  mean([list: 2, 2, 4.5, 1.5, 1, 1]) is 2
end

median :: (l :: List<Number>) -> Number

Calculates the median of the numbers in l. This is the “middle-most” value in the list, if the values were sorted. If the list is of even length, returns the average of the two middle-most values.
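Here is a minimal sketch of the same rule: sort the list, then take the middle element, or average the two middle elements when the length is even. The name my-median is hypothetical and this is not the library's implementation. It also does not guard against an empty list, which simply fails on the out-of-range get.

```pyret
# Hypothetical sketch of the median (my-median); not the library's code.
fun my-median(l :: List<Number>) -> Number:
  ordered = l.sort()
  n = ordered.length()
  mid = num-floor(n / 2)
  if num-modulo(n, 2) == 1:
    # odd length: the single middle element
    ordered.get(mid)
  else:
    # even length: average the two middle elements
    (ordered.get(mid - 1) + ordered.get(mid)) / 2
  end
end
```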

Examples:

check:
  median([list: ]) raises "Empty List"
  median([list: 2]) is 2
  median([list: -1, 0, 1, 2, 5]) is 1
end

modes :: (l :: List<Number>) -> List<Number>

Calculates the modes of the numbers in l. These are the numbers that appear most often in the list. If no number appears more than once, returns the empty list. The modes will be returned in sorted order.

Computing the mode of a list of values is unambiguous when there is a unique “most common” element. Computer scientists and mathematicians agree that when two values are equally “most common”, they are both considered modes of the list. The natural generalization of this is that when all values occur equally often, they are all modes of the list. However, many high-school textbooks assert that when no element appears more than once, no element should be considered a mode. To avoid confusing high-school students, we adopt the definition they will find in their textbooks.
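One way to express this textbook definition is as follows. Count how often each value occurs. If the highest count is at most 1 (including the empty list), return nothing. Otherwise, return the distinct values that reach that count, in sorted order. The helper count-of and the function my-modes are hypothetical names. This sketch favors clarity over speed: it recounts values, so it takes quadratic time. It is not the library's implementation.

```pyret
# Hypothetical sketch of the textbook definition of modes; not the library's code.
fun count-of(v :: Number, l :: List<Number>) -> Number:
  # how many elements of l are equal to v
  l.filter(lam(x): x == v end).length()
end

fun my-modes(l :: List<Number>) -> List<Number>:
  # the highest number of times any value occurs (0 for an empty list)
  top = fold(lam(acc, v): num-max(acc, count-of(v, l)) end, 0, l)
  if top <= 1:
    # nothing repeats, so by the textbook convention there are no modes
    empty
  else:
    # keep every value that reaches the top count...
    winners = l.filter(lam(v): count-of(v, l) == top end)
    # ...drop the repeated copies...
    once-each = fold(
      lam(acc, v): if acc.member(v): acc else: link(v, acc) end end,
      empty,
      winners)
    # ...and return them in sorted order
    once-each.sort()
  end
end
```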

Examples:

check:
  modes([list: ]) is [list: ]
  modes([list: 1, 2, 3, 4]) is [list: ]
  modes([list: 1, 2, 3, 1, 4]) is [list: 1]
  modes([list: 1, 2, 1, 2, 2, 1]) is [list: 1, 2]
  modes([list: 1, 2, 2, 1, 2, 1]) is [list: 1, 2]
end

has-mode :: (l :: List<Number>) -> Boolean

Determines if a list of numbers has any modes, i.e., any repeated values.
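Given the definition of modes above, has-mode should be true exactly when modes returns a non-empty list. The check below states that relationship for two of the documented inputs. It illustrates the documented behavior and does not describe how has-mode is implemented.

```pyret
include statistics

check "has-mode is true exactly when modes is non-empty":
  no-repeats = [list: 1, 2, 3, 4]
  some-repeats = [list: 1, 2, 3, 2]
  # is-link tests whether a list has at least one element
  has-mode(no-repeats) is is-link(modes(no-repeats))
  has-mode(some-repeats) is is-link(modes(some-repeats))
end
```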

Examples:

check:
  has-mode([list: ]) is false
  has-mode([list: 1, 2, 3, 4]) is false
  has-mode([list: 1, 2, 2, 1, 2, 2]) is true
  has-mode([list: 1, 2, 3, 2]) is true
end

mode-smallest :: (l :: List<Number>) -> Number

Returns the smallest mode of a list of numbers. Raises an error if the list is empty or has no repeated values.

Examples:

check:
  mode-smallest([list: ]) raises "empty"
  mode-smallest([list: 1]) raises "no duplicate values"
  mode-smallest([list: 1, 2, 3, 4, 5]) raises "no duplicate values"
  mode-smallest([list: 1, 1, 2]) is 1
  mode-smallest([list: 1, 2, 1, 2]) is 1
end

mode-largest :: (l :: List<Number>) -> Number

Returns the largest mode of a list of numbers. Raises an error if the list is empty or has no repeated values.

Examples:

check:
  mode-largest([list: ]) raises "empty"
  mode-largest([list: 1]) raises "no duplicate values"
  mode-largest([list: 1, 2, 3, 4, 5]) raises "no duplicate values"
  mode-largest([list: 1, 1, 2]) is 1
  mode-largest([list: 1, 2, 1, 2]) is 2
end

mode-any :: (l :: List<Number>) -> Number

Returns an arbitrary mode of a list of numbers. Raises an error if the list is empty or has no repeated values.

Examples:

check:
  mode-any([list: ]) raises "empty"
  mode-any([list: 1]) raises "no duplicate values"
  mode-any([list: 1, 2, 3, 4, 5]) raises "no duplicate values"
  mode-any([list: 1, 1, 2]) is 1
  mode-any([list: 1, 2, 1, 2]) satisfies lam(m): (m == 1) or (m == 2) end
end
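Because modes returns its result in sorted order, the three single-mode functions can be related to it. The smallest mode is the first element of that list, the largest mode is the last element, and mode-any returns some member of it. The sample list below is made up for illustration.

```pyret
include statistics

check "single-mode functions agree with the sorted modes list":
  # 1, 3 and 5 each occur twice; 2 occurs once
  sample = [list: 5, 1, 5, 3, 1, 3, 2]
  ms = modes(sample)
  ms is [list: 1, 3, 5]
  mode-smallest(sample) is ms.first
  mode-largest(sample) is ms.last()
  mode-any(sample) satisfies lam(m): ms.member(m) end
end
```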

stdev :: (l :: List<Number>) -> Number

Computes the population standard deviation (also called the uncorrected sample standard deviation) of the numbers in l: the square root of the average squared deviation of each value from the mean.
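The population formula can be sketched directly: take the square root of the mean squared deviation from the mean. The name my-stdev is hypothetical and this is not the library's code. On the documented example, the mean is 5, the squared deviations sum to 32, and the square root of 32 / 8 is 2.

```pyret
# Hypothetical sketch of the population standard deviation (my-stdev); not the library's code.
fun my-stdev(l :: List<Number>) -> Number:
  n = l.length()
  avg = fold(lam(acc, x): acc + x end, 0, l) / n
  # sum of squared deviations from the mean
  ss = fold(lam(acc, x): acc + ((x - avg) * (x - avg)) end, 0, l)
  # population version: divide by n
  num-sqrt(ss / n)
end
```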

Examples:

check:
  stdev([list: ]) raises "list is empty"
  stdev([list: 2]) is 0
  stdev([list: 2, 4, 4, 4, 5, 5, 7, 9]) is 2
end

stdev-sample :: (l :: List<Number>) -> Number

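Computes the corrected sample standard deviation of the numbers in l. Unlike stdev, it divides the sum of squared deviations by n - 1 (Bessel's correction) rather than by n, where n is the length of l. A list with a single value therefore causes a division by zero.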
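The two standard deviations differ only in their divisor (n - 1 versus n). For the same data, stdev-sample should therefore equal stdev scaled by the square root of n / (n - 1). This check states that identity on the documented example, where 2 times the square root of 8 / 7 is about 2.13809.

```pyret
include statistics

check "stdev-sample rescales stdev by sqrt(n / (n - 1))":
  sample = [list: 2, 4, 4, 4, 5, 5, 7, 9]
  n = sample.length()
  stdev-sample(sample) is-roughly stdev(sample) * num-sqrt(n / (n - 1))
end
```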

Examples:

check:
  stdev-sample([list: ]) raises "list is empty"
  stdev-sample([list: 2]) raises "division by zero"
  stdev-sample([list: 2, 4, 4, 4, 5, 5, 7, 9]) is-roughly 2.1380899
end

3.27.2 Statistical Models

Pyret currently supports two functions for working with simple linear-regression models. Further support will be added over time.

linear-regression :: (X :: List<Number>, Y :: List<Number>) -> (Number -> Number)

Fits a simple linear model of the relationship between an independent variable (X) and a dependent variable (Y) using ordinary least-squares regression. The result is a predictor function that returns the predicted y-value for a given x-value.
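Ordinary least squares with a single predictor has a closed form. The slope is the sum of (x - x-bar)(y - y-bar) divided by the sum of (x - x-bar) squared. The intercept is y-bar minus the slope times x-bar, where x-bar and y-bar are the means of the two lists. The sketch below uses that textbook formula. The names total and my-linear-regression are hypothetical, and the library's own code may be organized differently. For the documented data, it produces the line y = 3 - x.

```pyret
# Hypothetical sketch of simple ordinary least squares; not the library's code.
fun total(l :: List<Number>) -> Number:
  fold(lam(acc, v): acc + v end, 0, l)
end

fun my-linear-regression(xs :: List<Number>, ys :: List<Number>) -> (Number -> Number):
  x-bar = total(xs) / xs.length()
  y-bar = total(ys) / ys.length()
  # numerator: sum of (x - x-bar)(y - y-bar) over paired points
  sxy = total(map2(lam(x, y): (x - x-bar) * (y - y-bar) end, xs, ys))
  # denominator: sum of (x - x-bar) squared
  sxx = total(map(lam(x): (x - x-bar) * (x - x-bar) end, xs))
  slope = sxy / sxx
  intercept = y-bar - (slope * x-bar)
  # the fitted line, returned as a predictor function
  lam(x): intercept + (slope * x) end
end
```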

Examples:

check:
  predictor = linear-regression([list: 0, 1, 2, 3], [list: 3, 2, 1, 0])
  predictor(1) is-roughly 2
  predictor(1.5) is-roughly 1.5
  predictor(1000) is-roughly -997
end

r-squared :: (X :: List<Number>, Y :: List<Number>, f :: (Number -> Number)) -> Number

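Calculates the coefficient of determination (R²), which measures how well the predictor function f (for example, one produced by linear-regression) fits the observed data points given by X and Y. A value of 1 means a perfect fit. The value can be negative when f fits the data worse than always predicting the mean of Y.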
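Here is a sketch of the usual definition, R² = 1 - SS_res / SS_tot. SS_res sums the squared gaps between each observed y and f(x). SS_tot sums the squared gaps between each observed y and the mean of the ys. The names total and my-r-squared are hypothetical. This definition reproduces the documented results: for f-bad, SS_res = 0 + 1 + 4 + 9 = 14 and SS_tot = 5, giving 1 - 14/5 = -1.8.

```pyret
# Hypothetical sketch of R^2 = 1 - SS_res / SS_tot; not the library's code.
fun total(l :: List<Number>) -> Number:
  fold(lam(acc, v): acc + v end, 0, l)
end

fun my-r-squared(xs :: List<Number>, ys :: List<Number>, f :: (Number -> Number)) -> Number:
  y-bar = total(ys) / ys.length()
  # residual sum of squares: distance of each observed y from f's prediction
  ss-res = total(map2(lam(x, y): (y - f(x)) * (y - f(x)) end, xs, ys))
  # total sum of squares: distance of each observed y from the mean of ys
  ss-tot = total(map(lam(y): (y - y-bar) * (y - y-bar) end, ys))
  1 - (ss-res / ss-tot)
end
```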

Examples:

PI = ~3.1415926535

fun f-good(x): 3 - x end
fun f-poor(x): 3 * num-cos((x * PI) / 6) end
fun f-bad(x): 3 end

xs = [list: 0, 1, 2, 3]
ys = [list: 3, 2, 1, 0]
check:
  r-squared(xs, ys, f-good) is-roughly 1
  r-squared(xs, ys, f-poor) is-roughly 0.87846096
  r-squared(xs, ys, f-bad)  is-roughly -1.8
end
