assume series
Get quantiles for
OR
Generate descriptive statistics.

Descriptive statistics include those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types.

Parameters

percentiles
    The percentiles to include in the output. All should fall between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

include
    A white list of data types to include in the result. Ignored for Series. The options are:
    ‘all’ : All columns of the input will be included in the output.
    A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types submit numpy.number.
    None (default) : The result will include all numeric columns.

exclude
    A black list of data types to omit from the result. Ignored for Series.
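datetime_is_numeric
    Whether to treat datetime dtypes as numeric. This affects statistics calculated for the column. For DataFrame input, this also controls whether datetime columns are included by default.
    New in version 1.1.0.

Returns

Series or DataFrame
    Summary statistics of the Series or DataFrame provided.

Notes

For numeric data, the result’s index will include count, mean, std, min, max, and the lower, 50th and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50th percentile is the same as the median.

For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value and freq is that value’s frequency.

If multiple object values have the highest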
count, then the count and top results will be arbitrarily chosen from among those with the highest count.

For mixed data types provided via a DataFrame, the default is to return only an analysis of numeric columns.

The include and exclude parameters can be used to limit which columns in a DataFrame are analyzed for the output.

Examples

Describing a numeric Series.

>>> s = pd.Series([1, 2, 3])
>>> s.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
dtype: float64

Describing a categorical Series.

>>> s = pd.Series(['a', 'a', 'b', 'c'])
>>> s.describe()
count     4
unique    3
top       a
freq      2
dtype: object

Describing a timestamp Series.

>>> s = pd.Series([
...     np.datetime64("2000-01-01"),
...     np.datetime64("2010-01-01"),
...     np.datetime64("2010-01-01")
... ])
>>> s.describe(datetime_is_numeric=True)
count                      3
mean     2006-09-01 08:00:00
min      2000-01-01 00:00:00
25%      2004-12-31 12:00:00
50%      2010-01-01 00:00:00
75%      2010-01-01 00:00:00
max      2010-01-01 00:00:00
dtype: object

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame({'categorical': pd.Categorical(['d','e','f']),
...                    'numeric': [1, 2, 3],
...                    'object': ['a', 'b', 'c']
...                    })
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all
columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       categorical  numeric object
count            3      3.0      3
unique           3      NaN      3
top              f      NaN      a
freq             1      NaN      1
mean           NaN      2.0    NaN
std            NaN      1.0    NaN
min            NaN      1.0    NaN
25%            NaN      1.5    NaN
50%            NaN      2.0    NaN
75%            NaN      2.5    NaN
max            NaN      3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         a
freq        1

Including only categorical columns from a DataFrame description.

>>> df.describe(include=['category'])
       categorical
count            3
unique           3
top              d
freq             1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       categorical object
count            3      3
unique           3      3
top              f      a
freq             1      1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       categorical  numeric
count            3      3.0
unique           3      NaN
top              f      NaN
freq             1      NaN
mean           NaN      2.0
std            NaN      1.0
min            NaN      1.0
25%            NaN      1.5
50%            NaN      2.0
75%            NaN      2.5
max            NaN      3.0
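
The examples above all use the default percentiles. As a minimal sketch of a non-default percentiles argument, the following requests the 10th, 50th and 90th percentiles on a Series, then excludes numeric columns from a DataFrame summary. The data, the column names (price, label) and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the documentation above.
df = pd.DataFrame({
    "price": [10.0, 12.5, 11.0, 14.0, 13.5],
    "label": ["a", "b", "a", "c", "a"],
})

# Custom percentiles: every value must lie between 0 and 1.
# 0.5 is listed explicitly so the median row is requested on purpose.
numeric_summary = df["price"].describe(percentiles=[0.1, 0.5, 0.9])

# Omit numeric columns so that only the non-numeric column is summarized.
non_numeric_summary = df.describe(exclude=[np.number])

print(numeric_summary)
print(non_numeric_summary)
```

With these inputs the percentile rows of the first result are labelled 10%, 50% and 90%, and the second result contains only the label column with the count, unique, top and freq rows.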