Chart Gallery - Distribution

Histogram

Description

A histogram displays the distribution of data over a continuous interval or a specific time period. The height of each bar indicates the frequency of data points within that interval (bin). It is a useful tool for identifying where values are concentrated and whether the dataset contains extreme values or gaps.
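
As a minimal sketch of the binning described above, the following Python snippet counts how many values fall into each equal-width bin and prints a rough text histogram. The function name `histogram_counts`, its parameters, and the sample values are hypothetical and chosen only for illustration.

```python
def histogram_counts(values, bin_start, bin_width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    upper_edge = bin_start + bin_width * n_bins
    counts = [0] * n_bins
    for value in values:
        if value < bin_start or value > upper_edge:
            continue  # outside the binned range; ignored in this sketch
        # Bins are [left, right), except the last bin, which also
        # includes its right edge.
        index = min(int((value - bin_start) // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up sample data, for illustration only.
values = [1, 2, 2, 3, 7, 8, 8, 9, 15]
counts = histogram_counts(values, bin_start=0, bin_width=5, n_bins=4)

# Bar length is the frequency of each bin.
for i, count in enumerate(counts):
    left = i * 5
    print(f"{left:>2}-{left + 5:<2} | {'#' * count}")
```

In this sample, the empty 10-15 bin shows up as a gap and the single value in the last bin stands apart from the rest, which is the kind of pattern a histogram is meant to reveal.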

When to use

Histograms are good for showing the general distribution of a variable in a dataset. You can see roughly where the peaks of the distribution are, whether the distribution is skewed or symmetric, and whether there are any outliers.

Dos and don’ts

Always start the frequency axis at a zero baseline.

Use an appropriate number of bins and interpretable bin boundaries.

Don’t use unequal bin sizes.

Don’t use histograms for non-continuous data (use a bar/column chart instead).
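
One possible way to get equal-width bins with interpretable boundaries is to snap the bin edges to multiples of a round bin width. The sketch below assumes the bin width is chosen by hand; the helper name `round_bin_edges` and the sample values are hypothetical.

```python
import math


def round_bin_edges(values, bin_width):
    """Return equal-width bin edges aligned to multiples of bin_width."""
    low = math.floor(min(values) / bin_width) * bin_width
    high = math.ceil(max(values) / bin_width) * bin_width
    if high == low:
        # All values sit on one boundary; keep at least one bin.
        high = low + bin_width
    n_bins = round((high - low) / bin_width)
    return [low + i * bin_width for i in range(n_bins + 1)]


# Made-up sample data, for illustration only.
edges = round_bin_edges([3, 12, 18, 25, 47], bin_width=10)
print(edges)
```

Every bin produced this way has the same width, and the edges fall on round numbers that are easy to read off an axis.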

Tools available

MS Office, Power BI, Illustrator, R, D3.js, Python

Population pyramid

Description

A population pyramid consists of two back-to-back histograms, one for each gender (conventionally, males on the left and females on the right), where the population numbers are shown horizontally (X-axis) and the age groups vertically (Y-axis). The values can be displayed either as a percentage of the total population or as raw numbers.
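
The layout described above can be sketched as a small data-preparation step in Python: male values are negated so that they extend to the left of the centre line, and counts are optionally converted to a percentage of the total population. The function name `pyramid_rows` and the population figures are hypothetical, not real data.

```python
def pyramid_rows(age_groups, males, females, as_percent=True):
    """Build (age_group, left_value, right_value) rows for a pyramid.

    Male values are negated so they plot to the left of zero.
    """
    total = sum(males) + sum(females)
    rows = []
    for group, m, f in zip(age_groups, males, females):
        if as_percent:
            # Share of the total population, not of each gender.
            m, f = 100 * m / total, 100 * f / total
        rows.append((group, -m, f))
    return rows


# Made-up counts, for illustration only.
age_groups = ["0-14", "15-64", "65+"]
males = [30, 60, 10]
females = [25, 60, 15]

rows = pyramid_rows(age_groups, males, females)
for group, left, right in rows:
    print(f"{group:>6}  {left:7.1f}  {right:6.1f}")
```

When plotting rows like these, the axis labels on the left side would normally be shown as absolute values, since the negative sign is only a positioning device.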

When to use

Population pyramids are well suited to analyzing changes or differences in population groups. A population pyramid shows how the population breaks down by age and sex, and the overall shape of that breakdown can point to other characteristics of the population.

Dos and don’ts

Use a consistent colour to represent the same gender.

Sort the age groups in descending order, with the oldest group at the top.
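
Age-group labels are usually text, so a plain alphabetical sort can put "5-9" after "10-14". A minimal sketch of sorting by the numeric lower bound instead, assuming labels that start with a number such as "0-4" or "85+" (the label format and helper names are assumptions):

```python
import re


def lower_bound(label):
    """Extract the leading number from a label such as '10-14' or '85+'."""
    match = re.match(r"\d+", label)
    if match is None:
        raise ValueError(f"no leading number in age group label: {label!r}")
    return int(match.group())


def oldest_first(labels):
    """Sort age-group labels in descending order of age."""
    return sorted(labels, key=lower_bound, reverse=True)


ordered = oldest_first(["0-4", "10-14", "5-9", "85+"])
print(ordered)
```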

Tools available

MS Office, Power BI, Illustrator, R, D3.js, Python

Boxplot

Description

The boxplot (also known as a box-and-whisker plot) uses boxes and lines to show the distributions of one or more groups of numeric data based on a five-number summary: the upper extreme (maximum), upper quartile (Q3), median, lower quartile (Q1), and lower extreme (minimum). Through these five values, the boxplot provides information about the variability and skewness of the distribution. The box limits indicate the range of the central 50% of the data, with a central line marking the median. Lines (whiskers) extend from each box to capture the range of the remaining data, with dots placed past the whisker ends to indicate outliers.
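
The values behind a boxplot can be sketched with Python’s standard library as follows. Two points here are assumptions rather than something stated above: outliers are taken to be points more than 1.5 times the interquartile range beyond the box (a common convention, but tools differ), and quartiles are computed with the "inclusive" method of `statistics.quantiles` (other quartile definitions give slightly different values). The function name `boxplot_stats` and the sample data are hypothetical.

```python
import statistics


def boxplot_stats(values, whisker_factor=1.5):
    """Compute the quartiles, whisker ends, and outliers for one group."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # range of the central 50% of the data (the box)

    # Assumed convention: points beyond these fences are drawn as dots.
    low_fence = q1 - whisker_factor * iqr
    high_fence = q3 + whisker_factor * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]

    return {
        "minimum": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": data[-1],
        "whisker_low": inside[0],    # lowest point that is not an outlier
        "whisker_high": inside[-1],  # highest point that is not an outlier
        "outliers": outliers,
    }


# Made-up sample data, for illustration only.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
for name, value in stats.items():
    print(f"{name:>12}: {value}")
```

Note that when outliers are drawn separately, the upper whisker ends at the largest non-outlying point rather than at the maximum of the data.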

When to use

Boxplots are used to show distributions of numeric data values, especially when you want to compare them between multiple groups. They are built to provide high-level information at a glance, offering general information about a dataset’s symmetry, skew, spread, and outliers. Placing the boxes for different categories side by side makes it easy to compare them, which supports more effective decision-making.

Dos and don’ts

Order the groups when they don’t have an inherent order (for example, sort the groups by median).

Use a horizontal boxplot when there are a lot of groups to plot, or if the group names are long.

Avoid using boxplots if you want to show the distribution of only one group. A histogram is recommended in this case.
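
Sorting groups by median, as suggested above for groups without an inherent order, can be sketched in a few lines of Python. The group names and values are made up, and `order_by_median` is a hypothetical helper.

```python
import statistics


def order_by_median(groups):
    """Return group names sorted by the median of each group's values."""
    return sorted(groups, key=lambda name: statistics.median(groups[name]))


# Made-up sample data, for illustration only.
groups = {
    "north": [5, 7, 9],
    "south": [1, 2, 3],
    "east": [4, 4, 10],
}
ordered_names = order_by_median(groups)
print(ordered_names)
```

The resulting order can then be passed to whichever plotting tool is used, so that the boxes appear from lowest to highest median.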

Tools available

MS Office, Power BI, R