# Quantile Quantile plots - GeeksforGeeks (2024)


Last Updated : 11 Feb, 2024


The quantile-quantile plot (Q-Q plot) is a graphical method for determining whether a dataset follows a given probability distribution, or whether two samples of data came from the same population. Q-Q plots are particularly useful for assessing whether a dataset is normally distributed or follows some other known distribution. They are commonly used in statistics, data analysis, and quality control to check assumptions and identify departures from expected distributions.

### Quantiles And Percentiles

Quantiles are points in a dataset that divide the data into intervals containing equal probabilities or proportions of the total distribution. They are often used to describe the spread or distribution of a dataset. The most common quantiles are:

- **Median**: The median is the middle value of a dataset when it is ordered from smallest to largest. It divides the dataset into two equal halves **(50th percentile)**.
- **Quartiles**: Quartiles divide the dataset into four equal parts. The first quartile (Q1) is the value below which 25% of the data falls, the second quartile (Q2) is the median, and the third quartile (Q3) is the value below which 75% of the data falls **(25th, 50th, and 75th percentiles)**.
- **Percentiles**: Percentiles are similar to quartiles but divide the dataset into 100 equal parts. For example, the 90th percentile is the value below which 90% of the data falls.
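The quantities above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the sample values are hypothetical, and the `"inclusive"` interpolation method is one of several conventions, so other tools may return slightly different numbers.

```python
import statistics

# Hypothetical sample, already small enough to check by hand
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 18, 21]

# Median: the middle value of the ordered data
median = statistics.median(data)

# n=4 returns the three cut points Q1, Q2 and Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=100 returns 99 cut points; index 89 is the 90th percentile
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]

print(f"median={median}, Q1={q1}, Q2={q2}, Q3={q3}, 90th percentile={p90}")
```

Note that Q2 and the median coincide, as described in the list above.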

**Note:**

- A Q-Q plot is a plot of the quantiles of the first dataset against the quantiles of the second dataset.
- For reference purposes, a 45-degree line is also plotted; if the samples are from the same population, the points fall approximately along this line.

Normal Distribution:

The normal distribution (also known as the Gaussian distribution or bell curve) is a continuous probability distribution for real-valued random variables. It is symmetric about its mean and fully described by its mean and standard deviation.

Normal Distribution with Area Under Curve
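To make the link between area under the curve and quantiles concrete, here is a small sketch using `statistics.NormalDist` from the Python standard library. It assumes a standard normal distribution (mean 0, standard deviation 1); the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area under the curve between -1 and +1 standard deviations (via the CDF)
area_within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

# Inverse CDF: the point below which 97.5% of the area lies
z_975 = std_normal.inv_cdf(0.975)

print(f"area within one SD: {area_within_one_sd:.4f}")
print(f"97.5% quantile: {z_975:.3f}")
```

The inverse CDF is the function that supplies the theoretical quantiles in a normal Q-Q plot.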

### How to Draw Q-Q plot?

To draw a Quantile-Quantile (Q-Q) plot, you can follow these steps:

- **Collect the Data**: Gather the dataset for which you want to create the Q-Q plot. Ensure that the data are numerical and represent a random sample from the population of interest.
- **Sort the Data**: Arrange the data in ascending order. This step is essential for computing quantiles accurately.
- **Choose a Theoretical Distribution**: Determine the theoretical distribution against which you want to compare your dataset. Common choices include the normal distribution, exponential distribution, or any other distribution that fits your data well.
- **Calculate Theoretical Quantiles**: Compute the quantiles for the chosen theoretical distribution. For example, if you're comparing against a normal distribution, you would use the inverse cumulative distribution function (inverse CDF) of the normal distribution to find the expected quantiles.
- **Plotting**:
  - Plot the theoretical quantiles on the x-axis.
  - Plot the corresponding sorted dataset values on the y-axis.
  - Each data point (x, y) represents a pair of expected and observed values.
  - Compare the points against a straight reference line to visually inspect the relationship between the dataset and the theoretical distribution.
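The steps above can be sketched by hand without a plotting library. This is a minimal illustration, assuming a normal reference distribution and the plotting positions `(i - 0.5) / n`; that formula is one common convention, and libraries may use a slightly different one. The function name `normal_qq_points` is hypothetical.

```python
import random
from statistics import NormalDist

def normal_qq_points(sample):
    """Return (theoretical, observed) quantile lists for a normal Q-Q plot."""
    # Sort the data in ascending order
    observed = sorted(sample)
    n = len(observed)
    std_normal = NormalDist()
    # Theoretical quantiles from the inverse CDF at positions (i - 0.5) / n
    # (assumed convention; other plotting positions exist)
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

# Hypothetical sample: 200 draws from a normal distribution
random.seed(0)
sample = [random.gauss(10, 2) for _ in range(200)]

theoretical, observed = normal_qq_points(sample)
print(list(zip(theoretical, observed))[:3])
```

The two returned lists are the coordinates of the Q-Q plot and can be passed to any scatter-plot function.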

### Interpretation of Q-Q plot

- If the points on the plot fall approximately along a straight line, it suggests that your dataset follows the assumed distribution.
- Deviations from the straight line indicate departures from the assumed distribution, requiring further investigation.
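One way to put a number on "approximately along a straight line" is the correlation between the theoretical and observed quantiles. The sketch below is an illustration, not a formal test: the helper `qq_correlation` is hypothetical, the plotting positions `(i - 0.5) / n` are an assumed convention, and the samples are simulated.

```python
import math
import random
from statistics import NormalDist, fmean

def qq_correlation(sample):
    """Pearson correlation between normal theoretical quantiles and sorted data."""
    observed = sorted(sample)
    n = len(observed)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mean_t, mean_o = fmean(theoretical), fmean(observed)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(theoretical, observed))
    s_tt = sum((t - mean_t) ** 2 for t in theoretical)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    return s_to / math.sqrt(s_tt * s_oo)

random.seed(1)
# Simulated normal data: points should lie close to a straight line
r_normal = qq_correlation([random.gauss(0, 1) for _ in range(500)])
# Simulated exponential data: skewed, so the points curve away from the line
r_skewed = qq_correlation([random.expovariate(1.0) for _ in range(500)])

print(f"normal sample r = {r_normal:.4f}, skewed sample r = {r_skewed:.4f}")
```

A correlation close to 1 is consistent with the assumed distribution; a visibly lower value points to a departure worth inspecting on the plot itself.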

### Exploring Distribution Similarity with Q-Q Plots

Exploring distribution similarity using Q-Q plots is a fundamental task in statistics. Comparing two datasets to determine if they originate from the same distribution is vital for various analytical purposes. When the assumption of a common distribution holds, merging datasets can improve parameter estimation accuracy, such as for location and scale. Q-Q plots, short for quantile-quantile plots, offer a visual method for assessing distribution similarity. In these plots, quantiles from one dataset are plotted against quantiles from another. If the points closely align along a diagonal line, it suggests similarity between the distributions. Deviations from this diagonal line indicate differences in distribution characteristics.

While tests like the chi-square and Kolmogorov-Smirnov tests can evaluate overall distribution differences, Q-Q plots provide a nuanced perspective by directly comparing quantiles. This enables analysts to discern specific differences, such as shifts in location or changes in scale, which may not be evident from formal statistical tests alone.
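A two-sample comparison can be sketched as follows. The example assumes two simulated samples of different sizes that differ only by a shift in location; the helper names `quantile` and `two_sample_qq` are hypothetical, and linear interpolation is one of several ways to estimate sample quantiles.

```python
import random

def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list, 0 <= p <= 1."""
    position = p * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction

def two_sample_qq(sample_a, sample_b, n_points=99):
    """Pair up quantiles of two samples at the same probabilities."""
    a, b = sorted(sample_a), sorted(sample_b)
    probs = [(k + 1) / (n_points + 1) for k in range(n_points)]
    return [(quantile(a, p), quantile(b, p)) for p in probs]

random.seed(2)
sample_a = [random.gauss(0, 1) for _ in range(300)]   # 300 observations
sample_b = [random.gauss(3, 1) for _ in range(500)]   # 500 observations, shifted by 3

pairs = two_sample_qq(sample_a, sample_b)
# The middle pair corresponds to the two medians
median_shift = pairs[49][1] - pairs[49][0]
print(f"shift between medians: {median_shift:.2f}")
```

Because both samples are evaluated at the same probabilities, the sample sizes do not need to match. A shift in location appears as points lying parallel to the 45-degree line but offset from it, while a change in scale would change the slope.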

### Python Implementation Of Q-Q Plot

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate example data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=1000)

# Create Q-Q plot
stats.probplot(data, dist="norm", plot=plt)
plt.title('Normal Q-Q plot')
plt.xlabel('Theoretical quantiles')
plt.ylabel('Ordered Values')
plt.grid(True)
plt.show()
```

**Output:**

Q-Q plot

Here, as the data points approximately follow a straight line in the Q-Q plot, it suggests that the dataset is consistent with the assumed theoretical distribution, which in this case we assumed to be the normal distribution.

### Advantages of Q-Q plot

- **Flexible Comparison**: Q-Q plots can compare datasets of different sizes without requiring equal sample sizes.
- **Dimensionless Analysis**: They are dimensionless, making them suitable for comparing datasets with different units or scales.
- **Visual Interpretation**: Provides a clear visual representation of data distribution compared to a theoretical distribution.
- **Sensitive to Deviations**: Easily detects departures from assumed distributions, aiding in identifying data discrepancies.
- **Diagnostic Tool**: Helps in assessing distributional assumptions, identifying outliers, and understanding data patterns.

### Applications Of Quantile-Quantile Plot

The Quantile-Quantile plot is used for the following purposes:

- **Assessing Distributional Assumptions**: Q-Q plots are frequently used to visually inspect whether a dataset follows a specific probability distribution, such as the normal distribution. By comparing the quantiles of the observed data to the quantiles of the assumed distribution, deviations from the assumed distribution can be detected. This is crucial in many statistical analyses, where the validity of distributional assumptions impacts the accuracy of statistical inferences.
- **Detecting Outliers**: Outliers are data points that deviate significantly from the rest of the dataset. Q-Q plots can help identify outliers by revealing data points that fall far from the expected pattern of the distribution. Outliers may appear as points that deviate from the expected straight line in the plot.
- **Comparing Distributions**: Q-Q plots can be used to compare two datasets to see if they come from the same distribution. This is achieved by plotting the quantiles of one dataset against the quantiles of another dataset. If the points fall approximately along a straight line, it suggests that the two datasets are drawn from the same distribution.
- **Assessing Normality**: Q-Q plots are particularly useful for assessing the normality of a dataset. If the data points in the plot closely follow a straight line, it indicates that the dataset is approximately normally distributed. Deviations from the line suggest departures from normality, which may require further investigation or non-parametric statistical techniques.
- **Model Validation**: In fields like econometrics and machine learning, Q-Q plots are used to validate predictive models. By comparing the quantiles of observed responses with the quantiles predicted by a model, one can assess how well the model fits the data. Deviations from the expected pattern may indicate areas where the model needs improvement.
- **Quality Control**: Q-Q plots are employed in quality control processes to monitor the distribution of measured or observed values over time or across different batches. Departures from expected patterns in the plot may signal changes in the underlying processes, prompting further investigation.
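As an illustration of the outlier-detection use case, the sketch below flags values that sit far from a reference line on a normal Q-Q plot. Everything here is an assumption made for the example: the helper `flag_qq_outliers`, the quartile-based reference line, the cutoff `k=3.0`, and the data are all hypothetical, and the cutoff is a heuristic rather than a standard rule.

```python
from statistics import NormalDist, quantiles

def flag_qq_outliers(sample, k=3.0):
    """Flag values far from a quartile-based reference line on a normal Q-Q plot.

    k is a hypothetical cutoff in units of the line's slope, not a standard rule.
    """
    observed = sorted(sample)
    n = len(observed)
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

    # Reference line through the first and third quartiles of both axes,
    # which is less affected by extreme values than a least-squares fit
    t_q1, _, t_q3 = quantiles(theoretical, n=4)
    o_q1, _, o_q3 = quantiles(observed, n=4)
    slope = (o_q3 - o_q1) / (t_q3 - t_q1)
    intercept = o_q1 - slope * t_q1

    # Keep the observed values whose vertical distance to the line exceeds k * slope
    return [y for x, y in zip(theoretical, observed)
            if abs(y - (intercept + slope * x)) > k * slope]

# Hypothetical data: 99 well-behaved values plus one extreme measurement
clean = [NormalDist(50, 5).inv_cdf((i - 0.5) / 99) for i in range(1, 100)]
flagged = flag_qq_outliers(clean + [120.0])
print(flagged)
```

Any value flagged this way should still be checked on the plot and against domain knowledge before it is treated as an outlier.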

### Types of Q-Q plots

There are several types of Q-Q plots commonly used in statistics and data analysis, each suited to different scenarios or purposes:

- **Normal Distribution**: A symmetric distribution where the Q-Q plot would show points approximately along a diagonal line if the data adheres to a normal distribution.
- **Right-skewed Distribution**: A distribution where the Q-Q plot would display a pattern where the observed quantiles deviate from the straight line towards the upper end, indicating a longer tail on the right side.
- **Left-skewed Distribution**: A distribution where the Q-Q plot would exhibit a pattern where the observed quantiles deviate from the straight line towards the lower end, indicating a longer tail on the left side.
- **Under-dispersed Distribution**: A distribution where the Q-Q plot would show observed quantiles clustered more tightly around the diagonal line compared to the theoretical quantiles, suggesting lower variance.
- **Over-dispersed Distribution**: A distribution where the Q-Q plot would display observed quantiles more spread out or deviating from the diagonal line, indicating higher variance or dispersion compared to the theoretical distribution.

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

# Fix the seed so the figure is reproducible
np.random.seed(0)

# Generate a random sample from a normal distribution
normal_data = np.random.normal(loc=0, scale=1, size=1000)

# Generate a random sample from a right-skewed distribution (exponential distribution)
right_skewed_data = np.random.exponential(scale=1, size=1000)

# Generate a random sample from a left-skewed distribution (negative exponential distribution)
left_skewed_data = -np.random.exponential(scale=1, size=1000)

# Generate a random sample from an under-dispersed distribution (truncated normal distribution)
under_dispersed_data = np.random.normal(loc=0, scale=0.5, size=1000)
under_dispersed_data = under_dispersed_data[(under_dispersed_data > -1) & (under_dispersed_data < 1)]  # Truncate

# Generate a random sample from an over-dispersed distribution (mixture of normals)
over_dispersed_data = np.concatenate((np.random.normal(loc=-2, scale=1, size=500),
                                      np.random.normal(loc=2, scale=1, size=500)))

# Create Q-Q plots. Every sample is compared against the normal distribution
# so that skew and dispersion show up as departures from the straight line.
plt.figure(figsize=(15, 10))

plt.subplot(2, 3, 1)
stats.probplot(normal_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Normal Distribution')

plt.subplot(2, 3, 2)
stats.probplot(right_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Right-skewed Distribution')

plt.subplot(2, 3, 3)
stats.probplot(left_skewed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Left-skewed Distribution')

plt.subplot(2, 3, 4)
stats.probplot(under_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Under-dispersed Distribution')

plt.subplot(2, 3, 5)
stats.probplot(over_dispersed_data, dist="norm", plot=plt)
plt.title('Q-Q Plot - Over-dispersed Distribution')

plt.tight_layout()
plt.show()
```

**Output:**

Q-Q plot for different distributions



Author: Fr. Dewey Fisher
