Introduction

At SizeBuddy, we started in 2021 with a challenge: "How can we predict clothing sizes as accurately as possible to prevent returns?" The goal was quickly set: an accurate tool that focuses on usability for both the consumer and the clothing seller.

Of course, a method was needed in the back-end to make predictions. In this article, we describe why we chose not to use height, age, and weight in predictions, but instead relied on a well-fitting piece of clothing.

**The idea behind height, age and weight**

The main reason for choosing a solution that does not use height, age, and weight in predictions is statistical. To make a long story short, predicting based on height, age, and weight is not accurate enough to give good size advice. In this article, we will explain why.

In principle, predicting body measurements based on height, age, and weight would be a nice idea. A formula that takes personal data as input and returns expected body measurements seems like an attractive option. At SizeBuddy, we initially thought so too, so we set out to create and test such a formula. As three founders with university degrees in quantitative studies, we found this task right up our alley.

The idea was to create a formula, based on a lot of data about bodies, that could predict, for example, someone's chest or hip circumference. This was straightforward to do with regression analysis.

**Explaining regression analyses:**

In a regression analysis, software measures the influence of different factors on an output factor. This may sound difficult, but it is easier to understand with a practical example.

Suppose you grow a lot of plants (of the same species) at home and are curious about the influence of sunlight and water on their growth. Each week, you record the growth of each plant (in centimeters), the number of hours of sunlight it gets, and how much water (in liters) you give it. If you run a regression analysis on this data, the resulting factors (the coefficients) show how much influence water and sunlight have on the growth of your plants. For example, a factor of 0.1 on water and 2 on sunlight would mean the following: if you give the plant 1 extra liter of water per week, it will grow an average of 0.1 centimeters more that week, with sunlight unchanged. Likewise, if the plant gets 1 extra hour of sunlight per week, it will grow an average of 2 centimeters more, with water unchanged.

The factors, together with an intercept (the starting position of a formula, the intersection with the y-axis), form the most accurate formula to predict plant growth in this case. These factors are only accurate when used together in one formula.
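To make the plant example concrete, here is a minimal sketch of such a regression in plain Python. The observations are made up for illustration, and the growth values are generated without noise from an assumed intercept of 1.0, a water factor of 0.1, and a sunlight factor of 2, so the fit should recover roughly those numbers. The function name `fit_linear_regression` is our own and not part of any library.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: list of feature tuples; targets: list of observed outputs.
    Returns [intercept, factor_1, factor_2, ...].
    """
    # Prepend a constant 1.0 to every row so the first coefficient is the intercept.
    X = [[1.0, *row] for row in rows]
    n = len(X[0])

    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]

    # The matrix is now diagonal, so each coefficient is a simple division.
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up observations: (liters of water per week, hours of sunlight per week).
observations = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0), (2.0, 5.0)]
# Weekly growth in cm, generated from 1.0 + 0.1 * water + 2.0 * sunlight (no noise).
growth = [1.0 + 0.1 * water + 2.0 * sun for water, sun in observations]

intercept, water_factor, sunlight_factor = fit_linear_regression(observations, growth)
print(round(intercept, 3), round(water_factor, 3), round(sunlight_factor, 3))
```

With real, noisy measurements the recovered factors would only approximate the underlying relationship, and in practice a statistics package would normally do this fitting.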

To perform a regression analysis, enough data must first be available. In general, the more data points there are, the more reliable the resulting formula will be. After some time, we at SizeBuddy found a database that stored physical data (height, age, weight, gender, and various body measurements). This dataset is used by the US government for research and can therefore be considered accurate. Our own validity tests on the dataset also showed it to be very accurate.

After performing the regression analyses, formulas were obtained to predict various body measurements (which we could convert to correct sizes at SizeBuddy). When testing these formulas, it quickly became clear that this method could not provide accurate predictions.

**Formula Ineffectiveness:**

No formula for predicting anything is perfect, so to find out how accurate this formula is, we had it predict the body measurements of 5,000 people and compared the predictions with those people's actual measurements.

The graph below shows the differences between the predicted waist circumference and the actual waist circumference.

The average difference in waist circumference between two adjacent sizes (such as M and L) is 5 centimeters. The points between the yellow lines represent people whose predicted waist circumference is in line with their size. All points above and below this band are people whose predicted waist circumference deviates so much from the actual one that they would be recommended the wrong size based on waist circumference. For all these people, this formula does not work!

At SizeBuddy, it quickly became clear that such a formula does not provide accurate predictions for everyone. Many people will get a wrong prediction because the formula is based on averages and ignores the exceptions to the rule. This makes sense when you think about it. If we put two men who are both 1.96 meters tall and weigh 120 kilos next to each other, the formula would predict the same sizes for both men. However, if one man is a bodybuilder and the other is obese, these men will not have the same clothing size.
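The check behind the graph can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: the band is assumed to extend half a size step (2.5 centimeters) on either side of the actual measurement, the function name `share_outside_band` is hypothetical, and the five sample measurements are made up.

```python
def share_outside_band(predicted_cm, actual_cm, half_width_cm=2.5):
    """Return the share of people whose prediction error exceeds the band.

    half_width_cm is an assumption: half of the 5 cm step between two sizes.
    """
    errors = [p - a for p, a in zip(predicted_cm, actual_cm)]
    misses = sum(1 for e in errors if abs(e) > half_width_cm)
    return misses / len(errors)


# Made-up waist circumferences in centimeters, for illustration only.
predicted = [80.0, 85.0, 90.0, 95.0, 100.0]
actual = [81.0, 89.0, 90.5, 91.0, 107.0]

# Errors are -1, -4, -0.5, 4 and -7 cm, so three of the five fall outside the band.
wrong_share = share_outside_band(predicted, actual)
print(wrong_share)
```

Run against a real dataset, the same function gives the fraction of people who would be recommended the wrong size on waist circumference alone.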

**The SizeBuddy solution:**

When it became clear that using formulas to predict body measurements doesn't work, we at SizeBuddy started looking for other solutions. After a lot of brainstorming, we came up with the idea: what if we use information that everyone already has, a clothing item that already fits well, to make an accurate prediction for all other brands? That's how SizeBuddy was born. When we analyzed predictions made this way, we found hardly any incorrect ones. And how easy is it to enter a clothing item that fits well? Moreover, in 2022 we found a way to apply the same principle to shoes, with success!

Since then, we have developed SizeBuddy into the most accurate and user-friendly tool for clothing and shoe webshops on the market. Today, we make hundreds of consumers happy with accurate predictions, and many entrepreneurs and clothing webshops happy with fewer returns and a higher conversion rate. Could we make your clothing webshop more efficient too?
