Introduction
Inventory management is at the heart of supply chain optimization in fashion retail. Not having enough products in stock leaves customers unhappy, while having too many leads to higher expenses and unsold items. Determining the sold-out date of a store’s articles allows an efficient use of its inventory. However, the fashion sector is known for its short product cycles, which make classical dynamic pricing strategies inefficient. Consequently, machine learning models have recently been developed to estimate this key sold-out date.
Inventory Costs in Numbers
Inventory costs may seem fixed and determined by the price you pay for the products. However, they also include inventory financing charges and the opportunity losses caused by bad management. In 2015, Target Canada closed all of its 133 stores after a disastrous $5.4 billion loss, largely blamed on poor inventory management. Even if this is an extreme case, according to Brian Sutter, director of marketing at Wasp Barcode Technologies, inventory-carrying costs can rise as high as 40% of a company’s average inventory investment, and he insists that there are many ways to reduce them.
One solution for the fashion retail sector is an accurate prediction of the sold-out date. This avoids both overstocked and out-of-stock stores, resulting in lower opportunity losses and lower inventory-related costs.
Dataset
To tackle this problem, we use the historical sales, from October 1, 2017, to January 31, 2018, of 12’824 sporting products from an online retailer in Germany. All of these items sold out in February 2018.
Note that the same approach could be applied to the last four months of data to predict the sales of the products in the month to come.
In the following table, we present the original features of the data used:
Name | Description |
pid | Product number |
size | Size |
color | Color |
brand | Brand |
rrp | Recommended retail price |
mainCategory | Main category of the item |
category | Category of the item |
subCategory | Subcategory of the item |
stock | Initial stock on February 1, 2018 |
releaseDate | The release date of the item |
A product is uniquely identified by its pid and size. Moreover, we have access to the historical price of each product since its release date and to the daily sales volume of each product.
As these features are few and not very informative, we need to obtain more explanatory variables through a feature engineering procedure.
Feature engineering
First, as our data comes in separate tables, we merge them using the tuple (pid, size, day) as the unique index for each observation in the merged data. We thus obtain more than a million rows.
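As an illustration, a minimal pandas sketch of this merge could look like the following; the file and column names are placeholders, since the actual files are not named here.

```python
import pandas as pd

# Placeholder file names: one table of static item attributes,
# one of daily prices and one of daily sales.
items = pd.read_csv("items.csv")    # pid, size, color, brand, rrp, ..., stock, releaseDate
prices = pd.read_csv("prices.csv")  # pid, size, day, price
sales = pd.read_csv("sales.csv")    # pid, size, day, units

# One row per (pid, size, day): start from the daily sales, attach the daily
# price and the static item attributes.
merged = (
    sales
    .merge(prices, on=["pid", "size", "day"], how="left")
    .merge(items, on=["pid", "size"], how="left")
    .set_index(["pid", "size", "day"])
)
```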
Thanks to our knowledge of the retailer, we can also extract additional information such as the dates of heavy advertising campaigns for specific products. Moreover, we add discrete features to take into account other factors such as the weather or holidays.
Lastly, we compute a measure of popularity for existing features such as size, color, and brand, based on the sales in the training set. These variables are called likelihood features and should be handled with care to avoid overfitting. Indeed, they can introduce a bias that makes the model too specific to this dataset and reduces its performance on new data.
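To make the leakage concern concrete, here is a minimal sketch of how such a likelihood (popularity) feature can be computed on the training split only; the daily-sales column name `units` is an assumption.

```python
import pandas as pd

def add_popularity(train: pd.DataFrame, test: pd.DataFrame, col: str) -> None:
    """Likelihood feature for `col` (e.g. "brand"): mean daily units sold per level,
    computed on the training split only so the test fold stays untouched."""
    mapping = train.groupby(col)["units"].mean()
    fallback = train["units"].mean()
    train[f"popularity_{col}"] = train[col].map(mapping)
    # Levels unseen in training fall back to the global mean.
    test[f"popularity_{col}"] = test[col].map(mapping).fillna(fallback)

# e.g. add_popularity(train_df, test_df, "brand")
```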
The following table summarizes some of these new features.
New feature | Description |
Price_RRP | Price over RRP (Recommended Retail Price) |
Price_RRP_2 | Square of price over RRP |
Release_incentive | A decaying variable giving a weight for new products |
Weekday | Indicator variable, 1 if the day is a weekday |
Ferie | Score variable for the holidays in Germany |
Consumer_price_index | Consumer price index in the fashion sector |
Popularity_brand | Popularity of a brand based on the historical sales |
Weather | Score variable for the weather conditions in Germany |
Pre_christmas | Indicator variable for the pre-Christmas period |
Bundes_Liga | Decaying variable giving weight to the days around a Bundesliga game |
At the end of our preprocessing procedure, we have 67 features at our disposal to train our model.
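As an illustration of how a few of the features in the table above can be built, the sketch below constructs the price ratios, a decaying release incentive, and the pre-Christmas indicator on the merged table from the earlier snippet; the 14-day decay constant, the date thresholds, and the `price` column name are illustrative assumptions, not necessarily the exact definitions used in our pipeline.

```python
import numpy as np
import pandas as pd

# Assumes the `merged` DataFrame from the previous snippet, with columns
# "day", "releaseDate", "price" and "rrp".
merged = merged.reset_index()
merged["day"] = pd.to_datetime(merged["day"])
merged["releaseDate"] = pd.to_datetime(merged["releaseDate"])

# Price ratios.
merged["Price_RRP"] = merged["price"] / merged["rrp"]
merged["Price_RRP_2"] = merged["Price_RRP"] ** 2

# Release_incentive: exponentially decaying weight for recently released
# products (the 14-day decay constant is illustrative).
days_since_release = (merged["day"] - merged["releaseDate"]).dt.days.clip(lower=0)
merged["Release_incentive"] = np.exp(-days_since_release / 14.0)

# Pre_christmas: indicator for the pre-Christmas period (dates are illustrative).
merged["Pre_christmas"] = (
    (merged["day"] >= "2017-12-01") & (merged["day"] <= "2017-12-24")
).astype(int)
```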
Models
Let us first present the modeling scheme we used. We randomly split our training set into 3 folds, which gives 3 train folds (each containing two-thirds of the data) and 3 test folds (each containing the remaining third). We train 3 different regressors on each train fold and validate them on the associated test fold, which gives us an unbiased estimate of the true test error.
This enables us to perform a grid search, evaluating our regressors with different hyperparameters to find the optimal ones. Then, we add the predicted values of each method to our training set to form what we call regression columns. As the last step, using the same split, we train the best regressor of the first stage on this augmented data and estimate the test error with the test-fold error. We tried multiple regressors to find the most suitable one.
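The scheme can be sketched with scikit-learn's cross-validation utilities as follows; this is a simplified illustration (the ELM base model and the grid search are omitted), and `X`, `y` stand for the prepared numeric feature matrix and the daily-sales target.

```python
import numpy as np
import xgboost as xgb
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_predict

# 3-fold split: each test fold holds one third of the data.
kf = KFold(n_splits=3, shuffle=True, random_state=0)

base_models = {
    "xgb": xgb.XGBRegressor(),
    "rf": RandomForestRegressor(),
    # the ELM would be the third base regressor
}

# Out-of-fold predictions: every row is predicted by a model that never saw it.
regression_columns = [
    cross_val_predict(model, X, y, cv=kf) for model in base_models.values()
]

# Append the regression columns and re-train the best first-stage regressor.
X_stacked = np.column_stack([X] + regression_columns)
final_model = xgb.XGBRegressor()
# The test error of final_model is again estimated on the same 3-fold split.
```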
First, we trained an extreme learning machine (ELM), invented by Guang-Bin Huang. It is an extremely fast single-hidden-layer feedforward neural network: it contains only one hidden layer of neurons and assigns random Gaussian weights from the input features to that hidden layer, so only the output weights are learned. After the grid search, we selected 1’000 neurons with a sigmoid activation function and another 1’000 with a Gaussian radial basis activation.
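For readers unfamiliar with ELMs, here is a minimal NumPy sketch of the idea, with random sigmoid and RBF hidden units and output weights solved by least squares; it is a simplified illustration (the RBF centres and width are chosen heuristically), not the exact implementation we used.

```python
import numpy as np
from scipy.spatial.distance import cdist

class ELMRegressor:
    """Minimal ELM sketch: random sigmoid and Gaussian RBF hidden units,
    output weights fitted by least squares. X is assumed to be a NumPy array."""

    def __init__(self, n_sigmoid=1000, n_rbf=1000, random_state=0):
        self.n_sigmoid, self.n_rbf = n_sigmoid, n_rbf
        self.rng = np.random.default_rng(random_state)

    def _hidden(self, X):
        sig = 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))            # sigmoid units
        rbf = np.exp(-self.gamma * cdist(X, self.C, "sqeuclidean"))   # Gaussian RBF units
        return np.hstack([sig, rbf])

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random Gaussian weights from the inputs to the hidden layer (never trained).
        self.W = self.rng.normal(size=(n_features, self.n_sigmoid))
        self.b = self.rng.normal(size=self.n_sigmoid)
        # RBF centres drawn from the training points, with an illustrative width.
        self.C = X[self.rng.integers(0, len(X), self.n_rbf)]
        self.gamma = 1.0 / n_features
        # Only the output weights are learned, via a linear least-squares solve.
        H = self._hidden(X)
        self.beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta
```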
Second, we used XGBoost, created by Tianqi Chen, which relies on the gradient boosting technique. The maximum depth of the trees is 12, the feature subsample ratio is 0.8, the step size shrinkage used to avoid overfitting is 0.04, and the number of boosted trees is 1’500. With these hyperparameters the method is very slow; however, it achieves the best results on the test folds.
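In code, this corresponds roughly to the following configuration, assuming the 0.8 ratio maps to `colsample_bytree`; all other parameters are left at their library defaults.

```python
import xgboost as xgb

xgb_model = xgb.XGBRegressor(
    n_estimators=1500,     # number of boosted trees
    max_depth=12,          # maximum tree depth
    learning_rate=0.04,    # step size shrinkage (eta)
    colsample_bytree=0.8,  # subsample ratio of features per tree
    objective="reg:squarederror",
)
# xgb_model.fit(X_train, y_train)
```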
Lastly, we regressed our data using the Random Forest algorithm. As it is the slowest of the three algorithms, we kept the default hyperparameters of the scikit-learn library.
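For completeness, the corresponding scikit-learn call is simply the following; `n_jobs` and `random_state` are only added for speed and reproducibility, not tuned values.

```python
from sklearn.ensemble import RandomForestRegressor

# Default hyperparameters; n_jobs=-1 parallelises training across all cores.
rf_model = RandomForestRegressor(n_jobs=-1, random_state=0)
# rf_model.fit(X_train, y_train)
```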
Finally, as our training set contains historical daily sales for each item, we predict the daily sales and accumulate them until we reach the initial stock, which gives the sold-out day. Consequently, all our test error estimates are based on the sales rather than on the sold-out date.
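Concretely, the conversion from predicted daily sales to a sold-out date can be sketched as follows; the function name and inputs are illustrative.

```python
import pandas as pd

def predicted_sold_out_date(daily_pred: pd.Series, stock: float):
    """daily_pred: predicted sales per day for one (pid, size), indexed by date
    in chronological order. Returns the first date on which the cumulative
    predicted sales reach the initial stock."""
    cumulative = daily_pred.cumsum()
    reached = cumulative >= stock
    if not reached.any():
        return None  # the item is not predicted to sell out within the horizon
    return cumulative.index[reached.argmax()]
```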
Results
Applying this modeling scheme to our augmented data (with the engineered features), we train our regressors and record the average Mean Squared Error (MSE) on sales over the test folds, i.e. the mean squared difference between the real and the predicted sales per item and day. The results are summarized in the graph.
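Written out, the reported metric is simply the fold-averaged MSE; here `fold_results` is assumed to hold the observed and predicted daily sales for each of the 3 test folds.

```python
from sklearn.metrics import mean_squared_error

fold_mse = [mean_squared_error(y_true, y_pred) for y_true, y_pred in fold_results]
avg_mse = sum(fold_mse) / len(fold_mse)
```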
We see that the MSE is extremely low for a sales forecast and that our methods perform well. We also note a substantial 28% improvement in MSE from adding cleverly engineered features.
Conclusion
We managed to build a modeling scheme that accurately predicts the sales of more than 12’000 products over a month and, by extension, their approximate sold-out dates. Indeed, the best model predicts the number of sales for a given article and date with an average error of 0.7.
With this model, a fashion retail company can manage its inventory more precisely and thus increase its profit, potentially saving millions of dollars.