Article on forecasting demand for perishable goods

In the conclusion, it is important to discuss the economic reasons behind this AI-based methodology.

Keywords: LSTM, prediction, perishable goods, time series, logistics, deliveries

Please look for references extracted from these sources:

European Journal of Operational Research; Journal of Operations Management; Sci-Hub; Google Scholar; IEEE Xplore.

and try the GRU method instead of LSTM if it gives better results

Introduction: Supply chain management in the retail industry - state of the art

Next, we present a brief state of the art of both systems as it is relevant in the context of this paper. In subsequent sections, we present approaches combining them.

Problem, solution, figures… https://sci-hub.se/10.1016/j.asoc.2005.06.001

the “bullwhip effect”

In the case of perishable products with short life cycles, appropriate (short-term) forecasting is extremely critical. Da Veiga et al. [73] forecasted the demand for a group of perishable dairy products using Autoregressive Integrated Moving Average (ARIMA) and Holt-Winters (HW) models. The results were compared based on the mean absolute percentage error (MAPE) and the Theil inequality index (U-Theil). The HW model showed a better goodness-of-fit on both performance metrics.

As perishable products must be sold within a very short preservation time, demand forecasting for this type of product has drawn increasing attention. Yang and Sutrisno [93] applied and compared regression analysis and neural network techniques to derive demand forecasts for perishable goods. They concluded that accurate daily forecasts are achievable with knowledge of the sales numbers in the first few hours of the day using either of the above methods.

Method:

In order to obtain optimal results for this study, collaboration with Croatia's largest[Naomi-Fri1] supermarket chain seemed a reasonable foundation for analyzing its logistics operations and real-time data on short shelf-life product deliveries. The stores' current ordering method is based on past experience, taking into consideration spoilage, stock-out rates and holiday seasons.

The biggest challenge here is that delivery points depend directly on end customers and are conditioned by customer behavior and psychology. The frequency of orders and the type and quantity of goods delivered to a particular place from an assigned warehouse obviously depend on seasonality or pre-holiday periods and can only be estimated. There are also unpredictable events, such as the recent corona pandemic, during which food chains need to act fast and adjust to the changes it causes. This is the kind of situation where past experience will not be helpful.

This problem can be avoided by implementing artificial intelligence, so that the demand for perishable goods can be predicted a few days in advance. Besides traditional methods, data-driven approaches such as artificial neural networks (ANNs) have drawn attention here. With this in mind, the use of the LSTM model is proposed because of its simplicity and popularity of use, and its parameters have been carefully selected to achieve the highest possible prediction efficiency. (**Do GRU and other architectures give similar results, or is LSTM better?) The model is implemented using the TensorFlow and Keras libraries in the Python programming language.

For this prediction application, the LSTM network is chosen because of its robustness, its better handling of the vanishing and exploding gradient problems, and its ability to learn over larger intervals between events crucial to the neural network, as opposed to classical RNNs[Naomi-Fri2].

4.1 Long short-term memory (LSTM) networks

Jingyi Du [9] stated that the LSTM neural network has gained great attention in deep learning, especially for time series. An LSTM network was used to predict the Apple stock price with multiple-feature and single-feature input variables to verify the prediction on a stock time series; the results were better when multiple features were used as input.

The LSTM neural network, a specialized recurrent neural network (RNN) developed by Hochreiter and Schmidhuber (1997), is capable of learning long-term dependencies using the backpropagation method while avoiding noise by filtering out the gradients (Colah, 2015). The flowchart of the LSTM methodology is shown in Figure 5. Classical RNNs consist of a single tanh layer that functions as a repeating-module chain for cell states. In an LSTM network, there are four such neural network layers interacting to add more information to the process, regulated by three “gate” structures along with the input cell state. The gates consist of sigmoid neural network layers performing pointwise multiplication in the process; the outputs of the sigmoid layer vary between 0 and 1. There is also a sigmoid layer known as the “forget gate layer”, which skips/eliminates white noise from the process when the gate output is 0 (Colah, 2015). The output of the cell states is obtained through a tanh or rectified linear unit (ReLU) layer (Li and Cao, 2018) and a dense layer, which yields values between -1 and 1 after multiplication with the sigmoid gate's output (https://keras.io/layers/recurrent/). Li and Cao (2018) stated that LSTMs are better at prediction for non-linear models than ARIMA and backpropagation neural networks. Train-test-split CV was used with the LSTM networks, with training, validation and drop-out splits, to obtain the best performance results.
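For reference, a sketch of the standard LSTM gate equations, following the common formulation of the Hochreiter-Schmidhuber architecture; here $x_t$ is the input, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the sigmoid function, $\odot$ pointwise multiplication, and $W$, $U$, $b$ the learned weights and biases:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad &\text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad &\text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad &\text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad &\text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The sigmoid outputs between 0 and 1 act as the soft “gates” described above, and the final tanh keeps the cell output between -1 and 1.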

We based this method on transportation: carrier providers could increase their service capacity upon request to meet our demands if we could give them an early demand estimate. This would later support planning, bin packing and more efficient delivery routes, such as grouping.

For this experiment, only 100 points located in the capital of the country were used. The following characteristics were considered while choosing the points: activity; location in the busier parts of the city of Zagreb; distance (the points were sufficiently distant from each other to cover the relevant area of Zagreb); and size (in terms of the number of orders, where size is a relevant variable to correlate with the periodicity of deliveries). **A 10-point sequence length was taken as the initial amount of data to investigate the LSTM modelling and get approximate, quicker results. From that, we took 100 points, knowing that there are about 130 points located in the capital region. Some points are not considered because they have periods of time with no existing delivery data, presumably because they were closed at the time. These outliers lower the prediction accuracy, so they were removed; to get better results, training can simply start from the period when the changes were made. Ask Nikica**

*Some points, on the other hand, give large errors due to the bullwhip effect, since the demand changes at a certain point in the period. Therefore, it is recommended to use AI to mitigate sudden changes in the number of orders.

Data analysis

The available historical data cover over 1000 different delivery points: warehouses, shops, kiosks and other retail locations, each defined by a catalog number, an address and spatial coordinates.

The data set consists of two parameters, Unloading Time and Client ID, for the representative points in the period from 1 January 2018 until 31 December 2019.
The Client ID parameter indicates the store that ordered the delivery, i.e. the destination where the order should be delivered, and the Unloading Time indicates the date and time when the order was unloaded. These parameters will be used as inputs for the LSTM network (Figure 1.1.2).

Before sending the data to the network, it is necessary to process it into a format suitable for further use.

The data for 22 weeks were then arranged in a day-wise format to reduce the impact of seasonality. The input data are pre-processed by normalization with the min-max scaler from Scikit-learn.

Downsampling means resampling a time-series dataset to a wider time frame, for example from minutes to hours or from days to years. The result has a reduced number of rows, and values can be aggregated with mean(), min(), max(), sum(), etc.; for example, hourly data may be summarized to provide a daily maximum value. Here it makes more sense to take daily frequency, and not finer than that. Variables measured at different scales do not contribute equally to model fitting and the learned model function, and might end up creating a bias. To deal with this potential problem, feature-wise normalization such as min-max scaling is usually applied prior to model fitting. A model with high bias pays very little attention to the training data and oversimplifies the model, which always leads to high error on training and test data. A 65%-20%-15% split is used to evaluate the model.
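As an illustration of this pre-processing, a minimal sketch using pandas and scikit-learn is given below. The file name and column names (deliveries.csv, unloading_time, client_id) are assumptions based on the parameters described above, not the actual field names of the data set.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw data: one row per delivery (file and column names are assumed).
df = pd.read_csv("deliveries.csv", parse_dates=["unloading_time"])

# Downsample to daily frequency: count the deliveries per day for one point.
daily = (df[df["client_id"] == "P0001"]
         .set_index("unloading_time")
         .resample("D")   # wider time frame: timestamps -> days
         .size()          # aggregation: number of deliveries per day
         .rename("n_orders")
         .to_frame())

# Feature-wise min-max normalization to [0, 1] before model fitting.
scaler = MinMaxScaler()
daily["n_orders_scaled"] = scaler.fit_transform(daily[["n_orders"]])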

A caveat of min-max scaling is that it is highly influenced by the maximum and minimum values in the data, so if the data contain outliers the scaling will be biased. Scaling is, however, not strictly necessary in this case, because all values, i.e. time windows, are mapped to zero or one: every time window with more than one delivery is mapped back to the value 1.

Since the data are two-dimensional, it was necessary to transform them into a three-dimensional shape to fit the structure of the LSTM model. To model the input and output parameters, the data are typically transformed into a resolution of 15 minutes, and 96 such 15-minute intervals are predicted 24 hours in advance, which defines the dimension of the output prediction vector. Because the frequency of deliveries is defined at the one-day level, this parameter is also observed at the one-day level, and a prediction of seven days in advance is selected. If uniformity is needed, the one-day observation level can be generalized to a 15-minute observation level by simply extending the vector with repeated values as needed.
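A minimal sketch of this two-dimensional to three-dimensional transformation, assuming a window of 10 days of history as input and 7 days ahead as output (the helper name make_windows is illustrative):

import numpy as np

def make_windows(series, window=10, horizon=7):
    """Slice a 1-D daily series into LSTM-ready (samples, timesteps, features) arrays."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i:i + window])                      # 10 days of history
        y.append(series[i + window:i + window + horizon])   # next 7 days
    X = np.array(X)[..., np.newaxis]   # 3-D input shape: (samples, 10, 1)
    y = np.array(y)                    # output vector:    (samples, 7)
    return X, y

# Example: 22 weeks of daily, already scaled values.
series = np.random.rand(154)
X, y = make_windows(series)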

**The Python program for the stacked LSTM was run with epoch size = 300, batch size = 20, window size = 10, train-test split = 0.8, validation split = 0.1, dropout probability = 0.2, and 100 hidden units in each of two stacked LSTM layers, with train-test-split CV to yield the performance values.

The main objective during training is to minimize the loss between the actual output and the predicted output on the given training data. Training starts with arbitrarily initialized weights, which are then updated incrementally as we move closer and closer to the minimum of the loss. The size of the steps taken toward that minimum depends on the learning rate. After testing and tuning the parameters, a learning rate of 0.001 was set to reach the optimum loss. The Adam optimizer, a variant of stochastic gradient descent (SGD), is used.
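Putting the stated hyperparameters together, a minimal sketch of the two-stacked-LSTM network in Keras could look as follows; the exact layer arrangement and the MSE loss are assumptions, while the hyperparameter values are those given above.

from tensorflow import keras
from tensorflow.keras import layers

window, horizon = 10, 7   # input window and 7-day-ahead output, as above

model = keras.Sequential([
    layers.LSTM(100, return_sequences=True, input_shape=(window, 1)),  # 1st stacked LSTM layer
    layers.Dropout(0.2),                                               # dropout against overfitting
    layers.LSTM(100),                                                  # 2nd stacked LSTM layer
    layers.Dropout(0.2),
    layers.Dense(horizon),   # one output unit per predicted day
])

# Adam (a variant of SGD) with the tuned learning rate of 0.001.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")

# Training with the stated values: 80/20 train-test split, 10% validation split.
split = int(0.8 * len(X))   # X, y from the windowing step above
model.fit(X[:split], y[:split], epochs=300, batch_size=20, validation_split=0.1)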

Unloading time is used for prediction together with other internal and external conditions. It has also been shown that by inserting additional, significant parameters, such as working vs. non-working day (“0” for a non-working day, “1” for a working day), day of the week (“0” for Monday, “1” for Tuesday, “2” for Wednesday, etc.) and month (“1” for the first month, “2” for the second, etc.), the prediction can be improved; in this case, the prediction improved significantly, by approximately 9.85% according to the training data. The figure shows the specified parameters.
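These encodings can be derived directly from the calendar; a sketch with pandas (the Monday-to-Friday working-day rule is an assumption that ignores public holidays):

import pandas as pd

days = pd.DataFrame(index=pd.date_range("2018-01-01", "2019-12-31", freq="D"))

# "0" for a non-working day, "1" for a working day (assumed Mon-Fri, no holidays).
days["working_day"] = (days.index.dayofweek < 5).astype(int)

# "0" for Monday, "1" for Tuesday, ..., "6" for Sunday.
days["day_of_week"] = days.index.dayofweek

# "1" for the first month, "2" for the second, etc.
days["month"] = days.index.month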


(The historical data are first processed and prepared for learning.)

After processing, the network is structured and the previously mentioned parameters are fed into it. Once the LSTM network (Figure) has learned the task, two network layers prove sufficient. The network also uses the Dropout function to avoid overfitting.

Figure 1.1.4 Hyperparameters in the neural network.

For the ten active points, each delivery point was given its own model. In the later stages of the project, an ensemble of methods will be used because of the large number of delivery points, so that individual models are avoided.

The model focuses on the dynamics of active points (active/inactive level), although the network predicts the exact number of orders. The predictions are then post-processed so that a point is marked inactive if there are no deliveries that day and active otherwise. If necessary, additional precision in predicting the number of orders can be achieved by adding extra input parameters to the network, but in this case that is not necessary. Matplotlib and seaborn are used for plotting graphs; the Python Anaconda environment and Jupyter Notebook were used.
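A short sketch of this post-processing and plotting step (the predicted values are placeholders):

import numpy as np
import matplotlib.pyplot as plt

predicted_orders = np.array([0.1, 2.3, 1.8, 0.0, 3.2, 0.4, 0.0])  # placeholder network outputs

# A point is active on a day if at least one delivery is predicted, inactive otherwise.
active = (np.rint(predicted_orders) > 0).astype(int)

plt.plot(predicted_orders, "r.", label="predicted number of orders")
plt.step(range(len(active)), active, where="mid", label="active / inactive")
plt.legend()
plt.show()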

The organization has a total of 121 distributed locations, and picturing all of them is not necessary to understand the nature of the models, so two different destination locations with highly distributed volume were selected.

Graphs - Nikica**


Figure 1.1.5 Prediction of the activity dynamics of active point P0001

The figure shows the predictions of the number of orders for delivery point P0001 based on the processed historical data. The periodicity of orders can be observed, i.e. the dynamics of the number of orders for a given delivery point over a period of two months. For example, for point P0001 it can be seen that there are no deliveries during the weekend, and most deliveries occur in the middle of the week.


Figure 1.1.6 Delivery prediction for delivery point P0001.

The next few graphs (Figure 1.1.7, Figure 1.1.8 and Figure 1.1.9) show a comparison between actual (historical) and predicted data on a one-day basis. Historical orders are shown as a blue line, and dots indicate the prediction results (red) and the actual data (blue) from the validation set.

Figure 1.1.7 Delivery prediction for delivery point P0001

Figure 1.1.8 Delivery prediction for delivery point P1229


Figure 1.1.9 Delivery prediction for delivery point P0302

The graphs of the number of orders already show excellent prediction results, which confirms that the model is suitable for learning from the data. To calculate the error, the RMSE (root mean squared error) method was used, and a mean absolute percentage error of 0.26 was obtained, which corresponds to an accuracy of 80.4%. This confirms that the predictive ability of the model is very satisfactory.

Based on the literature, RMSE is a widely used performance metric compared to other metrics used in regression. Because the errors are squared before being averaged, significant errors are assigned a relatively high weight by the RMSE, which means that RMSE is more useful when large errors are especially undesirable. RMSE does not necessarily increase with the variance of the errors; rather, it increases with the variance of the frequency distribution of error magnitudes [10]. For these reasons, RMSE is adopted in this study to evaluate the performance of the model.
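For completeness, both metrics can be computed in a few lines (a numpy sketch; y_true and y_pred stand for the actual and predicted order counts from the validation set):

import numpy as np

def rmse(y_true, y_pred):
    # Errors are squared before averaging, so large errors receive a high weight.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    # Mean absolute percentage error; days with zero actual orders are skipped.
    mask = y_true != 0
    return np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask]))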

++*The input data cover only internal conditions, which leaves capacity for further improvements and adjustment for external conditions such as weather.

** With rising food scarcity, the spoilage of fresh produce such as fruits and vegetables and the reduction in sales volume are key concerns for the fresh supply chain. The purpose of this study is to select an appropriate forecasting model to be implemented at the retail stage for demand forecasting of selected vegetables, which will reduce inventory levels and thus prevent potential stock-outs and leftovers. The LSTM and SVR models with CV displayed the best results for demand forecasting, with the least error. The results obtained from this study cannot be generalized; however, the proposed approach can be used to select an appropriate forecasting model in a specific situation.

This research did not consider any external factors. Future research can use weekly data and consider the factors influencing demand to obtain more accurate results.

Therefore, it will be proposed that the partner implement this approach at the retail stage.

Research on this parameter has indicated that it is potentially possible to delay or bring forward delivery point activity, split or consolidate an order, and group individual, geographically related locations into joint deliveries when calculating routing optimization, thus creating room for potential additional savings. Delivery points also predefine the type of vehicle (city or intercity delivery) and the starting points, and they affect employee selection, break times and other parameters. The parameter directly affects the definition of routing problems and sets the spatial framework for route calculations. Since this is an indirect prediction of customer consumption, the decision tree approach is expected to give good results, and since the output set is small (active/inactive and time period), various classification methods will be tried; for additional accuracy, a combination of several different approaches (ensemble approaches) can be used. Since the number of delivery points is large and individual points may be close to each other, the possibility of grouping points into clusters will be considered, and prediction will be performed per cluster instead of per point, aiming to speed up and simplify the process.
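A sketch of the proposed grouping, clustering delivery points by their spatial coordinates with scikit-learn; the coordinates and the number of clusters are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (latitude, longitude) coordinates of delivery points.
coords = np.array([
    [45.815, 15.982], [45.803, 15.970], [45.790, 16.010],
    [45.830, 15.950], [45.812, 16.001], [45.799, 15.940],
])

# Group nearby points so that prediction is done per cluster rather than per point.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
print(kmeans.labels_)   # cluster index for each delivery point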



 [Naomi-Fri1] Is it allowed to mention it like this, or does it have to be more general?

 [Naomi-Fri2] Shall we add a brief note about the LSTM?
