Can you predict when a booking will be cancelled?
About the client
Our client is a group of the main combined transport (CT) operators in Europe, who serve about 70% of all major European transport companies. The CT operators manage freight trains and train terminals throughout the European continent and handle about 5 million transports a year.
Context
Customers book the train transport of their loading units through a digital hub. This can be, for example, a trip from Moscow to Hamburg by truck, and then all the way to Norway by train. To be profitable, a freight train should be at least 90% booked. In reality, not all booked freight is present when the train departs. There are two cancellation scenarios. Either the cancellation is declared by the customer, who updates the booking status in the digital hub, or it is not declared for some reason: the truck has broken down, the customer forgets to cancel the booking, the truck is stuck in traffic and won't make it to the train terminal, etc.
Today, operators have no way of knowing whether a customer might cancel a booking or whether a loading unit will fail to arrive at the departure terminal. This makes it hard to use the train to its maximum capacity, which matters greatly to the business because the train is ordered at a fixed price. If the load of the train falls below a certain threshold, the transport operator loses money on that single route/path. Leaving out unused slots (wagons) on the train takes time and storage space. It would therefore be useful for the CT operators to predict whether a booking will be cancelled, so they can optimize the train loading capacity.
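To make the capacity problem concrete, here is a small back-of-the-envelope sketch in Python. It is only an illustration under stated assumptions: the 40-slot train is hypothetical, the 90% threshold is read here as applying to the load at departure, and every booking is assumed to be cancelled with the same average probability (the 10.7% average reported later in this case study).

```python
def expected_fill(booked_slots: int, capacity: int, cancel_rate: float) -> float:
    """Expected share of the train's slots that is actually loaded at departure."""
    expected_loaded = booked_slots * (1.0 - cancel_rate)
    return expected_loaded / capacity


CAPACITY = 40        # hypothetical number of slots on one train
THRESHOLD = 0.90     # profitability threshold mentioned in the text
CANCEL_RATE = 0.107  # average share of cancelled bookings found in the analysis

# Even a fully booked train is expected to leave with some empty slots.
fill = expected_fill(booked_slots=40, capacity=CAPACITY, cancel_rate=CANCEL_RATE)
print(f"expected fill: {fill:.1%}, above threshold: {fill >= THRESHOLD}")
```

Under these assumptions a fully booked train is expected to depart about 89% full, just under the threshold, which is why knowing in advance which bookings are likely to be cancelled would be valuable.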
What SII Belgium did
SII Belgium set up a predictive model for the cancellation of booked slots on European freight trains. The group of CT operators assumed that 1 to 2% of bookings were cancelled. The archives hold 30 million existing records, but unfortunately none of them cover cancellations. Once a customer cancelled a booking, that information was no longer relevant for further train planning, so no historical cancellation data is available. By analyzing the data with the CRISP-DM methodology, SII Belgium hoped to discover a link between the risk of a cancellation and other elements such as client, driver, destination, etc. This methodology was followed throughout the data mining project. The data mining model was trained and evaluated to eliminate errors. For the actual analysis, SII Belgium used Microsoft's Cortana Intelligence Suite, and in particular its machine learning module. We performed the analysis with data from the production database at 3 different moments, and it took about 30 man-days to complete. Data was fed into a two-class classification model, and the graph it produced was analyzed to check whether the model was good enough and to validate it. The first analysis covered 60,000 records. In the second and third exercises, the data volume had grown to 140,000 and 800,000 records. The analysis showed that, on average, 10.7% of bookings are cancelled.
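As a rough illustration of what looking for a link between cancellations and an element such as client or destination can mean in practice, the sketch below compares the cancellation rate per value of one feature. It is not the project's actual analysis: the record layout, the field names (`client`, `destination`, `cancelled`) and the four bookings are made up purely to show the mechanics.

```python
from collections import defaultdict


def cancellation_rate_by(records, feature):
    """Return {feature value: share of cancelled bookings} for one feature."""
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for rec in records:
        key = rec[feature]
        totals[key] += 1
        if rec["cancelled"]:
            cancelled[key] += 1
    return {key: cancelled[key] / totals[key] for key in totals}


# Toy, made-up bookings; not project data.
bookings = [
    {"client": "A", "destination": "Hamburg", "cancelled": False},
    {"client": "A", "destination": "Oslo", "cancelled": True},
    {"client": "B", "destination": "Hamburg", "cancelled": False},
    {"client": "B", "destination": "Oslo", "cancelled": False},
]

rates = cancellation_rate_by(bookings, "client")
print(rates)
```

If the rates for the different values of a feature all sit close to the overall average, that feature carries little information about cancellations on its own. A real analysis would also need enough records per group for the differences to be meaningful.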
The analysis found no significant correlation between the available features and cancellations, so none of them allows us to predict with good certainty whether a transport will be cancelled. Even the best-scoring predictive models have a very low probability of correctly predicting that a loading unit will not appear at the departure terminal or that a booking will be cancelled. For a truly predictive analysis, that probability would need to be considerably higher. The absence of a correlation is useful information in itself, because it rules out certain assumptions. We do, however, encourage further investment in data mining, with the objective of increasing profitability through more efficient train capacity planning. Other parameters could still be identified.
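One way to see why a two-class model can look acceptable and still be of little use here is to compare it with a trivial baseline. The sketch below is an assumption-laden illustration, not the evaluation SII Belgium ran: the metrics (accuracy, precision, recall) are a common choice rather than ones named in this case study, and the ten toy labels simply mimic a cancellation share of roughly one booking in ten.

```python
def two_class_metrics(actual, predicted):
    """Accuracy, precision and recall for the 'cancelled' (True) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # cancelled, predicted cancelled
    fp = sum(1 for a, p in pairs if not a and p)      # kept, predicted cancelled
    fn = sum(1 for a, p in pairs if a and not p)      # cancelled, predicted kept
    tn = sum(1 for a, p in pairs if not a and not p)  # kept, predicted kept
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels: 1 cancelled booking out of 10 (made up for illustration).
actual = [True] + [False] * 9

# Baseline that never predicts a cancellation.
never_cancelled = [False] * 10

baseline = two_class_metrics(actual, never_cancelled)
print(baseline)
```

The baseline is right for nine bookings out of ten while catching no cancellation at all. A useful model therefore has to be judged on how many cancellations it actually identifies, not on overall accuracy alone.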