The simpler model is sometimes better.
We used Meta's Prophet to forecast the flow of patients between care providers. However, we had problems.
First, the errors were fairly high when backtesting. The average error over all care providers was about 20%.
Second, the regional total, summed over all care providers in a region, was sometimes wildly larger than we'd expect. Curiously, the forecasts for the individual providers in the region were reasonably good.
Third, even when the total number of patients flowing between care providers was plausible, it wasn't compatible with the output of the model that forecast the number of patients at the care provider.
A simpler model
We tried a model that used just the historical average number of patients for a care provider. This calculation was over all data on the same day of the week and same month of the year (but ignoring the Covid pandemic when figures were weird).
Example: for any given Monday in January, we simply looked at the average flow between two nodes for all Mondays in every January over all non-Covid years.
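As a rough illustration of the averaging described above, here is a minimal sketch using only the Python standard library. The function names, the sample figures and the exact dates of the excluded Covid window are all assumptions made for the example; they are not taken from our production code.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical exclusion window: the real Covid cut-off dates are an assumption.
COVID_START = date(2020, 3, 1)
COVID_END = date(2021, 12, 31)


def seasonal_averages(observations):
    """Average flow per (weekday, month), skipping the excluded window.

    observations: iterable of (date, flow) pairs for one pair of nodes.
    """
    buckets = defaultdict(list)
    for day, flow in observations:
        if COVID_START <= day <= COVID_END:
            continue  # ignore the pandemic period
        buckets[(day.weekday(), day.month)].append(flow)
    return {key: mean(values) for key, values in buckets.items()}


def forecast(averages, day):
    """Look up the historical average for the target day's weekday and month."""
    return averages.get((day.weekday(), day.month))


# Made-up history for one flow: four Mondays in January.
history = [
    (date(2019, 1, 7), 10),
    (date(2019, 1, 14), 14),
    (date(2021, 1, 4), 90),  # falls inside the excluded window
    (date(2022, 1, 3), 12),
]
averages = seasonal_averages(history)
prediction = forecast(averages, date(2024, 1, 8))  # another Monday in January
```

Here the prediction is the mean of the three non-Covid Mondays; the pandemic-era value of 90 never enters the average. A weekday and month combination with no history returns None, which a real system would need to handle.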
This approach yielded a lower error than Prophet - about 16%. See below for what metric we used to quantify the error.
Odd figures
When we rendered the Prophet forecasts in the front-end application, they didn't seem too bad most of the time. For instance, we predicted that 3027 people would go through the emergency departments in all hospitals in region X tomorrow when today it was 2795. OK, not too shocking.
But if we looked at flows with small numbers, the figures looked crazy. For instance, we predicted the number of people being discharged from all hospitals in a region into mental health units would be 15 tomorrow when this week it had actually averaged about 1.
One of the issues was Prophet itself. Prophet forecasts for time series with small numbers may well fall below 0. Obviously, the users did not like us predicting a negative number of patients. But if we simply mapped all negative numbers to zero, we might get surprises.
Let's take some random numbers:
>>> import numpy as np
>>> xs = np.random.normal(1, 2, 10)
>>> sum(xs)
9.788255911110234
Let's say the flow from a to c is X and the flow from b to c is Y.
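One way to see the surprise is to treat each random draw as a hypothetical forecast for one small flow into c, then compare the total before and after mapping negatives to zero. This is only a sketch: it uses the standard library's random module with a fixed seed instead of NumPy, and the variable names are made up for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Ten hypothetical small-flow forecasts with mean 1 and standard deviation 2,
# the same distribution as the np.random.normal(1, 2, 10) draw above.
forecasts = [rng.gauss(1, 2) for _ in range(10)]

raw_total = sum(forecasts)

# Map every negative forecast to zero, as the users would want per flow.
clipped = [max(0.0, f) for f in forecasts]
clipped_total = sum(clipped)

# The total grows by exactly the magnitude of the negatives we removed.
inflation = clipped_total - raw_total

print(f"raw total:     {raw_total:.2f}")
print(f"clipped total: {clipped_total:.2f}")
print(f"inflation:     {inflation:.2f}")
```

The clipped total can never be smaller than the raw total, so every negative forecast that gets zeroed pushes the aggregate upwards. With many small flows feeding one node, that upward bias accumulates.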
Which error metric to use?
I was using root mean squared error (RMSE) to calculate errors whereas a colleague was using mean absolute error (MAE). Which is more appropriate must be decided on a case-by-case basis. Because RMSE squares each error before averaging, it punishes large errors disproportionately, so a data set with many outliers scores badly. On the other hand, "If being off by 10 is just twice as bad as being off by 5, then MAE is more appropriate". Note that "MAE will never be higher than RMSE because of the way they are calculated" - see this SO answer.
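To make the difference concrete, here is a small sketch with made-up numbers. Both sets of predictions have the same MAE, but one spreads the error evenly while the other concentrates it in a single outlier. The helper functions are written out by hand for illustration, not taken from any library.

```python
from math import sqrt


def mae(actual, predicted):
    """Mean absolute error: the average size of the misses."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: squares each miss before averaging."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [10, 12, 11, 13]
even_errors = [12, 10, 13, 11]  # off by 2 on every point
one_outlier = [10, 12, 11, 21]  # perfect except for one miss of 8

for name, predicted in [("even errors", even_errors), ("one outlier", one_outlier)]:
    print(f"{name}: MAE={mae(actual, predicted):.1f} RMSE={rmse(actual, predicted):.1f}")
# even errors: MAE=2.0 RMSE=2.0
# one outlier: MAE=2.0 RMSE=4.0
```

MAE cannot tell the two cases apart, whereas RMSE doubles for the one with the outlier. In both cases MAE is no higher than RMSE, as the quote says.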