1 Introduction

In-migration to states or territories, cities, and towns in developed nations is, on the whole, the primary determinant for population growth and change (Bell et al. 2015). Accurately estimating in-migration probabilities for specific destinations is therefore important to understanding subnational demographic processes (Smith et al. 2013). Flows of migrants and changes in these between origin and destination communities are, in many cases, more important than changes induced by fertility behaviours (Willekens 2016). Consequently, accurately plotting age profiles of migration to destinations is important to understanding demographic changes and in projecting future migration flows between specific origins and destinations (Rogers 1975, 1986). It is also the basis for comparative work on migration across regions and nations, as well as projecting how these might change over time (Bernard and Bell 2015).

Place-to-place flows are commonly used as the empirical basis for estimating destination-specific internal migration in Australia. Source data are invariably from the 5-yearly Census of Population and Housing from which the demographer can, in theory, pair origins and destinations at fine-grained geographic levels to provide age-specific migration profiles (ABS 2016). Nevertheless, in spite of the commonalities exhibited in the shape of migration profiles across countries and internally [for example, as demonstrated by Rogers and Castro (1981)], the heterogeneity of age-specific migration flows to specific destinations and the imperfectness of census data in accurately depicting underlying profiles remain as ongoing concerns for demographers (Baffour and Raymer 2019).

For census data, one of the most significant issues is the temporal point-in-time capture of movements which negates moves that occurred between the temporal capture points. Specifically, for Australia two questions are included in the census to determine where the respondent lived five years ago and one year ago. These are compared to the current place of usual residence, also provided by respondents, and coded to determine whether a migration event has occurred subsequent to one or five years ago. Standardised geographic frameworks are applied to isolate the spatial specificities of the migration event and these feed into population estimates by age and gender (ABS 2019) as well as projection processes.

While important to demographic analysis, the point-in-time capture of migration is limited (to one- and 5-year intervals) and masks underlying dynamicism in the flows and age profiles for migration to and from specific destinations. Not least, the absolute number of migration events is under-captured by an unknown amount since an individual may undertake multiple moves within the one- or 5-year interval, aside from the ones captured at those points in time (Courgeau 1973). This is pertinent for understanding migration in Australian because, during their life course, Australians undertake more moves on average than residents of other developed nationsFootnote 1 (Bell et al. 2015, 2017).

The one- and 5-year intervals for census migration data also creates problems for demographers seeking to compare or convert across time or areas. For example, absolute numbers of migration flows captured by the 1-year variable and multiplied by five are generally significantly larger than captured in the 5-year variable. This difference is due to multiple moves made by individuals during the longer period (Rees 1977). Demographers call this the “1-year/5-year” problem, reflective of variations between migration probabilities collected over one- and 5-year intervals and hindering direct comparisons between the two. To address this issue, some demographers favour migration ratios, the probability of migrating to a particular destination conditional on out-migrating from a given origin, for their relative stability in derivation from point-in-time intervals (Rogers et al. 2003).

Studies have demonstrated further weaknesses exist in census migration data including in the accuracy of destination migration which is captured for infants born within the transition interval (Rees et al. 2000) and for older migrants (Wilson and Terblanche 2018; Wilson 2020). In addition, age and gender profiles in census migration data for more rural and remote destinations are known to be more volatile, partly as a result of the issues above and combined with temporal unpredictability due to their relatively small populations (Peters et al. 2016). As a key input to modelling, the cumulative impacts of deviations from the underlying age profiles for migration are to reduce the accuracy of comparative migration metrics and their inputs to other forms of population modelling like projections.

In theory, some of the shortcomings in census data might be overcome through application of administrative datasets capturing migration. However, in Australia at least, while the landscape of unit record administrative data provision for such purposes is improving, there are currently no nationally consistent datasets (with sufficient coverage and longitudinal scope) for application to this purpose. Consequently, demographers continue to develop methods for generating accurate destination-specific profiles from noisy and incomplete data (Rogers et al. 2010).

A growing number of studies for modelling and smoothing migration profiles are evident (for example, Rogers et al. (1978), Rogers and Castro (1981), Rogers and Watkins (1987), Congdon (2008), Wilson (2010), Bernard and Bell (2015), Wilson (2020), Dyrting (2020)). The basis for attempts to estimate smooth origin–destination-age-specific migration intensities are counts of movers by destination from an origin population, both of which are available by single year of age from census data. Modelling migration involves a multinomial process within which smooth intensities can be estimated using a number of methods. Willekens (2008), for example, utilised the maximum likelihood method which Hachen (1988) highlighted can be approached from either a competing-risk perspective, where out-migration is estimated separately for each destination, or from a generation–distribution perspective (Rogers et al. 2002), where out-migration and migration ratios are estimated separately. Both approaches have been used to prepare inputs to models for projecting subnational populations (Campbell 1996; Nash 2020).

P-splines are a powerful tool for smoothing and have been applied to estimating mortality (Currie et al. 2004; Camarda 2012; Gonzaga and Schmertmann 2016), and the generation component of out-migration (Dyrting 2020). Our aim here is to extend the method to more accurately smooth the distribution component of migration than methods currently employed. Our approach is motivated by Rogers et al. (2002)’s observation that, while migration probabilities are strongly age-dependent, migration ratios are less so. On this basis, the problem of separating signal from noise, which all smoothing methods seek to solve, should be more effective for ratios than for probabilities.

In the next section, we give a review of transition-style migration probabilities and ratios and summarise the problem of removing irregularities in the age profile through smoothing. In Sects. 3 and 4 we introduce the multinomial model and the P-spline method applied to ratios, demonstrating its solution by iterated linear regressions. In Sect. 5, we apply the method to smoothing the distribution component of interstate migration for Australia. In Sect. 6, we illustrate how it can be combined with a method for smoothing the generation component of out-migration to smooth destination-specific migration probabilities. In Sect. 7 we show how the method can be used to directly compare ratios for 1-year and 5-year migration intervals.

2 Migration ratios

Transition type migration data, such as is collected by the Australian Census for Population and Housing, consists of observations

$$\begin{aligned} _nM^{j}=\left[ \begin{array}{c} _nM^{j}_0\\ \vdots \\ _nM^{j}_{\omega } \end{array}\right] \quad \mathrm { and }\quad N=\left[ \begin{array}{c} N_0\\ \vdots \\ N_{\omega } \end{array}\right] \end{aligned}$$
(1)

of \(_nM^{j}_x\) movers of age \(x+n\) to destination j from an initial population \(N_x\) of age x of a specific origin. Raw out-migration probability \({_n{{\tilde{m}}^{}}}\) and migration ratios \(_n{\tilde{c}}^{j}\) are given by the fractions

$$\begin{aligned} {_n{{\tilde{m}}^{}}}=\frac{_nM^{}}{N} \quad \mathrm { and }\quad _n{\tilde{c}}^{j}= \frac{_nM^{j}}{_nM^{}}, \end{aligned}$$
(2)

where here and in the following all matrix operations and functions act elementwise and

$$\begin{aligned} _nM^{}=\sum _{j=1}^{d} {_nM^{j}} \end{aligned}$$
(3)

is the vector of total movers, and \(d\) is the number of possible destinations.

Within the generation–distribution framework \({_n{{\tilde{m}}^{}}}\) is the observed probability of out-migrating from the origin area, and \(_n{\tilde{c}}^{j}\) is the observed probability of migrating to destination j conditional on out-migrating. We see from Eq. (2) that \(N\) is the population exposed to the risk of out-migrating and \(_nM^{}\) is the population of out-migrants exposed to the risk of in-migrating to j. Since both of these populations are finite, the observed probabilities will have elements of sample noise which we seek to remove using smoothing. In the following we assume that a good estimate of total out-migration \(_nm^{}\) has been obtained and address the problem of finding smooth vectors \(_nc^{j}\) that fit the observed ratios \(_n{\tilde{c}}^{j}\).

3 A multinomial model of migration

Assuming a multinomial model of migration counts (Willekens 2008) one can show that the log likelihood of observing ratios \(_n{\tilde{c}}^{j}\) when the underlying ratios are \(_nc^{j}\) is

$$\begin{aligned} {{\mathscr {L}}}_c={_nM^{}}'\cdot \sum _{j=1}^d {_n{\tilde{c}}^{j} \log (_nc^{j}}) , \end{aligned}$$
(4)

where \(A\cdot B\) denotes matrix multiplication and \(A'\) is the transpose of A.

It is difficult to maximise the above form of the log likelihood function because there is an auxiliary condition that the migration ratios must sum to unity. To handle this condition, it is useful to express \(_nc^{j}\) in terms of conditional ratios \(_na^{j}\) defined by

$$\begin{aligned} _nc^{j}=\left\{ \begin{array}{ll} _na^{1}, &{} j=1\\ _ns^{j}\times {_na^{j}}, &{} j=2,\ldots ,d-1\\ _ns^{d}, &{} j=d\end{array}\right. \end{aligned}$$
(5)

where \(_ns^{j}\) is the product

$$\begin{aligned} _ns^{j}=\prod _{1\le k<j} (1-{_na^{k}}). \end{aligned}$$
(6)

Observed conditional ratios \(_n{\tilde{a}}^{j}\) are defined similarly. \(_na^{j}\) is the probability of migrating to j conditional on not migrating to destinations \(1,\ldots ,j-1\). Let \(_nK^{j}\) be the number of movers to destinations with index j or greater

$$\begin{aligned} _nK^{j}=\sum _{k=j}^{d}{_nM^{k}}, \end{aligned}$$
(7)

then the log likelihood function can be rewritten as

$$\begin{aligned} {{\mathscr {L}}}_c=\sum _{j=1}^{d-1} {{\mathscr {L}}}_j , \end{aligned}$$
(8)

where

$$\begin{aligned} {{\mathscr {L}}}_j=(_nK^{j})'\cdot y_j , \end{aligned}$$
(9)

and

$$\begin{aligned} y_j={_n{\tilde{a}}^{j}}\log {_na^{j}}+(1-{_n{\tilde{a}}^{j}})\log (1-{_na^{j}}). \end{aligned}$$
(10)

The derivation of Eq. (8) is given in Sect. A.2. Written in this form the likelihood is easier to maximise as we only need to impose the condition \(0\le {_na^{j}}\le 1\) on the conditional ratios. Finally, in order to give ratios a common representation which is independent of the interval n we express the n-year conditional ratios \(_na^{j}\) in terms of implied ratios at 1-year intervals \(a^{j}\)

$$\begin{aligned} _na^{j} = {_nT^{j}}\cdot a^{j} , \end{aligned}$$
(11)

where the matrix \({_nT^{j}}\) is given iteratively by

$$\begin{aligned} \begin{array}{rl} _nT^{1}&{}{}={_nU},\\ _nT^{j}&{}{}=\mathrm {diag}\left( \frac{1}{1-{_na^{j-1}}}\right) \cdot {_nT^{j-1}}\cdot \mathrm {diag}\left( 1-a^{j-1}\right) . \end{array} \end{aligned}$$
(12)

Here \(_nU\) is the matrix with elements

$$\begin{aligned} _nU_{xr}=\left\{ \begin{array}{ll} 0 &{} r< x\\ \left( \prod _{x\le k<r}(1-m^{}_k)\right) m^{}_r/{_nm^{}_x} &{} x \le r < x + n\\ 0 &{} r \ge x+n \end{array}\right. , \end{aligned}$$
(13)

and \(m^{}_k\) and \(_nm^{}_x\) are probabilities obtained by smoothing total out-migration (Dyrting 2020). The derivation of Eq. (11) is given in Sect. A.3. Our strategy is now to smooth conditional ratios by maximising \({{\mathscr {L}}}_j\) sequentially as a function of \(a^{j}\) only.

4 Penalised splines

In this section we use penalised splines (P-splines) to smooth conditional ratios (Eilers and Marx 1996). Since we will be smoothing them sequentially, we drop the destination index j to lighten the notation. Represent implied conditional ratios in terms of B-splines using the functional form

$$\begin{aligned} \mathrm {logit}\left( a^{}\right) =B\cdot \theta ^{} . \end{aligned}$$
(14)

Here \(B\) is a matrix of B-spline functions arranged columnwise (de Boor 2001). Conditional ratios are smoothed by maximising the penalised log likelihood function

$$\begin{aligned} {{\mathscr {L}}}={_nK^{}}'\cdot y - \frac{\lambda ^{}}{2}{\theta ^{}}'\cdot D'_k\cdot D_k\cdot \theta ^{} , \end{aligned}$$
(15)

where \(D_k\) is the k-order difference matrix and \(\lambda ^{}>0\) is a penalty parameter. In principle this equation could be solved for \(\theta ^{}\) using a multivariate optimisation routine, but because the number of B-spline nodes is potentially large we need an alternate solution method. Assuming the maximum of the penalised log likelihood occurs at a stationary point we get a system of nonlinear equations for \(\theta ^{}\) which can be solved by iterative linear regressions. Let \(\bar{\theta ^{}}\) be the current approximation to the B-spline weights. The updated value \(\theta ^{}\) is the solution to

$$\begin{aligned} Q^{}\left( \bar{\theta ^{}}\right) \cdot \theta ^{}=b(\bar{\theta ^{}}), \end{aligned}$$
(16)

where

$$\begin{aligned} Q^{}(\theta ^{})&= {} G'\cdot W^{}(\theta ^{})\cdot G+ \lambda ^{}D'_k\cdot D_k , \end{aligned}$$
(17)
$$\begin{aligned} b(\theta ^{})&= {} G'\cdot V\cdot \left( {_n{\tilde{a}}^{}}-{_na^{}}\right) + G'\cdot W^{}(\theta ^{})\cdot G\cdot \theta ^{} , \end{aligned}$$
(18)

and

$$\begin{aligned} W^{}(\theta ^{})=\mathrm {diag}\left( {_na^{}}(1-{_na^{}})\,{_nK^{}}\right) . \end{aligned}$$
(19)

The derivation of this iteration and the expression for \(G\) are given in Appendix 2.

Smoothness is controlled by the penalty parameter \(\lambda ^{}\). The higher the value the smoother the ratio \(a^{}\). The penalty can be specified explicitly or chosen automatically by minimising one of a number of measures that seek to balance the decreased fitting error against the increased effective number of parameters as \(\lambda ^{}\) is made smaller. Two popular measures are the Akaike information criterion (Akaike 1974)

$$\begin{aligned} \mathrm {AIC}(\lambda ^{}) =\mathrm {dev}+ 2\times \mathrm {dim}, \end{aligned}$$
(20)

and the Bayesian information criterion (Schwarz 1978)

$$\begin{aligned} \mathrm {BIC}(\lambda ^{})=\mathrm {dev}+\mathrm {dim}\times \log (1+\omega ) , \end{aligned}$$
(21)

where

$$\begin{aligned} \mathrm {dev}(\theta ^{},\lambda ^{})=-2\times {_nK^{}}'\cdot y \end{aligned}$$
(22)

is the deviance and

$$\begin{aligned} \mathrm {dim}(\theta ^{},\lambda ^{})=\mathrm {tr}(H^{}) \end{aligned}$$
(23)

is the effective dimension of \(\theta ^{}\) calculated using the trace of the hat matrix of the linearised problem

$$\begin{aligned} H^{}=\left( G'\cdot W^{}\cdot G+\lambda ^{}D_k' \cdot D_k\right) ^{-1}\cdot G'\cdot W^{}\cdot G. \end{aligned}$$
(24)

From experiments with Australian census data we found that \(\mathrm {AIC}(\lambda ^{})\) would often under-smooth and \(\mathrm {BIC}(\lambda ^{})\) would often over-smooth. We found that a good compromise was the Akaike information criterion with corrections (Hurvich and Tsai 1989)

$$\begin{aligned} \mathrm {AICc}(\lambda ^{}) = \mathrm {AIC}(\lambda ^{}) + 2 \frac{\mathrm {dim}(\mathrm {dim}+1)}{\omega -\mathrm {dim}} \end{aligned}$$
(25)

5 Application to interstate migration: ratios

As an application of the smoothing method outlined above, we consider estimation of the distribution component of Australia interstate migration. This is an important step in the preparation of origin–destination-age-specific migration probabilities which are necessary inputs to a multistate lifetable analysis or population projection model (Rogers 1975). Data from the 2016 Australian Census of Population and Housing were used to calculate raw and smoothed destination-age-specific out-migration ratios for each of Australia’s six states and two mainland territories for both 1-year and 5-year intervals. With P-splines the knots should be spaced at intervals small enough such that an unpenalised ratio (\(\lambda ^{}=0\)) will show more variation than is justified by the data (Eilers and Marx 1996). For most age ranges we found that a knot spacing of approximately three years was sufficiently small. For eastern states, out-migration to the Australian Capital Territory changes rapidly over the age ranges 17 to 19 (1-year ratios) and 12 to 19 (5-year ratios) reflecting its importance as a destination for young adults entering tertiary education. Therefore, over the age interval 12 to 21 we used knots spaced at 1-year intervals and for the remainder of the age range 0 to 90 we used knots at 3-year intervals. Our fits did not change substantially if a finer grid for the knots was used. The results presented are for quadratic B-splines because we found that linear B-splines would occasionally give kinks at the knot points. A linear penalty (\(k=1\)) was used, with \(\lambda ^{}\) determined by minimising \(\mathrm {AICc}(\lambda ^{})\). For comparison we have included the results of a linear kernel regression smoothing of the raw migration ratios using a Gaussian kernel (Fan and Gijbels 1996) and the Rule-Of-Thumb bandwidth selector (Ruppert et al. 1995).

Figures 1 and 2 compare raw and smoothed 1-year migration ratios for Australia’s largest state, New South Wales, and its least populated mainland territory, the Northern Territory. The complete set of 112 origin–destination ratios for all states and territories is given in Figures S-1 to S-16 of Online Resource 1. Also shown is the 95% confidence interval for observed ratios based on the P-spline fit. As observed by Rogers et al. (2002) ratios do not exhibit as strong a dependence on age as probabilities. Also, apart from a strong constant component there does not appear to be a repeating pattern common across destinations, and yet we do observe a definite variation with age, in particular the presence of a “student peak” for migration to the ACT from New South Wales, Victoria, and Queensland (see Fig. 1, S-2, and S-3). The two smoothing methods give similar results when both the origin population and the destination-specific ratio are large (see the age-specific migration ratio from New South Wales to Queensland). They begin to show differences when dispersion in the raw data increases, especially over advanced ages (see the age-specific migration ratio from the Northern Territory to Tasmania).

Fig. 1
figure 1

Source: Based on Australian Bureau of Statistics data

New South Wales interstate out-migration ratios 2015–2016 by age in 2015 and destination, two smoothing methods. Grey area, 95% confidence interval for observed ratios based on P-spline fit.

Fig. 2
figure 2

Source: Based on Australian Bureau of Statistics data

Northern Territory interstate out-migration ratios 2015–2016 by age in 2015 and destination, two smoothing methods. Grey area, 95% confidence interval for observed ratios based on P-spline fit.

Table 1 gives summary statistics for the two smoothing methods: a goodness-of-fit measure given by the deviance

$$\begin{aligned} \mathrm {dev}_{c}=2\times {_nM^{}}'\cdot \sum _{j=1}^d {_n{\tilde{c}}^{j} \log (_n{\tilde{c}}^{j}/_nc^{j}}), \end{aligned}$$
(26)

and our assessment of each fit’s deficiencies, if any, focussing on two types: Over-smoothing of the student peak and under-smoothing of the profile at ages 80 and over. P-spline has the lowest value of \(\mathrm {dev}_{c}\) for 13 of the 16 state-interval combinations. For the three cases where kernel regression has the lowest deviance its fit shows signs of under-smoothing at senior ages. P-spline performs better because it is able to model the increase in sample variance with decreasing population at risk (the total number of movers) whereas kernel regression assumes it is the same for all ages. Furthermore, kernel regression has over-smoothed the student peak to ACT in the 1-year data for New South Wales, Victoria, and Queensland (Fig. 1, S-2, and S-3 respectively) and the 5-year data for Queensland (Figure S-11).

Table 1 Summary statistics for two smoothing methods applied to Australian interstate out-migration ratios, 2016

6 Application to interstate migration: probabilities

Once smoothed ratios have been found, all of the destination-specific migration probabilities \(_nm^{j}\) are then available to us through the expression

$$\begin{aligned} {_nm^{j}}={_nm^{}}\times {_nc^{j}}. \end{aligned}$$
(27)

An alternate framework for smoothing is the competing-risk approach (Hachen 1988), where age-specific probabilities are smoothed separately for each destination. We compare our generation–distribution approach to two competing-risk smoothing methods: smoothing with kernel regression (Bernard and Bell 2015) and smoothing with Wilson (2010)’s student model migration schedule (student MMS). Data from the 2016 Australian Census of Population and Housing were used to calculate raw and smoothed destination-age-specific out-migration probabilities for each of Australia’s six states and two mainland territories for both 1-year and 5-year intervals. The generation component was smoothed using P-TOPALS (Dyrting 2020) and the distribution component taken from Sect. 5. For kernel regression, raw destination-specific migration probabilities were smoothed using linear polynomials, a Gaussian kernel (Fan and Gijbels 1996), and the Rule-Of-Thumb bandwidth selector (Ruppert et al. 1995). For student MMS, destination-specific migration probabilities were smoothed with a three-step process:

  1. Step 1:

    the model was fitted to total out-migration, and the parameter values saved.

  2. Step 2:

    for each destination the model was fitted to destination-specific migration probabilities, keeping the profile parameters fixed at their values from Step 1 and only adjusting the level parameters.

  3. Step 3:

    for each destination all parameters were fitted starting from their Step 2 values.

Figures 3 and 4 compare raw and smoothed 1-year migration probabilities for New South Wales and the Northern Territory. The complete set of 112 origin–destination probabilities for all states and territories are provided in Figures S-17 to S-32 of Online Resource 1. These figures also show the 95% confidence interval for observed probabilities based on the P-spline fit. The origin–destination-specific schedules display a variety of differences in the position and prominence of student, labour, and retirement peaks. Because it is concentrated over a narrow age range, a prominent student peak will not be well fitted by kernel regression, which tends to over-smooth the feature (see New South Wales to ACT in Fig. 3, and Northern Territory to SA in Fig. 4).

Approximating out-migration to a given destination j as a Poisson process it can be shown that the size of the sample noise in \({_n{{\tilde{m}}^{j}}}\) relative to \(_nm^{j}\) is \(1/\sqrt{N\times _nm^{j}}\). This implies that the relative size of the sample noise increases as the exposed population \(N\) decreases. It also shows that the relative size of sample noise will be larger for destinations with lower probabilities. In each of Figs. 34, and S-17 to S-32 destinations are arranged from top to bottom in order of decreasing probability. We see an increase in the amount of sample noise relative to the level, and when it is large both kernel regression and student MMS can give unrealistic profiles (see Northern Territory to Tasmania in Fig. 4).

Fig. 3
figure 3

Source: Based on Australian Bureau of Statistics data

New South Wales interstate out-migration probability 2015–2016 by age in 2015 and destination, three smoothing methods. Grey area, 95% confidence interval for observed probabilities based on P-spline fit.

Fig. 4
figure 4

Source: Based on Australian Bureau of Statistics data

Northern Territory interstate out-migration probability 2015–2016 by age in 2015 and destination, three smoothing methods. Grey area, 95% confidence interval for observed probabilities based on P-spline fit.

Table 2 gives summary statistics for the three smoothing methods: two goodness-of-fit measures \(\mathrm {dev}\) and \(\mathrm {dev}_{m}\), a shape plausibility measure \({\bar{P}}\), and our assessment of each fit’s deficiencies, if any. The first goodness-of-fit measure is the multinomial deviance

$$\begin{aligned} \mathrm {dev}=2\times {N}'\cdot \left[ (1-{_n{{\tilde{m}}^{}}})\log \left( \frac{1-{_n{{\tilde{m}}^{}}}}{1-{_nm^{}}}\right) +\sum _{j=1}^d {{_n{{\tilde{m}}^{j}}} \log \left( \frac{{_n{{\tilde{m}}^{j}}}}{_nm^{j}}\right) } \right] \end{aligned}$$
(28)

for a joint fit of all destination-specific migration probabilities for a given origin. A good joint fit does not necessarily imply a good fit of total out-migration, the quantity of importance for the origin population. For this reason, we give a second goodness-of-fit measure, binomial deviance for the total out-migration

$$\begin{aligned} \mathrm {dev}_{m}=2\times {N}'\cdot \left[ (1-{_n{{\tilde{m}}^{}}})\log \left( \frac{1-{_n{{\tilde{m}}^{}}}}{1-{_nm^{}}}\right) +{_n{{\tilde{m}}^{}}}\log \left( \frac{{_n{{\tilde{m}}^{}}}}{_nm^{}}\right) \right] . \end{aligned}$$
(29)

Low deviances can sometimes be achieved by an unrealistic profile, and for this reason we include the measure \({\bar{P}}\), equal to the sum of the percentage that each destination-specific schedule’s profile differs from a reference profile

$$\begin{aligned} {\bar{P}}= \sum _{j=1}^d 100 \left( 1-\frac{_nm_{\mathrm {ref}}'\cdot {_nm^{j}}}{|_nm_{\mathrm {ref}}||_nm^{j}|} \right) , \end{aligned}$$
(30)

where \(_nm_{\mathrm {ref}}\) is a reference schedule and \(|v|\) is the absolute value of vector v. For reference schedule we used the P-TOPALS smoothed interstate probabilities from Dyrting (2020). When we assessed each fit’s deficiencies, we focused on whether it over-smoothed the student peak (S), under-smoothed advanced ages (E), or had an implausible shape (X).

Table 2 Summary statistics for three smoothing methods applied to Australian interstate out-migration probabilities, 2016

Table 2 shows that, for all origins and intervals, our generation–distribution approach gives realistic profiles, with the lowest or equal lowest value for \({\bar{P}}\). For 1-year probabilities it has the lowest value for \(\mathrm {dev}_{m}\) for seven of the eight states, and the lowest value for \(\mathrm {dev}\) for five of the eight states, with the kernel regression (SA, ACT, and NT out-migration) or student MMS (NT out-migration) achieving a lower value with an unrealistic profile. For 5-year probabilities the three methods are more evenly matched. Our approach has the lowest value of \(\mathrm {dev}_{m}\) for five of the eight states, but kernel regression has the lowest value for \(\mathrm {dev}\) for seven of the eight states. Our approach always has a lower value for \(\mathrm {dev}\) compared to student MMS but has the lowest value for only one state (NSW).

7 Application to the 1-year/5-year problem

We previously described the 1-year/5-year census migration problem, which is hallmarked by differences in aggregate migration flows, and therefore derived probabilities, when comparing across time and space. Five-year probabilities are less than five times 1-year probabilities, the difference being due to multiple moves over the longer interval (Rees 1977). Unlike probabilities, migration ratios appear to be more stable across different intervals (Rogers et al. 2003) and the method developed here is useful for comparing them because they share a common representation in terms of implied 1-year migration ratios (see Sect. A.1).

Figures 5 and 6 show 1-year and 5-year implied migration ratios from New South Wales and the Northern Territory respectively using data from the 2016 Census. The complete set of 56 origin–destination ratios for all states and territories is given in Figures S-33 to S-40 of Online Resource 1. We see that the 5-year migration ratios implied by 5-year data have the same level as ratios from 1-year data and similar age profiles. This suggests that a crude method for converting 5-year probabilities to 1-year probabilities would be to focus on converting the generation component, keeping the distribution component fixed at its implied 1-year values.

Fig. 5
figure 5

Source: Based on Australian Bureau of Statistics data

New South Wales interstate out-migration ratios 2016, two intervals.

Fig. 6
figure 6

Source: Based on Australian Bureau of Statistics data

Northern Territory interstate out-migration ratios 2016, two intervals.

8 Discussion and conclusion

This paper has extended the P-spline method to the problem of smoothing the migration ratio component of a generation–distribution representation of origin–destination-age-specific migration probabilities. Existing methods address this problem through smoothing of bi-regional flows, primarily in-migration or out-migration with one destination–origin pairing (usually a State with the rest of the country). The contribution of our method is to provide a multi-regional approach to smoothing destination-migration profiles. The method has been implemented as an Excel add-in which is included in Online Resource 2. The potential downstream benefits from this method include the preparation of more accurate inputs for origin–destination-specific population projection models, and the construction of multi-regional life tables.

Using the example of Australian interstate migration, we have shown how P-splines can give an accurate fit to the migration ratio profile across high-curvature ages and a good treatment of sample noise both when the population at risk is low, such as advanced ages, and when the destination has a low conditional probability of migration. When combined with the use of P-TOPALS for smoothing the generation component we find that the P-spline method produced smooth origin–destination profiles that were both realistic and accurate, performing better than both kernel regression and student MMS for 1-year probabilities, but for 5-year probabilities the results for the three methods were more evenly matched. We used the method to directly compare 1-year migration ratios and implied 1-year ratios from 5-year data and found that they have the same level and similar age-specific shapes.

The framework we have adopted here, the generation–distribution decomposition of migration flows, is the same as used by Rogers et al. (2002), although our focus and tools are different. One difference is that we use single year of age data and smooth with P-splines while they used 5-year age groupings and smoothed using a log-linear model. The second difference is that we assume a complete set of data while they were interested in the problems posed by incomplete data, in particular the problem of repairing incomplete data by imposing age and spatial structures from an external source. The extent that the distribution component differs for 1-year and 5-year intervals is currently an open question. Liaw (1984) conjectured that most of the difference between one- and 5-year migration probabilities is due to the generation component, while Rogerson (1990) argues that the distribution component is also affected by the interval width. Rogers et al. (2003) concluded that the assumption of constant distribution component needed to be relaxed and some type of variation introduced through exogenous covariates. Our contribution to this debate has been to provide a representation of the distributional component in terms of implied 1-year ratios which enables 1-year and 5-year ratios to be directly compared.

Migration between the eight Australian states and territories is a good test case for our method because it spans two orders of magnitude of migration flow sizes from an average of 398.5 persons per single year of age of movers from New South Wales to Queensland over interval 2015–2016 to an average of 3.5 persons per single year of age from the Northern Territory to Tasmania over the same period. How would our method adapt to smaller spatial scales?, or more precisely, as the spatial scale is decreased is the method robust as the flow size decreases and manageable as the number of possible destinations increases? As the flow size \(_nK^{}\) decreases the penalty term in Eq. (15) will become more important. For the case of a linear penalty (\(k=1\)) the B-spline weight will tend to a constant

$$\begin{aligned} \theta ^{} = \theta ^{}_0 \iota , \end{aligned}$$
(31)

where \(\iota\) is a vector of ones. Because B-splines form a partition of unity (\(B\cdot \iota =1\)) the conditional ratio \(a^{}\) will, in this limit, tend to a value independent of age. The method therefore adaptively converges to the OPCS model described by Wilson and Bell (2004), where migrants are distributed to destinations in a fixed proportion regardless of age. Another aspect of applying the smoothing method at smaller spatial scales is the increase in the dimension of the multi-regional migration matrix, which scales as the square of the number of regions. The burden on the user of estimating, in one stage, the entire base period matrix has been cited as one of the challenges to practical implementation of a multi-regional population projection model (Wilson and Bell 2004). One of the strengths of our method is that it allows a multi-stage approach to estimating the matrix: for each origin estimate the out-migration rate, and sequentially for each destination, estimate the conditional migration ratio.

Assumption-setting, the processes of projecting the future trajectories of migration rates which are then used in population projections, while different from the estimation of current levels (which our method seeks to solve) is frequently connected to it (assumptions often being expressed as additive or multiplicative changes to the jump-off level) and presents similar challenges to the practitioner (e.g. how to manage dimensionality for small spatial scales). The framework we use enables users to handle the setting of assumptions by dividing them into two types (generation and distributional) which can be projected separately. Thus, for example, dynamism of out-flows can be modelled by making the total out-migration rate time-dependent, and dynamism of the distribution of flows can be modelled by making the migration ratios time-dependent.

The main strength of the P-spline method is its combination of flexibility in fitting the variety of age-specific profiles for migration ratios and ability to account for age-dependent sample noise. Another advantage is that it allows the practitioner to either use an automatic smoother such as the \(\mathrm {AICc}\) condition or dial the level of smoothing manually for any origin–destination pair. A current limitation of the method is that it assumes data in the form of single years of age. Often census data on internal migration are published by age groups, commonly 5-year, and even when data are available by single year of age it is sometimes necessary to group it to mitigate the effects of age-heaping (Feeney 1979) or confidentialisation (Thompson et al. 2013). One area for further work, therefore, is to extend the P-spline approach to the distribution part when data is grouped. Another path for further investigation is to adapt the method for experimentally projecting the distribution component or more generally updating the distributional component conditional on partial information, possibly drawing on methods from Plane (1981).