Asynchronous parallel algorithms for nonconvex optimization

Abstract

We propose a new asynchronous parallel block-descent algorithmic framework for the minimization of the sum of a smooth nonconvex function and a nonsmooth convex one, subject to both convex and nonconvex constraints. The proposed framework hinges on successive convex approximation techniques and a novel probabilistic model that captures key elements of modern computational architectures and asynchronous implementations in a more faithful way than current state-of-the-art models. Other key features of the framework are: (1) it covers in a unified way several specific solution methods; (2) it accommodates a variety of possible parallel computing architectures; and (3) it can deal with nonconvex constraints. Almost sure convergence to stationary solutions is proved, and theoretical complexity results are provided, showing nearly ideal linear speedup when the number of workers is not too large.
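To fix ideas, the following is a minimal, illustrative sketch (ours, not the paper's implementation) of the kind of lock-free asynchronous block-descent iteration the framework models, instantiated on the composite problem \(\min _{\mathbf {x}} \frac{1}{2}\Vert A\mathbf {x}-\mathbf {b}\Vert _2^2+\lambda \Vert \mathbf {x}\Vert _1\) with a proximal-linear surrogate; all names and parameter choices below are assumptions made for the example.

import threading
import numpy as np

def soft(u, t):
    # Soft-thresholding operator: prox of t*|.| (scalar or array).
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

rng = np.random.default_rng(0)
m, n = 200, 100
A = rng.standard_normal((m, n))      # data for f(x) = 0.5*||Ax - b||^2
b = rng.standard_normal(m)
lam = 0.1                            # weight of g(x) = lam*||x||_1
c = np.linalg.norm(A, 2) ** 2        # surrogate's proximal weight (>= L_f)
gamma = 0.5                          # step size of the relaxed update

x = np.zeros(n)                      # shared iterate, read/written without locks

def worker(num_iters, seed):
    rng_w = np.random.default_rng(seed)
    for _ in range(num_iters):
        i = rng_w.integers(n)        # pick one (scalar) block at random
        x_tilde = x.copy()           # possibly stale/inconsistent read
        grad_i = A[:, i] @ (A @ x_tilde - b)
        # best response of the proximal-linear surrogate (illustrative analogue of (3)):
        x_hat_i = soft(x_tilde[i] - grad_i / c, lam / c)
        x[i] += gamma * (x_hat_i - x[i])   # relaxed block update

threads = [threading.Thread(target=worker, args=(2000, s)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum())

Due to CPython's global interpreter lock the threads above interleave rather than run truly in parallel, but the stale reads \(\tilde{\mathbf {x}}^k=\mathbf {x}^{k-\mathbf {d}^k}\) and the uncoordinated block updates are exactly the ingredients the paper's probabilistic model is designed to capture.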

Notes

  1. With a slight abuse of notation, we denote by \(\varvec{\omega }_k\) the k-th element of the sequence \(\omega \in \varOmega \), and by \(\varvec{\omega }^k\) the value taken by the random variable \(\underline{\varvec{\omega }}^k\) over \(\omega \), i.e., \(\underline{\varvec{\omega }}^k(\omega ) = \varvec{\omega }^k\).

References

  1. Baudet, G.M.: Asynchronous iterative methods for multiprocessors. J. ACM 25(2), 226–244 (1978)

  2. Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods, vol. 23. Prentice-Hall Englewood Cliffs, NJ (1989)

  3. Cannelli, L., Facchinei, F., Kungurtsev, V., Scutari, G.: Asynchronous parallel algorithms for nonconvex big-data optimization. Part I: Model and convergence. arXiv preprint arXiv:1607.04818 (2016)

  4. Cannelli, L., Facchinei, F., Kungurtsev, V., Scutari, G.: Asynchronous parallel algorithms for nonconvex big-data optimization. Part II: Complexity and numerical results. arXiv preprint arXiv:1701.04900 (2017)

  5. Cannelli, L., Facchinei, F., Kungurtsev, V., Scutari, G.: Asynchronous parallel nonconvex large-scale optimization. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4706–4710 (2017)

  6. Cannelli, L., Scutari, G., Facchinei, F., Kungurtsev, V.: Parallel asynchronous lock-free algorithms for nonconvex big-data optimization. In: 2016 50th Asilomar Conference on Signals, Systems and Computers, pp. 1009–1013 (2016)

  7. Chazan, D., Miranker, W.: Chaotic relaxation. Linear Algebra Appl. 2(2), 199–222 (1969)

  8. Daneshmand, A., Facchinei, F., Kungurtsev, V., Scutari, G.: Hybrid random/deterministic parallel algorithms for convex and nonconvex big data optimization. IEEE Trans. Signal Process. 63(15), 3914–3929 (2015)

  9. Davis, D.: The asynchronous PALM algorithm for nonsmooth nonconvex problems. arXiv preprint arXiv:1604.00526 (2016)

  10. Davis, D., Edmunds, B., Udell, M.: The sound of APALM clapping: faster nonsmooth nonconvex optimization with stochastic asynchronous PALM. In: Lee, D.D., Sugiyama, M., Luxburg, U.V., Guyon, I., Garnett, R. (eds.) Advances in Neural Information Processing Systems 29. Curran Associates, Inc., pp. 226–234 (2016)

  11. Durrett, R.: Probability: Theory and Examples. Cambridge University Press, Cambridge (2010)

  12. Facchinei, F., Lampariello, L., Scutari, G.: Feasible methods for nonconvex nonsmooth problems with applications in green communications. Math. Program. 164(1), 1–36 (2016)

  13. Facchinei, F., Scutari, G., Sagratella, S.: Parallel selective algorithms for nonconvex big data optimization. IEEE Trans. Signal Process. 63(7), 1874–1889 (2015)

  14. Frommer, A., Szyld, D.B.: On asynchronous iterations. J. Comput. Appl. Math. 123(1), 201–216 (2000)

  15. Hong, M.: A distributed, asynchronous and incremental algorithm for nonconvex optimization: an ADMM approach. IEEE Trans. Control Netw. Syst. 5(3), 935–945 (2018)

  16. Huo, Z., Huang, H.: Asynchronous stochastic gradient descent with variance reduction for non-convex optimization. arXiv preprint arXiv:1604.03584 (2016)

  17. Iutzeler, F., Bianchi, P., Ciblat, P., Hachem, W.: Asynchronous distributed optimization using a randomized alternating direction method of multipliers. In: 52nd IEEE Conference on Decision and Control, pp. 3671–3676 (2013)

  18. Klenke, A.: Probability Theory: A Comprehensive Course. Springer, Berlin (2013)

  19. Leblond, R., Pedregosa, F., Lacoste-Julien, S.: ASAGA: asynchronous parallel SAGA. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, pp. 46–54 (2017)

  20. Lian, X., Huang, Y., Li, Y., Liu, J.: Asynchronous parallel stochastic gradient for nonconvex optimization. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28. Curran Associates, Inc., pp. 2719–2727 (2015)

  21. Liu, J., Wright, S.J.: Asynchronous stochastic coordinate descent: parallelism and convergence properties. SIAM J. Optim. 25(1), 351–376 (2015)

  22. Liu, J., Wright, S.J., Ré, C., Bittorf, V., Sridhar, S.: An asynchronous parallel stochastic coordinate descent algorithm. J. Mach. Learn. Res. 16(1), 285–322 (2015)

  23. Mania, H., Pan, X., Papailiopoulos, D., Recht, B., Ramchandran, K., Jordan, M.I.: Perturbed iterate analysis for asynchronous stochastic optimization. SIAM J. Optim. 27(4), 2202–2229 (2017)

  24. Nedić, A., Bertsekas, D.P., Borkar, V.S.: Distributed asynchronous incremental subgradient methods. Stud. Comput. Math. 8(C), 381–407 (2001)

  25. Niu, F., Recht, B., Ré, C., Wright, S.J.: Hogwild!: a lock-free approach to parallelizing stochastic gradient descent. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24. Curran Associates, Inc., pp. 693–701 (2011)

  26. Pedregosa, F., Leblond, R., Lacoste-Julien, S.: Breaking the nonsmooth barrier: a scalable parallel method for composite optimization. In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30. Curran Associates, Inc., pp. 56–65 (2017)

  27. Peng, Z., Xu, Y., Yan, M., Yin, W.: Arock: an algorithmic framework for asynchronous parallel coordinate updates. SIAM J. Sci. Comput. 38(5), A2851–A2879 (2016)

  28. Peng, Z., Xu, Y., Yan, M., Yin, W.: On the convergence of asynchronous parallel iteration with unbounded delays. J. Oper. Res. Soc. China 7(1), 5–42 (2019)

  29. Robbins, H., Siegmund, D.: A convergence theorem for non negative almost supermartingales and some applications. In: Herbert Robbins Selected Papers, pp. 111–135 (1985)

  30. Scutari, G., Facchinei, F., Lampariello, L.: Parallel and distributed methods for constrained nonconvex optimization-part i: theory. IEEE Trans. Signal Process. 65(8), 1929–1944 (2017)

  31. Scutari, G., Facchinei, F., Lampariello, L., Sardellitti, S., Song, P.: Parallel and distributed methods for constrained nonconvex optimization-part ii: applications in communications and machine learning. IEEE Trans. Signal Process. 65(8), 1945–1960 (2017)

  32. Scutari, G., Facchinei, F., Song, P., Palomar, D.P., Pang, J.-S.: Decomposition by partial linearization: parallel optimization of multi-agent systems. IEEE Trans. Signal Process. 62(3), 641–656 (2014)

  33. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 58(1), 267–288 (1996)

  34. Tseng, P.: On the rate of convergence of a partially asynchronous gradient projection algorithm. SIAM J. Optim. 1(4), 603–619 (1991)

  35. Tsitsiklis, J.N., Bertsekas, D.P., Athans, M.: Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Trans. Autom. Control 31(9), 803–812 (1986)

  36. Wei, E., Ozdaglar, A.: On the O(1/k) convergence of asynchronous distributed alternating direction method of multipliers. In: Global Conference on Signal and Information Processing (GlobalSIP), pp. 551–554 (2013)

  37. Wright, S.J.: Coordinate descent algorithms. Math. Program. 151(1), 3–34 (2015)

  38. Young, W.H.: On classes of summable functions and their Fourier series. Proc. R. Soc. Lond. Ser. A Math. Phys. Sci. 87(594), 225–229 (1912)

  39. Yun, H., Yu, H.-F., Hsieh, C.-J., Vishwanathan, S.V.N., Dhillon, I.: NOMAD: non-locking, stochastic multi-machine algorithm for asynchronous and decentralized matrix completion. Proc. VLDB Endowment 7(11), 975–986 (2014)

Author information

Correspondence to Francisco Facchinei.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work of Cannelli and Scutari was supported by the U.S. National Science Foundation under Grants CIF 1564044 and CIF 1719205, and by the Army Research Office under Grant W911NF1810238. Facchinei was partially supported by the Italian Ministry of Education, Research and University, under the PLATINO (PLATform for INnOvative services in future internet) PON project, Grant Agreement No. PON01_01007. Kungurtsev was supported by the Czech Science Foundation Project 17-26999S and the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765 “Research Center for Informatics”.

Part of this work was presented at the 50th Asilomar Conference on Signals, Systems, and Computers [6] and at the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing [5]. A two-part preliminary technical report was posted on arXiv in July 2016 [3] and January 2017 [4].

Appendix

1.1 Preliminaries

Hereafter, we simplify the notation using \(\tilde{\mathbf {x}}^k\triangleq \mathbf {x}^{k-\mathbf {d}^k}\).

1. On conditional probabilities. In our developments, we will consider conditional expectations of random variables \(\mathbf {Z}\) on \(\varOmega \) of the type \(\mathbb {E}(\mathbf {Z}|\mathcal{F}^k)\). The following simple fact holds.

Proposition 1

Let \({\mathbf {Z}}\) be a random variable defined on \(\varOmega \), and let \(\mathcal{F}^k\) be defined in (5). Then

$$\begin{aligned} \mathbb {E}\left( \mathbf {Z}|\mathcal{F}^k\right) = \sum _{(i,\mathbf {d}) \in \mathcal{N}\times \mathcal{D}} p\left( (i,\mathbf {d})\,|\,\varvec{\omega }^{0:k}\right) \mathbf {Z}\left( (i,\mathbf {d}),\varvec{\omega }^{0:k}\right) . \end{aligned}$$
(16)

Proof

Recall that \(\mathcal{F}^k\) is the \(\sigma \)-algebra generated by \(\mathcal{C}^{k}\), which is a finite partition of \(\varOmega \). Therefore, one can write [11, Example 5.1.3]

$$\begin{aligned} \mathbb {E}\left( \mathbf {Z}|\mathcal{F}^k\right) = \frac{\mathbb {E}\left( \mathbf {Z};C^k(\varvec{\omega }^{0:k})\right) }{\mathbb {P}\left( C^k(\varvec{\omega }^{0:k})\right) }. \end{aligned}$$

The claim follows readily from (6) and the fact that \(\mathbf {Z}\) depends only on \(\varvec{\omega }^{0:k+1}\) and takes a finite number of values. \(\square \)
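For intuition, the last step can be unpacked as follows (a sketch, using only that \(C^k(\varvec{\omega }^{0:k})\) is the disjoint union of the cells of \(\mathcal{C}^{k+1}\) obtained by appending one more pair \((i,\mathbf {d})\), and that \(\mathbf {Z}\) is constant on each such cell):

$$\begin{aligned} \frac{\mathbb {E}\left( \mathbf {Z};C^k(\varvec{\omega }^{0:k})\right) }{\mathbb {P}\left( C^k(\varvec{\omega }^{0:k})\right) }=\sum _{(i,\mathbf {d})\in \mathcal{N}\times \mathcal{D}}\frac{\mathbb {P}\left( C^{k+1}\left( (\varvec{\omega }^{0:k},(i,\mathbf {d}))\right) \right) }{\mathbb {P}\left( C^k(\varvec{\omega }^{0:k})\right) }\,\mathbf {Z}\left( (i,\mathbf {d}),\varvec{\omega }^{0:k}\right) , \end{aligned}$$

and, by (6), the probability ratio inside the sum is precisely \(p((i,\mathbf {d})\,|\,\varvec{\omega }^{0:k})\).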

2. Properties of the best response \(\hat{\mathbf {x}}(\cdot )\). We next introduce some basic properties of the best-response maps defined in (3) and (15).

Proposition 2

[13] Let \(\hat{\mathbf {x}}(\cdot )\triangleq (\hat{\mathbf {x}}_i(\cdot ))_{i=1}^N\) be the best-response map, with \(\hat{\mathbf {x}}_i(\cdot )\) defined in (3). Under Assumptions A–B, the following hold (a concrete instance is sketched right after this proposition).

  • (a) [Optimality]: for any \(i \in \mathcal {N}\) and \(\mathbf {y} \in \mathcal {X}\),

    $$\begin{aligned} (\hat{\mathbf {x}}_i(\mathbf {y}) - \mathbf {y}_i)^T \nabla _{\mathbf {y}_i} f(\mathbf {y}) + g_i(\hat{\mathbf {x}}_i(\mathbf {y})) - g_i(\mathbf {y}_i) \le - c_{\tilde{f}}\Vert \hat{\mathbf {x}}_i(\mathbf {y}) - \mathbf {y}_i\Vert _2^2; \end{aligned}$$
    (17)
  • (b) [Lipschitz continuity]: for any \(i \in \mathcal {N}\) and \(\mathbf {y}, \mathbf {z}\in \mathcal {X}\),

    $$\begin{aligned} \Vert \hat{\mathbf {x}}_i(\mathbf {y}) - \hat{\mathbf {x}}_i(\mathbf {z})\Vert _2 \le L_{\hat{\mathbf {x}}}\Vert \mathbf {y} - \mathbf {z}\Vert _2, \end{aligned}$$
    (18)

    with \(L_{\hat{\mathbf {x}}}={L_B}/{c_{\tilde{f}}}\);

  • (c) [Fixed-point characterization]: the set of fixed-points of \(\hat{\mathbf {x}}(\cdot )\) coincides with the set of stationary solutions of Problem (P). Therefore \(\hat{\mathbf {x}}(\cdot )\) has at least one fixed point.
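As a concrete instance (ours, for illustration): if \(\mathcal {X}_i=\mathbb {R}\), \(g_i=\lambda |\cdot |\), and the surrogate in (3) is taken to be the proximal-linear one, \(\tilde{f}_i(y_i;\mathbf {y})=\nabla _{\mathbf {y}_i}f(\mathbf {y})\,(y_i-\mathbf {y}_i)+\frac{c}{2}(y_i-\mathbf {y}_i)^2\) with \(c>0\) (so that \(c_{\tilde{f}}=c\)), then the best response reduces to soft-thresholding:

$$\begin{aligned} \hat{\mathbf {x}}_i(\mathbf {y})=\mathrm {sign}\left( \mathbf {y}_i-\frac{\nabla _{\mathbf {y}_i}f(\mathbf {y})}{c}\right) \max \left\{ \left| \mathbf {y}_i-\frac{\nabla _{\mathbf {y}_i}f(\mathbf {y})}{c}\right| -\frac{\lambda }{c},\,0\right\} . \end{aligned}$$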

Proposition 3

[30] Let \(\hat{\mathbf {x}}(\cdot )\triangleq (\hat{\mathbf {x}}_i(\cdot ))_{i=1}^N\) be the best-response map, with \(\hat{\mathbf {x}}_i(\cdot )\) defined in (15). Under Assumptions A\(^\prime \)–B–D, the following hold.

  • (a) [Optimality]: For any \(i \in \mathcal {N}\) and \(\mathbf {y} \in \mathcal {K}\),

    $$\begin{aligned} (\hat{\mathbf {x}}_i(\mathbf {y}) - \mathbf {y}_i)^T \nabla _{\mathbf {y}_i} f(\mathbf {y}) + g_i(\hat{\mathbf {x}}_i(\mathbf {y})) - g_i(\mathbf {y}_i) \le - c_{\tilde{f}}\Vert \hat{\mathbf {x}}_i(\mathbf {y}) - \mathbf {y}_i\Vert _2^2; \end{aligned}$$
    (19)
  • (b) [Lipschitz continuity]: For any \(i \in \mathcal {N}\) and \(\mathbf {y}, \mathbf {z}\in \mathcal {K}\),

    $$\begin{aligned} \Vert \hat{\mathbf {x}}_i(\mathbf {y}) - \hat{\mathbf {x}}_i(\mathbf {z})\Vert _2 \le \tilde{L}_{\hat{\mathbf {x}}}\Vert \mathbf {y} - \mathbf {z}\Vert _2^{1/2}, \end{aligned}$$
    (20)

    with \(\tilde{L}_{\hat{\mathbf {x}}}>0\).

3. Young's inequality [38]. For any \(\alpha , \mu _1, \mu _2 > 0\), it holds that

    $$\begin{aligned} \mu _1\mu _2 \le \frac{1}{2}(\alpha \mu _1^2 + \alpha ^{-1}\mu _2^2). \end{aligned}$$
    (21)
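For completeness, (21) follows by expanding a square:

$$\begin{aligned} 0\le \frac{1}{2}\left( \sqrt{\alpha }\,\mu _1-\frac{\mu _2}{\sqrt{\alpha }}\right) ^2=\frac{1}{2}\left( \alpha \mu _1^2+\alpha ^{-1}\mu _2^2\right) -\mu _1\mu _2. \end{aligned}$$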
4. Representation of \(\tilde{\mathbf {x}}^k\). Since at each iteration only one block of variables is updated, \(\tilde{\mathbf {x}}^k\) can be written as

    $$\begin{aligned} \tilde{\mathbf {x}}^k_i=\mathbf {x}^k_i+\sum \limits _{l\in \bar{\mathcal {K}}_i^k}(\mathbf {x}^l_i-\mathbf {x}^{l+1}_i), \end{aligned}$$
    (22)

    where \(\bar{\mathcal {K}}^k_i\) is defined in Sect. 4.
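Indeed, reading \(\tilde{\mathbf {x}}^k=\mathbf {x}^{k-\mathbf {d}^k}\) componentwise and telescoping,

$$\begin{aligned} \tilde{\mathbf {x}}^k_i=\mathbf {x}^{k-d^k_i}_i=\mathbf {x}^k_i+\sum \limits _{l=k-d^k_i}^{k-1}(\mathbf {x}^l_i-\mathbf {x}^{l+1}_i), \end{aligned}$$

and all the terms with \(l\notin \bar{\mathcal {K}}_i^k\) vanish, since block \(i\) is not updated at those iterations.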

1.2 Proof of Theorem 1

In this section, the best-response map \(\hat{\mathbf {x}}(\cdot )\) is the one defined in (3).

For any given realization \(\omega \in \varOmega \) and \(k\ge 0\), the following holds:

$$\begin{aligned}&F(\mathbf {x}^{k+1}) \, =\, f(\mathbf {x}^{k+1}) + g(\mathbf {x}^{k+1})\nonumber \\&\quad {\mathop {=}\limits ^{\text {(a)}}}\, f(\mathbf {x}^{k+1}) + \sum \limits _{i \ne i^k} g_i(\mathbf {x}_i^k) + g_{i^k}(\mathbf {x}_{i^k}^{k+1})\nonumber \\&\quad {\mathop {\le }\limits ^{\text {(b)}}} \,f(\mathbf {x}^k) + \gamma \nabla _{\mathbf {x}_{i^k}} f (\tilde{\mathbf {x}}^k)^\text {T}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}_{i^k}^k)+ \sum \limits _{i \ne i^k} g_i(\mathbf {x}_i^k)+ g_{i^k}(\mathbf {x}_{i^k}^{k+1})\nonumber \\&\qquad \,+(\nabla _{\mathbf {x}_{i^k}} f (\mathbf {x}^k)-\nabla _{\mathbf {x}_{i^k}}f(\tilde{\mathbf {x}}^k))^\text {T}(\gamma (\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}_{i^k}^k))+\frac{\gamma ^2L_f}{2}\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}_{i^k}^k\Vert _2^2\nonumber \\&\quad {\mathop {\le }\limits ^{\text {(c)}}}\, f(\mathbf {x}^k) +\gamma \nabla _{\mathbf {x}_{i^k}} f (\tilde{\mathbf {x}}^k)^\text {T}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \tilde{\mathbf {x}}_{i^k}^k)\nonumber \\&\qquad \,+(\nabla _{\mathbf {x}_{i^k}}f(\mathbf {x}^k)-\nabla _{\mathbf {x}_{i^k}}f(\tilde{\mathbf {x}}^k))^\text {T}(\gamma (\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \tilde{\mathbf {x}}_{i^k}^k))+ \frac{\gamma ^2L_f}{2}\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \tilde{\mathbf {x}}_{i^k}^k\Vert _2^2\nonumber \\&\qquad \,+ \sum \limits _{i \ne i^k} g_i(\mathbf {x}_i^k) + \gamma g_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k))+g_{i^k}(\mathbf {x}_{i^k}^k)-\gamma g_{i^k}(\tilde{\mathbf {x}}_{i^k}^k)\nonumber \\&\quad {\mathop {\le }\limits ^{\text {(d)}}} F(\mathbf {x}^k) -\gamma \,\left( c_{\tilde{f}}-\frac{\gamma L_f}{2}\right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\tilde{\mathbf {x}}_{i^k}^k\Vert _2^2\nonumber \\&\qquad \,+L_f\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2\Vert \gamma (\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \tilde{\mathbf {x}}_{i^k}^k)\Vert _2\nonumber \\&\quad {\mathop {\le }\limits ^{\text {(e)}}} F(\mathbf {x}^k)-\gamma \,\left( c_{\tilde{f}}-\gamma L_f\right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\tilde{\mathbf {x}}_{i^k}^k\Vert _2^2+\frac{L_f}{2}\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2^2\nonumber \\&\quad {\mathop {=}\limits ^{\text {(f)}}} F(\mathbf {x}^k)-\gamma \,\left( c_{\tilde{f}}-\gamma L_f\right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}_{i^k}^k\Vert _2^2+\frac{L_f}{2}\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2^2, \end{aligned}$$
(23)

where (a) follows from the updating rule of the algorithm; in (b) we used the Descent Lemma on f; (c) comes from the convexity of \(g_i\) and C3; in (d) we used Proposition 2 and A3; (e) is due to Young's inequality; and (f) is due to C3.
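In (b), the Descent Lemma is used in the standard form: if \(\nabla f\) is \(L_f\)-Lipschitz continuous, then

$$\begin{aligned} f(\mathbf {x}+\mathbf {d})\le f(\mathbf {x})+\nabla f(\mathbf {x})^\text {T}\mathbf {d}+\frac{L_f}{2}\,\Vert \mathbf {d}\Vert _2^2, \end{aligned}$$

applied with \(\mathbf {d}\) equal to \(\gamma (\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}^k_{i^k})\) in the \(i^k\)-th block and zero elsewhere.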

We bound \(\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2^2\) as follows:

$$\begin{aligned}&\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2^2\, {\mathop {\le }\limits ^{\text {(a)}}}\,\left( \sum \limits _{l=k-\delta }^{k-1}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2\right) ^2\, {\mathop {\le }\limits ^{\text {(b)}}}\,\delta \sum \limits _{l=k-\delta }^{k-1}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2^2\nonumber \\&\quad =\delta \left( \sum \limits _{l=k-\delta }^{k-1}\left( l-(k-1)+\delta \right) \,\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2^2 -\sum \limits _{l=k+1-\delta }^{k}\left( l-k+\delta \right) \,\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2^2\right) \nonumber \\&\qquad +\delta ^2\gamma ^2\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}_{i^k}^k\Vert _2^2, \end{aligned}$$
(24)

where (a) comes from (22); and (b) is due to Jensen's inequality.
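The form of Jensen's inequality used in (b) is

$$\begin{aligned} \left( \sum \limits _{l=1}^{\delta }a_l\right) ^2\le \delta \sum \limits _{l=1}^{\delta }a_l^2,\qquad a_l\ge 0, \end{aligned}$$

with \(a_l=\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2\).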

Using (24) in (23), recalling the definition of the Lyapunov function \(\tilde{F}\) in (8), and rearranging terms, we obtain the following: for all \(k\ge 0\),

$$\begin{aligned}&\tilde{F}(\mathbf {x}^{k+1},\ldots ,\mathbf {x}^{k+1-\delta })\nonumber \\&\quad \le \tilde{F}(\mathbf {x}^k,\ldots ,\mathbf {x}^{k-\delta })-\gamma \,\left( c_{\tilde{f}}-\gamma \,\left( L_f+\frac{\delta ^2L_f}{2}\right) \right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}^k_{i^k}\Vert _2^2; \end{aligned}$$
(25)

and

$$\begin{aligned}&\tilde{F}(\mathbf {x}^{k+T},\ldots ,\mathbf {x}^{k+T-\delta }) \le \tilde{F}(\mathbf {x}^{k+T-1},\ldots ,\mathbf {x}^{k+T-1-\delta })\nonumber \\&\qquad -\gamma \,\left( c_{\tilde{f}}-\gamma \,\left( L_f+\frac{\delta ^2L_f}{2}\right) \right) \,\Vert \hat{\mathbf {x}}_{i^{k+T-1}}(\tilde{\mathbf {x}}^{k+T-1})-\mathbf {x}^{k+T-1}_{i^{k+T-1}}\Vert _2^2\nonumber \\&\quad \le \tilde{F}(\mathbf {x}^k,\ldots ,\mathbf {x}^{k-\delta })-\gamma \left( c_{\tilde{f}}-\gamma \,\left( L_f+\frac{\delta ^2L_f}{2}\right) \right) \,\sum \limits _{t=k}^{k+T-1}\Vert \hat{\mathbf {x}}_{i^t}(\tilde{\mathbf {x}}^t)-\mathbf {x}^t_{i^t}\Vert _2^2. \end{aligned}$$
(26)

Taking the conditional expectation on both sides, the following holds a.s.:

$$\begin{aligned}&\mathbb {E}\left( \tilde{F}(\underline{\mathbf {x}}^{k+T},\ldots ,\underline{\mathbf {x}}^{k+T-\delta }) |\mathcal {F}^{k-1}\right) \le \tilde{F}(\underline{\mathbf {x}}^k,\ldots ,\underline{\mathbf {x}}^{k-\delta })\nonumber \\&\quad -\gamma \left( c_{\tilde{f}}-\gamma \,\left( L_f+\frac{\delta ^2L_f}{2}\right) \right) \,\sum \limits _{t=k}^{k+T-1}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) . \end{aligned}$$
(27)

Using (9), (27), A5, and the supermartingale convergence theorem [29], we deduce that: (i) \(\{\tilde{F}(\underline{\mathbf {x}}^k,\ldots ,\underline{\mathbf {x}}^{k-\delta })\}_{k\in \mathbb {N}_+}\), and thus \(\{{F}(\underline{\mathbf {x}}^k)\}_{k\in \mathbb {N}_+}\), converges a.s.; (ii) \(\{\underline{\mathbf {x}}^k\}_{k\in \mathbb {N}_+}\) lies in \(\mathcal {X}\) and is bounded a.s.; and (iii)

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\sum \limits _{t=k}^{k+T-1}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\underline{\mathbf {x}}_{\underline{i}^t}^t\Vert _2^2|\mathcal {F}^{t-1}\right) =0,\quad \text { a.s.} \end{aligned}$$
(28)
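The deduction above uses the supermartingale convergence theorem of Robbins and Siegmund [29] in (essentially) the following special form: if \(\{\underline{v}^k\}\), \(\{\underline{b}^k\}\), and \(\{\underline{c}^k\}\) are nonnegative random sequences adapted to \(\{\mathcal {F}^k\}\) such that

$$\begin{aligned} \mathbb {E}\left( \underline{v}^{k+1}|\mathcal {F}^k\right) \le \underline{v}^k-\underline{b}^k+\underline{c}^k\quad \text {and}\quad \sum \limits _{k}\underline{c}^k<\infty \ \text { a.s.,} \end{aligned}$$

then \(\{\underline{v}^k\}\) converges a.s. and \(\sum _k\underline{b}^k<\infty \) a.s.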

From (28), it follows that there exists a set \(\bar{\varOmega }\subseteq \varOmega \), with \(\mathbb {P}(\bar{\varOmega })=1\), such that for any \(\omega \in \bar{\varOmega }\),

$$\begin{aligned}&\sum \limits _{t=k}^{k+T-1}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t) -\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) \nonumber \\&\quad {\mathop {=}\limits ^{(a)}}\sum \limits _{t=k}^{k+T-1}\sum \limits _{(i,\mathbf {d})\in \mathcal {N}\times \mathcal {D}}p\left( (i,\mathbf {d})|\varvec{\omega }^{0:t-1}\right) \,\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^t)-\mathbf {x}^t_i\Vert _2^2\nonumber \\&\quad {\mathop {\ge }\limits ^{(b)}}p_{\text {min}}\sum \limits _{i=1}^N\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2^2, \end{aligned}$$
(29)

where in (a) we used (16); and in (b) we used C2 and defined \(t_k(i)\triangleq \min \{t\in [0,T]\,:\,p(i\,|\,\varvec{\omega }^{0:t+k-1})\ge p_{\text {min}}\}\). We also have:

$$\begin{aligned}&\Vert \hat{\mathbf {x}}(\mathbf {x}^k)-\mathbf {x}^k\Vert _2 \le \sum \limits _{i=1}^N\Vert \hat{\mathbf {x}}_i(\mathbf {x}^k)-\mathbf {x}_i^k\Vert _2\nonumber \\&\quad {\mathop {\le }\limits ^{(a)}}\sum \limits _{i=1}^N\left( \Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2+(1+L_{\hat{\mathbf {x}}})\,\Vert \tilde{\mathbf {x}}^{k+t_k(i)}-\mathbf {x}^k\Vert _2\right) \nonumber \\&\quad {\mathop {\le }\limits ^{(b)}}\sum \limits _{i=1}^N\Biggl (\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2+(1+L_{\hat{\mathbf {x}}})\biggl (\Vert \mathbf {x}^{k+t_k(i)}-\mathbf {x}^k\Vert _2\nonumber \\&\qquad +\sum \limits _{l=k+t_k(i)-\delta }^{k+t_k(i)-1}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2\biggr )\Biggr )\nonumber \\&\quad {\mathop {\le }\limits ^{(c)}}\sum \limits _{i=1}^N\left( \Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2+2\gamma (1+L_{\hat{\mathbf {x}}})\sum \limits _{l=k-\delta }^{k+T-1}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2\right) , \end{aligned}$$
(30)

where in (a) we used Proposition 2; (b) comes from (22); and (c) from the updating rule of the algorithm. We deduce from (28), (29), and (30), that

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \hat{\mathbf {x}}(\mathbf {x}^k)-\mathbf {x}^k\Vert _2=0. \end{aligned}$$
(31)

Since the sequence \(\{\mathbf {x}^k\}_{k\in \mathbb {N}_+}\) is bounded, it has at least one limit point \(\bar{\mathbf {x}}\), which belongs to \(\mathcal {X}\). By the continuity of \(\hat{\mathbf {x}}(\cdot )\) (see Proposition 2) and (31), it must be that \(\hat{\mathbf {x}}(\bar{\mathbf {x}})=\bar{\mathbf {x}}\); thus, by Proposition 2, \(\bar{\mathbf {x}}\) is a stationary solution of Problem (P). Since (31) holds for any \(\omega \in \bar{\varOmega }\), the previous results hold a.s.

Let us now define:

$$\begin{aligned} \hat{\mathbf {y}}_i(\mathbf {x}^k)=\underset{\mathbf {y}_i\in \mathcal {X}_i}{\text {argmin}}\left\{ \nabla _{\mathbf {x}_i}f(\mathbf {x}^k)^\text {T}(\mathbf {y}_i-\mathbf {x}_i^k)+g_i(\mathbf {y}_i)+\frac{1}{2}\,\Vert \mathbf {y}_i-\mathbf {x}_i^k\Vert ^2_2\right\} , \end{aligned}$$
(32)

and note that \(M_F(\mathbf {x}^k)=[\mathbf {x}_1^k-\hat{\mathbf {y}}_1(\mathbf {x}^k),\ldots ,\mathbf {x}_N^k-\hat{\mathbf {y}}_N(\mathbf {x}^k)]^\text {T}\). It is easy to check that \(\hat{\mathbf {y}}(\cdot )\) is \(L_{\hat{\mathbf {y}}}\)-Lipschitz continuous on \(\mathcal {X}\), with \(L_{\hat{\mathbf {y}}}\triangleq L_f+1\): indeed, each \(\hat{\mathbf {y}}_i(\mathbf {x})\) is the proximal map of \(g_i\) restricted to \(\mathcal {X}_i\) (which is nonexpansive) evaluated at \(\mathbf {x}_i-\nabla _{\mathbf {x}_i}f(\mathbf {x})\), and the latter point is \((1+L_f)\)-Lipschitz continuous in \(\mathbf {x}\). Fix a realization \(\omega \in \varOmega \). The optimality of \(\hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\), along with the convexity of \(g_{i^k}\), leads to

$$\begin{aligned}&\left( \nabla _{\mathbf {x}_{i^k}} f(\mathbf {x}^k) + \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)- \mathbf {x}^k_{i^k}\right) ^\text {T}\left( \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\right) \end{aligned}$$
(33)
$$\begin{aligned}&\qquad +g_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)) - g_{i^k}(\hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)) \ge 0. \end{aligned}$$
(34)

Similarly, one can write for \(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)\):

$$\begin{aligned} \nabla \tilde{f}_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k);\tilde{\mathbf {x}}^k)^\text {T}\left( \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k) - \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)\right) + g_{i^k}(\hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)) - g_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)) \ge 0. \qquad \end{aligned}$$
(35)

Summing (34) and (35), adding and subtracting \(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)\), and using the gradient consistency B2, yields

$$\begin{aligned}&\left( \nabla \tilde{f}_{i^k}(\mathbf {x}^k_{i^k};\mathbf {x}^k) - \nabla \tilde{f}_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k);\tilde{\mathbf {x}}^k) + \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}^k_{i^k}\right) ^\text {T}\nonumber \\&\quad \left( \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\right) \ge \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2. \end{aligned}$$
(36)

Adding and subtracting \(\nabla \tilde{f}_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k);\mathbf {x}^k)\), and using the Cauchy–Schwarz and triangle inequalities, the LHS of (36) can be upper bounded, yielding

$$\begin{aligned}&\Vert \nabla \tilde{f}_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k);\mathbf {x}^k) - \nabla \tilde{f}_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k);\tilde{\mathbf {x}}^k)\Vert _2\nonumber \\&\qquad + \Vert \nabla \tilde{f}_{i^k}(\mathbf {x}^k_{i^k};\mathbf {x}^k) - \nabla \tilde{f}_{i^k}(\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k);\mathbf {x}^k)\Vert _2 + \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}^k_{i^k}\Vert _2 \nonumber \\&\quad \ge \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2. \end{aligned}$$
(37)

We can further upper bound the left-hand side by invoking B3 and B4, and write:

$$\begin{aligned} \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2 \le (1+L_E)\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}^k_{i^k}\Vert _2 +L_B\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2\, . \end{aligned}$$
(38)

Finally, squaring both sides, we get

$$\begin{aligned} \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2&\le \; (1+L_E)^2\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}^k_{i^k}\Vert _2^2 + L_B^2\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k \Vert _2^2\nonumber \\&\qquad + 2L_B(1+L_E)\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}^k_{i^k}\Vert _2\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2. \end{aligned}$$
(39)

We next bound the term \(\Vert \mathbf {x}_{i^k}^k - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2\). We write

$$\begin{aligned}&\Vert \mathbf {x}_{i^k}^k - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2 = \Vert \mathbf {x}_{i^k}^k - \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) +\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)- \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2 \nonumber \\&\quad \le \;2\,\left( \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}_{i^k}^k\Vert _2^2 + \Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)- \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2\right) \nonumber \\&\quad {\mathop {\le }\limits ^{(a)}} \left( 2+2(1+L_E)^2\right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}_{i^k}^k\Vert _2^2+ 2L_B^2\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k \Vert _2^2\nonumber \\&\qquad + 4L_B(1+L_E)\,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k) - \mathbf {x}^k_{i^k}\Vert _2\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k \Vert _2\nonumber \\&{\mathop {\le }\limits ^{(b)}} 2\left( 1+(1+L_E)(1+L_B+L_E)\right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}^k_{i^k}\Vert _2^2\nonumber \\&\quad \qquad +2L_B(1+L_B+L_E)\,\Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2^2, \end{aligned}$$
(40)

where (a) comes from (39); and (b) follows from Young's inequality. Note that

$$\begin{aligned} \Vert \mathbf {x}^k-\tilde{\mathbf {x}}^k\Vert _2^2&=\sum \limits _{i=1}^N\Vert \mathbf {x}^k_i-\tilde{\mathbf {x}}^k_i\Vert _2^2\, {\mathop {\le }\limits ^{(a)}}\,\sum \limits _{i=1}^N\left( \sum \limits _{l\in \bar{\mathcal {K}}_i^k}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2\right) ^2\nonumber \\&\quad {\mathop {\le }\limits ^{(b)}}\sum \limits _{i=1}^N\bar{M}_i^k\sum \limits _{l\in \bar{\mathcal {K}}_i^k}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2^2\, =\,\gamma ^2\sum \limits _{i=1}^N\bar{M}_i^k\sum \limits _{l\in \bar{\mathcal {K}}_i^k}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2, \end{aligned}$$
(41)

where (a) comes from (22); and in (b) we used Jensen's inequality and defined \(\bar{M}_i^k\triangleq |\bar{\mathcal {K}}_i^k|\). Combining (40) and (41), we get:

$$\begin{aligned} \Vert \mathbf {x}_{i^k}^k - \hat{\mathbf {y}}_{i^k}(\mathbf {x}^k)\Vert _2^2&\le 2\left( 1+(1+L_E)(1+L_B+L_E)\right) \,\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}^k_{i^k}\Vert _2^2\nonumber \\&\quad +2\gamma ^2L_B(1+L_B+L_E)\sum \limits _{i=1}^N\bar{M}_i^k\sum \limits _{l\in \bar{\mathcal {K}}_i^k}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2.\nonumber \\ \end{aligned}$$
(42)

We now take the conditional expectation of the term on the LHS of (42), and obtain

$$\begin{aligned}&\sum \limits _{t=k}^{k+T}\mathbb {E}\left( \Vert \mathbf {x}_{\underline{i}^t}^t - \hat{\mathbf {y}}_{\underline{i}^t}(\mathbf {x}^t)\Vert _2^2|\mathcal {F}^{t-1}\right) (\omega )\, {\mathop {=}\limits ^{(a)}}\,\sum \limits _{t=k}^{k+T}\sum \limits _{i=1}^Np(i|\omega ^{0:t-1})\,\Vert \mathbf {x}_i^t - \hat{\mathbf {y}}_i(\mathbf {x}^t)\Vert _2^2\nonumber \\&\quad {\mathop {\ge }\limits ^{(b)}}\sum \limits _{i=1}^Np_{\text {min}}\,\Vert \mathbf {x}_i^{k+t_k(i)}-\hat{\mathbf {y}}_i(\mathbf {x}^{k+t_k(i)})\Vert _2^2\nonumber \\&\quad {\mathop {\ge }\limits ^{(c)}} p_{\text {min}}\sum \limits _{i=1}^N\left( \Vert \mathbf {x}_i^k-\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2-\Vert \mathbf {x}_i^{k+t_k(i)}-\hat{\mathbf {y}}_i(\mathbf {x}^{k+t_k(i)})-\mathbf {x}_i^k+\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2\right) ^2\nonumber \\&\quad \ge p_{\text {min}}\sum \limits _{i=1}^N\Bigl (\Vert \mathbf {x}_i^k-\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2^2 \nonumber \\&\quad \quad -2\Vert \mathbf {x}_i^k-\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2\,\Vert \mathbf {x}_i^{k+t_k(i)} -\hat{\mathbf {y}}_i(\mathbf {x}^{k+t_k(i)})-\mathbf {x}_i^k+\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2\Bigr ), \end{aligned}$$
(43)

where in (a) we used (16); (b) follows from C2; and in (c) we used the reverse triangle inequality. By (43) and (42), we obtain:

$$\begin{aligned}&p_{\text {min}}\sum \limits _{i=1}^N\Vert \mathbf {x}_i^k-\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2^2\, =\, p_{\text {min}}\,\Vert M_F(\mathbf {x}^k)\Vert ^2_2\nonumber \\&\quad \le \sum \limits _{t=k}^{k+T}\Biggl (2\left( 1+(1+L_E)(1+L_B+L_E)\right) \,\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) (\omega )\nonumber \\&\qquad +2\gamma ^2L_B(1+L_B+L_E)\sum \limits _{i=1}^N\bar{M}_i^t\sum \limits _{l\in \bar{\mathcal {K}}_i^t}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2\Biggr )\nonumber \\&\qquad +2p_{\text {min}}\sum \limits _{i=1}^N\Vert \mathbf {x}_i^k-\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2\,\Vert \mathbf {x}_i^{k+t_k(i)}-\hat{\mathbf {y}}_i(\mathbf {x}^{k+t_k(i)})-\mathbf {x}_i^k+\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2\nonumber \\&\quad {\mathop {\le }\limits ^{(a)}}2\left( 1+(1+L_E)(1+L_B+L_E)\right) \,\sum \limits _{t=k}^{k+T}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) (\omega )\nonumber \\&\qquad +2T\gamma ^2L_B(1+L_B+L_E)\,\sum \limits _{i=1}^NM_i^k\sum \limits _{l\in \mathcal {K}_i^k}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2 +p_{\text {min}}\alpha \,\Vert M_F(\mathbf {x}^k)\Vert _2^2\nonumber \\&\qquad +p_{\text {min}}\alpha ^{-1}\sum \limits _{i=1}^N\Vert \mathbf {x}_i^{k+t_k(i)}-\hat{\mathbf {y}}_i(\mathbf {x}^{k+t_k(i)})-\mathbf {x}_i^k+\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2^2\nonumber \\&\quad {\mathop {\le }\limits ^{(b)}}2\left( 1+(1+L_E)(1+L_B+L_E)\right) \sum \limits _{t=k}^{k+T}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) (\omega )\nonumber \\&\qquad +2T\gamma ^2L_B(1+L_B+L_E)\sum \limits _{i=1}^NM_i^k\sum \limits _{l\in \mathcal {K}_i^k}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2 +p_{\text {min}}\alpha \,\Vert M_F(\mathbf {x}^k)\Vert _2^2\nonumber \\&\qquad +2p_{\text {min}}\alpha ^{-1}\sum \limits _{i=1}^N\left( \Vert \mathbf {x}^{k+t_k(i)}_i-\mathbf {x}_i^k\Vert _2^2+\Vert \hat{\mathbf {y}}_i(\mathbf {x}^{k+t_k(i)})-\hat{\mathbf {y}}_i(\mathbf {x}^k)\Vert _2^2\right) \nonumber \\&\quad {\mathop {\le }\limits ^{(c)}}2\left( 1+(1+L_E)(1+L_B+L_E)\right) \sum \limits _{t=k}^{k+T}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) (\omega )\nonumber \\&\qquad +2T\gamma ^2L_B(1+L_B+L_E)\sum \limits _{i=1}^NM_i^k\sum \limits _{l\in \mathcal {K}_i^k}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2 +p_{\text {min}}\alpha \,\Vert M_F(\mathbf {x}^k)\Vert _2^2\nonumber \\&\qquad +2\gamma ^2p_{\text {min}}\alpha ^{-1}(1+L_{\hat{\mathbf {y}}}^2)\sum \limits _{i=1}^N\sum \limits _{l=k}^{k+t_k(i)-1}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}^l_{i^l}\Vert _2^2 \nonumber \\&\quad \le 2\left( 1+(1+L_E)(1+L_B+L_E)\right) \sum \limits _{t=k}^{k+T}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) (\omega )\nonumber \\&\qquad +2T\gamma ^2L_B(1+L_B+L_E)\sum \limits _{i=1}^NM_i^k\sum \limits _{l\in \mathcal {K}_i^k}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^2 +p_{\text {min}}\alpha \,\Vert M_F(\mathbf {x}^k)\Vert _2^2\nonumber \\&\qquad +2\gamma ^2p_{\text {min}}\alpha ^{-1}(1+L_{\hat{\mathbf {y}}}^2)N\sum \limits _{l=k}^{k+T-1}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}^l_{i^l}\Vert _2^2, \end{aligned}$$
(44)

where in (a) we used Young's inequality and the definition of \(M_i^k\) (cf. Sect. 4); in (b) we used the triangle and Jensen's inequalities; and (c) comes from the updating rule of the algorithm. Rearranging the terms and taking the expectation of both sides, we get:

$$\begin{aligned}&\mathbb {E}\left( \Vert M_F(\underline{\mathbf {x}}^k)\Vert ^2_2\right) \nonumber \\&\quad \le \frac{2\left( 1+(1+L_E)(1+L_B+L_E)+\gamma ^2Np_{\text {min}}\alpha ^{-1}(1+L_{\hat{\mathbf {y}}}^2)\right) }{p_{\text {min}}-\alpha p_{\text {min}}}\nonumber \\&\qquad \sum \limits _{t=k}^{k+T}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2\right) \nonumber \\&\qquad +\frac{2T\gamma ^2 L_B(1+L_B+L_E)}{p_{\text {min}}-\alpha p_{\text {min}}}\,\mathbb {E}\left( \sum \limits _{i=1}^N\underline{M}_i^k\sum \limits _{t\in \underline{\mathcal {K}}_i^k}\Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2\right) . \end{aligned}$$
(45)

Invoking (9) and (25), we can write

$$\begin{aligned}&\Vert \hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}^k_{i^k}\Vert _2^2\nonumber \\&\quad \le \frac{1}{\gamma \left( c_{\tilde{f}}-\gamma \left( L_f+\frac{\delta ^2L_f}{2}\right) \right) }\left( \tilde{F}(\mathbf {x}^k,\ldots ,\mathbf {x}^{k-\delta })-\tilde{F}(\mathbf {x}^{k+1},\ldots ,\mathbf {x}^{k+1-\delta })\right) . \end{aligned}$$
(46)

Using this bound in (45), we get

$$\begin{aligned}&\mathbb {E}\left( \Vert M_F(\underline{\mathbf {x}}^k)\Vert _2^2\right) \le C_1\sum \limits _{t=k}^{k+T}\mathbb {E}\left( \tilde{F}(\underline{\mathbf {x}}^t,\ldots ,\underline{\mathbf {x}}^{t-\delta })-\tilde{F}(\underline{\mathbf {x}}^{t+1},\ldots ,\underline{\mathbf {x}}^{t+1-\delta })\right) \nonumber \\&\quad +\gamma ^2C_2\,\mathbb {E}\left( \sum \limits _{i=1}^N\underline{M}_i^k\sum \limits _{t\in \underline{\mathcal {K}}_i^k}\left( \tilde{F}(\underline{\mathbf {x}}^t,\ldots ,\underline{\mathbf {x}}^{t-\delta })-\tilde{F}(\underline{\mathbf {x}}^{t+1},\ldots ,\underline{\mathbf {x}}^{t+1-\delta })\right) \right) . \end{aligned}$$
(47)

Finally,

$$\begin{aligned} K_\epsilon \epsilon&\le \sum \limits _{k=0}^{K_\epsilon }\mathbb {E}\left( \Vert M_F(\underline{\mathbf {x}}^k)\Vert _2^2\right) \nonumber \\&\le C_1\sum \limits _{k=0}^{K_\epsilon }\mathbb {E}\left( \tilde{F}(\underline{\mathbf {x}}^{k},\ldots ,\underline{\mathbf {x}}^{k-\delta })-\tilde{F}(\underline{\mathbf {x}}^{k+T+1},\ldots ,\underline{\mathbf {x}}^{k+T+1-\delta })\right) \nonumber \\&\quad +\gamma ^2C_2\,\mathbb {E}\left( \sum \limits _{i=1}^N\underline{M}_i^k\sum \limits _{t\in \underline{\mathcal {K}}_i^k}\left( \tilde{F}(\underline{\mathbf {x}}^t,\ldots ,\underline{\mathbf {x}}^{t-\delta })-\tilde{F}(\underline{\mathbf {x}}^{t+1},\ldots ,\underline{\mathbf {x}}^{t+1-\delta })\right) \right) \nonumber \\&\quad \le C_1(T+1)(F(\mathbf {x}^0)-F^*)\nonumber \\&\quad +C_2\gamma ^2\sum \limits _{k=0}^{K_\epsilon }\mathbb {E}\left( \sum \limits _{i=1}^N\underline{M}_i^k\sum \limits _{t\in \underline{\mathcal {K}}_i^k}\left( \tilde{F}(\underline{\mathbf {x}}^t,\ldots ,\underline{\mathbf {x}}^{t-\delta })-\tilde{F}(\underline{\mathbf {x}}^{t+1},\ldots ,\underline{\mathbf {x}}^{t+1-\delta })\right) \right) . \end{aligned}$$
(48)

This completes the proof.

1.3 Proof of Theorem 2

In this section, the best-response map \(\hat{\mathbf {x}}(\cdot )\) is the one defined in (15).

Statement (i) of the theorem follows readily from the feasibility of \(\mathbf {x}^0\in \mathcal {K}\) and the fact that \(\mathbf {x}^{k+1}_{i^k}=\mathbf {x}^k_{i^k}+\gamma (\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k)-\mathbf {x}^k_{i^k})\) is a convex combination of points in \(\mathcal {K}_{i^k}(\mathbf {x}^k_{i^k})\).
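Explicitly, for \(\gamma \in (0,1]\) the update reads

$$\begin{aligned} \mathbf {x}^{k+1}_{i^k}=(1-\gamma )\,\mathbf {x}^k_{i^k}+\gamma \,\hat{\mathbf {x}}_{i^k}(\tilde{\mathbf {x}}^k), \end{aligned}$$

so feasibility of the whole sequence follows by induction from \(\mathbf {x}^0\in \mathcal {K}\).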

To prove statement (ii), let us fix a realization \(\omega \in \varOmega \). Following the steps from (23) to (27), one can prove that the following holds a.s.:

$$\begin{aligned}&\mathbb {E}\left( \tilde{F}(\underline{\mathbf {x}}^{k+T},\ldots ,\underline{\mathbf {x}}^{k+T-\delta })|\mathcal {F}^{k-1}\right) \le \tilde{F}(\underline{\mathbf {x}}^k,\ldots ,\underline{\mathbf {x}}^{k-\delta })\nonumber \\&\quad -\gamma \left( c_{\tilde{f}}-\gamma \left( L_f+\frac{\delta ^2L_f}{2}\right) \right) \sum \limits _{t=k}^{k+T-1}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\mathbf {x}^t_{\underline{i}^t}\Vert _2^2|\mathcal {F}^{t-1}\right) . \end{aligned}$$
(49)

Using (9), (49), and A5\(^\prime \), we deduce that (i) \(\left\{ \tilde{F}(\underline{\mathbf {x}}^k,\ldots ,\underline{\mathbf {x}}^{k-\delta })\right\} _{k\in \mathbb {N}_+}\) converges a.s., and (ii)

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\sum \limits _{t=k}^{k+T-1}\mathbb {E}\left( \Vert \hat{\mathbf {x}}_{\underline{i}^t}(\underline{\tilde{\mathbf {x}}}^t)-\underline{\mathbf {x}}_{\underline{i}^t}^t\Vert _2^2|\mathcal {F}^{t-1}\right) =0\quad \text {a.s.} \end{aligned}$$
(50)

It follows from (50) and C2 that

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\sum \limits _{i=1}^N\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2=0\quad \text { a.s.} \end{aligned}$$
(51)

Therefore, there exists a set \(\bar{\varOmega }\subseteq \varOmega \), with \(\mathbb {P}(\bar{\varOmega })=1\), such that, for any \(\omega \in \bar{\varOmega }\),

$$\begin{aligned}&\Vert \hat{\mathbf {x}}(\mathbf {x}^k)-\mathbf {x}^k\Vert _2\,\le \,\sum \limits _{i=1}^N\Vert \hat{\mathbf {x}}_i(\mathbf {x}^k) -\mathbf {x}_i^k\Vert _2\, {\mathop {\le }\limits ^{(a)}}\,\sum \limits _{i=1}^N\biggl (\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2\nonumber \\&\qquad +\Vert \tilde{\mathbf {x}}^{k+t_k(i)}-\mathbf {x}^k\Vert _2^{1/2}\,\left( \Vert \tilde{\mathbf {x}}^{k+t_k(i)}-\mathbf {x}^k\Vert _2^{1/2}+\tilde{L}_{\hat{\mathbf {x}}}\right) \biggr )\nonumber \\&\quad {\mathop {\le }\limits ^{(b)}}\sum \limits _{i=1}^N\Biggl (\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2+\biggl (\Vert \mathbf {x}^{k+t_k(i)}-\mathbf {x}^k\Vert _2^{1/2}\nonumber \\&\qquad +\sum \limits _{l=k+t_k(i)-\delta }^{k+t_k(i)-1}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2^{1/2}\biggr )\nonumber \\&\qquad \biggl (\Vert \mathbf {x}^{k+t_k(i)}-\mathbf {x}^k\Vert _2^{1/2}+\sum \limits _{l=k+t_k(i)-\delta }^{k+t_k(i)-1}\Vert \mathbf {x}^{l+1}-\mathbf {x}^l\Vert _2^{1/2}+\tilde{L}_{\hat{\mathbf {x}}}\biggr )\Biggr )\nonumber \\&\quad {\mathop {\le }\limits ^{(c)}}\sum \limits _{i=1}^N\Biggl (\Vert \hat{\mathbf {x}}_i(\tilde{\mathbf {x}}^{k+t_k(i)})-\mathbf {x}_i^{k+t_k(i)}\Vert _2\nonumber \\&\qquad +2\sqrt{\gamma }\sum \limits _{l=k-\delta }^{k+T-1}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^{1/2}\,\left( 2\sqrt{\gamma }\sum \limits _{l=k-\delta }^{k+T-1}\Vert \hat{\mathbf {x}}_{i^l}(\tilde{\mathbf {x}}^l)-\mathbf {x}_{i^l}^l\Vert _2^{1/2}+\tilde{L}_{\hat{\mathbf {x}}}\right) \Biggr ), \end{aligned}$$
(52)

where in (a) we used Proposition 3; (b) comes from (22); and in (c) we used the updating rule of the algorithm. Using (50), (51) and (52), we conclude that

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \hat{\mathbf {x}}(\mathbf {x}^k)-\mathbf {x}^k\Vert _2=0. \end{aligned}$$
(53)

A straightforward generalization of [30, Theorem 14], together with (53), proves that every limit point of \(\{{\mathbf {x}}^k\}_{k\in \mathbb {N}_+}\) is a stationary solution of Problem (P\(^\prime \)). Since (53) holds for any \(\omega \in \bar{\varOmega }\), the above results hold a.s.

The iteration complexity result can be proved by following the steps (32)–(48) and using the convexification of the nonconvex constraint sets where needed; details are omitted.

About this article

Cite this article

Cannelli, L., Facchinei, F., Kungurtsev, V. et al. Asynchronous parallel algorithms for nonconvex optimization. Math. Program. 184, 121–154 (2020). https://doi.org/10.1007/s10107-019-01408-w
