This page is under construction. You got here from the machine learning link on my home page. Don't expect too much, but see Chapter 29 of my Fundamentals Of Analysis notes, and also Sections 13.7 and 13.8 for entropy-related material. Lagrange multipliers and Kuhn-Tucker theory appear in Chapter 29. The date shown is the last time I edited and uploaded. I kept an older version here. Support vector machines and kernels appear at the end of Chapter 29. This is as far as I got while reading the bachelor thesis of Anh Van Giang in Langlade during the summer of '22. The new pdf with the old Chapter 29 has some background for the kernels in the new Section 19.5.

Chapter 29 starts with the method of Lagrange and Lagrange multipliers for finding stationary points of a given function subject to constraints formulated as equalities. The implicit function perspective is essential; it is described in 29.1, first in the simplest case and then in the most general case. Lagrange multipliers are explained in 29.2. In 29.3 I deal with constraints given by inequalities. The resulting Kuhn-Tucker method is used later in the context of optimal transport in 29.6 and in the context of support vector machines in 29.8. Unfortunately the tex files for Sections 29.3 and 29.8 were lost in the cloud. These sections are therefore missing from Fundamentals Of Analysis, on which I continue to work. I will have to fix this later. Chapter 29 will return.
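To give a quick numerical feel for the multiplier method of 29.2, here is a small Python sketch. The example is mine and not taken from the notes: minimize f(x,y) = x^2 + y^2 subject to x + y = 1; at the solution (1/2, 1/2) the gradients satisfy grad f = lambda grad g with lambda = 1.

# Minimal sketch (my own toy example, not from the notes): a Lagrange
# multiplier problem, minimizing f(x, y) = x^2 + y^2 subject to the
# equality constraint g(x, y) = x + y - 1 = 0.  At the constrained
# minimum grad f = lambda * grad g.
import numpy as np
from scipy.optimize import minimize

f = lambda v: v[0]**2 + v[1]**2          # objective
g = lambda v: v[0] + v[1] - 1.0          # equality constraint g = 0

res = minimize(f, x0=[0.0, 0.0], method="SLSQP",
               constraints=[{"type": "eq", "fun": g}])
x, y = res.x                             # expected (0.5, 0.5)

grad_f = np.array([2 * x, 2 * y])        # gradient of f at the optimum
grad_g = np.array([1.0, 1.0])            # gradient of g (constant)
lam = grad_f[0] / grad_g[0]              # the multiplier, expected 1.0

print(res.x, lam)                        # grad f = lam * grad g holds here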

In 29.5 I explain what I learnt from talking to Lotte Bijsterbosch about her internship project at Mobiquity just before the first lockdown. The derivation of the appropriate chain rule for 'back propagation' via Lagrange multipliers, with the 'dynamics' of the neural network formulated as constraints, was new to me and to most of the people I talked to after I understood what was going on. After (29.13) I inserted "also follows from the chain rule", but not before I had written much of what followed on the gradient flow of the loss function in parameter space while supervising Freddrick van der Meer during his internship with Achmea. From him I also learnt about the space of neural network functions.
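To make the multiplier view of back propagation concrete, here is a small Python sketch of my own; the toy network, the tanh layers and all the names are mine and not the notation of 29.5. The layer updates are treated as constraints, the multipliers satisfy a backward recursion, and the resulting gradient agrees with a finite difference check, just as it "also follows from the chain rule".

# Sketch (my own toy example): back propagation as a Lagrange multiplier /
# adjoint recursion for the constrained problem
#   minimize L(z_N) = 0.5 |z_N - y|^2  subject to  z_n = tanh(W_n z_{n-1}).
import numpy as np

rng = np.random.default_rng(0)
N, d = 3, 4                                   # depth and width (arbitrary)
W = [rng.standard_normal((d, d)) for _ in range(N)]
z0 = rng.standard_normal(d)                   # input
y = rng.standard_normal(d)                    # target

def forward(W):
    z = [z0]
    for Wn in W:
        z.append(np.tanh(Wn @ z[-1]))
    return z                                  # the states z_0, ..., z_N

def loss(W):
    return 0.5 * np.sum((forward(W)[-1] - y) ** 2)

# Backward pass: the multiplier lambda_N = dL/dz_N, and then
# lambda_{n-1} = W_n^T (tanh'(a_n) * lambda_n) with a_n = W_n z_{n-1}.
z = forward(W)
lam = z[-1] - y
grads = [None] * N
for n in reversed(range(N)):
    a = W[n] @ z[n]                           # pre-activation of layer n+1
    s = (1 - np.tanh(a) ** 2) * lam           # tanh'(a) times the multiplier
    grads[n] = np.outer(s, z[n])              # dL/dW_n
    lam = W[n].T @ s                          # multiplier one layer down

# Check one entry of dL/dW_0 against a finite difference.
eps = 1e-6
Wp = [w.copy() for w in W]
Wp[0][0, 0] += eps
print(grads[0][0, 0], (loss(Wp) - loss(W)) / eps)   # these should nearly agree

Gradient descent on the W's with these gradients is then a discretization of the gradient flow of the loss function in parameter space mentioned above.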

The residual networks in 29.5.10 were Lotte's subject again and, to my surprise, led to what is essentially a reformulation of control theory in the continuous limit. This limit can be examined with more mathematical rigour than I do here.
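For what it is worth, here is a small Python sketch of that continuous limit; the illustration and the particular right hand side are mine, not from 29.5.10. The residual update z_n = z_{n-1} + h f(z_{n-1}, nh) is a forward Euler step for the ODE dz/dt = f(z, t), so increasing the depth while shrinking the step size h = 1/depth drives the residual stack towards the ODE solution.

# Sketch (my own illustration): a residual stack as forward Euler for
# dz/dt = f(z, t) on [0, 1].  Doubling the depth (halving the step size)
# should roughly halve the discretization error.
import numpy as np

def f(z, t):
    return np.tanh(z) * np.cos(2 * np.pi * t)   # an arbitrary smooth "layer" field

def resnet(z0, depth):
    h = 1.0 / depth
    z = z0
    for n in range(depth):
        z = z + h * f(z, n * h)                 # residual update = Euler step
    return z

z0 = np.array([1.0, -0.5])
ref = resnet(z0, 100000)                        # fine discretization as reference
for depth in (10, 20, 40, 80):
    err = np.linalg.norm(resnet(z0, depth) - ref)
    print(depth, err)                           # error decreases roughly like 1/depth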

Here's a try with some simple formula vi-editing in html. Consider points or vectors z_n defined by z_n = f(z_{n-1}, θ_n). This conforms to the notation in the machine learning section 29.5. To be continued.
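And a few lines of Python (mine, with made-up layer maps and parameters) showing how iterating this recursion composes the layer maps into a single network function z_0 ↦ z_N, the kind of object in the space of neural network functions mentioned above.

# Sketch (my own notation): iterating z_n = f(z_{n-1}, theta_n) defines the
# network function z_0 |-> z_N as a composition of layer maps.
from functools import reduce
import numpy as np

def f(z, theta):
    return np.tanh(theta @ z)                   # one generic layer map

def network(thetas):
    # z_N = f(... f(f(z_0, theta_1), theta_2) ..., theta_N)
    return lambda z0: reduce(f, thetas, z0)

thetas = [np.eye(3), 2 * np.eye(3)]             # two toy parameter matrices
print(network(thetas)(np.ones(3)))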