Suppose some data $s \in \mathbb{R}^n$ is generated by $n$ independent sources.
We also have a mixing matrix $A$ that mixes the sources to produce the observed data $x = As$.
Repeated observations of the data give us a set of training samples $\{x^{(i)};\ i = 1, \dots, m\}$, and we want to find the unmixing matrix $W = A^{-1}$ that allows us to reconstruct the sources as $s^{(i)} = W x^{(i)}$.
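To make the setup concrete, here is a minimal sketch that simulates the mixing process. The Laplace sources, the sample count, and the particular matrix `A` are illustrative assumptions and are not taken from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 2, 1000                       # number of sources, number of samples (assumed values)
S = rng.laplace(size=(m, n))         # row i holds one draw of the n independent sources
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])           # hypothetical mixing matrix
X = S @ A.T                          # row i is the observation x_i = A s_i

# If A were known, the unmixing matrix would simply be its inverse.
W_true = np.linalg.inv(A)
S_rec = X @ W_true.T                 # row i is W x_i, the reconstructed sources
```

In practice only `X` is observed, so `W_true` is not available and has to be estimated from the data.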
We suppose that the distribution of each source $s_j$ is given by a density $p_s(s_j)$ and that each source is independent of the others. Therefore the joint distribution of the sources is:

$$p(s) = \prod_{j=1}^{n} p_s(s_j)$$
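For any two vectors that are linearly related by $x = As$, the absolute value of the determinant of the transformation matrix, $|\det A|$, gives the factor by which the volume of any region in the space changes.
Therefore, if we have a probability distribution $p_s(s)$, the probability distribution of the transformed variables $x = As$ is given by the following, where $W = A^{-1}$ and so $|\det W| = 1 / |\det A|$:

$$p_x(x) = p_s(Wx)\,|\det W|$$

Since $s = Wx$, the distribution of the sources in terms of the data is the following, where $w_j^T$ denotes the $j$-th row of $W$:

$$p(s) = \prod_{j=1}^{n} p_s(w_j^T x)$$

Therefore the distribution of the data in terms of the sources is:

$$p(x) = \prod_{j=1}^{n} p_s(w_j^T x)\,|\det W|$$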
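The change-of-variables formula can be checked numerically. The sketch below assumes standard normal sources, chosen only because the mixed data then has a known closed-form density, $\mathcal{N}(0, AA^T)$, to compare against. The matrix `A`, the test point `x`, and the function names are hypothetical.

```python
import numpy as np

def standard_normal_pdf(s):
    """Joint density of independent N(0, 1) sources at the vector s."""
    return np.exp(-0.5 * s @ s) / (2 * np.pi) ** (len(s) / 2)

def density_via_unmixing(x, W):
    """p_x(x) = p_s(W x) * |det W| for the sources above."""
    return standard_normal_pdf(W @ x) * abs(np.linalg.det(W))

def gaussian_pdf(x, cov):
    """Closed-form density of N(0, cov) at x, used only as a reference."""
    n = len(x)
    quad = x @ np.linalg.solve(cov, x)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])           # hypothetical mixing matrix
W = np.linalg.inv(A)
x = np.array([0.7, -1.2])            # arbitrary test point

p_formula = density_via_unmixing(x, W)
p_reference = gaussian_pdf(x, A @ A.T)
print(p_formula, p_reference)
```

The two printed values should agree up to floating-point error, since $x = As$ with $s \sim \mathcal{N}(0, I)$ has covariance $AA^T$.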
We want to choose a monotonically increasing function that increases from $0$ to $1$ to be the CDF of each source's probability distribution. The derivative of the CDF then gives us the probability density function. Choosing the sigmoid $g(s) = 1 / (1 + e^{-s})$ to be our CDF, we get:

$$p_s(s) = g'(s) = g(s)\,(1 - g(s))$$
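As a quick sanity check, the sketch below verifies numerically that the derivative of the sigmoid behaves like a density: it integrates to roughly 1 and matches a finite-difference derivative of the CDF. The grid range and step sizes are arbitrary choices.

```python
import numpy as np

def sigmoid(s):
    """Sigmoid CDF g(s) = 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + np.exp(-s))

def source_pdf(s):
    """Density implied by the sigmoid CDF: g'(s) = g(s) * (1 - g(s))."""
    g = sigmoid(s)
    return g * (1.0 - g)

s = np.linspace(-30.0, 30.0, 60001)
ds = s[1] - s[0]

# Riemann-sum approximation of the integral of the density over the grid.
total_mass = np.sum(source_pdf(s)) * ds

# Central finite difference of the CDF, compared with the analytic density.
eps = 1e-5
finite_diff = (sigmoid(s + eps) - sigmoid(s - eps)) / (2 * eps)
max_error = np.max(np.abs(finite_diff - source_pdf(s)))
```

This density has heavier tails than a Gaussian, which is one reason it is a workable default for non-Gaussian sources.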
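The log-likelihood of the training samples $\{x^{(i)}\}_{i=1}^{m}$, as a function of the unmixing matrix $W$ with rows $w_j^T$ and with $g$ the sigmoid CDF, is then given by:

$$\ell(W) = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} \log g'(w_j^T x^{(i)}) + \log |\det W| \right)$$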
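We take the derivative of the log-likelihood with respect to $W$, using $\frac{d}{dz} \log g'(z) = 1 - 2g(z)$ for the sigmoid $g$ and $\nabla_W \log |\det W| = (W^T)^{-1}$. The maximum has no closed-form solution, so instead of setting the derivative to 0 we follow the gradient uphill. For a single training sample $x^{(i)}$ and learning rate $\alpha$, this gives the following stochastic gradient ascent update rule:

$$W := W + \alpha \left( \begin{bmatrix} 1 - 2g(w_1^T x^{(i)}) \\ \vdots \\ 1 - 2g(w_n^T x^{(i)}) \end{bmatrix} {x^{(i)}}^T + (W^T)^{-1} \right)$$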
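The update rule can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a tuned implementation: the function names `fit_ica` and `log_likelihood`, the identity initialisation, the fixed learning rate, the number of passes, and the synthetic Laplace data are all choices made here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(W, X):
    """Sum over samples of sum_j log g'(w_j^T x) + log|det W| (sigmoid CDF g)."""
    Z = np.abs(X @ W.T)                                # row i holds |W x_i|
    # log g'(z) = -|z| - 2 log(1 + exp(-|z|)), a numerically stable form
    log_density = -Z - 2.0 * np.log1p(np.exp(-Z))
    return np.sum(log_density) + X.shape[0] * np.log(abs(np.linalg.det(W)))

def fit_ica(X, alpha=0.01, epochs=5, seed=0):
    """Stochastic gradient ascent on the log-likelihood, one sample at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = np.eye(n)                                      # assumed starting point
    for _ in range(epochs):
        for i in rng.permutation(m):                   # visit samples in random order
            x = X[i]
            g = sigmoid(W @ x)
            # W := W + alpha * ((1 - 2 g(W x)) x^T + (W^T)^{-1})
            W += alpha * (np.outer(1.0 - 2.0 * g, x) + np.linalg.inv(W.T))
    return W

# Synthetic data: two Laplace sources mixed by a hypothetical matrix A.
rng = np.random.default_rng(0)
S = rng.laplace(size=(2000, 2))
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
X = S @ A.T

W = fit_ica(X)
ll_before = log_likelihood(np.eye(2), X)
ll_after = log_likelihood(W, X)
S_est = X @ W.T                                        # recovered sources, up to order and scale
```

Comparing `ll_after` with `ll_before` shows whether the updates improved the fit. The rows of `W` can only be expected to match the true unmixing matrix up to permutation and scaling.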
There are certain scenarios where Independent Component Analysis (ICA) might not work well:
If a row in the unmixing matrix $W$ is scaled by a constant $c$, this will just result in the corresponding source being scaled by $c$. There is no way for us to know if scaling has occurred. Therefore, we won't be able to retrieve the true amplitude (or the sign) of our signal.
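If the data follows a Gaussian distribution, then our sources, being a linear transformation of the data, will also follow a Gaussian distribution. A Gaussian distribution with identity covariance is rotationally symmetric. Therefore, if our unmixing matrix is multiplied by a rotation or reflection matrix $R$ (an orthogonal matrix with $R R^T = I$), the distribution of the data stays exactly the same and there is no way for us to know about it.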
Moreover, we have assumed that our data points are independent and identically distributed. This is, however, generally not true for time-series data, where consecutive samples tend to be correlated.
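The scaling and rotation ambiguities described above can be checked numerically. The sketch below uses a hypothetical mixing matrix `A`, scale `c`, and rotation angle `theta`. For the Gaussian case it compares covariances, which is sufficient because a zero-mean Gaussian is fully determined by its covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])                  # hypothetical mixing matrix
W = np.linalg.inv(A)
x = A @ rng.laplace(size=2)                 # one observed sample

# Scaling ambiguity: scaling row 0 of W by c scales recovered source 0 by c.
c = 3.0
W_scaled = W.copy()
W_scaled[0] *= c
s = W @ x
s_scaled = W_scaled @ x

# Rotation ambiguity: for sources s ~ N(0, I), mixing with A or with A R
# gives data with the same covariance, hence the same Gaussian distribution.
theta = 0.8
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
cov_original = A @ A.T
cov_rotated = (A @ R) @ (A @ R).T
```

Since `cov_original` and `cov_rotated` coincide, no amount of Gaussian data could distinguish the mixing matrix `A` from `A @ R`.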
Despite all these limitations, ICA still works very well given enough data.