I've been playing around a little with artificial intelligence. I've put a little project together called JNeural. This is a summary of what I have learned.

__Perceptrons__

In my little test project, I've built a perceptron, a simple piece of code that can be used for supervised learning. The basic algorithm is:

- Take a vector of *inputs*.
- Find the dot product of the inputs and a vector of *weights*.
- Feed this result into an *activation function*. Your answer will come out of the other side.
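The three steps above can be sketched in Python as follows. This is only an illustration, not the JNeural code: the function names, the hand-picked weights and the default threshold of 0.5 are all assumptions, and the activation used is the simplest possible one, a greater-than comparison against a threshold.

```python
from typing import Sequence


def dot(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


def step(value: float, threshold: float) -> bool:
    """Assumed activation function: is the value greater than the threshold?"""
    return value > threshold


def perceptron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,  # assumed default, chosen for illustration
) -> bool:
    """Take the inputs, find the dot product with the weights, then activate."""
    return step(dot(inputs, weights), threshold)


# Hand-picked (not learned) weights, purely for illustration.
print(perceptron_output([1.0, 1.0], [0.4, 0.4]))  # dot product 0.8 is above 0.5
print(perceptron_output([1.0, 0.0], [0.4, 0.4]))  # dot product 0.4 is not
```

Keeping the activation as a separate function means it could be swapped for something smoother without touching the dot product.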

The simplest activation function returns a boolean given by evaluating: is the dot product greater than a *threshold*?

__Teaching a Perceptron__

The secret sauce is the weight vector. This is calculated by repeatedly giving the algorithm input vectors and asking it to compare its results with an expected answer. It goes like this:

- Given the input/weight dot product, calculate the *delta* of this result from the expected value.
- If the delta is within an acceptable margin (that is, the result falls on the same side of the *threshold* as the expected value), then we're done.
- Otherwise, we got the answer wrong. So, let's calculate the *error*, and adjust all the weights by it.

We do this either for a fixed number of steps or until our deviation for all the training sessions is within a certain margin.
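A minimal sketch of that loop in Python might look like the following. It is not the JNeural implementation. It assumes the classic perceptron update, where each weight moves by the error multiplied by its own input, a fixed threshold of 0.5, and the OR truth table as training data; the names `predict`, `train` and `OR_TABLE` are made up for the example.

```python
from typing import List, Sequence, Tuple

THRESHOLD = 0.5  # assumed activation threshold

Example = Tuple[Sequence[float], int]


def predict(inputs: Sequence[float], weights: Sequence[float]) -> int:
    """Return 1 if the input/weight dot product exceeds the threshold, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > THRESHOLD else 0


def train(examples: List[Example], max_steps: int = 100) -> List[float]:
    """Adjust the weights until a full pass makes no mistakes, or we run out of steps."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(max_steps):
        mistakes = 0
        for inputs, expected in examples:
            error = expected - predict(inputs, weights)
            if error != 0:
                mistakes += 1
                # Assumed update rule: move each weight by error * its input.
                weights = [w + error * x for w, x in zip(weights, inputs)]
        if mistakes == 0:
            break  # every training example is now answered correctly
    return weights


# Truth table for OR, used here as a small training set.
OR_TABLE: List[Example] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = train(OR_TABLE)
print(weights)
print([predict(inputs, weights) for inputs, _ in OR_TABLE])
```

With these assumptions the loop stops after its second pass over the table, because the first pass has already nudged both weights far enough.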

__Almost there...__

This is not the whole story. There are two constants that we need to include.

The first is the *learning rate*. This scales the error correction by a constant factor. It's needed because the corrections might otherwise be too coarse, so that we keep overshooting the mark. However, if it's too small, the algorithm takes much longer to learn.

The second is the *bias*. To see why it's needed, imagine a problem space made of Cartesian co-ordinates where the question is "does a given point map to 1 or 0?". If we had an input of {0, 0}, then no matter what the weights are, the dot product will always be 0. The bias is an extra value added to the dot product so that the result can still be shifted to either side of the threshold.

__Linearly Separable__

Single perceptrons cannot solve problems that are not linearly separable. What does this mean?

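Linearly separable problems are those where a straight line can divide the problem space, with every point of one answer on one side and every point of the other answer on the other.

A less abstract problem space that is linearly separable is the truth table for AND, since we can draw a straight line delineating one answer from another: only the input {1, 1} gives 1, and a line can cut that corner off from the other three inputs.

XOR is not linearly separable. The inputs {0, 1} and {1, 0} give 1 while {0, 0} and {1, 1} give 0, so the two answers sit on opposite diagonals of the square and no single straight line can separate them.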

Perceptrons working in concert can solve this problem, but a single perceptron by itself cannot.
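To make the AND/XOR contrast concrete, here is a small Python sketch that trains a single perceptron, with a learning rate and a bias, on both truth tables and reports whether it ever gets through a full pass without a mistake. It is an illustration under assumptions rather than the JNeural code: the update rule is the classic perceptron rule, the threshold is fixed at 0 with the bias doing the shifting, and the learning rate of 0.25 and the cap of 1000 passes are arbitrary choices.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]


def predict(inputs: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Return 1 if the dot product plus the bias is above 0, else 0."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0


def learns(
    examples: List[Example],
    learning_rate: float = 0.25,  # assumed value
    max_epochs: int = 1000,  # assumed cap on passes over the data
) -> bool:
    """Return True if a single perceptron reaches a pass with no mistakes."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, expected in examples:
            error = expected - predict(inputs, weights, bias)
            if error != 0:
                mistakes += 1
                # Scale every correction by the learning rate.
                weights = [
                    w + learning_rate * error * x for w, x in zip(weights, inputs)
                ]
                # The bias is corrected too, as if it had a constant input of 1.
                bias += learning_rate * error
        if mistakes == 0:
            return True
    return False


AND_TABLE: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_TABLE: List[Example] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND learned:", learns(AND_TABLE))
print("XOR learned:", learns(XOR_TABLE))
```

AND settles after a handful of passes, while XOR keeps making mistakes however many passes it is given, since no single set of weights and bias can classify all four of its rows correctly.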

__References__

The Nature of Code, Daniel Shiffman - available free online here.

Information Theory, Inference, and Learning Algorithms, David J.C. MacKay - available free online here.

Wikipedia.