Artificial Intelligence (AI) has seen quite a resurgence in the last decade. We have gone from AI that was mostly useless to AI that ruins our lives in dark and opaque ways. We have even given AI the job of crashing our cars for us.
AI experts will tell us that we only need larger neural networks and the cars will probably stop colliding. You can get there by adding more graphics cards to an AI, but the power consumption quickly becomes too high. The ideal solution would be a neural network that can process and forward data at almost zero energy cost. That may be possible if we turn to optical neural networks. Speaking of which, a good GPU uses 20 picojoules (1
How did the researchers come to this conclusion?
Layered like onions
Let's start with an example of how a neural network works. A set of inputs is distributed across a set of neurons. Each input to a neuron is weighted, the weighted inputs are added together, and then the neuron's output is amplified. Strong signals are amplified more than weak signals, which makes the differences larger. This combination of multiplication, addition, and boost happens in a single neuron. Neurons are arranged in layers, with the output of one layer becoming the input of the next. As signals propagate through the layers, this structure reinforces some and suppresses others.
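To make the layer-by-layer description concrete, here is a minimal Python sketch of that forward pass. It is an illustration under our own assumptions, not anything taken from the paper: the names `relu`, `neuron`, `layer`, and `forward` are hypothetical, each neuron gets a bias term (a common addition the article doesn't mention), and ReLU stands in for the boost function, since the article doesn't say which nonlinearity is used.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (one weight row per neuron, one bias per neuron)


def relu(z: float) -> float:
    """Stand-in boost function: negative sums are suppressed, positive ones pass through."""
    return max(0.0, z)


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float,
           boost: Callable[[float], float] = relu) -> float:
    """Multiply each input by its weight, add everything up, then apply the boost."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return boost(total)


def layer(inputs: Sequence[float], weights: Matrix, biases: Vector) -> Vector:
    """Every neuron in a layer sees the same inputs but has its own weights."""
    return [neuron(inputs, row, b) for row, b in zip(weights, biases)]


def forward(inputs: Sequence[float], network: List[Layer]) -> Vector:
    """The output of one layer becomes the input of the next."""
    signal = list(inputs)
    for weights, biases in network:
        signal = layer(signal, weights, biases)
    return signal


# Hypothetical two-layer network: 2 inputs -> 2 neurons -> 1 neuron.
net: List[Layer] = [
    ([[0.5, -1.0], [1.0, 1.0]], [0.0, -0.5]),
    ([[1.0, 2.0]], [0.0]),
]
print(forward([1.0, 2.0], net))  # prints [5.0]
```

In this example the first neuron's weighted sum is negative, so the boost suppresses it to zero, while the second neuron's signal is passed on and reinforced in the next layer.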
For this system to perform useful computations, we need to set the weights of all inputs in all layers, as well as the boost function (more precisely, the nonlinear function). These weights are usually determined by giving the neural network a training data set to process. During training, the weights and function parameters evolve toward good values through repeated failure and occasional success.
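Training can be illustrated with the classic perceptron learning rule. This is a swapped-in example, a much simpler relative of the methods used to train today's large networks, and not what the researchers use. The neuron's parameters change only when it gets an answer wrong, which is "repeated failure and occasional success" in miniature. The function names and the AND-gate training data are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], int]  # (inputs, target output)


def step(z: float) -> int:
    """Hard threshold: fire (1) if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    return step(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_perceptron(data: Sequence[Sample], lr: float = 1.0,
                     max_epochs: int = 100) -> Tuple[List[float], float, int]:
    """Perceptron rule: weights and bias are nudged only after a wrong answer."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x, target in data:
            error = target - predict(weights, bias, x)  # -1, 0 or +1
            if error != 0:  # a "failure": move the parameters toward the target
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # a full pass over the data without a single failure
            return weights, bias, epoch
    return weights, bias, max_epochs


AND_GATE: List[Sample] = [
    ((0.0, 0.0), 0),
    ((0.0, 1.0), 0),
    ((1.0, 0.0), 0),
    ((1.0, 1.0), 1),
]
weights, bias, epochs = train_perceptron(AND_GATE)
print([predict(weights, bias, x) for x, _ in AND_GATE])  # prints [0, 0, 0, 1]
```

The perceptron rule is guaranteed to converge only for problems a single neuron can separate, like AND; harder problems are one reason real networks need many neurons and layers.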
Two basic requirements follow from this. First, a neural network needs many neurons to have the flexibility to cope with complicated problems. Second, a neural network must be able to adjust its parameters as it accumulates new data. This is where the theoretical optical neural network gets to flex its muscles.
All-optical AI
In the optical neural network, the inputs are light pulses that are split up and distributed across the neurons. The weights are applied by changing the brightness of the light. When the weights are fixed in physical hardware, they often cannot be changed, which is undesirable. In the researchers' scheme, however, the weights come from a second set of optical pulses, which makes the system much more flexible.
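In a single neuron, the input and weight pulses meet and add by interference. The combined pulses hit a photodetector, which performs the multiplication: a photodetector responds to light intensity, which is the square of the summed field, and that square contains the product of the input and the weight. The photodetector's electrical output can then be passed through whatever boost function we want to apply electronically. The resulting value is emitted as light again and forwarded to the next layer of the neural network.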
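The multiplication-by-detection step can be sketched numerically. This assumes an idealized balanced arrangement that the article does not spell out: the input pulse and the weight pulse meet on a 50:50 beamsplitter, two photodetectors measure the intensity of the two outputs, and the difference of their signals is proportional to the product of the two amplitudes, since (x+w)²/2 − (x−w)²/2 = 2xw. Accumulating that difference over a train of pulse pairs gives the weighted sum a neuron needs. All function names are hypothetical, and real numbers stand in for optical field amplitudes.

```python
import math
from typing import Sequence, Tuple


def beamsplitter(x: float, w: float) -> Tuple[float, float]:
    """Ideal 50:50 beamsplitter acting on two real field amplitudes."""
    s = 1 / math.sqrt(2)
    return s * (x + w), s * (x - w)


def photodetect(field: float) -> float:
    """A photodetector responds to intensity, i.e. the square of the field amplitude."""
    return field ** 2


def optical_neuron_sum(inputs: Sequence[float], weights: Sequence[float]) -> float:
    """Balanced detection, one input/weight pulse pair at a time.

    The two photocurrents differ by 2*x*w for each pair, so accumulating the
    difference over all pairs yields 2 * sum(x*w).
    """
    charge = 0.0
    for x, w in zip(inputs, weights):
        a, b = beamsplitter(x, w)
        charge += photodetect(a) - photodetect(b)  # equals 2*x*w up to rounding
    return charge / 2  # remove the factor of 2 to recover sum(x*w)


inputs = [0.5, -1.0, 2.0]
weights = [1.0, 0.25, -0.5]
optical = optical_neuron_sum(inputs, weights)
direct = sum(x * w for x, w in zip(inputs, weights))
print(math.isclose(optical, direct))  # prints True
```

In the scheme described above, the boost function would then be applied electronically to this result before it is converted back into light.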
The cool thing is that each weight is an optical pulse that can be adjusted continuously. The result is an optical neural network with the flexibility of a neural network running on a conventional computer, but one that works much faster.
Remarkably, the researchers suggest doing all of this in free space rather than on optical integrated circuits. They argue that combining diffractive optical elements, which can manipulate optical beams in complex ways, with photodiode arrays is much more precise and scalable than the photonic circuits we can print on chips.
They could be right. Manufacturing reliability is currently the main problem with photonic circuits. I cannot imagine successfully building a large-scale photonic circuit with today's techniques. So I'm ready to buy that argument, and I even agree that scaling to millions of neurons across multiple layers is possible. It looks good.
You have to count all the energy
But I do not buy the energy argument at all. The researchers calculated how much energy the optical pulses need for the detector of a single neuron to produce a sufficiently accurate output signal. The result is an impressive-sounding 50 zJ (1 zJ = 10⁻²¹ J) per operation.
That may be true, but it ignores a lot of pretty important things. How much energy does the boost function take? How much energy does it take to turn the electrons back into light? The researchers have put numbers on some of these, but their calculations essentially show that the energy per operation is hard to pin down, because the required electronics do not exist yet.
Instead, the big win is the energy saved on moving data around. A large neural network may be spread across multiple GPUs, which entails two major energy costs: shuffling data around on each GPU and shuffling data between GPUs. The optical architecture virtually eliminates both costs and can scale to a larger number of gates at the same time.
In terms of size, too, the optics would not be much worse than a box full of GPUs. I think you could fit a large optical neural network onto an average-sized desk. You might have to turn off the lights while it runs, though.
So where is this going? I expect the researchers to demonstrate this neural network in the next year or two. Even at the photodiode alone, it will consume a lot more energy per operation than the calculated figure. Once the supporting infrastructure is taken into account, the total energy cost will be well over 1 pJ per operation. In the end, they will have a highly scalable system but no significant energy savings.
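A quick unit conversion puts the figures quoted in this article side by side. The snippet uses only the three numbers in the text (20 pJ per operation for a good GPU, 50 zJ for the optical detection step, and the 1 pJ floor estimated for the full system); the variable names are our own, and it doesn't model any real hardware.

```python
# Per-operation energies quoted in the article, converted to joules.
PJ = 1e-12  # 1 picojoule
ZJ = 1e-21  # 1 zeptojoule

gpu = 20 * PJ                # a good GPU
optical_detection = 50 * ZJ  # researchers' figure: optical pulse energy at the detector only
author_floor = 1 * PJ        # the author's estimate: the total is "well over" this

headline_gain = gpu / optical_detection             # the comparison the 50 zJ figure invites
detection_share = optical_detection / author_floor  # how much of a 1 pJ total is the optics

print(f"GPU / optical detection: {headline_gain:.1e}")                 # prints GPU / optical detection: 4.0e+08
print(f"Optical detection as a fraction of 1 pJ: {detection_share:.1e}")  # prints Optical detection as a fraction of 1 pJ: 5.0e-08
```

The headline comparison suggests a gain of hundreds of millions, but the detection energy is a vanishingly small part of a 1 pJ total, which is why the rest of the energy budget decides the outcome.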
Physical Review X, 2019, DOI: 10.1103/PhysRevX.9.021032