According to Google, Translatotron uses a sequence-to-sequence model that takes voice input, processes it as a spectrogram – a visual representation of audio frequencies over time – and generates a new spectrogram in the target language. The result is a faster translation with less risk of information being lost along the way. The tool also supports an optional speaker component that preserves the original speaker's voice. The translated speech is still synthesized and sounds a bit robotic, but it can effectively retain some elements of the speaker's voice. You can hear examples of Translatotron's attempts to maintain a speaker's voice while completing translations on Google Research's GitHub page. Some are certainly better than others, but it's a start.
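To make the spectrogram idea concrete, here is a minimal sketch of how raw audio becomes the kind of time-frequency representation such a model consumes and emits. This is purely illustrative: the function name and parameters are assumptions, and Translatotron itself operates on learned mel spectrograms inside a trained encoder-decoder, not on a hand-rolled transform like this.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Toy short-time Fourier transform magnitude (illustrative only).

    Slides a Hann-windowed frame over the signal and takes the
    magnitude of each frame's FFT, yielding a 2-D array of
    energy per (time frame, frequency bin).
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        # One-sided FFT magnitude: energy in each frequency bin
        frames.append(np.abs(np.fft.rfft(frame)))
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# One second of a 440 Hz tone sampled at 8 kHz
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone)
print(spec.shape)  # → (61, 129)
```

A translation model in this style maps one such array directly to another in the target language, skipping the intermediate text transcription that traditional cascade systems rely on.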