# Can Machine Learning Discover the Laws of Nature?

Johannes Kepler (1571–1630) was in some sense a tremendously successful data scientist. While many astronomers worked hard over the years to gather accurate observational data on the motion of planets, Kepler’s genius lay in the way he analyzed data collected by others. As many of us learned in our first introduction to the study of the solar system, “When Tycho Brahe suddenly died in 1601, all of his data was given to Johannes Kepler and it became his responsibility to finish Tycho Brahe’s work. For the next 11 years, Kepler investigated mathematical patterns in the data, making and testing hypotheses until he developed an even better understanding of the arrangement and movement of our solar system than anything that had gone before.”

In modern data science terminology, Kepler’s “investigation of mathematical patterns in the data” was a form of nonlinear dimensionality reduction. He realized that high-dimensional observations (the motions of all the planets in our solar system) form a one-dimensional **manifold**: they are actually a function of a single latent variable, the distance of the planet from the sun. The astronomical models that came before Kepler can also be seen as nonlinear dimensionality reduction. Ptolemy’s model had a latent dimension of two and was accurate enough to be used in navigation for many years, but Kepler’s model was the first to recover the true mapping from the observations to the latent code. This allowed him to accurately calculate the distance of planets from the sun just by observing their motions.
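To make the idea of a one-dimensional manifold concrete, here is a small Python sketch under simplifying assumptions: orbits are treated as circles, the distances are approximate semi-major axes in astronomical units (AU), and periods are measured in years so that Kepler’s third law reads T² = a³. Each planet’s hypothetical “observation” is a 3-D vector (period, angular velocity, orbital speed), yet every component depends only on the single latent variable a, and inverting the third law recovers a from the observed period alone. Because the observations are generated from the law itself, this only illustrates the structure of the problem, not how Kepler actually found the law.

```python
import math

# Approximate semi-major axes in AU: the single latent variable per planet.
SEMI_MAJOR_AXIS_AU = {
    "Mercury": 0.387, "Venus": 0.723, "Earth": 1.000,
    "Mars": 1.524, "Jupiter": 5.203, "Saturn": 9.537,
}

def observations(a):
    """3-D observation vector for an idealized circular orbit of radius a (AU)."""
    period = a ** 1.5                         # third law: T^2 = a^3, T in years
    angular_velocity = 2 * math.pi / period   # radians per year
    speed = 2 * math.pi * a / period          # AU per year (equals 2*pi / sqrt(a))
    return (period, angular_velocity, speed)

def distance_from_period(period):
    """Invert the third law: map an observed period back to the latent distance."""
    return period ** (2 / 3)

for name, a in SEMI_MAJOR_AXIS_AU.items():
    period, omega, speed = observations(a)
    recovered = distance_from_period(period)
    print(f"{name:8s} T={period:7.3f} yr  omega={omega:6.3f} rad/yr  "
          f"v={speed:6.3f} AU/yr  recovered a={recovered:6.3f} AU")
```

In log coordinates every component of the observation vector is an affine function of log a, which is one way to see that the three numbers per planet trace out a curve rather than filling 3-D space.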

Given the amazing progress in deep learning in recent years, it is tempting to ask whether machines can automatically perform nonlinear dimensionality reduction and thus discover the laws of nature. This is particularly promising in scientific settings where the amount of available data is huge. While Kepler performed his dimensionality reduction on a small number of high-dimensional measurements (a handful of planets), in modern settings we can easily obtain hundreds of thousands of high-dimensional vectors that are assumed to lie on a low-dimensional manifold. Notable examples include the firing rates of neurons in a behaving animal’s brain and the expression levels of different genes across a large population of cells.

A naive approach to nonlinear dimensionality reduction is to train an **autoencoder**: a deep neural network that takes the high-dimensional measurements (e.g., planetary motions) as input, maps them to low-dimensional latent variables, and then reconstructs the measurements from this low-dimensional representation. In recent years, the latent variables learned by such deep autoencoders on experimental data have increasingly been used for scientific discovery.
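As a minimal sketch of this naive approach (not the method of any particular study), the NumPy code below trains a small autoencoder on hypothetical synthetic data: 3-D measurements generated from a single hidden variable `t`, so they lie on a one-dimensional curve. The architecture (one 16-unit tanh layer on each side of a 1-D bottleneck), the learning rate, and the number of steps are illustrative assumptions, and the gradients are written out by hand with full-batch gradient descent instead of using a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3-D "measurements" that all depend on one hidden variable t,
# so they lie on a one-dimensional curve (manifold) embedded in 3-D space.
t = rng.uniform(-1.0, 1.0, size=(500, 1))
X = np.hstack([t, t ** 2, np.sin(np.pi * t)])

def dense(n_in, n_out):
    """Randomly initialized weights and zero biases for one fully connected layer."""
    return rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out)

HIDDEN, LATENT = 16, 1          # illustrative sizes; LATENT is the bottleneck
W1, b1 = dense(3, HIDDEN)       # encoder: measurements -> hidden layer
W2, b2 = dense(HIDDEN, LATENT)  # encoder: hidden layer -> latent code z
W3, b3 = dense(LATENT, HIDDEN)  # decoder: latent code -> hidden layer
W4, b4 = dense(HIDDEN, 3)       # decoder: hidden layer -> reconstruction

def forward(X):
    """Encode X to a 1-D latent code z, then decode z back to 3-D."""
    h1 = np.tanh(X @ W1 + b1)
    z = h1 @ W2 + b2
    h2 = np.tanh(z @ W3 + b3)
    X_hat = h2 @ W4 + b4
    return h1, z, h2, X_hat

def reconstruction_mse(X):
    return float(np.mean((forward(X)[3] - X) ** 2))

initial_loss = reconstruction_mse(X)
learning_rate = 0.05
for step in range(3000):
    h1, z, h2, X_hat = forward(X)
    # Gradient of the mean squared reconstruction error w.r.t. X_hat
    d_xhat = 2.0 * (X_hat - X) / X.size
    # Backpropagate through the decoder (tanh'(u) = 1 - tanh(u)^2)
    gW4, gb4 = h2.T @ d_xhat, d_xhat.sum(axis=0)
    d_h2 = (d_xhat @ W4.T) * (1.0 - h2 ** 2)
    gW3, gb3 = z.T @ d_h2, d_h2.sum(axis=0)
    d_z = d_h2 @ W3.T
    # Backpropagate through the encoder
    gW2, gb2 = h1.T @ d_z, d_z.sum(axis=0)
    d_h1 = (d_z @ W2.T) * (1.0 - h1 ** 2)
    gW1, gb1 = X.T @ d_h1, d_h1.sum(axis=0)
    # Full-batch gradient descent, updating each parameter array in place
    for param, grad in ((W1, gW1), (b1, gb1), (W2, gW2), (b2, gb2),
                        (W3, gW3), (b3, gb3), (W4, gW4), (b4, gb4)):
        param -= learning_rate * grad

final_loss = reconstruction_mse(X)
_, codes, _, _ = forward(X)  # one learned latent number per 3-D measurement
print(f"reconstruction MSE: {initial_loss:.4f} -> {final_loss:.4f}")
print("latent codes shape:", codes.shape)
```

Because the bottleneck has a single unit, `codes` holds one learned number per measurement. Nothing in the reconstruction objective forces that number to equal `t`: any invertible reparametrization of the latent variable can reconstruct the data equally well.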
