# Draw an elephant with a single parameter

## Reproduce any animal with finite numbers using a single function

Suppose we have data and we look for a simple explanation for this data, or at least a model that allows us to reproduce it. We will start, for example, from measurement values which, for **x = 0, 1, …, n,** give **y₀, y₁, …, yₙ**. If we find a function f which satisfies **f (0) = y₀, f (1) = y₁, …, f (n) = yₙ**, we will be quite satisfied. We will use **f** to predict **f (n + 1), f (n + 2)**, … and we will analyze f to understand the data studied and perhaps to formulate a theory explaining them.

Obviously, we are looking for the simplest possible functions **f**. Polynomials, i.e. functions of the form **f(x) = a₀ + a₁x + a₂x² + … + aₙxⁿ**, are among the simplest. We will, therefore, be quite satisfied if we find a polynomial that reproduces **y₀, y₁, …, yₙ**.

However, the mathematical world is generous and often allows us to do whatever we want. A nice result, called the *“Lagrange interpolation theorem”*, says that whatever the data **y₀, y₁, …, yₙ**, there is a polynomial of degree at most n which gives **y₀, y₁, …, yₙ** for **x = 0, 1, …, n**. But such a polynomial is overfitting: we grant ourselves too many parameters, here the coefficients of the polynomial, and what we find is not really of interest because we were sure to get there!

**Interpolation**

Imagine that we want to find a polynomial P(x) which takes the value 7 for x = 0, the value 2 for x = 1, and the value 18 for x = 2. Here is a well-known method that constructs it and generalizes to any number of data points. We write:

P(x) = 7 (x − 1)(x − 2)/[(0 − 1)(0 − 2)] + 2 (x − 0)(x − 2)/[(1 − 0)(1 − 2)] + 18 (x − 0)(x − 1)/[(2 − 0)(2 − 1)].

The first term is zero for x = 1 and x = 2, the second for x = 0 and x = 2, the third for x = 0 and x = 1. Therefore, only the first term is non-zero for x = 0, only the second for x = 1, and only the third for x = 2.

In addition, the factors 7, 2, and 18 at the start of each term ensure that the non-zero term takes the value 7 for x = 0, the value 2 for x = 1, and the value 18 for x = 2. This is exactly what we were looking for. Expanding and simplifying, we find: P(x) = 21x²/2 − 31x/2 + 7. We can easily verify that P(0) = 7, P(1) = 2, P(2) = 18.

The method generalizes and allows, if y₀, y₁, …, yₙ are given values, to construct without difficulty a polynomial which takes the value y₀ for x = 0, y₁ for x = 1, …, yₙ for x = n. This is the “Lagrange interpolation method”. It shows that allowing many parameters, here the coefficients of the polynomial, makes it easy to pass a polynomial through any desired values.
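The computation in the box above can be checked with a short program. A minimal sketch in pure Python (the helper name `lagrange` is our own):

```python
# Lagrange interpolation: build the polynomial passing through
# (0, 7), (1, 2), (2, 18), as in the worked example above.

def lagrange(points):
    """Return a function evaluating the Lagrange interpolation
    polynomial through the given (x_i, y_i) points."""
    def P(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    # This factor is 1 at x = xi and 0 at every other x_j.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return P

P = lagrange([(0, 7), (1, 2), (2, 18)])
print(P(0), P(1), P(2))  # 7.0 2.0 18.0
```

Expanding the three terms by hand gives P(x) = 21x²/2 − 31x/2 + 7, matching the box.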

However, in general, the function found will oscillate a lot, whereas we would prefer a function as regular and stable as possible: its adjustment to the desired values is exact, but unsatisfactory. There are better methods for approaching the desired values by limiting the oscillations of the result function.

Keep in mind that interpolating exactly is easy, but that it generally does not satisfy the desire to find smooth and regular laws.

This logic of Lagrange interpolation is pushed to the absurd in the results presented in the main text: the method discovered by Steven Piantadosi, much more powerful than Lagrange interpolation, allows, by changing a single parameter **p** of a function **fₚ(x)**, to obtain that for any values **y₀, y₁, …, yₙ**, we have: **fₚ(0) = y₀, fₚ(1) = y₁, …, fₚ(n) = yₙ**.

Moreover, the Lagrange interpolation polynomial will only very rarely allow us to anticipate what happens for **n + 1, n + 2, etc.**, and this polynomial will not be useful on a theoretical level. Fortunately, it is possible that the polynomial found is of low degree. If this is the case, it means that its discovery would have resulted from taking into account fewer data points (as many as the degree of the polynomial, plus one) and therefore that the polynomial recovers the other data. In such a situation, it will not be absurd to believe that the calculated polynomial makes it possible to anticipate **f(n + 1)**, **f(n + 2)**, etc.

The risk of overfitting is a significant problem in the field of deep learning in artificial intelligence, as it leads to neural networks that are well suited to a large dataset… and to nothing else.

About the choice of parameters in a model, the Hungarian-born mathematician John von Neumann made a funny remark: “With four parameters, I can fit an elephant, and with five, I can make him wiggle his trunk.”

# A single parameter is sufficient for any adjustment

A recent article by Steven Piantadosi, from the Department of Psychology at the University of California, Berkeley, showed that von Neumann could have replaced the number “four” in his sentence with the number “one”. The article is, of course, a game, but what it shows, using fairly simple mathematics, is remarkable and should serve as a warning: a single parameter allows you to adjust a function so that it gives back a thousand data points, or a billion, if you go about it right.

The statement demonstrated by Steven Piantadosi is as follows.

- We set a precision **e**, for example **1/1000**, which means that we will seek equalities with an error less than **e**.
- There exists a function **fₚ(x)**, depending on a real parameter **p**, such that, whatever the finite data **y₀, y₁, …, yₙ**, there exists a value of the parameter **p** for which **fₚ(0) = y₀, fₚ(1) = y₁, …, fₚ(n) = yₙ**, with an error less than **e** for each equality.

Clearly, by varying the parameter **p**, the function **fₚ(x)** adjusts with precision **e** to any set of data fixed in advance, however large it may be.

Steven Piantadosi’s miraculous function is: **fₚ(x) = sin²(2ᵏˣ arcsin(√p))**. Recall that **arcsin(x)** is the inverse function of **sin(x)**, which means that for all **x** between −π/2 and π/2, **y = sin(x) ⇔ x = arcsin(y)**.

The value **k** that appears depends on the desired precision **e**, but once **e** is fixed, **k** is fixed too. Obviously, the researcher calculated the parameter **p** which gives an elephant.

**One parameter for the elephant**

Steven Piantadosi proposed the function fₚ(x) = sin²(2ᵏˣ arcsin(√p)), where the real number p is a parameter, and showed that it has an extraordinary adjustment property. The value of k is fixed once and for all and determines the precision with which the data will be approached. Consider, for example, values y₀, y₁, …, yₙ such that the points with coordinates (0, y₀), (1, y₁), …, (n, yₙ) draw an elephant, tightening the abscissa and drawing fairly large points.

Steven Piantadosi’s method then makes it possible to determine the value of the parameter p such that fₚ(0) = y₀, fₚ(1) = y₁, …, fₚ(n) = yₙ; in other words, such that the graph of fₚ(x) draws an elephant. A single parameter is enough to have an elephant! Of course, you can get any drawing you want by changing the value of p. Laurent Boué wrote programs, available on the internet, that compute p for a desired drawing, and did the calculations for a bird and a turtle.

# Where does the exploit come from?

Understanding in detail how the function **fₚ(x)** achieves this feat, and how to find the right values of the parameter **p**, is a mathematical game that requires a bit of computation. However, the general idea is simple: we hide the numbers **y₀, y₁, …, yₙ** in the parameter p, which, being a real number, can contain an infinite amount of information. Then the function **fₚ(x)**, for **x = 0, 1, …, n**, brings them back up one by one: when **x** is **0**, **y₀** comes up; when **x** is **1**, **y₁** comes up; etc.

Box 3 explains that this recovery of the approximate values of **y₀, y₁, …, yₙ** takes place indirectly thanks to the function **2x**. When we apply the function **2x** to a real number written in base **2**, for example **x = 0.110011101111₂**, it shifts all the digits one place to the left: **2x = 1.10011101111₂** (the subscript **2** indicates a base-**2** expansion).

**A first miraculous function**

Let D be the function such that D(x) = 2x for 0 ≤ x < 1/2 and D(x) = 2x − 1 for 1/2 ≤ x < 1. This can be written: D(x) = 2x mod 1. The function x mod 1, sometimes called the fractional part of x, keeps only what is after the decimal point: 2.37 becomes 0.37; 219.2222 becomes 0.2222.

Applied to a number written in binary, for example x = 0.110111101₂, the function D(x) gives a number with the same binary expansion as x, except that the first digit after the point has disappeared: D(0.110111101₂) = 0.10111101₂.

The explanation is simple. On the one hand, 2x shifts the binary expansion one digit to the left (just as, in base 10, multiplying a number by 10 does), so 0.110111101₂ becomes 1.10111101₂; on the other hand, 2x mod 1 removes the 1 in front of the point if there is one.

Now consider the function Dᵏ obtained by applying the function D k times: Dᵏ(x) = D(D(…(D(x))…)), k times. The function Dᵏ(x) slides the binary digits of x k places to the left and eliminates the first k digits: Dᵏ(x) = 2ᵏx mod 1. For example, D⁴(0.110111101₂) = 0.11101₂.

Let y₀, y₁, …, yₙ be numbers between 0 and 1, each written in base 2 with r digits after the point. Example for r = 5 and n = 2: y₀ = 0.10111₂; y₁ = 0.11001₂; y₂ = 0.01011₂.

Let p = 0.101111100101011₂, obtained by putting the digits of y₀, y₁, and y₂ one behind the other. We have:

p = D⁰(p) = 0.101111100101011₂

D¹(p) = 0.01111100101011₂

D²(p) = 0.1111100101011₂

D³(p) = 0.111100101011₂

D⁴(p) = 0.11100101011₂

D⁵(p) = 0.1100101011₂, etc.

Thus, D⁰(p) coincides with y₀ on the first 5 binary digits, D⁵(p) coincides with y₁ on the first 5 binary digits, D¹⁰(p) coincides with y₂ on the first 5 binary digits, etc.

This process therefore places in the binary digits of p the binary digits of the data y₀, y₁, …, yₙ, with a precision of r binary digits for each. Applying the function D in runs of r steps then yields successively y₀, y₁, …, yₙ, each with an accuracy of r binary digits. In summary, p contains, hidden in its binary digits, the numbers y₀, y₁, …, yₙ with a precision of r binary digits each, and the function D, applied repeatedly to p, retrieves these hidden numbers.

Now let Eₚ be the function defined by Eₚ(x) = Dʳˣ(p), whose variable is x and which depends on the parameter p. We have:

- Eₚ(0) = y₀ with a precision of r binary digits,
- Eₚ(1) = y₁ with a precision of r binary digits, …,
- Eₚ(n) = yₙ with a precision of r binary digits.

By choosing the parameter p well, the function Eₚ(x) is therefore capable of giving the numbers y₀, y₁, …, yₙ with a precision of r binary digits. If the data are not between 0 and 1, we reduce to that case by dividing everything by a sufficiently large number.

Thus, by properly adjusting its parameter p, the function Eₚ(x) suffices to recover any finite set of data. This is not yet the announced result concerning the function fₚ(x) = sin²(2ᵏˣ arcsin(√p)), but we are almost there.
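The encoding just described can be checked directly by manipulating strings of binary digits, which avoids any floating-point loss. A minimal sketch in Python (variable names are our own), using the box's values y₀ = 0.10111₂, y₁ = 0.11001₂, y₂ = 0.01011₂:

```python
# Hide three 5-bit values in one string p, then recover them with
# E_p(x) = D^(r*x)(p), where D drops the leading binary digit
# (the string analogue of D(x) = 2x mod 1).

r = 5
data_bits = ["10111", "11001", "01011"]      # y0, y1, y2 in base 2
p_bits = "".join(data_bits)                  # p = 0.101111100101011 in base 2

def D(bits):
    """D(x) = 2x mod 1: shift left and drop the digit before the point."""
    return bits[1:]

def E(x, p_bits, r):
    """E_p(x) = D applied r*x times to p, read to r binary digits."""
    s = p_bits
    for _ in range(r * x):
        s = D(s)
    return s[:r]

for x in range(3):
    print(E(x, p_bits, r))   # prints 10111, 11001, 01011 in turn
```

The same idea with real arithmetic would compute 2ʳˣ·p mod 1 and read off the leading digits; the string version just makes the digit bookkeeping explicit.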

This property is the analogue, in **base 2**, of what we know well in **base 10**: multiplying by **10** shifts all decimals one digit to the left. For example, **10 × 3.1415926… = 31.415926…**

By using the function that removes what is before the decimal point, the function (**x mod 1**), sometimes called the *“fractional part of x”*, we then have a way to bring up the binary digits of a given number **x** by erasing those at the head. Indeed, let us put **D(x) = 2x mod 1**, and take for example **x = 0.1100111011110101₂**. We then have:

- **D(x) = 0.100111011110101₂**,
- **D(D(x)) = 0.00111011110101₂**,
- **D(D(D(x))) = 0.0111011110101₂**, etc.

By exploiting this means of hiding information in a real parameter **p**, together with a few other tricks, we arrive at the function which, with the right choice of **p**, gives **y₀** for **x = 0**, **y₁** for **x = 1**, …, **yₙ** for **x = n**, with a precision fixed in advance. This shows that a single parameter **p** and a single function fₚ(x) allow us to approximate all the data we want, whatever their number, as long as it is finite!

**The function of Piantadosi**

Given what has already been said about the function D, we now consider the three functions:

ϕ(x) = sin²(2πx);

ϕ⁻¹(x) = arcsin(√x)/(2π);

L(x) = 4x(1 − x).

By definition, ϕ(ϕ⁻¹(x)) = x for 0 ≤ x ≤ 1 and ϕ⁻¹(ϕ(x)) = x for 0 ≤ x ≤ 1/4. Here are the inevitable calculations that lead to the miracle:

ϕ(D(x)) = ϕ(2x) = sin²(4πx) = 4 sin²(2πx)(1 − sin²(2πx)) = L(ϕ(x))

(we used sin(2a) = 2 sin a cos a, cos²a = 1 − sin²a, and sin(x + 2π) = sin(x)).

Taking x = ϕ⁻¹(p), i.e. p = ϕ(x), we have ϕ(D(ϕ⁻¹(p))) = L(p), from which we deduce: ϕ(Dᵏ(ϕ⁻¹(p))) = Lᵏ(p). In other words, to know the effect of L applied k times to p, just apply ϕ⁻¹ to p, then apply D k times, then apply ϕ to the result.

We can now understand which parameter p to choose so that fₚ(x) gives y₀, y₁, …, yₙ for x = 0, 1, …, n. We first note that fₚ(x) = sin²(2ᵏˣ arcsin(√p)) = ϕ(Dᵏˣ(ϕ⁻¹(p))) = Lᵏˣ(p).

If we want fₚ(x) to give approximately y₀, y₁, …, yₙ for x = 0, 1, …, n, it suffices that Dᵏˣ(ϕ⁻¹(p)) gives approximately ϕ⁻¹(y₀), ϕ⁻¹(y₁), …, ϕ⁻¹(yₙ) for x = 0, 1, …, n. According to what has already been said, it therefore suffices that ϕ⁻¹(p) be made of k binary digits of ϕ⁻¹(y₀), followed by k binary digits of ϕ⁻¹(y₁), …, followed by k binary digits of ϕ⁻¹(yₙ).

Conclusion: the parameter p we are looking for must be the image by ϕ of k binary digits of ϕ⁻¹(y₀), followed by k binary digits of ϕ⁻¹(y₁), …, followed by k binary digits of ϕ⁻¹(yₙ). The larger k is, the more precisely the yᵢ will be given by fₚ(0), fₚ(1), …, fₚ(n).

Take for example y₀ = 0.7, y₁ = 0.4, y₂ = 0.5 and k = 12. We compute ϕ⁻¹(0.7), ϕ⁻¹(0.4), and ϕ⁻¹(0.5) with 12 significant binary digits:

ϕ⁻¹(0.7) = 0.001010000110₂;

ϕ⁻¹(0.4) = 0.000110111110₂;

ϕ⁻¹(0.5) = 0.001000000000₂.

We put the binary digits found one after the other:

q = 0.001010000110000110111110001000000000₂ = 0.157741434872150421142578.

We compute p = ϕ(q) = 0.6999652447… and we get what was expected:

fₚ(0) = 0.6999652447, or about 0.7,

fₚ(1) = 0.3996355201, or about 0.4,

fₚ(2) = 0.4999999999, or about 0.5.
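The numerical example in the box can be reproduced in a few lines. A sketch in Python with our own variable names, using double precision, which suffices for k = 12 and three data points; for longer data series or larger k, arbitrary-precision arithmetic (e.g. mpmath) would be needed:

```python
# Piantadosi's recipe: encode y0=0.7, y1=0.4, y2=0.5 into one
# parameter p with k=12 binary digits each, then read them back with
# f_p(x) = sin^2(2^(k*x) * arcsin(sqrt(p))).
import math

k, ys = 12, [0.7, 0.4, 0.5]

def phi_inv(y):
    """phi^-1(y) = arcsin(sqrt(y)) / (2*pi), landing in [0, 1/4]."""
    return math.asin(math.sqrt(y)) / (2 * math.pi)

# Concatenate the first k binary digits of each phi_inv(y_i) into q.
q_bits = []
for y in ys:
    v = phi_inv(y)
    for _ in range(k):
        v *= 2
        d = int(v)          # next binary digit of phi_inv(y)
        q_bits.append(d)
        v -= d
q = sum(d / 2**(i + 1) for i, d in enumerate(q_bits))

p = math.sin(2 * math.pi * q) ** 2      # p = phi(q)

def f(x, p, k):
    return math.sin(2**(k * x) * math.asin(math.sqrt(p))) ** 2

print([round(f(x, p, k), 4) for x in range(3)])   # [0.7, 0.3996, 0.5]
```

Each evaluation of f shifts kx more binary digits of q to the front, which is why the recovered values degrade as x grows unless q (and hence p) is carried with enough precision.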

Thus, no need for four parameters to draw an elephant: a single parameter suffices, and the same function will always serve, by changing **p**, to draw a fish, a turtle, a bird, or anything we want.

# Number of parameters and complexity of models

We sometimes measure the complexity of a model by counting the number of parameters it uses. This procedure cannot be justified without additional details on the nature of the model. Indeed, the function **fₚ(x)** of Steven Piantadosi shows that we can always reduce to a single parameter. If we want to give meaning to the counting of a model’s parameters, we must detail the constraints imposed on the model, for example, “use only polynomials”, which prohibits accepting the function **fₚ(x)** or one of its variants.

It should be emphasized that the article by Steven Piantadosi does not claim to propose a new method for constructing models that approximate data using a one-parameter function. The article is only a mathematical game that exploits the ability of a real number to hold an infinite amount of information. What makes the article particularly interesting and amazing is the overall simplicity of the method. I do not believe that, before the work of Steven Piantadosi, anyone had published such an elementary method for determining and effectively computing a parameter **p** which accumulates an unlimited amount of information, and for retrieving what was hidden in **p** using a single function constructed from the elementary functions of classical analysis (sin, arcsin, exponential, etc.).

**The disappointed illusion of overfitting.**

Laurent Boué took 95 values of the American S&P 500 stock market index. He calculated the value of **p** which allows the miraculous function of Piantadosi, **fₚ(x) = sin²(2ᵏˣ arcsin(√p))**, to reproduce these values. He then looked at how it performed for the subsequent values of the index (beyond the green dotted vertical bar).

If the approximation method **fₚ(x)** were a good way to discover a law hidden in the variations of this stock market index, the subsequent values given by the calculated **p** would have been close to the real data.

Not surprisingly, this is not at all the case. In the drawing, the red curve is what **fₚ(x)** gives and the blue dots are the actual values. The function **fₚ(x)** allows, by changing **p**, adjustment to any set of data, but it is in no way a method for discovering hidden laws. This is a perfect example of useless overfitting.

Physicists use real numbers, but never take more than 20 or 30 decimals into account. So they can make no use of models like the one suggested by the prodigious function of Steven Piantadosi, which only works if one agrees to manipulate hundreds of significant digits.

If we look closely at the function **fₚ(x)** once the parameter **p** has been set, we see that it oscillates violently and that the values it takes at the whole numbers **0, 1, …, n** are in no way representative of what happens for **x** between **0** and **1**, between **1** and **2**, between **2** and **3**, etc.

If we draw (as far as we can) the entire plot of the function **fₚ(x)** between **0** and **n**, we do not see at all the image produced when we retain only the values of **fₚ(x)** for **x = 0, 1, …, n**. We see a tight zigzag which blackens almost the entire rectangle **[0, n] × [0, 1]**. The plot of the function **fₚ(x)** passes very close to the points with coordinates **(0, y₀)**, **(1, y₁), …, (n, yₙ)**, but it does so without limiting its variations, whereas this is what we expect from a good interpolation function: we want it to be as smooth as possible.

Let us quote the comments of Steven Piantadosi concerning his miraculous function:

The existence of such a simple function with such freedom of behavior illustrates a more fundamental problem. It shows that the complexity of a model cannot be assessed by counting its parameters.

More generally, the uncritical use of a parameter-counting approach ignores the fact that a single real-valued parameter potentially contains an unlimited amount of information, since fully specifying a real number requires an infinite number of bits, which is incidentally problematic in itself. Another notable fact is that the set of real numbers which can be described with a finite number of bits (for example, by having them produced by a Turing machine) is countable. This set of real numbers therefore has measure zero in the set of all real numbers. We also know that there are injective functions (i.e. functions never taking the same value twice) from ℝⁿ (the space of dimension n) to ℝ. This confirms that the number of parameters in a model cannot be a reasonable measure of its complexity.

# Universality

The function **fₚ(x) = sin²(2ᵏˣ arcsin(√p))** can be qualified as universal since it adapts to any series of data **y₀, y₁, …, yₙ**. It is not the first universal object encountered in mathematics.

The simplest is probably the number **π**. We conjecture that it is a universe number, and the calculation of several thousand billion of its decimals supports this: each possible sequence of decimal digits (for example **243, 21145, or 929292**), however long, would thus appear in its decimal expansion. If **π** really is a universe number, we can see it as a function which, when we fix an integer **n**, displays for example the 1 million decimal digits starting from the n-th decimal. With this million digits, we can draw an image of 1000 × 1000 pixels with 10 gray levels. If **π** is a universe number, then for each image of this format, your portrait for example, there exists at least one integer **n** such that image number **n** extracted from **π** produces it. In this sense, **π** would be at least as universal as **fₚ(x)**, since it is capable of producing all images of the indicated format.

However, the amazing function of Piantadosi has a double advantage over π:

**(1)** we are certain that it is universal; it is not a conjecture;

**(2)** we know how to calculate the value of the parameter **p** necessary to obtain data **y₀, y₁, …, yₙ**, and this calculation is feasible, which is not the case for the **n** giving your portrait in **π**.

**Another universal object: Riemann’s Zeta function**

Another completely unexpected universal object was discovered in 1975 by the Russian mathematician Sergei Voronin. It concerns the famous Riemann zeta function.

This function, denoted ζ(s), is defined for any real number s > 1 by the formula: **ζ(s) = 1 + 1/2ˢ + 1/3ˢ + 1/4ˢ + …**
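For s > 1 the series can be summed directly. As a quick numerical sanity check, the partial sums approach Euler's classical value ζ(2) = π²/6 (a sketch; the universality result discussed below concerns the analytically continued function, not these real values):

```python
# Partial sums of the zeta series: zeta(2) = 1 + 1/4 + 1/9 + ... = pi^2/6.
import math

def zeta_partial(s, terms):
    """Sum the first `terms` terms of the series defining zeta(s), s > 1."""
    return sum(1 / n**s for n in range(1, terms + 1))

approx = zeta_partial(2, 100_000)
print(approx, math.pi**2 / 6)   # both about 1.6449
```

The tail of the series beyond N terms is roughly 1/N, so 100,000 terms give about five correct digits.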

We can give it a meaning for all the values of s, including when s is a complex number (except in s = 1). We can see it as a function which, at any point on the plane (complex numbers can be assimilated to points on the plane), associates another point on the plane. This function plays a central role in number theory and, in particular, the position of the values of s which cancels it is linked to the distribution of prime numbers.

When we consider functions of this type, which produce complex numbers from complex numbers, the functions playing the role of differentiable functions are called “holomorphic functions”. The usual functions like **xⁿ**, **sin(x)**, **cos(x)**, **log(x)**, and all those obtained by combining them are of course holomorphic.

In the part of the plane corresponding to the vertical band **B** of points with coordinates **(x, y)** where **1/2 < x < 1**, Sergei Voronin demonstrated an incredible universality property of **ζ(s)**. For any part **P** of this band that is compact and without a hole (that is to say, of finite size, and containing the limits of all its convergent sequences), and for any holomorphic function f which does not vanish on **P**, there exists a real number **r** such that **ζ(s + ir)** takes the same values on **P** as **f**, to within **e** (**e**, as before, is a precision fixed once and for all).

In other words: whatever the holomorphic function **f** defined on an area **P** without a hole in the band **B**, the function **ζ** takes almost the same values as **f** when we examine its values on a version of **P** shifted upwards. It is a bit as if **ζ(s)**, all on its own, could imitate every holomorphic function! In π, we probably find all finite sequences of decimal digits, but in **ζ(s)**, we are certain to find all holomorphic functions! Note that Sergei Voronin’s result has been extended to functions other than **ζ(s)**.

Another fascinating universal object has already been mentioned in this section: **the Rado universal graph**. This infinite graph contains as subgraphs all finite graphs and all countable infinite graphs; moreover, it has the property that when we remove a finite part of it, what remains is identical to what we started with.

The infinite world of mathematics conceals wonders and offers astonishments far beyond, at least in my view as a mathematician, anything one can find in the physical world, which, in comparison, sometimes appears dull and boring!