Problem 40

An irrational decimal fraction is created by concatenating the positive integers:


0.123456789101112131415161718192021…

It can be seen that the 12th digit of the fractional part is 1.

If dn represents the n-th digit of the fractional part, find the value of the following expression.

d1 × d10 × d100 ×  d1000 × d10000 ×  d100000 × d1000000

 

Step 0: Disclaimer

TL;DR: A «summary» of the algorithm is in section 7.

I’m not a mathematician nor a computer scientist. The methods and discoveries below may be trivial and/or convoluted to someone who knows what they’re doing, has studied this formally or learned it through actually working in the field.

I do not claim to have a unique insight or method into this problem. I write down my thought process so I can learn from it in the future. I publish it because it may lead to either helping someone else, however implausible that may be. Corrections and improvements are most welcome.

Node your homework.

Step 1: Defining Champernowne’s constant

Champernowne’s constant can be (loosely) defined as the number obtained by concatenating all the natural numbers as a decimal expansion. Like so:


CC = 0.1234567891011121314...
(1)

Step 2: Analyzing the search space

At first glance it may seem easy to just generate Champernowne’s constant concatenating the first million integers, but that is frankly overkill. Ideally, we should «generate» a maximum of 1 million decimal places. But, how much is that?


Simple counting will reveal that within this decimal expansion there are:

  • 9 strings of length 1 (1–9),
  • 90 strings of length 2 (10–99),
  • 900 strings of length 3 (100–999),
  • etc.

Therefore, if we concatenate all the integers up to a number of the form 10k − 1, we will have the following number of characters:

  • 1 × 9 × 100 for 1–9, plus
  • 2 × 9 × 101 for 10–99, plus
  • 3 × 9 × 102 for 100–999, plus
  • (...)
  • k × 9 × 10k − 1

In other words, if chr(k) defines how many characters are there when concatenating the first 10k − 1 numbers, its value can be calculated as follows:


chr(k) = 9 × ((1×100)+(2×101)+(3×102)+…+(k×10k − 1))
(2)

Solving for a few values of k it can be seen that it is not necessary to concatenate all the way to n in order to get a string of length n. For example, concatenating only the numbers from 1–99 the decimal part will have 189 digits. Concatenating up to 999,999 will give us a string with 5,888,889 digits; almost 6 times larger than needed.

Step 3: Reducing the search space

This analysis also gives us an important hint: we can see that Champernowne’s constant (CC from now on) is «built» up of blocks of size 9, 180, 2700 and so on (for the 1–, 2- and 3-digit numbers respectively)

So if we can somehow figure out in which block does the n-th character appear, there’s no need to construct the constant from the very beginning. We could start only from the beginning of the appropriate block and count from there.

Fortunately, this is relatively simple to do. The following table shows the appropriate information:

Table 1: Values of k and associated parameters
Block number (k) Number range Block size (indices), Bk First index, Fk Last index (inclusive), Lk
1 1–9 9 1 9
2 10–99 180 10 189
3 100–999 2700 190 2889
k > 1 10k − 1 to 10k − 1 k × 9 × 10k − 1 Lk − 1 + 1 Bk + Lk − 1

Step 3.1: An example

Let’s say we want to find what digit is the 234th decimal in CC. We can tell from the table that it must be in the 3rd block, because the blocks 1 and 2 only hold 189 numbers.

Because it’s in the 3rd block, we can also know that the digit is part of a 3-digit number. Which one? We don’t know yet, but that will change soon.

Step 4: Counting

Now what? We could construct a partial CC and count from there, right? In our example, that would mean constructing the string:


CC3 = 100101102103104105....
(3)

…and counting from there. This is a good idea, but it can be refined.

First of all, to what number should we count? It’s not 234, because we’re not starting from the very beginning of CC, but from the beginning of CC’s 3rd block. Since the 3rd block already holds 189 numbers, our starting point is index 190.

(We’ll later make a slight change here, so remember to come back here when I tell you ok?)

In other words, 190 is has the sub-index 1 in CC3. 191 is has sub-index 2 in CC3… So in order to know this sub-index, all we have to do is to take our initial index and substract however many indices were up to the last block (in this case, 189). So, our sub-index in CC3 should be:


234 − 189 = 45
(4)

The 45th digit in CC3! So, we can construct and see:


CC3 = 100101102103104105106107108109110111112113114…
(5)

The 45th digit is 4. Great! So, case closed, right?

5. Counting efficiently

In this example we were lucky, our target number was close to the beginning of CC3, but what if it wasn’t?

Luckily, we still have information. Since we’re in the 3rd block, we know this block is made up of 3-digit numbers (100–999). So we can further reduce the search by dividing our sub-index by 3.

Why? Think about it. The sub-indices 1, 2 and 3 are all part of the number 100 used right at the beginning. The number 101 holds sub-indices 4, 5 and 6, and so on. If we divide the sub-index by 3, we can instantly figure out which number it’s part of… sorta.

What we need is what is usually called integer division or floor division. It may sound complicated, but it’s actually easier than regular division. In integer division we care only for, well, the integer and we discard the decimals or remainder.

Let’s say that a//b is the integer division of a by b. As mentioned, in this division we only care for the integer part. So, for instance 10//3 = 3 because we don’t care for the decimals of regular division, where 10/3 = 3.33333….

So what happens if we divide the sub-index by 3? Let’s see what happens with the first few sub-indices:


1//3 = 0
(6)
2//3 = 0
(7)
3//3 = 1
(8)
4//3 = 1
(9)
5//3 = 1
(10)
6//3 = 2
(11)

Well Andy–you say–This is no good. You told me that indices 1 and 2 should go with the first number, but this fancy division of yours tells me it’s zero. And then, it says that indices 3, 4 and 5 go with the first number in the block. What gives?

Well… this is another example of why sometimes Computer Science sounds like Greek to someone not used to it. We’ve been counting ‘wrong’ this whole time.

5.1: Starting from scratch

I won’t beat around the bush: instead of counting from one we should be counting from zero.

Remember up there when I said I would be making adjustments? This is what I meant and why it sounds weird. The “first” index of CC3 should instead be the “zeroth” index, and the first number in CC3 should instead be the “zeroth”.

What’s good about this weird counting system?–I hear you say–It’s confusing and hardly useful at all! It’s weird all right, but that’s because it’s new. As for its usefulness, let me show you.

When we count starting from 0 and not from 1 the trick about integer division gives us not one but two pieces of information, all we need to finish this puzzle.

Now the zeroth number used to construct CC3 is still 100; the first number is 101 and the second is 102. So far, still the same:


CC3 = 100 (concat) 101 (concat) 102 (concat) 103...

And now the zeroth index in CC3 is 1; the first index is 0 and the second index is 0. So far still the same. But what happens when we repeat the integer division (equations 6–11) above, but using the new system?


0//3 = 0
(12)
1//3 = 0
(13)
2//3 = 0
(14)
3//3 = 1
(15)
4//3 = 1
(16)
5//3 = 1
(17)

Now we get the result we wanted! The result immediately tells us that indices 0, 1 and 2 are part of the 0-th number in CC3; indices 3, 4 and 5 are part of the 1-st number and so on.

This trick may seem trivial for small results, but it becomes more useful the larger your numbers get. Remember when we tried to find the 234-th number in CC? The equation 4 should really be:


234 − 190 = 44
(18)

Which still gives us the same result as before (i. e. the decimal formerly known as the 45th in CC3, which is now the 44th). How is this helping us? Let’s try the integer division on this result:


44//3 = 14
(19)

So the number we’re looking for is in the 14th (again, starting from 0) number used to construct CC3, which is…

5.2: Let’s start at the very beginning… of the block

What’s the 14th number used to construct CC3? Well

Table 2: Numbers used in the construction of CC3
Order of construction in CC3 Number
0 100
1 101
2 102
3 103
14 114

So the digit we’re looking for is one of the ones in ‘114’, but which one? We know it’s 4, but what if we didn’t know? What if we couldn’t easily check?

Here’s where the integer division comes in. Or rather, its sister operation: modulo or remainder. Remember that in integer division we didn’t care about the remainder? In the modulo operation it’s the exact reverse, we care about the remainder alone and we don’t care for the quotient.

We’ll use the ‘%’ sign to express modulo or remainder. If we apply it to our desired index, we can get its exact position. how? Let’s get back to the 44-th sub-index of CC3. The modulo operation says that


44%3 = 2
(20)

So, our desired digit is the second in ‘114’, which perfectly coincides with what we saw before (remember that the zeroth and the first digits of ‘114’ are both 1).

In summary, once we figure out the sub-index of our block, all we need to do is to divide it by the length of the regular strings and we get the result immediately. Using regular division we get:


44/3 = 14 With remainder 2
(21)

6. Why all this fuzz, then?

Many problems in Project Euler are not meant to be brute forced. Sure, you could generate a million characters of CC and then search along that string for the desired d1, d10, d100, …, d1000000. But this naive procedure does not (necessarily) scale well for even larger values. Moreover, it also has the disadvantage of being naive: it doesn’t teach its users anything fundamentally new.

So let’s summarize this broad algorithm and see it in action for a really large value. What is the 987 654 321 123 456 789-th digit in CC?

7. A less verbose summary of the algorithm

  1. Remember to count from zero
    We need to find the i = 987 654 321 123 456 788-th index, starting from zero
  2. Figure out which block (K) to use
    There are many ways of doing this. The most simple one is to try with several values of K and iterate like in the table 1 above. As soon as we get a value of Lk > i, we can stop. Even for a number this big, k doesn’t grow up too quick. In our example, it can be calculated that k = 17 is enough for our purposes (L17 = 1 688 888 888 888 888 889 > i)
  3. Figure out where to start (the sub-index)
    The sub-index is the index in the particular block CCk found. The 1st index in each block is in the table 1 above, as Fk, but astute readers should remember that that table was built when we were still counting from 1 so the zeroth index per block should really be Fk − 1. So, in k = 3 the zeroth index should be 189. The subindex ik then is calculated as ik = i − Fk. For our gigantic index, we can see that i17 = i − F17 = 828 765 432 234 567 899
  4. Figure out the base number in CCk’s construction
    We know that block CCk is constructed wih numbers k-digits long. So, by calculating ik//k we can figure out the base number. This base number is the n − th number we would write down if we wrote down CCk. Also, we know that the zeroth base number in CCk is always 10k − 1 because that’s what we used to define the blocks in the first place (see the ‘Number range’ column in table 1). While difficult to write, this simply means that the base number is just bik = (ik//k) + 10k − 1 which in our case is bi17 = i17//7 + 1016 = 58 750 907 778 503 994
  5. Figure out which digit of the base to use
    Almost there! Instead of the gigantic CC, we only need to search in the relatively tiny 17-digit base number. But, which number to use? Simple, the one resulting from the modulo division bik%k. In this example, it can be calculated that 58 750 907 778 503 994%17 = 1, which means it’s the first digit of bi17, which is…?

8. It’s obvious I don’t understand this problem

Some people say that you don’t really understand a problem until you’re able to explain it to a newbie with clarity and precision. From my diatribe above it’s clear to me that I’m not yet fluent enough with this kind of knowledge, but I’m proud to have been able to solve it without external help by good old fashioned reasoning, pen and paper and four cups of coffee.

The actual program is quite fast, I didn’t expect it to be so good. There might be many improvements upon it, but those will have to be learned the hard way, too.

I’m not showing the actual meat of the code, because that goes against the spirit of the rules of Project Euler. Instead, I want to show you the last few lines of code and what my console spat out.

start_time = time.time()
print(getChampernowneIndex(100000000000000000))
print(getChampernowneIndex(987654321123456788))
print("--- %s seconds ---" % (time.time() - start_time))

In  1 : runfile('???/Euler/40.py', wdir='???/Euler')
          CC index is  100000000000000000
          K is  16
          Sub-index is  85111111111111111
          Base number is  6319444444444444
          Remainder is  7
          Answer is 4
          ----------
          CC index is  987654321123456788
          K is  17
          Sub-index is  828765432234567899
          Base number is  58750907778503994
          Remainder is  1
          Answer is 8
          --- 0.001008749008178711 seconds ---

(Yes, the first digit of bi17 is 8; the zeroth is 5)

← Project Euler Guide: Problem 39 | Project Euler Guide: Problem 40 | Project Euler Guide: Problem 41 →