Problem 40

An irrational decimal fraction is created by concatenating the positive integers:
0.123456789101112131415161718192021…
It can be seen that the 12th digit of the fractional part is 1. If d_{n} represents the nth digit of the fractional part, find the value of the following expression. d_{1} × d_{10} × d_{100} × d_{1000} × d_{10000} × d_{100000} × d_{1000000}

Step 0: Disclaimer
TL;DR: A «summary» of the algorithm is in section 7.
I’m not a mathematician nor a computer scientist. The methods and discoveries below may be trivial and/or convoluted to someone who knows what they’re doing, has studied this formally or learned it through actually working in the field.
I do not claim to have a unique insight or method into this problem. I write down my thought process so I can learn from it in the future. I publish it because it may lead to either helping someone else, however implausible that may be. Corrections and improvements are most welcome.
Node your homework.
Step 1: Defining Champernowne’s constant
Champernowne’s constant can be (loosely) defined as the number obtained by concatenating all the natural numbers as a decimal expansion. Like so:
CC = 0.1234567891011121314...
(1)
At first glance it may seem easy to just generate Champernowne’s constant concatenating the first million integers, but that is frankly overkill. Ideally, we should «generate» a maximum of 1 million decimal places. But, how much is that?
Simple counting will reveal that within this decimal expansion there are:
 9 strings of length 1 (1–9),
 90 strings of length 2 (10–99),
 900 strings of length 3 (100–999),
 etc.
Therefore, if we concatenate all the integers up to a number of the form 10^{k} − 1, we will have the following number of characters:
 1 × 9 × 10^{0} for 1–9, plus
 2 × 9 × 10^{1} for 10–99, plus
 3 × 9 × 10^{2} for 100–999, plus
 (...)
 k × 9 × 10^{k − 1}
In other words, if chr(k) defines how many characters are there when concatenating the first 10^{k} − 1 numbers, its value can be calculated as follows:
chr(k) = 9 × ((1×10^{0})+(2×10^{1})+(3×10^{2})+…+(k×10^{k − 1}))
(2)
Solving for a few values of k it can be seen that it is not necessary to concatenate all the way to n in order to get a string of length n. For example, concatenating only the numbers from 1–99 the decimal part will have 189 digits. Concatenating up to 999,999 will give us a string with 5,888,889 digits; almost 6 times larger than needed.
Step 3: Reducing the search space
This analysis also gives us an important hint: we can see that Champernowne’s constant (CC from now on) is «built» up of blocks of size 9, 180, 2700 and so on (for the 1–, 2 and 3digit numbers respectively)
So if we can somehow figure out in which block does the nth character appear, there’s no need to construct the constant from the very beginning. We could start only from the beginning of the appropriate block and count from there.
Fortunately, this is relatively simple to do. The following table shows the appropriate information:
Table 1: Values of k and associated parameters
Block number (k) 
Number range 
Block size (indices), B_{k} 
First index, F_{k} 
Last index (inclusive), L_{k} 
1 
1–9 
9 
1 
9 
2 
10–99 
180 
10 
189 
3 
100–999 
2700 
190 
2889 
k > 1 
10^{k − 1} to 10^{k} − 1 
k × 9 × 10^{k − 1} 
L_{k − 1} + 1 
B_{k} + L_{k − 1} 
Step 3.1: An example
Let’s say we want to find what digit is the 234th decimal in CC. We can tell from the table that it must be in the 3rd block, because the blocks 1 and 2 only hold 189 numbers.
Because it’s in the 3rd block, we can also know that the digit is part of a 3digit number. Which one? We don’t know yet, but that will change soon.
Step 4: Counting
Now what? We could construct a partial CC and count from there, right? In our example, that would mean constructing the string:
CC_{3} = 100101102103104105....
(3)
…and counting from there. This is a good idea, but it can be refined.
First of all, to what number should we count? It’s not 234, because we’re not starting from the very beginning of CC, but from the beginning of CC’s 3rd block. Since the 3rd block already holds 189 numbers, our starting point is index 190.
(We’ll later make a slight change here, so remember to come back here when I tell you ok?)
In other words, 190 is has the subindex 1 in CC_{3}. 191 is has subindex 2 in CC_{3}… So in order to know this subindex, all we have to do is to take our initial index and substract however many indices were up to the last block (in this case, 189). So, our subindex in CC_{3} should be:
234 − 189 = 45
(4)
The 45th digit in CC_{3}! So, we can construct and see:
CC_{3} = 100101102103104105106107108109110111112113114…
(5)
The 45th digit is 4. Great! So, case closed, right?
5. Counting efficiently
In this example we were lucky, our target number was close to the beginning of CC_{3}, but what if it wasn’t?
Luckily, we still have information. Since we’re in the 3rd block, we know this block is made up of 3digit numbers (100–999). So we can further reduce the search by dividing our subindex by 3.
Why? Think about it. The subindices 1, 2 and 3 are all part of the number 100 used right at the beginning. The number 101 holds subindices 4, 5 and 6, and so on. If we divide the subindex by 3, we can instantly figure out which number it’s part of… sorta.
What we need is what is usually called integer division or floor division. It may sound complicated, but it’s actually easier than regular division. In integer division we care only for, well, the integer and we discard the decimals or remainder.
Let’s say that a//b is the integer division of a by b. As mentioned, in this division we only care for the integer part. So, for instance 10//3 = 3 because we don’t care for the decimals of regular division, where 10/3 = 3.33333….
So what happens if we divide the subindex by 3? Let’s see what happens with the first few subindices:
1//3 = 0
(6)
2//3 = 0
(7)
3//3 = 1
(8)
4//3 = 1
(9)
5//3 = 1
(10)
6//3 = 2
(11)
Well Andy–you say–This is no good. You told me that indices 1 and 2 should go with the first number, but this fancy division of yours tells me it’s zero. And then, it says that indices 3, 4 and 5 go with the first number in the block. What gives?
Well… this is another example of why sometimes Computer Science sounds like Greek to someone not used to it. We’ve been counting ‘wrong’ this whole time.
5.1: Starting from scratch
I won’t beat around the bush: instead of counting from one we should be counting from zero.
Remember up there when I said I would be making adjustments? This is what I meant and why it sounds weird. The “first” index of CC_{3} should instead be the “zeroth” index, and the first number in CC_{3} should instead be the “zeroth”.
What’s good about this weird counting system?–I hear you say–It’s confusing and hardly useful at all! It’s weird all right, but that’s because it’s new. As for its usefulness, let me show you.
When we count starting from 0 and not from 1 the trick about integer division gives us not one but two pieces of information, all we need to finish this puzzle.
Now the zeroth number used to construct CC_{3} is still 100; the first number is 101 and the second is 102. So far, still the same:
CC_{3} = 100 (concat) 101 (concat) 102 (concat) 103...
And now the zeroth index in CC_{3} is 1; the first index is 0 and the second index is 0. So far still the same. But what happens when we repeat the integer division (equations 6–11) above, but using the new system?
0//3 = 0
(12)
1//3 = 0
(13)
2//3 = 0
(14)
3//3 = 1
(15)
4//3 = 1
(16)
5//3 = 1
(17)
Now we get the result we wanted! The result immediately tells us that indices 0, 1 and 2 are part of the 0th number in CC_{3}; indices 3, 4 and 5 are part of the 1st number and so on.
This trick may seem trivial for small results, but it becomes more useful the larger your numbers get. Remember when we tried to find the 234th number in CC? The equation 4 should really be:
234 − 190 = 44
(18)
Which still gives us the same result as before (i. e. the decimal formerly known as the 45th in CC_{3}, which is now the 44th). How is this helping us? Let’s try the integer division on this result:
44//3 = 14
(19)
So the number we’re looking for is in the 14th (again, starting from 0) number used to construct CC_{3}, which is…
What’s the 14th number used to construct CC_{3}? Well
Table 2: Numbers used in the construction of CC_{3}
Order of construction in CC_{3} 
Number 
0 
100 
1 
101 
2 
102 
3 
103 
… 
… 
14 
114 
So the digit we’re looking for is one of the ones in ‘114’, but which one? We know it’s 4, but what if we didn’t know? What if we couldn’t easily check?
Here’s where the integer division comes in. Or rather, its sister operation: modulo or remainder. Remember that in integer division we didn’t care about the remainder? In the modulo operation it’s the exact reverse, we care about the remainder alone and we don’t care for the quotient.
We’ll use the ‘%’ sign to express modulo or remainder. If we apply it to our desired index, we can get its exact position. how? Let’s get back to the 44th subindex of CC_{3}. The modulo operation says that
44%3 = 2
(20)
So, our desired digit is the second in ‘114’, which perfectly coincides with what we saw before (remember that the zeroth and the first digits of ‘114’ are both 1).
In summary, once we figure out the subindex of our block, all we need to do is to divide it by the length of the regular strings and we get the result immediately. Using regular division we get:
44/3 = 14 With remainder 2
(21)
6. Why all this fuzz, then?
Many problems in Project Euler are not meant to be brute forced. Sure, you could generate a million characters of CC and then search along that string for the desired d_{1}, d_{10}, d_{100}, …, d_{1000000}. But this naive procedure does not (necessarily) scale well for even larger values. Moreover, it also has the disadvantage of being naive: it doesn’t teach its users anything fundamentally new.
So let’s summarize this broad algorithm and see it in action for a really large value. What is the 987 654 321 123 456 789th digit in CC?
7. A less verbose summary of the algorithm
 Remember to count from zero
 We need to find the i = 987 654 321 123 456 788th index, starting from zero
 Figure out which block (K) to use
 There are many ways of doing this. The most simple one is to try with several values of K and iterate like in the table 1 above. As soon as we get a value of L_{k} > i, we can stop. Even for a number this big, k doesn’t grow up too quick. In our example, it can be calculated that k = 17 is enough for our purposes (L_{17} = 1 688 888 888 888 888 889 > i)
 Figure out where to start (the subindex)
 The subindex is the index in the particular block CC_{k} found. The 1st index in each block is in the table 1 above, as F_{k}, but astute readers should remember that that table was built when we were still counting from 1 so the zeroth index per block should really be F_{k} − 1. So, in k = 3 the zeroth index should be 189. The subindex i_{k} then is calculated as i_{k} = i − F_{k}. For our gigantic index, we can see that i_{17} = i − F_{17} = 828 765 432 234 567 899
 Figure out the base number in CC_{k}’s construction
 We know that block CC_{k} is constructed wih numbers kdigits long. So, by calculating i_{k}//k we can figure out the base number. This base number is the n − th number we would write down if we wrote down CC_{k}. Also, we know that the zeroth base number in CC_{k} is always 10^{k − 1} because that’s what we used to define the blocks in the first place (see the ‘Number range’ column in table 1). While difficult to write, this simply means that the base number is just b_{ik} = (i_{k}//k) + 10^{k − 1} which in our case is b_{i17} = i_{17}//7 + 10^{16} = 58 750 907 778 503 994
 Figure out which digit of the base to use
 Almost there! Instead of the gigantic CC, we only need to search in the relatively tiny 17digit base number. But, which number to use? Simple, the one resulting from the modulo division b_{ik}%k. In this example, it can be calculated that 58 750 907 778 503 994%17 = 1, which means it’s the first digit of b_{i17}, which is…?
8. It’s obvious I don’t understand this problem
Some people say that you don’t really understand a problem until you’re able to explain it to a newbie with clarity and precision. From my diatribe above it’s clear to me that I’m not yet fluent enough with this kind of knowledge, but I’m proud to have been able to solve it without external help by good old fashioned reasoning, pen and paper and four cups of coffee.
The actual program is quite fast, I didn’t expect it to be so good. There might be many improvements upon it, but those will have to be learned the hard way, too.
I’m not showing the actual meat of the code, because that goes against the spirit of the rules of Project Euler. Instead, I want to show you the last few lines of code and what my console spat out.
start_time = time.time()
print(getChampernowneIndex(100000000000000000))
print(getChampernowneIndex(987654321123456788))
print(" %s seconds " % (time.time()  start_time))
In 1 : runfile('???/Euler/40.py', wdir='???/Euler')
CC index is 100000000000000000
K is 16
Subindex is 85111111111111111
Base number is 6319444444444444
Remainder is 7
Answer is 4

CC index is 987654321123456788
K is 17
Subindex is 828765432234567899
Base number is 58750907778503994
Remainder is 1
Answer is 8
 0.001008749008178711 seconds 
(Yes, the first digit of b_{i17} is 8; the zeroth is 5)
← Project Euler Guide: Problem 39  Project Euler Guide: Problem 40  Project Euler Guide: Problem 41 →