Actually, most high-end CPUs today are a mixture of Harvard and von Neumann architectures.
On the outside, they are von Neumann, with a single bus shared by data and program memory. Superficially, the memory is byte addressable; however, the actual bus is typically at least a word wide, for higher throughput. Some architectures require, and others merely prefer, that program and data be word aligned.
Misalignment carries the penalty of either decreased performance or a bus error, depending on the architecture.
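
To make the alignment point concrete, here is a minimal C sketch. The helper names is_aligned and load_u32 are made up for illustration, and a 4-byte word is assumed. It checks whether an address is word aligned and reads a 32-bit value from an arbitrary address through memcpy, which lets the compiler choose an access sequence that is safe on the target instead of risking a bus error.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if p is aligned to `align` bytes.
   `align` is assumed to be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: reads a 32-bit value from any address, aligned or not.
   Dereferencing a misaligned uint32_t pointer directly may be slow or may
   fault, depending on the architecture; memcpy avoids that. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On CPUs that tolerate misaligned access, compilers usually reduce the memcpy to a single load, so the portable form should cost little or nothing there.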
Internally, the CPU is a Harvard architecture.
The core of the CPU is buffered from the outside bus by high-speed cache memory and a cache controller. For higher performance, the CPU internally has a separate program bus and data bus, each with its own cache. The cache controller for the data cache does bus snooping to update the copy in cache if other devices on the bus change main memory. However, the instruction cache is typically not kept up to date in this way, as this both reduces circuitry in the CPU and increases performance.
As a consequence, once in cache, the program memory is essentially read-only, which is typical of a Harvard architecture. This makes self-modifying code tricky: either the program must be modified before the code is ever run, or the instruction cache must be flushed (either by an explicit invalidation or by waiting for the cache line to be replaced with another) before the modifications become effective.
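
The explicit invalidation can be sketched in C, assuming a GCC or Clang toolchain, since both provide the __builtin___clear_cache builtin. The helper name patch_code is made up for illustration. The sketch also assumes that in real use the destination is memory the program may both write to and execute from. Obtaining such memory is platform specific and not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes new instruction bytes into a code buffer and
   then makes them visible to instruction fetch. */
static void patch_code(void *dst, const void *src, size_t len)
{
    /* The store goes through the data cache; the instruction cache may
       still hold the old bytes for this address range. */
    memcpy(dst, src, len);

    /* GCC/Clang builtin: synchronizes the instruction cache with the data
       cache for [dst, dst + len). On targets that keep the two coherent in
       hardware it may compile to nothing. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

Only after patch_code returns is it safe to jump into the modified range. Without the flush, the CPU may keep executing the stale instructions until the cache line happens to be replaced.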