ADDENDUM : here's an illustration of how the special LZMA modes help on structured data. Say you have a file of structs, and each struct is 72 bytes long. Within each struct are a bunch of uint32s, floats, stuff like that. Within something like a float, you will have one byte that is often strongly correlated with the same byte in the previous struct, and some bytes that are nearly random. So we might have something like :
[00,00,40,00] [7F 00 3F 71] ... 72-8 bytes ... [00,00,40,00] [7E 00 4C 2F]
... history ...                                * start here
we will encode :

00,00,40,00 :
    4 byte match at offset 72
    (offset 72 is probably offset0 so this is a rep0 match)
7E :
    delta literal
    encode 7E ^ 7F = 1
00 :
    one byte match to offset 72 (rep0)
4C :
    delta literal
    encode 4C ^ 3F = 0x73
2F :
    regular literal
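The walk-through above can be reproduced with a small toy parser. This is only a sketch, not LZMA's real parse or its range coder: it assumes a greedy match against the single offset 72, and it assumes that a literal directly after a match is coded as a delta (XOR) against the byte 72 back, while a literal after another literal is coded plainly. The name `toy_tokens` and the 64 zero filler bytes standing in for the "72-8 bytes" are made up for illustration.

```python
STRIDE = 72  # struct size in bytes, from the example above


def toy_tokens(data, start, end, offset=STRIDE):
    """Greedy toy parse of data[start:end] against the bytes `offset` back.

    Hypothetical helper: it only labels each step the way the walk-through
    does; it does not produce an actual LZMA bitstream.
    """
    tokens = []
    after_match = False  # assumed rule: delta literals only follow a match
    i = start
    while i < end:
        # Count how many bytes repeat the bytes one struct earlier.
        n = 0
        while i + n < end and data[i + n] == data[i + n - offset]:
            n += 1
        if n > 0:
            tokens.append(("rep0 match", n))
            i += n
            after_match = True
        elif after_match:
            # XOR against the byte at the rep0 offset.
            tokens.append(("delta literal", data[i] ^ data[i - offset]))
            i += 1
            after_match = False
        else:
            tokens.append(("regular literal", data[i]))
            i += 1
    return tokens


# Two 72-byte records; the trailing 64 bytes are placeholder zeros.
rec0 = bytes([0x00, 0x00, 0x40, 0x00, 0x7F, 0x00, 0x3F, 0x71]) + bytes(64)
rec1 = bytes([0x00, 0x00, 0x40, 0x00, 0x7E, 0x00, 0x4C, 0x2F]) + bytes(64)
data = rec0 + rec1

# Parse only the first 8 bytes of the second record, as in the example.
tokens = toy_tokens(data, STRIDE, STRIDE + 8)
for kind, value in tokens:
    print(kind, hex(value) if "literal" in kind else value)
```

The five tokens come out in the same order as the steps listed above: a 4 byte rep0 match, a delta literal of 1, a one byte rep0 match, a delta literal of 0x73, then a regular literal 2F.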
Also because of the position and state-based coding, if certain literals occur often in the same spot in the pattern, that will be captured very well.
Note that this is not really the "holy grail" of compression, which is a compressor that figures out the state-structure of the data and uses that, but it is much closer than anything in the past. (e.g. it doesn't actually figure out that the first dword of the structure is a float, and you could easily confuse it: if your struct was 73 bytes long, for example, the positions would no longer work in simple bottom-bits cycles.)
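To see why 73 bytes breaks things, here is a quick sketch of the position context a given struct field lands in. It assumes the position context is simply the low `pos_bits` bits of the byte position (my reading of "bottom-bits" above); `pos_contexts`, the field offset of 4, and `pos_bits=2` are illustrative choices, not values taken from any particular encoder setting.

```python
def pos_contexts(stride, field_offset, records, pos_bits=2):
    """Low-bits position context of one struct field across several records."""
    mask = (1 << pos_bits) - 1
    return [(r * stride + field_offset) & mask for r in range(records)]


# Same field (byte 4 of the struct) over 8 consecutive records.
ctx72 = pos_contexts(72, 4, 8)
ctx73 = pos_contexts(73, 4, 8)

print(ctx72)  # [0, 0, 0, 0, 0, 0, 0, 0]
print(ctx73)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

With a 72-byte struct the field always hits the same context, so its statistics pile up in one place. With 73 bytes the same field is smeared across all four contexts and each one sees a mix of unrelated fields.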