If you are wondering why the "VAX died" and what hard-to-implement instructions have to do with it, there is a long read:
https://yarchive.net/comp/vax.html
Here is a quote that explains why Out-Of-Order (OOO) execution could not be implemented on the VAX:
ILP, NORMAL INSTRUCTIONS, and IA-32 VERSUS VAX
Consider the normal unprivileged instructions that need to be executed
quickly, meaning with high degrees of ILP, and with minimal stalls from
memory system.
RISC instructions make 0-1 memory reference per operation. Despite the
messy encodings, *most* IA-32 instructions (dynamic count) can be
directly decoded into a fixed, small number of RISC-like micro-ops,
with register references renamed onto the (larger) set of physical
registers. Both IA-32 and VAX allow unaligned operations, so I'll
ignore that extra source of complexity in the load/store unit.
In an OOO design, the front-end provides memory references to a
complex, highly-asynchronous load/store/cache control unit, and then
goes on. In one case, [string instructions with REP prefix], IA-32
needs the equivalent of a microcode loop to issue a stream of micro-ops
whose number is dependent on an input register, or dynamically, on
repeated tests of operands. Such operations tend to lessen the
parallelism available, because the effect is of a microcode loop that
needs to tie together front-end, rename registers, and load/store unit
into something like a lock-step. Although this doesn't require that
all earlier instructions be retired before the first string micro-ops
are issued, it is likely a partial serializer, because it's difficult
to do much useful work beyond an instruction that can generate arbitrary
numbers of memory references (especially stores!) during its execution.
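
To make the point above a bit more concrete, here is a toy model in Python. It is not a real decoder: the `expand` helper and the micro-op strings are made up for illustration. It only shows that a simple load-op instruction expands into a fixed number of micro-ops, while a REP-prefixed string move expands into a number that depends on a register value at run time.

```python
# Toy model: how many micro-ops must the front-end issue?
# All names here (expand, the micro-op strings) are hypothetical.

def expand(instr, regs):
    """Return the list of micro-ops for one instruction."""
    if instr == "ADD EAX, [EBX]":
        # Fixed expansion, known from the instruction bits alone.
        return ["load tmp <- [EBX]", "add EAX <- EAX + tmp"]
    if instr == "REP MOVSB":
        # One load and one store per byte. The count comes from ECX,
        # so it is unknown until the register value is available.
        uops = []
        for i in range(regs["ECX"]):
            uops.append(f"load tmp <- [ESI+{i}]")
            uops.append(f"store [EDI+{i}] <- tmp")
        return uops
    raise ValueError(f"unknown instruction: {instr}")

fixed = expand("ADD EAX, [EBX]", {"ECX": 1000})
short = expand("REP MOVSB", {"ECX": 3})
long_ = expand("REP MOVSB", {"ECX": 1000})
print(len(fixed), len(short), len(long_))  # 2 6 2000
```

The same instruction bytes give 6 micro-ops in one case and 2000 in the other, which is roughly why the quote calls such an instruction a partial serializer.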
However, the VAX has more cases, and some frequent ones, where the
instruction bits alone (or even with register values) are insufficient
to know even the number of memory references that will be made, and
this is disruptive of normal OOO flow, and is likely to force difficult
[read: complex, high-gate-count or long-wire] connections among
functional blocks on a chip. Hence, while the VAX decoding complexity
can be partially ameliorated by a speculative OOO design with decoded
cache [I alluded to this in the RISC CISC 1991 posting], it doesn't
fix the other problems, which either create microcode lock-steps
between decode, load/store, and other execution units, or require other
difficult solutions. In some VAX instructions, it can take a dependent
chain of 2 memory references to find a length!
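
The "dependent chain of 2 memory references to find a length" can be sketched with a small model of operand fetch. This is only an illustration, assuming the length operand of a string instruction is given in displacement deferred mode (`@D(Rn)`); the `Memory` class, the `fetch_operand` helper and the addresses are invented for the example, and this is not a VAX emulator.

```python
# Toy model of operand fetch. Memory contents and helper names are
# illustrative assumptions, not taken from the quoted article.

class Memory:
    def __init__(self, cells):
        self.cells = dict(cells)
        self.reads = 0  # number of memory references made so far

    def read(self, addr):
        self.reads += 1
        return self.cells[addr]

def fetch_operand(mode, reg_value, disp, mem):
    """Return the operand value for one operand specifier."""
    if mode == "register":               # Rn: no memory reference
        return reg_value
    if mode == "displacement":           # D(Rn): one reference
        return mem.read(reg_value + disp)
    if mode == "displacement_deferred":  # @D(Rn): two dependent references
        pointer = mem.read(reg_value + disp)  # 1st: fetch the address
        return mem.read(pointer)              # 2nd: fetch the value
    raise ValueError(f"unknown mode: {mode}")

mem = Memory({0x1008: 0x2000, 0x2000: 37})
length = fetch_operand("displacement_deferred", 0x1000, 8, mem)
print(length, mem.reads)  # 37 2
```

The second read cannot start until the first one returns, and only after both does the machine know the length (37 here), and therefore how many more memory references the instruction itself will make.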
VAX EXAMPLES [1], [2], especially compared to IA-32 [10] and sometimes
S/360.
Specific areas are:
- Decimal string ops
- Character string ops
- Indirect addressing interactions with above
- VAX Condition Codes (maybe)
- Function calls, especially CALL*/RET, PUSHR/POPR.
DECIMAL STRING OPERATIONS: MOVP, CMPP, ADDP, SUBP, MULP, DIVP, CVT*,
ASHP, and especially EDITPC: are really, really difficult without
looping microcode. [S/360 has same problem, which is why (efficient)
non-microcoded implementations generally omitted them. The VAX
versions, especially the 3-address forms, are even more complex than
the 2-address ones on S/360, and there are weird cases. DIVP may
allocate 16-bytes on the stack, and then restore the SP later.
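
A rough sketch of why these instructions want a loop. The layout below is my understanding of the VAX packed decimal format and should be treated as an assumption: two digits per byte, sign in the low nibble of the last byte, so a string of N digits takes N // 2 + 1 bytes. The function names are made up. The number of byte reads depends on the length operand, not on the opcode.

```python
# Sketch of decoding a packed decimal string. Layout is an assumption:
# two digits per byte, sign nibble last (0xB or 0xD taken as minus).

def packed_bytes(digits):
    """Bytes occupied by a packed decimal string of `digits` digits."""
    return digits // 2 + 1

def decode_packed(data, digits):
    """Return (value, byte_reads) for a packed decimal string."""
    nibbles = []
    reads = 0
    for byte in data[:packed_bytes(digits)]:  # one byte read per pass
        reads += 1
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0xF)
    sign = nibbles.pop()  # low nibble of the last byte
    value = 0
    for d in nibbles:
        value = value * 10 + d
    if sign in (0xB, 0xD):
        value = -value
    return value, reads

print(decode_packed(bytes([0x12, 0x3C]), 3))  # (123, 2)
print(decode_packed(bytes([0x04, 0x5D]), 2))  # (-45, 2)
print(packed_bytes(31))                       # 16
```

In hardware each pass of that loop is a memory reference, and a three-address instruction such as ADDP has to walk several such strings, each with its own length.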



