What Is Unique About RISC-V (How RISC-V differs from what I know)

As part of my PhD program, I am once again the TA for an undergraduate course this term. Fortunately, the course was started by my supervisor and its students are carefully selected from among the strongest, so I have a lot of freedom to try new architecture ideas, and it has improved my teaching skills considerably. This is the second year the course has run, and we decided to switch the teaching instruction set architecture (ISA) from MIPS to RISC-V. I have therefore been learning RISC-V from scratch recently. In this post I want to write down what surprised me about the popular RISC-V, from the perspective of someone who has studied MIPS for many years.

Reference books

I learned RISC-V mostly from the following books:

RISC-V Special designs

I will briefly describe how I think RISC-V differs from other instruction sets in its high-level design. In China there is currently a lot of praise for RISC-V, claiming that it is a totally different ISA. Most of it is about RISC-V being open source. I want to know more about its technical novelties.

Modularism

RISC-V intentionally reserves some opcodes for extensions. The base set (i.e., RV32I) has only a few dozen instructions and does not even include floating-point operations! RISC-V consists of the base I set and many optional standard extensions: multiplication and division (extension M), single- and double-precision floating point (extensions F and D), atomic operations (extension A), and compressed instructions (extension C). Extensions still under development cover bit-level operations, just-in-time translation, SIMD, vector processing, user-level interrupts and more. For a general-purpose computer, the extension pack G=IMAFD can be used. Some embedded devices can leave out the floating-point part.

We can even define our own instruction types and extension packs. Here is an example. I have to say this is a great topic for an undergraduate course lab, since the students get to change everything from the compiler down to the ISA. However, I have not found a representative extension for them to implement.
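To make the idea of a custom instruction concrete, here is a minimal Python sketch that packs the fields of an R-type instruction into a 32-bit word. The `custom-0` major opcode is one of the slots the base specification sets aside for non-standard extensions. The `popcnt`-style instruction and its `funct3`/`funct7` values are purely hypothetical choices made for illustration, not part of any ratified extension.

```python
# Major opcodes (bits 6..0 of a 32-bit instruction).
OPCODE_OP = 0b0110011        # standard register-register ALU ops (add, sub, ...)
OPCODE_CUSTOM_0 = 0b0001011  # slot reserved for custom extensions


def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack the six R-type fields into one 32-bit instruction word."""
    fields = [
        ("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
        ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7),
    ]
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# Sanity check against a standard instruction: add x1, x2, x3
add_word = encode_r_type(OPCODE_OP, rd=1, funct3=0, rs1=2, rs2=3, funct7=0)

# Hypothetical custom instruction "popcnt x5, x6" placed in the custom-0 slot.
# funct3=0 and funct7=0 are arbitrary picks for this sketch.
custom_word = encode_r_type(OPCODE_CUSTOM_0, rd=5, funct3=0, rs1=6, rs2=0, funct7=0)

print(f"add x1, x2, x3 -> {add_word:#010x}")
print(f"custom popcnt  -> {custom_word:#010x}")
```

A lab along these lines would still need matching changes in the assembler and in the decoder of the processor; the sketch only covers the encoding side.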

Advantages:

  • I would say this streamlines instruction development: people can keep debating the best encoding for a new feature while the existing sets stay frozen.
  • It keeps the base set small, which is good news for embedded devices.

Drawbacks:

  • RISC-V binaries are not universal, and compatibility is not guaranteed. Each selection of extensions effectively yields a different ISA.
  • Selecting extensions may cause chaos. Extensions with conflicting opcodes cannot be selected at the same time, and people may be unaware of this.
  • More ABIs. Multiple ABIs exist so that instructions missing on the target machine can be emulated in software, but they confuse programmers and compilers. RISC-V has three popular ABIs, for integer-only, single-precision and double-precision floating point respectively (ilp32, ilp32f and ilp32d on RV32).
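The compatibility drawback can be sketched in a few lines of Python. The helpers below are hypothetical and deliberately simplified: they only understand single-letter extensions in strings such as `rv32imac`, ignore version numbers and multi-letter extension names, and expand `g` to IMAFD as described in this post.

```python
import re


def parse_isa(isa):
    """Split a simple ISA string such as 'rv32imac' into (xlen, extension set)."""
    match = re.fullmatch(r"rv(32|64|128)([a-z]+)", isa.lower())
    if match is None:
        raise ValueError(f"unsupported ISA string: {isa!r}")
    xlen = int(match.group(1))
    extensions = set(match.group(2))
    if "g" in extensions:
        # Simplification used in this post: G is shorthand for IMAFD.
        extensions.remove("g")
        extensions |= set("imafd")
    return xlen, extensions


def can_run(binary_isa, hart_isa):
    """True if a binary built for binary_isa only uses features the hart has."""
    binary_xlen, binary_ext = parse_isa(binary_isa)
    hart_xlen, hart_ext = parse_isa(hart_isa)
    return binary_xlen == hart_xlen and binary_ext <= hart_ext


print(can_run("rv32imac", "rv32gc"))     # integer + atomics + compressed on a G+C core
print(can_run("rv32imafd", "rv32imac"))  # needs F and D, which this core lacks
```

The second call is the problematic case: the binary and the core are both "RISC-V", yet the binary cannot run without software emulation of the missing floating-point instructions.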

Vector processing implementation

For a long time, SIMD was implemented as fixed-width extensions in the style of MMX, SSE and AVX. This causes a lot of trouble. This blog goes into more detail, and I totally agree with its author. RISC-V instead recommends using vector instructions to implement data-level parallelism on CPUs.

Advantages:

  • It allows the same binary to execute with the best performance on all kinds of RV32V implementations.
  • It simplifies compiler design.
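The first advantage comes from writing loops that never hard-code the vector width. The Python sketch below is a simplified model of that strip-mining pattern, not real vector code: `vlmax` stands in for the maximum vector length of the hardware, and the `min(...)` line models what a `vsetvli` instruction would return (real hardware is allowed somewhat more freedom in choosing the strip length).

```python
def vector_add(a, b, vlmax):
    """Add two equal-length lists in strips of at most vlmax elements."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vlmax < 1:
        raise ValueError("vlmax must be at least 1")
    out = []
    i = 0
    n = len(a)
    while i < n:
        # Models vsetvli: ask for the remaining elements, get at most vlmax.
        vl = min(n - i, vlmax)
        # Models one vector load/add/store over vl elements.
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
    return out


a = list(range(10))
b = list(range(100, 110))
# The same "binary" (this function) gives the same result on a narrow
# and on a wide machine; only the number of loop iterations differs.
print(vector_add(a, b, vlmax=4) == vector_add(a, b, vlmax=16))
```

With a fixed-width SIMD extension, the strip length would be baked into the instruction encoding, so a wider machine would need a recompiled binary to benefit.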

Drawbacks

Other good but not novel points

The following interesting design points are not novel: they either come from the trial and error of RISC-V's predecessors or are just low-level improvements.

  • Virtualization: the original x86 did not support any virtualization instructions (not its fault!) but RISC-V does. However, I do not think this should count as a special merit of RISC-V.
  • More elegant compressed instructions. They were planned for from the beginning of the RV32I design. They use shorter 16-bit encodings for the same functions, so the final binary is much smaller and competitive with CISC x86.
  • No at register reserved for implementing pseudo-instructions, because RISC-V sign-extends all immediate fields. More on this is in HW1 problem 2 of this year's architecture course.
  • No dedicated unconditional jump instruction. Use jal or jalr with x0 as the destination register for an unconditional jump. I had never thought of it that way. One opcode saved!
  • Register fields sit at the same bit positions in the load and store formats. This helps decode rd and rs before the address offset is calculated.
  • wfi (wait-for-interrupt) instruction. This can put the processor into a low-power mode. Interesting!
  • No arithmetic exceptions. Integer overflow does not raise an exception. The compiler can add detection instructions if overflow checks are needed.
  • auipc to read the PC directly: it adds an upper immediate to the PC and writes the result to a register. In contrast, x86/MIPS need extra work, such as a call that saves the return address first (https://stackoverflow.com/questions/15331033/how-to-get-current-pc-register-value-on-mips-arch). 32-bit Arm is too permissive, since the PC is a general-purpose register that most data-processing instructions may modify.
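The point about arithmetic exceptions can be illustrated with a small Python model of 32-bit signed addition. It mirrors the kind of check sequence the RISC-V specification suggests for signed overflow (`add`, `slti`, `slt`, `bne`); the function names are hypothetical and the inputs are assumed to already lie in the signed 32-bit range.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Wrap an integer to 32 bits and interpret it as two's complement."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def add_with_overflow_check(a, b):
    """Return (wrapped_sum, overflowed) for the 32-bit signed sum a + b.

    Models:  add  t0, t1, t2      # t0 = a + b, wraps silently
             slti t3, t2, 0       # t3 = (b < 0)
             slt  t4, t0, t1      # t4 = (sum < a)
             bne  t3, t4, overflow
    """
    total = to_signed32(a + b)   # add: no trap on overflow
    b_negative = b < 0           # slti
    sum_below_a = total < a      # slt
    # Adding a negative b must lower the sum; adding a non-negative b must not.
    return total, b_negative != sum_below_a


INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

print(add_with_overflow_check(1, 2))
print(add_with_overflow_check(INT32_MAX, 1))
print(add_with_overflow_check(INT32_MIN, -1))
```

The hardware stays simple because the three extra instructions are only emitted when the source language actually asks for checked arithmetic.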

Outlook

Undergraduate architecture course lab summary

Personally I think we did a great job last year, when I designed a novel course lab program to give the students some intuition about software acceleration and more. However, it seems their feedback was not very good. This experience deserves a separate post, which I will write up another time.

Tailored instruction set

This can be viewed as another variant of RISC-V's modular design. I am expecting a novel way of executing a program in which the instruction set is specialized by its user's profiling information; in other words, a tailored instruction set. The defining feature of all current ISAs is that they are universal. However, the demands of programmers and users are becoming more and more diverse these days. Yet a good classical ISA is supposed to be “static”, avoiding any change that would break backward compatibility. For example, in control-dominated environments (like front-end servers), more control-flow opcodes would be expected, to get a smaller binary and faster execution. Users in compute-dominated environments (like SIMD users in scientific computing), on the other hand, care more about registers, and the 31 currently available registers are not enough for them.

To keep applications portable, open-source code, interpreters, Just-In-Time (JIT) techniques and reprogrammable fabric may all be important. Imagine the following workflow! Open-source code gets distributed from a single copy, then compiled to a base ISA and executed for the first time. At the same time, the OS starts collecting profiling information. Through some complex mechanism, it decides to evolve to a customized ISA with more registers. Finally, the ISA is automatically generated, the hardware is reprogrammed in an epoch, and the JIT compiler migrates the software to this ISA.

Let me end here with what Frances Elizabeth Allen said:

The only way to realistically realize the performance goals and make them accessible to the user was to design the compiler and the computer at the same time. In this way features would not be put in the hardware which the software could not use or which the software could support more effectively.

--- The End ---