Branch Prediction #1308
Replies: 5 comments
Wally is also intended for ASIC implementation, not just FPGA, and using a synchronous RAM makes the read path compatible with SRAM. Given the large disparity in bit density, we decided to optimize for area; the same trade-off exists for both the instruction and data caches. Fortunately, we work around the extra cycle of latency so that the BTB and the direction predictor output their prediction for the matching instruction in the Fetch stage. The next PC (PCNextF) is sent to the I$ and the branch predictor before the rising clock edge, so during the Fetch stage we get the prediction result for the corresponding instruction without any delay. In other words, we aren't sending PCF to the branch predictor.
Hello, thank you for the response. Does that mean this could be seen as a prefetch-style branch prediction mechanism?
It's not so much a prefetch as the address being set up the cycle before. It's functionally equivalent to an asynchronous flip-flop or LUTRAM read with PCF driving the read address port of the branch predictor. You could think of it as if the register between PCNextF and PCF were pushed into the branch predictor's SRAM.
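The equivalence described above can be sketched in a small Python timing model (a hypothetical illustration with made-up names and contents; the actual design is SystemVerilog in the cvw repository). It shows that a synchronous-read RAM whose address (PCNextF) is presented before the clock edge produces, each Fetch cycle, the same value as an asynchronous read addressed by the registered PCF:

```python
# Hypothetical timing model: compare a synchronous-read RAM addressed with
# next_pc (address set up before the clock edge) against an asynchronous
# read addressed with the registered pc (PCF). Same prediction, same cycle.

def run(trace):
    mem = {a: a ^ 0xFF for a in range(16)}  # arbitrary stand-in BTB contents
    sync_out = None   # output register of the synchronous-read RAM
    pc = None         # PCF: next_pc delayed by one clock edge
    results = []
    for next_pc in trace:          # next_pc plays the role of PCNextF
        # Rising clock edge: the sync RAM captures the address and reads;
        # simultaneously next_pc is registered into pc (PCF).
        sync_out = mem[next_pc]
        pc = next_pc
        # During the Fetch stage: an async read would use PCF directly.
        async_out = mem[pc]
        results.append((sync_out, async_out))
    return results

for s, a in run([3, 7, 7, 1, 0]):
    assert s == a   # identical prediction in every Fetch cycle
```

In effect, the address register that would normally sit between PCNextF and PCF is absorbed into the SRAM's synchronous read port, so no pipeline bubble is added.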
Thank you very much for the explanation; you have helped me a lot. Thanks for the great work, you are an awesome team!
You are welcome. Happy to help anytime.
I really like your Core Wally design as a reference for structuring an SoC.
However, I noticed that you are using synchronous RAM for branch prediction, which introduces a one-cycle delay in obtaining the prediction. Why didn't you use LUTRAM (distributed RAM) instead, which would allow for an immediate prediction in the fetch stage?
I haven't simulated your SoC, but I have analyzed the code. Does it really make sense to always lose one cycle for branch prediction in a 5-stage pipelined CPU? Or did I misinterpret the code?
I understand that with BRAM, you get more lines than with LUTRAM. Why did you decide to do it this way, if my assumption is correct?