### Z80 optimizing

It's funny how you can look at some code and think, yeah, that's as good as it gets, then suddenly you get a bright idea and realize that - no, it wasn't!

I am working on yet another little demo at the moment, on a system that uses the Z80 CPU (not saying any more than that at the moment! :))

For one of the effects, a single iteration of an unrolled loop looked like this:

    exx          ; 4
    ld a,(bc)    ; 7
    inc bc       ; 6
    ld l,a       ; 4
    ld d,(hl)    ; 7
    ld a,(bc)    ; 7
    inc bc       ; 6
    ld l,a       ; 4
    ld a,(hl)    ; 7
    sla a        ; 8
    or d         ; 4
    exx          ; 4
    ld (de),a    ; 7
    inc de       ; 6
    ---
    81 T-STATES

That was my first, off-the-top-of-my-head implementation made a couple of days ago. I was convinced that it was possible to do much better, but I couldn't figure anything out.

Then yesterday I made some progress and came up with this, by swapping some registers around and realizing that in this case **rlca** worked just as well as **sla a** and saved 4 T-states:

    exx          ; 4
    ld c,(hl)    ; 7
    inc hl       ; 6
    ld a,(bc)    ; 7
    ld d,a       ; 4
    ld c,(hl)    ; 7
    inc hl       ; 6
    ld a,(bc)    ; 7
    rlca         ; 4
    or d         ; 4
    exx          ; 4
    ld (de),a    ; 7
    inc de       ; 6
    ---
    73 T-STATES
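The substitution only holds under a condition the post leaves implicit: `rlca` rotates bit 7 round into bit 0, whereas `sla a` shifts in a zero, so the two leave the same value in A only when bit 7 of the input is clear. (The flags differ too, but the following `or d` overwrites them.) A small Python sketch of the two operations, assuming, as the post does not state, that the looked-up table values stay below 0x80:

```python
def sla(a):
    """SLA A: shift left, bit 0 becomes 0 (value in A only; flags ignored)."""
    return (a << 1) & 0xFF

def rlca(a):
    """RLCA: rotate left, old bit 7 wraps round into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

# Every input for which both instructions leave the same value in A.
same = [a for a in range(256) if sla(a) == rlca(a)]

print(len(same), hex(max(same)))  # 128 0x7f
```

So the swap is free for inputs 0x00 to 0x7F and wrong for anything with the top bit set.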

I sat and stared at the solution for quite a long time, and thought, yeah, that's optimized!

But today I discovered that by rearranging my data and using the nice **ex de,hl** instruction, there were cycles to be saved, and I ended up with this:

    ld e,(hl)    ; 7
    inc hl       ; 6
    ld a,(de)    ; 7
    rlca         ; 4
    ld e,(hl)    ; 7
    inc hl       ; 6
    ex de,hl     ; 4
    or (hl)      ; 7
    ex de,hl     ; 4
    ld (bc),a    ; 7
    inc bc       ; 6
    ---
    65 T-STATES
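To make the data flow easier to follow, here is a rough Python model of what one iteration of this last version does. The Z80 has no `or (de)`, which is why the table pointer is swapped into HL just for the `or (hl)`. The model assumes D holds the high byte of a page-aligned 256-byte lookup table, so that loading E selects an entry; that layout, and the addresses used in the example, are guesses for illustration and not taken from the actual demo.

```python
def rlca(a):
    """RLCA: rotate A left, old bit 7 wraps round into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

def iteration(mem, hl, d, bc):
    """Model one unrolled iteration; returns the updated (hl, bc)."""
    e = mem[hl]                    # ld e,(hl)   DE now points into the table page
    hl = (hl + 1) & 0xFFFF         # inc hl
    a = rlca(mem[(d << 8) | e])    # ld a,(de) / rlca
    e = mem[hl]                    # ld e,(hl)
    hl = (hl + 1) & 0xFFFF         # inc hl
    a |= mem[(d << 8) | e]         # ex de,hl / or (hl) / ex de,hl
    mem[bc] = a                    # ld (bc),a
    bc = (bc + 1) & 0xFFFF         # inc bc
    return hl, bc

# Hypothetical layout: table at 0x4000, index bytes at 0x8000, output at 0xC000.
mem = bytearray(0x10000)
mem[0x4000 + 5] = 0x21
mem[0x4000 + 9] = 0x04
mem[0x8000:0x8002] = bytes([5, 9])

hl, bc = iteration(mem, 0x8000, 0x40, 0xC000)
print(hex(mem[0xC000]))  # 0x46
```

Because everything now fits in the main register set (HL for the source, DE for the table, BC for the destination), the two `exx` instructions and the temporary copy in D are no longer needed.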

So NOW it must be as optimized as possible, right? :)
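For what it's worth, the totals do add up. A quick Python tally of the T-state comments from the three listings, assuming the timings quoted there apply unchanged on the target machine (some systems insert wait states, which would shift the numbers):

```python
# T-state costs copied from the comments in the three listings above.
first  = [4, 7, 6, 4, 7, 7, 6, 4, 7, 8, 4, 4, 7, 6]
second = [4, 7, 6, 7, 4, 7, 6, 7, 4, 4, 4, 7, 6]
third  = [7, 6, 7, 4, 7, 6, 4, 7, 4, 7, 6]

totals = [sum(first), sum(second), sum(third)]
saving = 1 - totals[2] / totals[0]   # fraction saved, first version vs. last

print(totals, f"{saving:.0%}")  # [81, 73, 65] 20%
```

That is 16 T-states saved per iteration, roughly a fifth of the original cost.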

## 3 Comments:

Donald Knuth: "premature optimization is the root of all evil"

By Hauk, at 08:51

gameboy!

By kisstank, at 12:53

Nope, not gameboy, as that lacks the EXX instruction (as it lacks the shadow registers).

I suspect this demo has since been released. With that said, a more suitable replacement for SLA A would have been ADD A,A, which off the top of my head takes up the same amount of cycles as RLCA but works OK if you have the upper bit set.

By iamgreaser, at 06:18
