Andreas Gustafsson's Blog

Thursday, August 13, 2009

Z80 optimizing

It's funny how you can look at some code and think, yeah, that's as good as it gets, and then suddenly get a bright idea and realize that no, it wasn't!
I'm working on yet another little demo, on a system that uses the Z80 CPU (not saying any more than that at the moment! :))
For one of the effects, a single iteration of an unrolled loop looked like this:

exx ; 4
ld a,(bc) ; 7
inc bc ; 6
ld l,a ; 4
ld d,(hl) ; 7
ld a,(bc) ; 7
inc bc ; 6
ld l,a ; 4
ld a,(hl) ; 7
sla a ; 8
or d ; 4
exx ; 4
ld (de),a ; 7
inc de ; 6
---
81 T-STATES
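The Python sketch below is a reference point for the later versions. It models what one pass of this first loop computes. It assumes, from the `ld l,a` / `ld d,(hl)` pattern, that H holds the high byte of a page-aligned 256-byte lookup table, so each source byte acts as an index into it. The function name `first_version` and the sample data are hypothetical.

```python
def first_version(table: bytes, src: bytes) -> bytearray:
    """Model of the 81 T-state loop: each pair of source bytes yields one output byte.

    `table` stands in for the 256-byte page that H is assumed to point at.
    """
    assert len(table) == 256
    out = bytearray()
    for i in range(0, len(src) - 1, 2):
        d = table[src[i]]            # ld a,(bc) / inc bc / ld l,a / ld d,(hl)
        a = table[src[i + 1]]        # ld a,(bc) / inc bc / ld l,a / ld a,(hl)
        a = (a << 1) & 0xFF          # sla a: shift left, bit 0 becomes 0
        out.append(a | d)            # or d / ld (de),a / inc de
    return out


# Hypothetical identity table, so the output shows the combining step directly.
table = bytes(range(256))
result = first_version(table, bytes([0x01, 0x02, 0x10, 0x20]))
print(result.hex())  # prints "0550"
```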

That was my first, off-the-top-of-my-head implementation, written a couple of days ago. I was convinced it was possible to do much better, but I couldn't figure anything out.

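Then yesterday I made some progress and came up with the version below. Swapping some registers around, so that BC addresses the table and HL walks the source data, saved 4 T-states on the second table lookup. I also realized that in this case rlca worked just as well as sla a (they differ only in bit 0, and only when bit 7 of A is set) while taking 4 T-states instead of 8: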

exx ; 4
ld c,(hl) ; 7
inc hl ; 6
ld a,(bc) ; 7
ld d,a ; 4
ld c,(hl) ; 7
inc hl ; 6
ld a,(bc) ; 7
rlca ; 4
or d ; 4
exx ; 4
ld (de),a ; 7
inc de ; 6
---
73 T-STATES
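The rlca substitution holds only under the condition hinted at by "in this case". sla a is a CB-prefixed instruction taking 8 T-states; it shifts A left and clears bit 0. rlca takes 4 T-states and rotates the old bit 7 back into bit 0. The quick exhaustive check below, in Python over all 256 values of A, shows exactly when the two disagree. The helper names are only for illustration, and flags other than carry are not modelled.

```python
def sla(a: int) -> int:
    """SLA A: shift left; old bit 7 goes to carry, bit 0 becomes 0."""
    return (a << 1) & 0xFF


def rlca(a: int) -> int:
    """RLCA: rotate left; old bit 7 goes to carry and also into bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF


# Values of A for which the two instructions leave different results in A.
differs = [a for a in range(256) if sla(a) != rlca(a)]
print(len(differs), hex(min(differs)), hex(max(differs)))  # prints "128 0x80 0xff"

# For every value with bit 7 clear, the results in A match.
assert all(sla(a) == rlca(a) for a in range(0x80))
```

So the swap is safe, for example, whenever the table values being rotated never have bit 7 set. Other flag differences don't matter here, because the or d that follows sets the flags afresh.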

I sat and stared at that solution for quite a long time, and thought, yeah, that's optimized!
But today I discovered that by rearranging my data and using the nice ex de,hl instruction, there were cycles to be saved, and I ended up with this:

ld e,(hl) ; 7
inc hl ; 6
ld a,(de) ; 7
rlca ; 4
ld e,(hl) ; 7
inc hl ; 6
ex de,hl ; 4
or (hl) ; 7
ex de,hl ; 4
ld (bc),a ; 7
inc bc ; 6
---
65 T-STATES
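The 8 T-states saved here come from or (hl). It fetches the second table entry and combines it with A in a single 7 T-state instruction. In the previous version the same work took ld a,(bc), ld d,a and or d, 15 T-states in total. The two ex de,hl cost the same 8 T-states as the exx pair they replace. They are needed because HL has to address the table for or (hl) while also serving as the source pointer.

Below is a minimal register-level sketch in Python of one iteration, counting T-states as it goes. It assumes, as inferred from the listing, that D holds the high byte of a page-aligned 256-byte table, HL points at the source data and BC at the destination. Flags are ignored, and the addresses and table contents are made up.

```python
def rlca(a: int) -> int:
    """RLCA: rotate A left, old bit 7 into bit 0 (flags not modelled)."""
    return ((a << 1) | (a >> 7)) & 0xFF


def third_version_step(mem: bytearray, hl: int, de: int, bc: int):
    """One 65 T-state iteration; returns the updated (hl, de, bc, t_states).

    Assumed roles: HL = source pointer, D = table page, BC = destination.
    """
    t = 0
    de = (de & 0xFF00) | mem[hl]; t += 7     # ld e,(hl)
    hl = (hl + 1) & 0xFFFF;       t += 6     # inc hl
    a = mem[de];                  t += 7     # ld a,(de)  first table entry
    a = rlca(a);                  t += 4     # rlca
    de = (de & 0xFF00) | mem[hl]; t += 7     # ld e,(hl)
    hl = (hl + 1) & 0xFFFF;       t += 6     # inc hl
    de, hl = hl, de;              t += 4     # ex de,hl  HL now addresses the table
    a |= mem[hl];                 t += 7     # or (hl)   second entry, fetched and combined
    de, hl = hl, de;              t += 4     # ex de,hl  HL is the source pointer again
    mem[bc] = a;                  t += 7     # ld (bc),a
    bc = (bc + 1) & 0xFFFF;       t += 6     # inc bc
    return hl, de, bc, t


# Hypothetical memory layout: table page at 0x8000, source at 0x9000, output at 0xA000.
mem = bytearray(0x10000)
mem[0x8000:0x8100] = bytes(i & 0x7F for i in range(256))   # made-up table, bit 7 clear
mem[0x9000:0x9002] = bytes([0x05, 0x03])

hl, de, bc, t = third_version_step(mem, 0x9000, 0x8000, 0xA000)
print(t, hex(mem[0xA000]), hex(hl), hex(bc))  # prints "65 0xb 0x9002 0xa001"
```

Tracing the registers this way also shows that D never changes, so the table page stays fixed across iterations. HL is back to being the source pointer before the next iteration starts.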


So NOW it must be as optimized as possible, right? :)
