Z80 optimizing
It's funny how you can look at some code and think, yeah, that's as good as it gets, then suddenly you get a bright idea and realize that - no, it wasn't!
I am working on yet another little demo at the moment, on a system that uses the Z80 CPU (not saying any more than that at the moment! :))
For one of the effects, a single iteration of an unrolled loop looked like this:
exx ; 4
ld a,(bc) ; 7
inc bc ; 6
ld l,a ; 4
ld d,(hl) ; 7
ld a,(bc) ; 7
inc bc ; 6
ld l,a ; 4
ld a,(hl) ; 7
sla a ; 8
or d ; 4
exx ; 4
ld (de),a ; 7
inc de ; 6
---
81 T-STATES
That was my first, off-the-top-of-my-head implementation made a couple of days ago. I was convinced that it was possible to do much better, but I couldn't figure anything out.
Then yesterday I made some progress and came up with this, by swapping around some registers and realizing that in this case rlca worked just as well as sla a. The register swap and the rlca each saved 4 T-states:
exx ; 4
ld c,(hl) ; 7
inc hl ; 6
ld a,(bc) ; 7
ld d,a ; 4
ld c,(hl) ; 7
inc hl ; 6
ld a,(bc) ; 7
rlca ; 4
or d ; 4
exx ; 4
ld (de),a ; 7
inc de ; 6
---
73 T-STATES
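Why rlca can stand in for sla a is worth spelling out. Here is a minimal sketch of one situation where the swap is safe, assuming (this is not stated above) that the values being shifted always have bit 7 clear. The constants 25h and 0A5h are purely illustrative, and the hex notation may need adjusting for your assembler:

```
; sla a : shift left,  bit 0 <- 0,     bit 7 -> carry  (8 T-states, 2 bytes)
; rlca  : rotate left, bit 0 <- bit 7, bit 7 -> carry  (4 T-states, 1 byte)

    ld   a,25h      ; %00100101, bit 7 clear
    sla  a          ; a = 4Ah
    ld   a,25h
    rlca            ; a = 4Ah, same result as sla a

    ld   a,0A5h     ; %10100101, bit 7 set
    sla  a          ; a = 4Ah, bit 7 is lost into carry
    ld   a,0A5h
    rlca            ; a = 4Bh, bit 7 wraps around into bit 0
```

The two instructions also set flags differently (rlca leaves S, Z and P/V untouched), but since an or follows immediately and sets the flags itself, that difference should not matter here.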
I sat and stared at the solution for quite a long time, and thought, yeah, that's optimized!
But today I discovered that by rearranging my data and using the nice ex de,hl instruction, there were cycles to be saved, and I ended up with this:
ld e,(hl) ; 7
inc hl ; 6
ld a,(de) ; 7
rlca ; 4
ld e,(hl) ; 7
inc hl ; 6
ex de,hl ; 4
or (hl) ; 7
ex de,hl ; 4
ld (bc),a ; 7
inc bc ; 6
---
65 T-STATES
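For reference, the register setup this last version seems to rely on could look roughly like the sketch below. The labels source, table and dest are hypothetical, the table is assumed to start on a 256-byte boundary so that only E has to change per lookup, and the table/256 expression may need a different spelling in your assembler:

```
; Hypothetical setup before the unrolled loop (labels are made up).
    ld   hl,source      ; HL -> stream of index bytes
    ld   d,table/256    ; D = high byte of the page-aligned lookup table
    ld   bc,dest        ; BC -> output buffer
```

With D fixed, ld e,(hl) turns DE into a pointer to the table entry. The first entry is fetched with ld a,(de) and rotated, and the second is ORed in straight from memory through or (hl) between the two ex de,hl, so neither exx nor a temporary register is needed any more.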
So NOW it must be as optimized as possible, right? :)