Instead of using a struct member to hold the last barrel shifter carry output, which is expensive because it has to be saved to and loaded from memory, I now pass the carry around as an in/out parameter.
`perf annotate` shows a high percentage of samples being spent on reading and writing `self.bs_carry_out`.
Since this is a rather "surgical" changeset, I made sure to run it against the eggvance test suite, the mGBA test suite, and some games as well.
I actually saw better improvements than what the benchmark measured, but 7% is decent enough :)
```
run_60_frames time: [180.18 ms 180.45 ms 180.77 ms]
change: [-7.2464% -6.9081% -6.6324%] (p = 0.00 < 0.05)
Performance has improved.
```
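For illustration, here is a minimal sketch of what passing the carry as an in/out parameter can look like. The function name `lsl` and its signature are assumptions made for this example, not the actual code in this changeset; only the idea (a `&mut bool` instead of a `self.bs_carry_out` field) comes from the description above.

```rust
/// Hypothetical logical-shift-left helper (name and signature are assumed).
/// The carry is threaded through a `&mut bool` instead of living on the CPU
/// struct, so the compiler is free to keep it in a register.
fn lsl(value: u32, amount: u32, carry: &mut bool) -> u32 {
    match amount {
        // Shift by zero: the value and the carry are left untouched.
        0 => value,
        // Carry out is the last bit shifted out of the top of the word.
        1..=31 => {
            *carry = (value >> (32 - amount)) & 1 != 0;
            value << amount
        }
        // Shift by exactly 32: result is zero, carry out is bit 0.
        32 => {
            *carry = value & 1 != 0;
            0
        }
        // Shift by more than 32: result and carry are both zero.
        _ => {
            *carry = false;
            0
        }
    }
}
```

A caller would keep a local `let mut carry = cpu_carry_flag;`, pass `&mut carry` through the shifter, and write the flag back once at the end of the instruction.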
Former-commit-id: 7cd7105a07aa0b78cab9dc8bbae3682b02b7ab7c
Former-commit-id: c68514beb3fa6c34f5f65544acbead21e527dbb0
Some big refactors:
* improve scheduler performance by using a BinaryHeap
* refactor the scheduler API
* arm7tdmi
  * Change the `arm7tdmi::Core` struct layout so that frequently accessed fields benefit from the CPU cache
  * Simplify and clean up cycle counting by implementing a `MemoryInterface` trait
  * Still not passing many cycle accuracy tests, but I believe that's because I don't have the prefetch buffer yet.
* Timer overflows are now scheduled
  * This fixes #111 and fixes #112
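To illustrate the cycle-counting point, here is a rough sketch of how a memory interface trait can centralize cycle accounting. Everything below (the `MemoryAccess` enum, the method names, `SimpleBus`, and the wait-state numbers) is made up for the example and is not the trait as it exists in the repo.

```rust
/// Whether an access is sequential to the previous one (assumed naming).
#[derive(Clone, Copy, PartialEq, Debug)]
enum MemoryAccess {
    NonSeq,
    Seq,
}

/// Hypothetical trait: the CPU core only talks to memory through this,
/// so cycle counting lives in the implementor rather than in every opcode.
trait MemoryInterface {
    fn load_8(&mut self, addr: u32, access: MemoryAccess) -> u8;
    fn store_8(&mut self, addr: u32, value: u8, access: MemoryAccess);
}

/// Toy bus: flat RAM plus a cycle counter.
/// The costs (2 non-sequential, 1 sequential) are invented for this sketch.
struct SimpleBus {
    ram: Vec<u8>,
    cycles: u64,
}

impl SimpleBus {
    fn new(size: usize) -> Self {
        assert!(size > 0, "RAM size must be non-zero");
        SimpleBus { ram: vec![0; size], cycles: 0 }
    }

    /// Single place where access cost is added up.
    fn tick(&mut self, access: MemoryAccess) {
        self.cycles += match access {
            MemoryAccess::NonSeq => 2,
            MemoryAccess::Seq => 1,
        };
    }
}

impl MemoryInterface for SimpleBus {
    fn load_8(&mut self, addr: u32, access: MemoryAccess) -> u8 {
        self.tick(access);
        let idx = addr as usize % self.ram.len(); // addresses wrap around
        self.ram[idx]
    }

    fn store_8(&mut self, addr: u32, value: u8, access: MemoryAccess) {
        self.tick(access);
        let idx = addr as usize % self.ram.len(); // addresses wrap around
        self.ram[idx] = value;
    }
}
```

With this shape, instruction handlers never touch the cycle counter directly; they just state whether each access is sequential or not.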
Former-commit-id: 17989e841a1ea88c2a7e14f4c99b31790a43c023
Former-commit-id: 109d98d824a464de347f6590a6ffe9af86b4b4ea
More robust cycle-aware event scheduling, which makes it easier to implement serial I/O and the DMG audio channels, and to improve accuracy.
This brings a slight performance hit :/
I also ran dos2unix on some of the files :D
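As a rough picture of what cycle-aware event scheduling means, here is a small sketch built on `std::collections::BinaryHeap`. The `Scheduler` and `EventType` definitions and their method names are assumptions for this example and do not mirror the real API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Example event kinds (hypothetical; the real set is larger).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EventType {
    TimerOverflow(usize),
    SerialTransfer,
}

/// Events are keyed by the absolute cycle at which they fire.
/// `Reverse` turns the max-heap into a min-heap, so the earliest event is on top.
struct Scheduler {
    timestamp: u64,
    events: BinaryHeap<Reverse<(u64, EventType)>>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { timestamp: 0, events: BinaryHeap::new() }
    }

    /// Queue `event` to fire `cycles_from_now` cycles in the future.
    fn schedule(&mut self, event: EventType, cycles_from_now: u64) {
        self.events.push(Reverse((self.timestamp + cycles_from_now, event)));
    }

    /// Advance the clock and return every event that is now due, earliest first.
    fn advance(&mut self, cycles: u64) -> Vec<EventType> {
        self.timestamp += cycles;
        let mut due = Vec::new();
        while let Some(&Reverse((time, _))) = self.events.peek() {
            if time > self.timestamp {
                break; // earliest pending event is still in the future
            }
            if let Some(Reverse((_, event))) = self.events.pop() {
                due.push(event);
            }
        }
        due
    }
}
```

The main loop would call `advance` with the cycles consumed by the CPU and dispatch whatever comes back, instead of polling every peripheral on every step.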
Former-commit-id: 62f4ba33e3a083b7976d6512ba6f5720ec493aa0
Former-commit-id: a4b3a92cd1eb156bbe9fd0ef77fbb0e7a45660cb
Mainly convert the main loop and audio thread into native code for
a performance increase. (Calling into JNI every frame was costly.)
The code was cleaned up quite a bit, but I may have introduced new bugs
in the process :<
Former-commit-id: fdbc21b5ab39f3d2e36647fd1177dc9a84a16980
Former-commit-id: ac765dbee8c994e1b69cc694846511837c2685b9