Instead of using a struct member to hold the last barrel shifter carry output, which is expensive as it needs to be saved to and loaded from memory, I now pass the carry around as an in-out parameter.
perf annotate shows a high percentage of samples being spent on reading/writing `self.bs_carry_out`.
Since this is a rather "surgical" changeset, I made sure to run it against the eggvance test suite, the mGBA test suite and some games as well.
I actually saw better improvements than what the benchmark measured, but 7% is decent enough :)
```
run_60_frames time: [180.18 ms 180.45 ms 180.77 ms]
change: [-7.2464% -6.9081% -6.6324%] (p = 0.00 < 0.05)
Performance has improved.
```
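A rough sketch of the in-out carry idea, not the actual emulator code: `lsl` and `movs_lsl` are hypothetical names, and only the LSL case of the barrel shifter is shown. The point is that the carry lives in a local that can stay in a register instead of a `self` field.

```rust
/// Logical shift left with ARM register-shift semantics.
/// `carry` holds the current C flag on entry and the shifter carry-out on
/// return, so no struct field has to be written back to memory.
#[inline]
fn lsl(value: u32, amount: u32, carry: &mut bool) -> u32 {
    match amount {
        // A shift by 0 leaves both the value and the carry untouched.
        0 => value,
        1..=31 => {
            // The carry-out is the last bit shifted out of the top.
            *carry = (value >> (32 - amount)) & 1 != 0;
            value << amount
        }
        32 => {
            *carry = value & 1 != 0;
            0
        }
        _ => {
            *carry = false;
            0
        }
    }
}

/// Hypothetical caller: the carry is a plain local variable here.
fn movs_lsl(value: u32, amount: u32, c_flag: bool) -> (u32, bool) {
    let mut carry = c_flag;
    let result = lsl(value, amount, &mut carry);
    (result, carry)
}
```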
Former-commit-id: 7cd7105a07aa0b78cab9dc8bbae3682b02b7ab7c
Former-commit-id: c68514beb3fa6c34f5f65544acbead21e527dbb0
Fulfills a TODO from long ago. I used perf record (--call-graph dwarf) and found that add_cycles() was hot enough to be worth optimizing.
I added 2 optimizations:
- Removed bounds checks from array accesses
- Increased the LUT size to include dummy entries for open-bus, which eliminates the if check
```
run_60_frames time: [183.65 ms 183.69 ms 183.73 ms]
change: [-9.4414% -9.2849% -9.1315%] (p = 0.00 < 0.05)
Performance has improved.
```
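The two optimizations can be illustrated roughly like this. This is a sketch under assumptions: `WaitStates`, its field and the cycle values are made up for illustration and are not the real timing tables. The table covers all 16 possible values of the masked page index, so unmapped (open-bus) pages hit a dummy entry instead of an `if`, and the lookup can skip the bounds check.

```rust
/// Hypothetical wait-state table indexed by bits 24..28 of the address.
struct WaitStates {
    // One entry per possible page value, including dummy entries for
    // unmapped pages. The values are placeholders, not real GBA timings.
    access16: [u8; 16],
}

impl WaitStates {
    fn new() -> Self {
        let mut access16 = [1u8; 16]; // dummy 1-cycle entry everywhere
        access16[0x2] = 3; // placeholder for a slower region
        access16[0x8] = 5; // placeholder for a slower region
        WaitStates { access16 }
    }

    #[inline]
    fn cycles(&self, addr: u32) -> u32 {
        let page = ((addr >> 24) & 0xF) as usize;
        // SAFETY: `page` is masked to 0..=15 and the table has 16 entries,
        // so the index is always in range.
        unsafe { *self.access16.get_unchecked(page) as u32 }
    }
}
```

With a fixed-size array and a mask like this the compiler may be able to drop the bounds check on its own, so it is worth checking the generated code before reaching for `unsafe`.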
Former-commit-id: 1cbb596b856e604ad6c48eb0d47771e7cee44d1e
Former-commit-id: 9f15e35237f343d0c816fd9d51d81081736d9e17
Also, clean up some of the cfg(debug) mess
benchmark report:
```
run_60_frames time: [110.61 ms 111.01 ms 111.42 ms]
change: [-7.1171% -6.5425% -5.9132%] (p = 0.00 < 0.05)
Performance has improved.
```
Former-commit-id: 9cb82e483c8a78632d0deae20adca9fc1843a76b
Former-commit-id: 6d9be5ddaf72f2b9c02063fa067f2ffbaea4fdb6
* Convert gpu bitfield!() registers to unpacked form, and defer pack/unpack to bus read/write operations
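To show what "unpacked form" means here, a minimal sketch assuming a heavily simplified display-control register: the `DisplayControl` struct and its three fields are illustrative only and not the emulator's actual definition. The renderer reads plain fields, and the bit packing only happens when the register is accessed through the bus.

```rust
/// Simplified, hypothetical display-control register kept in unpacked form.
#[derive(Debug, Default, Clone, PartialEq)]
struct DisplayControl {
    mode: u16,           // bits 0-2
    display_frame: bool, // bit 4
    force_blank: bool,   // bit 7
}

impl DisplayControl {
    /// Called on a bus write: unpack the raw value into fields once.
    fn write(&mut self, value: u16) {
        self.mode = value & 0b111;
        self.display_frame = value & (1 << 4) != 0;
        self.force_blank = value & (1 << 7) != 0;
    }

    /// Called on a bus read: pack the fields back into a raw value.
    fn read(&self) -> u16 {
        self.mode
            | ((self.display_frame as u16) << 4)
            | ((self.force_blank as u16) << 7)
    }
}
```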
Former-commit-id: 26e7d7d62d6418ce7bcdb8e414cabe5ddb56333d
Former-commit-id: 716ddd9fe2b7b95b7613fc549a7bee406272478b
```
run_60_frames time: [158.26 ms 160.84 ms 163.03 ms]
change: [-24.960% -18.808% -12.975%] (p = 0.00 < 0.05)
Performance has improved.
```
Wasn't expecting such an improvement tbh, but who am I to argue with results
Former-commit-id: a5ba74bffa26d962a232c0767a34a7d67ed8ccb4
Former-commit-id: 1b9b301ba9012e79e66822ac39af51df28c51fa4
It's quite a handy way to log messages from ROMs, so I thought it'd be nice to add it.
Former-commit-id: 6869bdb58cfa883ac1ca6832f0bbeeab0edcf552
Former-commit-id: 89d7d826c7a906bbb68f9f4305bb92cd50bb2296
Fixes #109
Fixes #106
Fixes MegaMan email menu freezing and probably some more games
Former-commit-id: ed37520f2bc732b07334261dfe3d23cccf3fc04c
Former-commit-id: d7f206b0f405ffe09a3b36d90268f1d683a64cea
Just directly impl Bus trait for Box<[u8]>
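Roughly what this looks like, as a sketch: the `Bus` trait below is a minimal hypothetical stand-in with made-up method signatures, not the emulator's real trait. Implementing it straight on `Box<[u8]>` means a plain memory region needs no wrapper struct.

```rust
/// Minimal, hypothetical stand-in for the emulator's bus trait.
trait Bus {
    fn read_8(&self, addr: u32) -> u8;
    fn write_8(&mut self, addr: u32, value: u8);

    /// Little-endian 16-bit read built from two byte reads.
    fn read_16(&self, addr: u32) -> u16 {
        (self.read_8(addr) as u16) | ((self.read_8(addr + 1) as u16) << 8)
    }
}

/// A boxed byte slice is already a complete memory region.
impl Bus for Box<[u8]> {
    fn read_8(&self, addr: u32) -> u8 {
        self[addr as usize]
    }

    fn write_8(&mut self, addr: u32, value: u8) {
        self[addr as usize] = value;
    }
}
```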
Former-commit-id: 7b8a29972520afb7ff197708b9c2146b293a5f29
Former-commit-id: 0c528165ed899fad14b1e25995fdfe8ae004da2a
A lot of test ROMs I'm using don't bother to calculate the checksum :\
Former-commit-id: da02a70271c34bc26e560ea18b3f5052ee171a65
Former-commit-id: 332d917e47b268ae649574844d14cab2da65197d
Some big refactors:
* improve scheduler performance by using a BinaryHeap
* refactor the scheduler API
* arm7tdmi
  * Change the arm7tdmi::Core struct layout so frequently accessed fields benefit from the CPU cache
  * Simplify and clean up cycle counting by implementing a MemoryInterface trait
  * Still not passing many cycle accuracy tests, but I believe it's because I don't have the prefetch buffer yet.
* Timer overflows are now scheduled
* This fixes #111 and fixes #112
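For reference, a BinaryHeap-based scheduler can be sketched along these lines. Everything here (`Scheduler`, `EventType`, the method names) is a hypothetical illustration, not the refactored API itself. `std::collections::BinaryHeap` is a max-heap, so entries are wrapped in `Reverse` to pop the earliest timestamp first.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical event kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum EventType {
    TimerOverflow(usize),
    HBlank,
}

struct Scheduler {
    timestamp: u64,
    // `Reverse` turns the max-heap into a min-heap keyed on the due time.
    events: BinaryHeap<Reverse<(u64, EventType)>>,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { timestamp: 0, events: BinaryHeap::new() }
    }

    /// Queue `event` to fire `cycles_from_now` cycles in the future.
    fn schedule(&mut self, event: EventType, cycles_from_now: u64) {
        self.events.push(Reverse((self.timestamp + cycles_from_now, event)));
    }

    fn advance(&mut self, cycles: u64) {
        self.timestamp += cycles;
    }

    /// Pop the earliest event if it is already due, otherwise return None.
    fn pop_due(&mut self) -> Option<(u64, EventType)> {
        let next_due = self.events.peek().map(|Reverse((when, _))| *when)?;
        if next_due <= self.timestamp {
            self.events.pop().map(|Reverse(entry)| entry)
        } else {
            None
        }
    }
}
```

Peeking at the next event is O(1) and push/pop are O(log n), which is where the gain over scanning a list of pending events would come from.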
Former-commit-id: 17989e841a1ea88c2a7e14f4c99b31790a43c023
Former-commit-id: 109d98d824a464de347f6590a6ffe9af86b4b4ea
This breaks the API of the GameBoyAdvance::save_state and restore_state methods.
As this is still WIP, only the SDL2 frontend is adjusted for now.
Former-commit-id: 1df15c8697fef0f6adddb07a6d653947c622ba12
Former-commit-id: 2ea339dc6a0d1e7539d167c4df29694b408303da
A more robust, cycle-aware event scheduling, to easily implement serial-io and DMG audio channels and to improve accuracy.
This brings a slight performance hit :/
I also ran dos2unix on some of the files :D
Former-commit-id: 62f4ba33e3a083b7976d6512ba6f5720ec493aa0
Former-commit-id: a4b3a92cd1eb156bbe9fd0ef77fbb0e7a45660cb
Avoid passing ArmInstruction struct to handlers, as accesses to its fields can result in memory operations.
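A sketch of the general idea, with assumed names: `Cpu`, `rd`, `rn` and `exec_add_imm` are hypothetical, and the immediate rotation is ignored for brevity. The handler takes the raw `u32` opcode by value and decodes fields with shifts and masks, so there is no struct behind a reference to load from.

```rust
/// Destination register field (bits 12-15) of an ARM data-processing opcode.
#[inline]
fn rd(insn: u32) -> usize {
    ((insn >> 12) & 0xF) as usize
}

/// First operand register field (bits 16-19).
#[inline]
fn rn(insn: u32) -> usize {
    ((insn >> 16) & 0xF) as usize
}

/// Hypothetical, stripped-down CPU state.
struct Cpu {
    gpr: [u32; 16],
}

impl Cpu {
    /// Hypothetical handler for `ADD Rd, Rn, #imm8` (rotation ignored).
    /// The opcode is passed by value and can stay in a register.
    fn exec_add_imm(&mut self, insn: u32) {
        let imm = insn & 0xFF;
        self.gpr[rd(insn)] = self.gpr[rn(insn)].wrapping_add(imm);
    }
}
```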
Former-commit-id: 6ea1719e36a0fefa1b30bdae4d6e8ab4dbf3af1a
Former-commit-id: e5855b8258f98d3f4c0819f3aec2fd0f47fef545
The `if let Some(gpio) = &self.gpio` causes a memory read of `self.gpio` for every Bus::read/write_16.
It is better to reverse the order, since `is_gpio_access` does not generate any memory reads and is thus less costly.
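The reordering looks roughly like this. It is a sketch with assumed types: `Cartridge`, `Gpio` and the body of `is_gpio_access` are simplified stand-ins, and the out-of-range read value is a placeholder. The address test only touches the `addr` argument, so `self.gpio` is read solely for the few addresses that can actually hit the GPIO registers.

```rust
/// Hypothetical, simplified GPIO device.
struct Gpio {
    data: u16,
}

/// Hypothetical, simplified cartridge.
struct Cartridge {
    rom: Vec<u8>,
    gpio: Option<Gpio>,
}

/// Pure address check: works on the argument only, no memory reads.
#[inline]
fn is_gpio_access(addr: u32) -> bool {
    matches!(addr & 0x01FF_FFFF, 0xC4 | 0xC6 | 0xC8)
}

impl Cartridge {
    fn read_16(&self, addr: u32) -> u16 {
        // Cheap check first; `self.gpio` is only loaded when it passes.
        if is_gpio_access(addr) {
            if let Some(gpio) = &self.gpio {
                return gpio.data;
            }
        }
        let offset = (addr & 0x01FF_FFFF) as usize;
        match self.rom.get(offset..offset + 2) {
            Some(bytes) => u16::from_le_bytes([bytes[0], bytes[1]]),
            None => 0, // placeholder for reads past the end of the ROM
        }
    }
}
```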
Former-commit-id: bcce7d9c3a2b159a7f6b291d7b08ccf9c4d0db14
Former-commit-id: 69c12db503c9e612faa7cd8a57f6d862694c8370
Mainly convert the main loop and audio thread into native code for a
performance increase. (Calling into JNI every frame was costly)
The code was cleaned up quite a bit, but I may have introduced new bugs
in this process :<
Former-commit-id: fdbc21b5ab39f3d2e36647fd1177dc9a84a16980
Former-commit-id: ac765dbee8c994e1b69cc694846511837c2685b9
Serde doesn't like Rc that much :(
Fixes #142
Former-commit-id: e1e8a96b4867e351d103fb7d92d71b0434e8fc31
Former-commit-id: 28366bbb36b3e93b574f397b103a483844fd8131
While it saves code through re-use, it doesn't allow the flexibility of
special-casing bus accesses of a specific size, which is much needed in
order to emulate open-bus and BIOS reads
Former-commit-id: 952a30a130612d61b3f5047b1f1c3cbda9bd58a8
Former-commit-id: ad3a25c012853399591d79f4f1a4423ea9e6645e