Decoded: The Bisqwit NES emulator
A beginner-friendly, line by line walkthrough by MaiZure
Original code: http://bisqwit.iki.fi/jutut/kuvat/programming_examples/nesemu1/nesemu1.cc
With line numbers: http://www.maizure.org/projects/decoded-bisqwit-nes-emulator/nesemu1_lines.txt

Line Comments
1 Includes stdint.h for platform-specific typedefs (ex. uint_least32_t)
2 Includes signal.h for event trapping (see 46)
3 Includes assert.h for condition checking (see 914)
4 Includes cmath for trig/pow functions and extended number formats
5 BLANK
6 Includes Simple DirectMedia Layer (SDL) 1.2 to manage graphics
7 Includes vector container to manage the NES memory abstraction
PORTABILITY NOTE: (c)stdio(.h) and (c)string(.h) are not included despite using functions from these libraries (FILE operations and memcpy). Certain platform/compiler combos explicitly require these includes (Windows/MinGW), while others do not (Linux/gcc).
8 BLANK
9 COMMENT
10 COMMENT
11 COMMENT
12 BLANK
13 Hardcodes a file name opened later to read simulated user input (see 507)
14 BLANK
15 COMMENT
NOTE: Types *_leastN_t ensure that the smallest possible platform-specific primitive that can contain N bits is used. Check stdint.h for your platform typedef resolutions.
16 Typedefs u32 to keep future usage short...the Linux kernel does this too
17 Typedefs u16
18 Typedefs u8
19 Typedefs s8. Note SIGNED
20 BLANK
21 COMMENT
RegBit
The RegBit object provides a concise abstraction to manage bit-level register access of all possible types. It defines a register size based on a data type, then defines a field of the register as a starting bit and a width. RegBit operates on that field by masking the register bits and using appropriately overloaded assignment and increment operators.
22 Sets a template for RegBit. Defaults to 1 bit width of a u8
23 Defines RegBit as a struct.
All members have public access
24 BLOCK START - struct RegBit
25 Declares RegBit's base data type (templated)
26 Defines the mask based on field width. For example: a field of size 3 within an 8-bit register will have a mask of 00000111 because 00000001 << 3 is 00001000. Subtract 1 to reach the final mask 00000111
27 Sets template for possible register assignment values.
28 Defines the assignment operator for RegBit, which returns a reference to itself.
29 BLOCK START - RegBit assignment '=' overload
NOTE ON BITWISE OPERATIONS: Masks are commonly used with bitwise AND (&) to isolate specific bits of a register. The complement (~) of the mask can be used to clear the target region. We can also set regions using bitwise OR (|) of the target field and masked data. Finally, don't forget to bitshift (<< >>) data into its proper position. The mask may be stored shifted or unshifted; RegBit stores it unshifted.
30 Applies the masked and shifted data to the cleared register field.
31 Returns a reference to self after updating.
32 BLOCK END - RegBit assignment '=' overload
33 Conversion operator overload (operator unsigned). Reading a RegBit as a number returns the field value after unshifting and masking the register.
34 Prefix increment overload. Returns self after incrementing.
35 Postfix increment overload. Stores the old value, applies the prefix increment, and returns the old value.
36 BLOCK END - struct RegBit
37 BLANK
NAMESPACES NOTE: The code defines six namespaces (including global/std). Each namespace segregates objects and functions related to their purpose within the NES. Normally, these namespaces would be split into separate files.
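Looking back at RegBit (lines 21-36), the idea is easier to see in a small stand-alone form. What follows is a minimal sketch written from the description above, not a verbatim copy of Bisqwit's struct: the name RegBitSketch is ours, and the assignment simply masks the input value.

```cpp
#include <cstdint>

typedef uint_least8_t u8;

// Hypothetical stand-in for RegBit: a field of `nbits` bits starting at
// bit `bitno` inside a register of type T.
template<unsigned bitno, unsigned nbits = 1, typename T = u8>
struct RegBitSketch
{
    T data;                               // the whole register
    enum { mask = (1u << nbits) - 1u };   // unshifted mask, e.g. 00000111 for nbits = 3

    // Assignment: clear the field, then OR in the masked and shifted value.
    template<typename T2>
    RegBitSketch& operator=(T2 val)
    {
        data = (data & ~(mask << bitno)) | ((val & mask) << bitno);
        return *this;
    }

    // Reading: shift the field down to bit 0 and mask everything else away.
    operator unsigned() const { return (data >> bitno) & mask; }

    // Increments are built on top of the read and the assignment,
    // so they wrap around inside the field.
    RegBitSketch& operator++()    { return *this = *this + 1; }
    unsigned      operator++(int) { unsigned r = *this; ++*this; return r; }
};
```

With a 3-bit field at bit 3, assigning 2 to a register holding 0xFF leaves 0xD7: the field bits change and the rest of the register is untouched. Incrementing a field that already holds 7 wraps it back to 0.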
Namespace IO - IO components manage visual output and control input using SDL
Namespace PPU - Components related to NES graphics (picture processing unit)
Namespace APU - Components related to NES audio (audio processing unit)
Namespace CPU - Components related to the NES CPU (Ricoh 2A03 / 6502)
Namespace GamePak - Components related to NES cartridge access
IO
The IO namespace couples SDL to the emulated NES hardware. Two key elements are the NTSC decoder and the NES handheld controller. Much of the decoder function is described at http://wiki.nesdev.com/w/index.php/NTSC_video
38 Namespace change to IO.
39 BLOCK START - namespace IO
40 Declares a pointer to an SDL_Surface
41 IO::Init() - responsible for bringing SDL into the application
42 BLOCK START - IO::Init, Initializes SDL
43 Initializes SDL for video use only
44 Also initializes SDL video -- This is unnecessary since line 43 forwards to this function by default. These two lines may have been used differently during development and simply left in this state.
45 Sets the SDL application to the default resolution generated by the PPU (256 x 240). The application framebuffer is returned and stored in our SDL_Surface pointer. NOTE: SDL_SetVideoMode was removed in SDL 2.0 and should be replaced with the SDL_Window abstraction, with the surface retrieved via SDL_GetWindowSurface.
46 Assigns your kernel's default handler for SIGINT. Typically this means Ctrl-C will end the process. There isn't a cross-platform guarantee...maybe some OSes ignore (SIG_IGN) SIGINT by default.
47 BLOCK END - IO::Init, Initialized SDL
48 BLANK
49 IO::PutPixel draws the input pixel at the given position on the screen. It includes the decoder to convert the (simulated) analog output of the PPU to the discrete RGB pixel data expected by the SDL_Surface backbuffer. PutPixel is called directly from the end of PPU::render_pixel, ensuring perfect synchronization between PPU and IO.
px and py are the input coordinates; pixel holds the color information in a 9-bit format with 4 color bits, 2 level bits, and 3 emphasis bits. Actual format is "eeellcccc"
50 BLOCK START - IO::PutPixel
51 COMMENT
52 COMMENT
53 COMMENT
54 COMMENT
55 COMMENT
56 Declares the palette that will hold all possible outputs from the NTSC YIQ to RGB conversion.
57 COMMENT
58 Checks if we need to generate the palette. If this is our first time using PutPixel (e.g. on system startup), then we have to generate the entire palette table. The emulator does not process each pixel on the fly during gameplay; we just look up values from the pre-computed palette cache.
NOTE ON THE NTSC VIDEO CALCULATIONS: Beginners really need not worry about this -- this would challenge folks who do DSP for a living. You need to know how NTSC signaling works and should be able to program a working encoder/decoder before attempting to take apart this implementation. Bisqwit took all possible shortcuts here, such as combining the waveform phase, level, and emphasis factors into a single lookup string. Finally, he confined all results to the integer domain. But to be fair, that's how hardware really works.
59 Outer loop of the NTSC generator used to produce palette offsets
60 2nd loop of the NTSC generator that separates channels and adds a source of noise
61 3rd loop of the NTSC generator that cycles through various emphasis states
62 Final loop of the NTSC generator that stores the perturbed color results
63 BLOCK START - Pre-compute RGB palette
64 COMMENT
65 This string serves as a lookup table of voltages except that it doesn't just represent the four lows and highs you typically expect. It also includes (multiplexes?) the effect of the emphasis bits on the voltages. Throw in the fact that it's represented as a character constant and it feels almost cryptographic.
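Before going further into the decoder, it may help to see the 9-bit "eeellcccc" pixel format from line 49 pulled apart on its own. This is only a sketch of the bit layout described there; the struct and function names are ours and do not appear in the emulator.

```cpp
// Hypothetical helper: splits a 9-bit "eeellcccc" pixel into its fields.
struct PixelFields
{
    unsigned color;     // cccc: 4 color (hue) bits
    unsigned level;     // ll:   2 level bits
    unsigned emphasis;  // eee:  3 emphasis bits
};

inline PixelFields decode_pixel(unsigned pixel)
{
    PixelFields f;
    f.color    = pixel & 0x0F;         // lowest four bits
    f.level    = (pixel >> 4) & 0x03;  // next two bits
    f.emphasis = (pixel >> 6) & 0x07;  // top three bits (same as pixel / 64 for a 9-bit value)
    return f;
}
```

For example, a pixel built as (5 << 6) | (2 << 4) | 3 decodes to emphasis 5, level 2, and color 3.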
66 Declares and initializes variables for YIQ channels to 0 for each palette calculation
67 Loops 12 times for pixel sampling
68 BLOCK START - NTSC pixel sampling loop
69 COMMENT
70 Chooses a source pixel to pull the color bits from. This is strictly to inject artifacts into the final result. Roughly one third to one half of the 12 samples choose a neighboring color, which produces a flicker.
71 COMMENT
72 Extracts the 4 color bits, 2 level bits, and 3 emphasis bits. Color is a simple modular ring. Emphasis is a 6-bit right shift written as a division. The level bits are a combination of both, with the result that only the discrete values 0, 4, 8, and 12 occur. These combinations are important so that the proper voltage levels can be indexed cleanly from the signal string. See what I mean about cryptographic?
73 COMMENT
74 Calculates luma by indexing into the signal string. The color and phase combine to determine a high or low voltage. Generally half (6 of 12 phases) alternate between high and low, but the waveform itself offsets with each pixel. The emphasis bits also shift the string index, with a slightly different periodicity compared to color. Finally, the level bits shift in steps of 4.
75 COMMENT
76 Additive luma (negative operands are possible). All 12 color samples are combined here.
77 Additive chroma with phase-shifted hue on every sample
78 The chroma channel orthogonal to the one on the previous line
79 BLOCK END - NTSC pixel sampling loop
80 COMMENT
81 Defines a lambda function to apply gamma correction at a hardcoded factor of 1.2
82 Defines a lambda function to clamp a value to a maximum of 255
83 COMMENT
84 Caches the red channel of the calculated pixel result after gamma correction and conversion from YIQ via matrix multiplication. Note that this is done in a different iteration than the subsequent channels.
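The gamma, clamp, and matrix steps on lines 81-86 follow a common pattern that can be sketched on its own. The version below is an illustration under stated assumptions, not the emulator's code: it uses floating-point inputs already scaled to roughly 0..1, the textbook NTSC YIQ to RGB coefficients, and the 1.2 exponent mentioned on line 81. The emulator works with its own integer scaling and constants, and the function names here are ours.

```cpp
#include <cmath>

// Gamma correction: negative inputs become 0, everything else is raised
// to the power 1.2 (the factor quoted on line 81).
inline float gammafix(float f)
{
    return f <= 0.f ? 0.f : std::pow(f, 1.2f);
}

// Clamp to a maximum of 255, as on line 82.
inline int clamp255(int v)
{
    return v > 255 ? 255 : v;
}

struct RGB { int r, g, b; };

// Assumed textbook YIQ -> RGB matrix; each channel is one row of the matrix
// applied to (y, i, q), then gamma-corrected, scaled, and clamped.
inline RGB yiq_to_rgb(float y, float i, float q)
{
    RGB out;
    out.r = clamp255(int(255 * gammafix(y + 0.956f * i + 0.621f * q)));
    out.g = clamp255(int(255 * gammafix(y - 0.272f * i - 0.647f * q)));
    out.b = clamp255(int(255 * gammafix(y - 1.106f * i + 1.703f * q)));
    return out;
}
```

A pure luma input (y = 1, i = q = 0) comes out as white, and anything brighter is held at 255 by the clamp rather than wrapping around.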
85 Caches the green channel after gamma and YIQ mmult
86 Caches the blue channel after gamma and YIQ mmult
87 BLOCK END - Pre-compute RGB palette
88 COMMENT
89 PutPixel writes the desired RGB pixel to the output SDL surface. During gameplay, this is the only line of consequence executed in PutPixel.
90 Records the most recent pixel
91 BLOCK END - IO::PutPixel
92 IO::FlushScanLine finalizes the current frame and copies the backbuffer to the front if we're on the final line. (Recall 240 lines per screen)
93 BLOCK START - IO::FlushScanLine
94 If we've finished the final line of the screen, swap the back buffer to the front. Note that SDL_Flip() has been removed from modern versions of SDL. The SDL 2.0 equivalent is SDL_UpdateWindowSurface(...)
95 BLOCK END - IO::FlushScanLine
96 BLANK
97 Initializes joystick status field bytes for player 1 and player 2 controllers. We have the current status, upcoming status, and a counter variable used as part of a mask scan. joy_next is a direct feed from the input file and is updated every tick in PPU::tick
98 IO::JoyStrobe is a typical 'strobe' function, which checks and records the current state of the joystick. Since both joysticks are fed into the joy_next variables every tick, a strobe simply copies joy_next to joy_current. It also resets the joypos counter
99 BLOCK START - IO::JoyStrobe
100 If the input argument is non-zero, capture the Player 1 controller state field into joy_current and reset the joypos counter
101 If the input argument is non-zero, capture the Player 2 controller state field into joy_current and reset the joypos counter
102 BLOCK END - IO::JoyStrobe
103 IO::JoyRead checks the last strobe from the joystick against the possible bitfield masks and returns true or false based on a match.
104 BLOCK START - IO::JoyRead
105 Defines the eight possible single-bit fields in a byte
106 Returns true/false against the match of a single mask. The joypos counter iterates through the possible masks across each call to JoyRead.
We don't actually see a loop in the emulator to check all possibilities, so either it's built into each game as part of its engine, or it hasn't been programmed.
107 BLOCK END - IO::JoyRead
108 BLOCK END - namespace IO
109 BLANK
GamePak
The GamePak namespace provides functions supporting the NES ROM bootloader, including allocating, mapping, and accessing memory pages. Much of this namespace is incomplete, especially mapper writes. If your game of choice doesn't work, it's probably due to lack of mapper support.
110 Namespace change to GamePak
111 BLOCK START - namespace GamePak
112 Declares vector containers for PRG ROM and 8KiB of CHR ROM that will be used to hold memory pages
113 Declares integer to hold the mapper id
114 Fixes page size of CHR ROM to 1KiB, then calculates total number of pages based on total memory size (8 Pages)
115 Fixes page size of PRG ROM to 8KiB, then calculates total number of pages based on total memory size (8 Pages)
116 Allocates 4KiB for four 1KiB nametables as NRAM and 8KiB of cartridge RAM as PRAM (used for the Save RAM region, see 179)
117 Pointer array to the base of the PRG ROM pages.
118 Pointer array to the base of the CHR ROM pages.
119 Pointer array to the base of each 1KiB nametable.
120 BLANK
121 Prepares a template based on number of pages, memory bank pointers, a vector container, and a size constraint per page/bank.
122 GamePak::SetPages constructs pointers to page base addresses within the memory vectors created earlier. The idea is that all actual data is stored within the vector and the memory banks provide pointers to successive "pages" within the vector. We do this for both PRG ROM (ROM) and CHR ROM (VRAM).
123 BLOCK START - GamePak::SetPages
124 Declares the loop that will set the page pointers in each bank to the underlying memory vector. This ensures that memory (V)banks are properly mapped for general use. v is set to an offset beyond the end of the vector, but isn't part of the loop invariant and is later wrapped back into the vector anyway.
The v definition could be reduced unless other functionality was intended for future use.
125 Defines p as the memory mapped page number in the banks
126 Loop terminates when all pages are mapped. Both tests should be equivalent, but are checked just in case...?
127 Iterates both the current page and the page byte offset
128 Sets the actual page pointer into the underlying memory vector
129 BLOCK END - GamePak::SetPages
130 Declares SetROM as a function pointer to the SetPages template using the PRG ROM as the underlying target
131 Declares SetVROM as a function pointer reference for the SetPages template using the CHR ROM as the underlying target
132 BLANK
133 GamePak::Access provides the interface to the memory pages and vectors defined earlier. It's one of the possible calls from MemAccess in the CPU namespace. Note that the game ROM has already been loaded into the respective memory by the time this function is used. Right now, this function is working for read access, but write access is still only a framework for the common mappers. The function returns the byte value at the given address.
134 BLOCK START - GamePak::Access
135 Checks for writes to the GamePak with mapper 7
136 BLOCK START - Write to Memory Mapper 7
137 Sets the memory bank pointers with a 32KiB window
138 Matches all nametable pointers to a selectable NRAM address based on the 5th bit of an input value. This is likely to support nametable mirroring specific to mapper 7.
139 BLOCK END - Write to Memory Mapper 7
140 Checks for writes to the GamePak with mapper 2
141 BLOCK START - Write to Memory Mapper 2
142 Sets the memory bank pointers with a 16KiB window
143 BLOCK END - Write to Memory Mapper 2
144 Checks for writes to the GamePak with mapper 3
145 BLOCK START - Write to Memory Mapper 3
146 Calls access to the same address to set the value.
This simulates a common bus conflict that exists in several mappers
147 Sets the memory bank pointers with an 8KiB window
148 BLOCK END - Write to Memory Mapper 3
149 Checks for writes to the GamePak with mapper 1. This is the most well-supported mapper in Bisqwit's emulator, but is still incomplete.
150 BLOCK START - Write to Memory Mapper 1
151 Declares and defines the four 5-bit registers used in the MMC1 chip. Register 0 asserts bits 2 and 3, which selects the low PRG ROM range (0x8000-0xBFFF) and 16KiB bank switching. This is the reset state of MMC1
152 Checks bit 7 (0x80, the most significant bit) of the input value, which resets the control register and forces a reconfiguration jump to line 157
153 Caches the input value's bit in the counter slot
154 Checks if the prefix-incremented counter is 5 -- triggers every 5 write attempts.
155 BLOCK START - Mapper 1 configuration
156 Caches and writes the value to a register based on the value of the top 2 address bits. This simulates associative memory by locking certain addresses to certain registers. The registers themselves are tied to specific memory blocks.
157 Conditional jump label
158 Clears the counter and the cache
159 Creates a 4x4 array of pattern matching bits for nametable management and mirroring
160 Iterates through the nametable pointers and sets their target memory addresses
161 Sets CHR ROM pointer table (does this work yet?)
162 Sets CHR ROM pointer table (does this work yet?)
163 Checks the 2nd and 3rd bits of the control register. Bit 2 selects the memory range while bit 3 selects the memory width.
164 BLOCK START - Memory bank switching
165 If the control register is x00xx or x01xx..
166 Map PRG ROM to a 32KiB bank starting at the low segment. Set value to the last register's bits 1, 2, 3 shifted down. (Not sure this is implemented)
167 End case
168 If the control register is x10xx..
169 Map PRG ROM to 16KiB at the low address range
170 Map PRG ROM to 16KiB at the high address range
171 End case
172 If the control register is x11xx..
173 Map PRG ROM to 16KiB at the low address range
174 Map PRG ROM to 16KiB at the high address range
175 End case
176 BLOCK END - Memory bank switching
177 BLOCK END - Mapper 1 configuration
178 BLOCK END - Write to Memory Mapper 1
179 If access is in the Save RAM region (0x6000-0x7FFF) then return the byte value remapped within PRAM
180 Return PRG ROM mapped by the input address
181 BLOCK END - GamePak::Access
182 Initializes memory banks to valid pointer values
183 BLOCK START - GamePak::Init
184 Sets the CHR ROM Vbank pointers to a default value
185 Sets the PRG ROM bank pointers to a default value
186 BLOCK END - GamePak::Init
187 BLOCK END - namespace GamePak
188 BLANK
CPU
189 Namespace change to CPU
190 BLOCK START - namespace CPU
191 Declares the standard 2KiB of NES RAM accessible by the CPU
192 Sets variables for managing CPU contexts and triggers: reset, non-maskable interrupt disable, edge state comparator, and current interrupt state. These are specifically used for interrupt control during Op loops and certain external events within the APU. Most of these variables are already part of the processor status register (defined later on line 775). The difference is that the processor registers are (generally) controlled by the running program, and these are controlled by the kernel.
193 BLANK
194 Templated function declaration for accessing memory, defined later on line 742. Note the default argument used for value writes and ignored during reads. This access function will be wrapped on the next 2 lines.
195 CPU::RB reads a byte at a given address. This function is used during user program ops and wraps the MemAccess template.
196 CPU::WB writes the provided value to the given address, again by wrapping the MemAccess template function.
197 Declares a CPU tick, which is defined on line 667
198 BLOCK END - namespace CPU
199 BLANK
PPU
200 Namespace change to PPU.
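Before moving on to the PPU, a quick look back at the CPU wrappers on lines 194-196. The pattern of one templated access function wrapped by a reader and a writer can be sketched as below. This is a simplified illustration: the namespace name is ours, and the body only covers the 2KiB of RAM (mirrored with a mask), whereas the real MemAccess also dispatches to the PPU, APU, and GamePak.

```cpp
#include <cstdint>

typedef uint_least8_t  u8;
typedef uint_least16_t u16;

namespace CPUSketch   // hypothetical namespace so nothing here is mistaken for the emulator's code
{
    u8 RAM[0x800];    // the standard 2KiB of CPU RAM

    // One function handles both directions; the template argument picks which.
    // The default value is used by reads, which ignore it.
    template<bool write>
    u8 MemAccess(u16 addr, u8 v = 0)
    {
        u8& r = RAM[addr & 0x7FF];  // assumption: only RAM is modeled, mirrored every 2KiB
        if (write) r = v;
        return r;
    }

    // Thin wrappers so calling code never spells out the template argument.
    u8 RB(u16 addr)        { return MemAccess<false>(addr); }
    u8 WB(u16 addr, u8 v)  { return MemAccess<true>(addr, v); }
}
```

Writing 0x42 to address 0x0001 and then reading 0x0801 returns 0x42 in this sketch, because both addresses mask down to the same RAM cell.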
201 BLOCK START - namespace PPU
202 Union regtype definition / reg declaration that holds the fields for PPU registers at addresses 0x2000 through 0x2003.
203 BLOCK START - union regtype
204 u32 default union member initialized during startup
205 COMMENT
206 Declares RegBit accessors for all elements of PPUCTRL (0x2000), PPUMASK (0x2001), and PPUSTATUS (0x2002)
207 Declares RegBit accessors for the nametable address bits in PPUCTRL, the greyscale toggle bit in PPUMASK, and the sprite overflow flag in PPUSTATUS
208 Declares RegBit accessors for the VRAM increment method in PPUCTRL, the left background toggle bit in PPUMASK, and the sprite zero hit bit in PPUSTATUS
209 Declares RegBit accessors for the sprite pattern table address bit in PPUCTRL, the left sprite toggle bit in PPUMASK, and the vertical blanking detection bit in PPUSTATUS
210 Declares RegBit accessors for the background pattern table address bit in PPUCTRL, and the show background bit in PPUMASK.
211 Declares RegBit accessors for the sprite size flag in PPUCTRL, the show sprite bit in PPUMASK, and the universal accessor for the OAMADDR register (0x2003).
212 Declares RegBit accessors for the master/slave bit in PPUCTRL, a 2-bit accessor for the background and sprites in PPUMASK, and the single page data offset in OAMADDR
213 Declares RegBit accessors for the NMI vertical blanking toggle in PPUCTRL, the emphasis RGB bits in PPUMASK, and the OAM page index in OAMADDR
214 BLOCK END - union regtype
215 COMMENT
216 Declares memory buffers for palette data and one OAM page
217 COMMENT
218 Defines the structure of a single OAM entry (sprite) then declares two sets of eight OAM entries to buffer sprites for scanline processing. The 8 element limit mirrors the eight sprites per scanline limit of the NES. Struct members include the sprite OAM index reference, the top y position, the tile index, OAM attributes (palette, flip, etc), the x position, and the bitmap pattern.
The first and last members of this structure aren't part of the original OAM data and are used for quick reference during scanline processing
219 BLANK
220 Defines a union to manage access to scroll and VRAM writes. These bitfields amalgamate registers PPUSCROLL (0x2005) and PPUADDR (0x2006). Note that usage in this emulator doesn't directly match the port bits in the NES because we have to store both writes separately for later reference. There isn't real hardware backing the register so we separate the bits as the PPU would internally.
221 BLOCK START - union scrolltype
222 Declares accessor to VRAM
223 Declares accessor to the left scroll position
224 Declares accessor to the fine x scroll position. This is the offset within a single tile. A tile is 8 pixels wide so we can address individual pixels with 3 fine bits.
225 Declares accessor to the coarse x scroll position. This identifies the starting tile at the left, when combined with the nametable address.
226 Declares accessor to the coarse y scroll position.
227 Declares accessor to both bits of the nametable index.
228 Declares accessor to the first nametable index bit, associated with horizontal scroll.
229 Declares accessor to the second nametable index bit, associated with vertical scroll.
230 Declares accessor to the fine y scroll position.
231 Declares accessor to the first VRAM write (0x2006)
232 Declares accessor to the second VRAM write (0x2006)
233 BLOCK END - union scrolltype
234 BLANK
235 Declares variables used in PPU output, including the pattern table address, indices for the two OAM management structs, and a temporary sprite reference
236 Declares 16 bit variables for the tile attributes, tile patterns, and addresses
237 Declares shift registers for the background patterns and attributes
238 BLANK
239 Declares and initializes internal control variables for scanline, boundaries, state, and counters.
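The scroll fields from lines 220-233 are easier to picture as one packed address. The sketch below uses the 15-bit VRAM address layout documented on the NESdev wiki (yyy NN YYYYY XXXXX); it is an assumption for illustration and is not the bit placement of Bisqwit's scrolltype union, which also keeps the fine x bits in the same word. The struct and function names are ours.

```cpp
// Hypothetical view of a 15-bit PPU VRAM address in the NESdev layout:
//   bits 0-4   coarse x (tile column, 0-31)
//   bits 5-9   coarse y (tile row, 0-31)
//   bits 10-11 nametable index (bit 10 horizontal, bit 11 vertical)
//   bits 12-14 fine y (pixel row within the tile, 0-7)
struct ScrollFields
{
    unsigned xcoarse, ycoarse, nametable, yfine;
};

inline ScrollFields decode_vram_addr(unsigned v)
{
    ScrollFields f;
    f.xcoarse   =  v        & 0x1F;
    f.ycoarse   = (v >> 5)  & 0x1F;
    f.nametable = (v >> 10) & 0x03;
    f.yfine     = (v >> 12) & 0x07;
    return f;
}
```

Fine x never appears in this address because it selects a pixel inside the tile that has already been fetched, which is why it gets its own 3-bit accessor on line 224.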
240 Declares and initializes variables for internal communication including a byte buffer, a bus state, and a bus timer (for simulating contention)
241 Declares and initializes flags for managing the scanline width alternating between 340 and 341 and for toggling memory access for double-write registers
242 BLANK
243 COMMENT
244 PPU::mmap returns references to PPU memory elements by translating raw addresses into the local memory structures
245 BLOCK START - PPU::mmap
246 Masks the lower 14 bits of the input memory address. Only 16KiB of the 64KiB of addressable PPU memory is physically backed. The apparent effect is that every 16KiB block of address space mirrors the base 16KiB.
247 Returns palette references that live between 0x3F00 and 0x3F1F. Any access above 0x3F1F is mirrored into the 32 byte palette segment.
248 Returns references within the ROM pattern table, which occupies the lower 8KiB of PPU memory.
249 The second element of the pattern table from the line above. Reads bytes within the target page
250 Returns a nametable reference by using bits 11, 12 (and 13?) as the index, then offsetting the appropriate amount into that table. The bit 13 mask here may be an issue because including it means that nametable mirroring might not be in effect for the range 0x3000-0x3EFF.
251 BLOCK END - PPU::mmap
252 COMMENT
253 PPU::Access handles reads and writes to the PPU registers memory mapped to addresses 0x2000-0x2007. This IO interface is normally called from MemAccess and wrapped by CPU::RB and CPU::WB.
254 BLOCK START - PPU::Access
255 We start off by defining a lambda function that returns its input value and sets the bus timer. This is used to retain bus values and set the bus decay timer to 77777, which ticks down for a bit less than an entire frame
256 Set the eventual result to the input value, which may be modified before return
257 If this is a write access, call the lambda function. This puts the value "on the line" so to speak
258 Check the port we're writing to.
The input index is already masked to the eight possible ports.
259 BLOCK START - switch on I/O to one of PPU's 8 MMIO registers
260 CASE: If writing to PPUCTRL (0x2000) then set the entire register to the input value. Update the PPUSCROLL register to reflect possibly new nametable selection bits from PPUCTRL
261 CASE: If writing to PPUMASK (0x2001) then set the entire register to the input value.
262 CASE: Block all writes to PPUSTATUS (0x2002) because this is a read-only register.
263 If we're reading then we need to capture the 5 least significant bits from the last write and the highest three status register bits. Only the top three bits have significance (short of hacks on retained values)
264 Reading the register resets the most significant bit (vertical blanking)
265 It also resets the address selection toggle used during writes to 0x2005 and 0x2006
266 Check if we're in the middle of vertical retrace to start the next frame.
267 If so, reads also cancel the retrace. The effect here is that the next frame starts immediately.
268 End case for I/O on 0x2002
269 CASE: If we're writing to OAMADDR (0x2003) then the value represents the OAM entry (sprite) that we're interested in. This is often used for sprite management between scanline renders. It can also trigger DMA on 0x4014, if supported.
270 CASE: If writing to OAMDATA (0x2004), then set the immediate OAM RAM value to the input value and increment the start address within OAMADDR (0x2003)
271 Otherwise, we must be reading from OAMDATA so pull the data from the OAM through the bus. Bits 2, 3, and 4 do not exist in the PPU so those should always be masked away if we're accessing byte 2.
272 End case of read from OAMDATA
273 CASE: If accessing PPUSCROLL (0x2005), then block reads.
274 Since 0x2005 is a double write register, we need to check which write we're on.
If this is the second write then set the y scroll position
275 Otherwise, this is the first write and we set the x scroll position (coarse and fine)
276 Toggle the latch so the next write is treated as the other half of the pair.
277 End case of write to PPUSCROLL
278 CASE: If accessing PPUADDR (0x2006) then block reads.
279 Otherwise, we want to write a start address (or single target address) to this register. First we check the memory target latch to see if this is the second write. If so, then we write the target low address
280 Otherwise, we write the target high address first, masking out the top two bits
281 Toggle the latch to prepare for the second write
282 End the case
283 CASE: Accessing PPUDATA (0x2007)
284 The result is preemptively set to the contents of an internally preserved read buffer.
285 We get the reference pointed to by the current VRAM address set during the double-write to 0x2006
286 If we're writing, we overwrite our result with the input value
287 Otherwise we're reading and we want to check if we're accessing the palette
288 If it is the palette, then we want to pull the result from the read buffer. Note that the first read would include the previously addressed value because on the next line...
289 Set the read buffer to the target address. You may have to double read to get the desired result
290 In all cases, we want to pull the read/written result through the bus.
291 Check bit 2 of PPUCTRL to add 1 or 32 to the address register. Adding 1 references the next tile to the right while 32 references the tile one row down.
292 End case for accessing PPUDATA (0x2007)
293 BLOCK END - switch on I/O to one of PPU's 8 MMIO registers
294 Return the result of our port I/O
295 BLOCK END - PPU::Access
REFERENCE FOR PPU RENDERING: This is fairly complicated so it's best to follow along the code with a more descriptive reference of what's going on.
For instance: https://wiki.nesdev.com/w/index.php/PPU_rendering
296 PPU::rendering_tick manages the pixel pipeline which includes loading tiles, loading attributes, then combining and prioritizing sprites to choose an output pixel. Decisions only change every 8 pixels so it interleaves various tasks based on the current rendering position on the scanline. This function is called from the main PPU::tick when background or sprite processing is enabled and the current scanline is visible. Function entry point from line 482.
297 BLOCK START - PPU::rendering_tick
298 Sets a boolean flag that is asserted when the PPU is in positions 0-255 or 320-335. We'll need to know if we're in this range for several operations below. This check is done by using a single bit to mask a field by shifting on a 16 unit basis. If 0x10FFFF = 0001 0000 1111 1111 1111 1111 then a bit left shifted from position 0 every 16 pixels will be true in this desired range. We use the 'u' integer suffix to avoid complications with sign bits during shifting
299 BLANK
300 COMMENT
301 Switches on the PPU rendering x position to interleave rendering tasks across an eight pixel span.
302 BLOCK START - Switch on every 8 pixels
303 CASE: If we're on the 3rd pixel (from 0) then we should point to the proper attribute table at the end of each nametable. This is where the palette data is stored.
304 The first attribute table starts at 0x23C0 and each subsequent table is 0x400 bytes later (the size of a nametable). We then offset into the attribute table by the X and Y. We extract the coarse positions by right shifting the overall X & Y positions by 2 (dividing by 4). The Y position is multiplied by 8 since that's the width of the row.
305 If we're currently in the rendering phase or next line prep phase then we're finished. Otherwise we fall through to the next case.
306 CASE: If we're on the first pixel then we should point to the nametable.
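The bitmask trick from line 298 can be tried in isolation. This is a minimal sketch of the technique as described there; the function name is ours, and it assumes x stays below 512 so the shift count stays under 32.

```cpp
// One bit per 16-pixel group: bits 0-15 cover x = 0..255 and bit 20 covers
// x = 320..335. Shifting a single 1 by (x / 16) and ANDing with the mask
// tests membership in either range with no comparisons.
inline bool in_tile_decode_range(unsigned x)
{
    return (0x10FFFFu & (1u << (x / 16))) != 0;  // 'u' suffix keeps the shift unsigned
}
```

Positions 0, 255, 320, and 335 all test true, while 256, 319, and 336 test false, which matches the two ranges named above.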
307 We offset from the base 0x2000 by the current location
308 COMMENT
309 If this is the very first pixel, then reset the sprite data to start fresh, including the OAM address if sprites are active
310 If background is deactivated then end the case
311 COMMENT
312 On the first scanline, set the PPU VRAM address to match the current scroll address. This ensures the screen is scroll aligned on every frame.
313 At the end of the visible line we update tile positions.
314 Also update the horizontal nametable
315 Reset the sprite rendering index
316 End the case for the first pixel
317 CASE: On the 2nd pixel
318 If we're on the pre-render scanline with active backgrounds and parity asserted, then the scanline should end after 340 pixels
319 COMMENT
320 Find the current pattern table position by selecting the background (pattern) table base and multiplying the tile index by 16 (a left shift by 4) before adding back the vertical fine position.
321 If we're not currently rendering or fetching the next line then end the case
322 COMMENT
323 COMMENT
324 Loads the current tile's pattern bits into a shift register for later pixel color calculation
325 Loads the current tile's attributes into a shift register
326 End the case
327 If we're accessing the 4th pixel on the scan line...
328 COMMENT
329 And if we're currently rendering or preparing to render the next line then we need to update the current attributes and possibly move to the next tile
330 BLOCK START - Update attributes and tile coarse position
331 Pull the tile attributes for the current tile. ioaddr is already pointed to the attribute tile, so we just need to shift that address to match the attribute tile index. For example: A PPU (x,y) of 155,34 would be at ioaddr 0x23cd. 243,100 would be at 0x27d8. 51,107 would be at 0x23da. We can't generalize what mmap(ioaddr) would return because it's different depending on the game tile.
But depending on the quadrant, we would end up shifting that value by 0, 2, 4, or 6 bits and then masking it to produce an attribute value between 0-3 332 COMMENT 333 If the xcoarse overflows, then we're on the next tile so recalculate the nametable 334 COMMENT 335 Check if we're at the edge of the line and yfine overflows. ycoarse doesn't use its full range of values, so check for the final row at 30. 336 Reset the ycoarse value and recalculate vertical nametable 337 BLOCK END - Update attributes and tile coarse position 338 If we're not currently rendering then check if we have more sprites yet to be rendered instead of background pixels. If there are still sprites to render, then we should pre-fetch sprites for the next scanline. 339 BLOCK START - Processing sprite pixels 340 COMMENT 341 Define a pointer to the next available rendering slot in OAM3 - the most forward point in our sprite rendering pipeline 342 Transfer the sprite from the same slot in OAM2. OAM2 contents have already been vetted by upcoming code starting at line 366 343 Define a variable showing how far vertically we've progressed in this sprite's drawing. Intuitively a negative result means we aren't ready to draw and a result greater than the sprite size flag (8 or 16) means the sprite has already finished. In reality, only valid results should appear because non-candidate sprites wouldn't have entered the pipeline due to checks on line 378 344 If the sprite is vertically flipped then invert the value of the y attribute within the mask basis. For those who don't know this trick, XORing a value with an all-ones mask inverts the value within the width of the mask. For example: if we XOR a decimal value of 2 with a mask of 3 bits then the result becomes 5. If the decimal value was 185, it becomes 190. 345 Now we find the pattern table entry in three steps, which vary depending on the chosen sprite size. The large sprite size uses the first index bit to choose between the two main pattern tables (0x0000 or 0x1000).
The small size uses bit 3 of PPUCTRL (accessed as SPaddr here) 346 Now we go to the actual tile offset in the table by masking out the tile index bits. Again, large sprites ignore the least significant bits. 347 Finally, we find the row within the tile and choose the plane. If the last few lines were confusing, we're simply trying to break apart the pattern table as described here: https://wiki.nesdev.com/w/index.php/PPU_pattern_tables 348 BLOCK END - Processing sprite pixels 349 End the case on choosing BG attributes and SP patterns 350 COMMENT 351 CASE: Choosing the pattern table for the tile. This could either be from the background on line 320 or the sprite on line 347. 352 Send the pattern address to mmap to return the pattern value from the gamepak CHR ROM. 353 End case for selecting a pattern table. 354 CASE: Set up the OAM pattern so that the two byte planes can be read in sequentially during rendering. Recall that we have to bounce back and forth between two bytes to determine the final pixel color. This routine allows us to simply shift one and mask on line 422. 355 Combine the two separate pattern bytes 356 Swap the middle two nibbles 357 Keep 1st, 4th, 5th, and 8th bit pairs and swap remaining pair neighbors 358 Keep nibble bookends and swap central bits 359 Save our work: 0123 4567 89AB CDEF ---> 0819 2A3B 4C5D 6E7F Now each bit can be read sequentially with its pair that used to be 8 bits later. 360 COMMENT 361 Check that we're not currently rendering and we still have sprites to process 362 Set the final output OAM pattern to the newly calculated value and increment the render count index 363 364 BLOCK END - Switch on every 8 pixels 365 COMMENT 366 Bring sprites in to the rendering pipeline. Check the current line position and use a masked, post-incremented OAMaddr to create a control sequence. Note that this is actually a multipurpose mechanism used to not only determine the action, but to populate a temporary OAM struct below.
Case 0 populates the y element, case 1 on the next byte populates the index, 2 is the attr, etc. The struct was defined back on line 218. 367 BLOCK START - Switching on line position for sprite pipeline management 368 CASE: Focus on the next sprite to consider 369 COMMENT 370 Set the temporary sprite to the current OAM for consideration to enter the rendering pipeline 371 End default case 372 CASE: Test current sprite for validity 373 If we've tested all the sprites then reset the target register to begin again and end the case 374 Advance our index in to the OAM table 375 If we still have room in our pipeline (8 sprites max), then set the next available slot to the temporary slot's y. Note that since we're on case 0, sprtemp should be pointing to the y element in the original OAM. 376 If we still have room in the pipeline, then set the sprite index to the masked OAMaddr (OAMindex) 377 Begin a code block for testing the candidate sprite's y positions. Set the first temporary y value to the current sprite's y and set another temp y value offset by the expected sprite size via the PPUCTRL setting. 378 Check if the current scanline does not fall within the valid range 379 Skip by setting OAMaddr to the next sprite address, note that sprite index 2 hardcodes its failure case memory address. (TODO: why?
I remember this being an issue...like a hidden or unusable sprite in certain emulations) 380 End case for OAM testing 381 CASE: We have a valid sprite, continue populating OAM pipeline 382 Set the index byte to the index byte now pointed to by sprtmp 383 End case 384 CASE: We have a valid sprite, continue populating OAM pipeline 385 Set the attribute table reference to the sprtmp reference 386 End case 387 CASE: Finalize processing new entry in to the sprite render pipeline 388 Match the x member from original OAM to pipeline OAM 389 Test for 8-sprite limit and set overflow if required 390 Again, hardcode the sprite index 2 subsequent address 391 End case 392 BLOCK END - Switching on line position for sprite pipeline management 393 BLOCK END - PPU::rendering_tick 394 PPU::render_pixel works out the actual pixel values for backgrounds and sprites and sends them to IO::PutPixel to be drawn to our SDL-implemented backbuffer 395 BLOCK START - PPU::render_pixel 396 Sets a variable to true when we're rendering within 8 pixels of the horizontal screen edge on either side. (In 8-bit arithmetic 250+8 wraps to 2, so the check is true) 397 Tests if rendering for backgrounds is active. Recall that PPUMASK has a bit specifically for drawing borders. We have to make sure that BGs are enabled overall as well as during the edge case. 398 Same test for rendering sprites as the line above for backgrounds. 399 BLANK 400 COMMENT 401 Sets the position within a tile. Recall that if the screen starts partially within a tile, then we need to offset each tile in the row. The tile boundaries no longer occur at x positions 0, 8, 16, etc. We have the xfine attribute to help us with that. Get xfine and then calculate the shift required in to the pattern shift register from line 324. 402 BLANK 403 Sets temporary variables for the pattern and attributes that we'll calculate shortly. 404 Check if background drawing is enabled 405 BLOCK START - Calculate background 406 Shift the saved pattern by 2 bits per pixel that we're into the target tile.
Mask the lower 2 bits to determine the color. 0 means transparent 407 Do the same shift procedure to the attribute shift registers to isolate the quadrant 408 BLOCK END - Calculate background 409 If backgrounds aren't enabled then check if VRAM is pointed to the palette and that backgrounds are intentionally off 410 If so, then set the pixel to the target palette...possibly the universal background color. 411 BLANK 412 COMMENT 413 Check if sprite drawing is enabled 414 If enabled, loop through all of the sprites processed for rendering on the scanline. (At most 8) 415 BLOCK START - Scanline rendering loop 416 Create a pointer to the current OAM object 417 COMMENT 418 Compare the current x location with the sprite's starting x location 419 If we're not within eight pixels, then skip this sprite (skips loop iteration) 420 COMMENT 421 Otherwise, check if the sprite is inverted and flip relative position along x. 422 Get a temporary pixel by shifting to the proper position in the sprite's pattern table and mask for the correct pixel value. 423 Pixel values of 0 mean transparent so skip this sprite for now 424 COMMENT 425 If we're about to draw the first sprite on the frame, then trigger the PPU register indicating that we're now drawing (Sprite 0 hit) 426 COMMENT 427 Check that the sprite priority is clear and that there is a visible pixel 428 BLOCK START - Set the pixel to draw 429 Mask the OAM attributes for palette data and add 4 to push them in to a valid range of 4 to 7 430 Confirm that our temporary pixel is the final pixel 431 BLOCK END - Set the pixel to draw 432 COMMENT 433 We want the first valid sprite to be drawn. The OAM prioritizes as first-come first-served 434 BLOCK END - Scanline rendering loop 435 Index the current pixel with the palette using the palette index and pixel value. If PPUMASK is set to greyscale then mask out the lower 4 hue bits 436 Send the pixel to IO::PutPixel along with the emphasis bits.
See line 49 437 BLOCK END - PPU::render_pixel 438 BLANK 439 COMMENT 440 COMMENT 441 COMMENT 442 COMMENT 443 COMMENT 444 COMMENT 445 COMMENT 446 COMMENT 447 COMMENT 448 COMMENT 449 COMMENT 450 COMMENT 451 COMMENT 452 COMMENT 453 COMMENT 454 COMMENT 455 COMMENT 456 COMMENT 457 COMMENT 458 COMMENT 459 COMMENT 460 COMMENT 461 COMMENT 462 COMMENT 463 COMMENT 464 COMMENT 465 COMMENT 466 PPU::tick is the main loop of the PPU. It manages all of the drawing actions as we step through each scanline. The general flow is to check the scanline position and perform necessary work. The type of work changes based on certain states like vertical blanking and position within the scanline. 467 BLOCK START - PPU::tick 468 COMMENT 469 Check the current vertical blanking state 470 BLOCK START - Switch on vertical blanking state 471 CASE: If we've just started a new frame (top left of the screen), then we clear the PPUSTATUS register. 472 CASE: If we're currently vertically blanking (current visible frame is finished), then set the status register to reflect it 473 CASE: If we're rendering normally, then we have to check if we've just entered vertical blanking and trigger the CPU interrupt accordingly 474 BLOCK END - Switch on vertical blanking state 475 Adjust the vertical blanking state if we're outside of regular drawing mode. This inserts some delay when mode swapping. 476 Decrements the bus timer, which eventually clears the bus. Recall that every IO access "holds" the last transmitted variable to simulate contention. 477 BLANK 478 COMMENT 479 Check if we're rendering the current frame 480 BLOCK START - Rendering the current frame 481 COMMENT 482 If rendering is currently active for either the background or sprites, then process scanline updates in PPU::rendering_tick 483 If we're in the visible part of the scanline then render the current pixel in PPU::render_pixel 484 BLOCK END - Rendering the current frame 485 BLANK 486 COMMENT 487 Increment the cycle counter modulo 4.
This is used as the offset value in PutPixel for choosing from phase-shifted palette tables 488 If we've reached the end of the scanline... 489 BLOCK START - End of scanline 490 COMMENT 491 Flushes scanline. If the visible frame just completed then the backbuffer is swapped to front in SDL 492 Sets scanline variable to maximum value 493 Resets PPU x position to 0 to prepare for the next scanline 494 COMMENT 495 Checks (and increments) the scanline for lines that require special procedures 496 BLOCK START - Switch on special scanlines 497 CASE: We're on the very last scanline 498 Set the scanline value to the pre-render line (-1) 499 Toggle the line parity 500 COMMENT 501 Reset the blanking state to indicate vertical retrace that ultimately clears and readies us for a new frame 502 End case for last scanline 503 CASE: The final visible frame is drawn and we're now vertical blanking 504 COMMENT 505 COMMENT 506 COMMENT 507 Opens the file that this emulator uses to queue user input commands 508 If the file is ready to go... 509 BLOCK START - Handle valid input file 510 Sets a variable to hold control state 511 Check if we're at the start of the user input file 512 BLOCK START - Handle beginning of user input file 513 If we're on the first frame, then seek to byte 5. We need to get the control state stored there. 514 Read in byte 5 to the control state variable 515 Move file pointer to the beginning of input stream at byte 0x90 516 BLOCK END - Handle beginning of user input file 517 If Player 1 is active, then read in the next input byte. If we've reached the end of the file then there is no input. 518 If Player 2 is active, then read in the next input byte. If we've reached the end of the file then there is no input. 519 BLOCK END - Handle valid input file 520 COMMENT 521 Set the Vertical Blanking State variable to indicate that we are now blanking.
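NOTE ON THE END-OF-SCANLINE SWITCH (lines 488-521): The wrap logic can be sketched on its own, away from the input handling. This is a simplified illustration, not the emulator's code. The names ScanPos, FrameEvent, and advance are invented here, and the scanline numbers 241 (start of vertical blanking) and 261 (wrap to the pre-render line) are assumptions taken from standard NTSC PPU timing rather than from the text above.

```cpp
// Illustrative names; not the emulator's own identifiers.
enum class FrameEvent { None, EnterVBlank, FrameWrap };

struct ScanPos {
    int x;         // position on the scanline
    int scanline;  // -1 is the pre-render line
};

// Advance one PPU cycle and report whether a special scanline was entered.
// scanline_end is the number of positions on the current scanline.
FrameEvent advance(ScanPos& p, int scanline_end = 341)
{
    if (++p.x < scanline_end) return FrameEvent::None;  // still on this line
    p.x = 0;                                            // start the next line
    switch (++p.scanline) {
        case 261:                                       // past the last line (assumed NTSC value)
            p.scanline = -1;                            // back to the pre-render line
            return FrameEvent::FrameWrap;
        case 241:                                       // visible frame is done (assumed NTSC value)
            return FrameEvent::EnterVBlank;
        default:
            return FrameEvent::None;
    }
}
```

The real switch also toggles the line parity on the wrap and polls the input file when blanking starts; those side effects would hang off the two non-default cases.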
522 BLOCK END - Switch on special scanlines 523 BLOCK END - End of scanline 524 BLOCK END - PPU::tick 525 BLOCK END - namespace PPU 526 BLANK APU 527 Namespace change to APU 528 BLOCK START - namespace APU 529 The APU defines LengthCounters as a set of constant values indexed by 5 bits. Audio channels set their own length counters based off of these values and count down each tick. Channels go silent when the length counter expires. 530 The second set of length counter values 531 The APU defines 16 Noise Periods used in the noise channel to set frequencies. 532 The APU defines 16 Delta Modulation periods that affect channel frequency. 533 BLANK 534 Declares and initializes flags for the five-cycle divider, interrupt management, and five channel status flags 535 Declares and initializes values for a periodic interrupt and the DMC channel interrupt. 536 Defines a count function that decrements a value, possibly resetting it if it falls below zero. count returns true if it did reset 537 BLANK 538 Defines a generic audio channel.
The APU has five of these 539 BLOCK START - struct channel definition 540 Each channel has a length counter to control length of an audio sequence, a linear counter for triangle waves, an address for memory IO, and an envelope variable for volume control 541 Each channel also defines audio specific variables including a sweep delay for recalculation timing, an envelope delay for decay control, a hold variable that has two uses for the noise channel and the DMC channel, a phase which cycles through channel sample points, and finally a level variable that holds the output voltage sample 542 Define a union for APU registers using RegBit accessors just like the PPU 543 BLOCK START - Union reg definition for APU registers 544 COMMENT 545 Declares RegBit accessors for three 32-bit registers that amalgamate the 8-bit APU registers at the following addresses: 0x4000-0x4002, 0x4004-0x4006, 0x400A, 0x400C, 0x400E, 0x4012, and 0x4013 546 Declares RegBit accessors for the duty cycle, the sweep shift, and the noise frequency.
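NOTE ON AMALGAMATED REGISTERS (lines 545-546): Packing several 8-bit registers in to one wide word and then reading fields out by position is the same idea as RegBit, just spelled out by hand. The sketch below is an illustration under assumptions: pack3 and field are hypothetical helpers, and the field positions used in the usage note (duty cycle in bits 6-7 of the first byte, sweep shift in bits 0-2 of the second byte) follow the pulse channel register layout documented on the NESdev wiki, not a reading of the emulator's declarations.

```cpp
#include <cstdint>

// Pack three 8-bit register writes in to one 32-bit word,
// lowest address in the low byte. Hypothetical helper.
constexpr std::uint32_t pack3(std::uint8_t r0, std::uint8_t r1, std::uint8_t r2)
{
    return std::uint32_t(r0) | (std::uint32_t(r1) << 8) | (std::uint32_t(r2) << 16);
}

// Extract `width` bits starting at bit `pos`. Assumes width < 32.
constexpr std::uint32_t field(std::uint32_t reg, unsigned pos, unsigned width)
{
    return (reg >> pos) & ((1u << width) - 1u);
}
```

With writes of 0xBF to the first register and 0x9A to the second, field(pack3(0xBF, 0x9A, 0x00), 6, 2) reads the duty cycle and field(pack3(0xBF, 0x9A, 0x00), 8, 3) reads the sweep shift; the second byte's fields simply start at bit 8.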
547 Declares RegBit accessors for the envelope decay enable, the sweep direction, and the noise mode 548 Declares RegBit accessors for envelope decay rate, the sweep period, and the wavelength representation (used differently in multiple channels) 549 Declares RegBit accessors for envelope decay loop enable, and the sweep enable 550 Declares RegBit accessors for constant volume, Pulse code sample length, and declares a 4th 32-bit register that combines APU registers at 0x4003, 0x4007, 0x400B, 0x400F, and 0x4010 551 Declares RegBit accessors for disabling length counters, and the length counter init index 552 Declares RegBit accessors for the linear counter Init value, and the channel loop bit 553 Declares RegBit accessors for disabling the linear counter decrement, and the IRQ enable bit 554 BLOCK END - union reg definition for APU registers 555 BLANK 556 COMMENT 557 Function template for tick based on the target channel number (0,1 = Pulse channel/square 2 = Triangle wave, 3 = Noise, 4 = Delta Modulation) 558 APU::channel::tick defines channel specific actions that happen during each APU tick. This is invoked within APU::tick by the macro defined on line 714. IMPORTANT: The return value of this tick function is a voltage reference out of the channel at the time. The values are normalized at the end of the APU::tick before being sent to the linux aplay output 559 BLOCK START - APU::channel::tick 560 Defines a reference to the current channel 561 Checks if the channel is disabled and, if so, returns the default silence level, which is 8 for all channels except the DMC, which is 64. 562 Defines a temporary wavelength holder based on WaveLength register and channel type 563 The noise channel wavelength is taken directly from the constant NoisePeriods indexed by the NoiseFreq register 564 The volume is based on the length_counter, the fixed volume control, and the current envelope volume. A zero length counter means the channel is silent.
The fixed volume is always used if the decay is disabled, otherwise use the envelope volume, which should match the prior tick 565 COMMENT 566 Define a variable referencing the channel signal level. This variable will be manipulated during the tick and the value retained as the original level variable between ticks 567 Returns the current signal level unless the wave counter is null. The counter is reset to the temporary wavelength and channel updating proceeds 568 Switch the tick update based on channel 569 BLOCK START - Switch on channel type 570 CASE: The Pulse channel / square wave 571 Returns channel silence if the channel is too silent 572 Masks a waveform based on the phase and the duty cycle. If the masked bit returns asserted, then output the channel volume, otherwise output nothing. This waveform appears to be an analog of 3-bit PCM spread across 32-bits. Of the four duty cycle choices, the output time is roughly (12.5%, 25%, 50%, and 75%) 573 BLANK 574 CASE: The triangle wave generator 575 If the length counter and linear counter are active with a meaningful waveform present, then increment the phase. Return 576 Uses bit twiddling to turn a counter (phase) in to a stepper. At phase zero, output is 0 and increases one step for every phase until 15. Then it returns to zero in single steps. 577 BLANK 578 CASE: The Linear feedback shift register. This function may not seem clear at first, but the hardware operation is described here: https://wiki.nesdev.com/w/index.php/APU_Noise 579 If bit 0 (hold variable) is not set, then assert it 580 Right shift hold and set it with the... 581 ..XOR'd bit zero or bit 6 depending on the mode flag and an asserted bit 14 582 If bit 0 is set, then return 0, otherwise return the current envelope 583 BLANK 584 CASE: The delta modulation channel - note the overloaded use of variables used differently in other channels 585 COMMENT 586 If the sample buffer is empty...
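NOTE ON THE TRIANGLE STEPPER AND NOISE REGISTER (lines 576-582): Both of these are short bit tricks that are easier to follow in isolation. The following is a sketch under assumptions, not a copy of the emulator's lines. triangle_level and lfsr_step are hypothetical names. The triangle function reproduces the up-then-down ramp described for line 576. The noise function follows the hardware description on the NESdev APU Noise page (feedback is bit 0 XOR bit 1 in normal mode, bit 0 XOR bit 6 in short mode, shifted in at bit 14), which may be arranged differently from the emulator's own expression.

```cpp
#include <cstdint>

// Triangle: fold a 5-bit phase counter (0..31) in to a 4-bit level that
// ramps 0..15 and then steps back down 15..0.
unsigned triangle_level(unsigned phase)
{
    // When bit 4 is set, XOR with 15 mirrors the low four bits.
    return (phase & 15u) ^ ((phase & 16u) ? 15u : 0u);
}

// Noise: one clock of a 15-bit linear feedback shift register.
// short_mode selects bit 6 instead of bit 1 as the second feedback tap.
std::uint16_t lfsr_step(std::uint16_t reg, bool short_mode)
{
    unsigned feedback = (reg ^ (reg >> (short_mode ? 6 : 1))) & 1u;
    return static_cast<std::uint16_t>((reg >> 1) | (feedback << 14));
}
```

The register must never be all zeros or it would stay stuck there, which is why line 579 forces a bit on when the hold variable is empty.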
587 BLOCK START - Empty DMC buffer 588 Check if the channel has no data and see if loop is enabled 589 BLOCK START - Loop the delta modulation channel 590 The counter resets to the preset length in the address select register 591 The address pulls from the sample address 592 BLOCK END - Loop the delta modulation channel 593 If the channel is still in normal operation.. 594 BLOCK START - Load the next set of data 595 COMMENT 596 COMMENT 597 COMMENT 598 Checks how far we've progressed with the current buffer 599 Discards bytes that we won't need 600 Reads in another byte of data and asserts byte 14 601 Resets the phase 602 Decrements the length counter 603 BLOCK END - Load the next set of data 604 Otherwise, we're out of data and the channel should find data or go silent 605 Issues interrupt if possible, otherwise the channel is disabled 606 BLOCK END - Empty DMC buffer 607 Checks if we're in the process of handling channel data 608 BLOCK START - Update output based on current buffer data 609 Sets a temporary variable to the current (last) value of the linear counter 610 Mask the current data bit after decrementing phase. The change is +/- 2 depending on the bit 611 Verifies linear counter (sample) data and sets to updated value 612 BLOCK END - Update output based on current buffer data 613 Returns the current sample as held in the linear counter 614 BLOCK END - Switch on channel type 615 BLOCK END - APU::channel::tick 616 BLOCK END - struct channel definition 617 BLANK 618 Defines a struct with two separate counters internal to the APU that are used to manage certain events. The APU needs to trigger events a certain number of times per second and we can also select the timer to operate in a 4-step or 5-step mode.
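NOTE ON THE DELTA STEP (lines 609-611): The delta modulation update is small enough to show in full. This is a minimal sketch, assuming the 7-bit output range (0 to 127) that line 640 mentions for the DMC output; dmc_step is a hypothetical name, and the real code works on the channel's linear counter rather than a plain int.

```cpp
// Apply one delta bit to a 7-bit output level (0..127).
// A 1 bit raises the level by 2 and a 0 bit lowers it by 2.
// A step that would leave the 7-bit range is skipped, so the level is unchanged.
int dmc_step(int level, bool bit)
{
    int next = level + (bit ? 2 : -2);
    return (next >= 0 && next <= 127) ? next : level;
}
```

Skipping the step instead of clamping is what the "verifies" wording on line 611 refers to: an out-of-range result is discarded rather than pinned to the limit.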
619 BLANK 620 APU::Write handles values written to the APU memory to addresses 0x4000-0x4017 excluding 0x4014 621 BLOCK START - APU::Write 622 Set a reference to the input channel based on input address 623 Switch on writes to APU registers 624 BLOCK START - Switch on input APU address 625 CASE: If writing to register 0 and the linear counter is disabled, then write the first 7 bits of that value to the channel's linear counter. Also, set register 0 to that value, which controls the pulse/square channel settings 626 CASE: If writing to register 1, then write the input value to register 1. This controls sweep options. Update the local sweep rate variable with the newly set register sweep rate 627 CASE: If writing to register 2 then update that register's values. 628 CASE: If writing to register 3... 629 Then update that register's values. 630 If the target channel (based on input address right shifted by 2) is enabled... 631 Update the target's channel length counter using the register settings and constant lookup. 632 Reset the local linear counter with the register's new value 633 Resets the local envelope delay variable with the source register's value 634 Resets the local envelope with the initial value 635 If the target address is below 0x4008 then reset the duty cycle 636 End case for register 3 637 If writing to register 0x4010 then set the input value to register 3. Update the WaveLength register with the frequency index using the input value's lower 4 bits 638 If writing to 0x4012 (sample address) then update register 0 with the value. Then set our DMC sample address by shifting the new register 0 value in to the range expected, with top two bits asserted (...| 0x300 does this prior to shift). 639 If writing to 0x4013 (sample length) then we're intending to set the DMC sample length. Set register 1 with the new value. Then update the channel length counter 640 If writing to 0x4011 we're trying to directly load the DMC output.
This is a 7-bit value so mask the local variable appropriately. 641 If writing to the 0x4015 status register... 642 Loop through all the channels 643 Set channel enable bits to match the input value's bit position. 644 Loop through the channels again 645 If any channels are not active.. 646 Disable the channel by setting the length_counter to 0 647 For the DMC channel, it may be active with a length counter of 0, if so... 648 Reset it to the sample length register's value, which is normally value * 16 + 1 649 End case for writing to 0x4015 650 If writing to the 0x4017 Frame counter register... 651 Match bit 6 for interrupt disable setting 652 Match bit 7 for the step cycle selection 653 Now that timing has potentially changed, reset our internal APU timer 654 If we've just disabled interrupts then roll the change to other interrupt flags (DMC_IRQ) 655 BLOCK END - Switch on input APU address 656 BLOCK END - APU::Write 657 APU::Read returns a byte that indicates the APU status 658 BLOCK START - APU::Read 659 Declares and defines a null status byte 660 Sets the lowest bits of the status, one per channel, with the output status of each channel 661 Sets the penultimate bit with the periodic IRQ status 662 Sets the final bit with the DMC IRQ status 663 Unsets the CPU interrupt status 664 Returns the APU status 665 BLOCK END - APU::Read 666 BLANK 667 Processes the APU once per CPU tick. Called from CPU::tick on line 739 668 BLOCK START - APU::tick 669 COMMENT 670 Increments a counter at a rate of 240 Hz. Since the CPU clock and the timer don't divide evenly, we have to count in steps of a common denominator. We check if we've hit the 240Hz tick threshold 671 BLOCK START - Timer-based APU events 672 Restart the clock 673 Increment our other timer and reset if it's ticked to the step-counter rate. The idea here is that our 240 Hz timer can be divided by 4 for 60Hz or 5 for 48Hz. Most events in the APU are triggered off of some combination of these cycles.
60Hz can be used as 120 and 48 can be used as 96, 192, etc. Hence the "FullTick" and "HalfTick" concepts below 674 BLANK 675 COMMENT 676 Check interrupt enable status, verify that we're set for 4 steps and check if our slow counter just reset, if so... 677 Issue an interrupt to the local APU and the CPU 678 BLANK 679 COMMENT 680 Define the half tick and full tick boolean 681 Loop through all the channels and check against the tick flags 682 BLOCK START - Check channels for tick events 683 Set reference to the current channel 684 Set a local wavelength variable from the source register 685 BLANK 686 COMMENT 687 If a halftick has occurred and the channel is active... 688 ...and the counters are still enabled.. 689 Then decrement the channel length counter 690 BLANK 691 COMMENT 692 If a half tick has occurred and we're looking at the pulse/square channels and the sweep delay has expired... 693 ...And the channel is sweeping 694 BLOCK START - Reset channel sweep 695 Define the temporary signal as the shifted wavelength based on the sweep shift setting. Also define a complement array for the shift effects 696 Change the temporary wavelength based on the shift amount and shift mode 697 Update the source register with the new value 698 BLOCK END - Reset channel sweep 699 BLANK 700 COMMENT 701 If a fulltick has occurred and we're on the triangle channel then we update the linear counter 702 Check the counter disable register.. 703 Reset the linear counter to either the init value... 704 decrement it, or set it to zero 705 BLANK 706 COMMENT 707 If a fulltick has occurred on the pulse or noise channel then we need to tick and update the envelope 708 Check if the envelope is positive and active 709 Decrement it with a 4-bit mask 710 BLOCK END - Check channels for tick events 711 BLOCK END - Timer-based APU events 712 BLANK 713 COMMENT 714 Defines a macro that expands the s() function to return the current output level of the channel. Channel 1 simply resamples channel 0 (?)
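NOTE ON THE 240 HZ DIVIDER (lines 670-673): Deriving a slow tick from a fast clock that doesn't divide evenly is a common problem. The sketch below uses a plain fractional accumulator, which is a swapped-in technique for illustration and not necessarily the exact counter arithmetic the emulator uses. FrameDivider and wraps_per_second are hypothetical names, and the CPU rate passed in by the caller (for example roughly 1789773 Hz for NTSC) is an assumption.

```cpp
#include <cstdint>

// Illustrative names; not the emulator's own identifiers.
class FrameDivider {
public:
    FrameDivider(std::uint32_t cpu_hz, unsigned steps)
        : cpu_hz_(cpu_hz), steps_(steps), acc_(0), step_(0) {}

    // Call once per CPU cycle. Returns true when the step counter wraps,
    // which happens 240/steps times per second on average.
    bool tick()
    {
        acc_ += 240;                       // accumulate 240 units per CPU cycle
        if (acc_ < cpu_hz_) return false;  // no 240 Hz tick yet
        acc_ -= cpu_hz_;                   // keep the remainder so no drift builds up
        if (++step_ < steps_) return false;
        step_ = 0;                         // the 4-step or 5-step sequence restarts
        return true;
    }

private:
    std::uint32_t cpu_hz_;
    unsigned steps_;
    std::uint32_t acc_;
    unsigned step_;
};

// Count step-counter wraps over one simulated second of CPU cycles.
unsigned wraps_per_second(std::uint32_t cpu_hz, unsigned steps)
{
    FrameDivider d(cpu_hz, steps);
    unsigned wraps = 0;
    for (std::uint32_t i = 0; i < cpu_hz; ++i)
        if (d.tick()) ++wraps;
    return wraps;
}
```

Running one simulated second in 4-step mode gives 60 wraps and in 5-step mode gives 48, which are the two rates the walkthrough mentions.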
715 Defines a lambda function to calculate and combine channel outputs. The output is normalized over the 2nd argument and any div/0 errors output the default value provided by the 3rd argument 716 The output of our mixer is a signed 16-bit value (-32768 to 32767). We start with a maximum value of 30000 and then scale it with a normalized value between -1.0 and +1.0 which represents the mixed channels. 717 This line balances the first two (pulse/square) channels including protection for dead channels avoiding div/0 errors, etc. The channel outputs are used to normalize the top level function when we go to the next line 718 This line samples and outputs the contribution from the triangle/noise/DMC channels. From what I can tell, the specific float factors used to mix the channel take in to account the average RMS of that channel type. These factors are probably selected to avoid sample clipping. A lot of these details really depend on what the linux tools resample/aplay expect. I'm not an expert with that so these are educated guesses. 719 Back the output off by a factor of 25% 720 End sample calculation 721 Undefines the sample macro to avoid potential problems elsewhere 722 COMMENT 723 COMMENT 724 COMMENT 725 COMMENT - Uncomment this to disable sound. MUST DO THIS for windows 726 Opens a file pointer to the resample program piped to aplay -- cute solution but non-portable 727 Outputs the first 8-bits of our mixed channel 728 Outputs the second 8-bits of our mixed channel by right shifting by 8 (divide by 256) 729 BLOCK END - APU::tick 730 BLOCK END - namespace APU 731 BLANK 732 Namespace change to CPU 733 BLOCK START - namespace CPU 734 Defines procedure for actions that happen outside the CPU during every CPU tick. 735 BLOCK START - CPU::tick 736 COMMENT 737 The PPU processes three times for every CPU tick. 738 COMMENT 739 The APU processes once for every CPU tick.
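NOTE ON THE MIXER (lines 715-728): The guarded division and the two-byte output can be tried separately from the rest of APU::tick. This is a sketch under assumptions and not the emulator's code. The constants are the ones from the nonlinear mixer approximation on the NESdev wiki's APU Mixer page, and I'm assuming they are the float factors the walkthrough refers to. The names safe_div, pulse_out, tnd_out, low_byte, and high_byte are invented here.

```cpp
#include <cstdint>

// Divide m by n, or return fallback when n is zero.
float safe_div(float m, float n, float fallback)
{
    return n != 0.f ? m / n : fallback;
}

// Contribution of the two pulse channels (each level 0..15).
float pulse_out(float p0, float p1)
{
    // With both channels silent the inner call yields -100, the outer
    // denominator becomes 0, and the result falls back to 0.
    return safe_div(95.88f, 100.f + safe_div(8128.f, p0 + p1, -100.f), 0.f);
}

// Contribution of the triangle, noise, and DMC channels.
float tnd_out(float tri, float noise, float dmc)
{
    float sum = tri / 8227.f + noise / 12241.f + dmc / 22638.f;
    return safe_div(159.79f, 100.f + safe_div(1.f, sum, -100.f), 0.f);
}

// Split a signed 16-bit sample for little-endian output: low byte first.
std::uint8_t low_byte(std::int16_t s)
{
    return static_cast<std::uint8_t>(static_cast<std::uint16_t>(s) & 0xFF);
}

std::uint8_t high_byte(std::int16_t s)
{
    return static_cast<std::uint8_t>(static_cast<std::uint16_t>(s) >> 8);
}
```

The fallback of -100 is the interesting design choice: it makes the outer denominator exactly zero for a silent group, so the same guarded divide turns silence in to an output of 0 without a separate branch.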
740 BLOCK END - CPU::tick 741 BLANK 742 CPU::MemAccess template function that handles byte-level reads and writes. Much of the work here is to process memory mappings and call the appropriate namespace read/write functions with the modified address. Input arguments include a 16-bit CPU-mapped address and an input value for writing. 743 BLOCK START - CPU::MemAccess 744 COMMENT 745 During reset state (ex. the first CPU cycle), all calls to MemAccess are treated as reads. 746 BLANK 747 Process a CPU background tick -- The PPU ticks 3x and the APU ticks 1x. 748 COMMENT 749 If the memory address is less than 0x2000 (8KiB), then we're interested in the NES RAM, which occupies the lower 0x800 (2KiB) of the address space. Get a reference to the target address in RAM. For read operations, return the reference. For writes, store the input value at that address. 750 If the memory access is between 0x2000 and 0x4000 (8KiB and 16KiB), then forward the memory access request to the PPU via PPU::Access and mask the lower 3 bits. Recall that PPU uses 8 ports from 0x2000 to 0x2007, so we only need those three bits to map correctly. 751 If the memory access is between 0x4000 and 0x4018, then we're dealing with I/O for DMA functions for either PPU, APU, or Joystick. 752 Mask the lower 5 bits and compare to find appropriate DMA handler 753 BLOCK START - switching for DMA memory access cases 754 CASE: If accessing CPU memory at 0x4014 then 755 Writing to 0x4014 initiates CPU-blocking 256-byte block transfer of OAM Data from one of the eight 256-byte RAM pages indexed by v. The data is written to the OAMDATA port 0x2004.
This actual write is handled in PPU::Access on line 270 756 return 0 from case 757 CASE: If accessing CPU memory at 0x4015 then reads will return the current APU status (APU::Read line 657) or writes will affect the channels (APU::Write line 641) 758 CASE: If accessing CPU memory at 0x4016 then reads will get the last state of the Player 1 controller (IO::JoyRead on line 103), while writes will strobe both players if v is positive (See IO::JoyStrobe on line 98) 759 CASE: If accessing CPU memory at 0x4017 then reads will read the state of the Player 2 controller (IO::JoyRead on line 103). Writes do nothing. 760 CASE: All other cases between 0x4000 and 0x4018 are writes to various APU component registers such as pulse, triangle, noise, and delta modulation (See APU::Write on line 620). 761 762 BLOCK END - switching for DMA memory access cases 763 All memory accesses above 0x4018 must be read/writes to GamePak memory (PRG/CHR ROM via GamePak::Access). 764 All other memory accesses return 0 (code probably not reached) 765 BLOCK END - CPU::MemAccess 766 BLANK 767 COMMENT 768 Initialize the CPU Program Counter (PC) register to 0xC000. This is the beginning of the code segment (PRG ROM). 0x8000 could also be the beginning of PRG ROM with 0xC000 being the mirror. Advanced mappers may translate these addresses to other PRG ROM banks. 769 Initialize the accumulator (A), Index X(X), Index Y(Y), and Stack Pointer registers to 0. 770 Define a union for the Processor Status register (P) 771 BLOCK START - union status register 772 Defines variable for direct access to the entire register 773 RegBit accessor to the Carry flag. Functions like most CPU carry flags, asserting after an operation that results in a positive overflow from bit 7 or a negative overflow from bit 0. Carry ~= unsigned overflow 774 RegBit accessor to the Zero flag. Asserts when any operation results in a zero result 775 RegBit accessor to the Interrupt flag. When asserted, NES won't respond to interrupts.
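NOTE ON ADDRESS DECODING (lines 749-763): The memory map handled by CPU::MemAccess boils down to a few range checks and masks. The sketch below only classifies an address and shows the mirroring masks; it is an illustration with made-up names (Region, decode, ram_index, ppu_port, io_port), and it treats everything from 0x4018 upward as GamePak space, which is an assumption about the boundary.

```cpp
#include <cstdint>

// Illustrative names; not the emulator's own identifiers.
enum class Region { Ram, Ppu, ApuIo, GamePak };

// Classify a 16-bit CPU address by the range it falls in.
Region decode(std::uint16_t addr)
{
    if (addr < 0x2000) return Region::Ram;    // 2KiB of RAM, mirrored up to 8KiB
    if (addr < 0x4000) return Region::Ppu;    // 8 PPU ports, mirrored up to 16KiB
    if (addr < 0x4018) return Region::ApuIo;  // APU, OAM DMA, and joystick ports
    return Region::GamePak;                   // cartridge space (assumed boundary)
}

// Mirroring masks: each keeps only the bits that select a real location.
std::uint16_t ram_index(std::uint16_t addr) { return addr & 0x7FF; }  // 0x800 bytes of RAM
std::uint16_t ppu_port(std::uint16_t addr)  { return addr & 7; }      // ports 0..7
std::uint16_t io_port(std::uint16_t addr)   { return addr & 0x1F; }   // lower 5 bits
```

For example, 0x1801 lands on RAM byte 1 and 0x2008 lands on PPU port 0, because the masks throw away the address bits that the hardware leaves undecoded.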
776 RegBit accessor to the Decimal mode flag. Unused. 777 COMMENT 778 RegBit accessor to the Overflow flag. Asserts after an operation that results in an overflow from bit 6 to bit 7. Overflow ~= signed overflow 779 RegBit accessor to the Negative flag. A sign indicator for the last instruction result. 0 is positive, 1 is negative. 780 BLOCK END - union status register 781 BLANK 782 CPU::wrap function ensures that only the lower byte changes within a multi-byte input address. Effectively, this confines input addresses within the same memory page (256-byte). 783 CPU::Misfire is a wrapper for reading an address using the CPU::wrap protection above. NOTE ON CPU STACK OPERATIONS: This stack implementation is limited to one page of memory (256 bytes). While this technically means that it won't overflow, it behaves like a ring buffer and will smash itself after 256 unmatched pushes. The stack pointer always refers to the next uninitialized byte to be used. 784 CPU::Pop implements the classic stack pop by using the stack pointer register as an offset in to the 2nd page of NES RAM (bytes 0x100-0x1FF). Pop increments the stack pointer to the last used byte and returns the value stored there. 785 CPU::Push stores an input value at the byte pointed to by the stack pointer. It then decrements the stack pointer. 786 BLANK 787 Templates the upcoming Ins() function based on input op code NOTE ON THE OP DECODE MATRIX: This is the most discussed part of the Bisqwit NES emulator because it looks like something you would see from an automated code obfuscation tool. This method is the natural consequence of this question: "How can we compress the 256 NES operations in to 100 lines of code to quickly present it on YouTube?" Bisqwit answered this question, and it's very likely that you (the reader) will never need to do the same in your programming career. So to beginners, I would say to simply not worry about this specific method, and focus on semantics. 
The op decode matrix is more a case study in applied information theory than in programming. It wouldn't fly far in a production environment because team members would be screaming "readability refactor!". But for those seeking enlightenment, Bisqwit himself discusses the method starting halfway through this video: http://youtu.be/QIUVSD3yqqE An important concept buried in here is the idea that CPU instructions aggregate very low level tasks, usually following the sequence: fetch, operate, store. Bisqwit decomposed the 256 CPU ops in to 56 tasks you see prefixed with t(). The key is that each t() line does NOT represent a CPU operation; instead, each CPU operation executes multiple t() functions based on bitfield matches in augmented BASE64. Tasks are ordered roughly the same as in an actual CPU pipeline. Example: Op code 0xAA is TAX - Transfer A to X. TAX hits 5 tasks from the decode matrix: Line 818 -> t &= A; Store register A in to temp operand t Line 839 -> tick(); NOP Line 864 -> X = t; Set register X to operand t Line 869 -> P.N = t & 0x80; Update flags negative bit to result sign bit Line 870 -> P.Z = u8(t) == 0; Update flags register for zero result In the future, I will try to re-encode this matrix in to lower entropy forms that make more sense to the reader. Binary form is quite easy to understand and retains the decoding method, but produces very wide lines. Easier still is the "switch on op code" strategy - easy to approach, but very redundant and replaces this method entirely. Here is an example of "switch on op" from a production NES emulator: https://sourceforge.net/p/fceultra/code/HEAD/tree/fceu/trunk/src/ops.inc Links for the reworked matrix: 788 CPU::Ins executes the relevant procedures required for the input op code. 789 BLOCK START - CPU::Ins 790 COMMENT 791 COMMENT 792 Sets up the default evaluation environment for Op execution. These are temporary variables to hold intermediate values. addr = no initial address. d = no address offset. 
t = fully maskable operand. c = null backup/offset operand, pbits = 0x[23]0. pbits mask the flags (P) register during certain operations. The core issue is controlling the mask of bits 4 and 5 depending on interrupt op status. (Research BRK, PHP, /IRQ, and /NMI instructions for more information) 793 BLANK 794 COMMENT 795 COMMENT 796 Sets untagged enum to force compile-time resolution of o8 and o8m based on op codes 797 COMMENT 798 COMMENT 799 COMMENT 800 COMMENT 801 Line 1 of decode matrix macro expansion. Expands t() macro to match current op code with possible tasks. For example: Line 818 is t("aa__ff__ab__,4 ____ - ____ ", t &= A) and the expanded macro becomes: { enum { i=o8m & ("aa__ff__ab__,4 ____ - ____ "[o8]>90 ? (130+" (),-089<>?BCFGHJLSVWZ[^hlmnxy|}"["aa__ff__ab__,4 ____ - ____ "[o8]-94]) : ("aa__ff__ab__,4 ____ - ____ "[o8]-" (("["aa__ff__ab__,4 ____ - ____ "[o8]/39])) }; if(i) { t &= A; } } Very ugly, but it does match op codes with the correct task sequence. See discussion above for more clarity. 802 Line 2 of decode matrix macro expansion 803 Line 3 of decode matrix macro expansion 804 BLANK 805 COMMENT 806 Set address register to 0xFFFA. Used in NMI. 807 Set address register to 0xFFFC. Used by Reset 808 Set address register to 0xFFFE. 
Used by IRQs 809 Read next instruction in as an address 810 Set address offset register to index register X 811 Set address offset register to index register Y 812 Offset address in to the same page 813 Read in a page and offset in to that page (absolute) 814 Read in a page and offset from that page (indirect) 815 Verify and correct address range across pages 816 Verify and correct address range across pages 817 COMMENT 818 Mask accumulator A on to temp operand t (store) 819 Mask index register X on to temp operand t (store) 820 Mask index register Y on to temp operand t (store) 821 Mask stack pointer register S on to temp operand t (store) 822 Asserts normally unused flags and backup temp operand 823 Copy primary operand and reset it 824 Mask indirectly stored byte in to temp operand 825 Mask next instruction in to operand t (immediate) 826 COMMENT 827 Sets overflow and sign flags from the temp result t 828 Stores the result carry bit in temporary bit sb 829 Masks the result operand t in to the carry bit 830 Masks the least significant bit in to the carry bit 831 Left shifts temp operand and retains stored carry bit 832 Right shifts temp operand and retains stored carry bit 833 Decrements temporary operand 834 Increments temporary operand 835 COMMENT 836 Writes the temporary operand to memory (indirect) 837 Writes temporary operand to memory across pages...this is reminiscent of a 'Far' memory operation without actual segmentation 838 COMMENT 839 Skip a cycle (NOP) 840 COMMENT 841 Pops value from the stack with NOP delay 842 Pops two bytes from the stack in to the 16-bit program counter. Similar to how RET in x86 undoes a stack frame 843 Reads (and discards) the next instruction 844 Sets address offset forward or backward and pushes page and absolute address to the stack. 
This is like x86 CALL 845 Directly sets the program counter 846 Pushes temp operand t to the stack 847 COMMENT 848 Sets temporary operand t to 1 849 Left shifts temp operand t by 1 (doubles it) 850 Left shifts temp operand t by 2 (quadruples it) 851 Left shifts temp operand t by 4 852 Logically inverts temp operand t 853 Stores logical AND between temp and backup operands 854 Stores logical OR between temp and backup operands 855 Stores logical XOR between temp and backup operands 856 COMMENT 857 Jumps to a checked offset address based on t 858 Jumps to a checked offset address based on t 859 COMMENT 860 Saves operand t and adds (subtracts) last A with Carry. Checks for overflow and recalculates carry. 861 Compares t and c operands via subtraction (same means t == 0). Carry bit is set to inverse of t allowing the user to identify which operand is larger. 862 COMMENT 863 Sets accumulator (A) to operand t result 864 Sets index register (X) to operand t result 865 Sets index register (Y) to operand t result 866 Sets stack pointer (S) to operand t result 867 Directly sets the processor flags register ignoring the unused 4th and 5th bits 868 COMMENT 869 Update flags negative bit to match result sign bit 870 Update flags for zero result 871 Update flags overflow bit by testing if incrementing 6th bit asserts 7th 872 COMMENT 873 COMMENT 874 COMMENT 875 BLOCK END - CPU::Ins 876 BLANK 877 CPU::Op is essentially the "game loop" of the emulator. It is called repeatedly from line 942 until the emulator is closed/killed. Op determines the next instruction (either interrupt or NES ROM), and calls the appropriate function pointer. 878 BLOCK START - CPU::Op 879 COMMENT 880 Updates the current state of non-maskable interrupts 881 BLANK 882 Reads in the next operation based on the program counter. Iterates PC for the next cycle. 883 BLANK 884 If reset is asserted then override the op code 885 Overrides next op code if a non-maskable interrupt is detected. 
This is commonly associated with vertical blanking in the PPU. If this is the first NMI cycle, then nmi_edge_detected is asserted. 886 Regular interrupts also override the op code 887 If no NMI is active then deassert the nmi_edge flag 888 BLANK 889 COMMENT 890 c(n) is the final macro expansion that produces pointers to two templated Ins(op) functions from the intermediate macro expansion defined on the next line. 891 o(n) is an intermediate macro that expands in to 4 tokens of c(n). Each o(n) will ultimately lead to a series of 8 function pointers to an Ins(op) template. 892 Declares an array of 264 constant function pointers. This is the classic syntax without using typedefs. It includes the 256 CPU ops plus eight more slots, 3 of which are tied to interrupts. The extra 5 don't appear to be used, but must be included to handle the upcoming macro expansion (need a multiple of 8) 893 BLOCK START - Op code function pointer definitions 894 Macros defining the first 64 function pointers in to the Ins() decoding matrix (see lines 890/891 for expansion info) 895 The second 64 function pointers in to the Ins() decoding matrix 896 The third 64 function pointers in to the Ins() decoding matrix 897 The final 64 function pointers in to the Ins() decoding matrix plus an extra 8 used for interrupts or left unused 898 BLOCK END - Op code function pointer definitions 899 Undefining the 'o' macro because that pattern could easily be matched by accident in later code. 900 Undefining the 'c' macro because that pattern could easily be matched by accident in later code, such as with fgetc(fp) 901 Call the function pointed to by 'op' from the function table constructed above. 902 BLANK 903 Deassert the reset flag in preparation for the next cycle. There is currently no way for reset to be asserted after the first cycle. Normally the physical NES reset button would trigger this. Here, the CPU is simply initialized with reset asserted. 
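NOTE ON THE FUNCTION TABLE (illustrative sketch): The c(n)/o(n) pattern described for lines 890-901 may be easier to follow at a smaller scale. The sketch below is not the emulator's code. It assumes a stand-in Ins<op>() that only records its op code, a 16-entry table instead of 264, and hypothetical names last_op, table, and Dispatch. The macros are written to behave as described above: c(n) yields two pointers and o(n) yields four c(n), so each o(n) contributes eight entries.

```cpp
#include <cassert>

// Hypothetical stand-in for the emulator's Ins<op>() template.
// Each instantiation is a separate function fixed at compile time.
static unsigned last_op = 0xFFFF;

template<unsigned op>
void Ins() { last_op = op; }

// c(n) emits two table entries; o(n) emits four c(n), i.e. eight entries.
#define c(n) Ins<(n)>, Ins<(n) + 1>,
#define o(n) c(n) c((n) + 2) c((n) + 4) c((n) + 6)

// Classic function pointer array syntax without typedefs (16 entries here).
static void (*const table[16])() = { o(0x00) o(0x08) };

// Remove the short macro names so they cannot match later code by accident.
#undef o
#undef c

// Calls the function selected by the op code, as line 901 does with 'op'.
void Dispatch(unsigned op) {
    assert(op < 16);
    table[op]();
}
```

Calling Dispatch(0x0A) runs Ins<0x0A>(), so every op code reaches its own compile-time specialization through a single indexed call.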
904 BLOCK END - CPU::Op 905 BLOCK END - namespace CPU 906 BLANK 907 main entry point definition 908 BLOCK START - main 909 COMMENT 910 Opens file pointer to NES game ROM file specified as command line argument 1. We're about to read in the game data. 911 Sets the input file name to command line argument 2. The Bisqwit emulator doesn't directly read from keyboard or joysticks, instead it reads scripted input from a file. Very similar to tool-assisted speedrunning techniques. 912 BLANK 913 COMMENT 914 Verifies ROM file header. First four bytes should be "NES\x1A" 915 Reads the count of 16KiB blocks of game data 916 Reads the count of 8KiB blocks of video data 917 Reads byte of ROM flags, including half of the mapper number 918 Reads more flags including the other half of the mapper number, which is shifted and OR'd with the previous read to construct the entire mapper number. 919 Discards final 8 header bytes 920 If the mapper number is over 63, then keep only the low 4 bits, which selects a mapper under 16 921 Save the mapper number in GamePak for later reference 922 BLANK 923 COMMENT 924 Resizes PRG ROM memory vector to match ROM file we're about to load 925 Resizes CHR ROM memory vector to match ROM file we're about to load 926 Reads the appropriate number of PRG ROM bytes in to the GamePak memory vector. Note that the file position is currently 16 bytes in to the file. Reading now assumes that PRG ROM starts here -- some NES ROMs place a 512-byte trainer between the header and PRG ROM. This is indicated with bit 2 (0x04) of the ROM flags read on line 917. We aren't accounting for that case here. 927 Reads the appropriate number of CHR ROM bytes in to the GamePak memory vector. 
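NOTE ON THE ROM HEADER (illustrative sketch): The header steps on lines 914-920 can be collected in to one small function. This is a sketch under assumptions, not the emulator's code: RomHeader, ParseHeader, and PrgOffset are hypothetical names, and the mapper is assembled using the standard iNES layout (low nibble from the upper bits of byte 6, high nibble from the upper bits of byte 7). It also shows the trainer check that the emulator skips.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical summary of the 16-byte iNES header (names are illustrative).
struct RomHeader {
    bool     valid;        // true when the first four bytes are "NES\x1A"
    unsigned prg_banks;    // count of 16KiB PRG ROM blocks (byte 4)
    unsigned chr_banks;    // count of 8KiB CHR ROM blocks (byte 5)
    unsigned mapper;       // mapper number built from bytes 6 and 7
    bool     has_trainer;  // bit 2 (0x04) of byte 6
};

RomHeader ParseHeader(const std::uint8_t (&h)[16]) {
    RomHeader r{};
    r.valid       = std::memcmp(h, "NES\x1A", 4) == 0;
    r.prg_banks   = h[4];
    r.chr_banks   = h[5];
    r.mapper      = (h[6] >> 4) | (h[7] & 0xF0);  // standard iNES layout (assumed)
    r.has_trainer = (h[6] & 0x04) != 0;
    if (r.mapper >= 0x40) r.mapper &= 15;         // fallback described on line 920
    return r;
}

// File offset where PRG ROM begins: 16 header bytes plus an optional
// 512-byte trainer.
long PrgOffset(const RomHeader& r) {
    return 16 + (r.has_trainer ? 512 : 0);
}
```

A loader that wanted to support trainers could seek to PrgOffset() before reading PRG ROM instead of assuming the data starts at byte 16.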
928 BLANK 929 Close game ROM file 930 Print the detected ROM configuration to console 931 BLANK 932 COMMENT 933 Initialize GamePak and memory (See 182) 934 Initialize SDL via IO (See 41) 935 Initialize PPU register union (See 202) 936 BLANK 937 COMMENT 938 Init NES RAM: Loop through all 2KiB of system RAM 939 Set every group of 4 bytes to 0 or 255 alternating. This is done to supposedly mirror system state for popular TAS platforms. Apparently initial memory state is a source of entropy? 940 BLANK 941 COMMENT 942 main enters an infinite loop performing CPU operations and never returns. 943 BLOCK END - main 944 BLANK 945 BLANK 946 BLANK 947 BLANK 948 BLANK
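NOTE ON THE INITIAL RAM PATTERN (illustrative sketch): The fill described on lines 938-939 can be expressed by testing a single address bit. The sketch below is an assumption-laden illustration: MakeInitialRam is a hypothetical name, the RAM is modeled as a std::array rather than the emulator's own storage, and the pattern is assumed to start with the 0 group.

```cpp
#include <array>
#include <cstdint>

// Builds a 2KiB RAM image where each group of 4 bytes alternates between
// 0x00 and 0xFF, starting with 0x00 (starting value is an assumption).
std::array<std::uint8_t, 0x800> MakeInitialRam() {
    std::array<std::uint8_t, 0x800> ram{};
    for (unsigned a = 0; a < ram.size(); ++a)
        ram[a] = (a & 4) ? 0xFF : 0x00;  // bit 2 of the address flips every 4 bytes
    return ram;
}
```

Because the pattern is deterministic, two runs that are fed the same input file start from the same RAM contents.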