Instructions for writing Raw programs under starsearch:
-------------------------------------------------------

0. Documentation

   A. This file.
   B. The Raw infrastructure map
      (http://cag.lcs.mit.edu/raw/RawMap.html)
   C. The Raw spec (especially "14. Appendages"), off of the Raw webpage
      (ftp://ftp.cag.lcs.mit.edu/pub/raw/documents/RawSpec99.pdf)
   D. The MIPS R4000 manual, off of the MIPS web page (www.mips.com)
   E. Selected Raw memos: http://www.cag.lcs.mit.edu/raw/memo/19

1. Make your own copy of the starsearch directory (../starsearch).

2. Create your own program directory:

   cd starsearch/end-to-end
   mkdir
   cp ../module_tests/sample/*

3. Create your program.

   Modify the "sample test" to be your new program. You can also look at
   the static test to get an idea of how to write a switch test. You can
   write tests in assembly, C, or both.

   Success of a program can be indicated with the PASS directive, failure
   of a program with the FAIL directive, and successful completion of the
   whole program with the DONE directive.

   Assembly: see the /starsearch/module_test/sample/sample_unit_test.S
   file. Use any macros in /btl/common/include/raw_constants.h

   C: see the starsearch/module_test/sample/sample_test.c file. Use any
   of the functions and macros in /rawlib/include/raw.h. To pass
   compiler flags to gcc, use a line in your makefile like:

      RGCCFLAGS += -O3

   Makefile: see the sample makefile. It pretty much does everything for
   you. You should not have to write any make rules. Currently, the
   example makefile is for a single-tile test, but it can be rearranged
   to be a multi-tile test. The sample makefile shows a list of the
   different rules you can run.

4. Debug it on the simulator.
   make debug

   Typical simulator command sequence for running the test:

   go();               - runs until the serial rom finishes streaming
                         code into program
   go_to(0,"begin");   - stops at the entry point of your test
                         (or use the ` key in the code window)

   See http://cag.lcs.mit.edu/raw/memo/19 for usage of the simulator.

5. Running the program in non-interactive mode.

   make run

6. Set the attribute flags for the test.

   We will be adding attributes to this list as needed.
   Currently the following tag(s) are relevant:

   LARGE_STATIC_DATA -- program uses large amounts of static data and
                        should use the boot loader

Appendix 1: Advanced Starsearch usage.

The common directory contains a number of helpful makefiles that you can
include for different test configurations.

   Makefile.single -- single tile test
   Makefile.all    -- most general, allows specification for any tiles.

Edit the Makefile to modify OBJECT_FILES for a single tile program. For a
program using Makefile.all, you want to set OBJECT_FILES_00,
OBJECT_FILES_01, ..., OBJECT_FILES_15.

   make clean        # clean things up
   make              # creates .rbf and .rexe file
                     #   (rbf = raw boot format) (rexe = symbol file)
   make run          # run the test
   make debug        # fire the simulator up in interactive mode
   make tileXX.dis   # (XX = 00..15) disassemble contents of tile XX

9. Using btl

In the Processor view window...

   s   Single Step - This single steps the processor
   g   Go - This lets the processor run free
   c   Stop the processor from running if running, else kills btl

Everything in the shunt window is like a C function and actually
corresponds to a bC function call. Therefore everything needs to be
terminated with a ';' and be passed 0 or more parameters, like this:

   help();
   or
   help(help);

In the "shunt" window...
   go();        This simply lets the processor run free
   step();      Single steps the processor
   step(N);     Steps the processor N cycles
   regs();      Prints out all of the regs
   regs(N);     Prints out regs of tile number N
   sp(N);       Prints out the contents of switch processor number N
                along with all of the queues associated with it
   ?pNrM;       Prints out tile N register M's contents in hex, dec, and
                float. Example: ?p0r5; prints out the contents of
                register 5 in tile 0. Note that because the writeback to
                the register file does not happen until further down the
                pipeline, it may take some time to show up correctly.
   dm(N);       Dumps memory. Not exactly sure what N represents, because
                the caching system just changed the meaning of this
                command.
   help("cmd"); Prints the help message for the command named cmd.

8. Miscellaneous

Increasing instruction memory: by default, imem size = 32K. You can
increase it to 64K or 128K:

   BTL-ARGS += -imem_size 65536
   BTL-ARGS += -imem_size 131072

Appendix 2: Makefile rule parameters

There are a number of additional parameters that can be used to activate
or deactivate additional logging features for the simulations:

STRACE: dumps the strace file

   gmake run STRACE=1  | enables strace mode
   gmake run STRACE=   | disables strace mode (DEFAULT)

STRACEB: dumps the strace file starting from the barrier in the simulator

TEE: tees the output of the simulation to a file in starsearch/results
     (it can be desirable to turn tee off because it buffers the output)

   gmake run TEE=1     | enables teeing of stdout output
   gmake run TEE=      | disables (DEFAULT)

Appendix 3: Strace file format

The strace file that is generated with the STRACE=1 flag is in the
following format:

   CYCLE # TILENUM PC SW_PC PC_REG_WRITE SW_REG_WRITE { other events ..
   }

"other events" key:

cache events
------
if the cache is in any state other than zero,
C{state#,address currently fetching}

memory names
------
I[]  - imem
T[]  - tag mem
D[]  - data mem
ST[] - status mem
SW[] - switch mem

fifo names
------
The first letter is the current location
   (m = memory, g = general, s = static, t = static2, p = processor)
the second letter is the source
   (N = north, E = east, S = south, W = west, + network id's)

e.g.,
   pg (cgni)    gp (cgno)   gN gE gS gW
   pm (cmni)    mp          mN mE mS mW
   ps (csti)    sp (csto)   sN sE sS sW, st (swo1)
   pt (csti2)               tN tE tS tW, ts (swo2)  (second static switch)

   A   B    C           D  E    F                 G   H
   t11 031c .293800e1. | | 0658 .0000000000200008.s | 00000005<-pt,

A = tile #
B = pc of processor in EXE
C = instr word of processor in EXE
D = any register writes in proc
E = pc of switch in REC stage
F = instr word of switch in REC stage
G = whether the switch is stalling in EXE stage
H = accesses to network buffers, writes to switch register files,
    writes to tags/memories

Appendix 4: Simulator and RTL extensions

Just as a ".bug" file can be used in btl, a ".rtl" file can be used in the
rtl simulator to extend the behaviour. Primarily, this involves
registering for events that devices may have triggered. Here is a sample
file that monitors DRAM traffic. See http://cag.lcs.mit.edu/raw/memo/19
for more information on the .bug files.

// mbt's ~/.rtl file

include("<.bug>");   // include standard support

EventManager_RegisterHandler( "dram_request_received", "__std_handler");
EventManager_RegisterHandler( "dram_address_received", "__std_handler");
EventManager_RegisterHandler( "dram_sent_data", "__std_handler");
EventManager_RegisterHandler( "dram_received_data", "__std_handler");

fn __std_handler(hms)
{
  printf("%s\n",hms.name);
  ?
  hms;
}

// counts floating point ops
// (note this may be inaccurate in the face of self modifying code)

global gFLOPS = 0;

fn __clock_handler(hms)
{
  local i;
  for (i = 0; i < gNumProc; i++)
  {
    gFLOPS += imem_instr_is_fpu(get_imem_instr(i,get_pc_for_proc(i)));
  }
}

EventManager_RegisterHandler( "clock", "__clock_handler");

Appendix 5.

5a. To use a different raw grid size on btl (currently only 8x8 is
    supported)

    Place this line in your makefile after the first
    include xxxxx/Makefile.include:

    TILE_PATTERN=8x8

5b. To use additional, custom I/O devices.

    and both parse the command line arguments to the simulator. They
    provide most of the standard devices that people may want to use.
    Users can add devices by placing a BTL-ARGS += after the final
    include statement in their makefiles.
    (see starsearch/benchmarks/dhrystone/dhry_8_raw/Makefile)

    If basic or barebones does not include the functionality needed to
    specify the devices, then a custom machine file can be used.
    Typically, these files include the file, and then instantiate the
    additional devices. For an example of how the makefiles and device
    files look for this, see starsearch/misc_tests/stream_file_example.

    See http://cag.lcs.mit.edu/raw/memo/19/btl-advanced.html for more
    information on bC and device coding.

Appendix 6.

libints contains a collection of routines to allow programs to receive
and handle interrupts. Examples of the use of these routines are in
starsearch/libints_tests. Further documentation on how to use the
deadlock avoidance routines is in
starsearch/libints.tests/deadlock/how2use.txt.

Edited contents of the libints.h file follow:

void setup_interrupts();
// Must be called before any other interrupt related routine.
// Installs default interrupt handlers ("FAIL(...)") and
// turns interrupts on.

/*
 * These routines interact with the External Interrupt Service device
 * to cause the tile they are running on to receive periodic external
 * interrupts.
 */

/*
 * To use the External Interrupt Service device you will have to
 * include a line similar to the following in your Makefile:
 *   BTL-ARGS = 4 4 my_machine.bc -standard_bootrom $(ROOT).rbf -print_service -drams_rhs
 * "my_machine.bc" must be:
 *
 *   include("");
 *   include("");
 *   {
 *     local i;
 *     local result;
 *     i = 12;
 *     result = dev_external_interrupt_service_init(i);
 *     if (result == 0)
 *       exit(-1);
 *   }
 */

void start_external_ints(long Num_Cycles, int Num_Ints, void (*routine)());
// Set up to have external interrupts sent every Num_Cycles cycles.
// This will happen Num_Ints times. If Num_Ints is -1 then it happens
// an unlimited number of times.
// If "routine" is non-NULL it will be called by the interrupt handler
// with system interrupts off. On return the interrupted program will
// continue executing.

void stop_external_ints();
// Stop the external interrupts. This both turns off handling of
// external interrupts and tells the External Interrupt Service device
// to stop sending interrupt messages.

void reset_external_ints();
// After external interrupts have been stopped (or the requested number
// of interrupts have occurred) resets the External Interrupt Service
// and handler to the state after the last call of start_external_ints.
// If called before any call of start_external_ints, then this is
// equivalent to the call "start_external_ints(0, 0, 0)".

/*
 * These routines control the occurrence and handling of periodic TIMER
 * interrupts.
 */

void clear_timer_ints();
// Saves the current TIMER interrupt dispatch instruction,
// turns interrupts off, clears WATCH_SET, sets WATCH_VAL and then
// WATCH_MAX to zero, turns interrupts on and handles any pending
// TIMER interrupt (ignoring it), and restores the current TIMER
// interrupt dispatch instruction.
// This ensures that the delay until the next TIMER interrupt is maximum.

void start_timer_ints(long Num_Ticks, void (*routine)(),
                      int set_DYN_MOVE, int set_NOT_STALLED);
// Sets things up to have TIMER interrupts go off every Num_Ticks cycles.
// If "routine" is non-NULL then the handler will call routine with system
// interrupts off. On return the interrupted program will continue
// execution.
// If set_DYN_MOVE is not 0 then the DYN_MOVE bit of WATCH_SET is set.
// If set_NOT_STALLED is not 0 then the NOT_STALLED bit of WATCH_SET is set.
// If either is 0 then the corresponding bit will be cleared.
// The current TIMER interrupt dispatch instruction is saved so it can
// be restored when stop_timer_ints or clear_timer_ints is called.

void stop_timer_ints();
// Turns off handling of timer interrupts. Restores the saved TIMER
// interrupt dispatch instruction.

void reset_timer_ints();
// Resets things to the state after the last call of start_timer_ints.
// If called before start_timer_ints is called, then it is equivalent
// to a call of "start_timer_ints(0xffffffff, 0, 0, 0)".

/*
 * Notes:
 *    12 cycles for interrupt dispatch and eret
 *
 *     6 cycles of handler if no rtn
 *    18 cycles per timer interrupt for no rtn.
 *
 *   136 cycles of handler plus trivial rtn (just returns).
 *   148 cycles per timer interrupt for trivial rtn.
 */

// A simple routine to be called by the timer interrupt.
void dump_status_register_state();
// Does a "PASS(9999)" and then dumps (using PASS_REG) all of the
// status and control registers in order:
//   SW_FREEZE
//   SW_BUF1
//   SW_BUF2
//   MDN_BUF
//   SW_PC
//   BR_INCR
//   WATCH_VAL
//   WATCH_MAX
//   WATCH_SET
//   CYCLE_HI
//   CYCLE_LO
//   GDN_RF_VAL
//   GDN_REMAIN
//   GDN_BUF
//   [status register 18 not dumped]
//   GDN_CFG
//   MDN_CFG
//   EX_PC
//   EX_UPC
//   FPSR
//   [status register 23 not dumped]
//   EX_BITS
//   EX_MASK
//   [status register 26 not dumped]
//   POWER_CFG
//   TC_CFG
//   [status registers 29-31 not dumped]

/*
 * These routines set up and control a deadlock avoidance interrupt handler.
 */

void setup_deadlock_handler(long Timeout, int set_NOT_STALLED);
// Sets up the timer interrupt to handle deadlocks on the GDN.
// Timeout: the number of cycles used as the deadlock timeout.
//          A value of zero will turn the interrupt off.
//          A call of reset_deadlock_handler should probably be made
//          after turning the interrupt off.
// set_NOT_STALLED: if true then the NOT_STALLED bit is set in WATCH_SET.
//          (DYN_MOVE is always set).
// Note: this routine does NOT turn interrupts on or off. Interrupts
//       must (of course) be on for it to do anything.
//
// On a timer interrupt the deadlock handler will read everything it can
// off of $cgni into an 8K buffer. If the buffer fills then it will fail
// with a value of 0xdead. Once everything has been read it will arm the
// GDN_REFILL interrupt and start putting buffer values (in order) into
// GDN_RF_VAL to be read from $cgni. Once the buffer is empty again, it
// will revert to normal operation.
//
// As currently configured the deadlock handler will do a PASS_REG with
// a value of 0xdeaddead when the interrupt goes off and there is stuff
// to read from $cgni.

void reset_deadlock_handler();
// Resets all the deadlock handler buffer pointers and counters.
// Any currently saved values from $cgni will be lost.

int get_number_of_deadlocks();
// Returns the number of times the deadlock interrupt has occurred and
// found data to be read from $cgni.
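
The cycle counts in the timer interrupt notes of libints.h can be turned
into a rough overhead estimate when choosing Num_Ticks (or a deadlock
Timeout). The following is a minimal host-side C sketch, not part of
libints: the names timer_int_cost, timer_overhead_permille,
timer_cost_selfcheck, and the enum constants are hypothetical, and it
assumes one interrupt fires every Num_Ticks cycles with the quoted
per-interrupt cost counted inside that period.

```c
#include <assert.h>

/* Cycle counts copied from the notes in libints.h.
 * The constant names are hypothetical, chosen for this sketch. */
enum {
    DISPATCH_ERET_CYCLES   = 12,   /* interrupt dispatch plus eret */
    HANDLER_NO_RTN_CYCLES  = 6,    /* handler with no user routine */
    HANDLER_TRIVIAL_CYCLES = 136   /* handler plus a routine that just returns */
};

/* Total cycles consumed by one timer interrupt. */
long timer_int_cost(long handler_cycles)
{
    return DISPATCH_ERET_CYCLES + handler_cycles;
}

/* Assumption: one interrupt per num_ticks cycles, with the cost counted
 * inside that period. Returns the overhead in parts per thousand
 * (integer division), or -1 if num_ticks is not positive. */
long timer_overhead_permille(long num_ticks, long handler_cycles)
{
    if (num_ticks <= 0)
        return -1;
    return (timer_int_cost(handler_cycles) * 1000) / num_ticks;
}

/* Checks the sketch against the totals quoted in the notes (18 and 148). */
void timer_cost_selfcheck(void)
{
    assert(timer_int_cost(HANDLER_NO_RTN_CYCLES) == 18);
    assert(timer_int_cost(HANDLER_TRIVIAL_CYCLES) == 148);
}
```

Under these assumptions, timer_overhead_permille(1000,
HANDLER_TRIVIAL_CYCLES) evaluates to 148, i.e. roughly 14.8% of the
tile's cycles would go to timer interrupts with a trivial routine and
Num_Ticks = 1000. A real routine will cost more than the trivial one, so
treat the result as a lower bound.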