MemMXtest - A Memory Testing Environment for MMX PCs
Copyright (C) 1999, 2000 J.A. Bezemer

Version 2.0
16 Mar 2000


OVERVIEW
========

MemMXtest is an extendible computer-based memory testing system that is
built specifically for Intel or compatible processors that incorporate
MMX technology (Pentium w/MMX, Pentium II and III, NOT Pentium Pro). The
MMX instruction set allows reading and writing all 64 bits of the data
words provided by the memory modules at once.

MemMXtest incorporates a large number of well-known march tests and
several pseudo-random tests. The vital parts of each test use manually
optimized machine code for maximal speed.

In its physical form, the test is a floppy with which the "system under
test" should be booted. It may also be possible to boot the test from a
harddisk, bootrom or over a network, but this has not been tested. The
"system under test" does not need a harddisk; it doesn't even need a
keyboard or monitor, as a serial port can be used to control the system
remotely.

This document is meant to be read in order (at least the first time).
However, since many of the discussed subjects are highly related, there
is no obvious ordering which provides a linear buildup of knowledge.
This means that you might have to read this document twice in order to
fully understand it.


QUICKSTART
==========

While MemMXtest is meant to be adjusted and recompiled for specific
situations, I supplied a pre-compiled version of the test program. This
should allow you to get acquainted with the program quickly.

First unzip the memmxtest-1.0.tar.gz archive to an empty directory. On
UNIX systems use

  gunzip < memmxtest-1.0.tar.gz | tar xvf -

and on Windows systems use a recent version of WinZip. (Note: you
probably can't do any _development_ on Windows systems, but you can use
the pre-compiled version.)

As said, the test is a bootable floppy. The file "image" in the base
directory of the archive is the pre-compiled version.
Insert a formatted 1.44 MB floppy in the floppy drive. On Linux systems
use a command like

  cat image > /dev/fd0

and on other UNIX systems

  dd if=image of=/dev/fd0 bs=512 conv=sync ; sync

(You may need to change the permissions of /dev/fd0 to be able to access
the floppy as a normal user.)

On a Windows system, open a DOS window and change to the MemMXtest base
directory. There, give the command

  rawrite\rawrite2

and follow the prompts to write the file "image" to the floppy.

Then insert the floppy in the drive of a PC that has a processor with
MMX technology, and (re-)boot that computer. The words "Loading...."
should appear, followed by a clear-screen and the MemMXtest version
number. After this, the actual testing begins. Numerous notices appear
so that you can easily track what's being done. You can control various
aspects of the tests; see the next section.

(Note: if something goes wrong, it's likely that either your floppy is
bad or you're not using an MMX processor.)


COMMANDS AND COMMAND MODE
=========================

There is a "command mode" that can be used for "on-line" altering of the
test parameters. (Some parameters are only changeable at compile time.)

These keys are usable during testing (i.e. _not_ in command mode):

`,'     Switch to command mode as soon as possible, aborting the running
        test.

`.'     Switch to command mode at the end of this pass.
        This is useful for automated control (via the serial port) that
        needs to set different options for the next pass. You should
        wait till the "CommandMode:" indicator appears before entering
        additional commands.

`;' and `ScrollLock'
        Stop the test output until the next press of `;' or
        `ScrollLock'. Other keys are still accepted and are placed in a
        buffer for later processing.

(Note: these keys can either be pressed on the local keyboard or be sent
to the PC via the serial port, see under "THE SERIAL PORT" below.)
These keys can be used in command mode:

`Escape' or `(Ctrl-Alt-)Del'
        Reboot the system immediately ("warm" boot).

`a'     Set the address bit maps. See under "ENTERING THE ADDRESS BIT
        MAPS" below for a detailed explanation.

`c'     Set cache mode. Modes are:
          AutoToggle:  change between On and Half every pass.
          Force On:    always cache both program and tested memory.
          Force Half:  always cache program, but not the tested memory.
          Force Off:   never cache anything.
        A discussion about caching is under "CACHING" below.
        (See also cacherefr.c)

`r'     Set refresh mode.
      ! NOTE: This has no effect on modern systems in which the host
        bridge chipset (like "VX" or "BX") provides the refresh. For
        those systems, you might be able to control the refresh rate
        from the BIOS setup menu (press Del during system startup).
        Modes are:
          AutoToggle:      change between Normal and Extended every two
                           passes (BIOS default is used in passes 0x0
                           and 0x1; this is supposed to be 15 ms).
          Force Normal:    always use normal refresh rate, 15 ms.
          Force Extended:  always use extended refresh rate, 50 ms.
          Force XLong:     always use extra long refresh rate, 500 ms.
          KeepThis:        don't change the refresh rate any more. When
                           used in pass 0x0 or 0x1, this will maintain
                           the BIOS default refresh rate (which is
                           supposed to be 15 ms, but may be different in
                           recent systems).
        (See cacherefr.c)

`m'     Set memory range that should be tested. For example:
          01000000 - 02000000  tests 16M - 32M (the second 16 MB DIMM
                               module).
        See under "MEMORY IN A PC" below for more details.
        (Note: these are addresses as the processor sees them; they will
        be translated into `memory addresses' automatically.)

`s'     Select test set. See under "TEST SETS" below for a list.
        (See also tests.c)

`p'     Select the data-background pattern generator. See under
        "PATTERN GENERATORS" below for a list. (See also patterngen.c)

`.'     Stop the command mode and resume testing. This always starts a
        new pass.

Note: all input and output numbers are in hexadecimal!
When inputting a number, you can usually press `Escape' to quit without
changing anything. Don't just press Enter, as this will be interpreted
as `0'. The Delete, Backspace and cursor keys are not supported, so try
to avoid typing errors.

(Command-mode keys also "work" during normal testing. They are buffered
and processed at the end of the pass. This functionality may be removed
in future versions, so don't use it; use the command mode instead.)


TEST SETS
=========

A large number of tests and test sets are available (see tests.c). Per
pass, only one test set is used. You cannot specify more than one test
set per pass; if you want another combination of tests, or another
sequence, add a new test set to tests.c and recompile. You can also use
the `.' key at the beginning of each pass, and specify a new test set
for the next pass when the command mode is entered at the end of the
pass (also see under "COMMANDS AND COMMAND MODE" above).

Note that most tests are run several times. All march tests except WOM
(nr. 110) are run for all address updating schemes and for all
data-background patterns of the current pattern generator (see under
"PATTERN GENERATORS" below). For example, MATS+ with two addressing
schemes (fast-x and fast-y) and the counting pattern (7 patterns) will
run 2x7=14 times.

The WOM test always runs only once; it provides its own data and expects
a fast-x scheme in address bit map #0, and a fast-y scheme in #1.

The Pseudo-Random tests also provide their own data patterns; they are
run once for each addressing mode. The "repeats" in the Pseudo-Random
tests are executed 5 times for the 2xx tests, and 10 times when called
via set 1001 (hard-coded in tests.c).
Available test sets (as defined in tests.c; see below for more info):

Single March-like tests:

  100  March A    {Any(w0);Up(r0,w1,w0,w1);Up(r1,w0,w1);Dn(r1,w0,w1,w0);
                   Dn(r0,w1,w0)}
  101  March B    {Any(w0);Up(r0,w1,r1,w0,r0,w1);Up(r1,w0,w1);
                   Dn(r1,w0,w1,w0);Dn(r0,w1,w0)}
  102  March C-   {Any(w0);Up(r0,w1);Up(r1,w0);Dn(r0,w1);Dn(r1,w0);Any(r0)}
  103  March C-R  {Any(w0);Up(r0,r0,w1);Up(r1,r1,w0);Dn(r0,r0,w1);
                   Dn(r1,r1,w0);Any(r0,r0)}
  104  March G    {Any(w0);Up(r0,w1,r1,w0,r0,w1);Up(r1,w0,w1);
                   Dn(r1,w0,w1,w0);Dn(r0,w1,w0);Delay;Any(r0,w1,r1);Delay;
                   Any(r1,w0,r0)}
  105  March LA   {Any(w0);Up(r0,w1,w0,w1,r1);Up(r1,w0,w1,w0,r0);
                   Dn(r0,w1,w0,w1,r1);Dn(r1,w0,w1,w0,r0);Dn(r0)}
  106  March LR   {Any(w0);Dn(r0,w1);Up(r1,w0,r0,w1);Up(r1,w0);
                   Up(r0,w1,r1,w0);Dn(r0)}
  107  MATS+      {Any(w0);Up(r0,w1);Dn(r1,w0)}
  108  MATS++     {Any(w0);Up(r0,w1);Dn(r1,w0,r0)}
  109  PMOVI      {Dn(w0);Up(r0,w1,r1);Up(r1,w0,r0);Dn(r0,w1,r1);
                   Dn(r1,w0,r0)}
  10A  PMOVI-R    {Dn(w0);Up(r0,w1,r1,r1);Up(r1,w0,r0,r0);Dn(r0,w1,r1,r1);
                   Dn(r1,w0,r0,r0)}
  10B  Scan       {Any(w0);Any(r0);Any(w1);Any(r1)}
  10C  March U    {Any(w0);Up(r0,w1,r1,w0);Up(r0,w1);Dn(r1,w0,r0,w1);
                   Dn(r1,w0)}
  10D  March U-R  {Any(w0);Up(r0,w1,r1,r1,w0);Up(r0,w1);Dn(r1,w0,r0,r0,w1);
                   Dn(r1,w0)}
  10E  March UD   {Any(w0);Up(r0,w1,r1,w0);Delay;Up(r0,w1);Delay;
                   Dn(r1,w0,r0,w1);Dn(r1,w0)}
  10F  March Y    {Any(w0);Up(r0,w1,r1);Dn(r1,w0,r0);Any(r0)}
  110  WOM        4-bit Word Oriented March test, UpX=fast-x, UpY=fast-y
                  {UpX(w0000,w1111,r1111);DnY(r1111,w0000,r0000);
                   DnX(r0000,w0111,r0111);
                   UpY(r0111,w1000,r1000);UpX(r1000,w0000);
                   DnX(w1011,r1011);DnY(r1011,w0100,r0100);UpX(r0100,w0000);
                   UpY(w1101,r1101);DnX(r1101,w0010,r0010);UpX(r0010,w0000);
                   DnY(w1110,r1110);UpY(r1110,w0001,r0001);DnY(r0001)}

Single Pseudo-Random tests:

  200  Pseudo-Random Scan equivalent      {Up(wA);Repeat[Up(rA);Up(wB)]}
  201  Pseudo-Random March C- equivalent  {Up(wA);Repeat[Up(rA,wB)]}
  202  Pseudo-Random PMOVI equivalent     {Up(wA);Repeat[Up(rA,wB,rB)]}

Combined test sets:

  1000  Sequence of all March tests
  1001  Sequence of all Pseudo-Random tests
[Definitions for the majority of the implemented tests have been taken
from A.J. van de Goor & J. de Neef: "Industrial Evaluation of DRAM
tests", IEEE ref. 0-7695-0078-1/99$10.00]


MEMORY CHIPS
============

To aid the discussion of the tests, here is first a typical schematic of
a memory chip. This chip holds 16 bits of data.

        a  d  .-----------.
        d  e--+b0 b1 b2 b3|
  a2----d  c--+b4 b5 b6 b7|
  a3----r  o--+b8 b9 bA bB|
        e  d--+bC bD bE bF|
        s  e  `-+--+--+--+'
        s  r    |  |  |  |
              address decoder<------->d
                 |   |   (=demux)
                 a0  a1

Here a0-a3 are 4 address bits, b0-bF are memory elements each
"remembering" one bit of data, and d is the 1-bit data in/out.

The memory elements are organized in a `square'. A few address bits
select the row, and the others select which column of that row to use.
Together they always specify exactly one memory location.

Modern, high-capacity memory chips have a different layout. For enhanced
operation, the square is cut in halves or even quarters, called "banks".

                    a0
                    |
                  #cols
               .-----.  .-----.
   a3--#rows   |b0 b1|  |b2 b3|
               |b8 b9|  |bA bB|
               `-----'  `-----'
                  #banks=======a1,a2
               .-----.  .-----.
               |b4 b5|  |b6 b7|
               |bC bD|  |bE bF|
               `-----'  `-----'

Now there is only one address bit that determines the row, one for the
column, but two for the bank number. In the Intel architecture, the
address bits are customarily divided as indicated in the drawing above,
which means that rows are "circulating" through the banks (which is
exactly why higher speeds are achieved). Note that the `squares' in this
`set of squares' can perfectly well be rectangular, when #rows != #cols.

There can be faults everywhere, for example in the address decoders, in
the memory elements, between several memory elements, and in wiring.


MEMORY IN A PC
==============

Intel/IBM compatible systems use the memory area between 640 kB and
1024 kB for memory-mapped I/O purposes (e.g. text mode video), and there
is no way to access the `hidden' memory in this region.
If memory chips are to be tested as thoroughly as possible, they need to
be accessed in a continuous manner (i.e. without "gaps"). Also, the
testing program requires some code and data memory, which can be located
well below the 640 kB boundary. So the structure of the memory will look
like this:

   up to 4 GB --+----
                |
                |  memory under test
                |
                |
   >= 1024 kB --+----
                |  `trusted' memory for test program/data and
                |  to fill 640 kB - 1024 kB "gap"
            0 --+----

In the memory area above 1024 kB, there may be several memory modules
that can be tested in one run. The Intel processors provide 32-bit
memory addresses, resulting in an addressable space of 4 GB. The BIOS
startup code is located at the top of these 4 GB; depending on the
hardware used, the memory above a certain limit might not be accessible
(the BX chip"set" for example has a maximum of 1 GB; see the appropriate
documentation).

On a system board with four memory slots, a possible configuration would
be four 32 MB SDRAM modules. The first module will contain the test
program; the memory to be tested is from 32 MB to 128 MB. In case the
first module is also `untrusted', another test can be run with the
modules shuffled.

But there is more. The host bridge chip"set" (modern versions are
actually only one big chip) does some weird shuffling of the address
lines. For example, the datasheet of the host bridge may show the
following connections:

  bit#   B0   2   1   0
      +----------------
  row |  a2  a4  a3  a5
  col |  a2      a1  a0

This means that e.g. address line `a4' (as the program sees it) is
connected to row bit #2, and `a1' to column bit #1. The actual
addressing (with a0 etc. the _processor's_ address lines, and b00 etc.
the _processor's_ memory references) is as shown below.

                a1   a0           a2
            col1|    |col0        |B0
           ----decoder----        |        (id.)
             |   |   |   |       MUX     |   |   |   |
     row0 d--b00 b01 b02 b03     | |     b04 b05 b06 b07
  a5------e--b20 b21 b22 b23<----' `---->b24 b25 b26 b27
          c--b08 b09 b0A b0B             b0C b0D b0E b0F
  a3------o--b28 b29 b2A b2B             b2C b2D b2E b2F
     row1 d--b10 b11 b12 b13             b14 b15 b16 b17
          e--b30 b31 b32 b33             b34 b35 b36 b37
  a4------r--b18 b19 b1A b1B             b1C b1D b1E b1F
     row2 |--b38 b39 b3A b3B             b3C b3D b3E b3F

So, in order to address, for example, the first column top-down (called
"fast-x addressing"), the processor must use the address sequence 00,
20, 08, 28, 10, 30, 18, 38. Actually, to access the memory in any
orderly way (subsequent rows, subsequent columns, or subsequent banks),
some non-trivial counting and wrapping has to be done.


ADDRESS BIT MAPS
================

To solve the issue of the shuffled address bits described in the
previous section, the notion of "memory addresses" is introduced. For
any given addressing scheme, the memory address is defined as starting
from 0 and increasing by 1 for each next location. For the previously
used example of "fast-x" addressing, the memory addresses are as
indicated below.

  b00 b08 b10 b18     b20 b28 b30 b38
  b01 b09 b11 b19     b21 b29 b31 b39
  b02 b0A b12 b1A     b22 b2A b32 b3A
  b03 b0B b13 b1B     b23 b2B b33 b3B
  b04 b0C b14 b1C     b24 b2C b34 b3C
  b05 b0D b15 b1D     b25 b2D b35 b3D
  b06 b0E b16 b1E     b26 b2E b36 b3E
  b07 b0F b17 b1F     b27 b2F b37 b3F

This means that memory address 00 corresponds with "processor address"
00, but memory address 01 with processor address 20, 02 with 08, 03 with
28 and so on, until 3F with 3F. (The notion "processor address" refers
to the address that the processor actually sees, and the program
actually uses to access the memory.)

This mapping of memory address to processor address is unique for this
specific addressing mode and this specific chip; we'll see some other
examples below. First a more detailed analysis of this mapping.
What the program does is work internally with the memory address (always
incremented by 1), and then "inverse shuffle" it to get the processor
address with which to do the reading/writing. This "inverse shuffling"
looks like this:

  memory address bit              5 4 3 2 1 0
                                  | | | | | |
                                  V V V V V V
  goes to processor address bit   2 1 0 4 3 5

For example: memory address 0C maps to processor address 11, and 33 maps
to 2E as follows. (Each memory address bit moves to the processor bit
position listed directly below it.)

  memory address       0x0C = 0 0 1 1 0 0       0x33 = 1 1 0 0 1 1
  maps to bit#                2 1 0 4 3 5              2 1 0 4 3 5

  processor bit#              5 4 3 2 1 0              5 4 3 2 1 0
  processor address           0 1 0 0 0 1 = 0x11       1 0 1 1 1 0 = 0x2E

Alternatively, we can calculate the memory address equivalent of a given
processor address by applying the reverse procedure ("forward mapping").

We identify a mapping by its "bit destination list". So the fast-x
scheme in this example has an address bit map "2 1 0 4 3 5".

The address bit map can be determined easily from the table in the
chipset's specifications. Recall that we used

  bit#   B0   2   1   0
      +----------------
  row |  a2  a4  a3  a5
  col |  a2      a1  a0

in the current example. The address bit map now is:

  bank -col- ---row---
    2   1 0    4 3 5

    0   0 0    0 0 0     memory addresses
    0   0 0    0 0 1
    0   0 0    0 1 0
    0   0 0    0 1 1
    0   0 0    1 0 0
            :
    0   0 0    1 1 1
    0   0 1    0 0 0     <-- start of second column
    0   0 1    0 0 1
            :
    0   1 1    1 1 1
    1   0 0    0 0 0     <-- start of second bank
    1   0 0    0 0 1
            :
    1   1 1    1 1 1

In words, this means that we want to address subsequent rows of the
first column in the first bank, then subsequent rows of the second,
third and fourth columns in the first bank; then the same procedure for
the second bank. The order rows - columns - banks is clearly visible
(from right to left) in the address bit map.

Fast-y addressing has the address bit map "2 4 3 5 1 0". (Note the
columns - rows - banks order.) This is not as extreme as fast-x, for
only two bits are exchanged. For example: memory address 0C now maps to
processor address 28, and 33 now maps to 17.
  memory address       0x0C = 0 0 1 1 0 0       0x33 = 1 1 0 0 1 1
  maps to bit#                2 4 3 5 1 0              2 4 3 5 1 0

  processor bit#              5 4 3 2 1 0              5 4 3 2 1 0
  processor address           1 0 1 0 0 0 = 0x28       0 1 0 1 1 1 = 0x17

The complete scheme is the following:

  b00 b01 b02 b03     b20 b21 b22 b23
  b04 b05 b06 b07     b24 b25 b26 b27
  b08 b09 b0A b0B     b28 b29 b2A b2B
  b0C b0D b0E b0F     b2C b2D b2E b2F
  b10 b11 b12 b13     b30 b31 b32 b33
  b14 b15 b16 b17     b34 b35 b36 b37
  b18 b19 b1A b1B     b38 b39 b3A b3B
  b1C b1D b1E b1F     b3C b3D b3E b3F

Other possibilities include "fast-bank,y" with map "4 3 5 1 0 2", but
also "fast-x, x-interleaved" with map "2 1 0 5 4 3" and "fast-x,
y-interleaved" with map "2 0 1 4 3 5".

As a real-world example, we now look at the Intel BX chipset and a 64MB
SDRAM DIMM module. The specs show this mapping:

         11 B1 B0 10  9  8  7  6  5  4  3  2  1  0
      +-------------------------------------------
  row |  25 13 12 23 14 24 22 21 20 19 18 17 16 15
  col |     13 12 AP    11 10  9  8  7  6  5  4  3

(The "AP" is not important here, so we simply ignore it.)

Following the rules above, for fast-x the address bit map should be:

  bank  --------col--------  ----------------row----------------
  13 12 11 10 9 8 7 6 5 4 3  25 23 14 24 22 21 20 19 18 17 16 15

and for fast-y:

  bank  ----------------row----------------  --------col--------
  13 12 25 23 14 24 22 21 20 19 18 17 16 15  11 10 9 8 7 6 5 4 3

Note that the lowest bit occurring is #3. This is because the modules
provide one 64-bit word at a time; the bits #0, #1 and #2 are used to
select one of the eight 8-bit sub-words (bytes). The test program still
increments the memory address by 1, which for the fast-y scheme results
in processor address increments of 8.

The highest bit occurring is #25, so there are 26 address bits for this
module (counting from 0). This indeed results in a memory size of
2^26=64M.
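
The "inverse shuffling" and the "forward mapping" of the 6-bit example
can be sketched in plain C as follows. This is only an illustration
under stated assumptions, not the code MemMXtest itself uses: the
function names mem_to_proc and proc_to_mem and the array layout (map[0]
holds the leftmost entry of the address bit map as written in this
document) are chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the MemMXtest sources.
 * map[] is an address bit map written left to right as in the text:
 * map[0] is the destination of the highest memory address bit and
 * map[nbits-1] the destination of memory address bit 0. */

/* "Inverse shuffling": memory address -> processor address. */
uint32_t mem_to_proc(uint32_t mem_addr, const unsigned char *map, size_t nbits)
{
    uint32_t proc_addr = 0;
    size_t i;

    assert(nbits <= 32);
    for (i = 0; i < nbits; i++) {
        unsigned mem_bit = (unsigned)(nbits - 1 - i);

        assert(map[i] < 32);
        if ((mem_addr >> mem_bit) & 1u)
            proc_addr |= (uint32_t)1 << map[i];
    }
    return proc_addr;
}

/* "Forward mapping": processor address -> memory address. */
uint32_t proc_to_mem(uint32_t proc_addr, const unsigned char *map, size_t nbits)
{
    uint32_t mem_addr = 0;
    size_t i;

    assert(nbits <= 32);
    for (i = 0; i < nbits; i++) {
        unsigned mem_bit = (unsigned)(nbits - 1 - i);

        assert(map[i] < 32);
        if ((proc_addr >> map[i]) & 1u)
            mem_addr |= (uint32_t)1 << mem_bit;
    }
    return mem_addr;
}
```

With the fast-x map {2, 1, 0, 4, 3, 5}, mem_to_proc() turns memory
addresses 0C and 33 into processor addresses 11 and 2E; with the fast-y
map {2, 4, 3, 5, 1, 0} the results are 28 and 17, as in the examples
above.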
In case of multiple subsequent modules, more bits should be added to the
left-hand side, for example for fast-x:

  --modules--  bank  --------col--------  ----------------row----------------
  29 28 27 26  13 12 11 10 9 8 7 6 5 4 3  25 23 14 24 22 21 20 19 18 17 16 15

However, MemMXtest will automatically do this for you. That is, it
automatically fills the left-hand side of the map starting from the
highest entered bit number plus one. (So it does not use bit numbers
that were skipped in the entered series.) For example, if you entered:

  5 4 3 9 8 7

it will be read as:

  32 31 30 29 28 ... 15 14 13 12 11 10   5 4 3 9 8 7
     implicitly added by the program  <--|-->  entered values

Note that bit #6 is absent; the usefulness of that is questionable.


ENTERING THE ADDRESS BIT MAPS
=============================

With the `a' command (in command mode), the address bit maps can be
changed. The format for the address bit maps is as indicated in the
previous section, but note that all input and output values are in
hexadecimal.

For example, the address bit map for fast-x as shown above,

  bank  --------col--------  ----------------row----------------
  13 12 11 10 9 8 7 6 5 4 3  25 23 14 24 22 21 20 19 18 17 16 15

should be entered as:

  0D 0C 0B 0A 09 08 07 06 05 04 03 19 17 0E 18 16 15 14 13 12 11 10 0F

You should just start with entering the `0D' (or just `D', leading zeros
are optional) and finish with the `0F' (or `F'), from left to right.
This order might seem a bit unnatural at first sight.

After entering the maps, they will be printed for verification purposes.
You'll see that the unspecified bit positions are filled with `FF';
these will be calculated by the program at a later time.

By default, 5 address bit maps can be specified (MAX_ADRBITMAPS in
defines.h). During one pass, all march and/or pseudo-random tests will
be executed once for each `filled' address bit map (where `unfilled' =
all `FF's).
This allows you to test with, for example, fast-x, fast-y, fast-bank-y,
fast-bank-x and x-interleaved addressing modes in one run, without any
interaction.

When entering the address bit maps with the `a' command, you always get
prompted for all 5 maps. Possible responses are:

  Escape, or numbers followed by Escape
        Map remains unchanged.

  Enter (without any numbers)
        Map is cleared (i.e. set to all `FF's).

  Numbers followed by Enter
        Map is set to entered numbers. The numbers may be separated by
        any number of spaces.

Pressing Escape on every input line will just show the current address
bit maps without changing anything.

To test the working of an address bit map, you can specify a
non-existent memory region to be tested, for example a small range above
1 GB (=0x40000000). The fault addresses that are printed are in the
order in which the memory is accessed.

There is one default address bit map, which is just an equivalence
relation between memory and processor addresses (i.e. there is no
shuffling of the address lines). This works acceptably in most cases,
but with situation-specific maps a higher fault coverage is often
possible. The default is defined in defines.h (ADRBITMAP_LIST_INIT).


ABOUT MARCH TESTS
=================

March tests were originally designed to test for "permanent" faults in a
single-bit memory chip. This is as opposed to "non-permanent" faults
that are caused (or enhanced) by, for example, heat, high frequencies
(radiation, but also "overclocking") or specific usage patterns. March
tests can be guaranteed to detect all permanent faults of certain
classes. They may also detect "non-permanent" faults, but there is no
guarantee due to the very nature of these faults.

Example of a march test ("MATS+" test):

  { Any(w0); Up(r0, w1); Down(r1, w0) }

This says: write a `0' in all memory locations in any address order.
Then, in ascending address order, for each memory location: first test
(read) if there is a `0' and immediately write a `1', then continue with
the next location.
Last, in descending address order, for each memory location: first test
if there is a `1' and immediately write a `0'. (Remember that we're
still discussing a 1-bit memory.)

Or, as a program, for a 1-bit memory of 16 locations (addresses 0 to
0xF):

  Mem[0] = 0               Any(w0)
  Mem[1] = 0
     :
  Mem[F] = 0

  Test(Mem[0] == 0)        Up(r0, w1)
  Mem[0]=1
  Test(Mem[1] == 0)
  Mem[1]=1
     :
  Test(Mem[F] == 0)
  Mem[F]=1

  Test(Mem[F] == 1)        Down(r1, w0)
  Mem[F]=0
  Test(Mem[E] == 1)
  Mem[E]=0
     :
  Test(Mem[0] == 1)
  Mem[0]=0

(The address order does not necessarily have to be ascending and
descending, but may be any order, as long as "Up" is the exact reverse
of "Down".)


PATTERN GENERATORS
==================

March tests were originally designed for single-bit memories. With
MemMXtest, we are using 64-bit words. Each bit position is still
supposed to be in one `set of squares' (so 64 sets, one for each bit),
but that is not certain.

So, instead of using all 1's where a test says "write 1" and all 0's for
"write 0", we substitute "write pattern p" and "write pattern n". Then
we run each test with several values of p and n (n is usually the
bitwise inverse of p). These values are generated with a "pattern
generator".

These pattern generators are the only data source for the march tests
(nr. 1xx) except the WOM test, which provides its own data patterns. The
march tests can't use pseudo-random data. And vice versa, the
pseudo-random tests (nr. 2xx) can't use the pattern generators.
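
As an illustration of the p/n substitution, the MATS+ example from the
previous section can be written for 64-bit words roughly as follows.
This is a minimal sketch in plain C over a linear array; it applies no
address bit map, uses no MMX instructions and is not the hand-optimized
code of the marchel files. The function name mats_plus and its
parameters are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not taken from the MemMXtest sources.
 * MATS+ {Any(w0); Up(r0,w1); Down(r1,w0)} on 64-bit words, where a
 * "1" in the test definition becomes pattern p and a "0" becomes
 * n = inv(p). Returns the number of reads that did not return the
 * expected value. */
size_t mats_plus(volatile uint64_t *mem, size_t words, uint64_t p)
{
    const uint64_t n = ~p;
    size_t faults = 0;
    size_t i;

    assert(mem != NULL);

    for (i = 0; i < words; i++)          /* Any(w0) */
        mem[i] = n;

    for (i = 0; i < words; i++) {        /* Up(r0, w1) */
        if (mem[i] != n)
            faults++;
        mem[i] = p;
    }

    for (i = words; i-- > 0; ) {         /* Down(r1, w0) */
        if (mem[i] != p)
            faults++;
        mem[i] = n;
    }
    return faults;
}
```

A complete run would call such a function once for every value of p
delivered by the selected pattern generator and once for every filled
address bit map.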
Several pattern generators are available (see pattern_gen.c):

0  Counting pattern (default): p gets these values (and n=inv(p)):

     0000000000000000000000000000000000000000000000000000000000000000
     0101010101010101010101010101010101010101010101010101010101010101
     0011001100110011001100110011001100110011001100110011001100110011
     0000111100001111000011110000111100001111000011110000111100001111
     0000000011111111000000001111111100000000111111110000000011111111
     0000000000000000111111111111111100000000000000001111111111111111
     0000000000000000000000000000000011111111111111111111111111111111

   This is the shortest pattern that tests all possible combinations of
   the values of any two bits. The name "counting" comes from the
   standard binary counting pattern that can be seen when rotating the
   table above by 90 degrees clockwise.

1  Counting pattern, incl. inversion: p gets all values of the "original
   counting pattern" above, followed by the inverted values. According
   to theory, this should not be necessary, but in practice it might
   trigger some more faults.

2  Walking one: p gets these values:

     0000000000000000000000000000000000000000000000000000000000000001
     0000000000000000000000000000000000000000000000000000000000000010
     0000000000000000000000000000000000000000000000000000000000000100
     0000000000000000000000000000000000000000000000000000000000001000
       :
     0010000000000000000000000000000000000000000000000000000000000000
     0100000000000000000000000000000000000000000000000000000000000000
     1000000000000000000000000000000000000000000000000000000000000000
     0000000000000000000000000000000000000000000000000000000000000000

   The total number of patterns is 64+1=65.
3  Walking one per nibble: p gets these values:

     0001000100010001000100010001000100010001000100010001000100010001
     0010001000100010001000100010001000100010001000100010001000100010
     0100010001000100010001000100010001000100010001000100010001000100
     1000100010001000100010001000100010001000100010001000100010001000
     0000000000000000000000000000000000000000000000000000000000000000

4  All ones: p always:

     1111111111111111111111111111111111111111111111111111111111111111

   So n is always 000...000. This is the most basic way to extend a
   1-bit march test to 64 bits; it will not detect coupling faults
   between data lines.

5  All ones, all zeros: p gets these values:

     1111111111111111111111111111111111111111111111111111111111111111
     0000000000000000000000000000000000000000000000000000000000000000

Note that several popular patterns like "row stripe", "column stripe"
and "checkerboard" are not available. This is because the patterns would
have to be swapped each or every few words, which is very
time-consuming. Furthermore, there is no guarantee about the actual
ordering of the bits on the chip, which means there is no obvious way to
ensure that the intended patterns are indeed present.


GENERATING PSEUDO-RANDOM SEQUENCES
==================================

A well-known way to generate pseudo-random numbers is using a Linear
Feedback Shift Register (LFSR). This is basically a one-bit shift
register with some logic around it. Many variants are available, but
only one software implementation is both easy and fast.

  .------------+-----------------------------.
  |            |d2                           |
  |            V                             |
  |d2.---.   .---. d1.---.         d0.---.   |d2
  `--| M |<--|XOR|<--| M |<----------| M |<--'
     `---'   `---'   `---'           `---'

Above is an example of a 3-stage (or 3-bit) LFSR. The `M's are memory
elements; the `state' of the shift register is d2d1d0, which is also the
generated pseudo-random number.

Say the state is 010 at some time. To determine the next state,
calculate the values at the inputs of the memory elements.
  new_d2 = d1 XOR d2        new_d1 = d0        new_d0 = d2
         = 1 XOR 0 = 1             = 0                = 0

So we get 100. The next step is 101:

  new_d2 = d1 XOR d2        new_d1 = d0        new_d0 = d2
         = 0 XOR 1 = 1             = 0                = 1

And so on.

This can be implemented easily in software. The d2, d1 and d0 bits are
now the least significant bits of a >=3-bit variable or register. The
procedure is as follows:

  Old content of register:                       010
  Shift one place left:                         0100
  If the shifted-out bit is 1, XOR with 101:    0100
  New content of register:                       100

  Old content of register:                       100
  Shift one place left:                         1000
  If the shifted-out bit is 1, XOR with 101:    1101
  New content of register:                       101

And so on. We get exactly the same values as in the `circuit' variant.
The complete sequence is:

  010 = 2
  100 = 4
  101 = 5
  111 = 7
  011 = 3
  110 = 6
  001 = 1
  010 = 2  (back to start)
  100 = 4
  etc.

So we have a pseudo-random sequence of length 7 = 2^3-1.

The placement of the XOR gates, or the equivalent value of the
`xor-mask', is very important because only very specific placements
produce a maximum-length (2^#bits-1) sequence. This is accomplished only
if the mask represents a `primitive polynomial'. The above example uses
the polynomial

  x^3 + x^2 + 1  =  1*x^3 + 1*x^2 + 0*x^1 + 1*x^0
                            -       -       -

When you forget the x^3, the 101 xor-mask is clearly visible (the
exponents are the bit positions).

The `inverse' of a primitive polynomial is also primitive. It is
constructed by writing the exponents in the other direction:

  1*x^0 + 1*x^1 + 0*x^2 + 1*x^3  =  1*x^3 + 0*x^2 + 1*x^1 + 1*x^0
                                            -       -       -

So the inverse xor-mask is 011.

The file doc/primitive_polys contains a Maple program that `scans' for
primitive polynomials. For 64-bit pseudo-random values, this gave the
polynomial

  x^64 + x^61 + x^60 + x^27 + 1

with xor-mask

  0011000000000000000000000000000000001000000000000000000000000001

or as a hexadecimal number: 0x3000000008000001. This is defined as
PR_XORMASK64 in defines.h.

A C version of the actual shifting and XOR'ing is in
update_currentpr64() in main.c.
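
The shift-and-XOR procedure described above can be sketched in portable
C as follows. This is a minimal illustration for any register width up
to 64 bits, not a copy of update_currentpr64(); the function name
lfsr_step and its parameters are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not taken from the MemMXtest sources.
 * One step of the shift-and-XOR LFSR:
 *   nbits    = width of the shift register (1..64)
 *   xor_mask = mask that is XOR'ed in when a 1 is shifted out
 *              (101 binary for the 3-bit example,
 *               0x3000000008000001 for the 64-bit polynomial) */
uint64_t lfsr_step(uint64_t state, unsigned nbits, uint64_t xor_mask)
{
    uint64_t top_bit, width_mask;
    int shifted_out;

    assert(nbits >= 1 && nbits <= 64);
    top_bit = (uint64_t)1 << (nbits - 1);
    width_mask = top_bit | (top_bit - 1);   /* nbits ones, also for 64 */

    /* The top bit is lost by the shift, so inspect it first. */
    shifted_out = (state & top_bit) != 0;
    state = (state << 1) & width_mask;
    if (shifted_out)
        state ^= xor_mask;
    return state;
}
```

Starting from state 010 with nbits = 3 and xor_mask = 5 (binary 101),
repeated calls give 100, 101, 111, 011, 110, 001 and then 010 again,
which is the sequence listed above.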
Because the shifted-out bit is lost after shifting, we first use its
value to determine whether to shift with or without XOR'ing.

Note that the state of all-zeros will never be reached (it's impossible
to leave, too). However, since generating the entire 64-bit
pseudo-random sequence would take centuries even at a billion values per
second, the one missing value will not be any problem.


ABOUT PSEUDO-RANDOM TESTS
=========================

There are various forms of pseudo-random tests, depending on where
pseudo-random information is used. Pseudo-random tests are usually
classified by their use of either pseudo-_r_andom or _d_eterministic
_d_ata ("RD" or "DD"), _a_ddresses ("RA" or "DA") and read/_w_rite
operations ("RW" or "DW", Random Write = Random Read).

The three pseudo-random tests that are implemented in MemMXtest are
"DADWRD"-type tests. These tests behave much like march tests: the
address order and read/write operations are pre-defined, but the data
patterns are pseudo-random.

The pseudo-random data is generated in the way described in section
"GENERATING PSEUDO-RANDOM SEQUENCES" above. The initial value (`state'
of the LFSR) is derived from the system time using the "hardware clock"
(also named "CMOS clock") at the start of the program (the
"RandomSeed64:" value on screen); see pseudorandom_init() in main.c. All
pseudo-random test elements use the same LFSR (the code is sometimes
different, but the xor-mask is the same, and the state is passed from
one element to the next).

To enhance reproducibility, the starting value is printed on screen at
the beginning of each test element. However, to actually have the
program use a user-defined starting value (as opposed to the
clock-derived value), you'll have to change pseudorandom_init() in
main.c.

Example of a Pseudo-Random test (March C- equivalent):

  { Up(wA); Repeat[ Up(rA, wB) ] }

This says: initialize the memory with a sequence of 64-bit pseudo-random
values, using increasing address order.
Then do multiple repeats of the following: in increasing address order,
for each memory location, first test if the expected value is present,
then immediately write a completely unrelated new pseudo-random value.
The "increasing address order" is either fast-x or fast-y.

So for 5 repeats, the test becomes

  {Up(wA); Up(rA,wB); Up(rB,wC); Up(rC,wD); Up(rD,wE); Up(rE,wF)}

with A, B, C, D, E and F completely unrelated pseudo-random values that
are different for each memory location.

The values in subsequent memory locations, when seen as 64-bit words,
are highly related, because after shifting, either 0 or only 4 of 64
bits change value (there are only 4 `1'-bits in the xor-mask). This fact
may lead to the masking of certain coupling faults in the data lines. To
get "more random" values, it is possible to use only one of every 2, 4
or up to 64 values of the pseudo-random sequence. To achieve this,
define PR_SHIFTMOREBITS64 in defines.h and set PR_SHIFTBITSNO64 to, for
example, 2, 4 or 64. Note that this may cause a (severe) slowdown of the
program.

The `prmarchel' test elements use a pre-calculated table of
pseudo-random data. This allows writing the data at top speed to the
current row or column, while new data is calculated between these
writes. The `prdmarchel' elements (Pseudo-Random Direct) write each
value when it is calculated (or, in other words, they calculate "on the
fly"). [Note: currently there are no prdmarchel's any more.]

Some provisions have been made for future "RADWRD"-type tests, which
have random data as well as random addresses. 16-bit to 32-bit xor-masks
for generating the addresses have been constructed and tested. They are
available in prepare_pr32() in main.c, but are not yet used by any test.


THE SERIAL PORT
===============

One of the serial ports is used to provide both a second screen and a
second keyboard.
The I/O-address of the serial port is defined as SERIAL_ADR in defines.h, as is the choice for ending lines with either CRLF or only LF (SERIAL_SEND_CRLF). The communication parameters (speed, parity, start/stopbits) can be configured in serial_echo_init() in mtest.c. Default is COM2 at 9600 baud, 8 bits, no parity, 1 stop bit, only LF.

The intention is that this can be used by automatic processes that monitor and control the testing parameters in this way. All data is displayed in hexadecimal, and there are "line headers" ending in a colon, which should be easily parsable.

For fault logging, the controlling device should remember at least the current pass number, memory range, address increments, test identification (and possibly all earlier test identifications of the current pass), sub-test identification (and all earlier ones of this test), and of course the printed sub-test read operand number, address and read/expected data.

To give commands via the serial line, the controller should use either `.' or `,' as described under "COMMANDS AND COMMAND MODE" above, and wait until the "CommandMode:" line header appears. The line header interpretation should be done case-insensitively.

An "Error:" line header indicates that the current test or test set could not be completed successfully. This usually means that some test parameters were entered incorrectly.


DEVELOPERS' INFO
================

In the following sections, information is given that is primarily of interest to people wanting to change MemMXtest.

MemMXtest is meant to be compiled under the Linux operating system. More info on Linux can be found at

  http://www.linux.org

Using the Cygwin package, available from

  http://sourceware.cygnus.com/cygwin/

it might also be possible to compile under Windows 95/98/NT, but this is not officially supported and not tested.

Apart from a C development environment (gcc, as, ld, make), you also need the "bin86" package to produce the startup section in "real mode" code.
Note that the compiled code (boot floppy) doesn't need any operating system at all. It's effectively its own OS.


(RE-)COMPILING
==============

You can start compilation by giving the command

  make

in the src/ subdirectory. This will produce a file "image" in that directory. (Note: the provided pre-compiled image is also called "image" but is in the package's base directory.) You can write that file to a floppy as described under "QUICKSTART" above, or use the command

  make install

which does both compilation and writing.

As usual, the Makefile has been constructed in such a way that supposedly unchanged object files won't be compiled again.


SOURCE FILES
============

There are several types of source files:

.c      C code that is preprocessed and compiled to "protected mode" code (.o).

.S      Assembler code that is preprocessed (.s) and assembled to either "real mode" code (bootsect.o and setup.o) or "protected mode" code (head.o and *_ml.o).

.h      Header files.

marchel/marchel_*_ml.S
        Assembler code for one march element (ml=machine language).
                               ~~~~~ ~~
marchel/marchel_*.c
        C file containing one function that calls the corresponding _ml.S code. This is the only place _ml.S code may be called from.

march/march_*.c
        C files calling marchel_*.c functions as required by the march test.

prmarch(el)/prmarch*
        march(el)* equivalents for Pseudo-Random tests.
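
The layering of these files can be illustrated with a small stand-alone C sketch. It is only a simulation under stated assumptions: the three marchel_* functions, their (p, n) signatures, the array "mem" and the function march_matsplus() are hypothetical stand-ins made up for this example, and the plain C loops take the place of the real *_ml.S machine code. The well-known MATS+ test { Any(w0); Up(r0,w1); Down(r1,w0) } is used only because it is short; whether and how MemMXtest implements it is not implied.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS 16                 /* size of the simulated memory (assumption) */
static uint64_t mem[WORDS];      /* stands in for the memory under test */
static int faults;               /* number of mismatches seen so far */

/* Hypothetical stand-ins for the marchel/marchel_*.c layer. The real
 * functions would hand over to the matching *_ml.S code; these are
 * plain C loops over the simulated memory. */
static void marchel_up_wp(uint64_t p, uint64_t n)
{
    (void)n;                             /* this element only writes p */
    for (size_t i = 0; i < WORDS; i++)
        mem[i] = p;
}

static void marchel_up_rp_wn(uint64_t p, uint64_t n)
{
    for (size_t i = 0; i < WORDS; i++) { /* increasing address order */
        if (mem[i] != p)
            faults++;
        mem[i] = n;
    }
}

static void marchel_dn_rp_wn(uint64_t p, uint64_t n)
{
    for (size_t i = WORDS; i-- > 0; ) {  /* decreasing address order */
        if (mem[i] != p)
            faults++;
        mem[i] = n;
    }
}

/* Hypothetical stand-in for a march/march_*.c file: it does nothing
 * more than call march elements in the order the test prescribes. */
static int march_matsplus(uint64_t p, uint64_t n)
{
    faults = 0;
    marchel_up_wp(p, n);         /* Any(w0)                        */
    marchel_up_rp_wn(p, n);      /* Up(r0,w1)                      */
    marchel_dn_rp_wn(n, p);      /* Down(r1,w0): patterns reversed */
    return faults;               /* 0 means no fault was detected  */
}
```

Calling march_matsplus() with a pattern and its inverse returns the number of mismatches found; on the fault-free simulated memory that is 0, and every word ends up holding the p pattern again.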


CALLING SEQUENCE
================

BIOS                     Boot code, loads first sector of floppy
 |
 V
bootsect.S               Loads rest of "image" to temporary location
 |
 V
setup.S                  Moves "image" to where it belongs, switches to protected mode
 |
 V
head.S                   Installs interrupt vectors
 |
 V
do_test() in mtest.c     Main loop for testing
 |
 +-> init() in mtest.c   Initialization, only before first pass
 |
 |then per pass:
 |
 +-> set_cache() and set_refresh() in cacherefr.c    Set per-pass cache/
 |                                                   refresh status
 +-> testseq() in tests.c    "Dispatch" routine for tests
      |
      V
     (stdtestseq() in tests.c    Runs test several times for each
      |                          data pattern and addressing mode)
      V
     example: march_a() in march/march_marcha.c    Executes one March A test
      |
      +-> marchel_up_rp_wn_wp_wn() in marchel/marchel_rp_wn_wp_wn.c
      |    |
      |    V
      :   marchel_up_template() in marchel_template.c    Template function
      :    |
           +-> prepare_update_adr_up() in update_adr.c   Compiles addressing
           |                                             code
           V
          marchel_rp_wn_wp_wn_ml in marchel/marchel_rp_wn_wp_wn_ml.S


SERVICE FUNCTIONS
=================

The fact that MemMXtest doesn't run under any operating system results in quite severe limitations when compared to `normal' programs. This is most obvious with the Input/Output operations, that are usually taken care of by the OS. This support is totally unavailable to MemMXtest; even the standard BIOS routines (that are in the computer's ROM) can't be used because they have 16-bit code, which doesn't run in the 32-bit mode that MemMXtest uses.

But also complex functions normally provided by a C library can't be used. Among those are for example all functions related to strings (strlen, strchr), memory blocks (memcpy, malloc), mathematics (sin, log), date/time (gettimeofday, alarm) and process/job control (system, fork, exit).

When standard functions are unavailable, they have to be reprogrammed or worked around. In MemMXtest, several service functions are available (in mtest.c and .h).
The most important are:

cprint()         Prints a character string (char*); special characters are not treated specially.
b/h/p64print()   Print 1/4/8 bytes in hexadecimal form, zero-padded.
println()        Prints a newline (which is _not_ printed with cprint!).
memdump()        Dumps specified range of memory, useful for debugging.
delay()          Waits for specified number of seconds (-0/+1) by looking at the hardware (CMOS) clock.

Input-related functions are used only internally in mtest.c. Several other operations (like copying memory blocks) are mostly programmed ad-hoc.

If other, more complex functions are needed, the source code of the GNU C library (http://www.gnu.org/software/libc/libc.html) might be quite useful.


CACHING
=======

In order to detect certain types of faults in memory chips, the order of read and write accesses is important. When testing memory in a computer system, the various caching mechanisms will cluster the read and write operations. It is therefore required that the cache is turned off (or ineffective) during these tests.

In Intel CPUs, there are two internal state bits that control the `global caching' (bits CD and NW of the control register CR0). These bits set the caching mode to either `fill' or `no-fill'. The `fill' mode is the normal operation mode in which the cache is fully functional; in `no-fill' mode the cache remains functional (read/write) for the memory locations it was caching before the mode change, but it will not cache any new memory locations. The `WBINVD' instruction will invalidate ("flush") the cache so that, when in `no-fill' mode, the cache is effectively switched off.

While the `fill'/`no-fill' caching modes affect all memory locations, there are additional mechanisms for disabling caching for certain areas of memory, without affecting other areas.
Intel CPUs up to the Pentium (with or without MMX technology) can only do this in hardware; the system board sets a certain pin of the processor to a certain logic level to indicate that the accessed memory address should not be cached. The P6 family (Pentium Pro, II and III) allow software control of memory region specific caching, but to a limited extent. A number of Memory Type Range Registers (MTRRs) are provided, which each set the caching mode for a particular (fixed or variable) range of memory.

Two problems arise when trying to use MTRRs in application software:

1. The BIOS sets the MTRRs to computer specific values at system boot. There may or may not be MTRRs left for use by application software, depending on system configuration. Additionally, defining an application specific MTRR may interfere with BIOS defined MTRRs, resulting in unspecified (hence unpredictable) behaviour.

2. The MTRRs are in the set of `Model Specific Registers', which means that there is no guarantee for them to be supported in future Intel processors. Application software that uses these MTRRs may well become useless in a few years.

These considerations lead to the conclusion that memory range specific caching settings should not be used in programs that should function on a wide variety of systems. (Note that you can still customize the program for your specific needs to get the highest performance possible. However, this is not advisable.)

MemMXtest therefore restricts its attention to the global caching controls. The main disadvantage of using only this technique is that, when data caching is switched off, code caching will be switched off too, so the testing program will run slowly. However, there are ways to get around this problem.

The caching system is organized as sketched below.

  Processor <--- L1 cache    <--- L2 cache <--- (L3 cache) <--- Main Memory
                 code / data

On all Pentium processors, there is a separate code and data part for the L1 (Level 1) cache.
Starting from the Pentium Pro, the L2 cache is `on board' (i.e. very close to the processor) and runs at full processor speed. The L3 cache, if present, is located at the system board and runs at a slower speed.

Two rules that are of paramount importance for MemMXtest are not mentioned in the Intel documentation:

- If there is free space in L1, the requested code/data is cached in L1 only, and not in L2 or L3.

- Memory locations cannot be cached in L1-data and L1-code at the same time. When a memory location that was in L1-code is read as data, it will be marked as `invalid' in L1-code. This happens even when global caching is set to no-fill mode. (If caching is in fill mode, the location will be re-read from memory and cached in L1-data.)

In MemMXtest a structure is used that, after invalidating (=emptying) the caches, pre-charges them by reading the required regions of data and "pre"-executing the test code for a small test region just below 640kB. The caching priority is as shown in the diagram below.

                          _
  Error reporting area    |  unimportant       data
                          |  (not used often)
  Update_adr code         |                    code
                          |
  Pseudo-random tables    |                    data
  Test code               V  most important    code

This leads to the following structure for calling a test element in MemMXtest.

Cache: ON HALF OFF get "pre"-params copy "pre" upadr code invalidate cache read error report.area call upadr_start call test ("pre") cache to no-fill get real params get real params get real params copy real upadr code copy real upadr code copy real upadr code cache to fill call upadr_start call upadr_start call upadr_start [read PR tables] (read PR tables) cache to no-fill cache to no-fill invalidate cache call test call test (real) call test (cache to fill) cache to fill cache to fill (invalidate cache) invalidate cache invalidate cache end. end. end.

The "pre"-test always uses its own parameters (data patterns etc.) and address updating scheme (also see under "THE ADDRESS UPDATING CODE" below).
The upadr_start call sets up the start and end addresses (for both "pre" and "real" tests), and also executes the address updating code that is used by the test. The Pseudo-Random tables are read only by the Pseudo-Random test elements.

During the "real" test, all needed code and data is available in the caches; the tested memory region however is not in any cache. Aliasing is no problem here, because all caches are at least 2-way set associative (each memory location can be cached in at least two different cache lines), and the L1 and L2 caches also seem to complement each other.

This structure is implemented in the general march element template marchel_ml_template.S; see there for more info.


EXAMPLE CODE
============

Below is a `pseudo-assembler' version of an Up(rp, wn) march element. The notation is a little unusual, but translates easily to a format that is used by the GNU `as' assembler (this assembler has an "opcode source, destination" instruction format, this pseudo code has "opcode source1, source2 -> destination").

  #include "marchel_ml_template.S"

  loop:   movq    [EAX] -> MMin           | Registers:
          movq    MMin -> MMinbak         |  32-bit:
          pcmpeqd MMin, MMp -> MMin       |   EAX: current address
  read    psrlq   MMin, MM16 -> MMin      |   EBX: autonomous increment
   &      movd    MMin -> EDX             |   ECX:
  test    not     EDX -> EDX              |   EDX: temporary values
          test    EDX, EBP                |   ESI: mem-adr start of current
          jnz     fault                   |        autonomous range
                                          |   EDI: proc-adr end of current
  write:  movq    MMp' -> [EAX]           |        autonomous range
                                          |   EBP: 0xFFFFFFFF (AND value)
          add     EBX, EAX -> EAX         |  64-bit MMX:
  next    cmp     EAX, EDI                |   MMp    : p pattern
  addr    jbe     loop                    |   MMp'   : not-p pattern
          call    update_adr              |   MMin   : data read from memory
          jnc     loop                    |   MMinbak: backup of read value
                                          |   MM16   : 16=0x10 (shift amount)
  fault:  store the fault pattern (Addr: EAX, Expect: MMin, Read: MMinbak)
          jmp     write

The marchel_ml_template.S file contains a template that uses the actual test code as a subroutine. The functionality of the template is discussed in the section "CACHING" above.
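
The comparison trick in the `read & test' block of the listing can be mimicked in portable C. This is only an illustrative sketch: the helper names pcmpeqd64() and read_and_test() are made up for this example, and ordinary 64-bit integer arithmetic stands in for the MMX registers. It is not the code that MemMXtest runs.

```c
#include <stdint.h>

/* Emulates pcmpeqd on one 64-bit MMX register: each 32-bit half of the
 * result becomes all 1's if the halves are equal, all 0's otherwise. */
static uint64_t pcmpeqd64(uint64_t a, uint64_t b)
{
    uint64_t lo = ((uint32_t)a == (uint32_t)b) ? 0xFFFFFFFFu : 0u;
    uint64_t hi = ((uint32_t)(a >> 32) == (uint32_t)(b >> 32)) ? 0xFFFFFFFFu : 0u;
    return (hi << 32) | lo;
}

/* Mirrors the instruction sequence pcmpeqd / psrlq 16 / movd / not / test.
 * Returns 0 when the word read from memory matches the expected pattern,
 * and a nonzero value when at least one 32-bit half differs. */
static uint32_t read_and_test(uint64_t read_value, uint64_t expected)
{
    uint64_t mm = pcmpeqd64(read_value, expected); /* pcmpeqd MMin, MMp  */
    mm >>= 16;                                     /* psrlq   MMin, MM16 */
    uint32_t edx = (uint32_t)mm;                   /* movd    MMin -> EDX */
    edx = ~edx;                                    /* not     EDX         */
    return edx & 0xFFFFFFFFu;                      /* test    EDX, EBP    */
}
```

After the 16-bit shift, the low 32 bits hold 16 bits from each half of the comparison result, which is why a single 32-bit test covers the whole 64-bit word.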
Before this routine is called, EAX is loaded with the start address, EBX with the autonomous increment, EDI with the end of the autonomous range (also see below), and of course the correct values are in the MMX registers.

The `read & test' instructions read one 64-bit memory word, that is backed up in case there is an error. The read value is then compared to the expected value on a 32-bit basis. If both values are the same, all bits of the result (MMin) will be 1's. If there are differences, either the left or the right or both 32-bit sub-words of the result will be set to 0's. Since none of the MMX instructions set condition code bits, the result is shifted 16 bits to the right and then transferred to a 32-bit register (EDX). So for EDX it now holds that if there is no error, all bits will be 1's, and if there are errors, either the left or the right or both 16-bit subwords of EDX will be set to 0's. EDX is then inverted which means that all 0's indicate no errors, which is then tested.

The `write' instruction writes the `not-p' pattern.

The `next addr' instructions finally calculate the next address. This is done in two parts: the autonomous increment, and, at the end of the autonomous range, a call to update_adr which calculates the start of the next autonomous range (see below). If update_adr returns with the carry bit set, the end of the test is reached. The branching condition to determine the end of the autonomous range is mutated at the beginning of marchel_ml_template.S to allow only one test routine to be used for both `up' and `down' addressing modes.

Some notes about the optimization of this code for the Pentium processors:

- Intel processors have a `write buffer' that writes data to memory in the background, but still `in order'. Therefore the `write' instruction should be placed immediately after the read instruction (labeled `loop:').

- The `read & test' block has too many data dependencies and can't be rescheduled.
  Register renaming on software level is impossible because there are no free registers.

- The branch prediction mechanism may be `pre-loaded' with correct default values; this is done by the cache-filling "pre"-test.

- All instructions (except read/write) operate on registers only, resulting in shorter machine instructions and faster execution.

- Branch target labels should be aligned on 16-byte boundaries for optimal performance.


THE ADDRESS UPDATING CODE
=========================

The code to update addresses during the tests is probably the conceptually most difficult code in the program. Don't panic if you don't understand it the first time.

Basically the "problem" is that we're dealing with two different kinds of addresses, the processor address and the memory address. See the section "ADDRESS BIT MAPS" above. During a test, the "address situation" is basically as follows:

  1. Processor start and end addresses are passed to the test (the march element, to be exact) from the main program
  2. Start and end are converted from processor to memory addresses
  3. `Current memory address' is memory start address
  4. Convert `current memory address' to `current processor address'
  5. Test location `current processor address'
  6. Increment `current memory address' by 1
  7. Go to 4 (until finished)

So we have to translate addresses both ways. The functions proc2mem() and mem2proc() in update_adr.c do this the C way, and bit-by-bit. Both examine the address bit map from right to left, and use that to decide where the specific memory address bit comes from (proc2mem) or goes to (mem2proc). The first bit that comes from/goes to a bit number >=32 (counting from 0; usually 0xFF) is seen as the end of the map, and subsequent bits come from/go to positions following the largest previously used position.

Using these C functions during the test (step 4) of course has disastrous effects on the speed. Four speedup factors have been implemented:

  a. Machine code for mem2proc during test
  b.
     Converting ranges of bits instead of individual bits
  c. Converting only when necessary
  d. Run-time compilation: fast code and immediate values

ad a. Manually optimized machine code is used (effectively making the mem2proc() C function useless).

ad b. The actual bit mappings show large bit ranges that can be converted at once by shifting them together to their rightful place. If this is enabled (`allowranges'), the address bit map is basically split into ranges with consecutive bit numbers (sometimes only one bit in a range). The memory address is then ANDed with a value that extracts just that range, the result is shifted as appropriate and then ORed to the result.

ad c. Usually there will be a range of consecutive bit mappings at the righthand side of the address bit map (like memory address bits #7 - #0, with mappings 22 - 15, in the fast-x example discussed above). At the start of the test they will be filled with all 0's, then increment by 1 each step (memory address!) until they are all 1's. The next increment by one is the first step that changes anything outside the righthand-side consecutive range (in the fast-x example, the bit with mapping 24). After that increment, the consecutive bits are all 0's again, and so on.

Because the mappings are consecutive, there is also a range of bits in the processor address that exhibits exactly the same behaviour. In the fast-x example, bits #22 - #15 are all 0's at the start of the test, then bit #15 is incremented each step until all bits in the range #22 - #15 are 1's. Then "something happens" which results in bits #22 - #15 being set back to all 0's, and the counting starts again.
Since both the processor address increment (bit #15) and the "temporary" processor start and end addresses (bits #22 - #15 all 0's/1's) can easily be determined in advance, a simple loop can just test the memory range between a given processor start and end address with the given increment, without bothering to convert to or from memory addresses. This is called `autonomous testing' and is fully exploited in MemMXtest.

During the test, the machine code routine update_adr calculates the new processor start and end addresses (procstartautonom and procendautonom) while also keeping track of the current memory start address (memstartadr). The processor address increment (autonominc) does not change during the test, and is calculated in the C functions prepare_update_adr_up() and _dn().

ad d. The machine language implementation of update_adr (with included address conversion) could have been implemented elegantly by using conversion tables, reserved addresses for variables and so on. This would however have made the routine very slow because of extra data memory accesses per instruction and many unnecessary loops with branches.

These problems have been solved by applying a technique of run-time compilation and immediate values. Instead of creating a simple table with the address bit mappings, an "extended" table is created with additional machine code that performs the conversion and also various other updates and checks. This is the update_adr code, which in this way effectively has the needed data handy right in the code itself (`immediate' values for the instructions). This `compilation' is done in the C functions prepare_update_adr_up() and _dn().

As indicated above, the prepare_update_adr_up/dn() functions are called from marchel_up/dn_template(), which are called from each march element's C function. In marchel_up/dn_template(), two distinct update_adr code blocks are generated, one for the cache-filling `pre'-test and the other for `real' use.
Start addresses and lengths of both blocks (pre_adrcode(_len) and real_adrcode(_len)) are passed to the machine language test code via the `param' mechanism. The first part of the machine code then copies these blocks to a special reserved memory region to enable optimal caching and to provide one entry point to the actual test routine (namely update_adr).

For more information regarding the functionality of update_adr, refer to the comments at the beginning of update_adr_ml.S.


IMPLEMENTING A NEW MARCH ELEMENT
================================

To implement a new march element, use the following procedure.

  1. Create a marchel/marchel_*_ml.S file by copying an existing one and adding/removing a few things.
  2. Create a marchel/marchel_*.c file in the same way.
  3. Add the name of the marchel/marchel_*.c file to the CSSRCS variable in the Makefile.
  4. Add the prototype of the C function to tests.h.

The same procedure applies for a new pseudo-random test element; change the file names as appropriate.

Note that _every_ .S file is called *_ml.S and has an accompanying .c file. The one and only exception is the head.S file.

The march element designation (like "rp_wn") always starts with the `positive' pattern, as does the argument list of the C function. The march test then calls the elements with either `normal' ("element(p,n)") or `reversed' ("element(n,p)") arguments.


IMPLEMENTING A NEW MARCH TEST
=============================

A march test does nothing more than calling several march elements. To implement a new one, use the following procedure.

  1. Create a march/march_*.c file by copying an existing one and adding/removing a few things.
  2. Add the name of the march/march_*.c file to the CSRCS variable in the Makefile.
  3. Add the prototype of the C function to tests.h.
  4. In testseq() in tests.c, add a new 1xx test number that calls your new test, and also add your test to the "everything"-sequence (nr. 1000).
  5.
     Add a description of the test to this MANUAL and the CAP-SHEET.

A similar procedure applies for a new pseudo-random test.

As mentioned above, for the first action (read or write) taken by the march elements, always the pattern is used that was passed as first argument to the C function. In the formal definitions of the march tests, the first action of any element can use either a `0' or a `1' (possibilities are `r0', `r1', `w0' and `w1'); in general the first element of a complete test is a `w0' action. In the implementation, that uses `p' and `n' instead of `0' and `1', the convention is that each march test starts with a `wp' action. This leads to the situation that, in most cases, any element starting with a `0'-action in the formal definition has an "element(p,n)" calling scheme, while any element starting with a `1'-action has "element(n,p)". However, some caution is recommended.


IMPLEMENTING A NEW PATTERN GENERATOR
====================================

To implement a new pattern generator, use this procedure.

  1. In pattern_gen.c, create a new function pattern_*() taking one of the existing functions as an example.
  2. At the beginning of pattern_gen.c, add a prototype for the new function.
  3. In pattern_generate64() in pattern_gen.c, add a new number which calls the new pattern generator.
  4. Add a description of the generator to this MANUAL and the CAP-SHEET.

Note that the generator must re-calculate the pattern every call using the patterncount variable; a generator is not supposed to remember anything between calls. (In fact, that may well result in erroneous behaviour.) The generator should set *p and *n with the calculated patterns and return 1; except when patterncount is too high, in which case it should return 0. The program will then reset the pattern counter to 0 and try again.
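
As an illustration of this contract, a generator might look like the sketch below. The name pattern_walking1(), the use of uint64_t for the patterns and the passing of patterncount as a parameter are all assumptions made for this example; the real prototypes in pattern_gen.c may well differ, so take an existing generator there as the actual model.

```c
#include <stdint.h>

/* Hypothetical generator: pattern number k has only bit k set in p, and
 * n is the bitwise inverse of p. Nothing is remembered between calls;
 * the result depends on patterncount alone. */
static int pattern_walking1(unsigned patterncount, uint64_t *p, uint64_t *n)
{
    if (patterncount >= 64)
        return 0;               /* count too high: no pattern produced */
    *p = (uint64_t)1 << patterncount;
    *n = ~*p;
    return 1;                   /* *p and *n are valid */
}
```

With this sketch, counts 0 through 63 each yield one pattern pair, and count 64 returns 0 so that the caller can reset the counter and start over.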