I find this paper by Nightingale (Microsoft Research) quite interesting; it will be presented at EuroSys '11. It analyses crash dumps collected by the Windows Error Reporting system to look at CPU, DRAM and disk failure rates.
Some of the things learnt:
Another interesting piece of work is Google's "Failure Trends in a Large Disk Drive Population" (FAST '07).