With more embedded systems using SSDs in critical applications, designers are now asking the question, “How long will this SSD last in my application?” To help answer this pressing question, it is important to review the recent changes in NAND flash technology.
NAND flash components are the primary storage media in SSDs, and the technology is changing at an exponential rate. The quest for lower cost per bit and smaller form factors is driving NAND flash to smaller process geometries and multiple bits per cell, introducing new challenges for reliability and product longevity.
The reliability concern with NAND flash-based SSDs centers on the limited number of write/erase cycles the devices can sustain. Embedded system OEMs wonder whether an SSD will meet their long-term deployment requirements in 24/7 applications with intensive write/erase usage. With no moving parts, however, SSDs are inherently more reliable than hard drives.
NAND flash must be managed. The SSD controller manages endurance with wear-leveling and other storage management algorithms that distribute write/erase operations to increase endurance at the system level. In addition, SSD controllers reserve a "spare area" in the NAND flash array to manage bad blocks and other flash vulnerabilities. The spare area in a typical SSD is 1 to 2% of capacity, but it can be as high as 50% in applications that require high reliability. This technique, called over-provisioning, provides additional NAND capacity specifically to address reliability issues.
Important to accurately calculating SSD useable life is the concept of write amplification: the number of writes the controller makes to the media for every write command from the host system. Write amplification arises from the fundamental mismatch between erase block sizes and page sizes. For example, the minimum write size for an SSD controller may be a 4 kilobyte (KB) page. Most SSDs must erase before writing, which can require that a whole 256KB erase block be erased and rewritten. The resulting write amplification would be 256:4, or 64:1. The worst-case scenario is writing to the same logical block address over and over, which yields that 64:1 ratio. The best-case scenario is streaming data in file sizes that are integer multiples of the erase block size, for a write amplification of 1:1. In practice, write amplification falls somewhere in between, depending on how the host writes the data; the usage model alone can have a 64× impact on the usable life of the SSD.
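The worst-case and best-case arithmetic above can be sketched in a few lines of Python; the function name is illustrative, and the 256KB block and 4KB page sizes are simply the article's example values:

```python
def write_amplification(write_size_kb, erase_block_kb=256, page_kb=4):
    """Worst-case write amplification for a given host write size.

    A host write smaller than a page still costs at least one full
    page; in the worst case the controller must erase and rewrite a
    whole erase block for every host write.
    """
    effective = max(write_size_kb, page_kb)
    # Worst case: one whole erase block rewritten per host write.
    return erase_block_kb / effective

print(write_amplification(4))    # repeated 4KB writes -> 64.0 (worst case)
print(write_amplification(256))  # erase-block-sized writes -> 1.0 (best case)
```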
Good: Generic Useable Life Methodologies
Many SSD vendors do not adequately classify endurance in terms that are meaningful. Using the write/erase cycles per logical block may be a starting point to compare SSD specifications, but it does not answer the question, “How long will an SSD last?” OEMs want to know the life of an SSD in terms of time — years, months — not “cycles.” Therefore, it is critical to define and measure an application’s usage model to make a real-world determination of an SSD’s lifespan. The following embedded system application examples provide two “worst case” generic methodologies based on 24/7 usage models with a requirement for one year data retention.
Database-Transactional Application: For transactional usage models such as database applications, the lifetime calculation must take input/output operations per second (IOPS) into account. IOPS measurements using an industry-standard benchmark like IOMeter allow the user to define usage model parameters such as file size and the percentage of reads and writes. The write IOPS rating is the output of IOMeter based on the desired file size. The concept of write amplification comes into play because monitoring host writes (the IOPS rating) alone does not capture the writes actually reaching the NAND. Duty cycle must also be considered.
- ER = Endurance Rating: Block level endurance that has been traditionally specified as 100K, 10K, or 5K. Use the value “5” for 5K, “10” for 10K, and so on. Many vendors do not give out this information because the NAND is changing so rapidly. Consequently, users “try out” different values and adjust capacities accordingly.
- GB = Gigabytes of storage
- 33.25 = Constant derived from “endurance rating in thousands of cycles,” “KB-to-GB,” and “seconds-to-years” unit conversion.
- WR = Write IOPS Rating: Number of write input/output per second.
- FS = File Size: The file size at which the IOPS rating is measured.
- WA = Write amplification: Number of writes at the NAND level for each host write. This value is related to usage model, but the worst case, for 100% random writes, is a value of 64. This value is based on the ratio of NAND erase block size to page size. If the file size is larger than the page size, the worst-case write amplification is erase block size divided by file size.
- WDC = Write Duty Cycle: Percentage of write cycles to (read cycles + idle time).
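The article's original equation graphic is not reproduced here, but the parameter definitions above and the derivation of the 33.25 constant imply a lifetime formula of the form:

```latex
\text{Useable Life (years)} = \frac{33.25 \times ER \times GB}{WR \times FS \times WA \times WDC}
```

with ER in thousands of cycles, FS in KB, and WDC expressed as a fraction.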
For example, a voicemail system manufacturer is considering a 32GB SSD to replace an HDD. The drive uses a NAND device rated at 100K endurance with 200 write IOPS for an 8KB file. The vendor does not specify a write amplification factor, so the worst-case value of 32 (256KB block / 8KB file) will be used. The OEM estimates the write duty cycle at 25%, which is a high, conservative estimate. The SSD lifetime would be calculated as:
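Plugging the voicemail numbers (ER = 100, GB = 32, WR = 200 IOPS, FS = 8 KB, WA = 32, WDC = 0.25) into the 33.25-constant formula yields roughly 8.3 years. A minimal Python sketch, with an illustrative function name not taken from the article:

```python
def ssd_lifetime_years(er, gb, wr, fs, wa, wdc):
    """Estimate SSD useable life in years.

    er  -- endurance rating in thousands of cycles (100 for 100K)
    gb  -- capacity in gigabytes
    wr  -- write IOPS rating
    fs  -- file size in KB at which the IOPS rating was measured
    wa  -- write amplification factor
    wdc -- write duty cycle as a fraction (0.25 for 25%)
    """
    # 33.25 folds together cycles-in-thousands (x1,000),
    # GB-to-KB (x1,048,576), and seconds-to-years (/31,536,000).
    return 33.25 * er * gb / (wr * fs * wa * wdc)

# Voicemail example: 32GB drive, 100K endurance, 200 write IOPS
# on 8KB files, worst-case WA = 32, 25% write duty cycle.
years = ssd_lifetime_years(er=100, gb=32, wr=200, fs=8, wa=32, wdc=0.25)
print(round(years, 1))  # -> 8.3
```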
Network Virtualization System: Another application example would be a network virtualization appliance that requires a one-year product deployment based on a usage model of 3,000 write IOPS on a 4KB file at 50% write duty cycle. The SSD vendor has set the endurance rating at 300K with 3 months data retention, but does not specify the write amplification factor.
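With the unspecified write amplification taken at its worst case of 64 (256KB block / 4KB file), the same 33.25-constant formula shows why 32GB falls short of the one-year requirement while 64GB clears it; a quick check in Python (variable names illustrative):

```python
# Network virtualization example: 300K endurance, 3,000 write IOPS
# on 4KB files, 50% write duty cycle; WA assumed worst-case
# 256KB / 4KB = 64 because the vendor does not specify it.
ER, WR, FS, WA, WDC = 300, 3000, 4, 64, 0.5

for gb in (32, 64):
    years = 33.25 * ER * gb / (WR * FS * WA * WDC)
    print(f"{gb}GB -> {years:.2f} years")
# prints:
#   32GB -> 0.83 years
#   64GB -> 1.66 years
```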
The embedded system OEM will see that a 32GB SSD may not do the job, and will realize that a 64GB product provides a good safety margin.
Better: Simplified LifeEST Methodology
Three parameters govern the real useable life of an SSD: SSD technology, capacity, and usage model.
OEMs can use capacity and usage model to determine useable life based on the SSD technology. To that end, SiliconSystems proposes a new metric to measure SSD technology called LifeEST. With LifeEST, SSD technology is measured by specifying the number of “write years per GB” the SSD can achieve and is defined by the following equation:
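The equation itself does not survive in this text; from the definitions that follow, it is presumably:

```latex
\text{LifeEST} = ER \times UCC
```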
Where: ER = NAND Endurance Rating UCC = Unit Conversion Constant
The UCC is calculated by:
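The original expression is missing here, but based on the unit conversions named earlier (endurance rating in thousands of cycles, KB-to-GB, seconds-to-years), the UCC works out to:

```latex
UCC = \frac{1{,}000 \times 1{,}048{,}576\ \text{KB/GB}}{31{,}536{,}000\ \text{s/yr}} \approx 33.25
```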
Using the voicemail application example, the LifeEST calculation would show:
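With the voicemail drive's 100K endurance rating (ER = 100), the result, reconstructed from the definitions above, is:

```latex
\text{LifeEST} = 100 \times 33.25 = 3{,}325 \text{ write-years per GB}
```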
SSD useable life can now be calculated from the following equation:
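The equation is absent from this text; for the LifeEST result to match the earlier worked examples, it must take the form:

```latex
\text{Useable Life (years)} = \frac{\text{LifeEST} \times GB}{WR \times FS \times WA \times WDC}
```

For the voicemail example, 3,325 × 32 / (200 × 8 × 32 × 0.25) ≈ 8.3 years, agreeing with the direct calculation.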
To calculate storage needs for an application, designers have traditionally started by looking at NAND flash device cycles at the block level, then the size of the OS and program files, and finally the amount of data to be collected to determine SSD capacity. This methodology was adequate when NAND component technology was not changing rapidly.
Now, designers need to determine how long their product must be deployed in the field, the application’s usage model, and how to measure and predict it. With this information, they can accurately specify the proper SSD capacity for the required field deployment.
Even with well-modeled applications, calculations are at best theoretical. The most accurate methodology that yields real-world results is to use a tool within the application itself to monitor the exact wear of the NAND flash and report that data back to the host system.
SiliconSystems’ patent-pending SiSMART technology, integrated into all SiliconDrive SSDs, is one such tool. SiSMART monitors the write/erase cycles of each block, as well as the use of spare blocks on the SiliconDrive, and reports usage in real time without powering down the application. It furnishes the host system with the number of spares still available on the drive, allowing the host to manage its resources more effectively. Together, SiSMART technology and LifeEST accurately answer, in real time, the question, “How long will an SSD last?”