[Proposal] Ordered swapping for performance on flash.
This outlines some of the current performance issues with swap on flash on Linux.
Readahead is useless.
Small random writes are generally the worst case for flash write speed, and long contiguous writes the best. The current swap algorithm does not take this into account.
This proposal would help almost every embedded device that may at some time need to swap, increasing swap speeds by up to an order of magnitude.
It may even improve performance in some disk-based workloads, but that would be a side-effect.
Tim Bird <tim.bird@...>
Ian Stirling wrote:
http://pages.cs.wisc.edu/~msaxena/FlashVMpaper.html

This looks like interesting work, but I'm not sure exactly
what the proposal is. Is it to develop some new features
for FlashVM? Is it to mainline a current FlashVM implementation
to the Linux kernel?
Does an implementation of Ordered Swapping already exist?
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
Tim Bird wrote:
Ian Stirling wrote:
http://pages.cs.wisc.edu/~msaxena/FlashVMpaper.html
This looks like interesting work, but I'm not sure exactly

As I understand it, the above paper's authors didn't actually implement anything.
Their work was based solely on statistics of fragmented writes in general, and on an analysis of swapping behaviour showing that fragmented writes are indeed common with the current algorithm.
I am unaware of any implementations of this.
The proposal would be to develop a swap-to-flash layer.
I have unfortunately done essentially no kernel coding, so the following might be inaccurate.
One way, probably not the highest-performance one, might be to have a loopback-like layer that runs a simple algorithm like:
Maintain a list of pointers from a virtual block device that the kernel is swapping to, that point to the real flash device.
The real flash device is some fraction larger than the virtual block device.
For example, say the flash device is 6M and the virtual swap device is 4M, with a block size of 4096. That gives 1536 real blocks and 1024 virtual blocks.
The first 1536 blocks written to the virtual swap device get written strictly in order to the real device, regardless of the order or the place on the virtual device that the swap subsystem chooses for them.
After the first 1536 blocks, you have at most 1024 blocks allocated in the 6M device, which are mapped by internal pointers to the virtual swap device.
The remaining blocks (512, once every virtual block has been written) are free: they hold stale copies that have since been overwritten.
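The append-only mapping in this example can be sketched as a small simulation. This is only an illustration in Python, not kernel code: the class `RemapLayer`, its methods, and `demo` are hypothetical names, and the sizes are the 6M / 4M / 4096 figures used above.

```python
import random

BLOCK_SIZE = 4096
REAL_BLOCKS = 6 * 1024 * 1024 // BLOCK_SIZE      # 1536 blocks on the flash device
VIRTUAL_BLOCKS = 4 * 1024 * 1024 // BLOCK_SIZE   # 1024 blocks seen by the swap code


class RemapLayer:
    """Hypothetical sketch: every virtual write lands at the next real block."""

    def __init__(self, virtual_blocks, real_blocks):
        self.virtual_blocks = virtual_blocks
        self.real_blocks = real_blocks
        self.v2r = {}    # virtual block -> real block holding its latest copy
        self.head = 0    # next real block to be written

    def write(self, vblock):
        if not 0 <= vblock < self.virtual_blocks:
            raise ValueError("virtual block out of range")
        if self.head >= self.real_blocks:
            raise RuntimeError("real device full; stale blocks must be reclaimed")
        rblock = self.head
        self.v2r[vblock] = rblock   # any older copy of vblock is now stale
        self.head += 1
        return rblock

    def allocated(self):
        """Real blocks that hold the current copy of some virtual block."""
        return len(self.v2r)

    def stale(self):
        """Real blocks already written whose contents were later overwritten."""
        return self.head - len(self.v2r)


def demo(seed=0):
    """Write all 1024 virtual blocks in shuffled order, then rewrite 512 of them."""
    layer = RemapLayer(VIRTUAL_BLOCKS, REAL_BLOCKS)
    order = list(range(VIRTUAL_BLOCKS))
    random.Random(seed).shuffle(order)
    placed = [layer.write(v) for v in order + order[:512]]
    return layer, placed
```

Whatever order the virtual blocks arrive in, `placed` is simply 0, 1, 2, ... on the real device, and after 1536 writes the layer reports 1024 allocated and 512 stale blocks.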
There are various strategies you might now follow, but basically they boil down to finding allocated blocks in the middle of deallocated (overwritten) ones, and copying them to the new 'head' of where you are writing, updating the virtual->real mapping as you go.
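One of the simplest such strategies can be sketched as follows. It is an assumption chosen for illustration, not something settled in this proposal: the real device is treated as a circular log, and the oldest block is reclaimed first, being copied to the head if it is still allocated. The names `CompactingRemap` and `run_demo` are hypothetical, and the toy sizes (8 virtual, 12 real blocks) just keep the trace small.

```python
class CompactingRemap:
    """Hypothetical sketch: circular log that copies live blocks to the head."""

    def __init__(self, virtual_blocks, real_blocks):
        if real_blocks < virtual_blocks + 2:
            raise ValueError("need at least two spare real blocks")
        self.n = real_blocks
        self.virtual_blocks = virtual_blocks
        self.v2r = {}                       # virtual block -> real block
        self.owner = [None] * real_blocks   # real block -> virtual block, or None
        self.head = 0      # next real block to write
        self.tail = 0      # oldest real block not yet reclaimed
        self.used = 0      # blocks between tail and head, live or stale
        self.copies = 0    # extra writes caused by relocating live blocks

    def _append(self, vblock):
        old = self.v2r.get(vblock)
        if old is not None:
            self.owner[old] = None          # previous copy becomes stale
        self.owner[self.head] = vblock
        self.v2r[vblock] = self.head        # update virtual->real mapping
        self.head = (self.head + 1) % self.n
        self.used += 1

    def _reclaim_tail(self):
        vblock = self.owner[self.tail]
        if vblock is not None:              # still allocated: move it to the head
            self._append(vblock)
            self.copies += 1
        self.tail = (self.tail + 1) % self.n
        self.used -= 1

    def write(self, vblock):
        if not 0 <= vblock < self.virtual_blocks:
            raise ValueError("virtual block out of range")
        # Keep one block in reserve so a relocation always has somewhere to go.
        while self.n - self.used < 2:
            self._reclaim_tail()
        self._append(vblock)


def run_demo():
    """Fill 8 virtual blocks once, then rewrite block 0 five times."""
    layer = CompactingRemap(virtual_blocks=8, real_blocks=12)
    for v in range(8):
        layer.write(v)
    for _ in range(5):
        layer.write(0)
    return layer
```

In this toy run, repeatedly rewriting one hot block eventually forces the cold blocks 1 to 7 to be copied forward. That is the extra IO the scheme pays to keep every write sequential. A less naive strategy would probably prefer regions that are mostly stale.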
This increases IO - but greatly decreases latency.
This could be improved by giving this layer more knowledge of what the swap subsystem is doing - the above only knows about the swap subsystem saying it doesn't want a page anymore by it being overwritten.
This is all somewhat frustrated by the fact that you would often really like to tell the underlying layers what you're doing.
Swapping to flash would be a lot more sane if you didn't have to second-guess wear-leveling and other algorithms.
However, even absent that, writing large amounts of data linearly is always a lot faster than writing fragmented data (at least on all the flash devices I've seen).
It is faster even in the worst case of the above scheme, where writing one block requires reading one block and writing two.
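The worst-case claim can be written down as a simple break-even check. This is a sketch only: the function names are hypothetical, and the costs are arbitrary relative units to be filled in from measurements on a real device, not figures from this thread.

```python
def worst_case_ordered_cost(t_read, t_seq_write):
    """Cost of one logical write when one block must be relocated first:
    read one block, then write two (the relocated block and the new one)."""
    return t_read + 2 * t_seq_write


def ordered_still_faster(t_read, t_seq_write, t_rand_write):
    """True if even the worst case beats a single small random write."""
    return worst_case_ordered_cost(t_read, t_seq_write) < t_rand_write
```

With made-up units in which a read and a sequential write each cost 1, the worst case costs 3, so the ordered scheme still wins whenever a small random write costs more than 3 times a sequential one.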
Apologies for the scrappy nature of this proposal - in the middle of a bout of flu.