[RFC PATCH 0/4] Add support for LZ4-compressed kernels


Andrew Morton
 

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

It seems worth trying an LZ4-compressed kernel image or ramdisk
to make the kernel boot faster.

...

20 files changed, 663 insertions(+), 3 deletions(-)

...
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
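For reference, the percentages discussed in this thread follow directly from the cover-letter table quoted above; 301ms - 251ms is the 50ms Andrew refers to. A minimal C sketch of that arithmetic (the helper `pct_change` is ours, not part of the patchset):

```c
#include <assert.h>

/*
 * Relative change from `from` to `to`, in percent.
 * Negative means smaller (size) or faster (time).
 */
static double pct_change(double from, double to)
{
	assert(from != 0.0);
	return (to - from) / from * 100.0;
}

/*
 * Cover-letter figures (3.4-based kernel, 13MB uncompressed):
 *   pct_change(6.3, 6.8)  ->  about  +7.9%  (the "8% bigger")
 *   pct_change(301, 251)  ->  about -16.6%  (the 50ms saving)
 *   pct_change(301, 167)  ->  about -44.5%  (unaligned access enabled)
 */
```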


kyungsik.lee <kyungsik.lee@...>
 

On 2013-01-29 7:25 AM, Andrew Morton wrote:
On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

It seems worth trying an LZ4-compressed kernel image or ramdisk
to make the kernel boot faster.

...

20 files changed, 663 insertions(+), 3 deletions(-)

...
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
Yes, exactly. If the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS,
the LZ4 decompressor is expected to give an even larger boot-time improvement.

Currently two architectures support it in mainline: x86 and powerpc.
ARM (v6 and above) is also expected to support it following the commit below.
Commit ID: 5010192d5
ARM: 7583/1: decompressor: Enable unaligned memory access for v6 and above
by Dave Martin

The test result (167ms) comes from an ARM (v7) MSM8960-based board with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.
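Why efficient unaligned access matters to a byte-oriented decompressor such as LZ4: literals and matches start at arbitrary byte offsets, so a decoder that can load and store whole words at unaligned addresses moves several bytes per step instead of one. A minimal sketch, assuming non-overlapping buffers; `copy_bytes` is a hypothetical helper, not code from the patchset, and the `#ifdef` only mimics the way a Kconfig symbol that is set appears to C code as a preprocessor macro:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy n bytes between non-overlapping buffers whose addresses may be
 * unaligned.  Hypothetical helper for illustration only.
 */
static void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
	assert(dst != NULL && src != NULL);
#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	/*
	 * Unaligned word accesses are cheap on this arch: move 4 bytes per
	 * iteration.  A fixed-size memcpy() typically lets the compiler emit
	 * a single (possibly unaligned) 32-bit load/store pair.
	 */
	while (n >= 4) {
		memcpy(dst, src, 4);
		dst += 4;
		src += 4;
		n -= 4;
	}
#endif
	/* Portable path, and the tail of the fast path: one byte at a time. */
	while (n--)
		*dst++ = *src++;
}
```

Both paths produce identical output; only the number of memory operations per byte differs, which is where the 251ms vs 167ms gap in the cover letter would come from.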



It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Not only the kernel but also the ramdisk can be compressed with LZ4, so
boot time would improve further. The test case above didn't include the
decompression time for an LZ4-compressed ramdisk.

So far the implementation targets boot-time improvement for LZ4-compressed
kernel and ramdisk images, but the decompressor module is exported as an
interface for other uses, like LZO.
Once there is an LZ4 compressor (not yet implemented for the kernel), it is
expected to be used in many places in the kernel, such as crypto and
filesystems (squashfs, btrfs).

Thanks,
Kyungsik


Nicolas Pitre <nico@...>
 

On Mon, 28 Jan 2013, Andrew Morton wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

It seems worth trying an LZ4-compressed kernel image or ramdisk
to make the kernel boot faster.

...

20 files changed, 663 insertions(+), 3 deletions(-)

...
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
I'm guessing this is referring to commit 5010192d5a.

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well, we used to have only one compressed format. Now we have nearly
half a dozen, with the same worthiness issue between themselves.
Either we keep it very simple, or we make it very flexible. The former
would argue in favor of removing some of the existing formats, the latter
would let this new format in.


Nicolas


H. Peter Anvin <hpa@...>
 

Uhm... you're saying we have to be at one extreme or the other?

We probably could drop the legacy lzma format, but someone might rely on it.

Nicolas Pitre <nico@...> wrote:

On Mon, 28 Jan 2013, Andrew Morton wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

It seems worth trying an LZ4-compressed kernel image or ramdisk
to make the kernel boot faster.

...

20 files changed, 663 insertions(+), 3 deletions(-)

...
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
I'm guessing this is referring to commit 5010192d5a.

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well, we used to have only one compressed format. Now we have nearly
half a dozen, with the same worthiness issue between themselves.
Either we keep it very simple, or we make it very flexible. The former
would argue in favor of removing some of the existing formats, the latter
would let this new format in.


Nicolas
--
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Richard Cochran <richardcochran@...>
 

On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
In the embedded space, quick boot is a really important feature to
have. Many people resort to awful hacks in order to improve boot time,
and so I would welcome this option.

I have seen ARM systems that boot in 300 ms. I would say that 50 ms is
maybe not such a small improvement after all.

Thanks,
Richard


Russell King - ARM Linux <linux@...>
 

On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)

And if we have a replacement one for one of these, then it should do
exactly that: replace it. I realise that various architectures will
behave differently, so we should really be looking at numbers across
several arches.

Otherwise, where do we stop adding new ones? After we have 6 of these
(which is after this one). After 12? After the 20th?


Egon Alter <egon.alter@...>
 

On Tuesday, 29 January 2013, 10:15:49, Russell King - ARM Linux wrote:
On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
the problem gets more complicated as the "fastest" decompressor usually
creates larger images which need more time to load from the storage, e.g. a
one MB larger image on a 10 MB/s storage (note: bootloaders often configure
the storage controllers in slow modes) gives 100 ms more boot time, thus
eating the gain of a "fast decompressor".
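Egon's point can be stated as a break-even condition: a larger but faster-to-decompress image only wins if the extra load time is smaller than the decompression time it saves. A small C sketch, assuming load time is simply size divided by a constant storage rate (no seek or controller setup costs); both function names are ours:

```c
#include <assert.h>

/* Milliseconds needed to read size_mb megabytes at rate_mb_s MB/s. */
static double load_ms(double size_mb, double rate_mb_s)
{
	assert(rate_mb_s > 0.0);
	return size_mb / rate_mb_s * 1000.0;
}

/*
 * Storage rate (MB/s) at which an image that is extra_mb larger but
 * decompresses saved_ms faster exactly breaks even.  On faster storage
 * the larger image boots sooner; on slower storage it boots later.
 */
static double break_even_rate(double extra_mb, double saved_ms)
{
	assert(saved_ms > 0.0);
	return extra_mb / saved_ms * 1000.0;
}
```

Egon's example, `load_ms(1.0, 10.0)`, gives 100ms. For the cover-letter figures (6.8MB vs 6.3MB, 167ms vs 301ms), `break_even_rate(0.5, 134.0)` is about 3.7MB/s, so under this model LZ4 would only lose on storage slower than that.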

Egon


Russell King - ARM Linux <linux@...>
 

On Tue, Jan 29, 2013 at 12:43:20PM +0100, Egon Alter wrote:
On Tuesday, 29 January 2013, 10:15:49, Russell King - ARM Linux wrote:
On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
the problem gets more complicated as the "fastest" decompressor usually
creates larger images which need more time to load from the storage, e.g. a
one MB larger image on a 10 MB/s storage (note: bootloaders often configure
the storage controllers in slow modes) gives 100 ms more boot time, thus
eating the gain of a "fast decompressor".
Ok.

We already have:

- lzma: 33% smaller than gzip, decompression speed between gzip and bzip2
- xz: 30% smaller than gzip, decompression speed similar to lzma
- bzip2: 10% smaller than gzip, slowest decompression
- gzip: reference implementation
- lzo: 10% bigger than gzip, fastest

And now:

- lz4: 8% bigger than lzo, 16% faster than lzo?
(I make that 16% bigger than gzip)

So, image size wise, on a 2MB compressed gzip image, we're looking at
the difference between LZO at 2.2MB and LZ4 at 2.38MB.

But let's not stop there - the figures given for a 13MB decompressed
image were:

lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

At 10MB/s (your figure), it takes .68s to read 6.8MB as opposed to .63s
for LZO. So, totalling up these figures gives the overall figure:

lzo: 301ms + 630ms = 931ms
lz4: 167ms + 680ms = 797ms

Which gives the tradeoff at 10MB/s of 14% faster (but only with efficient
unaligned memory access.) So... this faster decompressor is still the
fastest even with your media transfer rate factored in.
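The model here is simply decompression time plus image size divided by storage rate. A minimal C sketch of it (`total_ms` is a hypothetical helper, assuming a constant read rate with no seek or setup overhead). Summing the LZ4 figures gives 167ms + 680ms = 847ms rather than 797ms, so the gain at 10MB/s is about 9% rather than 14%; LZ4 still comes out ahead. With the 251ms figure (no efficient unaligned access), the two formats tie at 931ms.

```c
#include <assert.h>

/*
 * Boot-path cost of one image: time to decompress it plus time to read
 * size_mb megabytes from storage at rate_mb_s megabytes per second.
 */
static double total_ms(double decomp_ms, double size_mb, double rate_mb_s)
{
	assert(rate_mb_s > 0.0);
	return decomp_ms + size_mb / rate_mb_s * 1000.0;
}

/*
 * At 10MB/s with the cover-letter figures:
 *   total_ms(301, 6.3, 10)  ->  931ms  (lzo)
 *   total_ms(167, 6.8, 10)  ->  847ms  (lz4, unaligned access enabled)
 *   total_ms(251, 6.8, 10)  ->  931ms  (lz4, without it)
 */
```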

That gives an argument for replacing lzo with lz4...


H. Peter Anvin <hpa@...>
 

On 01/29/2013 02:15 AM, Russell King - ARM Linux wrote:
On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)

And if we have a replacement one for one of these, then it should do
exactly that: replace it. I realise that various architectures will
behave differently, so we should really be looking at numbers across
several arches.

Otherwise, where do we stop adding new ones? After we have 6 of these
(which is after this one). After 12? After the 20th?
The only concern I have with that is if someone paints themselves into a corner and absolutely wants, say, LZO.

Otherwise, per your list it pretty much sounds like we should have lz4, gzip, and xz.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.


Johannes Stezenbach <js@...>
 

On Mon, Jan 28, 2013 at 11:29:14PM -0500, Nicolas Pitre wrote:
On Mon, 28 Jan 2013, Andrew Morton wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
I'm guessing this is referring to commit 5010192d5a.

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well, we used to have only one compressed format. Now we have nearly
half a dozen, with the same worthiness issue between themselves.
Either we keep it very simple, or we make it very flexible. The former
would argue in favor of removing some of the existing formats, the latter
would let this new format in.
This reminded me to check the status of the lzo update and it
seems it got lost?
http://lkml.org/lkml/2012/10/3/144

(Cc: added, I hope Markus still cares and someone could
eventually take his patch once he resends it.)

Johannes


Nicolas Pitre <nico@...>
 

On Tue, 29 Jan 2013, H. Peter Anvin wrote:

On 01/29/2013 02:15 AM, Russell King - ARM Linux wrote:
On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)

And if we have a replacement one for one of these, then it should do
exactly that: replace it. I realise that various architectures will
behave differently, so we should really be looking at numbers across
several arches.

Otherwise, where do we stop adding new ones? After we have 6 of these
(which is after this one). After 12? After the 20th?
The only concern I have with that is if someone paints themselves into a
corner and absolutely wants, say, LZO.
That would be hard to justify given that the kernel provides its own
decompressor code, making the compression format transparent to
bootloaders, etc. And no one should be poking into the compressed
zImage.

Otherwise, per your list it pretty much sounds like we should have lz4, gzip,
and xz.
I do agree with that.


Nicolas


H. Peter Anvin <hpa@...>
 

On 01/30/2013 10:33 AM, Nicolas Pitre wrote:

The only concern I have with that is if someone paints themselves into a
corner and absolutely wants, say, LZO.
That would be hard to justify given that the kernel provides its own
decompressor code, making the compression format transparent to
bootloaders, etc. And no one should be poking into the compressed
zImage.
Some utterly weird things like the Xen domain builder do that, because
they have to. That is why we explicitly document that the payload is
ELF and how to access it in the bzImage spec.
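Tools that look inside the image, such as the domain builder mentioned here, first have to recognise what the payload is. The x86 boot protocol documentation describes identifying the payload by its standard magic numbers, and the kernel's lib/decompress.c picks an initramfs decompressor in a similar way. A minimal C sketch of magic-byte detection follows; `detect_format` and its table are hypothetical, the entries cover only formats discussed in this thread, and LZ4 is omitted because its framing was still under review here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One recognisable leading byte sequence per format. */
struct magic {
	const char *name;
	const unsigned char *bytes;
	size_t len;
};

/*
 * Illustrative table.  String literals are split where a hex escape
 * would otherwise swallow the following character ("\x7f" "ELF").
 */
static const struct magic formats[] = {
	{ "gzip",  (const unsigned char *)"\x1f\x8b", 2 },
	{ "bzip2", (const unsigned char *)"BZh", 3 },
	/* 0x5d is the usual LZMA properties byte, not a true magic. */
	{ "lzma",  (const unsigned char *)"\x5d\x00\x00", 3 },
	{ "xz",    (const unsigned char *)"\xfd" "7zXZ\x00", 6 },
	{ "lzo",   (const unsigned char *)"\x89" "LZO", 4 },
	/* Uncompressed payload. */
	{ "elf",   (const unsigned char *)"\x7f" "ELF", 4 },
};

/* Name of the format whose magic prefixes buf, or NULL if none matches. */
static const char *detect_format(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
		const struct magic *m = &formats[i];
		if (len >= m->len && memcmp(buf, m->bytes, m->len) == 0)
			return m->name;
	}
	return NULL;
}
```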

-hpa


Nicolas Pitre <nico@...>
 

On Thu, 31 Jan 2013, H. Peter Anvin wrote:

On 01/30/2013 10:33 AM, Nicolas Pitre wrote:

The only concern I have with that is if someone paints themselves into a
corner and absolutely wants, say, LZO.
That would be hard to justify given that the kernel provides its own
decompressor code, making the compression format transparent to
bootloaders, etc. And no one should be poking into the compressed
zImage.
Some utterly weird things like the Xen domain builder do that, because
they have to. That is why we explicitly document that the payload is
ELF and how to access it in the bzImage spec.
Are you kidding?

And what format do they expect?

If people are doing weird things with formats we're about to remove then
it's their fault if they didn't make upstream developers aware of it.
And if the reason they didn't tell anyone is because it is too nasty for
public confession then they simply deserve to be broken and come up with
a more sustainable solution.


Nicolas


H. Peter Anvin <hpa@...>
 

On 01/31/2013 02:16 PM, Nicolas Pitre wrote:

Some utterly weird things like the Xen domain builder do that, because
they have to. That is why we explicitly document that the payload is
ELF and how to access it in the bzImage spec.
Are you kidding?

And what format do they expect?
I think they can be fairly flexible. Obviously gzip is always
supported. I don't know the details.

If people are doing weird things with formats we're about to remove then
it's their fault if they didn't make upstream developers aware of it.
And if the reason they didn't tell anyone is because it is too nasty for
public confession then they simply deserve to be broken and come up with
a more sustainable solution.
Well, it is too nasty for public confession, but it's called
"paravirtualization".

-hpa


Nicolas Pitre <nico@...>
 

On Thu, 31 Jan 2013, H. Peter Anvin wrote:

On 01/31/2013 02:16 PM, Nicolas Pitre wrote:

Some utterly weird things like the Xen domain builder do that, because
they have to. That is why we explicitly document that the payload is
ELF and how to access it in the bzImage spec.
Are you kidding?

And what format do they expect?
I think they can be fairly flexible. Obviously gzip is always
supported. I don't know the details.

If people are doing weird things with formats we're about to remove then
it's their fault if they didn't make upstream developers aware of it.
And if the reason they didn't tell anyone is because it is too nasty for
public confession then they simply deserve to be broken and come up with
a more sustainable solution.
Well, it is too nasty for public confession, but it's called
"paravirtualization".
The fact that you are aware of it means we're not going to break them.

But my point is that we must not be held back just in case someone out
there might have painted himself in a corner without telling anyone.


Nicolas


H. Peter Anvin <hpa@...>
 

On 01/31/2013 06:28 PM, Nicolas Pitre wrote:

Well, it is too nasty for public confession, but it's called
"paravirtualization".
The fact that you are aware of it means we're not going to break them.

But my point is that we must not be held back just in case someone out
there might have painted himself in a corner without telling anyone.
Yes. However, it makes it more questionable to simply rip out
compression methods without warning. Not that warnings help, as we have
learned.

-hpa


kyungsik.lee <kyungsik.lee@...>
 

On 2013-01-30 6:09 AM, Rajesh Pawar wrote:
Andrew Morton <akpm@...> wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:
This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

It seems worth trying an LZ4-compressed kernel image or ramdisk
to make the kernel boot faster.

...

20 files changed, 663 insertions(+), 3 deletions(-)

...
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
BTW, what happened to the proposed LZO update? Wouldn't it be better to merge that first?

Also, under the hood LZ4 seems to be quite similar to LZO, so probably
LZO speed would also greatly benefit from unaligned access and some other
ARM optimisations.
I didn't test with the proposed LZO update you mentioned. Sorry, which one do you mean?
I did some tests with the latest LZO in the mainline.

As a result, LZO is not faster with unaligned access enabled on ARM. It is actually slower.

Decompression time: 336ms (383ms with unaligned access enabled)

You may refer to https://lkml.org/lkml/2012/10/7/85 to know more about it.

Thanks,
Kyungsik




kyungsik.lee <kyungsik.lee@...>
 

On 2013-01-29 8:43 PM, Egon Alter wrote:
On Tuesday, 29 January 2013, 10:15:49, Russell King - ARM Linux wrote:
On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
the problem gets more complicated as the "fastest" decompressor usually
creates larger images which need more time to load from the storage, e.g. a
one MB larger image on a 10 MB/s storage (note: bootloaders often configure
the storage controllers in slow modes) gives 100 ms more boot time, thus
eating the gain of a "fast decompressor".
Yes, the larger image could matter. Definitely it takes longer.

Here are some updated test cases: Including "loading time"

                      lzo      lz4
loading time:         480ms    510ms
decompression time:   336ms    180ms (with efficient unaligned memory access enabled and ARM optimization)
total time:           816ms    690ms

lz4 is still 15% faster in total time. This one is similar to the simulated result by Russell King.

Thanks,
Kyungsik


Markus F.X.J. Oberhumer <markus@...>
 

On 2013-02-01 08:00, kyungsik.lee wrote:
On 2013-01-30 6:09 AM, Rajesh Pawar wrote:
Andrew Morton <akpm@...> wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:
[...]
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
BTW, what happened to the proposed LZO update? Wouldn't it be better to merge
that first?

Also, under the hood LZ4 seems to be quite similar to LZO, so probably
LZO speed would also greatly benefit from unaligned access and some other
ARM optimisations.
I didn't test with the proposed LZO update you mentioned. Sorry, which one do
you mean?
I did some tests with the latest LZO in the mainline.
In fact you can easily improve LZO decompression speed on armv7 by almost 50%
by adding just a few lines that enable unaligned access:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

               compression speed   decompression speed

LZO-2005    :      27 MB/sec            84 MB/sec
LZO-2012    :      44 MB/sec           117 MB/sec
LZO-2013-UA :      47 MB/sec           167 MB/sec

Please see my other mail to LKML for details.

Cheers,
Markus

As a result, LZO is not faster with unaligned access enabled on ARM. It is
actually slower.

Decompression time: 336ms (383ms with unaligned access enabled)

You may refer to https://lkml.org/lkml/2012/10/7/85 to know more about it.

Thanks,
Kyungsik


--
Markus Oberhumer, <markus@...>, http://www.oberhumer.com/


Markus F.X.J. Oberhumer <markus@...>
 

On 2013-01-30 11:23, Johannes Stezenbach wrote:
On Mon, Jan 28, 2013 at 11:29:14PM -0500, Nicolas Pitre wrote:
On Mon, 28 Jan 2013, Andrew Morton wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset adds support for LZ4-compressed kernel and initial ramdisk
images on the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm that also features an extremely fast decoder.

The kernel decompression APIs are based on the implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression tools are also available from the site above.

Initial test results on an ARM (v7) based board show that an LZ4-compressed
kernel is 8% bigger than an LZO-compressed one, but decompresses faster
(especially with unaligned memory access enabled).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
I'm guessing this is referring to commit 5010192d5a.

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well, we used to have only one compressed format. Now we have nearly
half a dozen, with the same worthiness issue between themselves.
Either we keep it very simple, or we make it very flexible. The former
would argue in favor of removing some of the existing formats, the latter
would let this new format in.
This reminded me to check the status of the lzo update and it
seems it got lost?
http://lkml.org/lkml/2012/10/3/144
The proposed LZO update currently lives in the linux-next tree.

I had tried several times during the last 12 months to provide an update
of the kernel LZO version, but community interest seemed low and I
basically got no feedback about performance improvements, which made
me wonder if people actually care.

At least akpm did approve the LZO update for inclusion into 3.7, but the code
still has not been merged into the main tree.
> On 2012-10-09 21:26, Andrew Morton wrote:
> [...]
> The changes look OK to me. Please ask Stephen to include the tree in
> linux-next, for a 3.7 merge.

Well, this probably means I have done rather poor marketing. Anyway, as
people seem to love *synthetic* benchmarks I'm finally posting some timings
(including a brand new ARM unaligned version; this is just a quick hack which
can probably be optimized further).

Hopefully publishing these numbers will help arouse more interest. :-)

Cheers,
Markus


x86_64 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

               compression speed   decompression speed

LZO-2005    :     150 MB/sec           468 MB/sec
LZO-2012    :     434 MB/sec          1210 MB/sec

i386 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

               compression speed   decompression speed

LZO-2005    :     143 MB/sec           409 MB/sec
LZO-2012    :     372 MB/sec          1121 MB/sec

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

               compression speed   decompression speed

LZO-2005    :      27 MB/sec            84 MB/sec
LZO-2012    :      44 MB/sec           117 MB/sec
LZO-2013-UA :      47 MB/sec           167 MB/sec

Legend:

LZO-2005 : LZO version in current 3.8 rc6 kernel (which is based on
the LZO 2.02 release from 2005)
LZO-2012 : updated LZO version available in linux-next
LZO-2013-UA : updated LZO version available in linux-next plus
ARM Unaligned Access patch (attached below)


(Cc: added, I hope Markus still cares and someone could
eventually take his patch once he resends it.)

Johannes
--
Markus Oberhumer, <markus@...>, http://www.oberhumer.com/