Date   

Re: [RFC PATCH v2 0/4] Add support for LZ4-compressed kernel

Peter Korsgaard
 

"Nicolas" == Nicolas Pitre <nico@...> writes:
Hi,

>> Did you actually *try* the new LZO version and the patch (which is attached
>> once again) as explained in https://lkml.org/lkml/2013/2/3/367 ?
>>
>> Because the new LZO version is faster than LZ4 in my testing, at least
>> when comparing apples with apples and enabling unaligned access in
>> BOTH versions:
>>
>> armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:
>>
>> compression speed decompression speed
>>
>> LZO-2012 : 44 MB/sec 117 MB/sec no unaligned access
>> LZO-2013-UA : 47 MB/sec 167 MB/sec Unaligned Access
>> LZ4 r88 UA : 46 MB/sec 154 MB/sec Unaligned Access

Nicolas> To be fair, you should also take into account the compressed
Nicolas> size of a typical ARM kernel. Sometimes a slightly slower
Nicolas> decompressor may be faster overall if the compressed image to
Nicolas> work on is smaller.

Yes, but notice that lzo compressed BETTER than lz4 - E.G. from the
introduction mail:

1. ARMv7, 1.5GHz based board
Kernel: linux 3.4
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.7MB 21.1MB/s
LZ4 7.3MB 29.1MB/s, 45.6MB/s(UA)

--
Bye, Peter Korsgaard


Re: [RFC PATCH v2 0/4] Add support for LZ4-compressed kernel

Nicolas Pitre <nico@...>
 

On Tue, 26 Feb 2013, Markus F.X.J. Oberhumer wrote:

On 2013-02-26 07:24, Kyungsik Lee wrote:
Hi,

[...]

Through the benchmark, it was found that -Os Compiler flag for
decompress.o brought better decompression performance in most of cases
(ex, different compiler and hardware spec.) in ARM architecture.

Lastly, CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not always the best
option even though it is supported. The decompression speed can be
slightly slower in some cases.

This patchset is based on 3.8.

Any comments are appreciated.
Did you actually *try* the new LZO version and the patch (which is attached
once again) as explained in https://lkml.org/lkml/2013/2/3/367 ?

Because the new LZO version is faster than LZ4 in my testing, at least
when comparing apples with apples and enabling unaligned access in
BOTH versions:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

compression speed decompression speed

LZO-2012 : 44 MB/sec 117 MB/sec no unaligned access
LZO-2013-UA : 47 MB/sec 167 MB/sec Unaligned Access
LZ4 r88 UA : 46 MB/sec 154 MB/sec Unaligned Access
To be fair, you should also take into account the compressed size of a
typical ARM kernel. Sometimes a slightly slower decompressor may be
faster overall if the compressed image to work on is smaller.


Nicolas


Re: [RFC PATCH v2 0/4] Add support for LZ4-compressed kernel

Markus F.X.J. Oberhumer <markus@...>
 

On 2013-02-26 07:24, Kyungsik Lee wrote:
Hi,

[...]

Through the benchmark, it was found that -Os Compiler flag for
decompress.o brought better decompression performance in most of cases
(ex, different compiler and hardware spec.) in ARM architecture.

Lastly, CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not always the best
option even though it is supported. The decompression speed can be
slightly slower in some cases.

This patchset is based on 3.8.

Any comments are appreciated.
Did you actually *try* the new LZO version and the patch (which is attached
once again) as explained in https://lkml.org/lkml/2013/2/3/367 ?

Because the new LZO version is faster than LZ4 in my testing, at least
when comparing apples with apples and enabling unaligned access in
BOTH versions:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

compression speed decompression speed

LZO-2012 : 44 MB/sec 117 MB/sec no unaligned access
LZO-2013-UA : 47 MB/sec 167 MB/sec Unaligned Access
LZ4 r88 UA : 46 MB/sec 154 MB/sec Unaligned Access

~Markus



Thanks,
Kyungsik


Benchmark Results(PATCH v2)
Compiler: Linaro ARM gcc 4.6.2
1. ARMv7, 1.5GHz based board
Kernel: linux 3.4
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.7MB 21.1MB/s
LZ4 7.3MB 29.1MB/s, 45.6MB/s(UA)
2. ARMv7, 1.7GHz based board
Kernel: linux 3.7
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.0MB 34.1MB/s
LZ4 6.5MB 86.7MB/s
UA: Unaligned memory Access support


Change log: v2
- Clean up code
- Enable unaligned access for ARM v6 and above with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- Add lz4_decompress() for faster decompression with
uncompressed output size
- Use lz4_decompress() for LZ4-compressed kernel during
boot-process
- Apply -Os to decompress.o to improve decompress
performance during boot-up process


Kyungsik Lee (4):
decompressor: Add LZ4 decompressor module
lib: Add support for LZ4-compressed kernel
arm: Add support for LZ4-compressed kernel
x86: Add support for LZ4-compressed kernel

arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/.gitignore | 1 +
arch/arm/boot/compressed/Makefile | 6 +-
arch/arm/boot/compressed/decompress.c | 4 +
arch/arm/boot/compressed/piggy.lz4.S | 6 +
arch/x86/Kconfig | 1 +
arch/x86/boot/compressed/Makefile | 5 +-
arch/x86/boot/compressed/misc.c | 4 +
include/linux/decompress/unlz4.h | 10 +
include/linux/lz4.h | 48 +++++
init/Kconfig | 13 +-
lib/Kconfig | 7 +
lib/Makefile | 2 +
lib/decompress.c | 5 +
lib/decompress_unlz4.c | 190 +++++++++++++++++++
lib/lz4/Makefile | 1 +
lib/lz4/lz4_decompress.c | 331 ++++++++++++++++++++++++++++++++++
lib/lz4/lz4defs.h | 93 ++++++++++
scripts/Makefile.lib | 5 +
usr/Kconfig | 9 +
20 files changed, 739 insertions(+), 3 deletions(-)
create mode 100644 arch/arm/boot/compressed/piggy.lz4.S
create mode 100644 include/linux/decompress/unlz4.h
create mode 100644 include/linux/lz4.h
create mode 100644 lib/decompress_unlz4.c
create mode 100644 lib/lz4/Makefile
create mode 100644 lib/lz4/lz4_decompress.c
create mode 100644 lib/lz4/lz4defs.h

--
Markus Oberhumer, <markus@...>, http://www.oberhumer.com/


Re: [RFC PATCH v2 2/4] lib: Add support for LZ4-compressed kernel

David Sterba <dsterba@...>
 

On Tue, Feb 26, 2013 at 03:24:28PM +0900, Kyungsik Lee wrote:
+config KERNEL_LZ4
+ bool "LZ4"
+ depends on HAVE_KERNEL_LZ4
+ help
+ Its compression ratio is worse than LZO. The size of the kernel
+ is about 8% bigger than LZO. But the decompression speed is
+ faster than LZO.
+
Can you please add a sentence what lz4 actually is before you start
comparing it with the current competitor(s)?

--- /dev/null
+++ b/lib/decompress_unlz4.c
@@ -0,0 +1,190 @@
+#define LZ4_CHUNK_SIZE (8<<20)
Please use a less cryptic way of representing a value of 8 MB. Also,
this is a hardcoded value that must be used on the compressor side. It's
an upper limit, so anything below 8MB does not break decompresssion, but
this must be somehow checked or saved along in the binary stream. You
seem to use the lz4demo.c on the userspace side for compression, but
this is not a standard tool nor the output format is well-defined or
stabilized.

For proper use I would like to see a commandline tool similar to
gzip/bzip2/lzop that can be packaged and shipped by distros, and the
output format defintion.

Yann has some ideas for the format
http://fastcompression.blogspot.cz/2012/04/file-container-format-for-lz4.html

For kernel, the minimum of meta information is total compressed length,
total uncompressed length and chunk size. I don't know if the first two
aren't stored elsewhere in the generic kernel image headers, but chunk
size must be specified.


david


Re: [RFC PATCH v2 1/4] decompressor: Add LZ4 decompressor module

David Sterba <dsterba@...>
 

On Tue, Feb 26, 2013 at 03:24:27PM +0900, Kyungsik Lee wrote:
This patch adds support for LZ4 decompression in the Linux Kernel.
LZ4 Decompression APIs for kernel are based on LZ4 implementation
by Yann Collet.

LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository : http://code.google.com/p/lz4/
What SVN version did you use?

--- /dev/null
+++ b/include/linux/lz4.h
@@ -0,0 +1,48 @@
+#ifndef __LZ4_H__
+#define __LZ4_H__
+/*
+ * LZ4 Kernel Interface
+ *
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee <kyungsik.lee@...>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * LZ4_COMPRESSBOUND()
+ * Provides the maximum size that LZ4 may output in a "worst case" scenario
+ * (input data not compressible)
+ */
+#define LZ4_COMPRESSBOUND(isize) (isize + ((isize)/255) + 16)
For safety reasons I suggest to add a temporary variable to avoid double
evaluation of isize.

--- /dev/null
+++ b/lib/lz4/lz4_decompress.c
+ }
+ cpy = op + length - (STEPSIZE - 4);
+ if (cpy > oend - COPYLENGTH) {
+
+ /* Error: request to write beyond destination buffer */
+ if (cpy > oend)
+ goto _output_error;
+ LZ4_SECURECOPY(ref, op, (oend - COPYLENGTH));
+ while (op < cpy)
+ *op++ = *ref++;
+ op = cpy;
+ /*
+ * Check EOF (should never happen, since last 5 bytes
+ * are supposed to be literals)
+ */
+ if (op == oend)
+ goto _output_error;
+ continue;
+ }
+ LZ4_SECURECOPY(ref, op, cpy);
+ op = cpy; /* correction */
+ }
Does this compile? The } is an extra one, and does not match the
original sources.

+ /* end of decoding */
+ return (int) (((char *)ip) - source);
+
+ /* write overflow error detected */
+_output_error:
+ return (int) (-(((char *)ip) - source));
+}
+
--- /dev/null
+++ b/lib/lz4/lz4defs.h
@@ -0,0 +1,93 @@
+#define LZ4_COPYSTEP(s, d) \
+ do { \
+ PUT8(s, d); \
+ d += 8; \
+ s += 8; \
+ } while (0)
+
+#define LZ4_COPYPACKET(s, d) LZ4_COPYSTEP(s, d)
+
+#define LZ4_SECURECOPY(s, d, e) \
+ do { \
+ if (d < e) { \
+ LZ4_WILDCOPY(s, d, e); \
+ } \
+ } while (0)
+
+#else /* 32-bit */
+#define STEPSIZE 4
+
+#define LZ4_COPYSTEP(s, d) \
+ do { \
+ PUT4(s, d); \
+ d += 4; \
+ s += 4; \
+ } while (0)
+
+#define LZ4_COPYPACKET(s, d) \
+ do { \
+ LZ4_COPYSTEP(s, d); \
+ LZ4_COPYSTEP(s, d); \
+ } while (0)
+
+#define LZ4_SECURECOPY LZ4_WILDCOPY
+#endif
+
+#define LZ4_READ_LITTLEENDIAN_16(d, s, p) \
+ (d = s - get_unaligned_le16(p))
+#define LZ4_WILDCOPY(s, d, e) \
+ do { \
+ LZ4_COPYPACKET(s, d); \
+ } while (d < e)
All the \ at the ends of lines would look better aligned in one column.

david


[RFC PATCH v2 4/4] x86: Add support for LZ4-compressed kernel

Kyungsik Lee <kyungsik.lee@...>
 

This patch integrates the LZ4 decompression code to the x86 pre-boot code.
And it depends on two patchs below

lib: Add support for LZ4-compressed kernel
decompressor: Add LZ4 decompressor module

Signed-off-by: Kyungsik Lee <kyungsik.lee@...>
---
arch/x86/Kconfig | 1 +
arch/x86/boot/compressed/Makefile | 5 ++++-
arch/x86/boot/compressed/misc.c | 4 ++++
3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 225543b..ab916fd 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -63,6 +63,7 @@ config X86
select HAVE_KERNEL_LZMA
select HAVE_KERNEL_XZ
select HAVE_KERNEL_LZO
+ select HAVE_KERNEL_LZ4
select HAVE_HW_BREAKPOINT
select HAVE_MIXED_BREAKPOINTS_REGS
select PERF_EVENTS
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 8a84501..c275db5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -4,7 +4,7 @@
# create a compressed vmlinux image from the original vmlinux
#

-targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo head_$(BITS).o misc.o string.o cmdline.o early_serial_console.o piggy.o
+targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 head_$(BITS).o misc.o string.o cmdline.o early_serial_console.o piggy.o

KBUILD_CFLAGS := -m$(BITS) -D__KERNEL__ $(LINUX_INCLUDE) -O2
KBUILD_CFLAGS += -fno-strict-aliasing -fPIC
@@ -64,12 +64,15 @@ $(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
$(call if_changed,xzkern)
$(obj)/vmlinux.bin.lzo: $(vmlinux.bin.all-y) FORCE
$(call if_changed,lzo)
+$(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
+ $(call if_changed,lz4)

suffix-$(CONFIG_KERNEL_GZIP) := gz
suffix-$(CONFIG_KERNEL_BZIP2) := bz2
suffix-$(CONFIG_KERNEL_LZMA) := lzma
suffix-$(CONFIG_KERNEL_XZ) := xz
suffix-$(CONFIG_KERNEL_LZO) := lzo
+suffix-$(CONFIG_KERNEL_LZ4) := lz4

quiet_cmd_mkpiggy = MKPIGGY $@
cmd_mkpiggy = $(obj)/mkpiggy $< > $@ || ( rm -f $@ ; false )
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 88f7ff6..166a0a8 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -145,6 +145,10 @@ static int lines, cols;
#include "../../../../lib/decompress_unlzo.c"
#endif

+#ifdef CONFIG_KERNEL_LZ4
+#include "../../../../lib/decompress_unlz4.c"
+#endif
+
static void scroll(void)
{
int i;
--
1.8.1.1


[RFC PATCH v2 3/4] arm: Add support for LZ4-compressed kernel

Kyungsik Lee <kyungsik.lee@...>
 

This patch integrates the LZ4 decompression code to the arm pre-boot code.
And it depends on two patchs below

lib: Add support for LZ4-compressed kernel
decompressor: Add LZ4 decompressor module

Signed-off-by: Kyungsik Lee <kyungsik.lee@...>

v2:
- Apply CFLAGS, -Os to decompress.o to improve decompress
performance during boot-up process
---
arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/.gitignore | 1 +
arch/arm/boot/compressed/Makefile | 6 +++++-
arch/arm/boot/compressed/decompress.c | 4 ++++
arch/arm/boot/compressed/piggy.lz4.S | 6 ++++++
5 files changed, 17 insertions(+), 1 deletion(-)
create mode 100644 arch/arm/boot/compressed/piggy.lz4.S

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b8..0f9b363 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -38,6 +38,7 @@ config ARM
select HAVE_IDE if PCI || ISA || PCMCIA
select HAVE_IRQ_WORK
select HAVE_KERNEL_GZIP
+ select HAVE_KERNEL_LZ4
select HAVE_KERNEL_LZMA
select HAVE_KERNEL_LZO
select HAVE_KERNEL_XZ
diff --git a/arch/arm/boot/compressed/.gitignore b/arch/arm/boot/compressed/.gitignore
index f79a08e..47279aa 100644
--- a/arch/arm/boot/compressed/.gitignore
+++ b/arch/arm/boot/compressed/.gitignore
@@ -6,6 +6,7 @@ piggy.gzip
piggy.lzo
piggy.lzma
piggy.xzkern
+piggy.lz4
vmlinux
vmlinux.lds

diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 5cad8a6..2249d52 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -24,6 +24,9 @@ endif
AFLAGS_head.o += -DTEXT_OFFSET=$(TEXT_OFFSET)
HEAD = head.o
OBJS += misc.o decompress.o
+ifeq ($(CONFIG_KERNEL_LZ4),y)
+CFLAGS_decompress.o := -Os
+endif
FONTC = $(srctree)/drivers/video/console/font_acorn_8x8.c

# string library code (-Os is enforced to keep it much smaller)
@@ -88,6 +91,7 @@ suffix_$(CONFIG_KERNEL_GZIP) = gzip
suffix_$(CONFIG_KERNEL_LZO) = lzo
suffix_$(CONFIG_KERNEL_LZMA) = lzma
suffix_$(CONFIG_KERNEL_XZ) = xzkern
+suffix_$(CONFIG_KERNEL_LZ4) = lz4

# Borrowed libfdt files for the ATAG compatibility mode

@@ -112,7 +116,7 @@ targets := vmlinux vmlinux.lds \
font.o font.c head.o misc.o $(OBJS)

# Make sure files are removed during clean
-extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern \
+extra-y += piggy.gzip piggy.lzo piggy.lzma piggy.xzkern piggy.lz4 \
lib1funcs.S ashldi3.S $(libfdt) $(libfdt_hdrs)

ifeq ($(CONFIG_FUNCTION_TRACER),y)
diff --git a/arch/arm/boot/compressed/decompress.c b/arch/arm/boot/compressed/decompress.c
index 9deb56a..a95f071 100644
--- a/arch/arm/boot/compressed/decompress.c
+++ b/arch/arm/boot/compressed/decompress.c
@@ -53,6 +53,10 @@ extern char * strstr(const char * s1, const char *s2);
#include "../../../../lib/decompress_unxz.c"
#endif

+#ifdef CONFIG_KERNEL_LZ4
+#include "../../../../lib/decompress_unlz4.c"
+#endif
+
int do_decompress(u8 *input, int len, u8 *output, void (*error)(char *x))
{
return decompress(input, len, NULL, NULL, output, NULL, error);
diff --git a/arch/arm/boot/compressed/piggy.lz4.S b/arch/arm/boot/compressed/piggy.lz4.S
new file mode 100644
index 0000000..3d9a575
--- /dev/null
+++ b/arch/arm/boot/compressed/piggy.lz4.S
@@ -0,0 +1,6 @@
+ .section .piggydata,#alloc
+ .globl input_data
+input_data:
+ .incbin "arch/arm/boot/compressed/piggy.lz4"
+ .globl input_data_end
+input_data_end:
--
1.8.1.1


[RFC PATCH v2 2/4] lib: Add support for LZ4-compressed kernel

Kyungsik Lee <kyungsik.lee@...>
 

This patch adds support for extracting LZ4-compressed kernel images,
as well as LZ4-compressed ramdisk images in the kernel boot process.

This depends on the patch below
decompressor: Add LZ4 decompressor module

Signed-off-by: Kyungsik Lee <kyungsik.lee@...>

v2:
- Clean up code
- Use lz4_decompress() for LZ4-compressed kernel during boot-process
---
include/linux/decompress/unlz4.h | 10 +++
init/Kconfig | 13 ++-
lib/Kconfig | 7 ++
lib/Makefile | 2 +
lib/decompress.c | 5 ++
lib/decompress_unlz4.c | 190 +++++++++++++++++++++++++++++++++++++++
lib/lz4/Makefile | 1 +
lib/lz4/lz4_decompress.c | 2 +-
scripts/Makefile.lib | 5 ++
usr/Kconfig | 9 ++
10 files changed, 242 insertions(+), 2 deletions(-)
create mode 100644 include/linux/decompress/unlz4.h
create mode 100644 lib/decompress_unlz4.c
create mode 100644 lib/lz4/Makefile

diff --git a/include/linux/decompress/unlz4.h b/include/linux/decompress/unlz4.h
new file mode 100644
index 0000000..d5b68bf
--- /dev/null
+++ b/include/linux/decompress/unlz4.h
@@ -0,0 +1,10 @@
+#ifndef DECOMPRESS_UNLZ4_H
+#define DECOMPRESS_UNLZ4_H
+
+int unlz4(unsigned char *inbuf, int len,
+ int(*fill)(void*, unsigned int),
+ int(*flush)(void*, unsigned int),
+ unsigned char *output,
+ int *pos,
+ void(*error)(char *x));
+#endif
diff --git a/init/Kconfig b/init/Kconfig
index be8b7f5..86dc67c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -133,10 +133,13 @@ config HAVE_KERNEL_XZ
config HAVE_KERNEL_LZO
bool

+config HAVE_KERNEL_LZ4
+ bool
+
choice
prompt "Kernel compression mode"
default KERNEL_GZIP
- depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO
+ depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4
help
The linux kernel is a kind of self-extracting executable.
Several compression algorithms are available, which differ
@@ -203,6 +206,14 @@ config KERNEL_LZO
size is about 10% bigger than gzip; however its speed
(both compression and decompression) is the fastest.

+config KERNEL_LZ4
+ bool "LZ4"
+ depends on HAVE_KERNEL_LZ4
+ help
+ Its compression ratio is worse than LZO. The size of the kernel
+ is about 8% bigger than LZO. But the decompression speed is
+ faster than LZO.
+
endchoice

config DEFAULT_HOSTNAME
diff --git a/lib/Kconfig b/lib/Kconfig
index 75cdb77..b108047 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -189,6 +189,9 @@ config LZO_COMPRESS
config LZO_DECOMPRESS
tristate

+config LZ4_DECOMPRESS
+ tristate
+
source "lib/xz/Kconfig"

#
@@ -213,6 +216,10 @@ config DECOMPRESS_LZO
select LZO_DECOMPRESS
tristate

+config DECOMPRESS_LZ4
+ select LZ4_DECOMPRESS
+ tristate
+
#
# Generic allocator support is selected if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index 02ed6c0..c2073bf 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_REED_SOLOMON) += reed_solomon/
obj-$(CONFIG_BCH) += bch.o
obj-$(CONFIG_LZO_COMPRESS) += lzo/
obj-$(CONFIG_LZO_DECOMPRESS) += lzo/
+obj-$(CONFIG_LZ4_DECOMPRESS) += lz4/
obj-$(CONFIG_XZ_DEC) += xz/
obj-$(CONFIG_RAID6_PQ) += raid6/

@@ -80,6 +81,7 @@ lib-$(CONFIG_DECOMPRESS_BZIP2) += decompress_bunzip2.o
lib-$(CONFIG_DECOMPRESS_LZMA) += decompress_unlzma.o
lib-$(CONFIG_DECOMPRESS_XZ) += decompress_unxz.o
lib-$(CONFIG_DECOMPRESS_LZO) += decompress_unlzo.o
+lib-$(CONFIG_DECOMPRESS_LZ4) += decompress_unlz4.o

obj-$(CONFIG_TEXTSEARCH) += textsearch.o
obj-$(CONFIG_TEXTSEARCH_KMP) += ts_kmp.o
diff --git a/lib/decompress.c b/lib/decompress.c
index 31a8042..c70810e 100644
--- a/lib/decompress.c
+++ b/lib/decompress.c
@@ -11,6 +11,7 @@
#include <linux/decompress/unxz.h>
#include <linux/decompress/inflate.h>
#include <linux/decompress/unlzo.h>
+#include <linux/decompress/unlz4.h>

#include <linux/types.h>
#include <linux/string.h>
@@ -31,6 +32,9 @@
#ifndef CONFIG_DECOMPRESS_LZO
# define unlzo NULL
#endif
+#ifndef CONFIG_DECOMPRESS_LZ4
+# define unlz4 NULL
+#endif

struct compress_format {
unsigned char magic[2];
@@ -45,6 +49,7 @@ static const struct compress_format compressed_formats[] __initdata = {
{ {0x5d, 0x00}, "lzma", unlzma },
{ {0xfd, 0x37}, "xz", unxz },
{ {0x89, 0x4c}, "lzo", unlzo },
+ { {0x02, 0x21}, "lz4", unlz4 },
{ {0, 0}, NULL, NULL }
};

diff --git a/lib/decompress_unlz4.c b/lib/decompress_unlz4.c
new file mode 100644
index 0000000..84346c4
--- /dev/null
+++ b/lib/decompress_unlz4.c
@@ -0,0 +1,190 @@
+/*
+ * Wrapper for decompressing LZ4-compressed kernel, initramfs, and initrd
+ *
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee <kyungsik.lee@...>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifdef STATIC
+#define PREBOOT
+#include "lz4/lz4_decompress.c"
+#else
+#include <linux/decompress/unlz4.h>
+#endif
+#include <linux/types.h>
+#include <linux/lz4.h>
+#include <linux/decompress/mm.h>
+#include <linux/compiler.h>
+
+#include <asm/unaligned.h>
+
+
+#define LZ4_CHUNK_SIZE (8<<20)
+#define ARCHIVE_MAGICNUMBER 0x184C2102
+
+STATIC inline int INIT unlz4(u8 *input, int in_len,
+ int (*fill) (void *, unsigned int),
+ int (*flush) (void *, unsigned int),
+ u8 *output, int *posp,
+ void (*error) (char *x))
+{
+ int ret = -1;
+ size_t chunksize = 0;
+ u8 *inp;
+ u8 *inp_start;
+ u8 *outp;
+ int size = in_len;
+#ifdef PREBOOT
+ size_t out_len = get_unaligned_le32(input + in_len);
+#endif
+ size_t dest_len;
+
+
+ if (output) {
+ outp = output;
+ } else if (!flush) {
+ error("NULL output pointer and no flush function provided");
+ goto exit_0;
+ } else {
+ outp = large_malloc(LZ4_CHUNK_SIZE);
+ if (!outp) {
+ error("Could not allocate output buffer");
+ goto exit_0;
+ }
+ }
+
+ if (input && fill) {
+ error("Both input pointer and fill function provided,");
+ goto exit_1;
+ } else if (input) {
+ inp = input;
+ } else if (!fill) {
+ error("NULL input pointer and missing fill function");
+ goto exit_1;
+ } else {
+ inp = large_malloc(LZ4_COMPRESSBOUND(LZ4_CHUNK_SIZE));
+ if (!inp) {
+ error("Could not allocate input buffer");
+ goto exit_1;
+ }
+ }
+ inp_start = inp;
+
+ if (posp)
+ *posp = 0;
+
+ if (fill)
+ fill(inp, 4);
+
+ chunksize = get_unaligned_le32(inp);
+ if (chunksize == ARCHIVE_MAGICNUMBER) {
+ inp += 4;
+ size -= 4;
+ } else {
+ error("invalid header");
+ goto exit_2;
+ }
+
+ if (posp)
+ *posp += 4;
+
+ for (;;) {
+
+ if (fill)
+ fill(inp, 4);
+
+ chunksize = get_unaligned_le32(inp);
+ if (chunksize == ARCHIVE_MAGICNUMBER) {
+ inp += 4;
+ size -= 4;
+ if (posp)
+ *posp += 4;
+ continue;
+ }
+ inp += 4;
+ size -= 4;
+
+ if (posp)
+ *posp += 4;
+
+ if (fill) {
+ if (chunksize > LZ4_COMPRESSBOUND(LZ4_CHUNK_SIZE)) {
+ error("chunk length is longer than allocated");
+ goto exit_2;
+ }
+ fill(inp, chunksize);
+ }
+#ifdef PREBOOT
+ if (out_len >= LZ4_CHUNK_SIZE) {
+ dest_len = LZ4_CHUNK_SIZE;
+ out_len -= dest_len;
+ } else
+ dest_len = out_len;
+ ret = lz4_decompress(inp, &chunksize, outp, dest_len);
+#else
+ dest_len = LZ4_CHUNK_SIZE;
+ ret = lz4_decompress_unknownoutputsize(inp, chunksize, outp,
+ &dest_len);
+#endif
+ if (ret < 0) {
+ error("Decoding failed");
+ goto exit_2;
+ }
+
+ if (flush && flush(outp, dest_len) != dest_len)
+ goto exit_2;
+ if (output)
+ outp += dest_len;
+ if (posp)
+ *posp += chunksize;
+
+ size -= chunksize;
+
+ if (size == 0)
+ break;
+ else if (size < 0) {
+ error("data corrupted");
+ goto exit_2;
+ }
+
+ inp += chunksize;
+ if (fill)
+ inp = inp_start;
+ }
+
+ ret = 0;
+exit_2:
+ if (!input)
+ large_free(inp_start);
+exit_1:
+ if (!output)
+ large_free(outp);
+exit_0:
+ return ret;
+}
+
+#ifdef PREBOOT
+STATIC int INIT decompress(unsigned char *buf, int in_len,
+ int(*fill)(void*, unsigned int),
+ int(*flush)(void*, unsigned int),
+ unsigned char *output,
+ int *posp,
+ void(*error)(char *x)
+ )
+{
+ return unlz4(buf, in_len - 4, fill, flush, output, posp, error);
+}
+#endif
diff --git a/lib/lz4/Makefile b/lib/lz4/Makefile
new file mode 100644
index 0000000..7f548c6
--- /dev/null
+++ b/lib/lz4/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_LZ4_DECOMPRESS) += lz4_decompress.o
diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
index 1998d7a..f58eaca 100644
--- a/lib/lz4/lz4_decompress.c
+++ b/lib/lz4/lz4_decompress.c
@@ -1,7 +1,7 @@
/*
* LZ4 Decompressor for Linux kernel
*
- * Copyright (C) 2013 LG Electronics Co., Ltd. (http://www.lge.com/)
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee <kyungsik.lee@...>
*
* Based on LZ4 implementation by Yann Collet.
*
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index bdf42fd..9293ca1 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -307,6 +307,11 @@ cmd_lzo = (cat $(filter-out FORCE,$^) | \
lzop -9 && $(call size_append, $(filter-out FORCE,$^))) > $@ || \
(rm -f $@ ; false)

+quiet_cmd_lz4 = LZ4 $@
+cmd_lz4 = (cat $(filter-out FORCE,$^) | \
+ lz4 -c1 stdin stdout && $(call size_append, $(filter-out FORCE,$^))) > $@ || \
+ (rm -f $@ ; false)
+
# U-Boot mkimage
# ---------------------------------------------------------------------------

diff --git a/usr/Kconfig b/usr/Kconfig
index 085872b..642f503 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -90,6 +90,15 @@ config RD_LZO
Support loading of a LZO encoded initial ramdisk or cpio buffer
If unsure, say N.

+config RD_LZ4
+ bool "Support initial ramdisks compressed using LZ4" if EXPERT
+ default !EXPERT
+ depends on BLK_DEV_INITRD
+ select DECOMPRESS_LZ4
+ help
+ Support loading of a LZ4 encoded initial ramdisk or cpio buffer
+ If unsure, say N.
+
choice
prompt "Built-in initramfs compression mode" if INITRAMFS_SOURCE!=""
help
--
1.8.1.1


[RFC PATCH v2 1/4] decompressor: Add LZ4 decompressor module

Kyungsik Lee <kyungsik.lee@...>
 

This patch adds support for LZ4 decompression in the Linux Kernel.
LZ4 Decompression APIs for kernel are based on LZ4 implementation
by Yann Collet.

LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository : http://code.google.com/p/lz4/

Signed-off-by: Kyungsik Lee <kyungsik.lee@...>

v2:
- Clean up code
- Enable unaligned access for ARM v6 and above with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- Add lz4_decompress() for faster decompression with
uncompressed output size
---
include/linux/lz4.h | 48 +++++++
lib/lz4/lz4_decompress.c | 331 +++++++++++++++++++++++++++++++++++++++++++++++
lib/lz4/lz4defs.h | 93 +++++++++++++
3 files changed, 472 insertions(+)
create mode 100644 include/linux/lz4.h
create mode 100644 lib/lz4/lz4_decompress.c
create mode 100644 lib/lz4/lz4defs.h

diff --git a/include/linux/lz4.h b/include/linux/lz4.h
new file mode 100644
index 0000000..66b504c
--- /dev/null
+++ b/include/linux/lz4.h
@@ -0,0 +1,48 @@
+#ifndef __LZ4_H__
+#define __LZ4_H__
+/*
+ * LZ4 Kernel Interface
+ *
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee <kyungsik.lee@...>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * LZ4_COMPRESSBOUND()
+ * Provides the maximum size that LZ4 may output in a "worst case" scenario
+ * (input data not compressible)
+ */
+#define LZ4_COMPRESSBOUND(isize) (isize + ((isize)/255) + 16)
+
+/*
+ * lz4_decompress()
+ * src : source address of the compressed data
+ * src_len : is the input size, whcih is returned after decompress done
+ * dest : output buffer address of the decompressed data
+ * actual_dest_len: is the size of uncompressed data, supposing it's known
+ * return : Success if return 0
+ * Error if return (< 0)
+ * note : Destination buffer must be already allocated.
+ * a bit faster than lz4_decompress_unknownoutputsize()
+ */
+int lz4_decompress(const char *src, size_t *src_len, char *dest,
+ size_t actual_dest_len);
+
+/*
+ * lz4_decompress_unknownoutputsize()
+ * src : source address of the compressed data
+ * src_len : is the input size, therefore the compressed size
+ * dest : output buffer address of the decompressed data
+ * dest_len: is the max size of the destination buffer, which is
+ * returned with actual size of decompressed data after
+ * decompress done
+ * return : Success if return 0
+ * Error if return (< 0)
+ * note : Destination buffer must be already allocated.
+ */
+int lz4_decompress_unknownoutputsize(const char *src, size_t src_len,
+ char *dest, size_t *dest_len);
+#endif
diff --git a/lib/lz4/lz4_decompress.c b/lib/lz4/lz4_decompress.c
new file mode 100644
index 0000000..1998d7a
--- /dev/null
+++ b/lib/lz4/lz4_decompress.c
@@ -0,0 +1,331 @@
+/*
+ * LZ4 Decompressor for Linux kernel
+ *
+ * Copyright (C) 2013 LG Electronics Co., Ltd. (http://www.lge.com/)
+ *
+ * Based on LZ4 implementation by Yann Collet.
+ *
+ * LZ4 - Fast LZ compression algorithm
+ * Copyright (C) 2011-2012, Yann Collet.
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * You can contact the author at :
+ * - LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
+ * - LZ4 source repository : http://code.google.com/p/lz4/
+ */
+
+#ifndef STATIC
+#include <linux/module.h>
+#include <linux/kernel.h>
+#endif
+#include <linux/lz4.h>
+
+#include <asm/unaligned.h>
+
+#include "lz4defs.h"
+
+static int lz4_uncompress(const char *source, char *dest, int osize)
+{
+ const BYTE *ip = (const BYTE *) source;
+ const BYTE *ref;
+
+ BYTE *op = (BYTE *) dest;
+ BYTE * const oend = op + osize;
+ BYTE *cpy;
+
+ unsigned token;
+
+ size_t length;
+ size_t dec32table[] = {0, 3, 2, 3, 0, 0, 0, 0};
+#if LZ4_ARCH64
+ size_t dec64table[] = {0, 0, 0, -1, 0, 1, 2, 3};
+#endif
+
+ while (1) {
+
+ /* get runlength */
+ token = *ip++;
+ length = (token >> ML_BITS);
+ if (length == RUN_MASK) {
+ size_t len;
+
+ /* for (; (len = *ip++) == 255; length += 255){} */
+ len = *ip++;
+ for (; len == 255; length += 255)
+ len = *ip++;
+ length += len;
+ }
+
+ /* copy literals */
+ cpy = op + length;
+ if (unlikely(cpy > oend - COPYLENGTH)) {
+
+ /*
+ * Error: not enough place for another match
+ * (min 4) + 5 literals
+ */
+ if (cpy != oend)
+ goto _output_error;
+
+ memcpy(op, ip, length);
+ ip += length;
+ break; /* EOF */
+ }
+ LZ4_WILDCOPY(ip, op, cpy);
+ ip -= (op - cpy);
+ op = cpy;
+
+ /* get offset */
+ LZ4_READ_LITTLEENDIAN_16(ref, cpy, ip);
+ ip += 2;
+
+ /* Error: offset create reference outside destination buffer */
+ if (unlikely(ref < (BYTE *const)dest))
+ goto _output_error;
+
+ /* get matchlength */
+ length = token & ML_MASK;
+ if (length == ML_MASK) {
+ for (; *ip == 255; length += 255)
+ ip++;
+ length += *ip++;
+ }
+
+ /* copy repeated sequence */
+ if (unlikely((op - ref) < STEPSIZE)) {
+#if LZ4_ARCH64
+ size_t dec64 = dec64table[op - ref];
+#else
+ const int dec64 = 0;
+#endif
+ op[0] = ref[0];
+ op[1] = ref[1];
+ op[2] = ref[2];
+ op[3] = ref[3];
+ op += 4;
+ ref += 4;
+ ref -= dec32table[op-ref];
+ PUT4(ref, op);
+ op += STEPSIZE - 4;
+ ref -= dec64;
+ } else {
+ LZ4_COPYSTEP(ref, op);
+ }
+ cpy = op + length - (STEPSIZE - 4);
+ if (cpy > oend - COPYLENGTH) {
+
+ /* Error: request to write beyond destination buffer */
+ if (cpy > oend)
+ goto _output_error;
+ LZ4_SECURECOPY(ref, op, (oend - COPYLENGTH));
+ while (op < cpy)
+ *op++ = *ref++;
+ op = cpy;
+ /*
+ * Check EOF (should never happen, since last 5 bytes
+ * are supposed to be literals)
+ */
+ if (op == oend)
+ goto _output_error;
+ continue;
+ }
+ LZ4_SECURECOPY(ref, op, cpy);
+ op = cpy; /* correction */
+ }
+ /* end of decoding */
+ return (int) (((char *)ip) - source);
+
+ /* write overflow error detected */
+_output_error:
+ return (int) (-(((char *)ip) - source));
+}
+
+static int lz4_uncompress_unknownoutputsize(const char *source, char *dest,
+ int isize, size_t maxoutputsize)
+{
+ const BYTE *ip = (const BYTE *) source;
+ const BYTE *const iend = ip + isize;
+ const BYTE *ref;
+
+
+ BYTE *op = (BYTE *) dest;
+ BYTE * const oend = op + maxoutputsize;
+ BYTE *cpy;
+
+ size_t dec32table[] = {0, 3, 2, 3, 0, 0, 0, 0};
+#if LZ4_ARCH64
+ size_t dec64table[] = {0, 0, 0, -1, 0, 1, 2, 3};
+#endif
+
+ /* Main Loop */
+ while (ip < iend) {
+
+ unsigned token;
+ size_t length;
+
+ /* get runlength */
+ token = *ip++;
+ length = (token >> ML_BITS);
+ if (length == RUN_MASK) {
+ int s = 255;
+ while ((ip < iend) && (s == 255)) {
+ s = *ip++;
+ length += s;
+ }
+ }
+ /* copy literals */
+ cpy = op + length;
+ if ((cpy > oend - COPYLENGTH) ||
+ (ip + length > iend - COPYLENGTH)) {
+
+ if (cpy > oend)
+ goto _output_error;/* writes beyond buffer */
+
+ if (ip + length != iend)
+ goto _output_error;/*
+ * Error: LZ4 format requires
+ * to consume all input
+ * at this stage
+ */
+ memcpy(op, ip, length);
+ op += length;
+ break;/* Necessarily EOF, due to parsing restrictions */
+ }
+ LZ4_WILDCOPY(ip, op, cpy);
+ ip -= (op - cpy);
+ op = cpy;
+
+ /* get offset */
+ LZ4_READ_LITTLEENDIAN_16(ref, cpy, ip);
+ ip += 2;
+ if (ref < (BYTE * const)dest)
+ goto _output_error;
+ /*
+ * Error : offset creates reference
+ * outside of destination buffer
+ */
+
+ /* get matchlength */
+ length = (token & ML_MASK);
+ if (length == ML_MASK) {
+ while (ip < iend) {
+ int s = *ip++;
+ length += s;
+ if (s == 255)
+ continue;
+ break;
+ }
+ }
+
+ /* copy repeated sequence */
+ if (unlikely(op - ref < STEPSIZE)) {
+#if LZ4_ARCH64
+ size_t dec64 = dec64table[op - ref];
+#else
+ const int dec64 = 0;
+#endif
+ op[0] = ref[0];
+ op[1] = ref[1];
+ op[2] = ref[2];
+ op[3] = ref[3];
+ op += 4;
+ ref += 4;
+ ref -= dec32table[op - ref];
+ PUT4(ref, op);
+ op += STEPSIZE - 4;
+ ref -= dec64;
+ } else {
+ LZ4_COPYSTEP(ref, op);
+ }
+ cpy = op + length - (STEPSIZE-4);
+ if (cpy > oend - COPYLENGTH) {
+ if (cpy > oend)
+ goto _output_error; /* write outside of buf */
+
+ LZ4_SECURECOPY(ref, op, (oend - COPYLENGTH));
+ while (op < cpy)
+ *op++ = *ref++;
+ op = cpy;
+ /*
+ * Check EOF (should never happen, since last 5 bytes
+ * are supposed to be literals)
+ */
+ if (op == oend)
+ goto _output_error;
+ continue;
+ }
+ LZ4_SECURECOPY(ref, op, cpy);
+ op = cpy; /* correction */
+ }
+ /* end of decoding */
+ return (int) (((char *)op) - dest);
+
+ /* write overflow error detected */
+_output_error:
+ return (int) (-(((char *)ip) - source));
+}
+
+int lz4_decompress(const char *src, size_t *src_len, char *dest,
+ size_t actual_dest_len)
+{
+ int ret = -1;
+ int input_len = 0;
+
+ input_len = lz4_uncompress(src, dest, actual_dest_len);
+ if (input_len < 0)
+ goto exit_0;
+ *src_len = input_len;
+
+ return 0;
+exit_0:
+ return ret;
+}
+#ifndef STATIC
+EXPORT_SYMBOL_GPL(lz4_decompress);
+#endif
+
+int lz4_decompress_unknownoutputsize(const char *src, size_t src_len,
+ char *dest, size_t *dest_len)
+{
+ int ret = -1;
+ int out_len = 0;
+
+ out_len = lz4_uncompress_unknownoutputsize(src, dest, src_len,
+ *dest_len);
+ if (out_len < 0)
+ goto exit_0;
+ *dest_len = out_len;
+
+ return 0;
+exit_0:
+ return ret;
+}
+#ifndef STATIC
+EXPORT_SYMBOL_GPL(lz4_decompress_unknownoutputsize);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("LZ4 Decompressor");
+#endif
diff --git a/lib/lz4/lz4defs.h b/lib/lz4/lz4defs.h
new file mode 100644
index 0000000..fde76e6
--- /dev/null
+++ b/lib/lz4/lz4defs.h
@@ -0,0 +1,93 @@
+/*
+ * lz4defs.h -- architecture specific defines
+ *
+ * Copyright (C) 2013, LG Electronics, Kyungsik Lee <kyungsik.lee@...>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Detects 64 bits mode
+ */
+#if (defined(__x86_64__) || defined(__x86_64) || defined(__amd64__) \
+ || defined(__ppc64__) || defined(__LP64__))
+#define LZ4_ARCH64 1
+#else
+#define LZ4_ARCH64 0
+#endif
+
+/*
+ * Architecture-specific macros
+ */
+#define BYTE u8
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) || defined(CONFIG_ARM) \
+ && __LINUX_ARM_ARCH__ >= 6 \
+ && defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+typedef struct _U32_S { u32 v; } U32_S;
+typedef struct _U64_S { u64 v; } U64_S;
+
+#define A32(x) (((U32_S *)(x))->v)
+#define A64(x) (((U64_S *)(x))->v)
+
+#define PUT4(s, d) (A32(d) = A32(s))
+#define PUT8(s, d) (A64(d) = A64(s))
+#else /* CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS */
+
+#define PUT4(s, d) \
+ put_unaligned(get_unaligned((const u32 *) s), (u32 *) d)
+#define PUT8(s, d) \
+ put_unaligned(get_unaligned((const u64 *) s), (u64 *) d)
+#endif
+
+#define COPYLENGTH 8
+#define ML_BITS 4
+#define ML_MASK ((1U << ML_BITS) - 1)
+#define RUN_BITS (8 - ML_BITS)
+#define RUN_MASK ((1U << RUN_BITS) - 1)
+
+#if LZ4_ARCH64/* 64-bit */
+#define STEPSIZE 8
+
+#define LZ4_COPYSTEP(s, d) \
+ do { \
+ PUT8(s, d); \
+ d += 8; \
+ s += 8; \
+ } while (0)
+
+#define LZ4_COPYPACKET(s, d) LZ4_COPYSTEP(s, d)
+
+#define LZ4_SECURECOPY(s, d, e) \
+ do { \
+ if (d < e) { \
+ LZ4_WILDCOPY(s, d, e); \
+ } \
+ } while (0)
+
+#else /* 32-bit */
+#define STEPSIZE 4
+
+#define LZ4_COPYSTEP(s, d) \
+ do { \
+ PUT4(s, d); \
+ d += 4; \
+ s += 4; \
+ } while (0)
+
+#define LZ4_COPYPACKET(s, d) \
+ do { \
+ LZ4_COPYSTEP(s, d); \
+ LZ4_COPYSTEP(s, d); \
+ } while (0)
+
+#define LZ4_SECURECOPY LZ4_WILDCOPY
+#endif
+
+#define LZ4_READ_LITTLEENDIAN_16(d, s, p) \
+ (d = s - get_unaligned_le16(p))
+#define LZ4_WILDCOPY(s, d, e) \
+ do { \
+ LZ4_COPYPACKET(s, d); \
+ } while (d < e)
--
1.8.1.1


[RFC PATCH v2 0/4] Add support for LZ4-compressed kernel

Kyungsik Lee <kyungsik.lee@...>
 

Hi,

First of all, Thank you for the comments and emails from the community.

Here is the second version of support for LZ4-compressed kernel.
In this version, lz4_decompress() has been added. In case of knowing
the uncompressed data size, this function can be used to decompress
more faster.

Through the benchmark, it was found that -Os Compiler flag for
decompress.o brought better decompression performance in most of cases
(ex, different compiler and hardware spec.) in ARM architecture.

Lastly, CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not always the best
option even though it is supported. The decompression speed can be
slightly slower in some cases.

This patchset is based on 3.8.

Any comments are appreciated.

Thanks,
Kyungsik


Benchmark Results(PATCH v2)
Compiler: Linaro ARM gcc 4.6.2
1. ARMv7, 1.5GHz based board
Kernel: linux 3.4
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.7MB 21.1MB/s
LZ4 7.3MB 29.1MB/s, 45.6MB/s(UA)
2. ARMv7, 1.7GHz based board
Kernel: linux 3.7
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.0MB 34.1MB/s
LZ4 6.5MB 86.7MB/s
UA: Unaligned memory Access support


Change log: v2
- Clean up code
- Enable unaligned access for ARM v6 and above with
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
- Add lz4_decompress() for faster decompression with
uncompressed output size
- Use lz4_decompress() for LZ4-compressed kernel during
boot-process
- Apply -Os to decompress.o to improve decompress
performance during boot-up process


Kyungsik Lee (4):
decompressor: Add LZ4 decompressor module
lib: Add support for LZ4-compressed kernel
arm: Add support for LZ4-compressed kernel
x86: Add support for LZ4-compressed kernel

arch/arm/Kconfig | 1 +
arch/arm/boot/compressed/.gitignore | 1 +
arch/arm/boot/compressed/Makefile | 6 +-
arch/arm/boot/compressed/decompress.c | 4 +
arch/arm/boot/compressed/piggy.lz4.S | 6 +
arch/x86/Kconfig | 1 +
arch/x86/boot/compressed/Makefile | 5 +-
arch/x86/boot/compressed/misc.c | 4 +
include/linux/decompress/unlz4.h | 10 +
include/linux/lz4.h | 48 +++++
init/Kconfig | 13 +-
lib/Kconfig | 7 +
lib/Makefile | 2 +
lib/decompress.c | 5 +
lib/decompress_unlz4.c | 190 +++++++++++++++++++
lib/lz4/Makefile | 1 +
lib/lz4/lz4_decompress.c | 331 ++++++++++++++++++++++++++++++++++
lib/lz4/lz4defs.h | 93 ++++++++++
scripts/Makefile.lib | 5 +
usr/Kconfig | 9 +
20 files changed, 739 insertions(+), 3 deletions(-)
create mode 100644 arch/arm/boot/compressed/piggy.lz4.S
create mode 100644 include/linux/decompress/unlz4.h
create mode 100644 include/linux/lz4.h
create mode 100644 lib/decompress_unlz4.c
create mode 100644 lib/lz4/Makefile
create mode 100644 lib/lz4/lz4_decompress.c
create mode 100644 lib/lz4/lz4defs.h

--
1.8.1.1


Re: Final call for ELC registration and showcase proposals

Rob Landley
 

On 02/06/2013 12:08:40 PM, Tim Bird wrote:
Hi everyone,
Embedded Linux Conference is just around the corner -
February 20-22 in San Francisco.
I still can't find a link from the Linux Foundation's CELF website to where I should submit my talk outline. I emailed it to you, but it's still not on
http://events.linuxfoundation.org/events/embedded-linux-conference/slides

Dunno if ascii text qualifies (I never got around to doing pretty slides), but just in case you're interested:

http://landley.net/talks/celf-2013.txt

(Links inline to all the web pages I showed on the screen.)

Just FYI,

Rob


Re: Linux Foundation (CEWG) Japan Jamboree #44

Satoru Ueda <Satoru.Ueda@...>
 

Hi,

Please be reminded that the next Japan Jamboree is scheduled on March 8th
(Japan Time).

http://elinux.org/Japan_Technical_Jamboree_44

If you would like to make presentation remotely, we have Skype access.
To that case, please contact me as soon as possible.


Best,


----- Japanese -----
各位、

次回の日本テクニカルジャンボリーは、既にご案内の通り、来週金曜日、

 3月 8日(金) 午前10時~ 中野サンプラザ

にて開催します。

http://elinux.org/Japan_Technical_Jamboree_44

今月、ELC 2013が開催されます。次のジャンボリーではお土産話がいっぱい
聞けるかと思い楽しみです!

併せて皆さんからの発表の申し込みをお待ちしています。

なお、このイベントは、どなたでも、無料で参加可能です。また、
発表もなるべく多くの方にして頂けるように配慮しております。



上田


(2013/02/05 9:55), Satoru Ueda wrote:

Hi,

The next Japan Technical Jamboree (#44) is scheduled on March 8th,
Friday. It is just 1 month to the date! Please block your schedule!

http://elinux.org/Japan_Technical_Jamboree_44


Best,
S. Ueda

----- JAPANESE -----

各位、

次回の日本テクニカルジャンボリーは、既にご案内の通り、

 3月 8日(金) 午前10時~ 中野サンプラザ

にて開催します。およそ一ヶ月後です。詳細は下記を参照願います。

http://elinux.org/Japan_Technical_Jamboree_44

今月、ELC 2013が開催されます。次のジャンボリーではお土産話がいっぱい
聞けるかと思い楽しみです!

併せて皆さんからの発表の申し込みをお待ちしています。

なお、このイベントは、どなたでも、無料で参加可能です。また、
発表もなるべく多くの方にして頂けるように配慮しております。



上田
--
| TEL: +81-(0)50-3750-3877 FAX: +81-(0)50-3750-6620
| Base System Development dept.
| System and Software Technology Platform, Sony Corp.


Final call for ELC registration and showcase proposals

Tim Bird <tim.bird@...>
 

Hi everyone,

Embedded Linux Conference is just around the corner -
February 20-22 in San Francisco.

We're very excited about our technical content again this year.
We've got keynotes by developers at Google, Intel and SpaceX, talking
about some of the most exciting uses of embedded Linux: self-driving cars,
spacecraft and the challenges of the internet of things.

This is the place to learn more about the industry's leading
initiatives, such as LTSI, the Yocto Project and Open Embedded,
the current status of Linaro's work, and recent advances in flash
filesystems.

Of course there will be detailed presentations on all kinds of
technical areas of kernel and system development, such as:
* graphics and video
* kernel testing, debugging and tools
* board bringup
* board support development and maintenance
* hard and soft realtime
* power management
* security
* memory management
* core libraries (uclibc, eglibc)

Also covered will be diverse topics such as licensing issues
and how to promote your open source project.

We'll have some great "hallway" tracks, including areas for
discussions of specific technical areas - called "Tech Zones".

ELC is the place where you can talk to others with similar
problems and learn how they fixed them. I've saved myself weeks
of development effort with a single nugget of information gleaned
from ELC. You can too!

Please check out the details here:
http://events.linuxfoundation.org/events/embedded-linux-conference

There are still slots open for the technical showcase. If you
have something you want to show others, whether formal or informal,
just bring it to the event! We'll print a poster for you, and you
can get exposure for your project or idea! See
http://elinux.org/ELC_2013_Technical_Showcase to sign up.

I'm looking forward to another great year interacting with
all of you. Please join me in San Francisco in a few weeks.
-- Tim

=============================
Tim Bird
Architecture Group Chair, CE Workgroup of the Linux Foundation
Senior Staff Engineer, Sony Network Entertainment
=============================


Re: [RFC PATCH 0/4] Add support for LZ4-compressed kernels

Johannes Stezenbach <js@...>
 

On Mon, Feb 04, 2013 at 10:50:52AM +0000, Russell King - ARM Linux wrote:
On Mon, Feb 04, 2013 at 03:02:49AM +0100, Markus F.X.J. Oberhumer wrote:
At least akpm did approve the LZO update for inclusion into 3.7, but the code
still has not been merged into the main tree.
> On 2012-10-09 21:26, Andrew Morton wrote:
> [...]
> The changes look OK to me. Please ask Stephen to include the tree in
> linux-next, for a 3.7 merge.

Well, this probably means I have done a rather poor marketing.
I assume this code is sitting in *your* tree? How do you think it gets
into mainline?

There is no automatic way that code from linux-next gets merged into
mainline. That is up to the tree owner to make happen, either by getting
their tree into a parent maintainers tree, or if there is none, asking
Linus to pull your tree at the appropriate time.
My feeling is that in this case it is unneccessarily hard
for an outside contributor to get a patch accepted, all
because get_maintainer.pl doesn't put someone in charge.

Apparently it doesn't work to put all the usual maintainer
responsibilities onto the shoulders of a Linux development novice.
Thus it would be nice if some maintainer would come
forward and offer to handle the patches for Markus.


Thanks,
Johannes


Linux Foundation (CEWG) Japan Jamboree #44

Satoru Ueda <Satoru.Ueda@...>
 

Hi,

The next Japan Technical Jamboree (#44) is scheduled on March 8th,
Friday. It is just 1 month to the date! Please block your schedule!

http://elinux.org/Japan_Technical_Jamboree_44


Best,
S. Ueda

----- JAPANESE -----

各位、

次回の日本テクニカルジャンボリーは、既にご案内の通り、

 3月 8日(金) 午前10時~ 中野サンプラザ

にて開催します。およそ一ヶ月後です。詳細は下記を参照願います。

http://elinux.org/Japan_Technical_Jamboree_44

今月、ELC 2013が開催されます。次のジャンボリーではお土産話がいっぱい
聞けるかと思い楽しみです!

併せて皆さんからの発表の申し込みをお待ちしています。

なお、このイベントは、どなたでも、無料で参加可能です。また、
発表もなるべく多くの方にして頂けるように配慮しております。



上田
--
| TEL: +81-(0)50-3750-3877 FAX: +81-(0)50-3750-6620
| Base System Development dept.
| System and Software Technology Platform, Sony Corp.


Re: [RFC PATCH 0/4] Add support for LZ4-compressed kernels

Russell King - ARM Linux <linux@...>
 

On Mon, Feb 04, 2013 at 03:02:49AM +0100, Markus F.X.J. Oberhumer wrote:
At least akpm did approve the LZO update for inclusion into 3.7, but the code
still has not been merged into the main tree.
> On 2012-10-09 21:26, Andrew Morton wrote:
> [...]
> The changes look OK to me. Please ask Stephen to include the tree in
> linux-next, for a 3.7 merge.

Well, this probably means I have done a rather poor marketing.
I assume this code is sitting in *your* tree? How do you think it gets
into mainline?

There is no automatic way that code from linux-next gets merged into
mainline. That is up to the tree owner to make happen, either by getting
their tree into a parent maintainers tree, or if there is none, asking
Linus to pull your tree at the appropriate time.


Re: [RFC PATCH 0/4] Add support for LZ4-compressed kernels

Markus F.X.J. Oberhumer <markus@...>
 

On 2013-01-30 11:23, Johannes Stezenbach wrote:
On Mon, Jan 28, 2013 at 11:29:14PM -0500, Nicolas Pitre wrote:
On Mon, 28 Jan 2013, Andrew Morton wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:

This patchset is for supporting LZ4 compressed kernel and initial ramdisk on
the x86 and ARM architectures.

According to http://code.google.com/p/lz4/, LZ4 is a very fast lossless
compression algorithm and also features an extremely fast decoder.

Kernel Decompression APIs are based on implementation by Yann Collet
(http://code.google.com/p/lz4/source/checkout).
De/compression Tools are also provided from the site above.

The initial test result on ARM(v7) based board shows that the size of kernel
with LZ4 compressed is 8% bigger than LZO compressed but the decompressing
speed is faster(especially under the enabled unaligned memory access).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
I'm guessing this is referring to commit 5010192d5a.

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well, we used to have only one compressed format. Now we have nearly
half a dozen, with the same worthiness issue between themselves.
Either we keep it very simple, or we make it very flexible. The former
would argue in favor of removing some of the existing formats, the later
would let this new format in.
This reminded me to check the status of the lzo update and it
seems it got lost?
http://lkml.org/lkml/2012/10/3/144
The proposed LZO update currently lives in the linux-next tree.

I had tried several times during the last 12 months to provide an update
of the kernel LZO version, but community interest seemed low and I
basically got no feedback about performance improvements - which made
we wonder if people actually care.

At least akpm did approve the LZO update for inclusion into 3.7, but the code
still has not been merged into the main tree.
> On 2012-10-09 21:26, Andrew Morton wrote:
> [...]
> The changes look OK to me. Please ask Stephen to include the tree in
> linux-next, for a 3.7 merge.

Well, this probably means I have done a rather poor marketing. Anyway, as
people seem to love *synthetic* benchmarks I'm finally posting some timings
(including a brand new ARM unaligned version - this is just a quick hack which
probably still can get optimized further).

Hopefully publishing these numbers will help arousing more interest. :-)

Cheers,
Markus


x86_64 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

compression speed decompression speed

LZO-2005 : 150 MB/sec 468 MB/sec
LZO-2012 : 434 MB/sec 1210 MB/sec

i386 (Sandy Bridge), gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

compression speed decompression speed

LZO-2005 : 143 MB/sec 409 MB/sec
LZO-2012 : 372 MB/sec 1121 MB/sec

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

compression speed decompression speed

LZO-2005 : 27 MB/sec 84 MB/sec
LZO-2012 : 44 MB/sec 117 MB/sec
LZO-2013-UA : 47 MB/sec 167 MB/sec

Legend:

LZO-2005 : LZO version in current 3.8 rc6 kernel (which is based on
the LZO 2.02 release from 2005)
LZO-2012 : updated LZO version available in linux-next
LZO-2013-UA : updated LZO version available in linux-next plus
ARM Unaligned Access patch (attached below)


(Cc: added, I hope Markus still cares and someone could
eventually take his patch once he resends it.)

Johannes
--
Markus Oberhumer, <markus@...>, http://www.oberhumer.com/


Re: [RFC PATCH 0/4] Add support for LZ4-compressed kernels

Markus F.X.J. Oberhumer <markus@...>
 

On 2013-02-01 08:00, kyungsik.lee wrote:
On 2013-01-30 오전 6:09, Rajesh Pawar wrote:
Andrew Morton <akpm@...> wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:
[...]
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
BTW, what happened to the proposed LZO update - woudn't it better to merge
this first?

Also, under the hood LZ4 seems to be quite similar to LZO, so probably
LZO speed would also greatly benefit from unaligned access and some other
ARM optimisations
I didn't test with the proposed LZO update you mentioned. Sorry, which one do
you mean?
I did some tests with the latest LZO in the mainline.
In fact you can easily improve LZO decompression speed on armv7 by almost 50%
by adding just a few lines for enabling unaligend access:

armv7 (Cortex-A9), Linaro gcc-4.6 -O3, Silesia test corpus, 256 kB block-size:

compression speed decompression speed

LZO-2005 : 27 MB/sec 84 MB/sec
LZO-2012 : 44 MB/sec 117 MB/sec
LZO-2013-UA : 47 MB/sec 167 MB/sec

Please see my other mail to LKML for details.

Cheers,
Markus

As a result, LZO is not faster in an unaligned access enabled on ARM. Actually
Slower.

Decompression time: 336ms(383ms, with unaligned access enabled)

You may refer to https://lkml.org/lkml/2012/10/7/85 to know more about it.

Thanks,
Kyungsik


Thanks,
Kyungsik
--
Markus Oberhumer, <markus@...>, http://www.oberhumer.com/


Re: [RFC PATCH 0/4] Add support for LZ4-compressed kernels

kyungsik.lee <kyungsik.lee@...>
 

On 2013-01-29 오후 8:43, Egon Alter wrote:
Am Dienstag, 29. Januar 2013, 10:15:49 schrieb Russell King - ARM Linux:
On Mon, Jan 28, 2013 at 02:25:10PM -0800, Andrew Morton wrote:
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?

It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
Well... when I saw this my immediate reaction was "oh no, yet another
decompressor for the kernel". We have five of these things already.
Do we really need a sixth?

My feeling is that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
the problem gets more complicated as the "fastest" decompressor usually
creates larger images which need more time to load from the storage, e.g. a
one MB larger image on a 10 MB/s storage (note: bootloaders often configure
the storage controllers in slow modes) gives 100 ms more boot time, thus
eating the gain of a "fast decompressor".
Yes, the larger image could matter. Definitely it takes longer.

Here are some updated test cases: Including "loading time"

lzo lz4
loading time: 480ms 510ms
decompression time: 336ms 180ms(with efficient unaligned memory access enabled and ARM optimization)
total time: 816ms 690ms

lz4 is still 15% faster in total time. This one is similar to the simulated result by Russell King.

Thanks,
Kyungsik


Re: [RFC PATCH 0/4] Add support for LZ4-compressed kernels

kyungsik.lee <kyungsik.lee@...>
 

On 2013-01-30 오전 6:09, Rajesh Pawar wrote:
Andrew Morton <akpm@...> wrote:

On Sat, 26 Jan 2013 14:50:43 +0900
Kyungsik Lee <kyungsik.lee@...> wrote:
This patchset is for supporting LZ4 compressed kernel and initial ramdisk on
the x86 and ARM architectures.

According to [[http://code.google.com/p/lz4/,]] LZ4 is a very fast lossless
compression algorithm and also features an extremely fast decoder.

Kernel Decompression APIs are based on implementation by Yann Collet
([[http://code.google.com/p/lz4/source/checkout]]).
De/compression Tools are also provided from the site above.

The initial test result on ARM(v7) based board shows that the size of kernel
with LZ4 compressed is 8% bigger than LZO compressed but the decompressing
speed is faster(especially under the enabled unaligned memory access).

Test: 3.4 based kernel built with many modules
Uncompressed kernel size: 13MB
lzo: 6.3MB, 301ms
lz4: 6.8MB, 251ms(167ms, with enabled unaligned memory access)

It seems that it___s worth trying LZ4 compressed kernel image or ramdisk
for making the kernel boot more faster.

...

20 files changed, 663 insertions(+), 3 deletions(-)

...
What's this "with enabled unaligned memory access" thing? You mean "if
the arch supports CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS"? If so,
that's only x86, which isn't really in the target market for this
patch, yes?
It's a lot of code for a 50ms boot-time improvement. Does anyone have
any opinions on whether or not the benefits are worth the cost?
BTW, what happened to the proposed LZO update - woudn't it better to merge this first?

Also, under the hood LZ4 seems to be quite similar to LZO, so probably
LZO speed would also greatly benefit from unaligned access and some other
ARM optimisations
I didn't test with the proposed LZO update you mentioned. Sorry, which one do you mean?
I did some tests with the latest LZO in the mainline.

As a result, LZO is not faster in an unaligned access enabled on ARM. Actually Slower.

Decompression time: 336ms(383ms, with unaligned access enabled)

You may refer to https://lkml.org/lkml/2012/10/7/85 to know more about it.

Thanks,
Kyungsik


Thanks,
Kyungsik