[patch 0/9] kdump: Patch series for s390 support

July 04th, 2011 - 01:20 pm ET by Michael Holzheu
This patch series adds kdump support for the s390 architecture (64 bit). A
few common code changes are necessary because the s390 implementation differs
from other architectures in several respects. The common code patches (1-7)
in particular should be reviewed. Patch 8 "s390: kdump backend code" contains
the s390 specific part. Patch 9 includes the necessary changes for the kexec
tool.

In the following I describe the main differences of the s390 implementation:

The s390 kernel is not relocatable; therefore the crashkernel memory is swapped
with the area [0 - crashkernel memory] before the kdump kernel is started.
Architectures other than s390 run the kdump kernel at a memory location that is
disjoint from the standard location for the kernel image and from all memory
that might be in use for I/O by the production system. The main reason for this
seems to be that these architectures do not have a means to clear all ongoing
I/O. If active memory of the production system is reused by the kdump kernel
they run into memory corruption issues. On s390, diagnose call 308 or a boot
(IPL) can stop all ongoing I/O. Therefore we can safely run the kdump kernel at
the old location.
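
The swap described above can be pictured with a small user-space sketch. It assumes two non-overlapping buffers of equal size and exchanges them chunk by chunk through a bounce buffer. The names swap_regions() and SWAP_CHUNK are illustrative and do not come from the patch series; the real code has to operate on real storage after I/O has been stopped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SWAP_CHUNK 4096	/* hypothetical bounce buffer size (one page) */

/*
 * Exchange the contents of two non-overlapping regions of equal size,
 * one chunk at a time, so that only SWAP_CHUNK bytes of scratch space
 * are needed regardless of the region size.
 */
static void swap_regions(unsigned char *low, unsigned char *crash, size_t size)
{
	unsigned char bounce[SWAP_CHUNK];
	uintptr_t l = (uintptr_t)low, c = (uintptr_t)crash;
	size_t off, len;

	/* Overlapping regions would lose data, so refuse them. */
	assert(l + size <= c || c + size <= l);

	for (off = 0; off < size; off += len) {
		len = size - off < SWAP_CHUNK ? size - off : SWAP_CHUNK;
		memcpy(bounce, low + off, len);
		memcpy(low + off, crash + off, len);
		memcpy(crash + off, bounce, len);
	}
}
```

After the call, the former crashkernel contents sit at the low address and the former low-memory contents sit in the crashkernel area.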

On s390 we do not create page tables for the crashkernel memory and use a
memcpy_real() function to load the kdump kernel and ramdisk in the
kexec_load() system call.

On s390 we have external kdump triggers, for example stand-alone dump tools.
The address range information of crashkernel memory is stored at a well-defined
storage location that can be used by the external dump triggers to find the
kdump entry point. To export the address range for the crashkernel memory we
introduce a new mechanism that we call meminfo. It allows checksum-protected
information to be defined in memory that is accessible via an s390 ABI-defined
storage address. The following information is currently stored via meminfo:
* Crashkernel memory range
* kexec segments for kdump
* Pointer to vmcoreinfo note

Checksums for the loaded kexec segments are stored. This can be used to verify
that kdump is not corrupted. The check is done e.g. by the s390 stand-alone
dump tools via meminfo. If kdump has NOT been overwritten, the checksums are
valid and kdump is started, otherwise a full-blown s390 stand-alone dump is
created as backup dump mechanism.
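
The check-then-fall-back decision can be sketched as follows. This is only an illustration under stated assumptions: struct seg_desc, seg_record() and choose_dump_method() are hypothetical names, and the additive checksum stands in for whatever algorithm the real meminfo code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for one loaded kexec segment. */
struct seg_desc {
	const unsigned char *addr;	/* start of the segment in memory */
	size_t len;			/* segment length in bytes */
	uint32_t csum;			/* checksum recorded at load time */
};

enum dump_method { DUMP_KDUMP, DUMP_STANDALONE };

/* Simple additive checksum; an assumption, not the real algorithm. */
static uint32_t seg_checksum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;

	while (len--)
		sum += *p++;
	return sum;
}

/* Record location and checksum when the segment is loaded. */
static void seg_record(struct seg_desc *seg, const unsigned char *addr,
		       size_t len)
{
	assert(seg && addr);
	seg->addr = addr;
	seg->len = len;
	seg->csum = seg_checksum(addr, len);
}

/* Start kdump only if every segment still matches its checksum. */
static enum dump_method choose_dump_method(const struct seg_desc *segs,
					   size_t nr)
{
	size_t i;

	if (nr == 0)
		return DUMP_STANDALONE;	/* no kdump kernel loaded */
	for (i = 0; i < nr; i++)
		if (seg_checksum(segs[i].addr, segs[i].len) != segs[i].csum)
			return DUMP_STANDALONE;	/* segment was overwritten */
	return DUMP_KDUMP;
}
```

A single modified byte in any segment is enough to make the sketch choose the stand-alone dump.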

On s390 the ELF header is created dynamically at kdump startup in the kdump
(2nd) kernel. This is possible, because the memory detection and collection of
the CPU register sets can be done on s390 in the 2nd kernel. Therefore on s390
the ELF header is NOT prepared by the kexec tool. The address for vmcoreinfo
can be found via meminfo and is used by the kdump kernel for ELF header
initialization.
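
To illustrate what creating the ELF header at startup involves, here is a minimal user-space sketch that builds an ELF64 core header with one PT_LOAD entry per detected memory range. It is an illustration, not the code from patch 8: struct mem_chunk and build_core_hdr() are hypothetical names, the PT_NOTE segment for CPU registers and vmcoreinfo is omitted, virtual addresses are assumed to equal physical ones, and fields are written in host byte order (which matches the declared big-endian encoding only on a big-endian host such as s390).

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one detected memory range. */
struct mem_chunk {
	uint64_t addr;
	uint64_t size;
};

/* Returns a malloc'ed header; the caller frees it. */
static void *build_core_hdr(const struct mem_chunk *chunks, uint16_t nr,
			    size_t *hdr_sz)
{
	size_t sz = sizeof(Elf64_Ehdr) + (size_t)nr * sizeof(Elf64_Phdr);
	uint64_t off = sz;	/* memory contents follow the headers */
	unsigned char *buf;
	Elf64_Ehdr *ehdr;
	Elf64_Phdr *phdr;
	uint16_t i;

	assert(chunks && hdr_sz);
	buf = calloc(1, sz);
	if (!buf)
		return NULL;

	ehdr = (Elf64_Ehdr *)buf;
	memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
	ehdr->e_ident[EI_CLASS] = ELFCLASS64;
	ehdr->e_ident[EI_DATA] = ELFDATA2MSB;
	ehdr->e_ident[EI_VERSION] = EV_CURRENT;
	ehdr->e_type = ET_CORE;
	ehdr->e_machine = EM_S390;
	ehdr->e_version = EV_CURRENT;
	ehdr->e_phoff = sizeof(Elf64_Ehdr);
	ehdr->e_ehsize = sizeof(Elf64_Ehdr);
	ehdr->e_phentsize = sizeof(Elf64_Phdr);
	ehdr->e_phnum = nr;

	phdr = (Elf64_Phdr *)(buf + sizeof(Elf64_Ehdr));
	for (i = 0; i < nr; i++) {
		phdr[i].p_type = PT_LOAD;
		phdr[i].p_flags = PF_R | PF_W | PF_X;
		phdr[i].p_offset = off;
		phdr[i].p_paddr = chunks[i].addr;
		phdr[i].p_vaddr = chunks[i].addr;	/* assumed identity */
		phdr[i].p_filesz = chunks[i].size;
		phdr[i].p_memsz = chunks[i].size;
		off += chunks[i].size;
	}
	*hdr_sz = sz;
	return buf;
}
```

Because the header is built from what the 2nd kernel detects, nothing has to be preallocated or kept up to date in the production kernel.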

On s390 no additional kernel parameter is needed for kdump. Everything kdump
needs to know can be determined dynamically when the 2nd kernel starts.

If you agree with the approach of this patch series, how should this go
upstream?

Thanks,

Michael
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

#1 Michael Holzheu
July 04th, 2011 - 01:20 pm ET
From: Michael Holzheu

For s390 we create the ELF header for /proc/vmcore in the second (kdump)
kernel. Currently vmcore gets the ELF header from oldmem using the global
variable "elfcorehdr_addr". This patch introduces a new value
ELFCORE_ADDR_NEWMEM for "elfcorehdr_addr" that indicates that the ELF header
is allocated in the new kernel. In this case a new architecture function
"arch_vmcore_get_elf_hdr()" is called to obtain address and length of the
ELF header.
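
The dispatch on the new value can be tried outside the kernel with a small stand-alone sketch. It assumes GCC or Clang for the weak attribute; parse_oldmem() and parse_headers() are simplified placeholders for the real parsing functions, and only the constant values and the name arch_vmcore_get_elf_hdr() are taken from this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ELFCORE_ADDR_MAX	(-1ULL)
#define ELFCORE_ADDR_NEWMEM	(-3ULL)

/*
 * Weak default: architectures that do not build the ELF header in the
 * new kernel keep this version; others provide a strong definition.
 */
__attribute__((weak)) int arch_vmcore_get_elf_hdr(char **buf, size_t *sz)
{
	(void)buf;
	(void)sz;
	return -EOPNOTSUPP;
}

/* Placeholder for reading the header from old memory. */
static int parse_oldmem(unsigned long long addr)
{
	assert(addr != ELFCORE_ADDR_NEWMEM);
	return 0;
}

/* Pick the header source based on the value of elfcorehdr_addr. */
static int parse_headers(unsigned long long elfcorehdr_addr)
{
	char *buf = NULL;
	size_t sz = 0;

	if (elfcorehdr_addr == ELFCORE_ADDR_NEWMEM)
		return arch_vmcore_get_elf_hdr(&buf, &sz);
	return parse_oldmem(elfcorehdr_addr);
}
```

Without an architecture override, asking for the new-memory path fails with -EOPNOTSUPP, which is why only architectures that implement the hook should set ELFCORE_ADDR_NEWMEM.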

Signed-off-by: Michael Holzheu

fs/proc/vmcore.c | 66 ++++++++++++++++++++++++++++++++++++
include/linux/crash_dump.h | 1
2 files changed, 55 insertions(+), 12 deletions(-)

--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -494,14 +494,10 @@ static void __init set_vmcore_list_offse
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf64_Ehdr *ehdr_ptr;
 	struct vmcore *m;

-	ehdr_ptr = (Elf64_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf64_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf64_Phdr);
+	vmcore_off = elfcorebuf_sz;

 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -514,14 +510,10 @@ static void __init set_vmcore_list_offse
 						struct list_head *vc_list)
 {
 	loff_t vmcore_off;
-	Elf32_Ehdr *ehdr_ptr;
 	struct vmcore *m;

-	ehdr_ptr = (Elf32_Ehdr *)elfptr;
-
 	/* Skip Elf header and program headers. */
-	vmcore_off = sizeof(Elf32_Ehdr) +
-			(ehdr_ptr->e_phnum) * sizeof(Elf32_Phdr);
+	vmcore_off = elfcorebuf_sz;

 	list_for_each_entry(m, vc_list, list) {
 		m->offset = vmcore_off;
@@ -641,7 +633,7 @@ static int __init parse_crash_elf32_head
 	return 0;
 }

-static int __init parse_crash_elf_headers(void)
+static int __init parse_crash_elf_headers_oldmem(void)
 {
 	unsigned char e_ident[EI_NIDENT];
 	u64 addr;
@@ -679,6 +671,53 @@ static int __init parse_crash_elf_header
 	return 0;
 }

+/*
+ * provide an empty default implementation here -- architecture
+ * code may override this
+ */
+int __weak arch_vmcore_get_elf_hdr(char **elfcorebuf, size_t *elfcorebuf_sz)
+{
+	return -EOPNOTSUPP;
+}
+
+static int __init parse_crash_elf_headers_newmem(void)
+{
+	unsigned char e_ident[EI_NIDENT];
+	int rc;
+
+	rc = arch_vmcore_get_elf_hdr(&elfcorebuf, &elfcorebuf_sz);
+	if (rc)
+		return rc;
+	memcpy(e_ident, elfcorebuf, EI_NIDENT);
+	if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
+		printk(KERN_WARNING "Warning: Core image elf header "
+				    "not found");
+		rc = -EINVAL;
+		goto fail;
+	}
+	if (e_ident[EI_CLASS] == ELFCLASS64) {
+		rc = process_ptload_program_headers_elf64(elfcorebuf,
+							  elfcorebuf_sz,
+							  &vmcore_list);
+		if (rc)
+			goto fail;
+		set_vmcore_list_offsets_elf64(elfcorebuf, &vmcore_list);
+		vmcore_size = get_vmcore_size_elf64(elfcorebuf);
+	} else if (e_ident[EI_CLASS] == ELFCLASS32) {
+		rc = process_ptload_program_headers_elf32(elfcorebuf,
+							  elfcorebuf_sz,
+							  &vmcore_list);
+		if (rc)
+			goto fail;
+		set_vmcore_list_offsets_elf32(elfcorebuf, &vmcore_list);
+		vmcore_size = get_vmcore_size_elf32(elfcorebuf);
+	}
+	return 0;
+fail:
+	kfree(elfcorebuf);
+	return rc;
+}
+
 /* Init function for vmcore module. */
 static int __init vmcore_init(void)
 {
@@ -687,7 +726,10 @@ static int __init vmcore_init(void)
 	/* If elfcorehdr= has been passed in cmdline, then capture the dump.*/
 	if (!(is_vmcore_usable()))
 		return rc;
-	rc = parse_crash_elf_headers();
+	if (elfcorehdr_addr == ELFCORE_ADDR_NEWMEM)
+		rc = parse_crash_elf_headers_newmem();
+	else
+		rc = parse_crash_elf_headers_oldmem();
 	if (rc) {
 		printk(KERN_WARNING "Kdump: vmcore not initialized");
 		return rc;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -8,6 +8,7 @@

 #define ELFCORE_ADDR_MAX	(-1ULL)
 #define ELFCORE_ADDR_ERR	(-2ULL)
+#define ELFCORE_ADDR_NEWMEM	(-3ULL)

 extern unsigned long long elfcorehdr_addr;


#2 Michael Holzheu
July 04th, 2011 - 01:20 pm ET
From: Michael Holzheu

On s390 we do not create page tables at all for the crashkernel memory.
This requires an s390-specific version of kimage_load_crash_segment().
Therefore this patch declares this function as "__weak". The s390 version is
very simple: it just copies the kexec segment to real memory without using
page tables:

int kimage_load_crash_segment(struct kimage *image,
			      struct kexec_segment *segment)
{
	return copy_from_user_real((void *) segment->mem, segment->buf,
				   segment->bufsz);
}

There are two main advantages of not creating page tables for the
crashkernel memory:

a) It saves memory. We have scenarios in mind, where crashkernel
memory can be very large and saving page table space is important.
b) We protect the crashkernel memory from being overwritten.

Signed-off-by: Michael Holzheu

kernel/kexec.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -842,8 +842,8 @@ out:
 	return result;
 }

-static int kimage_load_crash_segment(struct kimage *image,
-					struct kexec_segment *segment)
+int __weak kimage_load_crash_segment(struct kimage *image,
+				     struct kexec_segment *segment)
 {
 	/* For crash dumps kernels we simply copy the data from
 	 * user space to it's destination.

#3 Vivek Goyal
July 05th, 2011 - 04:30 pm ET
On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
> This patch series adds kdump support for the s390 architecture (64 bit). A
> few common code changes are necessary because the s390 implementation differs
> from other architectures in several respects. The common code patches (1-7)
> in particular should be reviewed. Patch 8 "s390: kdump backend code" contains
> the s390 specific part. Patch 9 includes the necessary changes for the kexec
> tool.
>
> In the following I describe the main differences of the s390 implementation:
>
> The s390 kernel is not relocatable; therefore the crashkernel memory is swapped
> with the area [0 - crashkernel memory] before the kdump kernel is started.
> Architectures other than s390 run the kdump kernel at a memory location that is
> disjoint from the standard location for the kernel image and from all memory
> that might be in use for I/O by the production system. The main reason for this
> seems to be that these architectures do not have a means to clear all ongoing
> I/O. If active memory of the production system is reused by the kdump kernel
> they run into memory corruption issues. On s390, diagnose call 308 or a boot
> (IPL) can stop all ongoing I/O. Therefore we can safely run the kdump kernel at
> the old location.
>
> On s390 we do not create page tables for the crashkernel memory and use a
> memcpy_real() function to load the kdump kernel and ramdisk in the
> kexec_load() system call.
>
> On s390 we have external kdump triggers, for example stand-alone dump tools.
> The address range information of crashkernel memory is stored at a well-defined
> storage location that can be used by the external dump triggers to find the
> kdump entry point. To export the address range for the crashkernel memory we
> introduce a new mechanism that we call meminfo. It allows checksum-protected
> information to be defined in memory that is accessible via an s390 ABI-defined
> storage address. The following information is currently stored via meminfo:
> * Crashkernel memory range
> * kexec segments for kdump
> * Pointer to vmcoreinfo note



I don't understand what is stand-alone dump tools and why the existing
mechanism of preparing ELF headers to describe all the above info
and just passing the address of header on kernel command line
(crashkernel=) will not work for s390. Introducing an entirely new
infrastructure for communicating the same information does not
sound too exciting.

Thanks
Vivek

#4 Michael Holzheu
July 06th, 2011 - 05:30 am ET
Hello Vivek,

On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:

[snip]

> I don't understand what is stand-alone dump tools and



S390 stand-alone dump tools are independent mini operating systems that
are installed on disks or tapes. When a dump should be created, these
stand-alone dump tools are booted. All that they do is to write the dump
(current memory plus the CPU registers) to the disk/tape device.

The advantage compared to kdump is that since they are freshly loaded
into memory they can't be overwritten in memory. Another advantage is
that since it is different code, it is much less likely that the dump
tool will run into the same problem as the previously crashed kernel.
Also the boot process ensures that the hardware is in an initialized
state. And last but not least, with the stand-alone dump tools you can
dump early kernel problems, which is not possible using kdump, because
you can't dump before the kdump kernel has been loaded with kexec.

Those were more or less the arguments why we did not support kdump in
the past.

In order to increase dump reliability with kdump, we now implemented a
two stage approach. The stand-alone dump tools first check via meminfo,
if kdump is valid using checksums. If kdump is loaded and healthy it is
started. Otherwise the stand-alone dump tools create a full-blown
stand-alone dump.

With this approach we still keep our s390 dump reliability and gain the
great kdump features, e.g. distributor installer support, dump filtering
with makedumpfile, etc.

> why the existing
> mechanism of preparing ELF headers to describe all the above info
> and just passing the address of header on kernel command line
> (crashkernel=) will not work for s390. Introducing an entirely new
> infrastructure for communicating the same information does not
> sound too exciting.



We need the meminfo interface anyway for the two stage approach. The
stand-alone dump tools have to find and verify the kdump kernel in order
to start it. Therefore the interface is there and can be used. Also
creating the ELF header in the 2nd kernel is more flexible and easier
IMHO:
* You do not have to care about memory or CPU hotplug.
* You do not have to preallocate CPU crash notes etc.
* It works independently from the tool/mechanism that loads the kdump
kernel into memory. E.g. we have the idea to load the kdump kernel at
boot time into the crashkernel memory (not via the kexec_load system
call). That would solve the main kdump problems: The kdump kernel can't
be overwritten by I/O and also early kernel problems could then be
dumped using kdump.

What do you think?

Michael

#5 Vivek Goyal
July 07th, 2011 - 03:40 pm ET
On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> Hello Vivek,
>
> On Tue, 2011-07-05 at 16:26 -0400, Vivek Goyal wrote:
> > On Mon, Jul 04, 2011 at 07:09:22PM +0200, Michael Holzheu wrote:
>
> [snip]
>
> > I don't understand what is stand-alone dump tools and
>
> S390 stand-alone dump tools are independent mini operating systems that
> are installed on disks or tapes. When a dump should be created, these
> stand-alone dump tools are booted. All that they do is to write the dump
> (current memory plus the CPU registers) to the disk/tape device.
>
> The advantage compared to kdump is that since they are freshly loaded
> into memory they can't be overwritten in memory.
>
> Another advantage is
> that since it is different code, it is much less likely that the dump
> tool will run into the same problem as the previously crashed kernel.



I think in practice this is not really a problem. If your kernel
is not stable enough to even boot and copy a file, then most likely
it has not even been deployed. The very fact that a kernel has been
up and running verifies that it is a stable kernel for that machine
and is capable of capturing the dump.

> Also the boot process ensures that the hardware is in an initialized
> state.

Who makes sure that the hardware is in an initialized state? The kdump kernel,
the stand-alone kernel, or the BIOS?

> And last but not least, with the stand-alone dump tools you can
> dump early kernel problems, which is not possible using kdump, because
> you can't dump before the kdump kernel has been loaded with kexec.

That is one limitation, but again, if your kernel can't even boot,
it is not ready to ship; it is more of a development issue and
there are other ways to debug such problems. So I would not worry too
much about it.

On a side note, a few months back there were folks who were trying
to enhance bootloaders to be able to prepare a basic environment so
that a kdump kernel can boot even in the event of an early failure
during the first kernel's boot.

> Those were more or less the arguments why we did not support kdump in
> the past.
>
> In order to increase dump reliability with kdump, we now implemented a
> two stage approach. The stand-alone dump tools first check via meminfo,
> if kdump is valid using checksums. If kdump is loaded and healthy it is
> started. Otherwise the stand-alone dump tools create a full-blown
> stand-alone dump.

The kexec-tools purgatory code also checks the checksum of the loaded kernel
and other information, and the next kernel boots only if nothing
has been corrupted in the first kernel. So these additional meminfo structures
and the need for checksums sound unnecessary. I think what you do need is
some way of invoking a second hook (the s390-specific stand-alone kernel)
in case the primary kernel is corrupted.


> With this approach we still keep our s390 dump reliability and gain the
> great kdump features, e.g. distributor installer support, dump filtering
> with makedumpfile, etc.
>
> > why the existing
> > mechanism of preparing ELF headers to describe all the above info
> > and just passing the address of header on kernel command line
> > (crashkernel=) will not work for s390. Introducing an entirely new
> > infrastructure for communicating the same information does not
> > sound too exciting.
>
> We need the meminfo interface anyway for the two stage approach. The
> stand-alone dump tools have to find and verify the kdump kernel in order
> to start it.

kexec-tools does this verification already. We verify the checksum of
all the loaded information in the reserved area. So why introduce this
meminfo interface?

> Therefore the interface is there and can be used. Also
> creating the ELF header in the 2nd kernel is more flexible and easier
> IMHO:
> * You do not have to care about memory or CPU hotplug.

Reloading the kernel upon memory or CPU hotplug should be trivial. This
does not justify moving away from the standard ELF interface and creating
a new one.

> * You do not have to preallocate CPU crash notes etc.

It's a small per-CPU area. It looks like you would create meminfo
areas otherwise.

> * It works independently from the tool/mechanism that loads the kdump
> kernel into memory. E.g. we have the idea to load the kdump kernel at
> boot time into the crashkernel memory (not via the kexec_load system
> call). That would solve the main kdump problems: The kdump kernel can't
> be overwritten by I/O and also early kernel problems could then be
> dumped using kdump.



Can you give more details on how exactly it works? I know very little about
the s390 dump mechanism.

When do you load the kdump kernel, and who does it?

Who gets control first after a crash?

To me it looked like you load the kdump kernel as usual and, if it
is corrupted, you somehow boot the stand-alone kernel. So corruption
of the kdump kernel should not be an issue for you.

Do you load the kdump kernel from some tape/storage after a system crash? Where
does the bootloader reside, and how do you make sure it is not corrupted and
the associated device is in good condition?

In my view we should not create an arch-specific way of passing information
between kernels. The stand-alone kernel should be able to parse the
ELF headers, which contain all the relevant info. They have already
been checksum verified.

Thanks
Vivek