[PATCH 0/3] Fix for leapsecond caused hrtimer/futex issue

July 05th, 2012 - 03:20 pm ET by John Stultz
Thomas:
So Prarit's and my testing over the last few days has gone fine,
and it's been quiet otherwise, so I wanted to go ahead and submit this
for inclusion.

As widely reported on the internet, many Linux systems experienced
futex-related load spikes after the leapsecond was inserted (usually
connected to MySQL, Firefox, Thunderbird, Java, etc).

An apparent workaround for this issue is running:
$ date -s "`date`"

Credit: http://www.sheeri.com/content/mysql...pu-and-fix
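The same workaround can be done from C without truncating the clock to whole seconds. `date -s "`date`"` only sets whole seconds, so the clock can step back by up to a second. This is a minimal sketch. It assumes that setting CLOCK_REALTIME goes through the kernel's settimeofday path, which calls clock_was_set() and so refreshes the hrtimer offsets. resync_realtime_clock() is a hypothetical helper name, and the call needs CAP_SYS_TIME.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: re-set CLOCK_REALTIME to its current value,
 * keeping the nanoseconds that `date -s "`date`"` would drop.
 * Requires CAP_SYS_TIME. Returns 0 on success, -errno on failure.
 */
static int resync_realtime_clock(void)
{
	struct timespec now;

	if (clock_gettime(CLOCK_REALTIME, &now) != 0)
		return -errno;
	/* The kernel hands back a normalized timespec. */
	assert(now.tv_nsec >= 0 && now.tv_nsec < 1000000000L);

	/*
	 * Assumption: setting the clock goes through the kernel's
	 * settimeofday path, which calls clock_was_set() and so
	 * refreshes the hrtimer base offsets.
	 */
	if (clock_settime(CLOCK_REALTIME, &now) != 0)
		return -errno;
	return 0;
}
```

A caller would run this once as root and treat -EPERM as missing privileges.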


This issue stemmed from the timekeeping subsystem not notifying
the hrtimer subsystem that the leapsecond occurred. As a result,
CLOCK_REALTIME hrtimers fired one second early, and sub-second
CLOCK_REALTIME hrtimer timeouts fired immediately
(causing the load spikes).
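The following is a simplified, hypothetical model of the failure mode. It is not kernel code. It shows how a CLOCK_REALTIME hrtimer base decides that a timer is due: the base adds a cached REALTIME - MONOTONIC offset to the monotonic time. Inserting the leapsecond steps REALTIME back by one second. If the cached offset is not refreshed, the base's idea of "now" runs one second ahead. The struct and function names, and the nanosecond values in the test, are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Hypothetical model of a CLOCK_REALTIME hrtimer base: it caches
 * offset = REALTIME - MONOTONIC and judges expiry against
 * basenow = monotonic now + cached offset.
 */
struct model_base {
	int64_t offset_ns;
};

/* True if a timer with this absolute CLOCK_REALTIME expiry is due. */
static bool model_timer_due(const struct model_base *base,
			    int64_t mono_now_ns, int64_t real_expiry_ns)
{
	return real_expiry_ns <= mono_now_ns + base->offset_ns;
}

/*
 * A caller reads the true realtime clock and arms a relative timeout.
 * Returns true if that timer would fire without waiting at all.
 */
static bool model_fires_immediately(const struct model_base *base,
				    int64_t mono_now_ns,
				    int64_t true_offset_ns,
				    int64_t timeout_ns)
{
	assert(timeout_ns > 0);
	int64_t real_now = mono_now_ns + true_offset_ns;

	return model_timer_due(base, mono_now_ns, real_now + timeout_ns);
}
```

With a stale offset, a 500ms timeout armed right after the leapsecond is already due, so a caller that re-arms it in a loop spins. A 1.5s timeout instead fires after only 0.5s, one second early.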


To address this issue I'm proposing we do three things:
1) Fix the clock_was_set() call to remove the limitation that kept
us from calling it from update_wall_time().

2) Call clock_was_set() when we add/remove a leapsecond.

3) Change hrtimer_interrupt to update the hrtimer base offset values.
This third item provides additional robustness should the
clock_was_set() notification (done via a timer if we're in_atomic)
be delayed significantly.
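The three steps above can be sketched as a toy state machine. The sketch assumes a single cached offset and models the deferred notification as a pending flag; the mail says the real code uses a timer when in_atomic. All names here are hypothetical. The point is that step 3 closes the window in which the deferred update has not yet run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical model: timekeeping's offset vs. the hrtimer base's copy. */
struct model_clock {
	int64_t tk_real_offset_ns;	/* REALTIME - MONOTONIC, per timekeeping */
	int64_t base_real_offset_ns;	/* copy cached in the hrtimer base */
	bool update_pending;		/* deferred clock_was_set() queued */
};

/* Steps 1 and 2: leapsecond steps REALTIME back 1s, then notify hrtimers. */
static void model_insert_leapsecond(struct model_clock *c, bool in_atomic)
{
	assert(c);
	c->tk_real_offset_ns -= NSEC_PER_SEC;
	if (in_atomic)
		c->update_pending = true;	/* cannot notify here: defer */
	else
		c->base_real_offset_ns = c->tk_real_offset_ns;
}

/* The deferred notification finally runs and refreshes the base. */
static void model_run_deferred(struct model_clock *c)
{
	if (c->update_pending) {
		c->base_real_offset_ns = c->tk_real_offset_ns;
		c->update_pending = false;
	}
}

/* Step 3: every hrtimer interrupt refreshes the cached offset first. */
static void model_timer_interrupt(struct model_clock *c)
{
	c->base_real_offset_ns = c->tk_real_offset_ns;
}
```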


NOTE: Some reports describe a hard hang right at or before
the leapsecond. I've not been able to reproduce or diagnose
this, so this fix likely does not address the reported hard
hangs (unless they end up being connected to the futex/hrtimer
issue). Please email lkml and me if you experienced this.

Big thanks to Prarit for shaking out a few issues in the earlier
version of this patch set, as well as the extra effort testing over
the Holiday!

Also, I've already got backports generated for -stable that I'm
testing, and I'll submit them once I have upstream commit ids for
these patches.

thanks
-john

CC: Prarit Bhargava <prarit@redhat.com>
CC: stable@vger.kernel.org
CC: Thomas Gleixner <tglx@linutronix.de>


John Stultz (3):
hrtimer: Fix clock_was_set so it is safe to call from irq context
time: Fix leapsecond triggered hrtimer/futex load spike issue
hrtimer: Update hrtimer base offsets each hrtimer_interrupt

include/linux/hrtimer.h | 3 +++
kernel/hrtimer.c | 31 +++++++++++++++++++++++++++----
kernel/time/timekeeping.c | 38 ++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+), 4 deletions(-)

-- 
1.7.9.5

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Replies

#1 John Stultz
July 05th, 2012 - 03:20 pm ET
This patch introduces a new function which captures the
CLOCK_MONOTONIC time, along with the CLOCK_REALTIME and
CLOCK_BOOTTIME offsets at the same moment. This new function
is then used in place of ktime_get() when hrtimer_interrupt()
is expiring timers.
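The following is a userspace sketch of the same consistent-snapshot idea. It uses C11 atomics in place of the kernel seqlock and plain nanosecond counters in place of struct timespec. It leaves out the clocksource delta from timekeeping_get_ns() and arch_gettimeoffset(). The reader retries until the sequence count is even and unchanged. That way it never mixes a pre-leapsecond xtime with a post-leapsecond wall_to_monotonic. All identifiers are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical model of the timekeeper fields the new helper reads. */
struct model_tk {
	atomic_uint seq;		/* even: stable, odd: write in progress */
	_Atomic int64_t xtime_ns;	/* CLOCK_REALTIME */
	_Atomic int64_t wtom_ns;	/* wall_to_monotonic */
	_Atomic int64_t sleep_ns;	/* total_sleep_time */
};

static void model_tk_init(struct model_tk *tk, int64_t xtime_ns,
			  int64_t wtom_ns, int64_t sleep_ns)
{
	assert(sleep_ns >= 0);	/* total sleep time cannot be negative */
	atomic_init(&tk->seq, 0u);
	atomic_init(&tk->xtime_ns, xtime_ns);
	atomic_init(&tk->wtom_ns, wtom_ns);
	atomic_init(&tk->sleep_ns, sleep_ns);
}

static void model_write_begin(struct model_tk *tk)
{
	atomic_fetch_add_explicit(&tk->seq, 1u, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void model_write_end(struct model_tk *tk)
{
	atomic_fetch_add_explicit(&tk->seq, 1u, memory_order_release);
}

/* Leapsecond insertion: REALTIME steps back 1s, MONOTONIC stays put. */
static void model_insert_leapsecond(struct model_tk *tk)
{
	model_write_begin(tk);
	atomic_fetch_sub_explicit(&tk->xtime_ns, NSEC_PER_SEC, memory_order_relaxed);
	atomic_fetch_add_explicit(&tk->wtom_ns, NSEC_PER_SEC, memory_order_relaxed);
	model_write_end(tk);
}

/* Reader: retry until all three values come from the same update. */
static void model_get_mono_and_offsets(struct model_tk *tk, int64_t *mono,
				       int64_t *real_off, int64_t *sleep_off)
{
	unsigned int s1, s2;
	int64_t x, w, s;

	do {
		s1 = atomic_load_explicit(&tk->seq, memory_order_acquire);
		x = atomic_load_explicit(&tk->xtime_ns, memory_order_relaxed);
		w = atomic_load_explicit(&tk->wtom_ns, memory_order_relaxed);
		s = atomic_load_explicit(&tk->sleep_ns, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&tk->seq, memory_order_relaxed);
	} while ((s1 & 1u) || s1 != s2);

	*mono = x + w;		/* MONOTONIC = REALTIME + wall_to_monotonic */
	*real_off = -w;		/* REALTIME - MONOTONIC */
	*sleep_off = s;		/* BOOTTIME - MONOTONIC */
}
```

After model_insert_leapsecond(), the monotonic value is unchanged while the realtime offset drops by exactly one second. That is the change hrtimer_interrupt() must pick up.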

This ensures that any changes to realtime or boottime offsets
are noticed and stored into the per-cpu hrtimer base structures,
prior to doing any hrtimer expiration. This should ensure that
timers are not expired early if the offsets change under us.

This is useful in the case where clock_was_set() is called from
atomic context and has to schedule the hrtimer base offset update
via a timer, as it provides extra robustness in the face of any
possible timer delay.
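The effect on the expiry path can be sketched with a hypothetical model of one CLOCK_REALTIME base. The names and values are illustrative, not the kernel's. The interrupt first copies the freshly sampled offset into the base. Only then does it compare timer expiries against monotonic now plus that offset. Passing the stale pre-leapsecond offset instead reproduces the early expiry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical model: one CLOCK_REALTIME base with a small timer array. */
struct model_timer {
	int64_t real_expiry_ns;	/* absolute CLOCK_REALTIME expiry */
	int fired;
};

struct model_rt_base {
	int64_t offset_ns;	/* cached REALTIME - MONOTONIC */
	struct model_timer *timers;
	size_t ntimers;
};

/*
 * Model of the patched interrupt path: store the freshly sampled
 * offset into the base, then expire timers against
 * basenow = monotonic now + offset. Returns how many timers fired.
 */
static size_t model_hrtimer_interrupt(struct model_rt_base *base,
				      int64_t mono_now_ns,
				      int64_t sampled_offset_ns)
{
	size_t fired = 0;

	assert(base->timers != NULL || base->ntimers == 0);
	base->offset_ns = sampled_offset_ns;	/* the step this patch adds */
	int64_t basenow = mono_now_ns + base->offset_ns;

	for (size_t i = 0; i < base->ntimers; i++) {
		struct model_timer *t = &base->timers[i];

		if (!t->fired && t->real_expiry_ns <= basenow) {
			t->fired = 1;
			fired++;
		}
	}
	return fired;
}
```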

CC: Prarit Bhargava
CC:
CC: Thomas Gleixner
Acked-by: Prarit Bhargava
Signed-off-by: John Stultz

include/linux/hrtimer.h | 3 +++
kernel/hrtimer.c | 14 +++++++++++---
kernel/time/timekeeping.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index fd0dc30..f6b2a74 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -320,6 +320,9 @@ extern ktime_t ktime_get(void);
extern ktime_t ktime_get_real(void);
extern ktime_t ktime_get_boottime(void);
extern ktime_t ktime_get_monotonic_offset(void);
+extern void ktime_get_and_real_and_sleep_offset(ktime_t *monotonic,
+ ktime_t *real_offset,
+ ktime_t *sleep_offset);

DECLARE_PER_CPU(struct tick_device, tick_cpu_device);

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index d730678..56600c4 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1258,18 +1258,26 @@ static void __run_hrtimer(struct hrtimer *timer, ktime_t *now)
void hrtimer_interrupt(struct clock_event_device *dev)
{
struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
- ktime_t expires_next, now, entry_time, delta;
+ ktime_t expires_next, now, entry_time, delta, real_offset, sleep_offset;
int i, retries = 0;

BUG_ON(!cpu_base->hres_active);
cpu_base->nr_events++;
dev->next_event.tv64 = KTIME_MAX;

- entry_time = now = ktime_get();
+
+ ktime_get_and_real_and_sleep_offset(&now, &real_offset, &sleep_offset);
+
+ entry_time = now;
retry:
expires_next.tv64 = KTIME_MAX;

raw_spin_lock(&cpu_base->lock);
+
+ /* Update base offsets, to avoid early wakeups */
+ cpu_base->clock_base[HRTIMER_BASE_REALTIME].offset = real_offset;
+ cpu_base->clock_base[HRTIMER_BASE_BOOTTIME].offset = sleep_offset;
+
/*
* We set expires_next to KTIME_MAX here with cpu_base->lock
* held to prevent that a timer is enqueued in our queue via
@@ -1346,7 +1354,7 @@ retry:
* interrupt routine. We give it 3 attempts to avoid
* overreacting on some spurious event.
*/
- now = ktime_get();
+ ktime_get_and_real_and_sleep_offset(&now, &real_offset, &sleep_offset);
cpu_base->nr_retries++;
if (++retries < 3)
goto retry;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cc2991d..b3404cf 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1251,6 +1251,40 @@ void get_xtime_and_monotonic_and_sleep_offset(struct timespec *xtim,
}

/**
+ * ktime_get_and_real_and_sleep_offset() - hrtimer helper, gets monotonic ktime,
+ * realtime offset, and sleep offsets.
+ */
+void ktime_get_and_real_and_sleep_offset(ktime_t *monotonic,
+ ktime_t *real_offset,
+ ktime_t *sleep_offset)
+{
+ unsigned long seq;
+ struct timespec wtom, sleep;
+ u64 secs, nsecs;
+
+ do {
+ seq = read_seqbegin(&timekeeper.lock);
+
+ secs = timekeeper.xtime.tv_sec +
+ timekeeper.wall_to_monotonic.tv_sec;
+ nsecs = timekeeper.xtime.tv_nsec +
+ timekeeper.wall_to_monotonic.tv_nsec;
+ nsecs += timekeeping_get_ns();
+ /* If arch requires, add in gettimeoffset() */
+ nsecs += arch_gettimeoffset();
+
+ wtom = timekeeper.wall_to_monotonic;
+ sleep = timekeeper.total_sleep_time;
+ } while (read_seqretry(&timekeeper.lock, seq));
+
+ *monotonic = ktime_add_ns(ktime_set(secs, 0), nsecs);
+ set_normalized_timespec(&wtom, -wtom.tv_sec, -wtom.tv_nsec);
+ *real_offset = timespec_to_ktime(wtom);
+ *sleep_offset = timespec_to_ktime(sleep);
+}
+
+
+/**
* ktime_get_monotonic_offset() - get wall_to_monotonic in ktime_t format
*/
ktime_t ktime_get_monotonic_offset(void)
-- 
1.7.9.5
