Bug#536823: I also see these losses of the uptimed database
December 07th, 2010 - 07:40 am ET by Martin Steigerwald
Ted, I cc'd you. Could you please have a look at the save_records function
in the middle of my mail and tell us whether it's safe to use, on Ext4 at
least? I understand there might be a problem when using it on XFS, as XFS
doesn't cover the rename case. Thanks.
Hi!
It ate it, about 13 days ago - on my ThinkPad T42:
shambhala:~> uprecords | cut -c1-66
     #               Uptime | System
----------------------------+-------------------------------------
     1    10 days, 21:01:41 | Linux 2.6.37-rc3-tp42     Fri Nov 26
     2     2 days, 02:09:03 | Linux 2.6.37-rc3-tp42     Wed Nov 24
     3     0 days, 13:59:05 | Linux 2.6.37-rc3-tp42     Tue Nov 23
     4     0 days, 06:40:23 | Linux 2.6.36-tp42-gtt-vr  Tue Nov 23
->   5     0 days, 02:04:05 | Linux 2.6.37-rc3-tp42
     6     0 days, 00:41:55 | Linux 2.6.37-rc3-tp42     Tue Nov 23
----------------------------+-------------------------------------
1up in     0 days, 04:36:19 | at Tue Dec 7
no1 in    10 days, 18:57:37 | at Sat Dec 18
    up    13 days, 22:36:12 | since Tue Nov 23
  down     0 days, 00:06:49 | since Tue Nov 23
   %up               99.966 | since Tue Nov 23
I don't remember what might have happened at that time.
It's not the first time. I already restored it from a backup in October:
shambhala:~> ls -l /var/spool/uptimed
insgesamt 28
-rw-r--r-- 1 daemon daemon   11  7. Dez 10:50 bootid
-rw-r--r-- 1 root   root    254  7. Dez 12:35 records
-rw-r--r-- 1 daemon daemon 9806  3. Mär  2010 records-2010-03-03-aus-dem-rsync-backup
-rw-r--r-- 1 daemon daemon 1450  9. Mär  2010 records-2010-03-09-unvollstaendig
-rw-r--r-- 1 daemon daemon  254  7. Dez 12:30 records.old
As you can see, the last working backup here is 9806 bytes, far bigger than
the current file at 254 bytes.
This is on a
shambhala:~> df -hT /var/spool/uptimed
Dateisystem                  Typ  Size Used Avail Use% Eingehängt auf
/dev/mapper/shambhala-debian ext4  20G  14G  5,5G  72% /
and a fairly recent kernel, 2.6.36 / 2.6.37-rc3, which has the Ext4
safeguard for the rename and truncate cases that was introduced in 2.6.30,
I believe: it flushes written data *before* renaming the file. And
according to libuptimed/urec.c:
void save_records(int max, time_t log_threshold) {
    FILE *f;
    Urec *u;
    int i = 0;

    f = fopen(FILE_RECORDS".tmp", "w");
    if (!f) {
        printf("uptimed: cannot write to %s\n", FILE_RECORDS);
        return;
    }

    for (u = urec_list; u; u = u->next) {
        /* Ignore everything below the threshold */
        if (u->utime >= log_threshold) {
            fprintf(f, "%lu:%lu:%s\n", (unsigned long)u->utime,
                    (unsigned long)u->btime, u->sys);
            /* Stop processing when we've logged the max number specified. */
            if ((max > 0) && (++i >= max)) break;
        }
    }
    fclose(f);
    rename(FILE_RECORDS, FILE_RECORDS".old");
    rename(FILE_RECORDS".tmp", FILE_RECORDS);
}
uptimed uses the rename case, so I do not understand *why* it ate my old
records again.
Nonetheless, I think there should be a safeguard, like using the old file
if the current one is empty.
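Such a safeguard could look roughly like the following sketch. The helper
names pick_records_file and open_records_with_fallback are made up for
illustration and are not part of uptimed; the sketch only assumes that a
usable records file is one that exists and is not empty.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of uptimed: return `primary` if it exists
 * and is non-empty, else `fallback` if that one exists and is non-empty,
 * else NULL. */
const char *pick_records_file(const char *primary, const char *fallback) {
    struct stat st;

    if (stat(primary, &st) == 0 && st.st_size > 0)
        return primary;
    if (stat(fallback, &st) == 0 && st.st_size > 0)
        return fallback;
    return NULL;
}

/* Hypothetical helper: open whichever file pick_records_file() selects
 * for reading, or return NULL if neither is usable. */
FILE *open_records_with_fallback(const char *primary, const char *fallback) {
    const char *path = pick_records_file(primary, fallback);

    return path ? fopen(path, "r") : NULL;
}
```

A reader would call it with the records file and records.old. Note that
this only catches an empty file, not one that is merely much shorter than
before.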
I would also keep more than one backup given the small size of this file.
Maybe logrotate can do this while keeping the original file instead of
truncating it.
I have the following configuration:
shambhala:~> cat /etc/uptimed.conf
# Uptimed configuration file.
# Interval to write the logfile with in seconds.
UPDATE_INTERVAL=300
# Maximum number of entries in logfile. Set to 0 for unlimited.
LOG_MAXIMUM_ENTRIES=0
# Minimum uptime that must be reached for it to be considered a record.
LOG_MINIMUM_UPTIME=1h
[...]
An option to call fsync() would be good, so that people here can easily
test whether fsync helps in this case.
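To make clear what I mean, here is a minimal sketch of a write that is
flushed to disk before the rename. The function save_file_durably and its
parameters are hypothetical and not taken from uptimed; it only shows the
fflush(), fsync(), rename() order with error checks, which the quoted
save_records does not have.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Close the stream, remove the temporary file and report failure. */
static int give_up(FILE *f, const char *tmp_path) {
    if (f)
        fclose(f);
    remove(tmp_path);
    return -1;
}

/* Hypothetical helper, not part of uptimed: write `data` to `tmp_path`,
 * force it to stable storage, then rename it over `path`.
 * Returns 0 on success and -1 on any error. */
int save_file_durably(const char *path, const char *tmp_path,
                      const char *data) {
    size_t len = strlen(data);
    FILE *f = fopen(tmp_path, "w");

    if (!f)
        return -1;

    /* A short write means the temporary file is incomplete. */
    if (fwrite(data, 1, len, f) != len)
        return give_up(f, tmp_path);

    /* fflush() hands stdio's buffer to the kernel;
     * fsync() asks the kernel to write it to the disk. */
    if (fflush(f) != 0 || fsync(fileno(f)) != 0)
        return give_up(f, tmp_path);

    if (fclose(f) != 0)
        return give_up(NULL, tmp_path);

    /* Only replace the live file once the new contents are on disk. */
    return rename(tmp_path, path) == 0 ? 0 : -1;
}
```

With an option in uptimed.conf switching the fsync() call on and off, it
would be easy to compare both behaviours.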
Then there is the slight chance that uptimed gets confused during runtime
and writes out an empty records file by accident. But I find this highly
unlikely.
I will restore as much as possible from my backup. It is easy to combine
the contents of a backup and a new records file.
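Combining two files could be sketched as below. This assumes one record
per line in the utime:btime:sys form that save_records writes, and it
assumes that the boot time identifies a record, so that for two entries
with the same boot time the one with the longer uptime wins. The struct
and function names are mine, not uptimed's.

```c
#include <stdio.h>

#define MAX_RECORDS 1024
#define SYS_LEN 256

struct record {
    unsigned long utime;  /* uptime in seconds */
    unsigned long btime;  /* boot time, used as the key here */
    char sys[SYS_LEN];    /* system description, may contain spaces */
};

/* Add `r` to the array, or update the entry with the same boot time if
 * `r` has the longer uptime. Returns the new number of records. */
static size_t merge_one(struct record *recs, size_t n,
                        const struct record *r) {
    for (size_t i = 0; i < n; i++) {
        if (recs[i].btime == r->btime) {
            if (r->utime > recs[i].utime)
                recs[i] = *r;
            return n;
        }
    }
    if (n < MAX_RECORDS)
        recs[n++] = *r;
    return n;
}

/* Read "utime:btime:sys" lines from `path` and merge them into `recs`,
 * which already holds `n` records. Lines that do not parse are skipped.
 * Returns the new number of records. */
size_t merge_file(const char *path, struct record *recs, size_t n) {
    char line[512];
    FILE *f = fopen(path, "r");

    if (!f)
        return n;
    while (fgets(line, sizeof line, f)) {
        struct record r;
        if (sscanf(line, "%lu:%lu:%255[^\n]",
                   &r.utime, &r.btime, r.sys) == 3)
            n = merge_one(recs, n, &r);
    }
    fclose(f);
    return n;
}
```

Calling merge_file first on the backup and then on the current records
file gives the combined list, which can be written back in the same
format.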
I also lost the records on a Lenny => Squeeze update on my Dell
workstation at work. So this is three losses within just a few months. In
its current state, uptimed is hardly usable for me.
For now I have set up a backup for myself as fcrontab jobs:
# Backup of the uptimed database
@ 1d cp -p /var/spool/uptimed/records ~/Backup/uptimed/records-$(date +%Y-%m-%d)
@ 30d find ~/Backup/uptimed/ -name "records-*" -and -mtime +30 -delete
Something like that should go into uptimed or a cron-job that comes with
the package. Could be a cron.daily or at least cron.weekly job (using some
directory in /var for backups).
So, I hope this was enough constructive feedback to show what can be done
about it. If you want, I can write a cron job for the uptimed package that
does the backup. I am not that much into C programming currently, but
possibly I could come up with a patch for uptimed as well.
But I think this bug needs to be acknowledged as serious, because data loss
is involved. Just denying that there is a problem doesn't help us proceed
any further. A user of uptimed IMHO rightly does not care whether it's a
problem in the kernel, the filesystem, or the userspace program.
Ciao,
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7