Deleting duplicate mail messages

March 23rd, 2005 - 09:38 am ET by a_n_d_y_bell
So... I have a large store of mail messages in Thunderbird, but many
duplicate messages (I screwed up while importing stuff a while ago).
What's the best way to identify and delete identical messages?

Replies

#1 Bill Marcum
March 23rd, 2005 - 02:38 pm ET
On 23 Mar 2005 06:38:42 -0800, a_n_d_y_bell wrote:
> So... I have a large store of mail messages in Thunderbird, but many
> duplicate messages (I screwed up while importing stuff a while ago).
> What's the best way to identify and delete identical messages?



Switch to Mutt and type D~=.


If you call tech support today, wish them a happy Pakistan Day
#2 Alan Connor
March 23rd, 2005 - 03:51 pm ET
On comp.os.linux.misc, a_n_d_y_bell wrote:

> So... I have a large store of mail messages in Thunderbird, but
> many duplicate messages (I screwed up while importing stuff
> a while ago). What's the best way to identify and delete
> identical messages?




If they are mbox mailboxes, you can use formail by itself.

From man formail:

    -D maxlen idcache
        Formail will detect if the Message-ID of the current
        message has already been seen using an idcache file
        of approximately maxlen size. If not splitting, it
        will return success if a duplicate has been found.
        If splitting, it will not output duplicate messages.
        If used in conjunction with -r, formail will look at
        the mail address of the envelope sender instead of at
        the Message-ID.
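
For anyone who wants to see the idea spelled out, here is a minimal Python sketch of the same Message-ID check, using the standard library's mailbox module. It is not formail: it keeps every ID in memory rather than in a size-limited idcache file, and the name dedupe_mbox and the file paths are made up for illustration. Messages with no Message-ID header are kept, since there is nothing to compare them on.

```python
import mailbox


def dedupe_mbox(src_path, dst_path):
    """Copy src_path to dst_path, skipping repeated Message-IDs.

    Returns (kept, dropped). The source mailbox is only read, never modified.
    """
    seen = set()          # every Message-ID met so far (unbounded, unlike formail)
    kept = dropped = 0
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)  # created if it does not exist yet
    try:
        for msg in src:
            msg_id = str(msg.get("Message-ID") or "").strip()
            if msg_id and msg_id in seen:
                dropped += 1      # same ID as an earlier message: skip it
                continue
            if msg_id:
                seen.add(msg_id)
            dst.add(msg)
            kept += 1
        dst.flush()
    finally:
        src.close()
        dst.close()
    return kept, dropped
```

Something like dedupe_mbox("Inbox.copy", "Inbox.deduped") would then write the cleaned mailbox next to a copy of the original. Work on a copy, and close Thunderbird first, so nothing else is writing to the file.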


If they are maildir mailboxes, you'll need to use procmail
too, I think.
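
As an alternative for the maildir case, the same Message-ID comparison can be sketched with Python's standard mailbox module, which reads maildir directories directly. This is only an illustration under stated assumptions: find_maildir_duplicates is a made-up name, and "first copy wins" is decided by sorted file name, which is an arbitrary choice of mine rather than anything procmail or formail does.

```python
import mailbox


def find_maildir_duplicates(maildir_path, delete=False):
    """Return the keys of messages whose Message-ID repeats an earlier one.

    With delete=True the duplicate messages are also removed from the maildir.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    seen = set()
    duplicates = []
    try:
        # Maildir keys are the unique parts of the file names; sorting them
        # makes the choice of which copy survives repeatable.
        for key in sorted(box.keys()):
            msg_id = str(box[key].get("Message-ID") or "").strip()
            if not msg_id:
                continue              # nothing to compare on, leave it alone
            if msg_id in seen:
                duplicates.append(key)
            else:
                seen.add(msg_id)
        if delete:
            for key in duplicates:
                box.discard(key)      # removes the message file
    finally:
        box.close()
    return duplicates
```

Calling it once without delete=True and checking the returned list is a cheap dry run before removing anything.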

From man procmailex:


If you are subscribed to several mailing lists and people
cross-post to some of them, you usually receive several
duplicate mails (one from every list). The following simple
recipe eliminates duplicate mails. It tells formail to keep
an 8KB cache file in which it will store the Message-IDs of
the most recent mails you received. Since Message-IDs are
guaranteed to be unique for every new mail, they are ideally
suited to weed out duplicate mails. Simply put the following
recipe at the top of your rcfile, and no duplicate mail will
get past it.

:0 Wh: msgid.lock
| formail -D 8192 msgid.cache

Beware if you have delivery problems in recipes below this
one and procmail tries to requeue the mail, then on the
next queue run, this mail will be considered a duplicate
and will be thrown away. For those not quite so confident
in their own scripting capabilities, you can use the
following recipe instead. It puts duplicates in a separate
folder instead of throwing them away. It is up to you to
periodically empty the folder of course.

:0 Whc: msgid.lock
| formail -D 8192 msgid.cache

:0 a:
duplicates
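
To make the two ideas in that excerpt concrete (a cache that only remembers recent Message-IDs, and diverting duplicates to a separate folder instead of discarding them), here is a rough Python sketch. It is not how formail is implemented: the cache here is bounded by number of entries, whereas formail's is bounded by file size, and RecentIdCache, split_duplicates and max_entries are names invented for this example.

```python
import mailbox
from collections import OrderedDict


class RecentIdCache:
    """Remember at most max_entries Message-IDs, forgetting the oldest first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._ids = OrderedDict()

    def seen_before(self, msg_id):
        """Return True if msg_id is cached; otherwise record it and return False."""
        if msg_id in self._ids:
            return True
        self._ids[msg_id] = None
        if len(self._ids) > self.max_entries:
            self._ids.popitem(last=False)   # evict the oldest remembered ID
        return False


def split_duplicates(src_path, keep_path, dup_path, max_entries=100):
    """Copy messages to keep_path, diverting recognised duplicates to dup_path.

    Returns (kept, diverted). Nothing is deleted; the source is only read.
    """
    cache = RecentIdCache(max_entries)
    src = mailbox.mbox(src_path, create=False)
    keep = mailbox.mbox(keep_path)
    dups = mailbox.mbox(dup_path)
    kept = diverted = 0
    try:
        for msg in src:
            msg_id = str(msg.get("Message-ID") or "").strip()
            if msg_id and cache.seen_before(msg_id):
                dups.add(msg)
                diverted += 1
            else:
                keep.add(msg)
                kept += 1
    finally:
        for box in (src, keep, dups):
            box.close()
    return kept, diverted
```

The sketch also shows why a small cache is a poor fit for cleaning up a large existing store: once an ID has been pushed out of the cache, a later copy of that message is no longer recognised as a duplicate.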


-

HTH

AC