Since you guys were so helpful, I hate to wear out my welcome, but...
Now that I have the high-index data from my previous question, I need
to process the list of words to eliminate duplicates. The table below
shows some sample data. The first column is the worksheet row numbers.
The Word column (B) contains the words. The Score column (C) contains
a count of how often that words occurs is a large collection of text
(400 million words). The Index column (D) contains the result of the
GetHighIndex UDF from my previous post.
I need another UDF to walk down the list looking for duplicate words
(tee, a, in). When it finds a duplicate, it will merge the two rows
into one by adding the Scores and deleting the second row.
In the example below, rows 5 & 6 would be combined to form one row
with a Score of 3301 (199 + 3102).
Rows 62-65 would be combined into one row with a Score of 7,140,219
(808 + 9,711 + 279,364 + 6,850,336).
Only exact matches are combined. The two "balloon" rows would be
combined as would the two "ballooning" rows, but would not include the
Can I impose on someone to get this started?
The parts I am not sure how to do inside a UDF are:
1. How to step through the rows and address the cells (relatively).
2. How to delete a row.
B C D
4 Word Score Index
5 tee 199 2
6 tee 3,102 2
7 a 298 3
8 a 9,996,626 3
9 at 1,730,609 3
10 eat 69,484 3
62 in 808 6
63 in 9,711 6
64 in 279,364 6
65 in 6,850,336 6