Find the week-by-week
closing quote of the index
As shown in Figure 2 (and duplicated in Figure
4), the task we want to show by week number, the starting date, the
ending date, and the closing value of the NASDAQ 100 index.
Each week constitutes one logical data block. The starting
date is record 1 of a block, and the ending date and closing value
are in record 5 of the block. As shown in Figure 1, the data
table is in columns A and B and starts with row 2
From the section on the
theory, recall that record y of block x is in the worksheet row
r+(x-1)*n+(y-1). In this case, r=2 and n=5. .
Hence, record one of block x is in row
2+(x-1)*5+(1-1), or 2+(x-1)*5.
Similarly, the last record of block x is in
row 2+(x-1)*5+(5-1).
Now, we can create the formula for the
starting date of a particular week. The formula in E2 is
=INDIRECT("A"&(ROW($A$2)+(D2-1)*5),TRUE)
The piece that identifies the row of interest is ROW($A$2)+(D2-1)*5.
To see how this is the same as 2+(x-1)*5, note that
ROW($A$2) gives the starting row of the data
table, i.e., r
D2 contains the week number, i.e., x.
Similarly, F2 contains =INDIRECT("A"&(ROW($A$2)+(D2-1)*5)+(5-1),TRUE),
where
ROW($A$2)+(D2-1)*5)+(5-1) is the same as 2+(x-1)*5+(5-1)
To get the closing value of the index in F2,
all we need to note is that the same method used in F2 applies
except that now we want the formula to reference column B rather
than column A. Hence, =INDIRECT("B"&(ROW($A$2)+(D2-1)*5)+(5-1),TRUE)
Calculating the
week-by-week average value of the index
As shown in Figure 3, and duplicated in Figure
5, we want to calculate the average value of the index on a weekly
basis. In addition, this time around, we do not have the very
convenient helper column 'Week #'. That particular column was
useful because in gave us the correct
block number for our calculations.
However, as convenient as it was, that column
was not necessary. We can calculate the block number with the
formula ROW()-ROW($H$2)+1, where H2 is the first row of the
summary report. This is the
same as x in the equations above. Also recall the formula to
calculate the row of block x record y relative to the start
of the table is (x-1)*n+(y-1). In this case that becomes
(ROW()-ROW($H$2)+1-1)*5+(5-1) or just (ROW()-ROW($H$2))*5+(5-1).
The formula in H2 is
=OFFSET($A$2,(ROW()-ROW($H$2))*5+(5-1),0,1,1)
(ROW()-ROW($H$2))*5+(5-1) is the same as
(x-1)*5+(5-1). However, instead of adding the row number of
the first row of the data table, we specify it as the first argument
to the OFFSET function.
How about the actual formula to calculate the
weekly average? The formula in I2 is =AVERAGE(OFFSET($A$2,(ROW()-ROW($H$2))*5,1,5,1))
The first argument to the OFFSET function is
$A$2 -- the first cell of the first column of the data table.
The 2nd argument, ROW()-ROW($H$2))*5
calculates the starting row of the block of interest relative
to the start of the table. It is the same as (x-1)*5.
For those interested, the last 3 arguments of
the OFFSET function are briefly explained in this paragraph:
These arguments are ...,1,5,1).
The first 1 indicates that we want to refer to the column 1 to the
right of the base reference ($A$2), i.e., column B. The 5 and
the last 1 indicate that we want to include 5 rows and 1 column in
the range created by the function. So, the complete OFFSET
function refers to range 5 rows long by 1 column wide starting with
the cell in column B that is (x-1)*5 rows away from the start of the
data table ($A$2).
Application 2: Calculate the total
volume by product
for each daily 12 hour period for 6 days
http://www.mrexcel.com/board2/viewtopic.php?t=101040
We look at a simplified version of the problem
solved at the link above. The problem involves 25 columns of
data that represent 25 different products. The data, in 72
rows from row 147 to 218, consist of 6 logical blocks each
containing 12 rows. They represent readings over 12 hours for
6 working days each week. The requirement is to provide a
day-by-day summary in a table starting in cell BC3 while leaving a
blank row after each set of totals. So, the 25 cells in row 3
starting with BC3 will have the totals for the first day for each of
the 25 products. The 25 cells in row 5 starting with BC5 will
have the totals for the 2nd block of 12 records for each of the 25
products.
In this case, there are two data sets
consisting of logical blocks. The raw data has r=147, n=12. The
output section has r=3, n=2.
As is often the case in problems of this nature, it is best to start
with the output row. The formula to use in BC3 would be:
=SUM(OFFSET($A$147,(ROW()-ROW($BC$3))/2*12,COLUMN()-COLUMN($BC$3),12,1))
Given a particular output row, we need to find its corresponding
data block in the input section. The formula above is (z-r) div n +1
ROW() gives the record number of a particular row in the output
section, i.e., z.
ROW()-ROW($BC$3) is the equivalent of (z-r)
ROW()-ROW($BC$3))/2 is the equivalent of (z-r) div n. This is so
because the formula is entered only in alternate rows 3, 5, 7, etc.
So, division by 2 is the equivalent of the DIV function.
ROW()-ROW($BC$3))/2+1 is the equivalent of (z-r) div n +1.
However, as in Application 1 above, the 1 will be subtracted off
soon and won't appear in the final formula.
This gives us the value of x.
Now that we have the block number that the output row corresponds
to, we can find the physical record in the input section where this
block begins. The formula for the starting row is r + (x-1)*n.
Using the number specific to this problem, we get 147+
((ROW()-ROW($BC$3))/2+1-1)*12. Note that the +1 and -1 cancel out to
give the final result of (ROW()-ROW($BC$3))/2*12. Also, by leaving
out the 147 piece, we get a reference that is *zero relative* to the
start of the data set. The COLUMN()-COLUMN($BC$3) simply selects the
different columns to sum based on the relative distance from column
BC.
Now, that we've established the starting row and column *zero
relative* to the start of the data set, the OFFSET then uses ,12,1)
to select 12 rows and 1 column as the range for the SUM() function.
Application 3: Isolate microtiter
readings by concentration in a research lab
This case is adapted from a posting at
http://www.mrexcel.com/board2/viewtopic.php?t=95985&start=10
A summary description of the problem:
There is an array of 12 columns (A-L) and 8
rows (starting with 4-11) that represents the wells in the plate.
This same array is repeated 100 times with a blank row between
each iteration. This represents each reading at increasing time
points. Also on the starting sheet is a row (3) that is 100
columns wide containing the time of the read.
What I need to do now is to take each set of 8 rows (for column A)
and transpose them into 8 columns (C-J) so that each row becomes
each different time point. This should be repeated for each
of the other columns B-L.
The readings are absorbance readings from a
microtiter plate. The plate itself is divided into 12 columns
(1-12) by 8 rows (A-H). The samples are arranged so that each row
on the plate is a different specimen and each column is a
different concentration. In other words row A is all the same
specimen but different dilutions whereas row B is a different
specimen.
So the original data from the instrument consists of 100 different
reads of the same plate. (A4:L902 remember the extra blank row
after each set).
In other words the information is three dimensional and we only
want to look at two dimensions - rearranging it so that each row
is one of the 100 reads and each column is a separate sample. This
is why we put each column on a separate sheet so that all the
similar dilutions stay together since ultimately we will only
choose the optimum dilution to use results from.
The question then is how to transform the
data as shown in the Input data below into the layout shown
in Output data.
 |
Figure 6 -- Input data
 |
Figure 7 -- Output data
While the poster was pursuing a VBA based
solution, it should also be apparent that what we have are two
lists.
The first list is the input data organized in
logical blocks by time. The starting row for the data was row
4. Each block of data used 9 rows -- 8 containing data and one
empty row. Basically, r=4 and n=9.
The second list is the output data organized
in logical blocks by conentration. Since it starts in row 4,
r=4. Each concentration needs 100 rows to describe.
Together with the empty row at the end of each group, we get n=101.
Basically, what we need to do is for each
output row to identify the matching range segment of the input data
and use the TRANSPOSE function on that range.
Suppose the data are in a sheet named Sheet1.
Then, in a 2nd sheet, say, Sheet2...
In row 3 starting with A3 enter the literals: Conc, Iteration, Time,
Specimen1, Specimen2, ..., Specimen8.
In A4 enter =IF(MOD(ROW()-4,101)+1=101,"",INT((ROW()-4)/101)+1)
This identifies which concentration each
output block contains -- and also the block number of the output
data. The formula puts 1 in column A of the first output
block (concentration 1), 2 in the 2nd output block (concentration
2) etc.
In B4 enter =IF(A4="","",MOD(ROW()-4,101)+1)
This simply adds a sequential number from 1
to 100 in each block. It creates a unique identifier for
each record (time snapshot) within each output block.
In C4 enter =IF(A4="","",INDEX(Sheet1!$3:$3,Sheet2!B4))
This 'extracts' the correct time value from
the data in row 3 of the input worksheet.
In D4:K4 enter the
array formula
=IF(A4="","",TRANSPOSE(OFFSET(Sheet1!$A$4,(Sheet2!B4-1)*9,Sheet2!A4-1,8,1)))
As is often the case, it is easiest to
construct the solution starting with the output side. First,
for any given output row, figure out the corresponding input logical
block and record number within that block. Then, map that
information into the corresponding row in the input data. If
we look at any output row, the first column identifies the
concentration that that row corresponds to.
In the input, each type of concentration is
contained in a column -- A, B, C, etc. Also, in any output
row, the 2nd column contains a number from 1 to 100 for the time
sequence when the reading was taken. On the input side, this
is exactly how each logical block is laid out. So, column B
gives us the value of what was called x in the
section on the theory. Finally, remember that n=9 -- 8 rows
of readings at each time and one empty row). Hence
(Sheet2!B4-1)*9 is the equivalent of (x-1)*n.
Sheet2!A4-1 identifies which column
contains the concentration that corresponds to the output row, and
8 is the number of rows of with data for any
one time reading.
The final result is that data for
'concentration 1' are in rows 4:103, for 'concentration 2' in rows
105:204, all the way to concentration 12 in 1115:1214.
Using a PivotTable
As useful as the technique of mapping logical
and physical records is, in some instances it might be easier to use
a PivotTable to do the analysis. While PivotTables do have
their limitations, some might find one easier to create than to work
through the math involved in this analysis. It will be easiest
to use a PivotTable if one were to add a 'key' column to the
existing data and use it for the 'row field.'
In the case of the NASDAQ analysis, it would
help to add a 'week number' column.
In the case of the micro-titer analysis, it
would help to add two columns to the data. The first would be
the 'Reading Time' and the second would be 'Specimen ID.'
Assuming that the concentration columns were labeled C1, C2, etc.,
and that the blank rows were removed, one could create one cross-tab
PivotTable for each concentration with the Reading Time as the row
field and the Specimen ID as the column field.
Note that the data layout in the micro-titer
case is still not ideally suited for contemporary data analysis
tools. Ideally speaking, the data should be organized in a
table with columns labeled Reading Time, Specimen ID, Concentration
Level, and Reading. This would have allowed us to create a
single PivotTable that would contain information about all
concentration levels and not one per concentration. But, the
reality is that when Excel interfaces with existing equipment -- in
a laboratory or a service center or a manufacturing plant -- we have
to accept the data as they are organized and not as we might ideally
want them to be. This case study demonstrates how one can
adapt Excel to easily and elegantly analyze such data.
Introduction to DIV
and MOD and technical note: For those unfamiliar with the DIV and
MOD operators, the first, used as a div n , returns the
integer part resulting from dividing the positive integer a
by the positive integer n (example: 10 div 3 = 3).
The second, used as a mod n, returns the remainder from the
same division (example: 10 mod 3 = 1). In some
implementations, such as VBA, the div operator is represented by the
reverse slash symbol \ and used as a \ n. In
other instances -- such as with Excel -- these are implemented as
functions and written as DIV(a,n) or MOD(a,n).
Finally, while the both DIV and MOD are unambiguously defined when
a and n are positive integers, different
implementations may confer different interpretations when either
a or n or both are negative integers or
real numbers.
An array formula is completed not
with the ENTER key but the CTRL+SHIFT+ENTER combination. If
done correctly, Excel will display the formula within braces as in
Figure 8. Remember, you should not
enter the curly braces yourself!
 |
Figure 8