November 2000/Extracting Data from X-Y Plots/Sidebar

Walking through the Math

In order to understand the implementation of the data extraction algorithm, it is necessary to review some mathematics, which I will present using an example. If you are not interested in math, you can safely skip this sidebar.

Figure 2 shows a diagram imported into the document's client window. For simplicity, the diagram contains only one data point (D) without any error bars. The physical diagram is slightly tilted within the client window because some minimal tilting will always be introduced when scanning a diagram. The tilt has been somewhat exaggerated for the purpose of the example.

The user has already marked points A and B on the physical x axis, entered their respective physical values (10 and 50 in the example) into the pop-up dialog, and acquired the sole data point D. SCANDAT has to calculate the physical x and y values of the data point D in the physical coordinate system of the diagram. The calculations for the physical x and y values of D are formally identical (using different axes, of course); therefore, I restrict the discussion to the calculation of the physical x value of D.

Projecting the Data Point onto the X Axis

To obtain the physical x value of data point D, we project the data point onto the physical x axis. This is accomplished by first finding the device coordinates of the projected point. This section shows how to find those device coordinates. The next section will show how to calculate the physical coordinates of the projected point from the device coordinates.

The diagram shows a projection line p(x) from the data point to the physical x axis. It is possible to interpret the physical x axis g(x) and the projection line p(x) as mathematical functions within the coordinate system of the device. g(x) and p(x) can be expressed according to formula (1):
g_d(x_d) = m*x_d + c
p_d(x_d) = M*x_d + C
where x_d and g_d are the x and y device coordinates, respectively, along the line g(x); x_d and p_d are the x and y coordinates along the line p(x).

The quantities m and c of g(x) can be directly expressed in terms of the device coordinates of axis points A and B using formulas (2) and (3).
m = (B.y_d - A.y_d)/(B.x_d - A.x_d)
c = A.y_d - m * A.x_d
To express p(x) in terms of user input, we use the fact that p(x) is orthogonal to g(x), which means the slope M of p(x) is the negative reciprocal of the slope m of g(x):
M = -(B.x_d - A.x_d)/(B.y_d - A.y_d)
With M as a known quantity, we can apply formula (3) to p(x), which passes through D, to write C in terms of A, B, and D:
C = D.y_d - M * D.x_d     <==>
C = D.y_d + D.x_d * (B.x_d - A.x_d)/(B.y_d - A.y_d)
g(x) and p(x) are now expressed entirely in terms of known quantities.
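
The formulas so far can be sketched in a few lines of C++. This is only an illustration of formulas (2) and (3) and their counterparts for p(x); the names DevicePoint, Line, axisLine, and projectionLine are hypothetical and are not taken from SCANDAT. The assertions make explicit an assumption hidden in the formulas: the axis must be neither exactly vertical nor exactly horizontal in device coordinates, otherwise m or M is undefined.

```cpp
#include <cassert>

// Hypothetical types for illustration only; not SCANDAT's own names.
struct DevicePoint { double x; double y; };        // device coordinates
struct Line { double slope; double intercept; };   // y = slope * x + intercept

// Axis line g(x) through the axis points A and B: formulas (2) and (3).
Line axisLine(DevicePoint a, DevicePoint b) {
    assert(b.x != a.x);                    // a vertical axis has no finite slope m
    double m = (b.y - a.y) / (b.x - a.x);  // formula (2)
    double c = a.y - m * a.x;              // formula (3)
    return {m, c};
}

// Projection line p(x) through D, orthogonal to the axis through A and B.
Line projectionLine(DevicePoint a, DevicePoint b, DevicePoint d) {
    assert(b.y != a.y);                      // an untilted axis would make M infinite
    double M = -(b.x - a.x) / (b.y - a.y);   // negative reciprocal of m
    double C = d.y - M * d.x;                // p(x) passes through D
    return {M, C};
}
```

For A = (0, 0), B = (4, 2), and D = (1, 3), this gives m = 0.5, c = 0, M = -2, and C = 5.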

The next task is to calculate the coordinates of point P, where g(x) and p(x) intersect. The device x coordinate of P is given by the equation:
g_d(x_d) = p_d(x_d)
The solution for x_d is:
x_d = (C - c) / (m - M)            ( = P.x_d )
Substituting P.x_d into the expression for g(x) or p(x) yields the device y coordinate of P:
y_d = M * P.x_d + C = m * P.x_d + c    ( = P.y_d )
The above yields the coordinates of the projection point P expressed in terms of the coordinates of A, B, and D. These coordinates, however, are device coordinates. What we are really interested in is the physical x coordinate of P. This is calculated in the next step, using the device coordinates of P as input.
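
As a minimal sketch of the intersection step, the function below takes the four line parameters m, c, M, and C and returns the device coordinates of P. The names DevicePoint and intersect are hypothetical, chosen for this example only.

```cpp
#include <cassert>

// Hypothetical type for illustration only; not SCANDAT's own name.
struct DevicePoint { double x; double y; };

// Intersection of g: y = m*x + c with p: y = M*x + C.
DevicePoint intersect(double m, double c, double M, double C) {
    assert(m != M);                 // parallel lines never meet; orthogonal lines always do
    double x = (C - c) / (m - M);   // solve m*x + c = M*x + C for x
    double y = m * x + c;           // substitute into g(x)
    return {x, y};
}
```

With m = 0.5, c = 0, M = -2, and C = 5, the function returns P = (2, 1), which is the foot of the perpendicular from D = (1, 3) onto the line through (0, 0) and (4, 2).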

Reading the Physical Value

Let us denote the distance between points A and B in device coordinates by (AB)_d and in physical coordinates by (AB)_p. Similarly, the distance between A and P is written as (AP)_d in device and (AP)_p in physical coordinates. In the case of a linear axis, the following relationship holds:
(AP)_p / (AB)_p = (AP)_d / (AB)_d
The distance from point A to B in physical coordinates (AB)_p is known: it is the difference in physical values entered by the user for the axis defining points. In the example, these values are 10 and 50 respectively, therefore (AB)_p = 40. (AB)_d and (AP)_d can be calculated using the Pythagorean theorem, since the device coordinates of their end points are known:
(AP)_d = SQRT ( (A.x_d - P.x_d)² + (A.y_d - P.y_d)² )
(AB)_d = SQRT ( (A.x_d - B.x_d)² + (A.y_d - B.y_d)² )
This is why we needed to calculate the device coordinates of P in the previous section. Therefore, we can calculate the distance between P and A in physical coordinates:
(AP)_p = (AP)_d * (AB)_p / (AB)_d
The actual physical value of P is then given by
P.x_p = A.x_p + (AP)_p
This is true only in those cases where P falls to the right of A, as in the example. Otherwise, the expression is
P.x_p = A.x_p - (AP)_p
The program determines the proper sign by evaluating the inner product of the vectors from A to P and from A to B. But that's another story.
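
The linear-axis calculation, including the sign test, might look roughly like the following. This is a sketch under stated assumptions, not the code from SCANDAT: the names DevicePoint and physicalValueLinear are hypothetical, and (AB)_p is taken as the signed difference bPhys - aPhys, so that a positive inner product always moves the result from A's value toward B's value.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical type for illustration only; not SCANDAT's own name.
struct DevicePoint { double x; double y; };

// Physical value of the projected point P on a linear axis defined by A and B.
// aPhys and bPhys are the physical values the user entered for A and B.
double physicalValueLinear(DevicePoint a, DevicePoint b, DevicePoint p,
                           double aPhys, double bPhys) {
    double abx = b.x - a.x, aby = b.y - a.y;   // vector from A to B
    double apx = p.x - a.x, apy = p.y - a.y;   // vector from A to P
    double distAB = std::hypot(abx, aby);      // (AB)_d
    double distAP = std::hypot(apx, apy);      // (AP)_d
    assert(distAB > 0.0);                      // A and B must be distinct points
    double physAP = distAP * (bPhys - aPhys) / distAB;   // (AP)_p
    // A non-negative inner product means P lies on B's side of A.
    double dot = apx * abx + apy * aby;
    return dot >= 0.0 ? aPhys + physAP : aPhys - physAP;
}
```

With A = (0, 0) at 10 and B = (4, 3) at 50, a projected point P = (2, 1.5) lies halfway between A and B and yields 30, while P = (-2, -1.5) lies on the opposite side of A and yields -10.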

The formulas must be modified when logarithmic scales are involved. I omit the modified formulas here, but you can see them in Listing 2.
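
For orientation only, here is one common way to handle a logarithmic axis: apply the same proportionality to the base-10 logarithms of the physical values instead of to the values themselves. This is an assumption about the general technique and may differ from what Listing 2 actually does. The function name physicalValueLog and the parameter t are hypothetical; t stands for the signed ratio (AP)_d / (AB)_d, positive when P lies on B's side of A.

```cpp
#include <cassert>
#include <cmath>

// Physical value on a logarithmic axis, interpolating in log10 space.
// t is the signed fraction (AP)_d / (AB)_d along the axis (hypothetical parameter).
double physicalValueLog(double aPhys, double bPhys, double t) {
    assert(aPhys > 0.0 && bPhys > 0.0);   // logarithms require positive values
    double logA = std::log10(aPhys);
    double logB = std::log10(bPhys);
    double logP = logA + t * (logB - logA);   // linear in the exponent
    return std::pow(10.0, logP);
}
```

For axis points at 10 and 1000, a point halfway between them in device coordinates (t = 0.5) corresponds to approximately 100 rather than 505.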

P.x_p is the physical x value of the data point D. To obtain the physical y value of D, the calculations shown above have to be repeated, substituting for A and B the corresponding points on the physical y axis.