I am building a small project that takes test data and is supposed to use polynomial regression to calculate the coefficients of the polynomial equation. Initial searching online led me to NumPy, but I now know it is not compatible with Ignition (Jython).
What I’ve been given is a 6x6 matrix of the Sum of the Squares of the sample data and a 5th order poly equation. The customer has dictated that the solution is to use the Sum of Squares, the Inverse Matrix of the Sum of Squares matrix and matrix multiplication of the Inverse Matrix and the result side matrix to solve for the coefficients.
It would look something like this:
Getting the sum of squares array (matrix) is somewhat trivial.
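For reference, here is a minimal sketch of how that matrix and the result-side vector could be built in plain Python. It assumes the "Sum of Squares" matrix is the usual least-squares (normal equations) matrix of power sums, which the customer's description suggests but does not confirm. The function name `build_normal_equations` is made up for illustration.

```python
def build_normal_equations(xs, ys, degree):
    """Return (A, b) for a polynomial least-squares fit of the given degree.

    Assumption: A[i][j] = sum(x**(i+j)) and b[i] = sum(x**i * y),
    summed over every sample point.
    """
    n = degree + 1
    # power_sums[k] = sum of x**k over all samples, for k = 0 .. 2*degree
    power_sums = [sum(float(x) ** k for x in xs) for k in range(2 * degree + 1)]
    # Square matrix of power sums (6x6 for a 5th order fit)
    A = [[power_sums[i + j] for j in range(n)] for i in range(n)]
    # Result-side vector (6x1 for a 5th order fit)
    b = [sum((float(x) ** i) * float(y) for x, y in zip(xs, ys)) for i in range(n)]
    return A, b


# Tiny example: a straight-line fit (degree 1) through three points
A, b = build_normal_equations([0, 1, 2], [1, 3, 7], 1)
```

Because every entry is a sum over the samples, the matrix stays 6x6 for a 5th order fit no matter how many points are passed in.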
Calculating the Inverse Matrix of that is another matter altogether! Out beyond a 2x2 matrix this becomes exponentially more difficult with each increase in matrix size.
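If no library is available, one common way to invert a small matrix is Gauss-Jordan elimination with partial pivoting, which takes on the order of n^3 operations for an n x n matrix. The sketch below is plain Python with no imports. The name `invert_matrix` is hypothetical, and this is only an illustration under those assumptions, not a numerically hardened routine.

```python
def invert_matrix(m):
    """Invert a square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(m)
    # Augment a float copy of m with the identity matrix: [m | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Partial pivoting: use the row with the largest absolute value in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot element becomes 1
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse
    return [row[n:] for row in aug]


inv = invert_matrix([[4, 7], [2, 6]])
```

Multiplying the result by the original matrix and checking for the identity is a quick way to validate it. For matrices built from large values raised to high powers, rounding error can become significant, so treat the output with some caution.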
I’ve seen a few suggestions on other libraries that are Ignition (Jython) compatible, but I haven’t found one that will do this matrix based regression calculation to solve for the coefficients.
Am I just missing it?
Has anyone else had to do this and found an efficient solution?
I have gotten some test code to work!
I am able to manually validate the resulting Inverse Matrix.
The problem I have now is, I think, with the underlying equations (y = a0 + a1x + a2x^2…) and the assumption that the coefficients (a) can be calculated from this Inv of Sum of Square * Y Result Vector.
I’m no math genius, but when I start with known values for x, y, and a, the coefficients (a) I get back out of this logic don’t match the ones I started with.
As that part is what was provided by the customer, I’ll have to bounce it back to them for clarification.
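For comparison, the standard least-squares normal equations for a 5th order fit over N sample points (x_i, y_i) are written out below. This assumes the customer's method is ordinary least squares, which is not confirmed here. In this form the result-side vector holds sums of x_i^k * y_i rather than the raw y values, and every sum runs over all N samples.

```latex
\[
\begin{bmatrix}
N & \sum x_i & \cdots & \sum x_i^{5} \\
\sum x_i & \sum x_i^{2} & \cdots & \sum x_i^{6} \\
\vdots & \vdots & \ddots & \vdots \\
\sum x_i^{5} & \sum x_i^{6} & \cdots & \sum x_i^{10}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_5 \end{bmatrix}
=
\begin{bmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^{5} y_i \end{bmatrix},
\qquad \text{all sums over } i = 1, \dots, N .
\]
```

Solving for the coefficients then means multiplying the inverse of the left-hand matrix by the right-hand vector.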
Well, I’m back!
Although I figured out the matrix math using the Apache Commons libraries, the results in the real world are lacking. The polynomial regression for a 5th order polynomial (6 coefficients) requires a 6x6 Sum of Squares matrix and a 6x1 results vector, but we sample 7 test points, so I have had to drop one of the sample points. This throws off my resulting equation.
The previous solution used Excel and the LINEST function to perform a linear regression to find the coefficients. I think I can replicate that using the Apache Commons library, I’m just not sure how.
I think the org.apache.commons.math3.fitting package could be used, I’m just at a loss for how to implement it.
Can anyone give me a bit of guidance on implementation of this to solve for the polynomial coefficients?
I’m completely confused as to what to import and how to implement it.
What I have is 7 test points of flow (X) and pressure (Y) and need to evaluate the 5th order polynomial for an equation to fit these test points. That equation would look like this:
OK, I think I got this. They really don’t document the usage of these libraries very well sometimes.
Here is what I figured out in case anyone needs this in the future (or when I forget and need a reminder!).
from org.apache.commons.math3.fitting import PolynomialCurveFitter as pcf
from org.apache.commons.math3.fitting import WeightedObservedPoints as wop

# Test data: flow (X) and head (Y), 7 points each
lstFlow = [4590, 4099, 3517, 2849, 1673, 690.5, 6.221]
lstHead = [2.142, 5.962, 15.148, 18.065, 23.294, 23.873, 24.187]

# Fitter for a 5th order polynomial (6 coefficients)
fitter = pcf.create(5)

# Collect the test points as observations
tstPoints = wop()
for i in range(7):
    flow = lstFlow[i]
    head = lstHead[i]
    tstPoints.add(flow, head)

# Fit the polynomial and print each coefficient
coeff = fitter.fit(tstPoints.toList())
for i in range(len(coeff)):
    print coeff[i]
It outputs the following coefficients (which match what my customer’s spreadsheet outputs):
24.1179181818
0.00230480179819
-5.50959194266e-06
3.78439176723e-09
-1.13047599678e-12
1.07539450483e-16
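To use these coefficients, the polynomial can be evaluated with Horner's method, as in the sketch below. It assumes the coefficients are listed in ascending order (a0 first). That fits the first value above, which is close to the measured head at near-zero flow, but it is worth confirming against the library documentation. The name `eval_poly` is made up for illustration, and the code avoids the `print` statement so it should run in Jython as well as Python 3.

```python
def eval_poly(coeffs, x):
    """Evaluate a0 + a1*x + a2*x**2 + ... using Horner's method.

    Assumption: coeffs is ordered from the constant term upward.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Coefficients copied from the output above
coeff = [24.1179181818, 0.00230480179819, -5.50959194266e-06,
         3.78439176723e-09, -1.13047599678e-12, 1.07539450483e-16]

# Predicted head at the lowest measured flow (measured head there was 24.187)
head_low_flow = eval_poly(coeff, 6.221)
```

Comparing the predicted head against each measured test point is a simple sanity check on the fit.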