This week, I am working to get the dynamic time warping (DTW) functionality into my program. The process includes re-processing the features to include the time series, putting each series back together when we construct sequences, and then performing the DTW to generate a distance that will be used to compute the kNN of each sequence, which can then be used for predictions with the models. The processing time of these steps has gone up significantly, since we have been using five different metrics with each of the F phase datasets. I am returning to school next week, and once I've completed the DTW processing, all that will remain is putting together our second paper (the date for the reach journal we would like to submit it to is October 1). After that, I am hoping I will have time to look again into the Agglomerative Hierarchical Clustering concept, which I did not successfully complete when we explored it earlier in the summer before we changed focus to the paper. We heard back
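The pairwise distance at the heart of this pipeline can be sketched with the classic dynamic-programming formulation of DTW. This is a minimal illustration for two 1-D series, not the actual implementation in my program (which handles multiple metrics per sequence):

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D numeric series."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]
```

The O(n·m) table fill is what makes this step expensive: computing it for every pair of sequences, times five metrics per dataset, is where the processing time goes.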
This week I worked on the kNN regression model I discussed
last week. The appeal of this method is that instead of attempting to
create a model to classify a single instance based only on historical data, I
could potentially form a model based on all instances across all patients,
ranked by their similarity to the target instance. I hoped that this would yield
more accurate results, as there would be more data to draw on. Once I had the
framework to
conduct the experiment, there were several sets of variables I needed to test.
The first of these was k, the number of neighbours used. I did not
find a principled way to optimize this value other than simply testing
different values and comparing my results. This became quite time-consuming
around k = 30, where the program took multiple hours to run. I eventually
settled on k = 15 by trial-and-error testing.
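The trial-and-error sweep over k can be sketched as follows; the tiny hand-rolled kNN regressor and the mean-absolute-error scoring here are illustrative stand-ins, not my actual program:

```python
def knn_predict(train, query, k):
    """Predict a target for `query` as the mean of its k nearest neighbours.

    train: list of (feature_vector, target) pairs.
    """
    dist = lambda u, v: sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5
    nearest = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    return sum(target for _, target in nearest) / k

def sweep_k(train, held_out, candidate_ks):
    """Score each candidate k by mean absolute error on a held-out set."""
    errors = {}
    for k in candidate_ks:
        errs = [abs(knn_predict(train, x, k) - y) for x, y in held_out]
        errors[k] = sum(errs) / len(errs)
    # return the k with the lowest error, plus the full error table
    return min(errors, key=errors.get), errors
```

In practice the cost of this loop grows with both k and the number of held-out instances, which is why the runs at larger k took hours.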
The second variable was the set of attributes used to find
the “closest” instances. Again, there did not appear to be a formulaic way to
choose these, so I ran tests on all possible combinations of my selected
features to determine which was best. An additional opportunity presented
itself here: I was able to compare the results from attribute sets that were
identical except that one contained night sleep minutes and the other night
inactive minutes.
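The exhaustive search over attribute combinations can be sketched with `itertools.combinations`; the feature names and scoring function below are placeholders, not the actual attributes or evaluation from my experiment:

```python
from itertools import combinations

def best_feature_subset(features, score_fn):
    """Evaluate every non-empty subset of `features` and keep the best.

    score_fn: maps a tuple of feature names to a score (higher is better).
    """
    best, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            s = score_fn(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```

With n features this tries 2^n - 1 subsets, so it is only feasible for a small, hand-selected feature list like the one described above.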