
Summer - Week 10

This week, I am working on getting the dynamic time warping (DTW) functionality into my program. The process involves re-processing the features to include the time series, putting each series back together when we construct sequences, and then performing DTW to generate a distance that will be used to compute the kNN of each sequence, which can then be used for predictions with the models. The processing time of these steps has gone up significantly, since we are using five different metrics with each of the F phase datasets. I am returning to school next week. Once I've completed the DTW processing, which is all that remains before we put together our second paper (the deadline for the reach journal we would like to submit it to is October 1), I am hoping I will have time to look again into Agglomerative Hierarchical Clustering, which I did not successfully complete when we explored it earlier in the summer before changing focus to the paper. We heard back
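The DTW step described above can be sketched with the classic dynamic-programming recurrence. This is a minimal illustration only; the function name and the absolute-difference local cost are my assumptions, not the project's actual implementation, which runs over the real feature series.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses absolute difference as the local cost; the result can serve
    as the distance when finding each sequence's nearest neighbours.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]
```

Because DTW allows one point to align with several, two series that differ only by a repeated value still get distance zero, which is what makes it suitable for comparing sequences of different lengths. The quadratic cost per pair is also consistent with the long processing times mentioned above.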

Week 35


This week I worked on the kNN regression model I discussed last week. The appeal of this method is that instead of attempting to build a model that classifies a single instance using only that patient's historical data, I could potentially form a model from all instances across all patients, based on their similarity to the target instance. I hoped this would yield more accurate results, since there would be more data. Once I had the framework to conduct the experiment, there were several sets of variables I needed to test.
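The pooled-neighbours idea can be sketched in a few lines: gather every instance from every patient into one training set, then predict a target value by averaging the targets of the k most similar instances. The Euclidean distance and the toy data here are illustrative stand-ins, not the project's actual features.

```python
import math

def knn_predict(train_X, train_y, query, k):
    """Predict by averaging the targets of the k training instances
    nearest to `query` (Euclidean distance on the feature vectors)."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k

# Hypothetical pooled instances from multiple patients
X = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
y = [1.0, 2.0, 30.0]
```

With `k = 2`, a query near the first two instances is predicted from their average and the distant outlier is ignored, which is the behaviour that makes pooling across patients attractive.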

The first of these was k, the number of neighbours used. I did not find a principled way to optimize this value other than simply testing different values and comparing the results. This became quite time-consuming around k = 30, where a single run took multiple hours. I eventually settled on k = 15 through trial-and-error testing.
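The trial-and-error selection of k amounts to a small grid search: score each candidate on held-out data and keep the lowest-error one. This sketch uses hypothetical 1-D data and a simple averaging predictor; the real experiment scored the full model on the actual datasets.

```python
def predict(train, query, k):
    """Average the targets of the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    return sum(y for _, y in nearest) / k

def best_k(train, val, candidates):
    """Return the candidate k with the lowest mean absolute error
    on the held-out (x, y) validation pairs."""
    def mae(k):
        return sum(abs(predict(train, x, k) - y) for x, y in val) / len(val)
    return min(candidates, key=mae)
```

Since every candidate k requires a full pass over the validation set, the cost grows with both the number of candidates and the dataset size, which matches the multi-hour runtimes noted above.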

The second variable was the set of attributes used to find the "closest" instances. Again, there did not appear to be a formulaic way to do this, so I ran tests including all possible combinations of my selected features to determine which performed best. An additional opportunity presented itself here: I was able to compare the results from attribute sets that were identical except for containing either night sleep minutes or night inactive minutes.
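Enumerating every attribute combination is straightforward with `itertools.combinations`. The feature names below are illustrative placeholders (only the night-sleep attribute is mentioned in the text); a set of n features yields 2**n - 1 non-empty subsets to test, which is why this search grows quickly.

```python
from itertools import combinations

# Hypothetical candidate features; only night_sleep_minutes is from the text
features = ["resting_heart_rate", "step_count", "night_sleep_minutes"]

# Every non-empty subset of the candidate features, smallest first
subsets = [
    list(combo)
    for r in range(1, len(features) + 1)
    for combo in combinations(features, r)
]
```

Pairs of subsets that differ in exactly one attribute, such as swapping night sleep minutes for night inactive minutes, can then be compared head-to-head as described above.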
