
Summer - Week 10

This week, I am working to get the dynamic time warping (DTW) functionality into my program. The process includes re-processing the features to include the time series, putting each series back together when we construct sequences, and then performing DTW to generate a distance that is used to compute the k-nearest neighbors of each sequence, which can then be used for predictions with the models. The processing time of these activities has gone up significantly, since we are using five different metrics with each of the F phase datasets. I am returning to school next week. Once I've completed the DTW processing, and before we put together our second paper (the submission date for the journal we would like to target is October 1), I am hoping I will have time to look again into the Agglomerative Hierarchical Clustering concept, which I did not successfully complete when we explored it earlier in the summer before changing focus to the paper. We heard back
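At its core, this step boils down to a dynamic-programming DTW distance plus a brute-force nearest-neighbor search. The sketch below shows the idea in Python; it assumes each sequence is a 1-D array of a single metric, and the helper names (dtw_distance, knn_indices) are mine for illustration rather than our actual code:

import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D series."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # step within a
                                 cost[i, j - 1],      # step within b
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]

def knn_indices(query, sequences, k=5):
    """Indices of the k sequences closest to `query` under DTW."""
    dists = np.array([dtw_distance(query, s) for s in sequences])
    return np.argsort(dists)[:k]

# Toy usage on random variable-length sequences:
rng = np.random.default_rng(0)
series = [rng.standard_normal(rng.integers(20, 40)) for _ in range(50)]
print(knn_indices(series[0], series[1:], k=5))

Since DTW is quadratic in the sequence lengths per pair and the kNN search compares every pair, it is easy to see why repeating this for five metrics per dataset blew up the processing time.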

Summer - Week 4

Based on my results from the experiments above, the best model thus far has been decision trees. A difficult element with these is their random nature: using the same data, I may get ten different results if I run the program ten times. A way to cope with this is to perform each experiment multiple times and average the resulting errors, but the cost is time. Even running each experiment only ten times per sequence, my code takes all night to run. Random Forests, while they do not currently achieve errors as low as decision trees can, may be the answer, since they already generate multiple randomly seeded decision trees and combine them into one result. Because of that optimized design, building a random forest with 100 trees is much faster than training ten separate Decision Tree Regressors. This is likely the method we will want to focus on going forward.
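As a rough sketch of that comparison (on synthetic data, not our actual pipeline or features), here is how averaging ten independently seeded scikit-learn trees stacks up against a single 100-tree forest:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in data; our real features and targets would go here.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Approach 1: average the error over ten independently seeded trees.
tree_errors = [
    mean_absolute_error(
        y_test,
        DecisionTreeRegressor(random_state=seed)
        .fit(X_train, y_train)
        .predict(X_test),
    )
    for seed in range(10)
]
print("mean single-tree MAE:", np.mean(tree_errors))

# Approach 2: one forest builds and averages 100 randomized trees
# internally, and can train them in parallel.
forest = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
forest.fit(X_train, y_train)
print("forest MAE:", mean_absolute_error(y_test, forest.predict(X_test)))

Setting n_jobs=-1 lets the forest train its trees across all available cores, which is a large part of why one 100-tree forest finishes faster than a loop over separate single-tree regressors.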
