
Summer - Week 10

This week, I am working to get the dynamic time warping (DTW) functionality into my program. Doing so means re-processing the features to include the raw time series, reassembling each series when we construct sequences, and then running DTW to produce a distance that is used to compute the k nearest neighbors of each sequence, which can then be used for predictions with the models. Processing time has gone up significantly now that we are using five different metrics for each of the phase datasets. I am returning to school next week. Once the DTW processing is complete, the main remaining task is putting together our second paper (the deadline for the reach journal we would like to submit to is October 1). I am also hoping to find time to revisit the Agglomerative Hierarchical Clustering idea, which I did not get working when we explored it earlier in the summer before shifting focus to the paper. We heard back
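To make that step concrete, here is a minimal sketch of the DTW-plus-kNN idea, assuming each sequence is a 1-D numpy array of per-minute values; dtw_distance and k_nearest_sequences are illustrative names, not our pipeline's.

import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance
    between two 1-D series, with absolute difference as local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

def k_nearest_sequences(query, candidates, k):
    """Indices of the k candidate series closest to query under DTW."""
    dists = [dtw_distance(query, c) for c in candidates]
    return np.argsort(dists)[:k]

The quadratic inner loop is a large part of why processing time grows so quickly once every sequence has to be compared pairwise for five metrics.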

Summer - Week 7

Continuing work on the paper, we decided to present our results with a heat-map showing how error changed as we varied both the sequence length and the value of k, using values from 5 to N in steps of 5. The full grid was not very telling, but a trend became apparent once we excluded the three smallest k values and the three shortest sequence lengths: the best results occurred when sequence length and k were both low or both high. This is a result we want to investigate further in the future. This week will be spent writing and editing the paper; once we've submitted it, we will regroup and establish our goals for the final three weeks of the summer.
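For illustration, here is a sketch of how such a heat-map can be built with pandas and matplotlib; the column names and the random placeholder errors are assumptions, not our results.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# One row per (sequence length, k) combination, with its error.
results = pd.DataFrame({
    "seq_len": np.repeat(np.arange(5, 55, 5), 10),
    "k":       np.tile(np.arange(5, 55, 5), 10),
    "error":   np.random.rand(100),   # placeholder values
})

# Pivot into a grid: rows = sequence length, columns = k.
grid = results.pivot(index="seq_len", columns="k", values="error")

# Dropping the three smallest k values and the three shortest
# sequence lengths is what made the trend visible for us.
trimmed = grid.iloc[3:, 3:]

plt.imshow(trimmed.values, origin="lower", aspect="auto")
plt.xticks(range(len(trimmed.columns)), trimmed.columns)
plt.yticks(range(len(trimmed.index)), trimmed.index)
plt.xlabel("k")
plt.ylabel("sequence length")
plt.colorbar(label="error")
plt.show()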

Summer - Week 6

Further investigation of the first phase shows that there are significantly fewer males than females and only one TBI subject, so we do not intend to do any subgroup analyses in the first paper that we are working on. When we reintroduce the second phase of data, we will revisit this idea. Additionally, I tested several different age groupings to see whether a midpoint separating "Older" and "Younger" patients, with k-Nearest-Sequence predictions performed within each group, would improve the results. It did not seem to. This may be another factor to address when we return to the full two-phase dataset, but for now it will be left out of our paper. This week will be spent organizing the necessary data for the different sections of the paper so that we can begin writing (finding relevant references for the related-work section, thinking about which aspects of the project tell a cohesive and meaningful story, etc.).
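As a rough sketch of the midpoint test (the ages are made up, and evaluate() stands in for our k-Nearest-Sequence pipeline):

import numpy as np

# Hypothetical subject ages; the real values come from the study data.
ages = np.array([23, 31, 45, 52, 60, 67, 74, 79])

def split_by_age(ages, midpoint):
    """Boolean masks for 'Younger' (< midpoint) and 'Older' (>= midpoint)."""
    younger = ages < midpoint
    return younger, ~younger

# Sweep candidate midpoints; within each subgroup we would run the
# predictions and compare error against the ungrouped baseline.
for midpoint in range(30, 71, 10):
    younger, older = split_by_age(ages, midpoint)
    print(midpoint, younger.sum(), older.sum())
    # err_young = evaluate(younger); err_old = evaluate(older)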

Summer - Week 5

For the first paper, which we hope to submit to the IEEE Healthcare Innovations and Point of Care Technologies conference in Bethesda, Maryland, we are going to focus solely on the first "Phase" of data. We have also decided to work with the manufacturer's sleep/wake classifications instead of the inactive-minutes approach, both because we want more time to refine the inactive-minutes method and because we found the manufacturer's classification to be reliable. Working with a single phase removes one grouping factor from the experimentation we hope to do. Although I previously ran experiments on Phase 1 and Phase 2 together, I now need to experiment solely with Phase 1. We will vary the prediction model (from sklearn: decision trees, random forests, SVM), k, the length of the sequence, and the period for which minutes will be predicted (Daytime or Nighttime). We also want to include more subject features in each sequence's attributes.
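A sketch of what that experiment grid looks like with the sklearn estimators named above; run_experiment and the specific parameter values are placeholders, not our actual configuration.

from itertools import product

from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# The three model families, with default settings for now.
models = {
    "decision_tree": DecisionTreeRegressor,
    "random_forest": RandomForestRegressor,
    "svm": SVR,
}

ks = [1, 3, 5, 10]        # neighbors used to build each training set
seq_lens = [7, 14, 28]    # sequence lengths (placeholder values)
periods = ["Daytime", "Nighttime"]

for (name, Model), k, seq_len, period in product(
        models.items(), ks, seq_lens, periods):
    model = Model()
    # error = run_experiment(model, k, seq_len, period)  # project pipeline
    print(name, k, seq_len, period)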

Summer - Week 4

Based on my results from the experiments above, the best model thus far has been decision trees. One difficulty with them is their random nature: on the same data, I may get ten different results if I run the program ten times. A way to cope with this is to perform each experiment multiple times and average the resulting errors, but the cost is time; even running each experiment only ten times per sequence, my code takes all night to finish. Random forests, while they do not currently reach errors as low as decision trees can, may be the answer, as they already build many randomly seeded decision trees and combine them into a single result. Because that ensemble is built in one optimized pass, fitting a random forest with 100 trees is much faster than fitting ten separate Decision Tree Regressors. This is likely the method that we will want to focus on going forward.
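A small sketch of the trade-off on synthetic data: ten separately seeded Decision Tree Regressors with errors averaged by hand, versus one 100-tree Random Forest that averages internally. make_regression stands in for our real sequences.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0,
                       random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Ten independently seeded trees, errors averaged by hand.
tree_errors = []
for seed in range(10):
    tree = DecisionTreeRegressor(random_state=seed).fit(X_tr, y_tr)
    tree_errors.append(mean_absolute_error(y_te, tree.predict(X_te)))
print("mean single-tree MAE:", np.mean(tree_errors))

# One forest of 100 randomly seeded trees, averaged internally.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("forest MAE:", mean_absolute_error(y_te, forest.predict(X_te)))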