

Summer - Week 10

This week, I am working to get the dynamic time warping (DTW) functionality into my program. The process includes re-processing the features to include the raw time series, putting each series back together when we construct sequences, and then performing DTW to generate a distance that will be used to compute the kNN of each sequence, which can then be used for predictions with the models. The processing time for these steps has gone up significantly since we are now using five different metrics with each of the F phase datasets. I am returning to school next week. Once I've completed the DTW processing, which is all that remains before we put together our second paper (the deadline for the journal we would like to submit it to is October 1), I am hoping I will have time to look again into the Agglomerative Hierarchical Clustering concept, which I did not successfully complete when we explored it earlier in the summer before shifting focus to the paper. We heard back
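For concreteness, here is a minimal sketch of how a DTW distance between two minute-by-minute series might be computed and then fed into a precomputed-distance kNN. The function and variable names are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def dtw_distance(series_a, series_b):
    """Classic dynamic-programming DTW distance between two 1-D series."""
    n, m = len(series_a), len(series_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(series_a[i - 1] - series_b[j - 1])
            # Each cell extends the cheapest of the three possible warping paths.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# The pairwise DTW distances could then drive a kNN regressor, e.g. with sklearn:
# from sklearn.neighbors import KNeighborsRegressor
# knn = KNeighborsRegressor(n_neighbors=5, metric="precomputed")
# knn.fit(train_vs_train_dtw, y_train)    # square train-vs-train distance matrix
# preds = knn.predict(test_vs_train_dtw)  # rows: test sequences, cols: train sequences
```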

Summer - Week 9

In addition to the Fitbit data, we've received 12 new subjects' data from the P Phase. I am going to spend this week cleaning the data and running it through our processing pipeline to get it up to speed. A challenge with introducing Fitbit data is going to be finding an appropriate measurement of activity, since its output does not provide a single summarizing "Activity" metric. Instead, it reports minute-by-minute steps, heart rate, calories burned, distance moved (vertically) and caloric METs. We want to use the Fitbit data alongside our other data, meaning we need to find out which of the measurements can be used for the "Active" and "Inactive" minute calculations: finding the median and classifying all of the minutes above/below it as Active or Inactive for Day and Night, respectively. My plan is to go forward with all of the measurements and see which performs best. We hope that we will be able to successfully integrate the Fitbit data.
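As a rough sketch of that median-based Active/Inactive labeling, assuming a pandas frame with one row per minute and illustrative column names ('steps', 'heart_rate', etc., plus a 'period' column marking Day or Night), the idea might look like this:

```python
import pandas as pd

def label_active_minutes(minutes: pd.DataFrame, metric: str) -> pd.DataFrame:
    """Flag each minute as Active (1) or Inactive (0) by comparing one Fitbit
    metric to the median of that metric, computed separately for Day and Night."""
    out = minutes.copy()
    medians = out.groupby("period")[metric].transform("median")
    out["active"] = (out[metric] > medians).astype(int)
    return out

# Run the same labeling for each candidate metric and compare downstream error:
# for metric in ["steps", "heart_rate", "calories", "distance", "mets"]:
#     labeled = label_active_minutes(minute_df, metric)
```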

Summer - Week 8

We've submitted the paper, and now we are looking to move forward to the second paper. There are a few experiments we'd like to run:
1. Use Dynamic Time Warping to find the nearest 0/1 sleep sequences and then use features from similar series in predictions.
2. Use Agglomerative Hierarchical Clustering together with #1.
3. Bring in a third "Phase" of Fitbit activity data to see if we can apply our inactive-minutes concept to it.
4. Experiment to see if we can find a single model that works well across all three phases.
5. As a stretch goal: introduce an ANN model and see if it can outperform our previous best models, which were random forests.
Going forward, we are going to refer to the first "Phase" as "A", after the Actilogger devices used to collect it, "P" to refer to the second series, collected with Philips devices, and "F" to refer to the new Fitbit data being incorporated.

Summer - Week 7

Continuing work on the paper, we decided to demonstrate our results by including a heat map that displays how error changed as we varied both the length of the series and the value of k, using values from 5 to N in increments of 5. This was not very telling when we viewed every value, but a trend became apparent when we excluded the bottom three k values and the bottom three sequence lengths: the best results were obtained when both sequence length and k were low and when both were high. This is a result that we want to look into more in the future. This week will be spent writing and editing the paper. Once we've submitted it, we will regroup and establish our goals for the final three weeks of the summer.
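A small sketch of the kind of heat map described above, with placeholder error values standing in for the real results and the axis ranges assumed rather than taken from the paper:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
k_values = np.arange(5, 55, 5)        # assumed sweep: 5 to N in steps of 5
seq_lengths = np.arange(5, 55, 5)
errors = rng.random((len(k_values), len(seq_lengths)))  # placeholder, not real results

# Exclude the bottom three k values and sequence lengths before plotting.
trimmed = errors[3:, 3:]
plt.imshow(trimmed, origin="lower", aspect="auto")
plt.xticks(range(len(seq_lengths) - 3), seq_lengths[3:])
plt.yticks(range(len(k_values) - 3), k_values[3:])
plt.xlabel("Sequence length")
plt.ylabel("k")
plt.colorbar(label="Prediction error")
plt.show()
```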

Summer - Week 6

Further investigation of the first phase shows that there are significantly fewer males than females and that there is only one TBI subject. Therefore, we do not intend to do any subgroup analyses in the first paper we are working on. When we reintroduce the second group of data, we will revisit this idea. Additionally, I tested out several different age groupings to see if we could choose a midpoint that classifies "Older" and "Younger" patients, within which to then perform k-Nearest-Sequence predictions. This did not seem to improve the results. This may be another factor that we want to address when we return to the full two-phase dataset, but for now, it will be left out of our paper. This week will be spent organizing the necessary data for the different sections of the paper so that we can begin writing (finding relevant references for the related work section, thinking about what aspects of the project we can use to tell a cohesive and meaningful story, etc.).

Summer - Week 5

For the first paper, which we hope to submit to the IEEE Healthcare Innovations and Point of Care Technologies conference in Bethesda, Maryland, we are going to focus solely on the first "Phase" of data. We have also decided that we want to work with the manufacturer's classifications of sleep/wake instead of the inactive-minutes approach, both because we want more time to refine the inactive-minutes approach and because we found the source of the data to be a reliable sleep/wake classification method. Working with only a single phase removes one grouping factor from the experimentation we hope to do. Although I previously performed experiments on Phase 1 and Phase 2 data together, I now need to experiment solely with Phase 1. We will vary the prediction model (from sklearn: decision trees, random forests, SVM), k, the length of the sequence, and the period for which minutes will be predicted (Daytime or Nighttime). We also want to include more subject features in each sequence's attributes.
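A hedged sketch of how that model comparison might be wired up with sklearn, using synthetic placeholder features in place of the real Phase 1 sequences; the scoring setup is an assumption, and k, sequence length, and Daytime/Nighttime would be varied when the sequences themselves are constructed.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((200, 12))   # placeholder sequence features
y = rng.random(200)         # placeholder minute counts to predict

models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "svm": SVR(kernel="linear"),
}

for name, model in models.items():
    # sklearn reports negated MSE; flip the sign for readability.
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: mean MSE = {mse.mean():.4f}")
```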

Summer - Week 4

Based on my results from the experiments above, the best model thus far has been decision trees. A difficult element with these is their random nature: using the same data, I may get ten different results if I run the program ten times. A way to cope with this is to perform each experiment multiple times and average the resulting errors, but the cost is time. Even if I run each experiment for each sequence only ten times, my code takes all night to run. Random forests, while they do not currently achieve error as low as decision trees can, may be the answer to this, as they already generate multiple decision trees with random seeding and combine them into one result. Because of their optimized implementation, fitting a random forest with 100 trees is much faster than fitting ten separate decision tree regressors. This is likely the method we will want to focus on going forward.
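To illustrate that trade-off, the following sketch compares averaging several independently seeded decision trees against a single random forest on synthetic placeholder data (make_regression stands in for the real sequence features):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

# Placeholder data standing in for sequence features and active-minute targets.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Option 1: average the predictions of several independently seeded trees.
tree_preds = []
for seed in range(10):
    tree = DecisionTreeRegressor(random_state=seed)
    tree.fit(X_train, y_train)
    tree_preds.append(tree.predict(X_test))
avg_tree_error = mean_absolute_error(y_test, np.mean(tree_preds, axis=0))

# Option 2: let a random forest do the averaging internally.
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)
forest_error = mean_absolute_error(y_test, forest.predict(X_test))

print(f"averaged trees MAE: {avg_tree_error:.2f}")
print(f"random forest MAE:  {forest_error:.2f}")
```

Fixing random_state in either option also removes the run-to-run variation described above, at the cost of only sampling one seeding.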

Summer - Week 3

This week, I performed experiments with the first- and second-phase data together and varied several factors: the model type from sklearn, the length of the sequence (how many periods precede the one we are attempting to predict), the number of nearby sequences to use, and the subgroupings (gender, phase, injury type, experiment group). The primary aspect we didn't get to experiment with while finishing our final report was how different models perform. My model for the report was a from-scratch k-Nearest-Neighbor regressor, and I took an equal-weighted approach to calculating the final predictions, using all of the similar active/inactive minute values and taking their average. sklearn's kNN model can instead take into account the distance of each series from the series we are attempting to predict and weight closer sequences higher than farther ones. Other models that we intend to test out include a linear SVM and decision trees (Alexa
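The uniform-versus-distance weighting difference mentioned here maps directly onto sklearn's `weights` parameter; a minimal sketch, with the training data omitted:

```python
from sklearn.neighbors import KNeighborsRegressor

# Equal-weighted kNN: every neighbor contributes the same, like the from-scratch average.
knn_uniform = KNeighborsRegressor(n_neighbors=5, weights="uniform")

# Distance-weighted kNN: closer sequences count for more in the prediction.
knn_distance = KNeighborsRegressor(n_neighbors=5, weights="distance")

# Both are used the same way once sequence features X and targets y exist:
# knn_distance.fit(X_train, y_train)
# predictions = knn_distance.predict(X_test)
```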

Summer - Week 2

I spent most of this week cleaning up my code, organizing our files, and reviewing Alexa's code. I am mostly done with this and am looking forward to working on our first objective: wrapping up the topics we were working on at the end of the year.

Summer - Week 1

Alexa is not going to continue working on the project through the summer, but Gina and I intend to branch out based on what we've done so far. To do so, we have set a few objectives for ourselves:
1. Tie up some loose ends from Alexa's and my initial project.
2. Complete 1-2 papers, since we were not able to get our research during the year to that point.
3. Push the "Nearest Sequences" concept farther by associating data from farther back with a given night or day period we are predicting - as far back as we can go without having to use baseline days in the experiment and without losing any data.
4. Explore the built-in sleep detection methods that are automatically classified by the Actigraph's software, and how they compare to Dr. Skornyakov's sleep/wake algorithm and the inactive-minutes method we've used previously.
One of the variables we addressed when deciding to proceed with the "k-nearest sequences" approach was the

Week 40

This week, I want to spend some time looking back at what we’ve done thus far and regrouping as I prepare for the summer. I want to clean up my code, tie up loose ends and see if there are any unanswered questions that we’ve passed over but that I may have time to review over the summer. This semester, we’ve covered a lot of ground by way of data exploration but not as much in terms of actual predictions. I want my work going forward to be primarily based on what we’ve already done and focused on obtaining actual results.

Week 39

This week, we are going to merge the “P” approach discussed last week with kNN grouping to generate sets of similar instances that can be used to build a custom regression model for an individual instance. This introduces several new variables in addition to k and the set of attributes used as nearness indicators, which I discussed in Week 35, as well as the P value. Since we are now using kNN as a means to make groups, we are able to try out different regressors to generate models from these groups. The models we will use in our experiments will be decision trees, support vector machines, random forests, Bayes models and linear regression. We will use the packages in scikit-learn to easily switch between regressors. We will also vary the subgroupings used as the pool from which the k nearest neighbors are selected, as we did with the original kNN model. These results will be compared to those obtained by generating a model with the same regressor, subgrouping and P value.
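A minimal sketch of the kNN-grouping-plus-local-regressor idea, assuming sequences are already encoded as fixed-length feature vectors; the helper name, its defaults, and the choice of DecisionTreeRegressor are illustrative and easy to swap for the other scikit-learn regressors listed above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeRegressor

def predict_with_local_model(X_train, y_train, x_query, k=10,
                             regressor_cls=DecisionTreeRegressor):
    """Find the k nearest training sequences to one instance, fit a small
    regressor on just that neighborhood, and predict for the instance."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(x_query.reshape(1, -1))
    neighbors = idx[0]
    local_model = regressor_cls()
    local_model.fit(X_train[neighbors], y_train[neighbors])
    return local_model.predict(x_query.reshape(1, -1))[0]

# Example with placeholder data:
# rng = np.random.default_rng(0)
# X, y = rng.random((100, 8)), rng.random(100)
# print(predict_with_local_model(X, y, X[0], k=10))
```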

Week 38

With our paper out of the way, I began looking at making predictions based on data from multiple days preceding an instance. This would be helpful because it would give us more features and take into account more than just the two periods preceding the period to predict. We can then compare the effectiveness of this approach to the initial one, and if we find that epochs with more historical data generate more accurate results, we can conclude that we should look further into the patterns leading up to the period we want to predict. Additionally, I want to add a new feature to each epoch called “DAYS_SINCE_EVENT”, which contains the number of days that have passed since the patient had either a TBI or stroke. We will refer to the number of periods before an instance as P. So P = 1 will be a set of epochs containing only features from a nighttime period being used to predict the following daytime activity, or features from a daytime period being used to predict the following nighttime activity.
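A sketch of how the P preceding periods and the DAYS_SINCE_EVENT feature might be assembled with pandas; the column names ('subject', 'date', 'event_date', 'active_minutes') are assumptions about the schema, not the project's actual fields.

```python
import pandas as pd

def add_history_features(epochs: pd.DataFrame, p: int) -> pd.DataFrame:
    """Attach features from the P periods preceding each epoch, plus the
    number of days since the subject's TBI or stroke."""
    out = epochs.copy()
    # Shift within each subject so a row sees only its own history.
    for lag in range(1, p + 1):
        out[f"active_minutes_lag{lag}"] = (
            out.groupby("subject")["active_minutes"].shift(lag)
        )
    # Days elapsed since the subject's injury event.
    out["DAYS_SINCE_EVENT"] = (out["date"] - out["event_date"]).dt.days
    # Drop rows whose history would reach before the start of the record.
    lag_cols = [f"active_minutes_lag{lag}" for lag in range(1, p + 1)]
    return out.dropna(subset=lag_cols)
```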