Summer - Week 10

This week, I am working to get the dynamic time warping (DTW) functionality into my program. Doing so involves re-processing the features to include the time series, putting each series back together when we construct sequences, and then performing DTW to generate a distance that will be used to compute the k-nearest neighbors (kNN) of each sequence, which can in turn be used for predictions with the models. The processing time of these steps has gone up significantly, since we are now using five different metrics with each of the F phase datasets. I am returning to school next week. Once I've completed the DTW processing, all that will remain is putting together our second paper (the date for the reach journal we would like to submit it to is October 1). I am hoping I will also have time to look again into the Agglomerative Hierarchical Clustering concept, which I did not successfully complete when we explored it earlier in the summer before changing focus to the paper. We heard back
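The DTW-then-kNN step described above can be sketched in plain Python. Everything here is illustrative, not the project's actual code: the function names, the absolute-difference local cost, and the majority-vote kNN are all assumptions for the demo.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = DTW distance between prefixes a[:i] and b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # local cost of pairing a[i-1], b[j-1]
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]

def knn_predict(query, labeled_seqs, k=3):
    """Majority vote over the k sequences nearest to `query` under DTW."""
    dists = sorted((dtw_distance(query, seq), label) for seq, label in labeled_seqs)
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

# Toy usage: three labeled sequences, one query
train = [([1, 2, 3, 4], "up"), ([4, 3, 2, 1], "down"), ([1, 3, 2, 4], "up")]
prediction = knn_predict([2, 3, 4, 5], train, k=1)  # nearest neighbor is [1, 2, 3, 4]
```

The quadratic DP table is what makes this step slow at scale, which matches the jump in processing time once every sequence pair must be compared under several metrics.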

Week 7

There are no patients currently enrolled in the study at St. Luke's, so this week's work was entirely focused on coding and reading about slicing. This was my second week of work dealing with the summary statistics of the practice data for subjects K002 and K027. My original code was functional and produced a result only slightly different from my mentor's. However, I'd chosen quite a roundabout way of doing this. While I did manage to create a date-time index for my DataFrame, I failed to take full advantage of this set-up. Instead of passing a slice of the DataFrame through a function for each period, I was moving through the data line by line to determine the number of transitions, minutes of sleep, and minutes of activity for the period. Clearly, the former is both more efficient and more modular than the latter.
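The efficiency gap comes from the fact that a DatetimeIndex supports partial string indexing, so one `.loc` call can hand back a whole day's slice at once. A minimal sketch, with made-up dates and a made-up column name:

```python
import pandas as pd

# Two days of minute-level data; the 'activity' column is invented for the demo.
idx = pd.date_range("2019-07-08", periods=2880, freq="min")
df = pd.DataFrame({"activity": range(2880)}, index=idx)

# With a DatetimeIndex, partial string indexing selects an entire day in one step:
day_one = df.loc["2019-07-08"]
assert len(day_one) == 1440  # all 1440 minutes of that calendar day

# The roundabout alternative walks the frame row by row, e.g. with df.iterrows(),
# testing each timestamp's date -- far slower and harder to reuse.
```

Each daily slice can then be passed to a single stats function, which is exactly the modular design the paragraph above argues for.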

I planned to create a function slice_stats that would accept a slice of a DataFrame up to twenty-four hours in size, but shorter on the first and last day of wear. It would then return a row populated with the aforementioned study statistics for a single day, which my get_stats function would add to the stats DataFrame. Similar to my original design, this function would return the frame of stats, which could then be written to an Excel file.
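The slice_stats / get_stats design might look something like the sketch below. The column name 'state' and its 'sleep' / 'active' codes are assumptions for the demo; the original data almost certainly uses different names.

```python
import pandas as pd

def slice_stats(day: pd.DataFrame) -> pd.Series:
    """Summary stats for one day's slice (up to 24 h; shorter on first/last day)."""
    changed = day["state"] != day["state"].shift()
    return pd.Series({
        # the first row always differs from the NaN shift, so subtract 1
        "transitions": int(changed.sum()) - 1,
        "sleep_min": int((day["state"] == "sleep").sum()),
        "active_min": int((day["state"] == "active").sum()),
    })

def get_stats(df: pd.DataFrame) -> pd.DataFrame:
    """One row of stats per calendar day, built by applying slice_stats to each slice."""
    return df.groupby(df.index.date).apply(slice_stats)

# The resulting frame can then be written out, e.g. stats.to_excel("stats.xlsx").
```

Because slicing produces natural day-sized groups, `groupby(df.index.date)` feeds each slice through the one function, rather than threading per-row state through a loop.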

Unfortunately, I made a terrible mistake when I went back to modify my code. Instead of saving a copy of my original (working) code, I modified it directly. I cannot be sure exactly what I did, but in my efforts to fix my clean_data function so that it excluded activity data recorded after more than 7 hours of continuous NaNs, I broke the functions handling the general cleanup of the frame. Without fully realizing this, I spent a large amount of time working on my new stats function, only to find that it did not produce the desired result. Now, I'll never know whether it would have worked on an appropriately cleaned DataFrame, because I deleted the function after reaching the erroneous conclusion that it was at fault for the improper results.
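The rule clean_data was meant to enforce (drop everything after a gap of more than 7 hours of continuous NaNs) can be expressed compactly with a run-length trick. This is a sketch of that rule under my own assumptions about the data, not a reconstruction of the original function:

```python
import pandas as pd

def clean_data(series: pd.Series, max_gap_hours: int = 7) -> pd.Series:
    """Truncate a minute-level series at the first run of NaNs longer than
    max_gap_hours, discarding the gap and everything after it."""
    isna = series.isna()
    # label consecutive runs: the id increments whenever the NaN/non-NaN state flips
    run_id = (isna != isna.shift()).cumsum()
    # length of the run each element belongs to
    run_len = isna.groupby(run_id).transform("size")
    too_long = isna & (run_len > max_gap_hours * 60)
    if too_long.any():
        first_bad = too_long.idxmax()            # timestamp where the long gap starts
        return series.loc[:first_bad].iloc[:-1]  # keep only data before the gap
    return series
```

A seven-hour threshold at one sample per minute is 420 rows, so any NaN run of 421 or more minutes triggers the truncation while shorter gaps pass through untouched.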


Now that I understand my first mistake, I am going forward by attempting to re-create my slice_stats function to see whether I was on the right track. I have also learned that when adding new elements to a program or making major changes, it would be advantageous to save multiple versions of the code to prevent a similar situation from arising in the future.
