Difference between revisions of "Main Page/Research/MSB/Scripts"

From phurvitz

Revision as of 17:52, 21 January 2009

Scripts to push and pull MSB data

  • Raw MSB data are downloaded from the device onto a local PC. The script msb_push_data.pl pushes the data to my storage server.
  • Data are pulled from my server to Jonathan's server, where the parse and classify routines are run. Data are then retrieved from Jonathan's server using msb.get.data.R.

Scripts to parse the MSB data on the receiving Linux server

  • Parsing SMS files: preprocess_sms.R standardizes all SMS into a single format, in a single file.
    This creates a single file from which individual subjects' SMS records can easily be extracted.
    1. Some SMS have GPS week and TOW
    2. For those that have GPS week and TOW:
      1. If GPS week != 1340 then this is a good timestamp
      2. If GPS week = 1340, then get the mean difference between this timestamp and the phone timestamp
        1. Apply this difference to the "bad" GPS timestamps as an estimate of GPS time
  • Joining MSB and GPS data: join_data.R calculates the distance to the previous and next points and makes WASPN83 coordinates.
  • To parse the binary MyExperience.sdf file into a CSV ASCII file: process_sdf.R
    This requires pre-processing to CSV via MyExperience_Analyzer.exe or MyExperience_Analyzer_new.exe.
    Because the sdf files may be in different formats, use trial-and-error to see which exe file will open the sdf. If one exe fails, use the other.
    When the MyExperience Analyzer opens, click Queries > Get All Responses, then File > Save As and save the file as Get All Responses.csv.
  • To conflate SMS and MSB records
    Where both have GPS timestamps (Subject ID >= 11): conflate_msb_sms_gps.R
    Where SMS does not have GPS timestamps (Subject ID < 11): conflate_msb_sms_nogps.R
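
The GPS week 1340 rule listed under preprocess_sms.R can be sketched in shell as follows. This is only one reading of those steps, not the code in preprocess_sms.R: it assumes the mean GPS-minus-phone offset is computed from the SMS with a trusted week and then added to the phone time of the week-1340 SMS. The function name estimate_gps_time and the three-column input layout (phone seconds, GPS week, GPS time of week) are hypothetical.

```shell
#!/bin/sh
# estimate_gps_time FILE   (hypothetical helper, not part of the MSB tools)
# Assumed input, no header, only SMS that carry a week and TOW:
#   phone_seconds,gps_week,gps_tow
# Output: phone_seconds,gps_seconds,flag   (flag: gps | estimated | unknown)
# gps_seconds counts from the GPS epoch: week * 604800 + TOW.
estimate_gps_time() {
    awk -F, -v bad_week=1340 '
        # Pass 1: accumulate the (GPS - phone) offset over trusted rows.
        NR == FNR {
            if ($2 != bad_week) { sum += $2 * 604800 + $3 - $1; n++ }
            next
        }
        # Pass 2: trusted rows keep their own GPS time.
        $2 != bad_week { printf "%d,%d,gps\n", $1, $2 * 604800 + $3; next }
        # Week-1340 rows get phone time plus the mean offset.
        n > 0          { printf "%d,%d,estimated\n", $1, $1 + sum / n; next }
        # No trusted rows at all: nothing to estimate from.
                       { printf "%d,NA,unknown\n", $1 }
    ' "$1" "$1"
}
```

The file is read twice so that the mean offset is known before any row is written. Because the offset is taken directly between the two clocks, no epoch or leap-second constant is needed.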

Conflating MyExperience and SMS

The idea is to take the MyExperience record's phone timestamp, find the corresponding phone timestamp in the SMS, and use that SMS's GPS timestamp to conflate with the MSB data (which are all indexed by GPS time). There are several basic problems:

  1. If the true time of a MyExperience record cannot be ascertained, the record must be discarded. Otherwise there is a risk of associating a survey with the wrong time and place, and this data pollution would be worse than having no record at all.
  2. Subjects before S11 had no MSB data on the SMS, so there is no sure way to conflate timestamps.
    • Some of the subjects have a fairly close match between phone time and network time, so there is evidence that the phone time was set and stayed correct. However, because time differences between phone and network time can vary substantially, it is not possible to determine with any certainty if the network time actually reflects when a MyExperience survey was taken.
    • This means that none of the MyExperience surveys from subjects before S11 can be used.
  3. For subjects from S11 onward, not all MyExperience records sent out an SMS, and no other log file was recorded to store MSB data and MyExperience data together. These phantom MyExperience records therefore cannot automatically be conflated with any time.
    • If a subject's SMS record shows that the difference between phone time and GPS time was consistent throughout data collection, it can be assumed that the phone error was constant across the data collection period. The time offset between phone and GPS time can be used to correct all MyExperience records for these subjects.
    • If a subject's SMS record shows that the time difference between phone and GPS time is not consistent across the data collection period, then each MyExperience record needs to be examined individually.
  4. Here is a log of SMS consistency within each subject that can be used to determine how to deal with the subjects' data.
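
The consistency test in item 3 can be sketched in shell. This is an illustration under stated assumptions, not the procedure used to build the log: the input is assumed to be one subject's SMS as two columns (phone seconds, GPS seconds), "consistent" is taken to mean that the spread of GPS-minus-phone offsets stays within a tolerance, and the function names and the 5-second default tolerance are hypothetical.

```shell
#!/bin/sh
# offset_consistency FILE [TOLERANCE_SECONDS]   (hypothetical helper)
# Assumed input, no header: phone_seconds,gps_seconds
# Prints "consistent <mean offset>" or "inconsistent <spread>".
offset_consistency() {
    awk -F, -v tol="${2:-5}" '
        {
            d = $2 - $1                       # GPS minus phone for this SMS
            if (NR == 1 || d < min) min = d
            if (NR == 1 || d > max) max = d
            sum += d
        }
        END {
            if (NR == 0) { print "no-data"; exit }
            if (max - min <= tol) printf "consistent %d\n", sum / NR
            else                  printf "inconsistent %d\n", max - min
        }
    ' "$1"
}

# apply_offset OFFSET FILE   (hypothetical helper)
# Adds OFFSET to the first column (phone seconds) of each record,
# e.g. to shift MyExperience records onto GPS time for a consistent subject.
apply_offset() {
    awk -F, -v OFS=, -v off="$1" '{ $1 = $1 + off; print }' "$2"
}
```

For a subject reported as consistent, the printed mean offset can be passed to apply_offset; a subject reported as inconsistent falls under the record-by-record examination described above.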

Combining all MyExperience files into a single file for easier parsing

This creates symlinks for all subjects' myexper.csv files in a single directory (as written, the ln call in step 2 makes hard links; ln -s makes symbolic links). It assumes /home/phurvitz/public_html/msb/tools/process_sdf.R has been run for each subject.

  1. cd ~/public_html/msb/processed_data/downloaded_data/myexperience_merge
  2. find ../ -name "myexper.csv"|sed "s|\(\.\.\/\)\(.*\)\(\/.*\)|ln \1\2\3 \2\.myexper\.csv|" | sh
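
The same linking step can also be written as an explicit loop instead of generating ln commands with sed and piping them to sh. The sketch below assumes each subject's file sits exactly one directory below the parent (../<subject>/myexper.csv), which is narrower than the recursive find above; the function name link_myexper is hypothetical.

```shell
#!/bin/sh
# link_myexper   (hypothetical helper)
# Run from inside the merge directory. Links every ../<subject>/myexper.csv
# into the current directory as <subject>.myexper.csv.
# Assumption: one directory level per subject.
link_myexper() {
    for f in ../*/myexper.csv; do
        [ -e "$f" ] || continue                 # the glob matched nothing
        subject=$(basename "$(dirname "$f")")   # directory name = subject ID
        # -s makes a symbolic link; -f replaces a link left by an earlier run
        ln -sf "$f" "$subject.myexper.csv"
    done
}
```

Calling link_myexper after the cd in step 1 gives one <subject>.myexper.csv link per subject. Quoting the paths keeps directory names with spaces intact, which the generated-command approach does not.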