Main Page/Research/MSB/Scripts

From phurvitz

Scripts to push and pull MSB data

  • Raw MSB data are downloaded from the device onto a local PC. The script msb_push_data.pl pushes the data from there to my storage server.
  • Data are pulled from my server to Jonathan's server, where the parse and classify routines are run. Data are then retrieved from Jonathan's server using msb.get.data.R.

Order of processing

  1. Per subject
    1. Enroll subject, allow them to collect data for 1 week
    2. Push raw MSB data to ubicomp
    3. Parse data on ubicomp
    4. Retrieve data from ubicomp
      1. Concatenate msb+gps data
    5. Convert binary MyExperience SDF data to Get All Responses.csv
  2. Assuming all subjects' data have been processed:
    1. Parse SMS messages from raw Unix mbox format into a series of individual records (6- and 8-field formats)
    2. Standardize all SMS to a single format (creates output file)
    3. Standardize all MyExperience surveys to a single format (creates output file); this relies on the timestamps from the SMS in the previous step.
    4. Extract XY coordinates and precision values from the gps_class.csv data downloaded from ubicomp

Scripts to parse the MSB data on the receiving Linux server

  • Convert SMS in raw Unix mbox file format to a tabular structure: msb_parse_sms.pl
  • Parse SMS files and standardize all SMS into a single format, in a single file: preprocess_sms.R
    This creates a single file from which individual subjects' SMS records can easily be extracted. It standardizes across the 6-field and 8-field record types.
    1. Some SMS have a GPS week and time of week (TOW)
    2. For those that have GPS week and TOW:
      1. If GPS week != 1340, this is a good timestamp
      2. If GPS week = 1340, then get the mean difference between this timestamp and the phone timestamp
        1. Apply this difference to the "bad" GPS timestamps as an estimate of GPS time
  • Parse the MyExperience files, using the processed SMS file: preprocess_myexp.R
    This creates a single file from which individual MyExperience records can be extracted.
    It performs a time-matching algorithm based on the timestamps in the MyExperience file (local phone times, which are prone to errors) and the timestamps in the SMS (local phone time, about 12 s after the MyExperience timestamp, plus GPS time where available).
    • For subjects < 11 (6-field SMS), no MyExperience records can be used, because there is no link between phone and GPS time. Although these records are carried through, the only records that should be used are those with tsquality = "A", which indicates an absolute match between GPS and phone time.
  • Join MSB and GPS data, calculate distances to the previous and next points, and create WASPN83 coordinates: join_data.R
  • To parse the binary MyExperience.sdf file into a CSV ASCII file: process_sdf.R
    This requires pre-processing to CSV via MyExperience_Analyzer.exe or MyExperience_Analyzer_new.exe.
    Because the sdf files may be in different formats, use trial and error to see which exe will open a given sdf file: if one exe fails, use the other.
    When the MyExperience Analyzer opens, click Queries > Get All Responses, then File > Save As, and save the file as Get All Responses.csv.
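
The GPS week/TOW handling described for preprocess_sms.R can be sketched in Python as below. This is not the R script itself: the function names are hypothetical, GPS time is counted from the GPS epoch (1980-01-06) with leap seconds ignored, and the terse description of the week-1340 correction is read here as "learn the mean GPS-minus-phone offset from the good records and add it to the phone time of the bad ones", which may differ from what the script actually does.

```python
from datetime import datetime, timedelta
from statistics import mean

# GPS time counts from 1980-01-06 00:00:00. Leap seconds are NOT subtracted,
# so results are GPS time, which runs ahead of UTC by the leap-second count.
GPS_EPOCH = datetime(1980, 1, 6)

# This page treats GPS week 1340 as a bad (unset) GPS timestamp.
BAD_GPS_WEEK = 1340


def gps_to_datetime(week, tow):
    """Convert a GPS week number and time of week (seconds) to a datetime."""
    return GPS_EPOCH + timedelta(weeks=week, seconds=tow)


def estimate_gps_times(records):
    """Return an estimated GPS time for each (week, tow, phone_time) record.

    Assumption (hypothetical reading of the page): good records are converted
    directly; bad-week records get phone_time + the mean (GPS - phone) offset
    of the good records. Records without GPS fields (week is None) map to None.
    """
    good_offsets = [
        (gps_to_datetime(week, tow) - phone).total_seconds()
        for week, tow, phone in records
        if week is not None and week != BAD_GPS_WEEK
    ]
    mean_offset = timedelta(seconds=mean(good_offsets)) if good_offsets else None

    estimates = []
    for week, tow, phone in records:
        if week is None:
            estimates.append(None)  # 6-field SMS: no GPS time at all
        elif week != BAD_GPS_WEEK:
            estimates.append(gps_to_datetime(week, tow))  # good timestamp
        elif mean_offset is not None:
            estimates.append(phone + mean_offset)  # estimated from good records
        else:
            estimates.append(None)  # nothing to estimate from
    return estimates
```

For example, gps_to_datetime(1340, 0) gives the start of GPS week 1340, 2005-09-11, well before data collection, which is consistent with that week value being a placeholder rather than a real fix.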
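
The neighbour-distance step of join_data.R can be sketched as follows, assuming the points are already projected to WASPN83 (presumably Washington State Plane North, NAD83) and sorted by GPS time. The function name is hypothetical, the projection step itself is not shown, and distances come out in the projection's linear units.

```python
import math


def neighbour_distances(points):
    """Return (dist_to_previous, dist_to_next) for each projected (x, y) point.

    Points must be planar coordinates in time order. The first point has no
    previous neighbour and the last has no next neighbour, so those are None.
    """
    result = []
    last = len(points) - 1
    for i, p in enumerate(points):
        prev_d = math.dist(p, points[i - 1]) if i > 0 else None
        next_d = math.dist(p, points[i + 1]) if i < last else None
        result.append((prev_d, next_d))
    return result
```

Straight-line distance is only meaningful because the coordinates are projected; on raw latitude/longitude a geodesic distance would be needed instead.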

Conflating MyExperience and SMS

The idea is to get the MyExperience record's phone timestamp, find the corresponding phone timestamp from the SMS, get that SMS's GPS timestamp for conflating with MSB data (which are all indexed by GPS time). There are several basic problems:

  1. If the true time of a MyExperience record cannot be ascertained, the record must be discarded, otherwise there is a risk of associating a survey with the wrong time and place. This data pollution would be worse than no record at all.
  2. Subjects before S11 had no MSB data in their SMS, so there is no reliable way to conflate timestamps.
    • Some of the subjects have a fairly close match between phone time and network time, so there is evidence that the phone time was set and stayed correct. However, because time differences between phone and network time can vary substantially, it is not possible to determine with any certainty if the network time actually reflects when a MyExperience survey was taken.
    • This means that none of the MyExperience surveys from subjects before S11 can be used.
  3. For subjects S11 and later, not all MyExperience records triggered an SMS, and no other log file recorded both MSB and MyExperience data. These phantom MyExperience records therefore cannot be automatically conflated with any time.
    • If a subject's SMS record shows that the difference between phone time and GPS time was consistent throughout data collection, it can be assumed that the phone error was constant across the data collection period. The time offset between phone and GPS time can be used to correct all MyExperience records for these subjects.
    • If a subject's SMS record shows that the time difference between phone and GPS time is not consistent across the data collection period, then each MyExperience record needs to be examined individually.
  4. Here is a log of SMS consistency within each subject that can be used to determine how to deal with the subjects' data.
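
A minimal Python sketch of the conflation logic above, for one subject from S11 onward. It is an illustration, not a transcription of preprocess_myexp.R or conflate_timestamp.R: the ~12 s SMS lag comes from this page, but the match tolerance and the standard-deviation threshold for a "consistent" offset are hypothetical values chosen for the example. Surveys that can be neither matched nor corrected return None, in line with the rule that an untimed record should be discarded rather than guessed.

```python
from bisect import bisect_left
from datetime import timedelta
from statistics import pstdev

# From this page: an SMS is sent about 12 s after its MyExperience survey.
EXPECTED_LAG = timedelta(seconds=12)
# Hypothetical thresholds, not taken from the project's scripts.
MATCH_TOLERANCE = timedelta(seconds=10)
MAX_OFFSET_SD_SECONDS = 5.0


def offsets_are_consistent(sms):
    """True if a subject's GPS-minus-phone offsets barely vary across its SMS."""
    offsets = [(gps - phone).total_seconds() for phone, gps in sms]
    return len(offsets) > 1 and pstdev(offsets) <= MAX_OFFSET_SD_SECONDS


def conflate(myexp_times, sms):
    """Estimate a GPS time for each MyExperience phone timestamp, or None.

    `sms` holds (phone_time, gps_time) pairs for one subject, sorted by
    phone_time. A survey takes the GPS time of the SMS whose phone time is
    closest to survey time + EXPECTED_LAG, if within MATCH_TOLERANCE.
    Unmatched surveys are shifted by the mean offset only when the subject's
    offsets are consistent; otherwise they are left as None.
    """
    phone_times = [phone for phone, _ in sms]
    mean_offset = None
    if offsets_are_consistent(sms):
        mean_offset = sum((gps - phone for phone, gps in sms), timedelta()) / len(sms)

    results = []
    for t in myexp_times:
        target = t + EXPECTED_LAG
        i = bisect_left(phone_times, target)
        # The nearest SMS is either just before or just after the target.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(sms)]
        best = min(candidates, key=lambda j: abs(phone_times[j] - target), default=None)
        if best is not None and abs(phone_times[best] - target) <= MATCH_TOLERANCE:
            results.append(sms[best][1])  # that SMS's GPS timestamp
        elif mean_offset is not None:
            results.append(t + mean_offset)  # constant-offset correction
        else:
            results.append(None)  # needs manual review or must be discarded
    return results
```

Using the population standard deviation of the offsets as the consistency test is one simple choice; a subject that fails it falls back to the record-by-record review described in item 3.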

Combining all MyExperience files into a single file for easier parsing

This creates symlinks for all subjects' myexper.csv files in a single directory. It assumes /home/phurvitz/public_html/msb/tools/process_sdf.R has been run for each subject.

  1. cd ~/public_html/msb/processed_data/downloaded_data/myexperience_merge
  2. find ../ -name "myexper.csv" | sed "s|\(\.\./\)\(.*\)\(/.*\)|ln -s \1\2\3 \2.myexper.csv|" | sh

Then run preprocess_sms.R.
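
For reference, the linking step can also be sketched in Python with pathlib. This is a hypothetical helper, not one of the project's scripts. Unlike the sed one-liner, it flattens nested subject directories into the link name (joining path parts with "_"), links to absolute paths, and skips links that already exist.

```python
from pathlib import Path


def link_myexper_files(data_root, merge_dir):
    """Symlink every <subject>/myexper.csv under data_root into merge_dir.

    Each link is named <subject>.myexper.csv, where <subject> is the file's
    directory relative to data_root, with path parts joined by "_".
    Returns the names of the links created.
    """
    data_root = Path(data_root).resolve()
    merge_dir = Path(merge_dir).resolve()
    merge_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for src in sorted(data_root.rglob("myexper.csv")):
        if merge_dir in src.parents:
            continue  # never link files that live inside the merge directory
        subject = "_".join(src.parent.relative_to(data_root).parts)
        link = merge_dir / f"{subject}.myexper.csv"
        if not link.exists() and not link.is_symlink():
            link.symlink_to(src)
            created.append(link.name)
    return created
```

Calling link_myexper_files("..", ".") from inside myexperience_merge mirrors the shell version.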


?

msb.get.data.pl

msb_remdupes.pl

read.msb.files.R

conflate_timestamp.R