Difference between revisions of "Main Page/Research/MSB/Scripts"
From phurvitz
Phil Hurvitz (talk | contribs) (→Combining all MyExperience files into a single file for easier parsing) |
Phil Hurvitz (talk | contribs) (→Scripts to parse the MSB data on the receiving Linux server) |
Revision as of 23:11, 26 January 2009
Scripts to push and pull MSB data
- Raw MSB data are downloaded from the device onto a local PC. The script msb_push_data.pl pushes these data to my storage server.
- Data are pulled from my server to Jonathan's server, where the parse and classify routines are run. The results are retrieved from Jonathan's server using msb.get.data.R.
Scripts for processing MSB data
Order of processing
- Per subject
- Enroll subject, allow them to collect data for 1 week
- Push raw MSB data to ubicomp
- Parse data on ubicomp
- Retrieve data from ubicomp
- Convert binary SDF data to Get All Responses.csv
- Assuming all subjects' data have been processed:
- Parse SMS messages from raw Unix mbox format to a series of individual records (6-field and 8-field formats)
- Standardize all SMS to a single format (creates output file)
- Standardize all MyExperience surveys to a single format (creates output file); this relies on the timestamps from the SMS in the previous step.
- Extract XY coordinates and precision values from the gps_class.csv data downloaded from ubicomp
Scripts to parse the MSB data on the receiving Linux server
- Convert SMS from raw Unix mbox format to a tabular structure: msb_parse_sms.pl
- Parse SMS files and standardize all SMS into a single format, in a single file: preprocess_sms.R
- This creates a single file from which individual subjects' SMS records can easily be extracted. It standardizes across 6-field and 8-field record types.
- Some SMS have GPS week and time of week (TOW)
- For those that have GPS week and TOW:
- If GPS week != 1340 then this is a good timestamp
- If GPS week = 1340, then get the mean difference between this timestamp and the phone timestamp
- Apply this difference to the "bad" GPS timestamps as an estimate of GPS time
- Parse the MyExperience files using the processed SMS file: preprocess_myexp.R
- This creates a single file from which individual MyExperience records can be extracted.
- It runs a time-matching algorithm that compares the timestamps in the MyExperience file (local phone times, which are prone to errors) with the timestamps in the SMS (local phone time, about 12 s after the MyExperience timestamp, plus GPS time where available).
- For subjects <11 (6-field SMS), no MyExperience records could be used, since there are no links between phone and GPS time. Although these records are carried through, the only records that should be used are those with tsquality = "A", which indicates an absolute match between GPS and phone time.
- Join MSB and GPS data, calculate the distance to the previous and next points, and make WASPN83 coordinates: join_data.R
- To parse the binary MyExperience.sdf file into a CSV ASCII file: process_sdf.R
- This requires pre-processing to CSV via MyExperience_Analyzer.exe or MyExperience_Analyzer_new.exe.
- Because the sdf files may be in different formats, use trial-and-error to see which exe file will open the sdf. If one exe fails, use the other.
- When the MyExperience Analyzer opens, click Queries > Get All Responses, then File > Save As and save the file as Get All Responses.csv.
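The GPS week and TOW rule listed under preprocess_sms.R can be sketched as follows. This is an illustration in Python, not the code in preprocess_sms.R. The record keys (phone_time, gps_week, gps_tow) are hypothetical, leap seconds are ignored, and the sketch assumes that the "mean difference" is the mean of GPS time minus phone time over the records with a good GPS week, which is then added to the phone time of the week 1340 records.

```python
from datetime import datetime, timedelta
from statistics import mean

GPS_EPOCH = datetime(1980, 1, 6)  # start of GPS week 0
BAD_GPS_WEEK = 1340               # week value treated as a bad timestamp


def gps_to_datetime(week, tow):
    """Convert GPS week and time of week (seconds) to a datetime on the GPS time scale."""
    return GPS_EPOCH + timedelta(weeks=week, seconds=tow)


def estimate_gps_times(records):
    """Return one GPS-time estimate per record (None where no GPS fields exist).

    Each record is a dict with the hypothetical keys 'phone_time' (datetime),
    'gps_week' (int or None) and 'gps_tow' (seconds or None).
    """
    with_gps = [r for r in records if r["gps_week"] is not None]
    good = [r for r in with_gps if r["gps_week"] != BAD_GPS_WEEK]
    if with_gps and not good:
        raise ValueError("no good GPS timestamps to estimate an offset from")

    # Mean offset (GPS minus phone), in seconds, over the good records only.
    offsets = [
        (gps_to_datetime(r["gps_week"], r["gps_tow"]) - r["phone_time"]).total_seconds()
        for r in good
    ]
    mean_offset = timedelta(seconds=mean(offsets)) if offsets else None

    estimates = []
    for r in records:
        if r["gps_week"] is None:
            estimates.append(None)  # e.g. 6-field SMS without GPS fields
        elif r["gps_week"] != BAD_GPS_WEEK:
            estimates.append(gps_to_datetime(r["gps_week"], r["gps_tow"]))
        else:
            # Bad GPS week: estimate GPS time from the phone time instead.
            estimates.append(r["phone_time"] + mean_offset)
    return estimates
```

The output keeps the input order, so it can be attached as a new column to the standardized SMS table.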
Conflating MyExperience and SMS
The idea is to get the MyExperience record's phone timestamp, find the corresponding phone timestamp from the SMS, and use that SMS's GPS timestamp for conflating with MSB data (which are all indexed by GPS time). There are several basic problems:
- If the true time of a MyExperience record cannot be ascertained, the record must be discarded, otherwise there is a risk of associating a survey with the wrong time and place. This data pollution would be worse than no record at all.
- Subjects before S11 had no MSB data on the SMS, so there is no sure way to conflate timestamps.
- Some of the subjects have a fairly close match between phone time and network time, so there is evidence that the phone time was set and stayed correct. However, because time differences between phone and network time can vary substantially, it is not possible to determine with any certainty if the network time actually reflects when a MyExperience survey was taken.
- This means that none of the MyExperience surveys from subjects before S11 can be used.
- For subjects after S11, not all MyExperience records sent out an SMS. No other log file was recorded that stores both MSB data and MyExperience data, so these phantom MyExperience records cannot automatically be conflated with any time.
- If a subject's SMS record shows that the difference between phone time and GPS time was consistent throughout data collection, it can be assumed that the phone error was constant across the data collection period. The time offset between phone and GPS time can be used to correct all MyExperience records for these subjects.
- If a subject's SMS record shows that the time difference between phone and GPS time is not consistent across the data collection period, then each MyExperience record needs to be examined individually.
- Here is a log of SMS consistency within each subject that can be used to determine how to deal with the subjects' data.
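The decision rules above can be sketched roughly as below. This is a Python illustration under stated assumptions, not the logic of preprocess_myexp.R: the record keys (phone_time, gps_time), the matching window MATCH_TOLERANCE and the consistency limit MAX_SPREAD_S are all hypothetical. Where the text calls for examining records individually, the sketch simply falls back to the offset of the one SMS that matches the survey, which is only one possible approach.

```python
from datetime import datetime, timedelta
from statistics import mean

SMS_LAG = timedelta(seconds=12)          # SMS phone time is about 12 s after the survey
MATCH_TOLERANCE = timedelta(seconds=10)  # hypothetical matching window
MAX_SPREAD_S = 5.0                       # hypothetical limit for a "consistent" offset


def match_sms(survey_time, sms_records):
    """Return the SMS whose phone time is closest to survey_time + SMS_LAG, or None."""
    target = survey_time + SMS_LAG
    best = min(sms_records, key=lambda s: abs(s["phone_time"] - target), default=None)
    if best is None or abs(best["phone_time"] - target) > MATCH_TOLERANCE:
        return None  # a "phantom" survey with no SMS close enough in time
    return best


def offsets_seconds(sms_records):
    """GPS-minus-phone offset in seconds for every SMS that carries a GPS time."""
    return [
        (s["gps_time"] - s["phone_time"]).total_seconds()
        for s in sms_records
        if s.get("gps_time") is not None
    ]


def survey_gps_time(survey_time, sms_records):
    """Estimate the GPS time of one survey, or None if it cannot be determined."""
    offsets = offsets_seconds(sms_records)
    if not offsets:
        return None  # no link between phone and GPS time: discard the record
    if max(offsets) - min(offsets) <= MAX_SPREAD_S:
        # Consistent phone error: correct the survey time with the mean offset.
        return survey_time + timedelta(seconds=mean(offsets))
    # Inconsistent offset: rely only on the SMS that matches this survey.
    sms = match_sms(survey_time, sms_records)
    if sms is None or sms.get("gps_time") is None:
        return None
    return survey_time + (sms["gps_time"] - sms["phone_time"])
```

Returning None rather than a guess follows the first point in the list: a survey with an uncertain time is discarded instead of being placed at the wrong time and place.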
Combining all MyExperience files into a single file for easier parsing
This creates symlinks for all subjects' myexper.csv files in a single directory. It assumes /home/phurvitz/public_html/msb/tools/process_sdf.R has been run for each subject.
- cd ~/public_html/msb/processed_data/downloaded_data/myexperience_merge
- find ../ -name "myexper.csv"|sed "s|\(\.\.\/\)\(.*\)\(\/.*\)|ln \1\2\3 \2\.myexper\.csv|" | sh
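For readers who find the sed expression hard to follow, the same idea can be sketched in Python. This is an illustration, not a replacement for the commands above. It assumes each subject's myexper.csv sits in a directory named for the subject, one level below the download directory, and it creates symbolic links named <subject>.myexper.csv; the function name link_myexper_files is made up for this example.

```python
from pathlib import Path


def link_myexper_files(downloaded_dir, merge_dir):
    """Symlink every <subject>/myexper.csv under downloaded_dir into merge_dir."""
    downloaded_dir = Path(downloaded_dir).resolve()
    merge_dir = Path(merge_dir).resolve()
    merge_dir.mkdir(parents=True, exist_ok=True)

    links = []
    for src in sorted(downloaded_dir.glob("*/myexper.csv")):
        # The subject directory name becomes the prefix of the link name.
        link = merge_dir / f"{src.parent.name}.myexper.csv"
        if not (link.is_symlink() or link.exists()):
            link.symlink_to(src)
        links.append(link)
    return links
```

Calling link_myexper_files("..", ".") from inside myexperience_merge would mirror the two commands above, under the directory layout assumed here.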