Author: Charles Y K Cheung

New changes in v1.04:

Important new function:
- GIGI can now read dense markers in long format (rows are markers and
  columns are individuals, similar to BEAGLE's genotype file format,
  except that there is no "I" column here). See the specification in the
  documentation. This change allows GIGI to handle very dense files in a
  memory-efficient manner.
  see: example/param_longFormat.txt and "dense.genotypes.t"
- I converted the original example dense marker file from the old format
  (rows are individuals) to the long format using the script in the
  utilities directory: convertGenotypesfromWideToLongFormat.R
- to tell GIGI that the dense genotype file is in the long format, use the
  -long flag: see documentation.

New changes in v1.03:

Bug fix:
- max ped size was limited to 160; the limit is now 5000.
- in the check that the provided allele frequencies of each marker sum to
  1, if(sumAF==1) is replaced by if( (sumAF-1) > 0.0000001)
- if a line in the allele frequency file has only 1 allelic type
  (monomorphic marker), a dummy allelic type with frequency 0 is added to
  prevent the program from breaking.
- in main(), close the input streams before deallocating some of the
  variables to ensure output files get written first.

Other change:
- the call method in the example folder "param.txt" is now set to
  confidence-based calling (t1=0.8, t2=0.9) instead of the most likely
  genotype. See manuscript.
  Rationale: This change is to remind users that calls based on the most
  likely genotype may not be accurate. For example, if a parent has a rare
  allele, GIGI will correctly assign a 50% chance that the child has the
  rare allele IN THE SITUATION when we cannot figure out which chromosome
  is transmitted. If we use the most likely genotype call method, it will
  make a call for each genotype despite potentially high uncertainty in
  genotype configuration.
  Since calls made using the most likely genotypes may be dangerous to
  use, we change the default call method to confidence-based calling.
  Analyses that account for the uncertainty in the imputed results may be
  more appropriate, e.g. use the imputed probabilities directly or use a
  summary of imputed probabilities such as dosage.
- a dosage file is generated if all markers to be imputed are di-allelic
  markers. Here, dosage is defined as the expected proportion of 1 alleles
  in a genotype:
  dosage of a genotype = 1*P(genotype is 1/1) + 0.5*P(genotype is 1/2)
- a binary GIGI file is included in the main uncompressed directory.

New changes in v1.02:
- warn user in the case when the Inheritance Vector file is empty, e.g. in
  trios.
  Rationale: Since we cannot infer recombination in trios, gl_auto
  generates an empty inheritance output. This is normal and is correctly
  stated in the pedigree meiosis file. GIGI will still run, but GIGI will
  impute only based on the pedigree structure and minor allele
  frequencies. Hence, Linkage Disequilibrium-based methods can potentially
  be more powerful than GIGI for trios.
- include the perl script extractPedMeiosis.pl in the program to extract
  the pedigree meiosis file from gl_auto's output.
- expand the FAQ section in the documentation file

New changes in v1.01:
- make new example files
- improve the user interface
  :in main()
  :summarize relevant information about each input file after reading
  :print progress
- convert to a new format of parameter file
  :fewer lines
  :in the code: add readImputeParameterFile_GIGI_v1_01()
- implement some error checking routines on input files
- add license
- modify the documentation file
- bug fixes:
  :call method #1 now works again
  :fix callThreshold_multiAllelic()
  :the bug is in the if else statement of method==2. We want the
   if (method 1), else if (method 2), else ...
   instead.
- add various flags - see documentation file

Code changes:
- readDenseMarkers_byComponent(): check that the number of columns is
  correct
- readMarkerPos_v2(): ensure positions are in ascending order
- readAF(): includes new changes - ensure each row sums to 1; deallocate
  variable at the end. The function is shortened because it duplicated
  what is done in readAllelicTypeCount()
- readAllelicTypeCount(): deallocate variable at the end
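
The input checks listed under "Code changes" (each allele frequency row sums to 1, marker positions are in ascending order) could look roughly like the sketch below. This is a hypothetical illustration, not GIGI's actual code: the helper names rowSumsToOne and positionsAscending are invented here, the tolerance 1e-7 mirrors the 0.0000001 mentioned in the v1.03 notes, the two-sided absolute difference is an assumption about the intent of the check, and accepting tied positions is also an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical helper (not from GIGI): true when the allele frequencies of
// one marker sum to 1 within a small tolerance. An exact comparison such as
// sumAF == 1 can fail because of floating-point rounding.
bool rowSumsToOne(const std::vector<double>& alleleFreqs, double tol = 1e-7) {
    const double sumAF =
        std::accumulate(alleleFreqs.begin(), alleleFreqs.end(), 0.0);
    // Assumption: reject sums that are too large or too small.
    return std::fabs(sumAF - 1.0) <= tol;
}

// Hypothetical helper (not from GIGI): true when marker positions never
// decrease from one marker to the next. Ties are accepted in this sketch.
bool positionsAscending(const std::vector<double>& positions) {
    return std::is_sorted(positions.begin(), positions.end());
}
```

A reader function could call such helpers once per row or per file and stop with an error message when either returns false.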