Introduction How to generate synthetic datasets with random noise
What is random noise?
The term "noise" originated in communication engineering. Any random or orderly disturbance whose character (frequency and amplitude) differs significantly from that of the main signal is called "noise". The larger the signal-to-noise ratio (S/N or SNR), the better the situation is. Adding "noise" to an ordinary mathematical function distorts the smoothness of the function. The term has subdivisions that denote its character, such as "white noise" and "pink noise". Random noise consists of a large number of transient disturbances with a statistically random time distribution. The magnitudes of these transient disturbances vary with time and are distributed according to a normal (Gaussian) curve. Random noise is also called Gaussian noise or random Gaussian noise.

Biological data and random noise
Data obtained from most physical systems, e.g., engineering systems and biological systems (think of the air velocity in a wind tunnel or the number of colonies formed in a seeded petri dish), will show "scatter"; that is, the data do not conform to a pure mathematical functional form. The average of a large number of point estimates, however, may closely approximate a functional form. Unless a suitable random disturbance model (such as a random noise generator) is included, a mathematical model of an engineering or biological system will not predict the "scatter" one would expect from actual measurements. Noise of appropriate magnitude can be added to make the data generated by a mathematical model look more "natural" by embedding this unpredictability.

Generation of synthetic data by VC
VC can generate synthetic biological data by adding random noise to the model-predicted responses (the -n command line option must be used to add noise to the predicted responses). The parameter RERR, the relative error, can be used to predetermine the relative error involved.
The parameter RNGS, the random number generator seed, can be used to yield different synthetic data sets for the same set of input parameters, simulating the biological uncertainty involved in repeated experiments.
Tip: Follow the use of keywords and observe the trends in the output variation.
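
The idea of a relative error and a random number generator seed can be illustrated outside VC with a short Python sketch. This is not VC's implementation: the text does not say how VC applies RERR internally, so the sketch simply assumes zero-mean Gaussian noise whose standard deviation is a fixed fraction of each predicted value. The names model_response, add_relative_noise, rel_err and seed are hypothetical; rel_err and seed only loosely mirror RERR and RNGS.

```python
import math
import random


def model_response(t, amplitude=100.0, rate=0.3):
    """Hypothetical smooth model (exponential decay); not a VC model."""
    return amplitude * math.exp(-rate * t)


def add_relative_noise(values, rel_err, seed):
    """Return a copy of values perturbed by zero-mean Gaussian noise.

    Assumption: the noise standard deviation is rel_err * |value|.
    How VC applies RERR internally is not stated in the text.
    """
    rng = random.Random(seed)  # private generator, reproducible per seed
    return [v + rng.gauss(0.0, rel_err * abs(v)) for v in values]


times = [0.5 * i for i in range(11)]
clean = [model_response(t) for t in times]

# Same seed gives the identical data set; a new seed gives a new
# "repeat experiment" for the same input parameters.
run_a = add_relative_noise(clean, rel_err=0.05, seed=1)
run_a_again = add_relative_noise(clean, rel_err=0.05, seed=1)
run_b = add_relative_noise(clean, rel_err=0.05, seed=2)

# Averaging many replicates pulls the scatter back toward the smooth curve.
n_rep = 2000
replicates = [add_relative_noise(clean, 0.05, s) for s in range(n_rep)]
mean_curve = [sum(col) / n_rep for col in zip(*replicates)]
max_rel_dev = max(abs(m - c) / c for m, c in zip(mean_curve, clean))

# Columns: time, noise-free model, replicate with seed 1, replicate with seed 2
for t, c, a, b in zip(times, clean, run_a, run_b):
    print(f"{t:4.1f}  {c:8.3f}  {a:8.3f}  {b:8.3f}")
print(f"max relative deviation of the mean curve: {max_rel_dev:.4f}")
```

Changing rel_err widens or narrows the scatter around the smooth curve, while changing seed produces a different replicate at the same noise level, which is the behaviour the RERR and RNGS descriptions suggest.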
School of Health Sciences, Purdue University. Last updated: 10 June, 2011