Weibo Xie, Ying Chen, Gang Zhou, Lei Wang, Chengjun Zhang, Jianwei Zhang, Jinghua Xiao, Tong Zhu, Qifa Zhang*
Address: National Key Laboratory of Crop Genetic Improvement, National Center of Plant Gene Research (Wuhan), Huazhong Agricultural University, 430070 Wuhan, China
* Correspondence: Qifa Zhang. Email: qifazh [at] mail.hzau.edu.cn
Expression measurements from oligonucleotide-probe microarrays have been adapted as a high-throughput approach for identifying DNA sequence variation between genotypes; variants detected in this way are referred to as single feature polymorphisms (SFPs). Although interest in this method is increasing, the algorithm still needs improvement to achieve high sensitivity and specificity, especially with complex genomes and large datasets, while maintaining good computational performance. Moreover, it is generally accepted that sequence mismatches between the targets and the probes on the chip reduce binding affinity, which provides the basis for detecting sequence polymorphisms as SFPs. However, SFPs have frequently been detected between probes and targets with perfectly matched sequences. Such observations, although they merit detailed investigation, have frequently been ignored in the analyses.
We adapted a median polish method to detect SFPs from multiple transcriptome datasets and to evaluate the contribution of probe-flanking SNPs to SFP detection. We showed that the median polish method avoids fitting complex linear models and can therefore be used to analyze complex transcriptome datasets. Compared with a previously used method, it is also superior in sensitivity, accuracy and computing time on data from multiple species with different genome complexity. Using this method, we identified 6,655 SFPs between two rice varieties and 3,387 SFPs between two yeast strains. Of the SFPs detected from the examined transcriptomes, 76% in rice and 89% in yeast can be validated by the presence of SNPs in the probe regions. Further comparison in both the rice and yeast genomes revealed that SNPs in sequences immediately flanking the probes did contribute to the detection of SFPs in cases where the probes and the targets had perfectly matched sequences, as over 15% of such non-polymorphic SFPs were associated with flanking SNPs. Differences in minimum free energy caused by flanking SNPs, which may change the stability of RNA secondary structure, may partly explain the detected SFPs.
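To illustrate the general idea, the following is a minimal sketch of Tukey's median polish applied to one probe set, written in Python with NumPy. It is not the authors' pipeline: the function name `median_polish`, the toy data, the size of the simulated signal drop and the grouping of arrays into two genotypes are all assumptions made for illustration. The matrix of log intensities (probes in rows, arrays in columns) is decomposed into an overall level, probe effects, array effects and residuals; a probe whose residuals differ consistently between genotypes is a candidate SFP.

```python
import numpy as np

def median_polish(x, max_iter=10, tol=1e-6):
    """Tukey's median polish: x[i, j] = overall + row[i] + col[j] + resid[i, j]."""
    resid = np.array(x, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])  # probe effects
    col = np.zeros(resid.shape[1])  # array (expression) effects
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        # Re-centre the column effects and move their median to the overall term.
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep column medians out of the residuals into the column effects.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        delta = np.median(row)
        row -= delta
        overall += delta
        if np.abs(row_med).max() < tol and np.abs(col_med).max() < tol:
            break
    return overall, row, col, resid

# Hypothetical toy probe set: 5 probes x 6 arrays (arrays 0-2 genotype A, 3-5 genotype B).
probe_effect = np.array([0.0, 1.0, -1.0, 0.5, -0.5])
array_effect = np.array([8.0, 8.2, 7.9, 9.0, 9.1, 8.8])
x = probe_effect[:, None] + array_effect[None, :]
x[2, 3:] -= 2.0  # assumed hybridization drop for probe 2 in genotype B only

overall, row, col, resid = median_polish(x)
# Per-probe difference in mean residual between the two genotypes.
sfp_score = resid[:, 3:].mean(axis=1) - resid[:, :3].mean(axis=1)
print(np.round(sfp_score, 2))
```

In this toy example the gene-level expression difference between the genotypes is absorbed by the array effects, so only the probe with the simulated drop keeps a genotype-dependent residual. A real analysis would still need a statistical test or threshold on such scores, which this sketch does not attempt.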
The median polish method shows superior sensitivity and accuracy in SFP detection while significantly reducing the computing time required. Polymorphisms in sequences immediately flanking the probes can frequently cause SFPs in microarray analysis, demonstrating a possible influence of probe-flanking SNPs on comparative transcriptome analyses using oligonucleotide microarrays. The SFPs identified between the two rice cultivars, which are the parents of the most widely cultivated rice hybrid, may greatly facilitate gene discovery in future studies.