Gene expression profiling is one of the most explored methods for studying cancers, and microarray data repositories have become a rich and important resource. The most common human cancers develop in organs that are walled by smooth muscle. The only method of sample extraction free of unintentional contamination with surrounding tissue is microdissection, yet this approach is used infrequently. Consequently, a large portion of publicly available data may be contaminated with smooth muscle. In this study, 2292 publicly available microarrays were analysed to develop a simple screening method for detecting smooth muscle contamination. Microarray Inspector software was used to perform the tests because it has the unique ability to use many selected genes and probesets in a single group as a tissue definition. Furthermore, the test was dataset-independent. Two strategies of tissue definition were explored and compared. The first relied on the Tissue Specific Genes Database (TiSGeD) and BioGPS web resources, which are themselves based on meta-analyses of thousands of microarrays. The second was based on a differential gene expression analysis of a few hundred preselected arrays. The comparison showed the latter to be superior. Among the tested samples of undefined contamination, nearly half were identified as possibly containing significant smooth muscle traces. The results equip researchers with a simple method of examining microarray data for smooth muscle contamination. The presented work also serves as an example of how to create definitions when searching for other possible contaminants.
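To make the idea of a dataset-independent test against a gene-group tissue definition concrete, the following is a minimal sketch in Python. It is not the algorithm used by Microarray Inspector, whose internals are not described here. It assumes a simplified stand-in: each array is scored only by the within-array percentile ranks of the marker probes, so no other sample is needed. The function names, the threshold of 0.7, and the marker symbols in the usage note are illustrative assumptions, not the tissue definitions derived in the study.

```python
from statistics import mean


def marker_rank_score(expression, markers):
    """Mean within-array percentile rank (0.0 to 1.0) of the marker probes.

    `expression` maps probe or gene identifiers to intensities for ONE array.
    `markers` is the group of identifiers acting as a hypothetical tissue
    definition. Only ranks inside this single array are used, which is what
    makes the score independent of any other sample or dataset.
    """
    # Order all probes from lowest to highest intensity.
    ordered = sorted(expression, key=expression.get)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two probes to compute ranks")
    # Lowest probe gets 0.0, highest gets 1.0.
    percentile = {probe: i / (n - 1) for i, probe in enumerate(ordered)}
    # Ignore marker identifiers that are absent from this array.
    present = [m for m in markers if m in percentile]
    if not present:
        raise ValueError("none of the marker probes are on this array")
    return mean(percentile[m] for m in present)


def flag_contamination(expression, markers, threshold=0.7):
    """Flag an array whose marker group ranks high; the threshold is an assumption."""
    return marker_rank_score(expression, markers) >= threshold
```

As a usage example, calling `flag_contamination(sample, {"MYH11", "DES", "ACTA2"})` on a dictionary of intensities for one array returns `True` when these smooth muscle genes sit, on average, in the upper part of that array's intensity distribution. A real screening tool would more plausibly use a formal rank-based statistical test and a validated marker list rather than a fixed cutoff.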