Content

The IST Online system is a vast repository of transcription data. It contains over twenty thousand samples, each of which has data for between 7 000 and 17 000 genes. In total, the system contains over 300 million data points from a wide variety of samples. The data focuses on cancer, which accounts for about 75% of the samples. Every sample has been meticulously annotated with all of the relevant information available for it. Most importantly, the normalization used by IST Online makes it possible to directly compare any sample or gene with any other, thus creating the world's largest unified gene expression database.

Within the cancer samples, all major and nearly all minor cancer types are represented. The emphasis is on the most studied cancer types, such as breast cancer (over three thousand samples), lung cancer (1 500 samples) and leukemias (a total of 4 000 samples). With these kinds of numbers, previously impossible studies become possible. Cancers are classified on three different levels, so that, for example, breast cancers are divided into their subtypes without losing the ability to easily gather up all breast cancers.
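The idea of a three-level classification can be illustrated with a minimal sketch. Everything in it is hypothetical: the level names, the sample records and the select helper are invented for illustration and do not reflect the actual IST Online data model or interface. It only shows how a subtype can be selected while the whole parent type remains easy to gather.

```python
from collections import namedtuple

# Hypothetical record: the three level fields and all values below are
# illustrative placeholders, not the real IST Online classification.
Sample = namedtuple("Sample", ["sample_id", "level1", "level2", "level3"])

samples = [
    Sample("s1", "group 1", "breast cancer", "subtype A"),
    Sample("s2", "group 1", "breast cancer", "subtype B"),
    Sample("s3", "group 1", "lung cancer", "subtype C"),
]


def select(samples, level1=None, level2=None, level3=None):
    """Return the samples that match every level given (None means any)."""
    wanted = {"level1": level1, "level2": level2, "level3": level3}
    return [
        s for s in samples
        if all(v is None or getattr(s, k) == v for k, v in wanted.items())
    ]


# Gather every breast cancer sample, regardless of subtype.
all_breast = select(samples, level2="breast cancer")

# Narrow the same query down to a single subtype.
one_subtype = select(samples, level2="breast cancer", level3="subtype A")
```

Leaving a level unspecified keeps the query broad, so the same records serve both the subtype-level and the type-level view.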

As mentioned above, all samples are manually annotated, and all relevant clinical data is associated with them. Clinical data is not available for every sample, as we are at the mercy of the primary investigators, but enough of it is available to enable analyses of, for example, how the expression of a gene in a given cancer is associated with survival, stage or metastasis.


The content includes data from:

  • 68 different healthy tissues
  • 106 different cancer types
  • 92 diseases other than cancer
  • 406 different established cell lines