refine.bio
56 results
GSE16716
MicroArray Quality Control Phase II (MAQC-II) Project
  • Mus musculus, Homo sapiens, Rattus norvegicus
  • 1314 Downloadable Samples
  • Affymetrix Human Genome U133 Plus 2.0 Array (hgu133plus2), Affymetrix Rat Genome 230 2.0 Array (rat2302), Affymetrix Human Genome U133A Array (hgu133a), Affymetrix Mouse Genome 430 2.0 Array (mouse4302)

Description

The MAQC-II Project: A comprehensive study of common practices for the development and validation of microarray-based predictive models

Publication Title

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.

Sample Metadata Fields

Sex, Age, Specimen part, Race, Compound

GSE5350
MicroArray Quality Control (MAQC) Project
  • Homo sapiens, Rattus norvegicus
  • 212 Downloadable Samples
  • Affymetrix Human Genome U133 Plus 2.0 Array (hgu133plus2)

Description

Microarray technology has had a profound impact on gene expression research. Some studies have questioned whether similar expression results are obtained when the same RNA samples are analyzed on different platforms.

Publication Title

The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements.

Sample Metadata Fields

No sample metadata fields

GSE24080
MAQC-II Project: Multiple myeloma (MM) data set
  • Homo sapiens
  • 549 Downloadable Samples
  • Affymetrix Human Genome U133A Array (hgu133a), Affymetrix Human Genome U133 Plus 2.0 Array (hgu133plus2)

Description

The multiple myeloma (MM) data set (endpoints F, G, H, and I) was contributed by the Myeloma Institute for Research and Therapy at the University of Arkansas for Medical Sciences (UAMS, Little Rock, AR, USA). Gene expression profiling of highly purified bone marrow plasma cells was performed in newly diagnosed patients with MM. The training set consisted of 340 cases enrolled on total therapy 2 (TT2) and the validation set comprised 214 patients enrolled in total therapy 3 (TT3). Plasma cells were enriched by anti-CD138 immunomagnetic bead selection of mononuclear cell fractions of bone marrow aspirates in a central laboratory. All samples applied to the microarray contained more than 85% plasma cells as determined by 2-color flow cytometry (CD38+ and CD45-/dim) performed after selection. Dichotomized overall survival (OS) and event-free survival (EFS) were determined based on a two-year milestone cutoff. A gene expression model of high-risk multiple myeloma was developed and validated by the data provider and later validated in three additional independent data sets.

Publication Title

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.

Sample Metadata Fields

Sex, Age

GSE24363
MAQC-II Project: NIEHS data set
  • Rattus norvegicus
  • 410 Downloadable Samples
  • Affymetrix Rat Genome 230 2.0 Array (rat2302), Affymetrix Human Genome U133A Array (hgu133a)

Description

The NIEHS data set (endpoint C) was provided by the National Institute of Environmental Health Sciences (NIEHS) of the National Institutes of Health (Research Triangle Park, NC, USA). The study objective was to use microarray gene expression data acquired from the liver of rats exposed to hepatotoxicants to build classifiers for prediction of liver necrosis. The gene expression compendium data set was collected from 418 rats exposed to one of eight compounds (1,2-dichlorobenzene, 1,4-dichlorobenzene, bromobenzene, monocrotaline, N-nitrosomorpholine, thioacetamide, galactosamine, and diquat dibromide). All eight compounds were studied using standardized procedures, i.e., a common array platform (Affymetrix Rat 230 2.0 microarray), common experimental procedures, and common data retrieval and analysis processes.

Publication Title

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.

Sample Metadata Fields

Sex, Specimen part, Compound

GSE20194
MAQC-II Project: human breast cancer (BR) data set
  • Homo sapiens
  • 267 Downloadable Samples
  • Affymetrix Human Genome U133A Array (hgu133a)

Description

The human breast cancer (BR) data set (endpoints D and E) was contributed by the University of Texas M. D. Anderson Cancer Center (MDACC, Houston, TX, USA). Gene expression data from 230 stage I-III breast cancers were generated from fine needle aspiration specimens of newly diagnosed breast cancers before any therapy. The biopsy specimens were collected sequentially during a prospective pharmacogenomic marker discovery study between 2000 and 2008. These specimens represent 70-90% pure neoplastic cells with minimal stromal contamination. Patients received 6 months of preoperative (neoadjuvant) chemotherapy including paclitaxel, 5-fluorouracil, cyclophosphamide and doxorubicin followed by surgical resection of the cancer. Response to preoperative chemotherapy was categorized as a pathological complete response (pCR = no residual invasive cancer in the breast or lymph nodes) or residual invasive cancer (RD), and used as endpoint D for prediction. Endpoint E is the clinical estrogen-receptor status as established by immunohistochemistry. RNA extraction and gene expression profiling were performed in multiple batches over time using Affymetrix U133A microarrays. Genomic analyses of a subset of this sequentially accrued patient population were reported previously. For each endpoint, the first 130 cases were used as a training set and the next 100 cases were used as an independent validation set.

Publication Title

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.

Sample Metadata Fields

Age, Specimen part, Race

GSE47875
SEQC Toxicogenomics Study: microarray data set
  • Rattus norvegicus
  • 105 Downloadable Samples
  • Affymetrix Rat Genome 230 2.0 Array (rat2302)

Description

The comparative advantages of RNA-Seq and microarrays in transcriptome profiling were evaluated in the context of a comprehensive study design. Gene expression data from Illumina RNA-Seq and Affymetrix microarrays were obtained from livers of rats exposed to 27 agents that comprised seven modes of action (MOAs); they were split into training and test sets and verified with real-time PCR.

Publication Title

The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance.

Sample Metadata Fields

Sex, Specimen part

GSE24061
MAQC-II Project: Hamner data set
  • Mus musculus
  • 88 Downloadable Samples
  • Affymetrix Mouse Genome 430 2.0 Array (mouse4302)

Description

The Hamner data set (endpoint A) was provided by The Hamner Institutes for Health Sciences (Research Triangle Park, NC, USA). The study objective was to apply microarray gene expression data from the lung of female B6C3F1 mice exposed to a 13-week treatment of chemicals to predict increased lung tumor incidence in the 2-year rodent cancer bioassays of the National Toxicology Program. If successful, the results may form the basis of a more efficient and economical approach for evaluating the carcinogenic activity of chemicals. Microarray analysis was performed using Affymetrix Mouse Genome 430 2.0 arrays on three to four mice per treatment group, and a total of 70 mice were analyzed and used as the MAQC-II's training set (GEO Series GSE6116). Additional data from another set of 88 mice were collected later and provided as the MAQC-II's external validation set (this Series). The training dataset had already been deposited in GEO by its provider and its accession number is GSE6116.

Publication Title

Effect of training-sample size and classification difficulty on the accuracy of genomic predictors.

Sample Metadata Fields

Specimen part, Compound

GSE56457
A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequence Quality Control consortium
  • Homo sapiens
  • 16 Downloadable Samples
  • Affymetrix Human Gene Expression Array (primeview), Illumina HumanHT-12 V4.0 expression beadchip, Affymetrix Human Gene 2.0 ST Array (hugene20st)

Description

We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the United States Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for sequence discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for these and for qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.

Publication Title

A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium.

Sample Metadata Fields

No sample metadata fields

GSE23610
Gene expression profiles of MCF-7 cells treated with Si-Wu-Tang, estradiol and ferulic acid
  • Homo sapiens
  • 22 Downloadable Samples
  • Affymetrix Human Genome U133 Plus 2.0 Array (hgu133plus2)

Description

Traditional Chinese medicines (TCM), usually composed of a mixture of components, may simultaneously target multiple genes/pathways and thus achieve superior efficacy for complex diseases such as cancer. To identify novel mechanisms of action and potential health benefits for a TCM formula Si-Wu-Tang (SWT) widely used for women's health, we obtained the DNA microarray expression profiles for SWT, its active component ferulic acid, and estradiol in human breast cancer cell line MCF-7 and analyzed the gene expression signatures associated with each treatment using the Connectivity Map (cMAP).

Publication Title

Discovery of molecular mechanisms of traditional Chinese medicinal formula Si-Wu-Tang using gene expression microarray and connectivity map.

Sample Metadata Fields

Cell line, Treatment

GSE23906
Evaluation of gene expression data generated from expired Affymetrix GeneChip microarrays using MAQC reference RNA samples
  • Homo sapiens
  • 12 Downloadable Samples
  • Affymetrix Human Genome U133A Array (hgu133a)

Description

The Affymetrix GeneChip system is a commonly used platform for microarray analysis, but the technology is inherently expensive. Unfortunately, changes in experimental planning and execution, such as the unavailability of previously anticipated samples or a shift in research focus, may leave significant numbers of pre-purchased GeneChip microarrays unprocessed before their manufacturer's expiration dates. Researchers and microarray core facilities wonder whether expired microarrays are still useful for gene expression analysis.

Publication Title

Evaluation of gene expression data generated from expired Affymetrix GeneChip® microarrays using MAQC reference RNA samples.

Sample Metadata Fields

Specimen part

...

refine.bio is a repository of uniformly processed and normalized, ready-to-use transcriptome data from publicly available sources. refine.bio is a project of the Childhood Cancer Data Lab (CCDL).


Developed by the Childhood Cancer Data Lab

Powered by Alex's Lemonade Stand Foundation

Cite refine.bio

Casey S. Greene, Dongbo Hu, Richard W. W. Jones, Stephanie Liu, David S. Mejia, Rob Patro, Stephen R. Piccolo, Ariel Rodriguez Romero, Hirak Sarkar, Candace L. Savonen, Jaclyn N. Taroni, William E. Vauclain, Deepashree Venkatesh Prasad, Kurt G. Wheeler. refine.bio: a resource of uniformly processed publicly available gene expression datasets.
URL: https://www.refine.bio

Note that the contributor list is in alphabetical order as we prepare a manuscript for submission.

BSD 3-Clause License