I have downloaded the copy number data for ovarian cancer cell lines. The table that I get has all of the genes listed as columns and the cell lines with the ACH designations as the rows. The problem is that the actual copy number value is the same for all genes across each row. Different rows for different cell lines have different values for cn, but for every cell line, the value is the same for all 15,000 or so genes.
Did I do something wrong? I notice this same problem when I download copy number data for all of the cell lines.
SPE
I tried reproducing this issue and I think I see what you mean: at first glance, all genes look like they have the same CN value:
> cn <- read.csv("~/Downloads/Copy_Number_22Q2_Public_subsetted.csv", row.names=1)
> cn <- as.matrix(cn)
> str(cn)
num [1:73, 1:24352] 1.168 1.001 1.038 0.823 1.159 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:73] "ACH-000520" "ACH-000657" "ACH-001278" "ACH-000713" ...
..$ : chr [1:24352] "DDX11L2" "WASH7P" "MIR6859.1" "MIR1302.2" ...
> cn[1,1:20]
DDX11L2 WASH7P MIR6859.1 MIR1302.2 FAM138A OR4F5 WASH9P MIR6859.2
1.168424 1.168424 1.168424 1.168424 1.168424 1.168424 1.168424 1.168424
OR4F29 OR4F16 LINC01409 FAM87B LINC01128 LINC00115 FAM41C LINC02593
1.168424 1.168424 1.168424 1.168424 1.168424 1.168424 1.168424 1.168424
SAMD11 NOC2L KLHL17 PLEKHN1
1.168424 1.168424 1.168424 1.168424
However, I don’t think the values are actually all the same.
> length(unique(cn[1,]))
[1] 482
It looks like there are 482 distinct CN values across the 24,352 genes in this cell line, so many genes share the same CN value.
This is actually not surprising: copy number amplifications and deletions typically span segments much larger than individual genes, so all genes that fall within the same segment get identical values.
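If it helps to see the pattern, here is a minimal sketch on a made-up toy matrix (not the real data). The cell line names, gene names, and values are all hypothetical, and it assumes the gene columns are in genomic order, which I have not verified for the downloaded file. It counts the distinct values per cell line and uses rle() to show the runs of adjacent genes that share a value:

```r
# Toy stand-in for `cn`: 2 cell lines x 8 genes, assumed to be in genomic order.
# All names and values below are invented for illustration only.
toy <- rbind(
  "ACH-A" = c(1.17, 1.17, 1.17, 0.82, 0.82, 1.17, 1.17, 2.05),
  "ACH-B" = c(1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.50, 0.50)
)
colnames(toy) <- paste0("gene", 1:8)

# Number of distinct CN values in each cell line (row)
n_distinct <- apply(toy, 1, function(x) length(unique(x)))

# Run-length encoding of one row: each run is a stretch of adjacent
# genes with an identical value, i.e. a putative shared segment
runs_a <- rle(unname(toy["ACH-A", ]))

print(n_distinct)
print(data.frame(value = runs_a$values, n_genes = runs_a$lengths))
```

On the real matrix, the same two calls applied to cn (for example rle(unname(cn[1, ]))) should show long runs of identical values rather than one value for the whole row.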
Thanks,
Phil