An API sounds like the only real option. Flat files that big will be pretty difficult to parse and extract information from without specialized tools. Additionally, for certain genes you may want to go much deeper than the top 100 genes/features. Could you make an API that exposes:
1. Identifying all the features that are significantly associated with a given feature, with the ability to change the filtering criteria:
   - P-value filter
   - q-value (corrected p-value) filter
   - feature types included (dependencies, RNA expression, mutation, etc.)
2. A way to use number 1 for a batch of genes, using a uniform set of filters, to generate a network.
The results would come out as a CSV with the columns:
geneDep1,feature1,stat(to give direction of relationship),p-value,q-value
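
To make the request concrete, here is a rough Python sketch of the behavior I have in mind. It is only an illustration: the `Association` record, the function names (`associated_features`, `network`, `to_csv`), and the parameter names (`max_p`, `max_q`, `feature_types`) are all made up for this example and are not part of any existing API. It also assumes the precomputed associations fit in an in-memory list, which a real backend would likely replace with a database query.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Set


@dataclass(frozen=True)
class Association:
    """One precomputed association row (hypothetical schema)."""
    gene_dep: str       # the dependency being queried
    feature: str        # the associated feature
    feature_type: str   # e.g. "dependency", "expression", "mutation"
    stat: float         # signed statistic, gives direction of relationship
    p_value: float
    q_value: float      # corrected p-value


def associated_features(
    rows: Sequence[Association],
    gene: str,
    max_p: Optional[float] = None,
    max_q: Optional[float] = None,
    feature_types: Optional[Set[str]] = None,
) -> List[Association]:
    """Request 1: all features associated with `gene` that pass the filters."""
    hits = [
        r for r in rows
        if r.gene_dep == gene
        and (max_p is None or r.p_value <= max_p)
        and (max_q is None or r.q_value <= max_q)
        and (feature_types is None or r.feature_type in feature_types)
    ]
    # Most significant first; there is no top-N cutoff.
    return sorted(hits, key=lambda r: r.p_value)


def network(
    rows: Sequence[Association],
    genes: Iterable[str],
    **filters,
) -> List[Association]:
    """Request 2: apply the same filters to many genes and pool the edges."""
    edges: List[Association] = []
    for gene in genes:
        edges.extend(associated_features(rows, gene, **filters))
    return edges


def to_csv(edges: Iterable[Association]) -> str:
    """Serialize edges using the column layout proposed above."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["geneDep1", "feature1", "stat", "p-value", "q-value"])
    for e in edges:
        writer.writerow([e.gene_dep, e.feature, e.stat, e.p_value, e.q_value])
    return buf.getvalue()
```

Something like `to_csv(network(rows, ["GENE_A", "GENE_B"], max_q=0.05))` would then give an edge list that can be loaded straight into a network tool. The gene names and the 0.05 threshold here are placeholders only.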