Reverse Homology

Q: Where can I find more details about the underlying method?

A: Check out our paper in PLOS Computational Biology. Briefly, the model underlying the backend is trained to identify features in IDR sequences that are conserved across evolution. By applying neural network interpretation tools to the trained model, we can produce visualizations of the features that our model hypothesizes are conserved (and thus important to function) in a single input sequence. If you get any useful results, you can cite us using that paper.

Q: Why can't I input sequences longer than 256 residues?

A: Our neural network accepts sequences of at most 256 residues. Its architecture does not currently model long-range interactions between distant residues, so inputting a longer unbroken sequence would not change the outputs. We recommend either breaking up your sequence into multiple windows of up to 256 residues each, or just picking the window you think is most interesting.
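
As an illustration of the windowing suggestion, here is a minimal Python sketch that splits a long sequence into consecutive, non-overlapping windows of at most 256 residues. The function name `split_into_windows` is hypothetical and not part of our code base, and non-overlapping windows are only one possible choice; you may prefer overlapping windows so that a feature near a boundary is not cut in half.

```python
def split_into_windows(sequence, window=256):
    """Split a sequence into consecutive, non-overlapping windows.

    Every window has `window` residues except possibly the last,
    which holds whatever remains. (Hypothetical helper for illustration.)
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    return [sequence[i:i + window] for i in range(0, len(sequence), window)]


# Example: a 600-residue sequence yields windows of 256, 256, and 88 residues.
windows = split_into_windows("M" * 600)
print([len(w) for w in windows])  # [256, 256, 88]
```

Each window can then be submitted as a separate input.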

Q: My species is not listed under the models available. What do I do about this?

A: We plan to work on a more "universal" model that will incorporate IDRs from all species. For now, keep in mind that the models should generalize somewhat beyond a single species (for example, beyond strictly human IDRs), because they are also trained on homologs of these IDRs from other species. We advise choosing a model close to the species you want to analyze: for example, for vertebrates, we expect the human IDR model will generalize.

Q: I'm getting an error that the sequence must only contain single letter codes for canonical amino acids.

A: Any letters beyond the 20 canonical amino acids are not accepted: this includes non-canonical codes such as Z or B, and unknowns such as X. You may also want to check your input to make sure it does not contain any symbols or numbers.
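
If it helps to track down the offending characters before submitting, the following Python sketch lists every character that is not one of the 20 canonical one-letter codes, together with its position. The name `find_invalid_characters` is hypothetical, and the sketch assumes uppercase input; it is not the check our server runs, only a rough local equivalent.

```python
# The 20 canonical amino acids, as single-letter codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_characters(sequence):
    """Return (position, character) pairs for non-canonical characters.

    Positions are 1-based. Lowercase letters are reported as invalid,
    which is an assumption of this sketch. (Hypothetical helper.)
    """
    return [
        (position, char)
        for position, char in enumerate(sequence, start=1)
        if char not in CANONICAL
    ]


# Example: X, Z, and a stray digit are reported with their positions.
print(find_invalid_characters("MKTAXZ1"))  # [(5, 'X'), (6, 'Z'), (7, '1')]
```

An empty list means the sequence contains only canonical single-letter codes.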

Q: My sequence is taking a long time to run!

A: There may be many sequences in our queue. You may want to consider working with our open-source code here instead.
