RT11

From CLAB

This page is for discussion of RT 11

GVRG: Build web-based search driven by any keyword from all available data

Problem

RT 10 describes a search function that is driven by gene ID. This project extends RT 10 to enable searching by any keyword or phrase (e.g. "isomerase") across all available data sources.

Design

  • Choose an index engine
    • Perhaps Swish-e? It also has Perl hooks.
    • I'm feeling pretty good about Solr so far. It's built on Lucene.
      • Looks like the GMOD Java folks have done some work for me: LuceGene
        • Uses Readseq, a rather powerful-looking Java sequence reader / reformatter.
        • EB-eye Search also has potential. I emailed their help desk asking if their content indexing source code is available. No response. I added a complaint to the LuceGene homepage.
        • Also see LuceGene homepage for Lucene search at Uniprot.
      • Hmm... Lucene doesn't natively do numeric fields? eek!
        • But Solr does. phew!
    • Other Bio search engines
  • Write programs to build the index from all available data sources (LuceGene has done the hard work already?)
    • GenBank: /home/jhannah/apache2/htdocs/gbrowse/databases/misc/*gbk
    • "10 cluster RNA I+II+III PROTECTED pwdchip 04-25-06.xls"
  • Extend this tool to use the search engine if the search string isn't a gene ID: http://kiran.homelinux.net:8081/cgi-bin/gvrg.pl
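
A rough sketch of the last two bullets, just to pin down the intended behaviour: build a keyword index from the data sources, and only fall back to it when the search string isn't a gene ID. This is not the real tool (gvrg.pl is a Perl CGI script) and not Solr or LuceGene; it is a tiny in-memory inverted index in Python. The record IDs, the descriptions, and the names RECORDS, tokenize, build_index and search are all made up for illustration. The real gene ID format is not assumed here: a query counts as a gene ID only if it exactly matches a known record ID.

```python
import re
from collections import defaultdict

# Hypothetical records standing in for parsed data sources (e.g. GenBank
# features). The IDs and descriptions are invented for illustration only.
RECORDS = {
    "GENE0001": "Peptidyl-prolyl cis-trans isomerase, putative",
    "GENE0002": "DNA polymerase catalytic subunit",
    "GENE0003": "Triosephosphate isomerase",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each token to the set of record IDs whose text contains it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for token in tokenize(text):
            index[token].add(record_id)
    return index


def search(query, records, index):
    """Return matching record IDs, sorted.

    An exact gene ID match is answered by direct lookup. Anything else is
    treated as keywords, and a record must contain every keyword.
    """
    query = query.strip()
    if query in records:
        return [query]
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


INDEX = build_index(RECORDS)
print(search("GENE0002", RECORDS, INDEX))   # gene ID path
print(search("isomerase", RECORDS, INDEX))  # keyword path
```

Note that a multi-word query here is an AND over its words, not a true phrase match; proper phrase queries, ranking and numeric fields are the parts a real engine such as Solr would be expected to handle.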