Update:Remora/Search Revamp

Problem Definition

The current search is slow and tends to bring the database to its knees under load. It also does not answer queries in the most useful way (see the bugs listed below for details).

Search rewrite related bugs

3.4.3

  • bug 378657, Exact Name hits should sort first
  • bug 419057, Add support for querying by os and app version to search and recommended calls
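The ordering asked for in bug 378657 can be pictured as a two-part sort key: exact name match first, then whatever relevance score the search already produces. The following is only a minimal sketch, written in Python for brevity rather than in the site's own code; `rank_results` and the `name` and `score` fields are hypothetical names chosen for illustration.

```python
def rank_results(query, results):
    """Return results with exact (case-insensitive) name matches first.

    Each result is a dict with hypothetical 'name' and 'score' keys;
    within each group, higher scores come first.
    """
    wanted = query.strip().casefold()

    def sort_key(result):
        is_exact = result["name"].strip().casefold() == wanted
        # False sorts before True, so negate the exact-match flag;
        # negate the score so that larger scores sort earlier.
        return (not is_exact, -result["score"])

    return sorted(results, key=sort_key)
```

With a real search backend the same effect would more likely be achieved in the query itself; the sketch only shows the intended ordering.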

3.4

  • bug 400986, Optimize search performance
  • bug 433741, Search translation in current locale + en-US only
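One reading of bug 433741 is that a search should only look at translations in the user's current locale plus en-US, instead of every locale. A minimal Python sketch of that filter follows; the function name and the `(locale, text)` pair shape are assumptions made for illustration, not the actual schema.

```python
def searchable_translations(translations, current_locale, fallback="en-US"):
    """Keep only translations in the current locale or the fallback locale.

    `translations` is a list of (locale, text) pairs; this shape is
    hypothetical and stands in for whatever the real tables hold.
    """
    allowed = {current_locale, fallback}
    return [(locale, text) for locale, text in translations if locale in allowed]
```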

3.x triaged

  • bug 401849, Implement special search intersession for these terms


Plan of attack

Fix bug 378657, bug 419057, and bug 401849 now, in 3.4.3, with some possibly slipping to 3.4.4 if there is a 3.4.4. Possibly fix bug 433741 as well, depending on how easy it is and on the timing of the full-text search (FTS) solution.

Implement an FTS engine to solve bug 400986 and, in the long term, make bug 433741 redundant.

FTS Options

Sphinx

http://www.sphinxsearch.com/

Sphinx can build a full-text index (FTI) over existing InnoDB tables, among other data sources, and has a built-in PHP API.

One limitation is that, by default, Sphinx only does case folding for English and Russian characters; other languages would need a custom charset_table.

Zend_Search_Lucene

http://framework.zend.com/manual/en/zend.search.lucene.html

I'd recommend this over straight Lucene (http://lucene.apache.org/), as it is a native PHP implementation with a PHP API and therefore matches the team's skill set a little better (rather than having to code in Java and use the infamously buggy PHP/Java bridge).

A significant problem is noted in the manual (http://framework.zend.com/manual/en/zend.search.lucene.index-creation.html): according to the PHP documentation, "flock() will not work on NFS and many other networked file systems", so the manual warns: "Do not use networked file systems with Zend_Search_Lucene."

This seems to rule out ZSL.
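To make the flock() concern concrete, the sketch below shows the kind of advisory locking involved: a second attempt to lock the same file is refused while the first lock is held. It uses Python's `fcntl.flock` (Unix only) as a stand-in for PHP's flock(), and assumes the temporary file lives on a local file system; `try_exclusive_lock` and `demo` are hypothetical helper names. On NFS this guarantee is exactly what the quoted documentation says cannot be relied on.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Try to take an exclusive advisory lock on path without blocking.

    Returns the open file object on success (the lock is held until it is
    closed), or None if another open file description holds the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def demo():
    """Show that a second locker is refused while the first holds the lock."""
    fd, path = tempfile.mkstemp()  # assumed to be on a local file system
    os.close(fd)
    try:
        first = try_exclusive_lock(path)
        second = try_exclusive_lock(path)  # refused while first is held
        first.close()                      # closing the file releases the lock
        third = try_exclusive_lock(path)   # the lock can be taken again
        outcome = (first is not None, second is None, third is not None)
        if third is not None:
            third.close()
        return outcome
    finally:
        os.remove(path)
```

An index writer that depends on this behaviour has no protection against concurrent updates when the lock call silently does nothing, which is the risk the manual is warning about.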

MySQL native FTS

Requires the tables to be MyISAM, which is not ideal, but it is a possibility.
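For reference, the MySQL route would look roughly like the statements built below: a MyISAM table with a FULLTEXT index, queried with MATCH ... AGAINST. This is a hedged sketch in Python that only assembles the SQL strings; the table name `addon_search`, the columns `name` and `summary`, and the index name `ft_search` are all hypothetical, and nothing here connects to a database.

```python
def fulltext_statements(table="addon_search", columns=("name", "summary")):
    """Build MySQL statements for a MyISAM table with a FULLTEXT index.

    Returns (create_sql, search_sql). The search statement uses %s
    placeholders for the search terms, to be bound by a DB-API driver.
    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    column_list = ", ".join(columns)
    column_defs = ",\n    ".join(f"{c} TEXT" for c in columns)
    create_sql = (
        f"CREATE TABLE {table} (\n"
        f"    id INT UNSIGNED NOT NULL PRIMARY KEY,\n"
        f"    {column_defs},\n"
        f"    FULLTEXT KEY ft_search ({column_list})\n"
        f") ENGINE=MyISAM"
    )
    search_sql = (
        f"SELECT id, MATCH ({column_list}) AGAINST (%s) AS score\n"
        f"FROM {table}\n"
        f"WHERE MATCH ({column_list}) AGAINST (%s)\n"
        f"ORDER BY score DESC"
    )
    return create_sql, search_sql
```

Since the existing tables are InnoDB, one possible design would be to keep a separate MyISAM copy of the searchable text and leave the main tables alone; whether that is worth the synchronisation cost is an open question.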

Recommendations

  • Fix the smaller bugs in 3.4.3 and 3.4.4
  • Implement Sphinx before the final release of Fx3