Sphinx UNIX socket patch

What is Sphinx

Sphinx is a full-text search engine distributed under the GPL version 2. A commercial license is also available for embedded use.

Generally, it is a standalone search engine meant to provide fast, size-efficient, and relevant full-text search to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. The built-in data sources currently support fetching data either through a direct connection to MySQL or PostgreSQL, or through an XML pipe mechanism (a pipe to indexer in a special XML-based format that Sphinx recognizes).

The Sphinx distribution includes the following programs:

  • indexer: a utility that creates full-text indices;
  • search: a simple (test) utility for querying full-text indices from the command line;
  • searchd: a daemon that lets external software search the full-text indices (Web scripts using the Sphinx API, MySQL with SphinxSE, or your application server);
  • sphinxapi: a set of API libraries for popular Web scripting languages (there are native API ports for PHP, Python, Java, Perl, and Ruby).

What is the patch for

The patch lets you communicate with the searchd daemon over UNIX sockets. Unpatched, searchd can only listen on a TCP port, which is not sufficient in some cases. The patch is for Sphinx release 0.9.8. It contains changes to both the searchd daemon and the Sphinx PHP API.

How to use the patch

  1. Download the Sphinx 0.9.8 source archive (sphinx-0.9.8.tar.gz).
  1. Extract the archive:
tar -zxvf sphinx-0.9.8.tar.gz
  1. Enter the newly created sphinx-0.9.8 directory and download the patch (sphinx-0.9.8-unix_socket.patch) into it.
  1. Apply the patch:
patch -p1 <sphinx-0.9.8-unix_socket.patch
  1. Compile and install patched version of Sphinx:
./configure
make
make install
  1. Edit your Sphinx configuration file, changing the address option (in the searchd section) to:
address                         = unix:///var/sphinx/sphinx_socket
  1. Start the searchd daemon, passing the path to your config file:
/usr/local/bin/searchd --config /var/sphinx/sphinx.conf
  1. Check netstat to verify that searchd is listening on the given unix socket:
netstat -a --unix

Active UNIX domain sockets (servers and established)
Proto RefCnt Flags       Type       State         I-Node Path
unix  2      [ ACC ]     STREAM     LISTENING     13079  /var/lib/mysql/mysql.sock
unix  3      [ ]         DGRAM                    12917  /dev/log
unix  2      [ ACC ]     STREAM     LISTENING     1532071 /var/sphinx/sphinx_socket
unix  2      [ ]         DGRAM                    13097  

As the output shows, the searchd daemon is listening on /var/sphinx/sphinx_socket.
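The same check can be scripted instead of read off the netstat output. The following is a minimal sketch, not part of the patch: the helper name is_listening_unix_socket is hypothetical, and the only thing taken from this page is the socket path. It confirms that the path is a socket file and that something accepts connections on it.

```python
import os
import socket
import stat


def is_listening_unix_socket(path, timeout=1.0):
    """Return True if `path` is a UNIX socket that accepts connections."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        return False  # path does not exist or is not accessible
    if not stat.S_ISSOCK(mode):
        return False  # a regular file or directory, not a socket
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        return True
    except OSError:
        return False  # stale socket file: nobody is listening
    finally:
        client.close()


if __name__ == "__main__":
    # Path taken from the configuration example on this page.
    print(is_listening_unix_socket("/var/sphinx/sphinx_socket"))
```

The probe connects and disconnects without speaking the searchd protocol, so it only tells you that a process is listening, not that the process is searchd.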

Example usage of UNIX sockets in Sphinx

Suppose you have two separate servers: a web server and a database server that also runs Sphinx (the searchd daemon). You want your PHP scripts to be able to retrieve data from the search engine. Normally, searchd listens on a TCP port, so the web server can connect to it remotely. The only drawback is that your data is sent unencrypted across the network.

This is where the Sphinx UNIX socket patch comes in handy. With searchd listening on a UNIX socket instead of a TCP port, you can easily set up an SSL tunnel connecting two UNIX sockets, one on each end (the web server and the Sphinx server).

The connection between the two sockets can be set up with one additional program, socat. Here's how to do it:

On the sphinx server

  1. Install the patched Sphinx on your database/Sphinx server (as shown above).
  1. Install socat and generate the SSL keys and certificates (a .pem and a .crt file for each machine).
  1. Run socat, specifying the paths to the previously generated server certificate files and to the UNIX socket:
nohup socat ssl-l:4444,reuseaddr,fork,cert=/path/to/SSL_certs/DIR/sphinx_server.pem,cafile=/path/to/SSL_certs/DIR/webserver.crt,verify=1 UNIX:/var/sphinx/sphinx_socket &

and make sure you can see the socat process running:

ps aux | grep socat

The server side is now ready to pass everything that arrives on port 4444 to the local UNIX socket, and thus to the searchd daemon.
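To make the forwarding step concrete, here is a rough sketch of what a TCP-to-UNIX-socket relay does. It is an illustration only, not socat's implementation: it leaves out SSL and certificate verification entirely, and the function names pump, relay, and serve are made up for this example.

```python
import socket
import threading


def pump(src, dst):
    """Copy bytes from src to dst until src reaches end of stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # the other direction already tore the connection down
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
        except OSError:
            pass


def relay(client, unix_path):
    """Bridge one accepted TCP connection to the UNIX socket at unix_path."""
    upstream = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        upstream.connect(unix_path)
        back = threading.Thread(target=pump, args=(upstream, client), daemon=True)
        back.start()
        pump(client, upstream)  # forward requests in this thread
        back.join()             # wait until the reply has been sent back
    except OSError:
        pass  # nothing is listening on the UNIX socket
    finally:
        client.close()
        upstream.close()


def serve(listener, unix_path):
    """Accept TCP connections forever, one relay thread per connection."""
    while True:
        client, _ = listener.accept()
        threading.Thread(target=relay, args=(client, unix_path), daemon=True).start()
```

In this sketch, serve would be given a TCP socket bound to port 4444 and the path /var/sphinx/sphinx_socket. Because the traffic is not encrypted here, socat with SSL remains the tool to use for the actual tunnel.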

On the web server

  1. Install socat and copy the previously generated keys to the web server.
  1. Run socat, specifying the paths to the SSL certificates, the Sphinx server's IP address, and the path at which socat should create a listening UNIX socket:
nohup socat UNIX-LISTEN:/var/socketproxy/searchd,reuseaddr,fork,user=apache ssl:sphinx_server_IP_addr:4444,cert=/path/to/SSL_certs/webserver.pem,cafile=/path/to/SSL_certs/sphinx_server.crt &

and make sure you can see the socat process running:

ps aux | grep socat
  1. Copy the previously patched sphinxapi.php to your web directory:
sphinx-0.9.8# cp api/sphinxapi.php /var/www/html/sphinx
  1. Now you can create a simple PHP script that uses sphinxapi, and it should be able to retrieve data from your Sphinx server.

The only thing to remember is that, when calling the SetServer function, you must pass the path to the UNIX socket with the unix:// prefix.

For example:

require("sphinxapi.php");

$sphinxClient = new SphinxClient();
$socket_path = "unix:///var/socketproxy/searchd";
$sphinxClient->SetServer($socket_path, 0);

Note: the second argument (the port number) is ignored when a UNIX socket is used.
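The prefix convention can be sketched as follows. This is an assumption about how a client might interpret the address, written in Python for illustration; it is not the code of the patched sphinxapi.php, and the helper names parse_server_address and connect are hypothetical.

```python
import socket

UNIX_PREFIX = "unix://"


def parse_server_address(host, port):
    """Split a SetServer-style address into (socket family, connect target)."""
    if host.startswith(UNIX_PREFIX):
        # Everything after the prefix is the socket path; the port is ignored.
        return socket.AF_UNIX, host[len(UNIX_PREFIX):]
    return socket.AF_INET, (host, port)


def connect(host, port, timeout=5.0):
    """Open a stream connection to either a UNIX socket or a TCP endpoint."""
    family, target = parse_server_address(host, port)
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    sock.connect(target)
    return sock
```

With this convention, connect("unix:///var/socketproxy/searchd", 0) opens the proxy socket created by socat, while an address without the prefix is still treated as a TCP host and port.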