Setting up NBoost for Elasticsearch

In this example we will set up a proxy to sit in between the client and Elasticsearch and boost the results!

Preliminaries

  1. Install NBoost for Pytorch.

    pip install nboost[pt]
    
  2. Set up an Elasticsearch Server

    🔔 If you already have an Elasticsearch server, you can skip this step!

    If you don’t have Elasticsearch, not to worry! You can set up a local Elasticsearch cluster by using docker. First, get the ES image by running:

    docker pull elasticsearch:7.4.2
    

    Once you have the image, you can run an Elasticsearch server via:

    docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.4.2
    
  3. Index some data

    NBoost has a handy indexing tool built in (nboost-index). For demonstration purposes, will be indexing a set of passages about traveling and hotels. You can add the index to your Elasticsearch server by running:

    travel.csv comes with NBoost

    nboost-index --file travel.csv --name travel --delim ,
    

Deploying the proxy

Now we’re ready to deploy our Neural Proxy! There are three ways to configure NBoost.

  1. Via command line.

    On the command line, we can run:

    nboost                                  \
        --uhost localhost                   \
        --uport 9200                        \
        --search_route "/<index>/_search"   \
        --query_path url.query.q            \
        --topk_path url.query.size          \
        --default_topk 10                   \
        --topn 50                           \
        --choices_path body.hits.hits       \
        --cvalues_path _source.passage
    

    📢 The --uhost and --uport should be the same as the Elasticsearch server above! Uhost and uport are short for upstream-host and upstream-port (referring to the upstream server).

    Now let’s test it out! Hit the Elasticsearch with:

    curl "http://localhost:8000/travel/_search?pretty&q=passage:vegas&size=2"
    
  2. Via json.

    On the command line, let’s run:

    nboost --search_route "/<index>/_search"
    

    In a python script, we can run:

    import requests
    from pprint import pprint
    
    response = requests.get(
        url='http://localhost:8000/travel/_search',
        json={
            'nboost': {
                'uhost': 'localhost',
                'uport': 9200,
                'query_path': 'body.query.match.passage',
                'topk_path': 'body.size',
                'default_topk': 10,
                'topn': 50,
                'choices_path': 'body.hits.hits',
                'cvalues_path': '_source.passage'
            },
            'size': 2,
            'query': {
                'match': {'passage': 'I want a Louisiana hotel with a pool'}
            }
        }
    )
    
    pprint(response.json())
    
  3. Via query params.

    On the command line, let’s run:

    nboost --search_route "/<index>/_search"
    

    In a python script, we can run:

    import requests
    from pprint import pprint
    
    response = requests.get(
        url='http://localhost:8000/travel/_search',
        params={
            'uhost': 'localhost',
            'uport': 9200,
            'query_path': 'url.query.q',
            'topk_path': 'url.query.size',
            'default_topk': 10,
            'topn': 50,
            'choices_path': 'body.hits.hits',
            'cvalues_path': '_source.passage',
            'q': 'passage:I want a vegas hotel with a pool',
            'size': 2
        }
    )
    
    pprint(response.json())
    

No matter how we configure NBoost, if the Elasticsearch result has the nboost tag in it, congratulations it’s working!

success installation of NBoost