William Liu

Solr

Summary

Apache Solr is an open source search platform built on Apache Lucene. Elasticsearch is another search platform built on Lucene.

http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html

Install

Install Solr with:

My install lives in /opt/solr/.

Directory Structure

For my install, the directories in use are listed in the Admin UI under: http://localhost:8983/solr/#/~java-properties

Running Solr Commands

You can run Solr commands from the CLI inside the bin directory.

For example, you can check that Solr is running at localhost:8983/solr/ with the command below:

$ /opt/solr/bin/solr status

Found 1 Solr nodes: 

Solr process 9799 running on port 8983
INFO  - 2018-12-05 15:56:06.474; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
{
  "solr_home":"/var/solr/data",
  "version":"7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:55",
  "startTime":"2018-12-05T21:36:58.410Z",
  "uptime":"0 days, 1 hours, 19 minutes, 8 seconds",
  "memory":"66.7 MB (%13.6) of 490.7 MB"}

Solr Start and Stop

Start Solr with:

/opt/solr/bin/solr start

$ ./solr start
*** [WARN] *** Your open file limit is currently 1024.  
 It should be set to 65000 to avoid operational disruption. 
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983 [|]  
Started Solr server on port 8983 (pid=3342). Happy searching!

Stop Solr with:

/opt/solr/bin/solr stop -all  # stops all ports
/opt/solr/bin/solr stop -p 8983  # stops a particular port

$ ./solr stop -p 8983
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 3342 to stop gracefully.

Solr Prebuilt Examples

You can find examples under: /opt/solr/example/exampledocs.

/opt/solr/example/exampledocs
$ ls
books.csv            hd.xml          manufacturers.xml  monitor2.xml      mp500.xml    sd500.xml      test_utf8.sh
books.json           ipod_other.xml  mem.xml            monitor.xml       post.jar     solr-word.pdf  utf8-example.xml
gb18030-example.xml  ipod_video.xml  money.xml          more_books.jsonl  sample.html  solr.xml       vidcard.xml

You can run one of the following examples:

Solr (Example: techproducts)

You can launch Solr with one of the examples, say techproducts:

/opt/solr/bin/solr -e techproducts

$ /opt/solr/bin/solr -e techproducts
*** [WARN] *** Your open file limit is currently 1024.  
 It should be set to 65000 to avoid operational disruption. 
 If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
INFO  - 2018-12-05 16:31:10.393; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Solr home directory /opt/solr/example/techproducts/solr already exists.

Starting up Solr on port 8983 using command:
"/opt/solr/bin/solr" start -p 8983 -s "/opt/solr/example/techproducts/solr"

Waiting up to 180 seconds to see Solr running on port 8983 [/]  
Started Solr server on port 8983 (pid=30485). Happy searching!

    
Created new core 'techproducts'
Indexing tech product example docs from /opt/solr/example/exampledocs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...
POSTing file mem.xml to [base]
POSTing file hd.xml to [base]
POSTing file monitor.xml to [base]
POSTing file utf8-example.xml to [base]
POSTing file sd500.xml to [base]
POSTing file manufacturers.xml to [base]
POSTing file gb18030-example.xml to [base]
POSTing file money.xml to [base]
POSTing file ipod_other.xml to [base]
POSTing file vidcard.xml to [base]
POSTing file solr.xml to [base]
POSTing file ipod_video.xml to [base]
POSTing file monitor2.xml to [base]
POSTing file mp500.xml to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:01.391

Solr techproducts example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI

Under ‘Core Admin’, you’ll now see a core named techproducts.

Create a Core

If you don’t start Solr with an example configuration, you can set up your own core.

/opt/solr/bin/solr create -c mycorename

$ ./solr create -c wills_core
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
         To turn off: bin/solr config -c wills_core -p 8983 -action set-user-property -property update.autoCreateFields -value false
INFO  - 2018-12-05 16:36:15.606; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop

Created new core 'wills_core'

You’ll then have:

instanceDir:/opt/solr/server/solr/wills_core
dataDir:/opt/solr/server/solr/wills_core/data/

The directory structure will look roughly like this:

/opt/solr-7.5.0/server/solr/
▸ configsets/
▾ wills_core/
  ▾ conf/
    ▾ lang/
        contractions_ca.txt
        contractions_fr.txt
        contractions_ga.txt
        contractions_it.txt
        hyphenations_ga.txt
        stemdict_nl.txt
        stoptags_ja.txt
        stopwords_ar.txt
        stopwords_bg.txt
        stopwords_ca.txt
        stopwords_cz.txt
        stopwords_da.txt
        stopwords_de.txt
        stopwords_el.txt
        stopwords_en.txt
        stopwords_es.txt
        stopwords_eu.txt
        stopwords_fa.txt
        stopwords_fi.txt
        stopwords_fr.txt
        stopwords_ga.txt
        stopwords_gl.txt
        stopwords_hi.txt
        stopwords_hu.txt
        stopwords_hy.txt
        stopwords_id.txt
        stopwords_it.txt
        stopwords_ja.txt
        stopwords_lv.txt
        stopwords_nl.txt
        stopwords_no.txt
        stopwords_pt.txt
        stopwords_ro.txt
        stopwords_ru.txt
        stopwords_sv.txt
        stopwords_th.txt
        stopwords_tr.txt
        userdict_ja.txt
      managed-schema
      params.json
      protwords.txt
      solrconfig.xml
      stopwords.txt
      synonyms.txt
  ▾ data/
    ▾ index/
        segments_1
        write.lock
    ▾ snapshot_metadata/
    ▾ tlog/
    core.properties
  README.txt
  solr.xml
  zoo.cfg

Delete a Core

If you created a core and want to delete it, you can run:

$ ./solr delete -c new_core_1
INFO  - 2018-12-05 16:52:45.286; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop

Deleting core 'new_core_1' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=new_core_1&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
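
The delete command above is just a call to the Core Admin API with the UNLOAD action. If you wanted to build that same URL from a script, a minimal Python sketch could look like this. The function name unload_core_url is made up for illustration, and the default base URL assumes the local install used in these notes. Note that actually requesting the URL removes the core's index, data dir, and instance dir.

```python
from urllib.parse import urlencode


def unload_core_url(core: str, base: str = "http://localhost:8983/solr") -> str:
    """Build the Core Admin UNLOAD URL that `solr delete -c <core>` printed above.

    Hypothetical helper: only the URL layout is taken from the output above.
    """
    params = {
        "action": "UNLOAD",
        "core": core,
        # These three flags ask Solr to remove the files as well as unload the core.
        "deleteIndex": "true",
        "deleteDataDir": "true",
        "deleteInstanceDir": "true",
    }
    return f"{base}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the URL only; nothing is sent to Solr.
    print(unload_core_url("new_core_1"))
```

For new_core_1 this produces the same URL that the delete command printed.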

ipod_video.xml

Here’s an example Solr document in XML. Notice that each document consists of multiple fields, each with a name and a value. A field name can repeat (see cat and features), which gives that field multiple values.

<add><doc>
  <field name="id">MA147LL/A</field>
  <field name="name">Apple 60 GB iPod with Video Playback Black</field>
  <field name="manu">Apple Computer Inc.</field>
  <!-- Join -->
  <field name="manu_id_s">apple</field>
  <field name="cat">electronics</field>
  <field name="cat">music</field>
  <field name="features">iTunes, Podcasts, Audiobooks</field>
  <field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
  <field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
  <field name="features">Up to 20 hours of battery life</field>
  <field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
  <field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
  <field name="includes">earbud headphones, USB cable</field>
  <field name="weight">5.5</field>
  <field name="price">399.00</field>
  <field name="popularity">10</field>
  <field name="inStock">true</field>
  <!-- Dodge City store -->
  <field name="store">37.7752,-100.0232</field>
  <field name="manufacturedate_dt">2005-10-12T08:00:00Z</field>
</doc></add>
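
To make the structure concrete, here is a small Python sketch that builds the same kind of add document from a dict, using only the standard library. The helper name build_add_xml and the dict layout are assumptions for illustration; a list value is written out as one field element per item, like the repeated cat fields above.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc: dict) -> str:
    """Serialize one document as <add><doc><field name="...">...</field></doc></add>."""
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        # A list becomes several <field> elements that share the same name.
        values = value if isinstance(value, list) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# A few fields copied from ipod_video.xml above; values are kept as strings.
example = {
    "id": "MA147LL/A",
    "name": "Apple 60 GB iPod with Video Playback Black",
    "cat": ["electronics", "music"],
    "price": "399.00",
    "inStock": "true",
}

if __name__ == "__main__":
    print(build_add_xml(example))
```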

Logs

You can see where logs are stored on the main page of the Admin UI. On mine they are in /var/solr/logs:

/var/solr/logs# tail solr.log
2018-12-05 23:31:03.926 INFO  (ShutdownMonitor) [   ] o.e.j.s.AbstractConnector Stopped ServerConnector@12bfd80d{HTTP/1.1,[http/1.1]}{0.0.0.0:8983}
2018-12-05 23:31:03.926 INFO  (ShutdownMonitor) [   ] o.e.j.s.session node0 Stopped scavenging
2018-12-05 23:31:03.930 INFO  (ShutdownMonitor) [   ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1095273238
2018-12-05 23:31:03.931 INFO  (ShutdownMonitor) [   ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.node, tag=null
2018-12-05 23:31:03.932 INFO  (ShutdownMonitor) [   ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@58fb7731: rootName = null, domain = solr.node, service url = null, agent id = null] for registry solr.node / com.codahale.metrics.MetricRegistry@43a94c48
2018-12-05 23:31:03.945 INFO  (ShutdownMonitor) [   ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.jvm, tag=null
2018-12-05 23:31:03.945 INFO  (ShutdownMonitor) [   ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@1a45193b: rootName = null, domain = solr.jvm, service url = null, agent id = null] for registry solr.jvm / com.codahale.metrics.MetricRegistry@4e2f501e
2018-12-05 23:31:03.948 INFO  (ShutdownMonitor) [   ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.jetty, tag=null
2018-12-05 23:31:03.948 INFO  (ShutdownMonitor) [   ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@38f116f6: rootName = null, domain = solr.jetty, service url = null, agent id = null] for registry solr.jetty / com.codahale.metrics.MetricRegistry@735eea59
2018-12-05 23:31:03.953 INFO  (ShutdownMonitor) [   ] o.e.j.s.h.ContextHandler Stopped o.e.j.w.WebAppContext@37271612{/solr,null,UNAVAILABLE}{/opt/solr-7.5.0/server/solr-webapp/webapp}

Solr in SolrCloud Mode

Start Solr as a two-node cluster (both nodes on the same machine) and create a collection during startup.

./bin/solr start -e cloud

Setup configs

Zookeeper

Note that since we didn’t specify a ZooKeeper cluster, Solr launches its own ZooKeeper and connects both nodes to it.

Ports

You’ll get a prompt for setting up your cluster. By default, we’ll have:

Collection

Next we name the collection. The default is gettingstarted, but the tutorial uses techproducts, so we’ll use that. Normally you can see collections in the web GUI at:

http://localhost:8983/solr/#/~collections/techproducts

Shards

We’ll specify the number of shards (how you want to split your index). Two means we’ll split relatively evenly across the two nodes we specified above.
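
To get a feel for what "split relatively evenly" means, here is a toy Python sketch that assigns document ids to two shards by hashing. This is only an illustration of the idea and is not Solr's routing code: the function toy_shard_for and the use of CRC32 are my own assumptions, and Solr's real router works differently in detail.

```python
import zlib
from collections import Counter


def toy_shard_for(doc_id: str, num_shards: int) -> int:
    """Toy routing rule: hash the id and take it modulo the shard count.

    Illustration only; not how Solr actually routes documents.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# A few ids taken from the example data used elsewhere in these notes.
ids = ["0553573403", "0553579908", "055357342X",
       "0553293354", "0812521390", "0812550706"]

if __name__ == "__main__":
    # Count how many ids land on each of the two shards.
    counts = Counter(toy_shard_for(doc_id, 2) for doc_id in ids)
    print(dict(counts))
```

The same id always maps to the same shard, and with enough documents a reasonable hash spreads them fairly evenly across shards.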

Replicas

A replica is a copy of the index that’s used for failover.

configSet

Solr has two main configuration files:

During initial setup, you’ll be asked whether you want _default (bare bones) or sample_techproducts_configs (we’ll want the latter).

You’ll see your configs over at:

http://localhost:8983/solr/techproducts/config

A SolrCloud example will be running at:

http://localhost:8983/solr

http://localhost:8983/solr/#/

In the Admin UI, under ‘Cloud’ > ‘Graph’, you can see techproducts split into two shards, with each shard having a replica on each server and port.

Tutorial

To get familiar with Solr, we’ll:

POST Data

To insert data, use the bin/post tool, which can index various types of documents:

/opt/solr/bin/post -c techproducts /opt/solr/example/exampledocs/*

We’re basically doing a POST with all this data here:

/opt/solr/example/exampledocs
$ ls
books.csv            hd.xml          manufacturers.xml  monitor2.xml      mp500.xml    sd500.xml      test_utf8.sh
books.json           ipod_other.xml  mem.xml            monitor.xml       post.jar     solr-word.pdf  utf8-example.xml
gb18030-example.xml  ipod_video.xml  money.xml          more_books.jsonl  sample.html  solr.xml       vidcard.xml

I created my own core wills_core so my command is:

$ /opt/solr/bin/post -c wills_core /opt/solr/example/exampledocs/*

The results look like:

SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/wills_core/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
POSTing file books.json (application/json) to [base]/json/docs
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file more_books.jsonl (application/json) to [base]/json/docs
POSTing file mp500.xml (application/xml) to [base]
POSTing file post.jar (application/octet-stream) to [base]/extract
POSTing file sample.html (text/html) to [base]/extract
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr-word.pdf (application/pdf) to [base]/extract
POSTing file solr.xml (application/xml) to [base]
POSTing file test_utf8.sh (application/octet-stream) to [base]/extract
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
21 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/wills_core/update...
Time spent: 0:00:04.749

We now have data in our core!
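
Notice in the log above that the post tool's auto mode sends different file types to different endpoints. Here is a small Python sketch of that mapping. It is inferred only from the log lines above, not from the tool's source, so treat the rule (and the helper name endpoint_for) as an assumption.

```python
from pathlib import PurePosixPath

# [base] URL as printed at the top of the post output above.
BASE = "http://localhost:8983/solr/wills_core/update"


def endpoint_for(filename: str, base: str = BASE) -> str:
    """Guess the endpoint a file is posted to, based on the log above."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix in (".xml", ".csv"):
        return base                  # e.g. books.csv, hd.xml -> [base]
    if suffix in (".json", ".jsonl"):
        return base + "/json/docs"   # e.g. books.json -> [base]/json/docs
    # Everything else in the log (pdf, html, jar, sh) went to /extract.
    return base + "/extract"


if __name__ == "__main__":
    for name in ["books.csv", "books.json", "solr-word.pdf", "test_utf8.sh"]:
        print(name, "->", endpoint_for(name))
```

This also explains why files like post.jar and test_utf8.sh ended up in the index: the wildcard matched them and they were sent to /extract like any other file.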

GET our data

So if we have a core named wills_core, we can query it here through the GUI:

http://localhost:8983/solr/#/wills_core/query

Click on the ‘Execute’ button and you’ll get 10 results. The request we’re sending is a GET with the query parameter q set to *:*, which matches all documents:

$ curl 'http://localhost:8983/solr/wills_core/select?q=*:*'

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*"}},
  "response":{"numFound":52,"start":0,"docs":[
      {
        "id":"0553573403",
        "cat":["book"],
        "name":["A Game of Thrones"],
        "price":[7.99],
        "inStock":[true],
        "author":["George R.R. Martin"],
        "series_t":"A Song of Ice and Fire",
        "sequence_i":1,
        "genre_s":"fantasy",
        "_version_":1619510392534335488},
      {
        "id":"0553579908",
        "cat":["book"],
        "name":["A Clash of Kings"],
        "price":[7.99],
        "inStock":[true],
        "author":["George R.R. Martin"],
        "series_t":"A Song of Ice and Fire",
        "sequence_i":2,
        "genre_s":"fantasy",
        "_version_":1619510392599347200},
      {
        "id":"055357342X",
        "cat":["book"],
        "name":["A Storm of Swords"],
        "price":[7.99],
        "inStock":[true],
        "author":["George R.R. Martin"],
        "series_t":"A Song of Ice and Fire",
        "sequence_i":3,
        "genre_s":"fantasy",
        "_version_":1619510392601444352},
      {
        "id":"0553293354",
        "cat":["book"],
        "name":["Foundation"],
        "price":[7.99],
        "inStock":[true],
        "author":["Isaac Asimov"],
        "series_t":"Foundation Novels",
        "sequence_i":1,
        "genre_s":"scifi",
        "_version_":1619510392602492928},
      {
        "id":"0812521390",
        "cat":["book"],
        "name":["The Black Company"],
        "price":[6.99],
        "inStock":[false],
        "author":["Glen Cook"],
        "series_t":"The Chronicles of The Black Company",
        "sequence_i":1,
        "genre_s":"fantasy",
        "_version_":1619510392603541504},
      {
        "id":"0812550706",
        "cat":["book"],
        "name":["Ender's Game"],
        "price":[6.99],
        "inStock":[true],
        "author":["Orson Scott Card"],
        "series_t":"Ender",
        "sequence_i":1,
        "genre_s":"scifi",
        "_version_":1619510392604590080},
      {
        "id":"0441385532",
        "cat":["book"],
        "name":["Jhereg"],
        "price":[7.95],
        "inStock":[false],
        "author":["Steven Brust"],
        "series_t":"Vlad Taltos",
        "sequence_i":1,
        "genre_s":"fantasy",
        "_version_":1619510392605638656},
      {
        "id":"0380014300",
        "cat":["book"],
        "name":["Nine Princes In Amber"],
        "price":[6.99],
        "inStock":[true],
        "author":["Roger Zelazny"],
        "series_t":"the Chronicles of Amber",
        "sequence_i":1,
        "genre_s":"fantasy",
        "_version_":1619510392607735808},
      {
        "id":"0805080481",
        "cat":["book"],
        "name":["The Book of Three"],
        "price":[5.99],
        "inStock":[true],
        "author":["Lloyd Alexander"],
        "series_t":"The Chronicles of Prydain",
        "sequence_i":1,
        "genre_s":"fantasy",
        "_version_":1619510392608784384},
      {
        "id":"080508049X",
        "cat":["book"],
        "name":["The Black Cauldron"],
        "price":[5.99],
        "inStock":[true],
        "author":["Lloyd Alexander"],
        "series_t":"The Chronicles of Prydain",
        "sequence_i":2,
        "genre_s":"fantasy",
        "_version_":1619510392609832960}]
  }}
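
Note that numFound is 52 but only 10 docs came back, since Solr returns 10 rows per request by default. To walk through everything you can page with the start and rows parameters. Here is a minimal Python sketch that only builds the page URLs; the function name page_urls and the defaults (my wills_core core on localhost) are assumptions for illustration.

```python
from urllib.parse import urlencode


def page_urls(num_found: int, rows: int = 10, core: str = "wills_core",
              base: str = "http://localhost:8983/solr") -> list:
    """Return one select URL per page needed to cover num_found documents."""
    urls = []
    for start in range(0, num_found, rows):
        # urlencode percent-encodes *:* so the URL is safe to paste into a shell.
        query = urlencode({"q": "*:*", "start": start, "rows": rows})
        urls.append(f"{base}/{core}/select?{query}")
    return urls


if __name__ == "__main__":
    urls = page_urls(52)
    print(len(urls))   # 6 pages: offsets 0, 10, 20, 30, 40, 50
    print(urls[-1])
```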

You can modify the query parameters and search for, say, just a specific term such as foundation:

$ curl 'http://localhost:8983/solr/wills_core/select?q=foundation'
{
  "responseHeader":{
    "status":0,
    "QTime":14,
    "params":{
      "q":"foundation"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"/opt/solr/example/exampledocs/test_utf8.sh",
        "stream_size":[3742],
        "x_parsed_by":["org.apache.tika.parser.DefaultParser",
          "org.apache.tika.parser.txt.TXTParser"],
        "stream_content_type":["application/octet-stream"],
        "content_encoding":["ISO-8859-1"],
        "resourcename":["/opt/solr/example/exampledocs/test_utf8.sh"],
        "content_type":["application/x-sh; charset=ISO-8859-1"],
        "_version_":1619510396080619520}]
  }}
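
With the standard query parser, a bare term like foundation is matched against the default search field, while the field:value form (for example name:foundation) targets one field. Here is a small Python sketch for building such select URLs with proper escaping. The helper select_url is hypothetical; q and fl are standard Solr query parameters, and I'm not showing results since they depend on your schema.

```python
from urllib.parse import urlencode


def select_url(q: str, core: str = "wills_core",
               base: str = "http://localhost:8983/solr", **params) -> str:
    """Build a /select URL, percent-encoding the query and any extra parameters."""
    query = urlencode({"q": q, **params})
    return f"{base}/{core}/select?{query}"


if __name__ == "__main__":
    # Bare term: searched in the default field.
    print(select_url("foundation"))
    # Fielded query, returning only the id and name fields via fl.
    print(select_url("name:foundation", fl="id,name"))
```

The colon and comma come out percent-encoded (%3A and %2C), which Solr decodes back on its side.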

GET Query Parameters

An example GET is: curl http://localhost:8983/solr/wills_core/select?q=foundation

https://lucene.apache.org/solr/guide/7_6/common-query-parameters.html

The following query parameters are built up with: