Apache Solr is an open-source search platform built on Apache Lucene. Other platforms built on Lucene include Elasticsearch.
http://lucene.apache.org/solr/guide/7_5/solr-tutorial.html
Install Solr with:
tar xzf solr-7.5.0.tgz solr-7.5.0/bin/install_solr_service.sh --strip-components=2
sudo bash ./install_solr_service.sh solr-7.5.0.tgz
sudo service solr status
For my install, Solr lives in /opt/solr.
You can see the important directories for your install under http://localhost:8983/solr/#/~java-properties. For mine, they are:
- solr.install.dir: /opt/solr
- solr.default.confdir: /opt/solr/server/solr/configsets/_default/conf
- solr.solr.home: /var/solr/data
- solr.log.dir: /var/solr/logs
- user.dir: /var/solr
You can run Solr commands from the CLI inside the bin dir.
For example, you can check that Solr is running at localhost:8983/solr/ with the command below:
$ /opt/solr/bin/solr status
Found 1 Solr nodes:
Solr process 9799 running on port 8983
INFO - 2018-12-05 15:56:06.474; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
{
"solr_home":"/var/solr/data",
"version":"7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:55",
"startTime":"2018-12-05T21:36:58.410Z",
"uptime":"0 days, 1 hours, 19 minutes, 8 seconds",
"memory":"66.7 MB (%13.6) of 490.7 MB"}
Start Solr with:
/opt/solr/bin/solr start
$ ./solr start
*** [WARN] *** Your open file limit is currently 1024.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983 [|]
Started Solr server on port 8983 (pid=3342). Happy searching!
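As the warning above says, you should either raise the open file limit or disable the check. A minimal sketch, assuming a service install where the include file is /etc/default/solr.in.sh:
ulimit -n 65000 # raise the limit for the current shell before starting Solr
SOLR_ULIMIT_CHECKS=false # or add this line to /etc/default/solr.in.sh to silence the warning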
Stop Solr with:
/opt/solr/bin/solr stop -all # stops all ports
/opt/solr/bin/solr stop -p 8983 # stops a particular port
$ ./solr stop -p 8983
Sending stop command to Solr running on port 8983 ... waiting up to 180 seconds to allow Jetty process 3342 to stop gracefully.
You can find example documents under /opt/solr/example/exampledocs:
/opt/solr/example/exampledocs
$ ls
books.csv hd.xml manufacturers.xml monitor2.xml mp500.xml sd500.xml test_utf8.sh
books.json ipod_other.xml mem.xml monitor.xml post.jar solr-word.pdf utf8-example.xml
gb18030-example.xml ipod_video.xml money.xml more_books.jsonl sample.html solr.xml vidcard.xml
You can run one of the following examples:
techproducts
dih
schemaless
cloud
You can launch Solr with one of these examples, e.g.:
/opt/solr/bin/solr -e techproducts
$ /opt/solr/bin/solr -e techproducts
*** [WARN] *** Your open file limit is currently 1024.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
INFO - 2018-12-05 16:31:10.393; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Solr home directory /opt/solr/example/techproducts/solr already exists.
Starting up Solr on port 8983 using command:
"/opt/solr/bin/solr" start -p 8983 -s "/opt/solr/example/techproducts/solr"
Waiting up to 180 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=30485). Happy searching!
Created new core 'techproducts'
Indexing tech product example docs from /opt/solr/example/exampledocs
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/techproducts/update using content-type application/xml...
POSTing file mem.xml to [base]
POSTing file hd.xml to [base]
POSTing file monitor.xml to [base]
POSTing file utf8-example.xml to [base]
POSTing file sd500.xml to [base]
POSTing file manufacturers.xml to [base]
POSTing file gb18030-example.xml to [base]
POSTing file money.xml to [base]
POSTing file ipod_other.xml to [base]
POSTing file vidcard.xml to [base]
POSTing file solr.xml to [base]
POSTing file ipod_video.xml to [base]
POSTing file monitor2.xml to [base]
POSTing file mp500.xml to [base]
14 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/techproducts/update...
Time spent: 0:00:01.391
Solr techproducts example launched successfully. Direct your Web browser to http://localhost:8983/solr to visit the Solr Admin UI
Under ‘Core Admin’, you’ll now see a techproducts core.
If you don’t start Solr with an example configuration, you can set up your own core:
/opt/solr/bin/solr create -c mycorename
$ ./solr create -c wills_core
WARNING: Using _default configset with data driven schema functionality. NOT RECOMMENDED for production use.
To turn off: bin/solr config -c wills_core -p 8983 -action set-user-property -property update.autoCreateFields -value false
INFO - 2018-12-05 16:36:15.606; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Created new core 'wills_core'
You’ll then have:
instanceDir:/opt/solr/server/solr/wills_core
dataDir:/opt/solr/server/solr/wills_core/data/
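You can verify the new core from the CLI too. A sketch using the CoreAdmin API (wills_core is just my core name):
$ curl "http://localhost:8983/solr/admin/cores?action=STATUS&core=wills_core"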
The directory structure will look something like this:
/opt/solr-7.5.0/server/solr/
▸ configsets/
▾ wills_core/
▾ conf/
▾ lang/
contractions_ca.txt
contractions_fr.txt
contractions_ga.txt
contractions_it.txt
hyphenations_ga.txt
stemdict_nl.txt
stoptags_ja.txt
stopwords_ar.txt
stopwords_bg.txt
stopwords_ca.txt
stopwords_cz.txt
stopwords_da.txt
stopwords_de.txt
stopwords_el.txt
stopwords_en.txt
stopwords_es.txt
stopwords_eu.txt
stopwords_fa.txt
stopwords_fi.txt
stopwords_fr.txt
stopwords_ga.txt
stopwords_gl.txt
stopwords_hi.txt
stopwords_hu.txt
stopwords_hy.txt
stopwords_id.txt
stopwords_it.txt
stopwords_ja.txt
stopwords_lv.txt
stopwords_nl.txt
stopwords_no.txt
stopwords_pt.txt
stopwords_ro.txt
stopwords_ru.txt
stopwords_sv.txt
stopwords_th.txt
stopwords_tr.txt
userdict_ja.txt
managed-schema
params.json
protwords.txt
solrconfig.xml
stopwords.txt
synonyms.txt
▾ data/
▾ index/
segments_1
write.lock
▾ snapshot_metadata/
▾ tlog/
core.properties
README.txt
solr.xml
zoo.cfg
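The core.properties file is what marks a directory as a core; Solr discovers cores by scanning for it. For a simple core it can be as small as this one-line sketch:
name=wills_core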
If you created a core and want to delete it, you can run:
$ ./solr delete -c new_core_1
INFO - 2018-12-05 16:52:45.286; org.apache.solr.util.configuration.SSLCredentialProviderFactory; Processing SSL Credential Provider chain: env;sysprop
Deleting core 'new_core_1' using command:
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=new_core_1&deleteIndex=true&deleteDataDir=true&deleteInstanceDir=true
Here’s an example XML Solr document. Notice that each document consists of multiple fields, each with a name and a value.
<add><doc>
<field name="id">MA147LL/A</field>
<field name="name">Apple 60 GB iPod with Video Playback Black</field>
<field name="manu">Apple Computer Inc.</field>
<!-- Join -->
<field name="manu_id_s">apple</field>
<field name="cat">electronics</field>
<field name="cat">music</field>
<field name="features">iTunes, Podcasts, Audiobooks</field>
<field name="features">Stores up to 15,000 songs, 25,000 photos, or 150 hours of video</field>
<field name="features">2.5-inch, 320x240 color TFT LCD display with LED backlight</field>
<field name="features">Up to 20 hours of battery life</field>
<field name="features">Plays AAC, MP3, WAV, AIFF, Audible, Apple Lossless, H.264 video</field>
<field name="features">Notes, Calendar, Phone book, Hold button, Date display, Photo wallet, Built-in games, JPEG photo playback, Upgradeable firmware, USB 2.0 compatibility, Playback speed control, Rechargeable capability, Battery level indication</field>
<field name="includes">earbud headphones, USB cable</field>
<field name="weight">5.5</field>
<field name="price">399.00</field>
<field name="popularity">10</field>
<field name="inStock">true</field>
<!-- Dodge City store -->
<field name="store">37.7752,-100.0232</field>
<field name="manufacturedate_dt">2005-10-12T08:00:00Z</field>
</doc></add>
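You could index a document like this yourself by POSTing it to a core’s update handler with curl. A sketch, assuming the XML above is saved as ipod.xml (commit=true makes the change immediately searchable):
$ curl -X POST -H "Content-Type: text/xml" --data-binary @ipod.xml "http://localhost:8983/solr/techproducts/update?commit=true"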
You can see where logs are stored in the main admin UI. On mine they are in /var/solr/logs:
/var/solr/logs# tail solr.log
2018-12-05 23:31:03.926 INFO (ShutdownMonitor) [ ] o.e.j.s.AbstractConnector Stopped ServerConnector@12bfd80d{HTTP/1.1,[http/1.1]}{0.0.0.0:8983}
2018-12-05 23:31:03.926 INFO (ShutdownMonitor) [ ] o.e.j.s.session node0 Stopped scavenging
2018-12-05 23:31:03.930 INFO (ShutdownMonitor) [ ] o.a.s.c.CoreContainer Shutting down CoreContainer instance=1095273238
2018-12-05 23:31:03.931 INFO (ShutdownMonitor) [ ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.node, tag=null
2018-12-05 23:31:03.932 INFO (ShutdownMonitor) [ ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@58fb7731: rootName = null, domain = solr.node, service url = null, agent id = null] for registry solr.node / com.codahale.metrics.MetricRegistry@43a94c48
2018-12-05 23:31:03.945 INFO (ShutdownMonitor) [ ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.jvm, tag=null
2018-12-05 23:31:03.945 INFO (ShutdownMonitor) [ ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@1a45193b: rootName = null, domain = solr.jvm, service url = null, agent id = null] for registry solr.jvm / com.codahale.metrics.MetricRegistry@4e2f501e
2018-12-05 23:31:03.948 INFO (ShutdownMonitor) [ ] o.a.s.m.SolrMetricManager Closing metric reporters for registry=solr.jetty, tag=null
2018-12-05 23:31:03.948 INFO (ShutdownMonitor) [ ] o.a.s.m.r.SolrJmxReporter Closing reporter [org.apache.solr.metrics.reporters.SolrJmxReporter@38f116f6: rootName = null, domain = solr.jetty, service url = null, agent id = null] for registry solr.jetty / com.codahale.metrics.MetricRegistry@735eea59
2018-12-05 23:31:03.953 INFO (ShutdownMonitor) [ ] o.e.j.s.h.ContextHandler Stopped o.e.j.w.WebAppContext@37271612{/solr,null,UNAVAILABLE}{/opt/solr-7.5.0/server/solr-webapp/webapp}
Start Solr as a two-node cluster (both nodes on the same machine) and create a collection during startup:
./bin/solr start -e cloud
Note that since we didn’t specify a ZooKeeper cluster, Solr launches its own ZooKeeper and connects both nodes to it.
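If you already run your own ZooKeeper ensemble, you can point Solr at it instead of the embedded one. A sketch (the -z host and port are placeholders for your ensemble):
$ /opt/solr/bin/solr start -c -p 8983 -z zk1.example.com:2181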
You’ll get a prompt for setting up your cluster. By default, we’ll have:
- 8983 for the first node
- 7574 for the second node
We’ll also name the collection. The default is gettingstarted, but the tutorial says techproducts, so we’ll use that. Normally you can see collections under the web GUI like:
http://localhost:8983/solr/#/~collections/techproducts
We’ll specify the number of shards (how you want to split your index). Two means we’ll split the index relatively evenly across the two nodes we started above.
A replica is a copy of the index that’s used for failover.
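Outside the interactive prompt, you can create a collection with a given shard and replica layout directly. A sketch (the collection name is just an example):
$ /opt/solr/bin/solr create -c mycollection -shards 2 -replicationFactor 2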
Solr has two main configuration files:
- the schema file (managed-schema or schema.xml)
- solrconfig.xml
During initial setup, you’ll be asked whether you want _default (bare bones) or sample_techproducts_configs (we’ll want the latter).
You’ll see your configs over at:
http://localhost:8983/solr/techproducts/config
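The Config API can also return just one section of the configuration. For example, to fetch only the request handlers (a sketch, assuming the techproducts collection from above):
$ curl "http://localhost:8983/solr/techproducts/config/requestHandler"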
A SolrCloud example will be running at:
http://localhost:8983/solr
http://localhost:8983/solr/#/
In the ‘Cloud’ tab under ‘Graph’, you can see techproducts appear in two shards, with each shard having a replica on each server and port.
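You can also check the collection’s shards and replicas from the CLI with the healthcheck command. A sketch (the cloud example’s embedded ZooKeeper normally listens on port 9983; adjust if yours differs):
$ /opt/solr/bin/solr healthcheck -c techproducts -z localhost:9983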
To get familiar with Solr, we’ll index the example documents and then query them.
To insert data, you’ll use the bin/post tool, which helps index various types of documents:
/opt/solr/bin/post -c techproducts /opt/solr/example/exampledocs/*
We’re basically doing a POST with all this data here:
/opt/solr/example/exampledocs
$ ls
books.csv hd.xml manufacturers.xml monitor2.xml mp500.xml sd500.xml test_utf8.sh
books.json ipod_other.xml mem.xml monitor.xml post.jar solr-word.pdf utf8-example.xml
gb18030-example.xml ipod_video.xml money.xml more_books.jsonl sample.html solr.xml vidcard.xml
I created my own core wills_core, so my command is:
$ /opt/solr/bin/post -c wills_core /opt/solr/example/exampledocs/*
The results look like:
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/wills_core/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file books.csv (text/csv) to [base]
POSTing file books.json (application/json) to [base]/json/docs
POSTing file gb18030-example.xml (application/xml) to [base]
POSTing file hd.xml (application/xml) to [base]
POSTing file ipod_other.xml (application/xml) to [base]
POSTing file ipod_video.xml (application/xml) to [base]
POSTing file manufacturers.xml (application/xml) to [base]
POSTing file mem.xml (application/xml) to [base]
POSTing file money.xml (application/xml) to [base]
POSTing file monitor2.xml (application/xml) to [base]
POSTing file monitor.xml (application/xml) to [base]
POSTing file more_books.jsonl (application/json) to [base]/json/docs
POSTing file mp500.xml (application/xml) to [base]
POSTing file post.jar (application/octet-stream) to [base]/extract
POSTing file sample.html (text/html) to [base]/extract
POSTing file sd500.xml (application/xml) to [base]
POSTing file solr-word.pdf (application/pdf) to [base]/extract
POSTing file solr.xml (application/xml) to [base]
POSTing file test_utf8.sh (application/octet-stream) to [base]/extract
POSTing file utf8-example.xml (application/xml) to [base]
POSTing file vidcard.xml (application/xml) to [base]
21 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/wills_core/update...
Time spent: 0:00:04.749
We now have data in our core!
So if we have a core named wills_core, we can query it through the GUI at:
http://localhost:8983/solr/#/wills_core/query
Click on the ‘Execute’ button and you’ll get 10 results. The command we’re sending is a GET with a query parameter q of *:*, which returns all documents:
$ curl "http://localhost:8983/solr/wills_core/select?q=*:*"
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"*:*"}},
"response":{"numFound":52,"start":0,"docs":[
{
"id":"0553573403",
"cat":["book"],
"name":["A Game of Thrones"],
"price":[7.99],
"inStock":[true],
"author":["George R.R. Martin"],
"series_t":"A Song of Ice and Fire",
"sequence_i":1,
"genre_s":"fantasy",
"_version_":1619510392534335488},
{
"id":"0553579908",
"cat":["book"],
"name":["A Clash of Kings"],
"price":[7.99],
"inStock":[true],
"author":["George R.R. Martin"],
"series_t":"A Song of Ice and Fire",
"sequence_i":2,
"genre_s":"fantasy",
"_version_":1619510392599347200},
{
"id":"055357342X",
"cat":["book"],
"name":["A Storm of Swords"],
"price":[7.99],
"inStock":[true],
"author":["George R.R. Martin"],
"series_t":"A Song of Ice and Fire",
"sequence_i":3,
"genre_s":"fantasy",
"_version_":1619510392601444352},
{
"id":"0553293354",
"cat":["book"],
"name":["Foundation"],
"price":[7.99],
"inStock":[true],
"author":["Isaac Asimov"],
"series_t":"Foundation Novels",
"sequence_i":1,
"genre_s":"scifi",
"_version_":1619510392602492928},
{
"id":"0812521390",
"cat":["book"],
"name":["The Black Company"],
"price":[6.99],
"inStock":[false],
"author":["Glen Cook"],
"series_t":"The Chronicles of The Black Company",
"sequence_i":1,
"genre_s":"fantasy",
"_version_":1619510392603541504},
{
"id":"0812550706",
"cat":["book"],
"name":["Ender's Game"],
"price":[6.99],
"inStock":[true],
"author":["Orson Scott Card"],
"series_t":"Ender",
"sequence_i":1,
"genre_s":"scifi",
"_version_":1619510392604590080},
{
"id":"0441385532",
"cat":["book"],
"name":["Jhereg"],
"price":[7.95],
"inStock":[false],
"author":["Steven Brust"],
"series_t":"Vlad Taltos",
"sequence_i":1,
"genre_s":"fantasy",
"_version_":1619510392605638656},
{
"id":"0380014300",
"cat":["book"],
"name":["Nine Princes In Amber"],
"price":[6.99],
"inStock":[true],
"author":["Roger Zelazny"],
"series_t":"the Chronicles of Amber",
"sequence_i":1,
"genre_s":"fantasy",
"_version_":1619510392607735808},
{
"id":"0805080481",
"cat":["book"],
"name":["The Book of Three"],
"price":[5.99],
"inStock":[true],
"author":["Lloyd Alexander"],
"series_t":"The Chronicles of Prydain",
"sequence_i":1,
"genre_s":"fantasy",
"_version_":1619510392608784384},
{
"id":"080508049X",
"cat":["book"],
"name":["The Black Cauldron"],
"price":[5.99],
"inStock":[true],
"author":["Lloyd Alexander"],
"series_t":"The Chronicles of Prydain",
"sequence_i":2,
"genre_s":"fantasy",
"_version_":1619510392609832960}]
}}
You can modify the query parameters and search for a specific term, say foundation:
$ curl "http://localhost:8983/solr/wills_core/select?q=foundation"
{
"responseHeader":{
"status":0,
"QTime":14,
"params":{
"q":"foundation"}},
"response":{"numFound":1,"start":0,"docs":[
{
"id":"/opt/solr/example/exampledocs/test_utf8.sh",
"stream_size":[3742],
"x_parsed_by":["org.apache.tika.parser.DefaultParser",
"org.apache.tika.parser.txt.TXTParser"],
"stream_content_type":["application/octet-stream"],
"content_encoding":["ISO-8859-1"],
"resourcename":["/opt/solr/example/exampledocs/test_utf8.sh"],
"content_type":["application/x-sh; charset=ISO-8859-1"],
"_version_":1619510396080619520}]
}}
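You can also scope a query to a specific field with field:value syntax. A sketch based on the books data indexed above, which has a name field:
$ curl "http://localhost:8983/solr/wills_core/select?q=name:Foundation"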
An example GET is: curl "http://localhost:8983/solr/wills_core/select?q=foundation"
The common query parameters are documented at: https://lucene.apache.org/solr/guide/7_6/common-query-parameters.html
Queries are built up with the following query parameters:
- Request handler (qt) - e.g. /select or /update. The implicit Request Handlers are documented here: https://lucene.apache.org/solr/guide/7_6/implicit-requesthandlers.html
- Query parser (defType) - by default, the Standard Query Parser is used (defType=lucene)
- Query (q) - the main query string itself
- Sort (sort) - arranges search results in either ascending (asc) or descending (desc) order. Solr can sort on a lot of different things (document scores, function results, etc.), e.g. score desc, div(popularity,price) desc, inStock desc, price asc. Multiple sort orderings can be separated by a comma, e.g. sort=<field name> <direction>,<field name> <direction>
- Offset (start) - causes Solr to skip over the preceding records and start at the document identified by the offset
- Rows (rows) - can be used to paginate results from a query. By default, Solr returns 10 documents at a time in response to a query
- Filter query (fq) - restricts what documents are returned, e.g. fq=popularity:[10 TO *]&fq=section:0. fq can be specified multiple times
- Field list (fl) - limits the list of fields returned in the response. E.g. id,name,price returns only the id, name, and price fields. Use * to return all the stored fields in each document (i.e. those with useDocValueAsStored="true"). Functions can be computed for each document (e.g. fl=id,title,product(price,popularity)). Field name aliases change the key used in the response by prefixing the field with a display name, e.g. fl=id,sales_price:price,secret_sauce:prod(price,popularity),why_score:[explain style=nl] - here the new name secret_sauce is the product of price and popularity
- Debug (debug) - can be specified multiple times and supports the following arguments: debug=query for info about the query, debug=timing for how long the query took, debug=results for scoring info about the results, and debug=all for everything
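Putting a few of these together, here’s a sketch against my wills_core core (field names like genre_s, inStock, and sequence_i come from the books data indexed earlier):
$ curl "http://localhost:8983/solr/wills_core/select?q=genre_s:fantasy&fq=inStock:true&sort=sequence_i+asc&start=0&rows=5&fl=id,name,price"
This should return the first five in-stock fantasy books ordered by their sequence in the series, with only the id, name, and price fields in each document.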