My unanswered questions on Solr

Prefix, wildcard, and fuzzy queries aren't analyzed, and thus won't match synonyms.
Why aren't prefix, wildcard, and fuzzy queries analyzed?

Searching for "wrong patient" brings up "wrong resident" at the top. Why?
How can I use a wildcard as the first character of the search term?
It seems that I have to convert the search term to lowercase before giving it to Solr. How can I work around this?
How can I search for an acronym that may be the same as a normal English word?
How can we give exact matches higher relevancy?
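
On the lowercase question: since wildcard, prefix, and fuzzy terms are not analyzed, a term like `Pat*` will not match an index that was lowercased at index time. One possible client-side workaround is to lowercase only those terms before sending the query. The sketch below is a rough illustration under that assumption; `lowercase_unanalyzed_terms` is a hypothetical helper, not part of Solr or any client library, and it only handles whitespace-separated terms (no quoted phrases or range queries). Newer Solr releases may already lowercase such terms on the server side, so this would only be needed where that is not available.

```python
import re

# Matches a wildcard character anywhere, or a trailing fuzzy marker
# such as "~", "~2" or "~0.8".
_UNANALYZED = re.compile(r"[*?]|~\d*(\.\d+)?$")


def lowercase_unanalyzed_terms(query: str) -> str:
    """Lowercase only wildcard, prefix, and fuzzy terms in a simple query.

    Hypothetical helper for illustration only.
    """
    out = []
    for token in query.split():
        # Boolean operators must stay upper case to be recognized.
        if token in ("AND", "OR", "NOT"):
            out.append(token)
            continue
        # Split off an optional field name, which is case-sensitive.
        field, sep, term = token.rpartition(":")
        if _UNANALYZED.search(term):
            term = term.lower()
        out.append(field + sep + term)
    return " ".join(out)


print(lowercase_unanalyzed_terms("Title:Pat* AND Resident"))
```

Terms without a wildcard or fuzzy marker are left alone, because Solr runs them through the field's analyzer anyway.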

What is an entity?
What is "field collapsing"?
What is "more like this"? How does it work? How can I configure it?
What is "faceting"? How does it work? How can I configure it?

To give an exact match (searching for a phrase) higher relevancy, perhaps we can give the first instance of word A a higher score, and give each subsequent match for word A a significantly lower score. Perhaps we should also limit the number of times any word can appear in a document.
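
The "first instance scores high, later instances score much less" idea is a diminishing-returns curve on term frequency. The sketch below compares a few such curves in plain Python so the shapes are easy to see. It is not Solr configuration: the function names and the `cap` and `k1` values are my own assumptions for illustration. As far as I know, Lucene's classic TF-IDF similarity uses the square root of the frequency, and the saturating form here is the BM25-style term with length normalization left out.

```python
import math


def raw_tf(freq: int) -> float:
    """Every occurrence counts the same (no damping)."""
    return float(freq)


def sqrt_tf(freq: int) -> float:
    """Square-root damping: each extra occurrence adds less."""
    return math.sqrt(freq)


def capped_tf(freq: int, cap: int = 3) -> float:
    """Hard limit: occurrences beyond `cap` are ignored (cap is arbitrary)."""
    return float(min(freq, cap))


def saturating_tf(freq: int, k1: float = 1.2) -> float:
    """BM25-style saturation without length normalization.

    Approaches k1 + 1 as freq grows, so it never exceeds that bound.
    """
    return freq * (k1 + 1) / (freq + k1)


for freq in (1, 2, 10, 100):
    print(freq, raw_tf(freq), round(sqrt_tf(freq), 2),
          capped_tf(freq), round(saturating_tf(freq), 2))
```

With the saturating curve, a document that repeats a word 100 times scores only a little more than twice as much as one that mentions it once, which is close to what the paragraph above is asking for.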

Does Solr have anything to detect keyword stuffing? What if a document contains just a single word? What if 90% of the document is the same word, and the other 10% is another word? What if 50% of the document is one word, and the other 50% contains other words? Obviously, if we have a small set of documents, we can review each one and throw out those that are not valid. But what if we have to index a large number of documents, perhaps from external sources where we don't have control over the content? If we search for a word that appears 10 times in document A and 100 times in document B, does that mean document B gets a higher relevancy score? Does it mean that document B is more valid? Not necessarily: document B might be artificially stuffed with this keyword.
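
Whether or not Solr offers something built in, one possible check that could run before indexing is to measure how much of a document is taken up by its single most frequent word. The sketch below is only an illustration of that idea; `dominant_term_share`, `looks_stuffed`, and the 0.5 threshold are my own assumptions, and a real check would probably need to ignore stop words and treat very short documents separately.

```python
import re
from collections import Counter


def dominant_term_share(text: str) -> float:
    """Fraction of the tokens in `text` taken up by its most frequent word."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens)


def looks_stuffed(text: str, max_share: float = 0.5) -> bool:
    """Flag a document whose top word exceeds `max_share` (arbitrary threshold)."""
    return dominant_term_share(text) > max_share


single_word = "solr"
ninety_ten = "solr " * 9 + "lucene"
half_half = "solr " * 5 + "one two three four five"

for doc in (single_word, ninety_ten, half_half):
    print(round(dominant_term_share(doc), 2), looks_stuffed(doc))
```

The three sample strings mirror the cases asked about above: a single-word document, a 90%/10% document, and a 50%/50% document. Flagged documents could be held back for manual review instead of being dropped outright.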

Why might we want to remove duplicate tokens (RemoveDuplicatesTokenFilterFactory)? Does "removing duplicate tokens" still maintain a count of how many times the token appears in the document?
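
As far as I understand it, this filter only drops a token when another token with the same text already exists at the same position in the token stream (for example, when a synonym and the original word stem to the same root). The plain-Python simulation below illustrates that reading; it is not Lucene code. The `(text, position_increment)` tuple representation and the function name are my own assumptions, where an increment of 0 means "same position as the previous token".

```python
def remove_same_position_duplicates(tokens):
    """Drop tokens whose text was already seen at the current position.

    `tokens` is a list of (text, position_increment) tuples.
    """
    kept = []
    seen_at_position = set()
    for text, pos_inc in tokens:
        if pos_inc > 0:
            # Moved on to a new position: forget what was seen before.
            seen_at_position = set()
        if text in seen_at_position:
            continue  # same text, same position: duplicate
        seen_at_position.add(text)
        kept.append((text, pos_inc))
    return kept


stream = [("run", 1), ("run", 0), ("fast", 1), ("run", 1)]
print(remove_same_position_duplicates(stream))
```

Under this reading, the second "run" (stacked on the first position) is removed, but the later "run" at its own position is kept, so repeated occurrences of a word in different places in the document would still be counted.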


In Clinical Cafe, members who have 'Other' as their profession can specify their profession in a text input field. This information is stored in vOtherProfession in the cc_member table. I am not sure if this information is carried over to Solr via DIH (maybe the problem was with writing the SQL for DIH). I am also not sure if the problem was with searching this data or with displaying it.

If the problem was with writing the SQL for DIH, I am not sure what we can do, or how to make this information searchable. For display, we can query the database (ideally with a single SQL call).

If the problem was not with writing the SQL for DIH, but with making this information searchable, then perhaps we need to review our setup. Perhaps we should copy this information to the 'text' field. Perhaps we should index this field. Perhaps we should copy both fields to another multiValued field, and search on that field.

Results from Solr should contain both fields. Our code can then check which one to display.
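
A minimal sketch of that display check, assuming each Solr result is available as a dictionary. The `profession` field name is hypothetical (I have not confirmed what the regular profession field is called in our schema), and I am assuming the free-text value is indexed under the same name as the database column, vOtherProfession.

```python
def profession_to_display(doc: dict) -> str:
    """Pick the profession text to show for one Solr result document.

    Assumes a hypothetical 'profession' field and a 'vOtherProfession'
    field; missing or empty values are treated as blank.
    """
    profession = (doc.get("profession") or "").strip()
    other = (doc.get("vOtherProfession") or "").strip()
    # Show the free-text value only when the member chose 'Other'
    # and actually typed something.
    if profession.lower() == "other" and other:
        return other
    return profession


print(profession_to_display({"profession": "Other", "vOtherProfession": "Dietitian"}))
print(profession_to_display({"profession": "Nurse"}))
```

If 'Other' was selected but nothing was typed, this falls back to showing 'Other' rather than an empty string.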
