Have you ever used Google, Yahoo, or MSN to search the internet for a web page, a friend, information about a place, images, and so on? Obviously yes! Each of those is a search engine.
A search engine is a web-based tool for locating information, such as web pages, newsgroups, programs, and images, in the huge body of content on the World Wide Web.
Major Components of Any Search Engine are
- Web Crawler (Spider/bots)
- Search Engine
How Search Engine Works/Mechanism (Crawl, Index, Ranking)
The primary goal of the search engine mechanism is to present highly relevant results for the query given by the user. The user enters a query in the search engine interface, and the search engine generates results by following the steps below:
- Crawl: The key operation in a search engine is crawling hundreds of billions of pages in a very short time. The search engine first navigates the web by downloading web pages and following the links on those pages to discover new pages.
- Index: The web pages found by the search engine are added to a data structure called an index. The index holds all the discovered URLs along with information such as keywords, content, and freshness.
- Ranking: When a user submits a search query, all of the relevant pages are first identified from the index, and then an algorithm assigns a rank to each of them. The quality of a search engine is judged by how relevant its top results are from the user's perspective. Different search engines use different algorithms, which is why the same query returns different results on each of them.
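To make the three steps concrete, here is a minimal sketch in Python. It is not how any real search engine is implemented: the "web" is a small hypothetical dictionary (`PAGES`) instead of real HTTP downloads, the index is a plain inverted index, and the ranking is just a count of matching terms. All names and URLs are made up for illustration.

```python
import re
from collections import Counter, defaultdict, deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example/home": ("search engines crawl the web",
                       ["a.example/index", "a.example/rank"]),
    "a.example/index": ("an index maps keywords to pages",
                        ["a.example/rank"]),
    "a.example/rank": ("ranking orders pages for a search query",
                       ["a.example/home"]),
}


def crawl(seed):
    """Crawl: fetch a page, then follow its links to discover new pages."""
    seen, queue, fetched = {seed}, deque([seed]), {}
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]  # stands in for downloading the page
        fetched[url] = text
        for link in links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


def build_index(fetched):
    """Index: map each term to the pages containing it (with counts)."""
    index = defaultdict(Counter)
    for url, text in fetched.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term][url] += 1
    return index


def search(index, query):
    """Ranking: score pages by matching term counts, best first."""
    scores = Counter()
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    return [url for url, _ in sorted(scores.items(),
                                     key=lambda kv: (-kv[1], kv[0]))]


fetched = crawl("a.example/home")
index = build_index(fetched)
results = search(index, "search query")
```

Real engines replace the term count with far more elaborate ranking algorithms, but the crawl, index, rank pipeline has the same shape.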
Apart from the search query, a search engine also uses other parameters, such as location, language, device, and previous search history, to give relevant results.
Some well-known search engines: Google, Yahoo, MSN, Bing, Ask, Alexa, Lycos, AOL Search, AltaVista, etc.
Google Search Engine Vs Own Search Engine Framework (Elasticsearch/Apache Solr)
The search engines above are finished products and are the best solutions for searching the huge body of content on the web. Apart from them, there are also search engine frameworks, such as Elasticsearch and Apache Solr, with which users can build their own search solution for a particular database.
With product search engines like Google or Yahoo, we don't need any coding or effort to index content or set up the UI. With Elasticsearch or Apache Solr, users have to put in the effort for indexing and UI design themselves.
The great advantage of a search engine framework is that it is very efficient and saves you the time of writing complicated SQL queries. Also, with Elasticsearch or Apache Solr we can not only search but also perform other data operations such as create, update, and delete, which is a big difference from product-based search engines like Google.
Elasticsearch is a real-time, distributed, open-source full-text search engine that is often used to power search in single-page applications. It is developed in Java and is a RESTful search engine built on top of the Apache Lucene library. It was introduced after Apache Solr. A key feature of Elasticsearch is multi-tenancy. Because it is open source, anyone can contribute, and Elastic reviews the contributions and accepts the ones it wants. Contributions go through strict review, which helps keep the quality high.
Major Features include:
- Distributed search
- An analyzer chain
- Analytical search
- Grouping & aggregation
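The "analyzer chain" item can be illustrated with a small conceptual sketch. In Elasticsearch, an analyzer is made up of character filters, a tokenizer, and token filters that run in sequence. The Python below only imitates that idea; it is not Elasticsearch code, and the function names and the stopword list are assumptions made for illustration.

```python
import re

# Illustrative stopword list only; real analyzers ship their own lists.
STOPWORDS = {"a", "an", "and", "the", "of", "to"}


def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: split the text into runs of letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens):
    """Token filter: normalize case so 'Search' matches 'search'."""
    return [t.lower() for t in tokens]


def remove_stopwords(tokens):
    """Token filter: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text):
    """Run the chain: character filter -> tokenizer -> token filters."""
    tokens = tokenize(strip_html(text))
    for token_filter in (lowercase, remove_stopwords):
        tokens = token_filter(tokens)
    return tokens


terms = analyze("<p>The Basics of Elasticsearch</p>")
```

The same chain is applied to documents when they are indexed and to queries when they are searched, so both sides end up with comparable terms.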
Let's take a very simple look at Elasticsearch with an example.
Installing Elasticsearch:
- Elasticsearch can be installed by following the guide at https://linuxize.com/post/how-to-install-elasticsearch-on-ubuntu-18-04/
- After installing Elasticsearch, we need to create an index and a type. An index can be compared to a MySQL database, a type to a table, and a document to a row (entry) in that table. Once documents are in place, we can perform search operations. For example, I created an index named blog with a type named post and added documents to it. Note that mapping types were deprecated in Elasticsearch 7 and removed in Elasticsearch 8, so recent versions work with indices and documents only.
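As a rough sketch of what the create, update, search, and delete operations look like over the REST API, the Python below builds the HTTP requests for the blog index without sending them. It assumes a local node at `http://localhost:9200` and the typeless `_doc` endpoints of recent Elasticsearch versions; on older releases that still have types, the document path would include the type name (for example `/blog/post/1`). The helper `es_request` and the document fields are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local single-node install


def es_request(method, path, body=None):
    """Build (but do not send) a JSON request for the REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Create the "blog" index.
create_index = es_request("PUT", "/blog")

# Create a document with id 1 (the fields are made up for illustration).
add_post = es_request("PUT", "/blog/_doc/1", {
    "title": "Intro to search engines",
    "body": "Crawl, index, rank.",
})

# Partially update the same document.
update_post = es_request("POST", "/blog/_update/1", {
    "doc": {"title": "Introduction to search engines"},
})

# Full-text search on the title field.
find_posts = es_request("POST", "/blog/_search", {
    "query": {"match": {"title": "search"}},
})

# Delete the document.
delete_post = es_request("DELETE", "/blog/_doc/1")
```

To actually run a request against a live node, pass it to `urllib.request.urlopen` and read the JSON response.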
Solr is a scalable, ready-to-deploy search and storage engine optimized for searching large volumes of text-centric data. It is enterprise-ready, fast, and built on the Java library Lucene. It is open source, so anyone can contribute, and as a result it offers a wide range of features.
The major features include:
- Full-text search
- Faceted search
- Real-time indexing
- Dynamic clustering
- Database integration
- NoSQL features
- Rich document handling (e.g., Word, PDF)
Installation: Apache Solr can be installed by following the guide at https://tecadmin.net/install-apache-solr-on-ubuntu/
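Once Solr is running, searches are plain HTTP requests to a core's `select` handler. The sketch below only builds such a URL, including the parameters that turn on faceted search. It assumes Solr's default port 8983 and a hypothetical core named `blog` with `title` and `category` fields; adjust these to your own setup.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumption: default local install
CORE = "blog"                            # hypothetical core name


def select_url(query, facet_field=None, rows=10):
    """Build a URL for the select handler, optionally with one facet."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # facet=true enables faceting; facet.field names the field to count.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{SOLR_URL}/{CORE}/select?{urlencode(params)}"


url = select_url("title:search", facet_field="category")
```

Opening the resulting URL in a browser or with `urllib.request.urlopen` returns the matching documents along with a count of documents per category.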