
#import "template.typ": conf
#show: doc => conf(
title: [ Host your own private search engine with SearXNG ],
doc,
)
= Introduction
#link("https://docs.searxng.org/")[SearXNG];, put in its own words, is a
'free internet metasearch engine'. Note that it describes itself as a
#emph[metasearch] engine specifically - unlike a traditional search
engine such as Google or Bing, SearXNG does things a little bit
differently: it aggregates the results produced by search services like
those, and feeds them back to you.
Because of this key detail and a great deal of effort by those who've
helped shape it, SearXNG protects your privacy, and does so very well:
- Private data is removed from requests going to the search services it
  aggregates results from
- It does #strong[not] forward #emph[anything] to any third parties
  through search services
- Private data is #emph[also] removed from requests going to the results
  pages
Furthermore, SearXNG can be configured to use
#link("https://torproject.org")[Tor];.
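
As a rough idea of what that could look like on NixOS, here is a minimal sketch that routes SearXNG's outgoing requests through a local Tor client. It assumes the Tor client's SOCKS port is at its usual `127.0.0.1:9050`, and the timeout value is only a guess; the `outgoing` keys mirror the ones in SearXNG's `settings.yml`.

```nix
{ ... }:
{
  # Run a local Tor client (assumed to expose SOCKS on 127.0.0.1:9050).
  services.tor = {
    enable = true;
    client.enable = true;
  };

  services.searx.settings.outgoing = {
    # Send all outgoing engine requests through the Tor SOCKS proxy.
    proxies."all://" = [ "socks5h://127.0.0.1:9050" ];
    using_tor_proxy = true;
    # Tor adds latency, so give engines some extra seconds before timing out.
    extra_proxy_timeout = 10;
  };
}
```

Expect searches to be noticeably slower with this enabled, since every request takes a detour through the Tor network.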
However, privacy isn't the engine's only great selling point; from my
use of it so far, it's also great at…searching \(is that a surprise?).
The fact that it's a metasearch engine plays a key role in this, as it
lets SearXNG draw on several search services at once and gives
#emph[you] the ability to further tailor your experience.
= Setting up SearXNG
== Installing the service
As you may have expected if you've used NixOS for a while, SearXNG is
packaged #emph[and] has a service module on NixOS. This makes setting it
up just that much easier.
To get started, add the following somewhere in your #emph[system]
config:
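```nix
{
  # ...
  services.searx = {
    enable = true;
    settings = {
      server = {
        port = 8888;
        bind_address = "127.0.0.1"; # only reachable locally; the reverse proxy exposes it
        # placeholder, meant to be filled in from an environment file rather than stored in the config
        secret_key = "@SEARX_SECRET_KEY@";
        base_url = "https://search.devraza.duckdns.org/"; # replace with wherever you want to host yours
      };
    };
  };
  # ...
}
```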
The snippet above enables the `searx` systemd service, which listens on
port `8888` of `127.0.0.1`, and assumes a `base_url` of
`https://search.devraza.duckdns.org/`.
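
The `@SEARX_SECRET_KEY@` placeholder only becomes a real secret if something substitutes it. A minimal sketch of one way to do that, assuming the module's `environmentFile` option and the `@VARIABLE@` placeholder syntax are available in your NixOS release (the file path here is hypothetical):

```nix
{ ... }:
{
  services.searx = {
    # Hypothetical path; keep the file outside the Nix store and not
    # world-readable. It should contain a line like:
    #   SEARX_SECRET_KEY=<long random string>
    environmentFile = "/var/lib/secrets/searx.env";
    # Replaced with the value of SEARX_SECRET_KEY when the service starts.
    settings.server.secret_key = "@SEARX_SECRET_KEY@";
  };
}
```

Keeping the key in a separate file means it never ends up in the world-readable Nix store or in a public config repository.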
Now that we've got the actual `searx` instance running, we can set up a
reverse proxy allowing the service to be accessed remotely \(whether
this is within your local network or across the internet is up to you).
== Setting up a reverse proxy
=== What is a reverse proxy?
Before I get into the technical details of setting this up, I'd like to
briefly clarify what exactly a reverse proxy is \(to my understanding).
Let's get the Wikipedia definition of a reverse proxy out of the way
first:
#quote(block: true)[
\[…\] a reverse proxy is an application that sits in front of back-end
applications and forwards client requests to those applications. \[…\]
]
However, you might be confused as to what this actually means; I'll give
an example of the usage of reverse proxies to better explain this:
- Suppose you've got a few services running on a server \(for
  demonstration purposes, let's name these `x`, `y` and `z`), each
  running on its own unique port.
- Assuming you had a domain, and wanted to access each of these services
  from its own unique sub-domain \(e.g.~`x.yourdomain.com`,
  `y.yourdomain.com` and `z.yourdomain.com`), you would use a reverse
  proxy.
- This reverse proxy would take in requests from clients going to these
  sub-domains, and forward each request to the appropriate port on your
  machine for the service being requested.
The concept should be clear now, if it wasn't already.
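
To make the `x`, `y` and `z` example a bit more concrete, here is a minimal sketch of how it might be written for NGINX on NixOS. The ports \(`8001` to `8003`) and the `backends` name are made up for illustration; only the sub-domain-to-port mapping matters.

```nix
{ ... }:
let
  # Hypothetical services and the local ports they listen on.
  backends = { x = 8001; y = 8002; z = 8003; };
in
{
  services.nginx = {
    enable = true;
    # One virtual host per service:
    # x.yourdomain.com -> 127.0.0.1:8001, y.yourdomain.com -> 127.0.0.1:8002, ...
    virtualHosts = builtins.mapAttrs (name: port: {
      serverName = "${name}.yourdomain.com";
      locations."/".proxyPass = "http://127.0.0.1:${toString port}";
    }) backends;
  };
}
```

Every request arrives at NGINX on the same port, and the requested host name alone decides which local service it gets forwarded to.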
=== Using NGINX to set up the reverse proxy
NGINX is a popular web server that supports the creation of virtual
hosts and the usage of reverse proxies. To accommodate our `searx`
instance, we append the following to our NixOS server configuration:
```nix
{
  # ...
  services.nginx = {
    enable = true;
    # any extra configuration here
    virtualHosts = {
      "search" = { # this can be anything, being an arbitrary identifier
        # forceSSL redirects HTTP to HTTPS, so this host also needs a certificate
        # (for example via enableACME, or sslCertificate and sslCertificateKey)
        forceSSL = true;
        serverName = "search.yourdomain.com"; # replace this with whatever you're serving from
        # SearX proxy, built from the searx settings (needs `config` in the module arguments)
        locations."/" = {
          proxyPass = "http://${toString config.services.searx.settings.server.bind_address}:${toString config.services.searx.settings.server.port}";
          proxyWebsockets = true;
          recommendedProxySettings = true;
        };
      };
    };
  };
  # ...
}
```
The `proxyPass` expression above is built from your `searx` settings, so the location NGINX forwards requests to adjusts automatically whenever you change the address or port in your `searx` config.
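
Since the virtual host sets `forceSSL`, NGINX also needs a certificate to serve. One possible way to get one is Let's Encrypt through the NixOS ACME module; the sketch below assumes the server is reachable from the internet on port 80 for the HTTP challenge, and the email address is a placeholder.

```nix
{ ... }:
{
  # Let's Encrypt requires accepting its terms and providing a contact address.
  security.acme = {
    acceptTerms = true;
    defaults.email = "admin@yourdomain.com"; # hypothetical address, replace with your own
  };

  # Request and renew a certificate for the "search" virtual host.
  services.nginx.virtualHosts."search".enableACME = true;

  # Port 80 serves the ACME challenge and the HTTPS redirect, port 443 the site itself.
  networking.firewall.allowedTCPPorts = [ 80 443 ];
}
```

If the instance is only meant for your local network, supplying your own certificate through `sslCertificate` and `sslCertificateKey` is likely the better fit.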
After saving your changes and rebuilding your server's system
configuration \(as usual), you should have a working #emph[private]
instance of SearXNG that you can access using the `serverName` you've
given it.
Set your browser to use this as your search engine using the relevant
documentation \(with Firefox this is as easy as right-clicking on the
URL after opening up the page and clicking a button). Enjoy!