Apparently the layout of https://annas-archive.org has changed, making changes necessary.
The issue has been reported in #5146, see there for more details.
- closes #5146
- Adds a new engine `searx/engines/openalex.py` that integrates the OpenAlex
Works API to return scientific paper results using the `paper.html` template.
- Uses the official API (no auth required); supports OpenAlex polite pool via `mailto`.
- Adds developer docs at `docs/dev/engines/online/openalex.rst`.
OpenAlex API reference: https://docs.openalex.org/how-to-use-the-api/api-overview
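For reference, a minimal sketch of how such a request can be built (parameter
names follow the OpenAlex Works API docs; the actual engine code may differ)::

    from urllib.parse import urlencode

    base_url = 'https://api.openalex.org/works'

    def request(query, params):
        args = {
            'search': query,
            'page': params['pageno'],
            'mailto': 'contact@example.org',  # placeholder; joins the polite pool
        }
        params['url'] = f'{base_url}?{urlencode(args)}'
        return params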
- the previous CDN icon list URL no longer works
- however, a list of all icons is mirrored to the jsDelivr CDN
- there's no reason to keep the engine inactive now that we use public CDNs
This patch adds a GitHub Code Search [1] engine to allow querying codebases.
The code.html template is changed to allow passthrough of strip and
highlighting options.
The Searchcode engine is adjusted to pass the filename instead of relying on
hardcoded extensions.
The GitHub code search API does not return the exact code line indices, so
this implementation assigns arbitrary line numbers starting from 1
(effectively relabeling the code).
The API allows unauthenticated calls, and the engine settings default to that,
although such calls are heavily rate limited.
The 'text' lexer is used as the default Pygments lexer when lexer lookup fails.
[1] https://docs.github.com/en/rest/search/search?apiVersion=2022-11-28#search-code
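A minimal sketch of the lexer fallback (the helper name is hypothetical; the
engine's actual code may differ)::

    from pygments.lexers import get_lexer_for_filename
    from pygments.util import ClassNotFound

    def lexer_name_for(filename: str) -> str:
        # pick a lexer from the filename, falling back to the plain 'text' lexer
        try:
            return get_lexer_for_filename(filename).aliases[0]
        except ClassNotFound:
            return 'text'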
Co-authored-by: Markus Heiser <markus.heiser@darmarIT.de>
pyrightconfig.json :
for the paths searx, searxng_extra and tests, individual rules were
defined (for example, in tests fewer / different rules are needed than in the
searx package; see the sketch after this list)
searx/engines/__builtins__.pyi :
The builtin types that are added to the global namespace of a module by the
intended monkey patching of the engine modules; this stub replaces the
previous filtering of pyright's stdout using grep.
test.pyright_modified (utils/lib_sxng_test.sh) :
static type check of locally modified files that are not yet committed
make test :
prerequisite 'test.pyright' has been replaced by 'test.pyright_modified'
searx/engines/__init__.py, searx/enginelib/__init__.py :
First, minimal typifications that were considered necessary.
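As referenced above, a sketch of how per-path rules can be expressed with
pyright's ``executionEnvironments`` (the diagnostic override shown is a
placeholder, not the project's actual configuration)::

    {
      "include": ["searx", "searxng_extra", "tests"],
      "executionEnvironments": [
        { "root": "searx" },
        { "root": "searxng_extra" },
        { "root": "tests", "reportPrivateUsage": "none" }
      ]
    }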
Fixes the publishedDate format in the reuters engine to handle ISO 8601 times
both with and without milliseconds.
Why is this change important?
Previously, the engine would sometimes fail saying:
2025-08-12 21:13:23,091 ERROR:searx.engines.reuters: exception : time data '2024-04-15T19:08:30.833Z' does not match format '%Y-%m-%dT%H:%M:%SZ'
Traceback (most recent call last):
...
File "/usr/local/searxng/searx/engines/reuters.py", line 87, in response
publishedDate=datetime.strptime(result["display_time"], "%Y-%m-%dT%H:%M:%SZ"),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
Note that most queries against Reuters work, but some results carry the
additional milliseconds and fail. Regardless, the change is backwards
compatible, as both formats (with and without the milliseconds) now parse
correctly.
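A minimal sketch of a tolerant parser (the actual fix in the engine may
differ)::

    from datetime import datetime

    def parse_display_time(value: str) -> datetime:
        # try the format with milliseconds first, then the one without
        for fmt in ('%Y-%m-%dT%H:%M:%S.%fZ', '%Y-%m-%dT%H:%M:%SZ'):
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue
        raise ValueError(f'unexpected display_time format: {value}')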
Add Baidu CAPTCHA detection to reduce `JSONDecodeError` errors.
Baidu will redirect to `wappass.baidu.com` and return a CAPTCHA challenge.
The current behavior fetches the data from `wappass.baidu.com` and then raises
a `json.decoder.JSONDecodeError`.
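A minimal sketch of the detection (assuming the redirect lands on the CAPTCHA
host; ``SearxEngineCaptchaException`` is SearXNG's standard CAPTCHA signal)::

    from searx.exceptions import SearxEngineCaptchaException

    def response(resp):
        # Baidu redirects to its CAPTCHA wall instead of returning JSON
        if resp.url.host == 'wappass.baidu.com':
            raise SearxEngineCaptchaException()
        data = resp.json()
        ...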
Sometimes there's only an `adaptivestreaming` field in `streams`, which is
usually an m3u8 file. That is, however, not supported by the video player of
any browser, so we can't use it and must build a different URL instead.
- not 100% sure about the condition code mapping; there are no exact matches for most of the Apple WeatherKit codes among the weather condition codes we have in SearXNG
- related: https://github.com/searxng/searxng/issues/4885
* [fix] CI task "update_engine_traits.py" fails
To catch all problems with an HTTP request, the more general class
``httpx.HTTPError`` must be caught. For your test, use::
$ ./manage dev.env
$ python ./searxng_extra/update/update_engine_traits.py
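For context, ``httpx.HTTPError`` is the common base class of httpx's transport
and status errors, so a handler like the following catches both kinds of
failure::

    import httpx

    try:
        resp = httpx.get('https://example.org')
        resp.raise_for_status()
    except httpx.HTTPError as exc:
        print(f'HTTP request failed: {exc}')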
Closes: https://github.com/searxng/searxng/issues/5068
* [data] update searx.data - update_engine_traits.py
The error message in case the vqd value could not be determined was incorrect
and triggered an exception::
File "/usr/local/searxng/searxng-src/searx/engines/duckduckgo.py", line 132, in get_vqd
logger.error("vqd value from duckduckgo.com ", resp.status_code)
Message: 'vqd value from duckduckgo.com '
Arguments: (202,)
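The status code was passed as a logging argument without a matching
placeholder in the format string; a sketch of a corrected call::

    logger.error("vqd value from duckduckgo.com (status: %s)", resp.status_code)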
The current google_videos.py in the master branch is completely non-functional
because it does not parse the returned video search results correctly; as a
result, SearXNG reports that no results were found. This commit is a new,
updated google_videos.py that is designed to fix that and is confirmed to be
working.
Implementing the suggestions by Bnyro.
Re-formatted with `black` for compatibility. After the automated checks
failed, ran::

    black --line-length 120 --skip-string-normalization --target-version py311 google_videos.py
Wordnik is now an answerer and not in the infobox anymore: it uses the
translations answerer, because it provides all the features needed. By default,
only its first result is shown.
Additionally a new "define" category is added - I know, it's the same as the
"dictionaries" category, but I don't think we can alias categories. This allows
searching e.g. with `!define tree`; the idea is to allow easy searches for
definitions of words.
Related:
- https://github.com/searxng/searxng/issues/4111
These engines override the user agent manually using `gen_useragent`, although
that's already done in the online processor that runs before the actual
`request(query, params)` method is called. Hence, this call is duplicated.
Related:
- https://github.com/searxng/searxng/pull/4990#discussion_r2195142838
- apparently, PDIA switched from Algolia to AWS :/
- we no longer require an API key, but the AWS node might change, so we still have to extract the API URL of the node
- the response format is still the same, so no changes are needed in that regard
- closes #4989
This patch migrates from `redis==5.2.1` [1] to `valkey==6.1.0` [2].
The migration to valkey is necessary because the company behind Redis has decided
to abandon the open source license. After experiencing a drop in user numbers,
they now want to run it under a dual license again. But this move demonstrates
once again how unreliable the company is and how it treats open source
developers.
To review first, read the docs::
$ make docs.live
Follow the instructions to remove redis:
- http://0.0.0.0:8000/admin/settings/settings_redis.html
Config and install a local valkey DB:
- http://0.0.0.0:8000/admin/settings/settings_valkey.html
[1] https://pypi.org/project/redis/
[2] https://pypi.org/project/valkey/
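Since valkey-py is a drop-in fork of redis-py, the client-side change is
essentially a rename (a sketch; connection details are placeholders)::

    import valkey  # previously: import redis

    client = valkey.Valkey(host='localhost', port=6379, db=0)
    client.set('key', 'value')
    print(client.get('key'))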
Co-authored-by: HLFH <gaspard@dhautefeuille.eu>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
What's changed?
- this PR adds Pixabay, a collection of royalty-free images
- additionally, it seems to have some videos, so there's an engine for those too
Author's notes
- when using SearXNG's transport, all our requests get blocked, probably due to fingerprinting
- we should find an alternative solution because this is just a hacky change to make things work for now, but I don't know how ...
Tube Archivist [1] is a self-hosted project which archives YouTube videos on
your own local server. This engine connects to Tube Archivist's search API to
allow searching your own hosted videos from SearXNG.
[1] https://www.tubearchivist.com/
Signed-off-by: Robert M. Clabough <robert@claobugh.tech>
Co-authored-by: Bnyro <bnyro@tutanota.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Changes:
- Add required iscqry, pz and bct search parameters
- Remove unused/optional search parameters (ei, fr2, age)
- Fix offset calculation
- Use new sB cookie for filters (time, safesearch, language)
- Group related parameter assignments together
- Restructure request parameter building to better match a real request
- Use f-strings for string formatting
- Add logging of domain and cookies used
Related:
- https://github.com/searxng/searxng/issues/4910
Changes:
- Improve log messages for better debugging of future CAPTCHA issues
- Fix erroneous get_sc_url variable where sc was always blank (when there was no cached value)
- Move Origin and Referer headers to request() function
- Add missing form parameters (abp, abd, abe) required by Startpage
to avoid being flagged as automated requests
- Include segment parameter for paginated requests
- Clean up unnecessary commented-out headers
- Fix minor typos e.g. "time-stamp" → "timestamp", "scrap" → "scrapes"
Related:
- https://github.com/searxng/searxng/issues/4673
The types necessary for weather information, such as GeoLocation, DateTime,
Temperature, Pressure, WindSpeed, RelativeHumidity, Compass (wind direction)
and symbols for the weather, have been implemented.
There are unit conversions and translations for weather property labels.
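A hypothetical usage sketch (the class names follow the description above, but
the constructor and method signatures are assumptions; the real
``searx.weather`` interface may differ)::

    from searx.weather import Temperature

    # assumed API: construct in one unit, read the value in another
    temp = Temperature(25.0, 'C')
    print(temp.value('F'))  # unit conversion, e.g. -> 77.0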
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Sometimes (e.g. when ddg does not have a result item) there is no content and
the engine will fail with an IndexError:
* Error: IndexError
* Percentage: 10
* Parameters: `()`
* File name: `searx/engines/duckduckgo.py:375`
* Function: `response`
* Code: `item["content"] = extract_text(eval_xpath(div_result, './/a[contains(@class, "result__snippet")]')[0])`
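A minimal sketch of a guard for the failing line (``div_result`` and ``item``
come from the surrounding engine code)::

    from searx.utils import eval_xpath, extract_text

    snippets = eval_xpath(div_result, './/a[contains(@class, "result__snippet")]')
    if snippets:  # ddg sometimes has no snippet for a result item
        item["content"] = extract_text(snippets[0])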
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Changes:
- Add trailing slash to base URL to prevent potential redirects
- Remove advanced search syntax filtering (no longer guarantees a CAPTCHA)
- Correct pagination offset calculation: page 2 now starts at offset 10,
subsequent pages use the formula 10 + (n-2)*15 instead of the previous
broken 20 + (n-2)*50 calculation that caused CAPTCHAs (see the sketch
after this list)
- Restructure request parameter building to better match a real request
- "kt" cookie is no longer an empty string if the language/region is "all"
- Group related parameter assignments together
- Add header logging to debugging output
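A minimal sketch of the corrected offset calculation referenced in the list
above::

    def offset(page: int) -> int:
        # page 1 has no offset; page 2 starts at 10; later pages step by 15
        if page <= 1:
            return 0
        return 10 + (page - 2) * 15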
Related:
- https://github.com/searxng/searxng/issues/4824
There is currently no known working z-library URL; all previously known URLs
are dead [1]. To avoid interrupting automated updates, a connection error to a
z-library is treated as a *known error*, and the old properties of the
z-library are retained.
[1] https://github.com/searxng/searxng/issues/3610
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
- apparently the API now requires an `X-Pinterest-PWS-Handler` header in order
to function properly (extracted from their web UI)
- the other `X-Pinterest` headers here are added in case they become mandatory
too
Closes: https://github.com/searxng/searxng/issues/4812
An icons category makes sense because it allows quickly searching for free SVG
icons to use for websites / other designs with a quick `!icons` query.
Icons don't seem to fit into the normal images category that well because they
are quite a special type of image.
The global variable CACHE is not initialized when DDG images or DDG videos
import the get_vqd() function (please remember: the engine modules are imported
using the importlib method and not via the `import` keyword).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The EngineCache class replaces all previously individual solutions for caches in
the context of the engines.
- demo_offline.py
- duckduckgo.py
- radio_browser.py
- soundcloud.py
- startpage.py
- wolframalpha_api.py
- wolframalpha_noapi.py
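A hypothetical usage sketch of the ``EngineCache`` API (method names are
assumptions based on the description; the real interface may differ)::

    from searx.enginelib import EngineCache

    CACHE = EngineCache('duckduckgo')     # one cache table per engine
    CACHE.set(key='k', value='v', expire=3600)
    value = CACHE.get('k', default=None)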
Search term to test most of the modified engines::
!ddg !rb !sc !sp !wa test
!ddg !rb !sc !sp !wa foo
For introspection of the DB, jump into developer environment and run command to
show cache state::
$ ./manage pyenv.cmd bash --norc --noprofile
(py3) python -m searx.enginelib cache state
cache tables and key/values
===========================
[demo_offline ] 2025-04-22 11:32:50 count --> (int) 4
[startpage ] 2025-04-22 12:32:30 SC_CODE --> (str) fSOBnhEMlDfE20
[duckduckgo ] 2025-04-22 12:32:31 4dff493e.... --> (str) 4-128634958369380006627592672385352473325
[duckduckgo ] 2025-04-22 12:40:06 3e2583e2.... --> (str) 4-263126175288871260472289814259666848451
[radio_browser ] 2025-04-23 11:33:08 servers --> (list) ['https://de2.api.radio-browser.info', ...]
[soundcloud ] 2025-04-29 11:40:06 guest_client_id --> (str) EjkRJG0BLNEZquRiPZYdNtJdyGtTuHdp
[wolframalpha ] 2025-04-22 12:40:06 code --> (str) 5aa79f86205ad26188e0e26e28fb7ae7
number of tables: 6
number of key/value pairs: 7
In the "cache tables and key/values" section, the table name (engine name) is at
first position on the second there is the calculated expire date and on the
third and fourth position the key/value is shown.
About duckduckgo: The *vqd coode* of ddg depends on the query term and therefore
the key is a hash value of the query term (to not to store the raw query term).
In the "properties of ENGINES_CACHE" section all properties of the SQLiteAppl /
ExpireCache and their last modification date are shown::
properties of ENGINES_CACHE
===========================
[last modified: 2025-04-22 11:32:27] DB_SCHEMA : 1
[last modified: 2025-04-22 11:32:27] LAST_MAINTENANCE :
[last modified: 2025-04-22 11:32:27] crypt_hash : ca612e3566fdfd7cf7efe2b1c9349f461158d07cb78a3750e5c5be686aa8ebdc
[last modified: 2025-04-22 11:32:30] CACHE-TABLE--demo_offline: demo_offline
[last modified: 2025-04-22 11:32:30] CACHE-TABLE--startpage: startpage
[last modified: 2025-04-22 11:32:31] CACHE-TABLE--duckduckgo: duckduckgo
[last modified: 2025-04-22 11:33:08] CACHE-TABLE--radio_browser: radio_browser
[last modified: 2025-04-22 11:40:06] CACHE-TABLE--soundcloud: soundcloud
[last modified: 2025-04-22 11:40:06] CACHE-TABLE--wolframalpha: wolframalpha
These properties provide information about the state of the ExpireCache and
control the behavior. For example, the maintenance intervals are controlled by
the last modification date of the LAST_MAINTENANCE property and the hash value
of the password can be used to detect whether the password has been changed (in
this case the DB entries can no longer be decrypted and the entire cache must be
discarded).
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Fixes the semantic scholar engine by extracting a UI version token.
BTW: removes HTML tags from the content.
Author's checklist:
- they rate limit very aggressively: if you make more than roughly 2 requests
per minute, you have to wait some time again...
- they also have an official API at api.semanticscholar.org, but its rate
limits are even stricter
Closes: https://github.com/searxng/searxng/issues/4685
* filtering ChinaSo-News results by source, option ``chinaso_news_source``
* add ChinaSo engine to the online docs https://docs.searxng.org/dev/engines/online/chinaso.html
* fix SearXNG categories in the settings.yml
* deactivate ChinaSo engines ``inactive: true`` until [1] is fixed
* configure network of the ChinaSo engines
[1] https://github.com/searxng/searxng/issues/4694
Signed-off-by: @BrandonStudio
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>