testing-integration / test-unit (push) Waiting to run
testing-integration / test-sqlite (push) Waiting to run
testing / backend-checks (push) Waiting to run
testing / frontend-checks (push) Waiting to run
testing / test-unit (push) Blocked by required conditions
testing / test-e2e (push) Blocked by required conditions
testing / test-remote-cacher (redis) (push) Blocked by required conditions
testing / test-remote-cacher (valkey) (push) Blocked by required conditions
testing / test-remote-cacher (garnet) (push) Blocked by required conditions
testing / test-remote-cacher (redict) (push) Blocked by required conditions
testing / test-mysql (push) Blocked by required conditions
testing / test-pgsql (push) Blocked by required conditions
testing / test-sqlite (push) Blocked by required conditions
testing / security-check (push) Blocked by required conditions
This PR detects Interlisp files (files that include "(DEFINE-FILE-INFO" somewhere near the start, and do not have an .LCOM extension) as text files and displays them as such in the web UI.
To check for extensions, I had to extend the `typesniffer.DetectContentType` function to accept an extra filename parameter—which could be useful for future filetype detection features. It is possible that a few of the places I modified pass a full file path instead of just passing a file name.
Implements #8184
## Checklist
### Tests
- I added test coverage for Go changes...
- [x] in their respective `*_test.go` for unit tests.
- [ ] in the `tests/integration` directory if it involves interactions with a live Forgejo server.
- I added test coverage for JavaScript changes... - NA
- [ ] in `web_src/js/*.test.js` if it can be unit tested.
- [ ] in `tests/e2e/*.test.e2e.js` if it requires interactions with a live Forgejo server (see also the [developer guide for JavaScript testing](https://codeberg.org/forgejo/forgejo/src/branch/forgejo/tests/e2e/README.md#end-to-end-tests)).
### Documentation
- [ ] I created a pull request [to the documentation](https://codeberg.org/forgejo/docs) to explain to Forgejo users how to use this change.
- [x] I did not document these changes and I do not expect someone else to do it.
### Release notes
- [ ] I do not want this change to show in the release notes.
- [x] I want the title to show in the release notes with a link to this pull request.
- [ ] I want the content of the `release-notes/<pull request number>.md` to be be used for the release notes instead of the title.
<!--start release-notes-assistant-->
## Release notes
<!--URL:https://codeberg.org/forgejo/forgejo-->
- Features
- [PR](https://codeberg.org/forgejo/forgejo/pulls/8377): <!--number 8377 --><!--line 0 --><!--description ZGV0ZWN0IEludGVybGlzcCBzb3VyY2VzIGFzIHRleHQ=-->detect Interlisp sources as text<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/8377
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Bojidar Marinov <bojidar.marinov.bg@gmail.com>
Co-committed-by: Bojidar Marinov <bojidar.marinov.bg@gmail.com>
Fuzzy searching for code has been known to be problematic #5264 and in my personal opinion isn't very useful.
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/6947
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Added support for searching content in a specific directory or file.
Reviewed-on: https://codeberg.org/forgejo/forgejo/pulls/6143
Reviewed-by: Gusted <gusted@noreply.codeberg.org>
Reviewed-by: 0ko <0ko@noreply.codeberg.org>
Co-authored-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
Co-committed-by: Shiny Nematoda <snematoda.751k2@aleeas.com>
When opening a repository, it will call `ensureValidRepository` and also
`CatFileBatch`. But sometimes these will not be used until repository
closed. So it's a waste of CPU to invoke 3 times git command for every
open repository.
This PR removed all of these from `OpenRepository` but only kept
checking whether the folder exists. When a batch is necessary, the
necessary functions will be invoked.
---
Conflict resolution: Because of the removal of go-git in (#4941)
`_nogogit.go` files were either renamed or merged into the 'common'
file. Git does handle the renames correctly, but for those that were
merged has to be manually copied pasted over. The patch looks the same,
201 additions 90 deletions as the original patch.
(cherry picked from commit c03baab678ba5b2e9d974aea147e660417f5d3f7)
meilisearch does not have an search option to contorl fuzzynes per query
right now:
- https://github.com/meilisearch/meilisearch/issues/1192
- https://github.com/orgs/meilisearch/discussions/377
- https://github.com/meilisearch/meilisearch/discussions/1096
so we have to create a workaround by post-filter the search result in
gitea until this is addressed.
For future works I added an option in backend only atm, to enable
fuzzynes for issue indexer too.
And also refactored the code so the fuzzy option is equal in logic to
code indexer
---
*Sponsored by Kithara Software GmbH*
Conflicts:
routers/web/repo/search.go
trivial context confict s/isMatch/isFuzzy/
Fix for gitea putting everything into one request without batching and
sending it to Elasticsearch for indexing as issued in #28117
This issue occured in large repositories while Gitea tries to
index the code using ElasticSearch.
I've applied necessary changes that takes batch length from below config
(app.ini)
```
[queue.code_indexer]
BATCH_LENGTH=<length_int>
```
and batches all requests to Elasticsearch in chunks as configured in the
above config
(cherry picked from commit 5c0fc9087211f01375f208d679a1e6de0685320c)
The `ToUTF8*` functions were stripping BOM, while BOM is actually valid
in UTF8, so the stripping must be optional depending on use case. This
does:
- Add a options struct to all `ToUTF8*` functions, that by default will
strip BOM to preserve existing behaviour
- Remove `ToUTF8` function, it was dead code
- Rename `ToUTF8WithErr` to `ToUTF8`
- Preserve BOM in Monaco Editor
- Remove a unnecessary newline in the textarea value. Browsers did
ignore it, it seems but it's better not to rely on this behaviour.
Fixes: https://github.com/go-gitea/gitea/issues/28743
Related: https://github.com/go-gitea/gitea/issues/6716 which seems to
have once introduced a mechanism that strips and re-adds the BOM, but
from what I can tell, this mechanism was removed at some point after
that PR.
Refactor `modules/indexer` to make it more maintainable. And it can be
easier to support more features. I'm trying to solve some of issue
searching, this is a precursor to making functional changes.
Current supported engines and the index versions:
| engines | issues | code |
| - | - | - |
| db | Just a wrapper for database queries, doesn't need version | - |
| bleve | The version of index is **2** | The version of index is **6**
|
| elasticsearch | The old index has no version, will be treated as
version **0** in this PR | The version of index is **1** |
| meilisearch | The old index has no version, will be treated as version
**0** in this PR | - |
## Changes
### Split
Splited it into mutiple packages
```text
indexer
├── internal
│ ├── bleve
│ ├── db
│ ├── elasticsearch
│ └── meilisearch
├── code
│ ├── bleve
│ ├── elasticsearch
│ └── internal
└── issues
├── bleve
├── db
├── elasticsearch
├── internal
└── meilisearch
```
- `indexer/interanal`: Internal shared package for indexer.
- `indexer/interanal/[engine]`: Internal shared package for each engine
(bleve/db/elasticsearch/meilisearch).
- `indexer/code`: Implementations for code indexer.
- `indexer/code/internal`: Internal shared package for code indexer.
- `indexer/code/[engine]`: Implementation via each engine for code
indexer.
- `indexer/issues`: Implementations for issues indexer.
### Deduplication
- Combine `Init/Ping/Close` for code indexer and issues indexer.
- ~Combine `issues.indexerHolder` and `code.wrappedIndexer` to
`internal.IndexHolder`.~ Remove it, use dummy indexer instead when the
indexer is not ready.
- Duplicate two copies of creating ES clients.
- Duplicate two copies of `indexerID()`.
### Enhancement
- [x] Support index version for elasticsearch issues indexer, the old
index without version will be treated as version 0.
- [x] Fix spell of `elastic_search/ElasticSearch`, it should be
`Elasticsearch`.
- [x] Improve versioning of ES index. We don't need `Aliases`:
- Gitea does't need aliases for "Zero Downtime" because it never delete
old indexes.
- The old code of issues indexer uses the orignal name to create issue
index, so it's tricky to convert it to an alias.
- [x] Support index version for meilisearch issues indexer, the old
index without version will be treated as version 0.
- [x] Do "ping" only when `Ping` has been called, don't ping
periodically and cache the status.
- [x] Support the context parameter whenever possible.
- [x] Fix outdated example config.
- [x] Give up the requeue logic of issues indexer: When indexing fails,
call Ping to check if it was caused by the engine being unavailable, and
only requeue the task if the engine is unavailable.
- It is fragile and tricky, could cause data losing (It did happen when
I was doing some tests for this PR). And it works for ES only.
- Just always requeue the failed task, if it caused by bad data, it's a
bug of Gitea which should be fixed.
---------
Co-authored-by: Giteabot <teabot@gitea.io>
2023-06-23 12:37:56 +00:00
Renamed from modules/indexer/code/elastic_search.go (Browse further)