Indexing
When the application indexes a repository, it clones the latest commit and adds all files, with a few exceptions, to the database.
Symbol Ranking
In order to provide a basic result ranking, ctags is used to generate an index of symbol definitions.
In practice, this means that a class name matching the search expression is ranked higher than a match in a comment.
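The ranking idea can be sketched as follows. This is only an illustration, not the application's actual code: the `Hit` and `SymbolDef` shapes and the `rankHits` function are hypothetical names, and the sketch assumes the ctags output has already been parsed into a list of symbol definitions with a path and line number.

```typescript
// Hypothetical shape of a search hit; the real schema is not described here.
interface Hit {
  path: string;
  line: number;
  text: string;
}

// Hypothetical symbol definition, as it could be derived from a ctags index.
interface SymbolDef {
  name: string;
  path: string;
  line: number;
}

// Rank hits that sit on a symbol definition line above all other hits.
function rankHits(hits: Hit[], symbols: SymbolDef[]): Hit[] {
  const definitionLines = new Set(symbols.map((s) => `${s.path}:${s.line}`));
  const score = (h: Hit): number =>
    definitionLines.has(`${h.path}:${h.line}`) ? 1 : 0;
  // Copy before sorting so the input stays untouched; the sort is stable,
  // so hits with the same score keep their original order.
  return [...hits].sort((a, b) => score(b) - score(a));
}
```

With a comment hit on line 3 and a class definition on line 10 of the same file, the definition hit is returned first.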
Ignored files
To keep the search fast, some files are not indexed: very large file contents clog the Postgres index and lead to very slow search and indexing times.
Skipped and excluded files can be seen in the detail view of a repository in the admin panel.
Excluded files
As a first line of defense, the application code contains a fixed list of folders that the indexer never checks. Currently this only includes:
- The .git folder of a repository
Custom excluded files
All files and paths that match a user-defined global or repository exclude rule are also not included in the index.
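As a rough sketch of how the fixed list and the custom rules could be combined, consider the following. The rule syntax is an assumption: this example treats rules as simple glob patterns where `*` matches within one path segment and `**` matches across segments, which may differ from the syntax the application accepts. `globToRegExp` and `isExcluded` are hypothetical helpers.

```typescript
// Fixed list from the application code; currently only the .git folder.
const FIXED_EXCLUDED_FOLDERS: string[] = [".git"];

// Convert an assumed glob syntax into a regular expression.
function globToRegExp(glob: string): RegExp {
  const pattern = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
    .replace(/\*\*/g, "\u0000") // temporary placeholder for "**"
    .replace(/\*/g, "[^/]*") // "*" stays inside one path segment
    .replace(/\u0000/g, ".*"); // "**" may span several segments
  return new RegExp(`^${pattern}$`);
}

// A path is excluded if it lies in a fixed folder or matches any custom rule.
function isExcluded(path: string, customRules: string[]): boolean {
  const inFixedFolder = FIXED_EXCLUDED_FOLDERS.some(
    (folder) => path === folder || path.startsWith(`${folder}/`)
  );
  if (inFixedFolder) return true;
  return customRules.some((rule) => globToRegExp(rule).test(path));
}
```

For example, `isExcluded("node_modules/a/b.js", ["node_modules/**"])` is true in this sketch, while `isExcluded("src/index.ts", ["node_modules/**"])` is false. The decision only needs the path, so no file contents are read.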
Skipped files
Finally, a file that is not excluded and is therefore inspected by the indexer can still be skipped in the following cases:
- Files bigger than FILE_MAXSIZE
- Binary files (determined by isbinaryfile)
If you inspect a repository in the admin panel, these files are displayed in a yellow color.
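The two skip conditions can be sketched like this. Note the assumptions: `looksBinary` is a simplified stand-in (a NUL byte in the first 512 bytes) rather than the isbinaryfile package, `fileMaxSize` stands for the configured FILE_MAXSIZE value, and `skipReason` is a hypothetical function name.

```typescript
type SkipReason = "too-large" | "binary" | null;

// Simplified stand-in for the isbinaryfile check (assumption):
// treat content with a NUL byte in its first 512 bytes as binary.
function looksBinary(bytes: Uint8Array): boolean {
  const limit = Math.min(bytes.length, 512);
  for (let i = 0; i < limit; i++) {
    if (bytes[i] === 0) return true;
  }
  return false;
}

// Return why a file would be skipped, or null if it would be indexed.
function skipReason(bytes: Uint8Array, fileMaxSize: number): SkipReason {
  if (bytes.length > fileMaxSize) return "too-large";
  if (looksBinary(bytes)) return "binary";
  return null;
}
```

Both checks depend on the file's size or contents, which is why they can only run after the indexer has inspected the file.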
TIP
If the indexer skipped a file because it met one of the conditions listed above, it is still a good idea to write a custom exclude rule to improve performance.
This way the indexer does not have to read the file to check if it is too big or a binary file.