[UI/Collection] Local Documents interface unresponsive with large index (8M+ chunks) #3342

Open · josenobile opened this issue on Dec 21, 2024
Labels: bug-unconfirmed, chat (gpt4all-chat issues)


Bug Report

GPT4All's LocalDocs feature has stopped showing collections and becomes unresponsive while indexing documents, even though its SQLite database already holds a large existing index (8M+ chunks).

Steps to Reproduce

  1. Install GPT4All via Flatpak
  2. Add a document collection pointing to /root/FITCO (which gets translated to /run/flatpak/doc/8d2db31f/FITCO)
  3. Let it index documents (was working for months)
  4. After some time (3+ months ago), the interface stopped showing any collections
  5. Adding a new collection shows no error, but nothing is displayed either

Diagnostic Steps Performed

  1. Checked database integrity:
     sqlite3 localdocs_v3.db "PRAGMA integrity_check;"
     -> Result: ok
  2. Inspected database stats:
     • 538,412 documents
     • 8,029,111 chunks
     • 1,119 documents without chunks
     • Collection "Fitco" shows start_update_time but no last_update_time
     • Database size: 12GB
  3. Attempted fixes:
     • Ran VACUUM on the database (took ~6 minutes)
     • Reset collection timestamps:
       UPDATE collections SET last_update_time = NULL, start_update_time = NULL WHERE id = 1;
  4. Checked system resources while the app was running:
     • Single CPU core at 100% (previously used all cores)
     • Memory usage normal
     • No disk I/O issues
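The statistics from step 2 can be re-collected without modifying the database. Below is a minimal sketch using Python's built-in `sqlite3` module. It assumes the table names seen in this report (`documents`, `chunks`, `collections`), that `documents` has an `id` column (it is the foreign-key target in the schema shown later), and that `collections` has the `start_update_time` / `last_update_time` columns used in the UPDATE above. `localdocs_stats` is a hypothetical helper, and reading "started but never finished" as an interrupted update is an interpretation, not documented GPT4All behaviour. Run it with GPT4All closed, ideally against a copy of `localdocs_v3.db`; the NOT EXISTS count may be slow on 8M chunks.

```python
import sqlite3
from pathlib import Path


def localdocs_stats(db_path):
    """Hypothetical read-only summary of a LocalDocs database.

    Assumed schema: documents(id), chunks(document_id),
    collections(id, start_update_time, last_update_time).
    """
    # mode=ro opens the file without write access, so nothing is modified
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        def scalar(sql):
            return conn.execute(sql).fetchone()[0]

        return {
            "documents": scalar("SELECT COUNT(*) FROM documents"),
            "chunks": scalar("SELECT COUNT(*) FROM chunks"),
            "documents_without_chunks": scalar(
                "SELECT COUNT(*) FROM documents d "
                "WHERE NOT EXISTS (SELECT 1 FROM chunks c WHERE c.document_id = d.id)"
            ),
            # Collections whose last update was started but never recorded as finished
            "unfinished_collections": [
                row[0]
                for row in conn.execute(
                    "SELECT id FROM collections "
                    "WHERE start_update_time IS NOT NULL AND last_update_time IS NULL "
                    "ORDER BY id"
                )
            ],
        }
    finally:
        conn.close()
```

Against the database described here, the expected result would be roughly 538,412 documents, 8,029,111 chunks, 1,119 documents without chunks, and collection 1 listed as unfinished (before the timestamps were reset).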

Current State

  • Database appears structurally sound
  • Application UI shows "No Collections Installed"
  • Process runs but only uses one core
  • Flatpak permissions show read-only access to xdg-documents

Expected Behavior

  • Collections should appear in the interface
  • Indexing should use multiple CPU cores
  • New collections can be added
  • Indexing progress is shown

Your Environment

  • GPT4All version: v3.6.1
  • Operating System: Fedora Linux 41 (Workstation Edition)
    • Kernel: 6.12.5-200.fc41.x86_64
    • Intel(R) Core(TM) i7-8550U CPU (no GPU installed)
  • Installed via: Flatpak
  • Database path: /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/
  • Embedding model: nomic-embed-text-v1.5

Additional System Details

# Flatpak Permissions
[Context]
shared=network;ipc;
sockets=x11;wayland;fallback-x11;
devices=dri;
filesystems=xdg-documents:ro;xdg-config/kdeglobals:ro;

[Session Bus Policy]
com.canonical.AppMenu.Registrar=talk
org.kde.kconfig.notify=talk
org.kde.KGlobalSettings=talk
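To check which host paths the sandbox can write to, the `[Context]` section of a permissions dump like the one above (for example the output of `flatpak info --show-permissions io.gpt4all.gpt4all`) can be parsed with Python's `configparser`. This is a minimal sketch; `filesystem_grants` is a hypothetical helper. It treats an entry without a suffix as read-write, which is Flatpak's default.

```python
import configparser


def filesystem_grants(permissions_text):
    """Return {filesystem: mode} from the [Context] section of a Flatpak permissions dump."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as D-Bus names case-sensitive
    parser.read_string(permissions_text)
    raw = parser.get("Context", "filesystems", fallback="")
    grants = {}
    for entry in filter(None, raw.split(";")):  # the list is ';'-terminated
        path, _, mode = entry.partition(":")
        grants[path] = mode or "rw"  # no suffix means read-write
    return grants
```

Applied to the dump above, this yields only read-only grants (`xdg-documents` and `xdg-config/kdeglobals`). That is presumably not what blocks indexing, however. The database lives in the app's own data directory under `~/.var/app/io.gpt4all.gpt4all/`, which Flatpak apps can always write to (the `test_write.txt` file there suggests writes worked). The collection folder is reached through the document portal (`/run/flatpak/doc/...`), which these static entries do not describe.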

Screenshots/Logs

  1. UI shows "No Collections Installed" despite having indexed content
  2. Process monitor shows single core usage
  3. Database queries confirm existing indexed content

Database Schema

CREATE TABLE chunks(
    id integer primary key autoincrement,
    document_id integer not null,
    chunk_text text not null,
    file text not null,
    title text,
    author text,
    subject text,
    keywords text,
    page integer,
    line_from integer,
    line_to integer,
    words integer default 0 not null,
    tokens integer default 0 not null,
    foreign key(document_id) references documents(id)
);
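The excerpt above defines no index on `chunks.document_id`, and the report does not say whether the full schema adds one. If it does not, every per-document lookup would have to scan ~8M rows, which could plausibly keep one core busy. The following is a minimal sketch for checking this, using SQLite's `pragma_index_list` / `pragma_index_info` table-valued functions (SQLite 3.16+). `indexed_leading_columns` is a hypothetical helper and does not reflect GPT4All's own code.

```python
import sqlite3


def indexed_leading_columns(conn, table):
    """Columns of `table` that are the first column of at least one index."""
    columns = set()
    index_names = [
        row[0] for row in conn.execute("SELECT name FROM pragma_index_list(?)", (table,))
    ]
    for name in index_names:
        row = conn.execute(
            "SELECT name FROM pragma_index_info(?) WHERE seqno = 0", (name,)
        ).fetchone()
        # Expression indexes report NULL instead of a column name
        if row is not None and row[0] is not None:
            columns.add(row[0])
    return columns
```

Usage: open a copy of the database (for example with `sqlite3.connect("file:/path/to/copy.db?mode=ro", uri=True)`) and check whether `"document_id"` appears in `indexed_leading_columns(conn, "chunks")`. The `id integer primary key` column is a rowid alias and never shows up as a separate index.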

[Full schema and additional tables available if needed]

Notes

  • The issue seems related to either Flatpak permissions or the application's state management
  • Database contains significant amount of indexed data that appears valid
  • UI/backend communication might be interrupted
  • CPU usage pattern suggests the indexing process is not running properly
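To see which thread is pinning the single core, per-thread CPU time can be sampled from Linux's `/proc/<pid>/task/<tid>/stat`, where fields 14 and 15 are user and system time in clock ticks. This is a minimal sketch; `thread_cpu_ticks` and `busy_threads` are hypothetical helpers. It says nothing about how GPT4All actually divides work between threads.

```python
import os
import time


def thread_cpu_ticks(pid):
    """Map each thread id of `pid` to (thread name, user+system CPU ticks)."""
    ticks = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
            with open(f"{task_dir}/{tid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # the thread exited between listdir() and open()
        # Field 2 (the name in parentheses) may contain spaces, so split after
        # the last ')'; the remainder starts at field 3 (state).
        rest = stat.rsplit(")", 1)[1].split()
        utime, stime = int(rest[11]), int(rest[12])  # stat fields 14 and 15
        ticks[int(tid)] = (name, utime + stime)
    return ticks


def busy_threads(pid, interval=2.0):
    """Sample twice and return [(cpu_percent, tid, name)], busiest first."""
    before = thread_cpu_ticks(pid)
    time.sleep(interval)
    after = thread_cpu_ticks(pid)
    hz = os.sysconf("SC_CLK_TCK")
    usage = []
    for tid, (name, total) in after.items():
        start = before.get(tid, (name, 0))[1]  # threads born mid-sample start at 0
        percent = 100.0 * (total - start) / hz / interval  # 100 ~ one full core
        usage.append((percent, tid, name))
    return sorted(usage, reverse=True)
```

Run it from the host with the PID of the GPT4All process (found, for example, with `ps` or a process monitor). The top few entries show whether a single named thread accounts for the 100% core.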


ls -al /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/
total 25491464
drwxr-xr-x. 1 josenobile josenobile         316 Dec 21 11:39 .
drwxr-xr-x. 1 josenobile josenobile          14 Sep 25 10:38 ..
-rw-r--r--. 1 josenobile josenobile  1921909280 Oct 12 12:59 Llama-3.2-3B-Instruct-Q4_0.gguf
-rw-r--r--. 1 josenobile josenobile  7058767872 Oct  8 08:58 localdocs_v2.db
-rw-r--r--. 1 josenobile josenobile 12459642880 Dec 21 11:38 localdocs_v3.db
-rw-r--r--. 1 josenobile josenobile     1720088 Dec 21 12:03 localdocs_v3.db-journal
-rw-r--r--. 1 josenobile josenobile         282 Dec 21 10:10 log-prev.txt
-rw-r--r--. 1 josenobile josenobile         282 Dec 21 11:39 log.txt
-rw-r--r--. 1 josenobile josenobile  4661212096 Sep 25 10:40 Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf
-rw-r--r--. 1 josenobile josenobile           0 Sep 25 10:38 test_write.txt
josenobile@fedora:~$ file /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/*
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/Llama-3.2-3B-Instruct-Q4_0.gguf:           data
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/localdocs_v2.db:                           SQLite 3.x database, last written using SQLite version 3042000, file counter 128256, database pages 1723332, cookie 0x6, schema 4, largest root page 15, UTF-8, version-valid-for 128256
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/localdocs_v3.db:                           SQLite 3.x database, last written using SQLite version 3046001, file counter 114318, database pages 3041905, cookie 0xc, schema 4, largest root page 19, UTF-8, version-valid-for 114318
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/localdocs_v3.db-journal:                   data
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/log-prev.txt:                              ASCII text
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/log.txt:                                   ASCII text
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf: data
/home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/test_write.txt:                            empty
josenobile@fedora:~$ du -hsc /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/*
1.8G    /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/Llama-3.2-3B-Instruct-Q4_0.gguf
6.6G    /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/localdocs_v2.db
12G     /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/localdocs_v3.db
1.7M    /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/localdocs_v3.db-journal
4.0K    /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/log-prev.txt
4.0K    /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/log.txt
4.4G    /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf
0       /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/test_write.txt
25G     total
cat /home/josenobile/.var/app/io.gpt4all.gpt4all/data/nomic.ai/GPT4All/log.txt
[Warning] (Sat Dec 21 11:39:00 2024): qrc:/gpt4all/qml/AddModelView.qml:119:13: QML AddHFModelView: Detected anchors on an item that is managed by a layout. This is undefined behavior; use Layout.alignment instead.
[Debug] (Sat Dec 21 11:39:00 2024): deserializing chats took: 0 ms
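The `file` output above decodes the 100-byte SQLite database header. Below is a minimal sketch that reads the same fields directly with Python's `struct` module, following the offsets in the SQLite file-format documentation. `read_sqlite_header` is a hypothetical helper.

```python
import struct

SQLITE_MAGIC = b"SQLite format 3\x00"


def read_sqlite_header(path):
    """Decode selected fields of the 100-byte SQLite database file header."""
    with open(path, "rb") as f:
        header = f.read(100)
    if len(header) < 100 or header[:16] != SQLITE_MAGIC:
        raise ValueError(f"{path} is not an SQLite 3 database")
    (page_size,) = struct.unpack(">H", header[16:18])
    if page_size == 1:  # the value 1 encodes a 65536-byte page
        page_size = 65536
    change_counter, page_count = struct.unpack(">II", header[24:32])
    largest_root, text_encoding = struct.unpack(">II", header[52:60])
    version_valid_for, sqlite_version = struct.unpack(">II", header[92:100])
    return {
        "page_size": page_size,
        "file_change_counter": change_counter,
        "page_count": page_count,
        # Non-zero only in auto-vacuum or incremental-vacuum mode
        "largest_root_page": largest_root,
        "text_encoding": text_encoding,  # 1 = UTF-8, 2 = UTF-16le, 3 = UTF-16be
        # The in-header page count is trusted only when this matches the counter
        "size_valid": page_count > 0 and version_valid_for == change_counter,
        "sqlite_version": sqlite_version,  # e.g. 3046001 means 3.46.1
    }
```

The numbers in the listing are consistent with 4096-byte pages. For localdocs_v3.db, 3,041,905 pages × 4,096 bytes = 12,459,642,880 bytes, exactly the size reported by `ls`. For localdocs_v2.db, 1,723,332 × 4,096 = 7,058,767,872 bytes. In both files version-valid-for equals the file counter, so the in-header size is valid. The non-zero "largest root page" would indicate that both databases are in auto-vacuum or incremental-vacuum mode.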