79 Commits

Author SHA1 Message Date
Full Stack Engineer
98f1139852 fix: ctx_size overflow causing model load failure after reopening chat (#7879)
* feat: paragraph-level Edit with AI for assistant markdown (#7812)

* fix: ctx_size overflow causing model load failure after reopening chat

* fix linter issue

---------

Co-authored-by: Clayton <118192227+claytonlin1110@users.noreply.github.com>
2026-04-03 06:55:44 +05:30
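The ctx_size overflow fix above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Jan code: the function and field names (`clamp_ctx_size`, `saved_ctx`, `model_max_ctx`) are assumptions. The idea is that a persisted context size reloaded from a previous chat session must be validated against the model's context window before it is handed to the loader.

```rust
/// Hypothetical helper illustrating the fix: clamp a persisted context size
/// before reloading a model, so a stale or overflowed `saved_ctx` can never
/// exceed the model's context window and abort the load.
fn clamp_ctx_size(saved_ctx: i64, model_max_ctx: u32) -> u32 {
    const MIN_CTX: u32 = 512; // assumed usable floor
    let floor = MIN_CTX.min(model_max_ctx);
    if saved_ctx <= 0 || saved_ctx > model_max_ctx as i64 {
        // invalid or overflowed value: fall back to the model maximum
        model_max_ctx
    } else {
        (saved_ctx as u32).max(floor)
    }
}
```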
Clayton
1c3a03557c feat: add optional health check auto-restart for crashed model sessions (#7855)
* feat: add optional health check auto-restart for crashed model sessions

* fix: update

* fix: update

* fix: lint

* fix: lint

* fix: tauri

* fix: build

* fix: update

* fix: update
2026-03-31 12:34:23 +05:30
dataCenter430
42b2d13321 feat(gguf): fix model compatibility heuristic for Apple Silicon unified memory (#7842)
* feat(gguf): improve model compatibility heuristic for Apple Silicon unified memory

* fix: resolve review

---------

Co-authored-by: Louis <louis@jan.ai>
2026-03-28 13:36:32 +07:00
Clayton
d469c2e38c feat: add /v1/orchestrations for server-side MCP tool execution (#7800)
* feat: add /v1/orchestrations for server-side MCP tool execution

* fix: test

* fix: optional assistant_id
2026-03-24 14:31:34 +07:00
Vanalite
0ec8edc639 Merge remote-tracking branch 'origin/main' into chore/merge_main_to_v.0.7.9
# Conflicts:
#	Makefile
2026-03-23 13:31:38 +07:00
dev-miro26
5504a7aa7e chore: update dependencies in Cargo.lock and Cargo.toml, add fm-rs package and modify tauri-plugin-foundation-models dependencies 2026-03-20 06:04:33 +00:00
dev-miro26
c1eeee0d9c refactor: update macOS target configuration to support Apple Silicon architecture for foundation models 2026-03-20 05:21:50 +00:00
dev-miro26
9a684f5473 fix: ensure chunk_text is converted to a string in chat completion stream 2026-03-19 21:28:29 +00:00
dev-miro26
7d98402c62 refactor: change max_tokens type to u32 and improve variable naming in commands.rs 2026-03-19 21:21:59 +00:00
dev-miro26
da499a579b refactor: remove foundation models server dependency and streamline integration with Apple's FoundationModels framework 2026-03-19 19:08:12 +00:00
Vanalite
ab12f7463f chore: disable foundation model for RC 2026-03-19 14:43:29 +07:00
dev-miro26
aa2b968181 feat: expand file type support in document parser and UI components 2026-03-18 15:41:59 +00:00
dev-miro26
d5aa7d4022 feat: add foundation models plugin with server management capabilities (#7744) 2026-03-18 13:08:52 +07:00
Louis
2b18fb5ab1 fix: kv-cache defaults and fit migration (#7751)
* fix: kv-cache default to q8

* fix: add tests

* fix: disable fit setting by default

* fix: default parallel
2026-03-17 21:45:38 +07:00
dataCenter430
8df4bf887e Merge pull request #7605 from dataCenter430/fix/GPU-detection-losing
Fix: GPU detection getting lost
2026-03-12 18:10:06 +07:00
Louis
49404b75ad fix: bundle jan-cli.exe in Windows NSIS installer (#7618)
* fix: copy jan-cli nsis

* fix: bin file location

* fix: add app to PATH

* fix: polishing

* fix: polish cli commands

* fix: download default model when serving

* fix: less clicks / steps

* fix: add select option

* fix: Default 32k ctx when running in CLI

* fix: deprecate codex for now

* fix: disable uninstalled option
2026-03-05 17:43:02 +07:00
dev-miro26
7d0e861ba1 fix: update flash attention handling in ArgumentBuilder (#7565) 2026-03-04 06:48:56 +07:00
Louis
10d9649fb5 feat: add cli support
# Conflicts:
#	src-tauri/Cargo.lock
#	src-tauri/Cargo.toml
#	src-tauri/src/lib.rs
2026-03-03 20:06:58 +07:00
Louis
305e2a8bdc fix: simplify mlx-server backend with new ChatSession update (#7538) 2026-02-19 00:20:47 +07:00
Louis
89e5bba1f8 fix: outdated test 2026-02-09 22:41:34 +07:00
Louis
ed7d5158c2 fix: disabling fit does not work well - llama.cpp somehow uses max ctx-size 2026-02-09 22:23:39 +07:00
Louis
d4614a2323 refactor: remove model planner since we now use the auto fit setting 2026-02-09 12:43:12 +07:00
Akarshan Biswas
fe3fe43b7f feat: implement fit settings in llamacpp extension and overhaul argument builder tests (#7442)
Introduces support for the `fit` parameter and its associated configurations (`fit_target`, `fit_ctx`) to allow automatic adjustment of arguments to device memory. This change spans the extension settings, guest-js types, and the Rust argument builder.

**Key changes:**

* **Settings & Types:** Added `fit`, `fit_target`, and `fit_ctx` to `settings.json` and synchronized these fields across the TypeScript definitions and the Rust `LlamacppConfig` struct.
* **Logic Updates:**
  * Implemented `add_fit_settings` in the `ArgumentBuilder` to handle `--fit`, `--fit-target`, and `--fit-ctx` flags.
  * Modified `add_gpu_layers` to use `-1` as the default for loading all layers, while treating `100` as a manual override.
  * Updated several argument methods (batch size, context size, etc.) to only append flags if the values differ from the defaults, reducing command-line clutter.
  * Added a check to exclude `fit` settings when using the `ik` backend fork.

* **Testing:** Significantly expanded the Rust test suite. Replaced basic assertions with dedicated helper functions (`assert_arg_pair`, `assert_has_flag`, `assert_no_flag`) and added comprehensive test cases for various configurations, including GPU layers, embedding mode, and backend-specific behavior.
2026-02-09 08:56:57 +05:30
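The fit-flag handling described in this commit can be sketched as below. The flag names (`--fit`, `--fit-target`, `--fit-ctx`) and the `ik`-fork exclusion come from the commit itself; the struct shape and field types are assumptions for illustration, not the real `LlamacppConfig`.

```rust
/// Minimal sketch of the fit-related argument handling (field types assumed).
struct FitConfig {
    fit: bool,
    fit_target: Option<String>, // e.g. a memory budget value
    fit_ctx: Option<u32>,
    backend: String,
}

fn add_fit_settings(cfg: &FitConfig, args: &mut Vec<String>) {
    // The `ik` llama.cpp fork does not support the fit flags, so skip them.
    if cfg.backend == "ik" || !cfg.fit {
        return;
    }
    args.push("--fit".into());
    if let Some(target) = &cfg.fit_target {
        args.push("--fit-target".into());
        args.push(target.clone());
    }
    if let Some(ctx) = cfg.fit_ctx {
        args.push("--fit-ctx".into());
        args.push(ctx.to_string());
    }
}
```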
Louis
d01ec56c75 feat: detect vision capability while importing model 2026-02-04 09:27:38 +07:00
Louis
1482565248 feat: vision support 2026-02-04 09:27:38 +07:00
Louis
023e5dea10 feat: add prompt cache and fix binary bundle 2026-02-04 09:27:38 +07:00
Louis
b16b519f4e feat: support mlx plugin
# Conflicts:
#	Makefile
#	web-app/src/routes/settings/providers/$providerName.tsx
2026-02-04 09:27:38 +07:00
Akarshan Biswas
81cd0dfae8 refactor: migrate llamacpp backend logic to rust plugin (#7171)
* refactor: migrate llamacpp backend logic to rust plugin

Moves the core logic for managing llama.cpp backends—including version detection, compatibility checking, migration, prioritization, and updates—from the TypeScript extension to the Rust Tauri plugin.

Changes:
- **tauri-plugin-llamacpp**:
  - Added `src/backend.rs` containing the logic for backend management.
  - Exposed new commands: `map_old_backend_to_new`, `list_supported_backends`, `determine_supported_backends`, `prioritize_backends`, `check_backend_for_updates`, `remove_old_backend_versions`, etc.
  - Added unit tests for backend logic in Rust.
  - Updated permissions and guest-js bindings to include new commands.
- **llamacpp-extension**:
  - Refactored `src/backend.ts` and `src/index.ts` to delegate logic to the Rust plugin.
  - Removed obsolete TypeScript implementation of backend logic and corresponding tests.
  - Simplified configuration and update workflows by using the centralized Rust API.

* tests: fix parse backend version tests

* fix: correct backenddir path
2026-01-02 11:10:03 +05:30
Akarshan Biswas
bf08f8d6b4 refactor: move llama.cpp config handling to Rust (#7047)
* refactor: move llama.cpp config handling to Rust

- Removed duplicated TypeScript type definitions for LlamacppConfig, ModelPlan, DownloadItem, ModelConfig, etc.
- Added a new `src/guest-js/types.ts` that exports the consolidated types and a helper `normalizeLlamacppConfig` for converting raw config objects.
- Implemented a dedicated Rust module `args.rs` that builds all command‑line arguments for llama.cpp from a `LlamacppConfig` struct, handling embedding, flash‑attention, GPU/CPU flags, and other options.
- Updated `commands.rs` to construct arguments via `ArgumentBuilder`, validate paths, and log the generated args.
- Added more explicit error handling for invalid configuration arguments and updated the error enum to include `InvalidArgument`.
- Exported the new `cleanupLlamaProcesses` command and updated the guest‑JS API accordingly.
- Adjusted the TypeScript `loadLlamaModel` helper to use the new config normalization and argument shape.
- Improved logging and documentation for clarity.

* fix: ignore empty mmproj path arguments

Prevent adding the `--mmproj` flag when the provided path string is empty.
An empty `mmproj_path` previously caused an empty argument to be passed to the model loader, potentially leading to errors or undefined behavior. By filtering out empty strings before pushing the flag, the command line construction is now robust against malformed input.

* refactor: use String::new() for empty API key

Use `String::new()` instead of `"".to_string()` when no API key is supplied.
This eliminates an unnecessary heap allocation and clarifies that the intent is to create an empty string without creating a temporary literal.
2025-12-09 13:32:59 +05:30
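The empty-`mmproj`-path guard described in the second commit above amounts to a small filter before the flag is pushed; a sketch under assumed names (the real code builds these args inside `ArgumentBuilder`):

```rust
/// Sketch of the guard: only add `--mmproj` when the path is non-empty,
/// so an empty string never reaches the model loader as a bare argument.
fn add_mmproj_arg(mmproj_path: &str, args: &mut Vec<String>) {
    let trimmed = mmproj_path.trim();
    if !trimmed.is_empty() {
        args.push("--mmproj".into());
        args.push(trimmed.to_string());
    }
}
```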
Dinh Long Nguyen
5ede3436a8 fix: attachment parsing error (exit gracefully) (#7093)
* Handle attachment parsing error (exit gracefully)

* Revert "Handle attachment parsing error (exit gracefully)"

This reverts commit 56a356f3f60a25c0aa243e5957b76d9545098c9e.

* Handle attachment parsing error (exit gracefully)
2025-12-03 19:31:11 +07:00
Akarshan Biswas
2af5ab9566 fix: set backend path environment variables for llama.cpp (#6937)
* fix: set backend path environment variables for llama.cpp

Ensure that the backend executable’s directory is added to the appropriate
environment variable (`PATH`, `LD_LIBRARY_PATH`, or `DYLD_LIBRARY_PATH`)
before invoking `llama_load` and `get_devices`.
This change fixes load failures on Windows, Linux, and macOS where the
dynamic loader cannot locate the required libraries without the proper
search paths, and cleans up unused imports.

* refactor: centralize library path setup in Rust utilities

Move the library‑path configuration logic out of the TypeScript code into the
Rust `setup_library_path` helper. The TypeScript files no longer set the
`PATH`, `LD_LIBRARY_PATH`, or `DYLD_LIBRARY_PATH` environment variables
directly; instead they defer to the Rust side, which now accepts a
`Path` and performs platform‑specific normalization (including UNC‑prefix
trimming on Windows). This removes duplicated code, keeps environment
configuration consistent across the plugin, and simplifies maintenance.
The import order in `device.rs` was corrected and small formatting fixes
were applied. No functional changes to the public API occur.

* feat: add CUDA path detection and warnings for llama.cpp

Add utilities to detect CUDA installations on Windows and Linux, automatically
inject CUDA paths into the process environment, and warn when the llama.cpp
binary requires CUDA but the runtime is not found.  The library‑path setup has
been refactored to prepend new paths and normalise UNC prefixes for Windows.
This ensures the backend can load CUDA libraries correctly and provides
diagnostic information when CUDA is missing.

* refactor: correctly map and store effective backend type

This update unifies backend type handling across the llamacpp extension.
Previously, the stored backend preference, the version string, and the
auto‑update logic used inconsistent identifiers (raw backend names versus
their effective mapped forms). The patch:

* Maps legacy backend names to their new “effective” type before any
comparison or storage.
* Stores the full `version/effectiveType` string instead of just the
type, ensuring the configuration and localStorage stay in sync.
* Updates all logging and warning messages to reference the effective
backend type.
* Simplifies the update check logic by comparing the effective type and
version together, preventing unnecessary migrations.

These changes eliminate bugs that occurred when the backend type
changed after an update and make the internal state more coherent.

* refactor: improve CUDA detection and migrate legacy libs

Enhance `_isCudaInstalled` to accept the backend directory and CUDA version, checking both the new and legacy installation paths. If a library is found in the old location, move it to the new `build/bin` directory and create any missing folders. Update `mapOldBackendToNew` formatting and remove duplicated comments. Minor consistency and readability fixes were also applied throughout the backend module.

* refactor: broaden llama backend archive regex

This update expands the regular expression used to parse llama‑cpp extension archives.
The new pattern now supports:
- Optional prefixes and the `-main` segment
- Version strings that include a hash suffix
- An optional `-cudart-llama` part
- A wide range of backend detail strings

These changes ensure `installBackend` can correctly handle the latest naming conventions (e.g., `k_llama-main-b4314-09c61e1-bin-win-cuda-12.8-x64-avx2.zip`) while preserving backward compatibility with older formats.
2025-11-13 09:12:59 +05:30
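The per-platform loader variable selection and prepend behavior described in this commit (`PATH` on Windows, `DYLD_LIBRARY_PATH` on macOS, `LD_LIBRARY_PATH` elsewhere, with new paths prepended) can be sketched as follows; function names are assumptions, not the actual `setup_library_path` signature:

```rust
/// Platform-specific dynamic-loader search variable, as described in the commit.
fn loader_env_var(os: &str) -> &'static str {
    match os {
        "windows" => "PATH",
        "macos" => "DYLD_LIBRARY_PATH",
        _ => "LD_LIBRARY_PATH", // Linux and other Unix-likes
    }
}

/// Prepend `dir` to an existing search-path value so new paths take priority.
fn prepend_search_path(existing: &str, dir: &str, sep: char) -> String {
    if existing.is_empty() {
        dir.to_string()
    } else {
        format!("{dir}{sep}{existing}")
    }
}
```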
Louis
384eed56e6 feat: add backend migration mapping and update backend handling (#6917) (#6920)
Added `mapOldBackendToNew` to translate legacy backend strings (e.g., `win-avx2-x64`, `win-avx512-cuda-cu12.0-x64`) into the new unified names (`win-common_cpus-x64`, `win-cuda-12-common_cpus-x64`). Updated backend selection, installation, and download logic to use the mapper, ensuring consistent naming across the extension and tests. Updated tests to verify the mapping, new download items, and correct extraction paths. Minor formatting updates to the Tauri command file for clearer logging. This change enables smoother migration for stored user preferences and reduces duplicate asset handling.

Co-authored-by: Akarshan Biswas <akarshan@menlo.ai>
2025-11-11 11:57:43 +07:00
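The mapping this commit introduces (`mapOldBackendToNew` is TypeScript in the extension) can be sketched in Rust with the example pairs the message gives; only those documented cases are shown, the real mapper covers many more variants:

```rust
/// Sketch of the legacy-to-unified backend name mapping.
/// Cases shown are the ones documented in the commit message.
fn map_old_backend_to_new(old: &str) -> String {
    match old {
        "win-noavx-x64" | "win-avx2-x64" | "win-avx512-x64" => "win-common_cpus-x64".into(),
        "win-avx512-cuda-cu12.0-x64" => "win-cuda-12-common_cpus-x64".into(),
        other => other.to_string(), // assume the name is already in the new scheme
    }
}
```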
Vanalite
a951e6d7d7 chore: address PR comments 2025-11-05 16:10:37 +07:00
Akarshan Biswas
047483f815 feat: add configurable timeout for llamacpp connections (#6872)
* feat: add configurable timeout for llamacpp connections

This change introduces a user-configurable read/write timeout (in seconds) for llamacpp connections, replacing the hard-coded 600s value. The timeout is now settable via the extension settings and used in both HTTP requests and server readiness checks. This provides flexibility for different deployment scenarios, allowing users to adjust connection duration based on their specific use cases while maintaining the default 10-minute timeout behavior.

* fix: correct timeout conversion factor and clarify settings description

The previous timeout conversion used `timeout * 100` instead of `timeout * 1000`, which incorrectly shortened the timeout to 1/10 of the intended value (e.g., 10 minutes became 1 minute). This change corrects the conversion factor to milliseconds. Additionally, the settings description was updated to explicitly state that this timeout applies to both connection and load operations, improving user understanding of its scope.

* style: replace loose equality with strict equality in key comparison

This change updates the comparison operator from loose equality (`==`) to strict equality (`===`) when checking for the 'timeout' key. While the key is always a string in this context (making the behavior identical), using strict equality prevents potential type conversion issues and adheres to JavaScript best practices for reliable comparisons.
2025-11-04 17:34:58 +05:30
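The conversion bug fixed in the second commit above (`timeout * 100` yielding a tenth of the intended duration) reduces to a one-line fix; a sketch with the 600 s default the message mentions, names assumed:

```rust
const DEFAULT_TIMEOUT_SECS: u64 = 600; // the previously hard-coded 10 minutes

/// Corrected seconds-to-milliseconds conversion; `timeout * 100` had silently
/// turned a 10-minute timeout into 1 minute.
fn timeout_millis(timeout_secs: Option<u64>) -> u64 {
    timeout_secs.unwrap_or(DEFAULT_TIMEOUT_SECS) * 1000
}
```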
Akarshan Biswas
b2a8efd799 refactor: Simplify Tauri plugin calls and update 'FA' setting (#6779)
* refactor: Simplify Tauri plugin calls and enhance 'Flash Attention' setting

This commit introduces significant improvements to the llama.cpp extension, focusing on the 'Flash Attention' setting and refactoring Tauri plugin interactions for better code clarity and maintenance.

The backend interaction is streamlined by removing the unnecessary `libraryPath` argument from the Tauri plugin commands for loading models and listing devices.

* **Simplified API Calls:** The `loadLlamaModel`, `unloadLlamaModel`, and `get_devices` functions in both the extension and the Tauri plugin now manage the library path internally based on the backend executable's location.
* **Decoupled Logic:** The extension (`src/index.ts`) now uses the new, simplified Tauri plugin functions, which enhances modularity and reduces boilerplate code in the extension.
* **Type Consistency:** Added `UnloadResult` interface to `guest-js/index.ts` for consistency.

* **Updated UI Control:** The 'Flash Attention' setting in `settings.json` is changed from a boolean checkbox to a string-based dropdown, offering **'auto'**, **'on'**, and **'off'** options.
* **Improved Logic:** The extension logic in `src/index.ts` is updated to correctly handle the new string-based `flash_attn` configuration. It now passes the string value (`'auto'`, `'on'`, or `'off'`) directly as a command-line argument to the llama.cpp backend, simplifying the version-checking logic previously required for older llama.cpp versions. The old, complex logic tied to specific backend versions is removed.

This refactoring cleans up the extension's codebase and moves environment and path setup concerns into the Tauri plugin where they are most relevant.

* feat: Simplify backend architecture

This commit introduces a functional flag for embedding models and refactors the backend detection logic for cleaner implementation.

Key changes:

 - Embedding Support: The loadLlamaModel API and SessionInfo now include an isEmbedding: boolean flag. This allows the core process to differentiate and correctly initialize models intended for embedding tasks.

 - Backend Naming Simplification (Refactor): Consolidated the CPU-specific backend tags (e.g., win-noavx-x64, win-avx2-x64) into generic *-common_cpus-x64 variants (e.g., win-common_cpus-x64). This streamlines supported backend detection.

 - File Structure Update: Changed the download path for CUDA runtime libraries (cudart) to place them inside the specific backend's directory (/build/bin/) rather than a shared lib folder, improving asset isolation.

* fix: compare

* fix mmap settings and adjust flash attention

* fix: correct flash_attn and main_gpu flag checks in llamacpp extension

Previously the condition for `flash_attn` was always truthy, causing
unnecessary or incorrect `--flash-attn` arguments to be added. The
`main_gpu` check also used a loose inequality which could match values
that were not intended. The updated logic uses strict comparison and
correctly handles the empty string case, ensuring the command line
arguments are generated only when appropriate.
2025-11-01 23:30:49 +05:30
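The corrected `--flash-attn` handling from the last commit above — a string-valued setting (`'auto'` / `'on'` / `'off'`) passed through verbatim, with strict comparison so an empty value emits nothing — can be sketched as:

```rust
/// Sketch of the fixed flag generation: emit `--flash-attn <value>` only for
/// the three recognized values; an empty or unknown string adds no argument
/// (the original condition was always truthy).
fn add_flash_attn_arg(flash_attn: &str, args: &mut Vec<String>) {
    match flash_attn {
        "auto" | "on" | "off" => {
            args.push("--flash-attn".into());
            args.push(flash_attn.into());
        }
        _ => {} // empty string or unrecognized value: emit nothing
    }
}
```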
Minh141120
15c426aefc chore: update org name 2025-10-28 17:26:27 +07:00
Dinh Long Nguyen
340042682a UI/UX enhancement 2025-10-09 03:48:51 +07:00
Akarshan
7762cea10a feat: Distinguish and preserve embedding model sessions
This commit introduces a new field, `is_embedding`, to the `SessionInfo` structure to clearly mark sessions running dedicated embedding models.

Key changes:
- Adds `is_embedding` to the `SessionInfo` interface in `AIEngine.ts` and the Rust backend.
- Updates the `loadLlamaModel` command signatures to pass this new flag.
- Modifies the llama.cpp extension's **auto-unload logic** to explicitly **filter out** and **not unload** any currently loaded embedding models when a new text generation model is loaded. This is a critical performance fix to prevent the embedding model (e.g., used for RAG) from being repeatedly reloaded.

Also includes minor code style cleanup/reformatting in `jan-provider-web/provider.ts` for improved readability.
2025-10-08 20:03:35 +05:30
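The auto-unload filter this commit describes — never evict embedding sessions when a text-generation model loads — can be sketched as below. `SessionInfo` here is a cut-down stand-in for the real structure, which carries more fields:

```rust
/// Cut-down stand-in for the real SessionInfo.
struct SessionInfo {
    model_id: String,
    is_embedding: bool,
}

/// When a new text-generation model is loaded, only non-embedding sessions
/// are candidates for auto-unload; embedding models (e.g. used for RAG)
/// stay resident instead of being repeatedly reloaded.
fn sessions_to_unload(sessions: &[SessionInfo]) -> Vec<String> {
    sessions
        .iter()
        .filter(|s| !s.is_embedding)
        .map(|s| s.model_id.clone())
        .collect()
}
```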
Dinh Long Nguyen
ff93dc3c5c Merge branch 'dev' into feat/file-attachment 2025-10-08 16:34:45 +07:00
Dinh Long Nguyen
510c4a5188 working attachments 2025-10-08 16:08:40 +07:00
Louis
fe2c2a8687 Merge branch 'dev' into release/v0.7.0
# Conflicts:
#	web-app/src/containers/DropdownModelProvider.tsx
#	web-app/src/containers/ThreadList.tsx
#	web-app/src/containers/__tests__/DropdownModelProvider.displayName.test.tsx
#	web-app/src/hooks/__tests__/useModelProvider.test.ts
#	web-app/src/hooks/useChat.ts
#	web-app/src/lib/utils.ts
2025-10-06 20:42:05 +07:00
Vanalite
fa61163350 fix: Fix OpenSSL issue on mobile after merging 2025-10-05 14:40:39 +07:00
Akarshan Biswas
0f0ba43b7f feat: Adjust RAM/VRAM calculation for unified memory systems (#6687)
* feat: Adjust RAM/VRAM calculation for unified memory systems

This commit refactors the logic for calculating **total RAM** and **total VRAM** in `is_model_supported` and `plan_model_load` commands, specifically targeting systems with **unified memory** (like modern macOS devices where the GPU list may be empty).

The changes are as follows:

* **Total RAM Calculation:** If no GPUs are detected (`sys_info.gpus.is_empty()` is true), **total RAM** is now set to `0`. This avoids confusing total system memory with dedicated GPU memory when planning model placement.
* **Total VRAM Calculation:** If no GPUs are detected, **total VRAM** is still calculated as the system's **total memory (RAM)**, as this shared memory acts as VRAM on unified memory architectures.

This adjustment improves the accuracy of memory availability checks and model planning on unified memory systems.

* fix: total usable memory in case there is no system vram reported

* chore: temporarily change to self-hosted runner mac

* ci: revert back to github hosted runner macos

---------

Co-authored-by: Louis <louis@jan.ai>
Co-authored-by: Minh141120 <minh.itptit@gmail.com>
2025-10-01 18:58:14 +07:00
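The unified-memory adjustment described above reduces to a branch on whether any discrete GPUs were reported; a sketch returning `(total_ram, total_vram)`, with the function name and tuple shape as assumptions:

```rust
/// Sketch of the unified-memory adjustment: with no GPUs reported (e.g. Apple
/// Silicon), dedicated RAM is treated as 0 and shared system memory acts as
/// VRAM; otherwise VRAM is the sum across reported GPUs.
fn plan_memory(total_system_mem: u64, gpu_vrams: &[u64]) -> (u64, u64) {
    if gpu_vrams.is_empty() {
        (0, total_system_mem)
    } else {
        (total_system_mem, gpu_vrams.iter().sum())
    }
}
```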
Roushan Kumar Singh
247db95bad resolve TypeScript and Rust warnings (#6612)
* chore: fix warnings

* fix: add missing scrollContainerRef dependencies to React hooks

* fix: typo

* fix: remove unsupported fetch option and enable AsyncIterable types

- Removed `connectTimeout` from fetch init (not supported in RequestInit)
- Updated tsconfig to target ES2018

* chore: refactor rename

* fix(hooks): update dependency arrays for useThreadScrolling effects

* Add type.d.ts to extend RequestInit with connectionTimeout

* remove commented unused import
2025-10-01 16:06:41 +07:00
Vanalite
262a1a9544 Merge remote-tracking branch 'origin/dev' into mobile/dev
# Conflicts:
#	src-tauri/src/core/setup.rs
#	src-tauri/src/lib.rs
#	web-app/src/hooks/useChat.ts
2025-10-01 09:52:01 +07:00
Dinh Long Nguyen
e6bc1182a6 Merge branch 'dev' into feat/sync-release=to-dev 2025-09-30 22:04:27 +07:00
Vanalite
43d20e2a32 fix: revert the modification of vulkan 2025-09-30 14:50:54 +07:00
Akarshan
34b254e2d8 fix: Improve KV cache estimation robustness
The KV cache size calculation in estimate_kv_cache_internal now includes a fallback mechanism for models that do not explicitly define key_length and value_length in the GGUF metadata.

If these attention keys are missing, the head dimension (and thus key/value length) is calculated using the formula embedding_length / total_heads. This improves robustness and compatibility with GGUF models that don't have the proper keys in metadata.

Also adds logging of the full model metadata for easier debugging of the estimation process.
2025-09-30 11:14:18 +05:30
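The fallback described in this commit — derive the head dimension as `embedding_length / total_heads` when `key_length`/`value_length` are missing from the GGUF metadata — is a one-line default; sketched here with assumed parameter names:

```rust
/// Sketch of the KV-cache estimation fallback: use the explicit GGUF
/// `key_length` when present, otherwise derive the head dimension from the
/// embedding width divided by the total head count.
fn head_dim(key_length: Option<u64>, embedding_length: u64, total_heads: u64) -> u64 {
    key_length.unwrap_or_else(|| embedding_length / total_heads)
}
```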
Vanalite
549c962248 fix: Fix NVIDIA and Vulkan after upgrade to stay compatible with mobile compilation too 2025-09-30 09:44:21 +07:00
Vanalite
5e57caee43 Merge remote-tracking branch 'origin/dev' into mobile/dev
# Conflicts:
#	extensions/yarn.lock
#	package.json
#	src-tauri/plugins/tauri-plugin-hardware/src/vendor/vulkan.rs
#	src-tauri/src/lib.rs
#	yarn.lock
2025-09-29 22:22:00 +07:00