anythingllm

Files

Yitong Li 2f7a818744 fix(collector): infer file extension from Content-Type for URLs without explicit extensions (#5252)

* fix(collector): infer file extension from Content-Type for URLs without explicit extensions

When downloading files from URLs like https://arxiv.org/pdf/2307.10265,
the path has no recognizable file extension. The downloaded file gets
saved without an extension (or with a nonsensical one like .10265),
causing processSingleFile to reject it with 'File extension .10265
not supported for parsing'.

Fix: after downloading, check if the filename has a supported file
extension. If not, inspect the response Content-Type header and map
it to the correct extension using the existing ACCEPTED_MIMES table.

For example, a response with Content-Type: application/pdf will cause
the file to be saved with a .pdf extension, allowing it to be processed
correctly.
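
A minimal sketch of the fallback described above, not the actual patch. The `ACCEPTED_MIMES` entries shown here are an assumed subset (a MIME type mapped to a list of extensions), and `resolveFilename` is a hypothetical helper name; the real table and function in the collector may differ.

```javascript
const path = require("path");

// Assumed subset of a MIME -> extensions table. The real ACCEPTED_MIMES
// in the collector may have different keys, values, or shape.
const ACCEPTED_MIMES = {
  "application/pdf": [".pdf"],
  "text/plain": [".txt", ".md"],
  "text/html": [".html"],
};

// Every extension the table knows about, used for the "is it supported?" check.
const SUPPORTED_EXTENSIONS = new Set(Object.values(ACCEPTED_MIMES).flat());

// Hypothetical helper: returns the filename unchanged when its extension is
// already supported, otherwise appends one inferred from the Content-Type.
function resolveFilename(filename, contentType) {
  const ext = path.extname(filename).toLowerCase();
  if (SUPPORTED_EXTENSIONS.has(ext)) return filename;

  // Drop parameters such as "; charset=utf-8" before the table lookup.
  const mime = (contentType || "").split(";")[0].trim().toLowerCase();
  const candidates = ACCEPTED_MIMES[mime];
  const inferred = candidates ? candidates[0] : null;

  // Unknown or missing Content-Type: leave the name as downloaded.
  return inferred ? `${filename}${inferred}` : filename;
}

// The arXiv case from the commit message: ".10265" is not a supported extension.
console.log(resolveFilename("2307.10265", "application/pdf"));
```

This sketch appends the inferred extension rather than replacing the trailing `.10265`, so the original name is preserved; whether the real fix appends or replaces is not stated in the commit message.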

Fixes #4513

* small refactor

---------

Co-authored-by: Timothy Carambat <rambat1010@gmail.com>

2026-03-23 09:40:22 -07:00

convert

Patch dev Puppeteer crash for MacOS 15 (#4713)

2025-12-05 12:11:32 -08:00

helpers

fix(collector): infer file extension from Content-Type for URLs without explicit extensions (#5252)

2026-03-23 09:40:22 -07:00

index.js

missed lint

2025-10-08 12:57:31 -07:00