| name | qdrant-doc-deleter |
| description | Deletes specific PDF documents from Qdrant vector database collection. Use when user wants to remove, delete, or clean up PDF documents from the vector database, Qdrant collection, or needs to manage document versions. |
Qdrant Document Deleter
This skill helps users delete specific PDF documents from the Qdrant vector database collection using the existing UTIL/delete_documents.py script.
When to Use This Skill
Activate this skill automatically when the user:
- Wants to delete/remove PDF documents from Qdrant
- Needs to clean up old document versions (e.g., 2024 vs 2025)
- Asks to remove specific documents from the vector database
- Wants to manage or prune the Qdrant collection
- Uses keywords like "delete documents", "remove from Qdrant", "clean up PDFs"
How to Use
Step 1: Identify Documents to Delete
Ask the user which documents they want to delete. Documents should be specified by their PDF filename (e.g., document-name.pdf).
Step 2: Review the Script
Read UTIL/delete_documents.py to check the current DOCUMENTS_TO_DELETE list:
DOCUMENTS_TO_DELETE = [
'file1.pdf',
'file2.pdf',
# ...
]
Step 3: Update the Document List
Edit the DOCUMENTS_TO_DELETE list in the script to include the documents the user wants to delete:
from Edit import Edit
Edit(
file_path='/home/tromanow/COHORT/TX/UTIL/delete_documents.py',
old_string='DOCUMENTS_TO_DELETE = [\n # Current list...\n]',
new_string='DOCUMENTS_TO_DELETE = [\n # Documents to delete\n "document1.pdf",\n "document2.pdf",\n]'
)
Step 4: Explain the Process to the User
Before running, explain:
- How many documents will be deleted
- What the script does (searches for chunks with matching filename)
- That it requires confirmation (user must type "yes")
- That deletion is permanent
Step 5: Run the Script
Execute the deletion script:
cd UTIL
python delete_documents.py
The script will:
- Display the documents to be deleted
- Ask for confirmation ("yes" to proceed)
- Scroll through all chunks in the collection
- Delete chunks where
filenamematches the document names - Show deletion statistics
Step 6: Report Results
After execution, summarize:
- Number of documents processed
- Number of chunks deleted per document
- Any errors or warnings
- Total chunks removed from the collection
Key Features of the Script
Safety Features
- Confirmation required: User must type "yes" to proceed
- Detailed logging: Shows progress for each document
- Error handling: Continues processing even if one document fails
- Statistics: Reports exact number of chunks deleted
How It Works
- Scroll through collection: Uses Qdrant's scroll API to iterate through all points
- Filter by filename: Checks each chunk's
filenamefield in payload - Batch delete: Collects all matching point IDs and deletes them
- No reloading needed: Surgically removes specific documents without affecting others
Configuration
The script uses settings from LOAD_DB/config.py:
QDRANT_COLLECTION_NAME_CONTEXTUAL: The collection to delete fromQDRANT_API_URL: Qdrant server URLQDRANT_API_KEY: API key for authentication
Example Workflow
Example 1: Delete Outdated 2024 Documents
User: "I need to delete the 2024 versions of the payment rate PDFs"
Claude: I'll help you delete the 2024 payment rate documents from Qdrant.
Based on the existing script, I can see there are 2024 documents listed. Let me update the script to delete:
- wd-24-23-att1-twc.pdf
- acf-218-qpr-ffy-2024-for-texas.pdf
- texas-early-learning-strategic-plan-2024-2026-final-accessible.pdf
[Updates DOCUMENTS_TO_DELETE list]
Now I'll run the deletion script. You'll need to confirm by typing "yes" when prompted.
[Runs script]
✓ Deleted 3 documents:
- wd-24-23-att1-twc.pdf: 45 chunks
- acf-218-qpr-ffy-2024-for-texas.pdf: 32 chunks
- texas-early-learning-strategic-plan-2024-2026-final-accessible.pdf: 78 chunks
Total: 155 chunks removed from the collection.
Example 2: Delete Specific Document
User: "Remove tx-ccdf-state-plan-ffy2025-2027-approved.pdf from the database"
Claude: I'll delete tx-ccdf-state-plan-ffy2025-2027-approved.pdf from the Qdrant collection.
[Updates script with single document]
Running deletion script...
✓ Deleted tx-ccdf-state-plan-ffy2025-2027-approved.pdf: 124 chunks removed
Error Handling
Document Not Found
If a document has 0 chunks deleted:
- Explain that the document wasn't found in the collection
- Suggest checking the filename spelling
- Offer to verify collection contents with
verify_qdrant.py
Deletion Errors
If deletion fails for a document:
- Report the error message
- Continue with remaining documents
- Suggest checking Qdrant connection/credentials
No Confirmation
If user doesn't type "yes":
- Script exits safely
- No changes made to collection
Advanced Usage
Verify Before Deleting
Before deletion, check if documents exist:
cd LOAD_DB
python verify_qdrant.py
This shows all documents and chunk counts in the collection.
Check Specific Document
To find chunks for a specific document before deleting, you can use the retriever:
# Search for chunks from a specific document
from chatbot.retriever import Retriever
retriever = Retriever()
# Use scroll API to filter by filename
Notes
- Permanent deletion: There's no undo - chunks are permanently removed
- Collection name: Defaults to
QDRANT_COLLECTION_NAME_CONTEXTUALfrom config - Filename matching: Exact match on the
filenamefield in chunk payload - Batch processing: Processes in batches of 100 for efficiency
- Independent operation: Doesn't affect other documents in the collection
Related Tools
load_pdf_qdrant.py: Load new documents to Qdrantverify_qdrant.py: Verify collection contents and statisticsreload_single_pdf.py: Reload a single PDF (delete + re-add)