Debug Guide: PDF-to-MD 500 Internal Server Error
Generated: 2025-07-23
Status: In Progress
Endpoint: https://convert-to-markdown.knowcode.tech/pdf-to-md
Error: 500 Internal Server Error
Issue Description
The pdf-to-md endpoint is returning a 500 Internal Server Error when attempting to convert PDF files to Markdown.
Debug Steps
1. Check Cloud Function Logs
First, let's view the recent logs to identify the exact error:
# View recent logs
gcloud functions logs read pdf-to-md --region=us-east4 --project=convert-to-markdown-us-east4 --limit=50
# Stream logs in real time (open in a separate terminal).
# The function is served by a Cloud Run service, so tail that service's logs.
gcloud beta run services logs tail pdf-to-md --region=us-east4 --project=convert-to-markdown-us-east4
2. Function Configuration Check
Verify the function configuration:
gcloud functions describe pdf-to-md --region=us-east4 --project=convert-to-markdown-us-east4
Expected configuration:
- Memory: 512MB
- Timeout: 60s
- Runtime: nodejs20
- Entry point: pdfToMarkdown
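The comparison against these expected values can be scripted. The sketch below is a hypothetical helper, not part of the repository. It assumes the function is 2nd gen and that `gcloud functions describe ... --format=json` reports the settings under `buildConfig` and `serviceConfig`; 1st gen output uses different field names, so adjust the paths if your output differs.

```javascript
// Expected settings, taken from the list above.
const EXPECTED = {
  runtime: 'nodejs20',
  entryPoint: 'pdfToMarkdown',
  memoryMb: 512,
  timeoutSeconds: 60,
};

// Convert strings such as "512M", "512Mi" or "1G" to megabytes.
// SI and binary units are treated alike, which is close enough for a sanity check.
function parseMemoryMb(value) {
  const match = /^(\d+(?:\.\d+)?)\s*([MG])i?B?$/i.exec(String(value));
  if (!match) return NaN;
  const amount = Number(match[1]);
  return match[2].toUpperCase() === 'G' ? amount * 1024 : amount;
}

// Return a human-readable line for every setting that differs from EXPECTED.
// The buildConfig/serviceConfig paths are an assumption about 2nd gen output.
function findConfigMismatches(description, expected = EXPECTED) {
  const build = description.buildConfig || {};
  const service = description.serviceConfig || {};
  const actual = {
    runtime: build.runtime,
    entryPoint: build.entryPoint,
    memoryMb: parseMemoryMb(service.availableMemory),
    timeoutSeconds: Number(service.timeoutSeconds),
  };
  return Object.keys(expected)
    .filter((key) => actual[key] !== expected[key])
    .map((key) => `${key}: expected ${expected[key]}, got ${actual[key]}`);
}
```

To use it, save the describe output with `--format=json` to a file, load it with `JSON.parse(fs.readFileSync(...))`, and print the result of `findConfigMismatches`. An empty array means the deployed configuration matches the list.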
3. Local Testing
Test the function locally to isolate cloud-specific issues:
cd /Users/lindsaysmith/Documents/lambda1.nosync/xlsx-docx-ppt-convert-to-md
# Start local function server
npx @google-cloud/functions-framework --target=pdfToMarkdown --port=8080
# In another terminal, test with a PDF file
curl -X POST http://localhost:8080 -F "file=@test.pdf"
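The same upload can be sent from Node, which makes it easy to control the MIME type attached to the file part. This is a sketch assuming Node 18 or later (global `fetch`, `FormData`, and `Blob`). `buildPdfForm` and `uploadPdf` are illustrative names, and the field name `file` mirrors the curl command above.

```javascript
const fs = require('node:fs');

// Wrap a PDF buffer in a multipart form with an explicit MIME type on the file part.
function buildPdfForm(buffer, filename, mimeType = 'application/pdf') {
  const form = new FormData();
  form.append('file', new Blob([buffer], { type: mimeType }), filename);
  return form;
}

// POST the file and return the status code and response body for inspection.
async function uploadPdf(url, path, mimeType) {
  const form = buildPdfForm(fs.readFileSync(path), 'test.pdf', mimeType);
  const response = await fetch(url, { method: 'POST', body: form });
  return { status: response.status, body: await response.text() };
}

// Example (requires the local server started above):
// uploadPdf('http://localhost:8080', 'test.pdf').then(console.log);
// uploadPdf('http://localhost:8080', 'test.pdf', 'application/x-pdf').then(console.log);
```

Sending the same file with different MIME types is a quick way to tell a validation failure apart from a parsing failure.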
4. Common Error Points to Check
Based on source code analysis:
MIME Type Validation (src/pdfConverterToMD.js:23)
- Must be application/pdf
- Some PDFs may have different MIME types
File Size Limit (src/pdfConverterToMD.js:4)
- Maximum 5MB
- Check if file exceeds limit
PDF Parsing (lib/converters/pdf.js:6)
- Uses pdf-parse library
- May have compatibility issues with certain PDF versions
Memory Issues
- Function allocated 512MB
- Large or complex PDFs may exceed memory
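Two of these checks can be run against a file before it is uploaded. The sketch below assumes the 5MB limit means 5 * 1024 * 1024 bytes (the guide only states "5MB") and relies on the fact that a PDF normally begins with a `%PDF-<version>` header. `inspectPdf` is a hypothetical helper, not code from the repository.

```javascript
// Assumption: "5MB" is interpreted as 5 MiB.
const MAX_BYTES = 5 * 1024 * 1024;

// Report the size and declared PDF version of a file's contents.
function inspectPdf(buffer, maxBytes = MAX_BYTES) {
  // The header is plain ASCII, so decoding the first bytes as latin1 is safe.
  const header = buffer.subarray(0, 16).toString('latin1');
  const match = /^%PDF-(\d+\.\d+)/.exec(header);
  return {
    sizeBytes: buffer.length,
    withinSizeLimit: buffer.length <= maxBytes,
    hasPdfHeader: match !== null,
    pdfVersion: match ? match[1] : null,
  };
}

// Example:
// console.log(inspectPdf(require('node:fs').readFileSync('test.pdf')));
```

A file that passes both checks but still fails in production points toward the parsing or memory items rather than validation.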
5. Test with Different PDF Types
Create test PDFs to isolate the issue:
# Create test directory
mkdir -p test-pdfs
cd test-pdfs
# Test 1: Simple text PDF
echo "Simple test content" > test-simple.txt
# Convert to PDF using online tool or local converter
# Test 2: PDF with tables
# Create a document with tables
# Test 3: Small PDF (<100KB)
# Test 4: Different PDF versions (1.4, 1.5, 1.7)
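If no converter is at hand, a very small text-only PDF can be generated directly. The following is a minimal sketch, assuming ASCII text and an uncompressed single page; `buildMinimalPdf` is a hypothetical helper, and it has not been verified here that pdf-parse accepts its output. Changing the `version` argument only changes the header string, so it is a weak stand-in for PDFs that actually use features of newer versions.

```javascript
// Build a one-page PDF containing a single line of text.
function buildMinimalPdf(text, version = '1.4') {
  // Escape the characters that are special inside a PDF string literal.
  const escaped = text.replace(/([\\()])/g, '\\$1');
  const content = `BT /F1 24 Tf 72 720 Td (${escaped}) Tj ET`;
  const objects = [
    '<< /Type /Catalog /Pages 2 0 R >>',
    '<< /Type /Pages /Kids [3 0 R] /Count 1 >>',
    '<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] ' +
      '/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>',
    `<< /Length ${Buffer.byteLength(content, 'latin1')} >>\nstream\n${content}\nendstream`,
    '<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>',
  ];

  let body = `%PDF-${version}\n`;
  const offsets = [];
  objects.forEach((obj, i) => {
    // Record the byte offset of each object for the cross-reference table.
    offsets.push(Buffer.byteLength(body, 'latin1'));
    body += `${i + 1} 0 obj\n${obj}\nendobj\n`;
  });

  const xrefOffset = Buffer.byteLength(body, 'latin1');
  body += `xref\n0 ${objects.length + 1}\n`;
  body += '0000000000 65535 f \n'; // each xref entry is exactly 20 bytes
  for (const offset of offsets) {
    body += `${String(offset).padStart(10, '0')} 00000 n \n`;
  }
  body += `trailer\n<< /Size ${objects.length + 1} /Root 1 0 R >>\n`;
  body += `startxref\n${xrefOffset}\n%%EOF\n`;
  return Buffer.from(body, 'latin1');
}

// Example:
// require('node:fs').writeFileSync('test-simple.pdf', buildMinimalPdf('Simple test content'));
```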
6. Debug Script Usage
Run the debug deployment script:
cd deploy-gcp
# Modify debug script for pdf-to-md
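# -i.bak works with both GNU sed and BSD/macOS sed, and leaves debug-deployment.sh.bak as a backup
sed -i.bak 's/docx-to-html/pdf-to-md/g' debug-deployment.sh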
./debug-deployment.sh
# Check generated files
cat pdf-function-details.json
cat pdf-run-details.json
cat deployment-logs.json
7. Production Test
Use the official test script:
cd deploy-gcp
./07-test-public-functions.sh
# Look specifically for pdf-to-md test results
8. Enhanced Logging
If needed, add debug logging to key points:
In src/pdfConverterToMD.js:
- Before file validation
- After file buffer creation
- Before calling convertPdfToMarkdown
In lib/converters/pdf.js:
- Before pdf-parse call
- After successful parsing
- In catch blocks
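A small helper keeps these log points consistent. Cloud Logging treats a single-line JSON object written to stdout as a structured entry and reads its `severity` and `message` fields. The helper names, the `step` field, and the example call sites are illustrative assumptions, not existing code.

```javascript
// Format one structured log line for a named step in the request flow.
function formatDebugLog(step, details = {}) {
  return JSON.stringify({
    severity: 'DEBUG',
    message: `pdf-to-md: ${step}`,
    step,
    ...details,
  });
}

// Write the structured line to stdout, where the platform picks it up.
function logStep(step, details) {
  console.log(formatDebugLog(step, details));
}

// Example call sites (variable names are hypothetical):
// logStep('before-validation', { mimeType, filename });
// logStep('buffer-created', { bytes: fileBuffer.length });
// logStep('parse-failed', { error: err.message });
```

Logging the received MIME type and buffer size at each point is usually enough to tell which of the checks in step 4 rejected the request.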
9. Cloud Console Monitoring
Check in GCP Console:
- Cloud Functions → pdf-to-md → Metrics
- Look for:
- Error rate spikes
- Memory usage patterns
- Cold start frequency
- Request/response sizes
10. Common Solutions
Based on typical 500 errors:
Dependency Issues
npm list pdf-parse
npm install pdf-parse@latest
Memory Increase
gcloud functions deploy pdf-to-md \
  --memory=1GB \
  --region=us-east4
Timeout Increase
gcloud functions deploy pdf-to-md \
  --timeout=120s \
  --region=us-east4
Error Log Analysis
Error Details
Error: Invalid file type. Only PDF files are allowed.
at Multipart.<anonymous> (/workspace/src/pdfConverterToMD.js:24:26)
The error occurs because the MIME type validation was too strict: it accepted only MIME types containing application/pdf.
Root Cause
The MIME type check in pdfConverterToMD.js line 23 was using:
if (!mimeType.includes('application/pdf'))
This fails when:
- Browsers send different MIME types (e.g., application/x-pdf)
- MIME type is missing or empty
- Different tools use variant PDF MIME types
Solution Applied
Updated the MIME type validation to accept multiple PDF MIME types and to fall back to the filename extension:
const validPdfTypes = ['application/pdf', 'application/x-pdf', 'application/acrobat',
'applications/vnd.pdf', 'text/pdf', 'text/x-pdf'];
const isPdf = validPdfTypes.some(type => mimeType && mimeType.includes(type)) ||
(filename && filename.toLowerCase().endsWith('.pdf'));
The error message was also improved to show the actual MIME type received, which helps with debugging.
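Put together as a standalone function, the check and the more informative error message might look like the sketch below. The list of types is copied from the fix above. The function name, the return shape, the lowercasing of the MIME type, and the exact wording of the message are assumptions made for illustration.

```javascript
const VALID_PDF_TYPES = ['application/pdf', 'application/x-pdf', 'application/acrobat',
  'applications/vnd.pdf', 'text/pdf', 'text/x-pdf'];

// Accept the upload if either the MIME type or the filename extension looks like a PDF.
function validatePdfUpload(mimeType, filename) {
  const type = (mimeType || '').toLowerCase(); // tolerate a missing or mixed-case type
  const typeOk = VALID_PDF_TYPES.some((valid) => type.includes(valid));
  const nameOk = Boolean(filename) && filename.toLowerCase().endsWith('.pdf');
  if (typeOk || nameOk) return { ok: true };
  return {
    ok: false,
    // Include what was actually received so the log shows why the request failed.
    error: `Invalid file type. Only PDF files are allowed (received MIME type: ${mimeType || 'none'}).`,
  };
}

// Example:
// validatePdfUpload('application/octet-stream', 'report.pdf'); // accepted via the extension
// validatePdfUpload('image/png', 'photo.png');                 // rejected, message names image/png
```

Note that the extension fallback accepts any file named `.pdf` regardless of its contents, so the parser still has to cope with invalid input.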
Test Results
Before Fix
- Error: 500 Internal Server Error
- Failure rate: 100%
- Error message: "Invalid file type. Only PDF files are allowed."
After Fix (Local Testing)
- Status: 200 OK
- Success rate: 100%
- Successfully converts PDF to Markdown
After Fix (Production)
- Cloud Run URL: Works perfectly (https://pdf-to-md-qpg64cvnga-uk.a.run.app)
- Public Domain: Still returns 500 error
- Root Cause: Domain mapping issue - the domain is mapped to the xlsx-converter service only
Resolution
The fix is working correctly at the Cloud Run level. The 500 error on the public domain is because:
- The domain convert-to-markdown.knowcode.tech is mapped only to the xlsx-converter service
- Other functions (like pdf-to-md) are not accessible through the custom domain
- They work via their individual Cloud Run URLs but not through the shared domain
Verified Working URLs
- Direct Cloud Run: https://pdf-to-md-qpg64cvnga-uk.a.run.app
- Cloud Functions URL: https://us-east4-convert-to-markdown-us-east4.cloudfunctions.net/pdf-to-md
- Custom Domain: https://convert-to-markdown.knowcode.tech/pdf-to-md (not working: domain mapping issue)
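Comparing the three URLs side by side makes the routing problem easy to re-check after any change to the domain mapping. The sketch below assumes Node 18 or later and posts the same file to each URL. `probeEndpoints` is a hypothetical helper, and the `fetchImpl` parameter exists only so the function can be exercised without network access.

```javascript
// POST the same PDF to each URL and collect the HTTP status codes.
async function probeEndpoints(urls, pdfBuffer, fetchImpl = fetch) {
  const results = [];
  for (const url of urls) {
    // Build a fresh form per request, using the same field name as the curl test.
    const form = new FormData();
    form.append('file', new Blob([pdfBuffer], { type: 'application/pdf' }), 'test.pdf');
    try {
      const response = await fetchImpl(url, { method: 'POST', body: form });
      results.push({ url, status: response.status });
    } catch (err) {
      // Network-level failures (DNS, TLS, timeouts) are reported rather than thrown.
      results.push({ url, status: null, error: err.message });
    }
  }
  return results;
}

// Example:
// const pdf = require('node:fs').readFileSync('test.pdf');
// probeEndpoints([
//   'https://pdf-to-md-qpg64cvnga-uk.a.run.app',
//   'https://us-east4-convert-to-markdown-us-east4.cloudfunctions.net/pdf-to-md',
//   'https://convert-to-markdown.knowcode.tech/pdf-to-md',
// ], pdf).then(console.table);
```

If the first two report success while the custom domain reports 500, the function itself is healthy and the fault lies in routing.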
Prevention
- Add better error handling
- Implement request validation
- Add health check endpoint
- Set up alerts for error rates > 1%
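For the first two items, one option is to have validation failures return a 4xx status instead of surfacing as a 500, since the rejected upload in this incident was reported to clients as a server error. The sketch below is one possible shape, not the project's implementation. `ValidationError`, `toHttpResponse`, and the choice of status codes are assumptions.

```javascript
// Error type for problems caused by the request rather than by the service.
class ValidationError extends Error {
  constructor(message, status = 400) {
    super(message);
    this.name = 'ValidationError';
    this.status = status;
  }
}

// Map any thrown error to an HTTP status and JSON body.
function toHttpResponse(err) {
  if (err instanceof ValidationError) {
    return { status: err.status, body: { error: err.message } };
  }
  // Unexpected failures remain 500, without leaking internals to the client.
  return { status: 500, body: { error: 'Internal Server Error' } };
}

// Example inside an Express-style handler (hypothetical handleUpload function):
// try {
//   await handleUpload(req, res);
// } catch (err) {
//   const { status, body } = toHttpResponse(err);
//   res.status(status).json(body);
// }
```

With this split, an error-rate alert on 5xx responses fires only for genuine server faults, and bad uploads show up separately as 4xx.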
Related Issues
- Similar issues with other converters: [None identified yet]
- Dependencies requiring updates: [To be determined]
Next Steps: Execute steps 1-3 to gather initial error information.