ReadPDFFile V2 gives error when reading PDF file


L2 Linker

Hi everyone,

 

I was trying to make a playbook to extract indicators (Hash values, domains, IP addresses) from a PDF file. I tried to use the ReadPDFFile V2 utility, however it gives the below error on 2 of the PDF files I tried.

 

Command:
!ReadPDFFileV2 entryID="29@14" maxImages="20" auto-extract="inline"
Reason:
Could not load pdf file in EntryID 29@14 Error: 'http://www.w3.org/1999/xhtml'
 
Any idea how I can resolve this?
I thought it was the PDF version at first (the original file is 1.5), so I tried converting it to v1.8, and it still failed.
 
Thanks in advance.
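
For readers with the same goal, here is a minimal sketch of pulling hashes, IP addresses, domains and CVE IDs out of text that has already been extracted from a PDF. It is not how ReadPDFFileV2 or XSOAR's auto-extract works internally. The function name `extract_indicators` and the regular expressions are illustrative assumptions, and the domain pattern will also match things like file names (for example `report.pdf`), so results need review.

```python
import ipaddress
import re

# Illustrative patterns only; real indicator extraction needs more care.
HASH_RE = re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}\b", re.IGNORECASE
)
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_indicators(text):
    """Return candidate indicators found in plain text (hypothetical helper)."""
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            # Reject dotted strings that are not valid IPv4 addresses.
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        ips.add(candidate)
    return {
        "hashes": sorted({h.lower() for h in HASH_RE.findall(text)}),
        "ips": sorted(ips),
        "domains": sorted({d.lower() for d in DOMAIN_RE.findall(text)}),
        "cves": sorted({c.upper() for c in CVE_RE.findall(text)}),
    }


sample = (
    "Seen at 192.0.2.10 and 999.1.1.1, domain Evil.Example.com, "
    "hash D41D8CD98F00B204E9800998ECF8427E, CVE-2017-0199."
)
found = extract_indicators(sample)
print(found)
```

The hash pattern only distinguishes by length (32, 40 or 64 hex characters), so it cannot tell an MD5 from any other 32-character hex string.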

7 REPLIES

L2 Linker

Good morning, is 'http://www.w3.org/1999/xhtml' an indicator inside the PDF?

Hi @jwilkes 

 

It is not present in the file. There are 2 other domains not related to this and a couple of hashes and CVE.

 

 

L2 Linker

Do other PDFs with the same PDF version work with ReadPDFFileV2?

Hi @jwilkes 

 

I tried another file as a test: the PDF available on the site below. That also failed with the same error.

https://www.ncsc.gov.uk/news/indicators-of-compromise-for-malware-used-by-apt28

 

Since I am experimenting, I am using the playbook "Phishing - Generic v3". It includes ReadPDFFileV2, and the playbook stops at that task with the above error.

 

L2 Linker

@pottapitot ,

I tried that same file on my XSOAR instance (6.2) and it failed as well. Can you please create a support ticket to investigate this further? I know that ReadPDFFileV2 uses the Linux utility "pdftohtml", and maybe there are some limitations.

I know we have found limitations before but they were with PDF encryption: https://xsoar.ideas.aha.io/ideas/FR-I-1397
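
One way to check whether the limitation is in pdftohtml itself rather than in the automation is to run the utility directly on the same file. The sketch below assumes the poppler-utils `pdftohtml` binary is on the PATH. The helper names are hypothetical, and the flags shown (`-i`, `-noframes`) are not necessarily the ones ReadPDFFileV2 passes.

```python
import shutil
import subprocess


def build_pdftohtml_cmd(pdf_path, out_prefix):
    """Build a pdftohtml command line (hypothetical helper)."""
    # -i skips image extraction; -noframes writes a single HTML document.
    return ["pdftohtml", "-i", "-noframes", pdf_path, out_prefix]


def try_convert(pdf_path, out_prefix):
    """Run pdftohtml and return (return code, stderr), or (None, reason)."""
    if shutil.which("pdftohtml") is None:
        return None, "pdftohtml not found on PATH"
    result = subprocess.run(
        build_pdftohtml_cmd(pdf_path, out_prefix),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr.strip()


cmd = build_pdftohtml_cmd("sample.pdf", "sample_out")
print(" ".join(cmd))
```

If the direct conversion succeeds, the problem is more likely in how the script parses the converted output than in the PDF or the utility.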

@jwilkes 

 

Wow! I am impressed by the speed of the resolution. I was looking through the automation script after my previous post when I noticed there was an update for the ReadPDFFileV2 script. I updated the automation and tried it again. Now it works perfectly!

I am not sure how you did it, but once again, thanks a lot! 😃

L2 Linker

jwilkes_0-1644418739377.png

Yes, this looks related.  Glad to help!

 
