hwp5proc: HWPv5 processor
Do various operations on HWPv5 files.
usage: hwp5proc [-h] [--loglevel LOGLEVEL] [--logfile LOGFILE]
{version,header,summaryinfo,ls,cat,unpack,records,models,find,xml,rawunz,diststream}
...
Named Arguments
- --loglevel
Set log level.
- --logfile
Set log file.
Subcommands
version
Print the file format version of <hwp5file>.
usage: hwp5proc version [-h] <hwp5file>
Positional Arguments
- <hwp5file>
.hwp file to analyze
header
Print the file header of <hwp5file>.
usage: hwp5proc header [-h] <hwp5file>
Positional Arguments
- <hwp5file>
.hwp file to analyze
summaryinfo
Print the summary information of <hwp5file>.
usage: hwp5proc summaryinfo [-h] <hwp5file>
Positional Arguments
- <hwp5file>
.hwp file to analyze
ls
List the streams in <hwp5file>.
usage: hwp5proc ls [-h] [--vstreams | --ole] <hwp5file>
Positional Arguments
- <hwp5file>
.hwp file to analyze
Named Arguments
- --vstreams
Process with virtual streams (i.e. parsed/converted form of real streams)
Default: False
- --ole
Treat <hwp5file> as an OLE Compound File. As a result, some streams are presented as-is (i.e. not decompressed).
Default: False
cat
Extract the specified internal stream of <hwp5file> to standard output.
usage: hwp5proc cat [-h] [--vstreams | --ole] <hwp5file> <stream>
Positional Arguments
- <hwp5file>
.hwp file to analyze
- <stream>
Internal path of a stream to extract
Named Arguments
- --vstreams
Process with virtual streams (i.e. parsed/converted form of real streams)
Default: False
- --ole
Treat <hwp5file> as an OLE Compound File. As a result, some streams are presented as-is (i.e. not decompressed).
Default: False
Example:
$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg | file -
$ hwp5proc cat samples/sample-5017.hwp BinData/BIN0002.jpg > BIN0002.jpg
$ hwp5proc cat samples/sample-5017.hwp PrvText | iconv -f utf-16le -t utf-8
$ hwp5proc cat --vstreams samples/sample-5017.hwp PrvText.utf8
$ hwp5proc cat --vstreams samples/sample-5017.hwp FileHeader.txt
ccl: 0
cert_drm: 0
cert_encrypted: 0
cert_signature_extra: 0
cert_signed: 0
compressed: 1
distributable: 0
drm: 0
history: 0
password: 0
script: 0
signature: HWP Document File
version: 5.0.1.7
xmltemplate_storage: 0
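The FileHeader.txt virtual stream above is a parsed form of the binary FileHeader stream. The following minimal Python sketch shows how such fields could be decoded from the raw bytes. It assumes the layout of the published HWP 5.0 specification: a 32-byte NUL-padded signature, a little-endian version DWORD laid out as 0xMMnnPPrr, and a properties DWORD whose bits 0 to 11 are the flags listed above in the order given in FLAG_BITS. `parse_file_header` and `FLAG_BITS` are hypothetical names, not part of pyhwp.

```python
import struct

# Assumed bit positions in the FileHeader properties DWORD (HWP 5.0 spec);
# the names mirror the fields printed in FileHeader.txt.
FLAG_BITS = {
    "compressed": 0,
    "password": 1,
    "distributable": 2,
    "script": 3,
    "drm": 4,
    "xmltemplate_storage": 5,
    "history": 6,
    "cert_signed": 7,
    "cert_encrypted": 8,
    "cert_signature_extra": 9,
    "cert_drm": 10,
    "ccl": 11,
}


def parse_file_header(data: bytes) -> dict:
    """Decode signature, version and flags from raw FileHeader bytes."""
    signature, version, flags = struct.unpack_from("<32sII", data)
    # Version DWORD is 0xMMnnPPrr: major.minor.patch.revision.
    parts = [(version >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    header = {
        "signature": signature.rstrip(b"\0").decode("ascii"),
        "version": ".".join(str(p) for p in parts),
    }
    for name, bit in FLAG_BITS.items():
        header[name] = (flags >> bit) & 1
    return header


if __name__ == "__main__":
    # Synthetic header: version 5.0.1.7 with only the "compressed" flag set,
    # padded to the 256-byte size of the FileHeader stream.
    sample = struct.pack("<32sII", b"HWP Document File", 0x05000107, 0b1)
    sample += bytes(256 - len(sample))
    for key, value in sorted(parse_file_header(sample).items()):
        print(f"{key}: {value}")
```

Printing the keys in sorted order reproduces the shape of the FileHeader.txt listing, which makes it easy to compare the sketch against real `cat --vstreams` output.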
unpack
Extract the internal streams of <hwp5file> into a directory.
usage: hwp5proc unpack [-h] [--vstreams | --ole] <hwp5file> [<out-directory>]
Positional Arguments
- <hwp5file>
.hwp file to analyze
- <out-directory>
Output directory
Named Arguments
- --vstreams
Process with virtual streams (i.e. parsed/converted form of real streams)
Default: False
- --ole
Treat <hwp5file> as an OLE Compound File. As a result, some streams are presented as-is (i.e. not decompressed).
Default: False
Example:
$ hwp5proc unpack samples/sample-5017.hwp
$ ls sample-5017
Example:
$ hwp5proc unpack --vstreams samples/sample-5017.hwp
$ cat sample-5017/PrvText.utf8
records
Print the record structure of the specified record stream.
usage: hwp5proc records [-h]
[--simple | --json | --raw | --raw-header | --raw-payload]
[--range <range> | --treegroup <treegroup>]
[<hwp5file>] [<record-stream>]
Positional Arguments
- <hwp5file>
.hwp file to analyze
- <record-stream>
A record-structured internal stream (e.g. DocInfo, BodyText/*).
Named Arguments
- --simple
Print records as simple tree
Default: False
- --json
Print records as json
Default: False
- --raw
Print records as is
Default: False
- --raw-header
Print record headers as is
Default: False
- --raw-payload
Print record payloads as is
Default: False
- --range
Specifies the range of records. N-M means from record N to record M-1 (M is excluded); N alone means just record N.
- --treegroup
Specifies the N-th subtree of the record structure.
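The --range semantics above are the same as Python's half-open `range(N, M)`. A minimal sketch, where `parse_range` is a hypothetical helper illustrating the documented rule rather than hwp5proc's own parser:

```python
def parse_range(spec: str) -> range:
    """Map a --range value to the record indices it selects.

    "N-M" selects records N .. M-1 (M excluded); "N" selects record N only.
    """
    if "-" in spec:
        start, stop = spec.split("-", 1)
        return range(int(start), int(stop))
    n = int(spec)
    return range(n, n + 1)


records = ["rec0", "rec1", "rec2", "rec3"]  # stand-ins for parsed records
print([records[i] for i in parse_range("0-2")])  # ['rec0', 'rec1']
print([records[i] for i in parse_range("3")])    # ['rec3']
```

So `--range=0-2` in the examples that follow selects the first two records, 0 and 1.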
Example:
$ hwp5proc records samples/sample-5017.hwp DocInfo
Example:
$ hwp5proc records samples/sample-5017.hwp DocInfo --range=0-2
If neither <hwp5file> nor <record-stream> is specified, the record stream is read from the standard input.
Example:
$ hwp5proc records --raw samples/sample-5017.hwp DocInfo --range=0-2 > tmp.rec
$ hwp5proc records < tmp.rec
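Record-structured streams such as DocInfo and BodyText/* are sequences of records, each preceded by a 32-bit header. The following minimal Python sketch splits an already-decompressed record stream into records. It assumes the header layout of the HWP 5.0 specification (tag ID in bits 0-9, level in bits 10-19, payload size in bits 20-31, and an extra 32-bit size field when the size bits are all ones, 0xFFF). It is a standalone illustration, not pyhwp's parser; `Record`, `iter_records` and `make_header` are hypothetical names.

```python
import struct
from typing import Iterator, NamedTuple


class Record(NamedTuple):
    tagid: int
    level: int
    payload: bytes


def iter_records(data: bytes) -> Iterator[Record]:
    """Split a decompressed record stream into (tagid, level, payload)."""
    offset = 0
    while offset < len(data):
        (header,) = struct.unpack_from("<I", data, offset)
        offset += 4
        tagid = header & 0x3FF          # bits 0-9
        level = (header >> 10) & 0x3FF  # bits 10-19
        size = (header >> 20) & 0xFFF   # bits 20-31
        if size == 0xFFF:
            # Large payloads store their real size in an extra DWORD.
            (size,) = struct.unpack_from("<I", data, offset)
            offset += 4
        payload = data[offset:offset + size]
        if len(payload) != size:
            raise ValueError("truncated record payload")
        offset += size
        yield Record(tagid, level, payload)


def make_header(tagid: int, level: int, size: int) -> bytes:
    """Build a short-form record header (size must be below 0xFFF)."""
    return struct.pack("<I", tagid | (level << 10) | (size << 20))


if __name__ == "__main__":
    stream = (make_header(66, 0, 2) + b"\x01\x02"
              + make_header(67, 1, 4) + "hi".encode("utf-16le"))
    for rec in iter_records(stream):
        print(rec.tagid, rec.level, len(rec.payload))
```

The level field is what gives the record stream its tree shape: a record with level N+1 is a child of the nearest preceding record with level N.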
models
Print the parsed binary models in the specified <record-stream>.
usage: hwp5proc models [-h] [--file-format-version <version>]
[--simple | --json | --format <format> | --events]
[--treegroup <treegroup> | --seqno <seqno>]
[<hwp5file>] [<record-stream>]
Positional Arguments
- <hwp5file>
.hwp file to analyze
- <record-stream>
A record-structured internal stream (e.g. DocInfo, BodyText/*).
Named Arguments
- --file-format-version, -V
Specifies HWPv5 file format version of the standard input stream
- --simple
Print records as simple tree
Default: False
- --json
Print records as json
Default: False
- --format
Print each record using the given format string
- --events
Print records as events
Default: False
- --treegroup
Specifies the N-th subtree of the record structure.
- --seqno
Print the model of the <seqno>-th record
Example:
$ hwp5proc models samples/sample-5017.hwp DocInfo
$ hwp5proc models samples/sample-5017.hwp BodyText/Section0
$ hwp5proc models samples/sample-5017.hwp docinfo
$ hwp5proc models samples/sample-5017.hwp bodytext/0
Example:
$ hwp5proc models --simple samples/sample-5017.hwp bodytext/0
$ hwp5proc models --format='%(level)s %(tagname)s\n' \
samples/sample-5017.hwp bodytext/0
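The `%(level)s %(tagname)s` placeholders in the --format example follow Python's %-style formatting with named mapping keys. The following sketch assumes that each record exposes fields such as `level` and `tagname` by name and that the `\n` escape ends up as a real newline; `render` and the sample records are hypothetical, and real hwp5proc records carry more fields.

```python
from typing import Iterable, Mapping


def render(fmt: str, records: Iterable[Mapping[str, object]]) -> str:
    """Apply a %-style format with named placeholders to every record."""
    return "".join(fmt % record for record in records)


# Made-up records standing in for parsed models.
records = [
    {"level": 0, "tagname": "HWPTAG_PARA_HEADER"},
    {"level": 1, "tagname": "HWPTAG_PARA_TEXT"},
]
print(render("%(level)s %(tagname)s\n", records), end="")
```

Because lookups are by key, any field name a record lacks raises `KeyError`, so the placeholders must match the record's field names exactly.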
Example:
$ hwp5proc models --simple --treegroup=1 samples/sample-5017.hwp bodytext/0
$ hwp5proc models --simple --seqno=4 samples/sample-5017.hwp bodytext/0
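One way to picture --treegroup is to split the record sequence into groups, each starting at a top-level (level 0) record and running until the next one. The sketch below assumes exactly that grouping rule and 0-based group numbers; neither is confirmed here for hwp5proc, and `tree_groups` is a hypothetical helper.

```python
from typing import List, Sequence


def tree_groups(levels: Sequence[int]) -> List[List[int]]:
    """Group record indices into top-level trees.

    Each group holds a level-0 record and every following record until the
    next level-0 record (assumed rule; groups numbered from 0).
    """
    groups: List[List[int]] = []
    for index, level in enumerate(levels):
        if level == 0 or not groups:
            groups.append([])
        groups[-1].append(index)
    return groups


# Levels of a made-up BodyText stream: two paragraphs with child records.
groups = tree_groups([0, 1, 1, 0, 1, 2, 1])
print(groups)     # [[0, 1, 2], [3, 4, 5, 6]]
print(groups[1])  # [3, 4, 5, 6]
```

By contrast, --seqno picks a single record by its position in the stream, regardless of which tree it belongs to.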
If neither <hwp5file> nor <record-stream> is specified, the record stream is read from the standard input, assuming the input is in the file format version specified by the -V option.
Example:
$ hwp5proc cat samples/sample-5017.hwp BodyText/Section0 > Section0.bin
$ hwp5proc models -V 5.0.1.7 < Section0.bin
find
Find record models matching the specified predicates.
usage: hwp5proc find [-h] [--from-stdin]
[--model <model-name> | --tag <hwptag>] [--incomplete]
[--format <format>] [--dump]
[<hwp5files> [<hwp5files> ...]]
Positional Arguments
- <hwp5files>
.hwp files to analyze
Named Arguments
- --from-stdin
Read filenames from standard input.
Default: False
- --model
Filter by record model name.
- --tag
Filter by record tag (an HWPTAG_* name or a numeric tag ID).
- --incomplete
Filter to records whose content was parsed incompletely.
Default: False
- --format
Record output format.
- --dump
Dump the matching records.
Default: False
Example: Find paragraphs:
$ hwp5proc find --model=Paragraph samples/*.hwp
$ hwp5proc find --tag=HWPTAG_PARA_HEADER samples/*.hwp
$ hwp5proc find --tag=66 samples/*.hwp
Example: Find and dump HWPTAG_LIST_HEADER records that were parsed incompletely:
$ hwp5proc find --tag=HWPTAG_LIST_HEADER --incomplete --dump samples/*.hwp
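The examples above pass --tag either as a name or as a number. In the HWP 5.0 specification, body-text tag IDs are offsets from HWPTAG_BEGIN (0x010), so HWPTAG_PARA_HEADER is 16 + 50 = 66 and HWPTAG_PARA_TEXT is 67. The following minimal sketch resolves such an argument; the table lists only a few tags, and `TAG_IDS` and `resolve_tag` are hypothetical names rather than hwp5proc internals.

```python
HWPTAG_BEGIN = 0x010

# A few body-text tags as offsets from HWPTAG_BEGIN (HWP 5.0 specification).
TAG_IDS = {
    "HWPTAG_PARA_HEADER": HWPTAG_BEGIN + 50,
    "HWPTAG_PARA_TEXT": HWPTAG_BEGIN + 51,
    "HWPTAG_PARA_CHAR_SHAPE": HWPTAG_BEGIN + 52,
    "HWPTAG_PARA_LINE_SEG": HWPTAG_BEGIN + 53,
    "HWPTAG_CTRL_HEADER": HWPTAG_BEGIN + 55,
    "HWPTAG_LIST_HEADER": HWPTAG_BEGIN + 56,
}


def resolve_tag(value: str) -> int:
    """Accept either an HWPTAG_* name or a decimal tag ID."""
    if value.isdigit():
        return int(value)
    try:
        return TAG_IDS[value]
    except KeyError:
        raise ValueError(f"unknown tag name: {value}") from None


print(resolve_tag("HWPTAG_PARA_HEADER"), resolve_tag("66"), resolve_tag("HWPTAG_PARA_TEXT"))
# 66 66 67
```

Under this numbering, `--tag=66` selects paragraph header records, which is the record type behind the Paragraph model.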
xml
Transform <hwp5file> into XML.
usage: hwp5proc xml [-h] [--embedbin] [--no-xml-decl] [--output <file>]
[--format <format>] [--no-validate-wellformed]
<hwp5file>
Positional Arguments
- <hwp5file>
.hwp file to analyze
Named Arguments
- --embedbin
Embed BinData/* streams in the output XML.
Default: False
- --no-xml-decl
Do not output the <?xml … ?> XML declaration.
Default: False
- --output
Output filename.
- --format
Output format: “flat” or “nested” (default: “nested”).
- --no-validate-wellformed
Do not validate well-formedness of output.
Default: False
Example:
$ hwp5proc xml samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml
With the --embedbin option, you can embed base64-encoded BinData/* files in the output XML.
Example:
$ hwp5proc xml --embedbin samples/sample-5017.hwp > sample-5017.xml
$ xmllint --format sample-5017.xml
rawunz
Inflate (decompress) a headerless zlib-compressed stream, i.e. raw deflate data.
usage: hwp5proc rawunz [-h]
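rawunz deals with deflate data that lacks the 2-byte zlib header. In Python's standard `zlib` module such raw deflate data is handled by passing a negative `wbits`. The minimal sketch below shows the idea with a round trip; `raw_inflate` and `raw_deflate` are illustrative names, not part of hwp5proc.

```python
import zlib


def raw_inflate(data: bytes) -> bytes:
    """Inflate raw deflate data (no zlib header, no Adler-32 trailer)."""
    # wbits=-15: the negative sign means "raw deflate", 15 is the window size.
    inflater = zlib.decompressobj(wbits=-15)
    return inflater.decompress(data) + inflater.flush()


def raw_deflate(data: bytes) -> bytes:
    """Produce headerless deflate data, used here to test raw_inflate()."""
    deflater = zlib.compressobj(wbits=-15)
    return deflater.compress(data) + deflater.flush()


if __name__ == "__main__":
    original = "PrvText-like sample".encode("utf-16le")
    packed = raw_deflate(original)
    assert raw_inflate(packed) == original
    print(len(original), "bytes restored from", len(packed), "compressed bytes")
    # As a stdin-to-stdout filter, one could instead write:
    #   sys.stdout.buffer.write(raw_inflate(sys.stdin.buffer.read()))
```

Plain `zlib.decompress(data)` uses the default positive `wbits` and expects a zlib header, which is why raw streams need the negative setting.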
diststream
Decode a distribution document stream.
usage: hwp5proc diststream [-h] [--sha1 | --key] [--raw]
Named Arguments
- --sha1
Print the SHA-1 value used for decryption.
Default: False
- --key
Print the decrypted key.
Default: False
- --raw
Print raw binary objects as is.
Default: False