Friday, April 29, 2011

Using VBA to parse text in an MS Word document

Hi, I was hoping someone could help with an MS Word macro.

Basically, I have an MS Word document that lists several text files and the specific pages of interest in each file.

The file format is similar to:

textdocument1.txt              P. 6, 12 - issue1
textdocument2.txt              P. 5 - issue1
                               P. 13, 17 - issue3
textdocument3.txt              P. 10

I want to read each line into my Macro as a string.

Then traverse through it to identify the file name. With the file name, I can then open the file, go to the page number, and copy the data I need.

But I'm stuck at step 1: how do I capture each line as a string in an MS Word macro?

Any help will be appreciated.

From stackoverflow
  • If your Word document lists all the text files like this:

    <name>{tab}<page ref>{newline}
    <name>{tab}<page ref>{newline}
    <name>{tab}<page ref>{newline}
    

    Then all the lines are available in the Paragraphs collection. You can loop through that with a simple For Each loop:

    Dim p As Paragraph
    
    For Each p In ActiveDocument.Paragraphs
      Debug.Print p.Range.Text
    Next p
    
  • The following code should get you started:

    Public Sub ParseLines()
        Dim singleLine As Paragraph
        Dim lineText As String
    
        For Each singleLine In ActiveDocument.Paragraphs
            lineText = singleLine.Range.Text
    
            '// parse the text here...
    
        Next singleLine
    End Sub
    

    I found the basic algorithm in this article.

    Anonymous Type: This will break the document up into paragraphs. If you want to process it per line (i.e. per sentence) instead, check my answer below.
  • Per line (i.e. per sentence):

    Public Sub ParseDoc()
    
        Dim doc As Document
        Dim paras As Paragraphs
        Dim para As Paragraph
        Dim sents As Sentences
        Dim sent As Range
    
        Set doc = ActiveDocument
        Set paras = doc.Paragraphs
    
        ' Walk every paragraph, then every sentence within it.
        For Each para In paras
    
            Set sents = para.Range.Sentences
            For Each sent In sents
                ' Each sentence is a Range; print its text.
                Debug.Print sent.Text
            Next sent
    
        Next para
    
    End Sub
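
To get from a captured line to the file name the question asks about, something like the following might work. It is only a sketch under a few assumptions: the document contains nothing but the listing, file names contain no spaces, the name and page reference are separated by a tab or by spaces, and a line that starts with whitespace belongs to the previous file. CleanLine, SplitLine and ListPageRefs are made-up helper names, not part of the Word object model. Note that Range.Text includes the trailing paragraph mark, so it is stripped first.

```vba
Option Explicit

' Hypothetical helper: removes the trailing paragraph mark (and any other
' trailing line-break characters) that Range.Text returns with each paragraph.
Public Function CleanLine(ByVal rawText As String) As String
    Dim s As String
    s = rawText
    Do While Len(s) > 0
        Select Case Right$(s, 1)
            Case vbCr, vbLf, Chr$(11)
                s = Left$(s, Len(s) - 1)
            Case Else
                Exit Do
        End Select
    Loop
    CleanLine = s
End Function

' Hypothetical helper: splits one cleaned line into a file name and a page
' reference. Assumes file names contain no spaces. For a continuation line
' (one that starts with whitespace) fileName comes back empty.
Public Sub SplitLine(ByVal lineText As String, _
                     ByRef fileName As String, ByRef pageRef As String)
    Dim s As String
    Dim pos As Long

    fileName = ""
    pageRef = ""

    ' Treat tabs like spaces so both separators are handled the same way.
    s = Replace(lineText, vbTab, " ")
    If Len(Trim$(s)) = 0 Then Exit Sub      ' blank line

    If Left$(s, 1) = " " Then
        ' Continuation line: page reference only.
        pageRef = Trim$(s)
    Else
        pos = InStr(s, " ")
        If pos = 0 Then
            fileName = s                    ' file name with no page reference
        Else
            fileName = Left$(s, pos - 1)
            pageRef = Trim$(Mid$(s, pos + 1))
        End If
    End If
End Sub

' Loops over the active document and prints each file name with its page
' reference, carrying the last seen file name over to continuation lines.
Public Sub ListPageRefs()
    Dim para As Paragraph
    Dim fileName As String
    Dim pageRef As String
    Dim currentFile As String

    For Each para In ActiveDocument.Paragraphs
        SplitLine CleanLine(para.Range.Text), fileName, pageRef
        If Len(fileName) > 0 Then currentFile = fileName
        If Len(currentFile) > 0 And Len(pageRef) > 0 Then
            Debug.Print currentFile & " -> " & pageRef
        End If
    Next para
End Sub
```

Run ListPageRefs from the VBA editor and check the Immediate window. From there, the Debug.Print line is the place to open the file and go to the pages listed in pageRef.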
    
