Thursday, April 28, 2011

Can you grep a file using a regular expression and only output the matching part of a line?

I have a log file which contains a number of error lines, such as:

Failed to add email@test.com to database

I can filter these lines with a single grep call:

grep -E 'Failed to add (.*) to database'

This works fine, but what I'd really like to do is have grep (or another Unix command I pass the output into) only output the email address part of the matched line.

Is this possible?

From stackoverflow
  • You can use sed:

    grep -E 'Failed to add (.*) to database' | sed 's/Failed to add \(.*\) to database/\1/'
    
    bortzmeyer : Using the -o option of grep is simpler...
  • or python:

    python -c "import re, sys; print('\n'.join(re.findall('add (.*?) to', sys.stdin.read())))" < file
    
  • sed is fine without grep:

    sed -n 's/Failed to add \(.*\) to database/\1/p' filename
    
    RandomNickName42 : Surely he could have used awk also !!
  • This should do the job:

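    grep -oP '(?<=Failed to add ).+?(?= to database)'
    

    It uses a positive look-behind assertion, followed by the match for the email address, followed by a positive look-ahead assertion. This ensures that the surrounding text must be present, but only the email address part is actually consumed (and thus returned).

    The -P option enables Perl-compatible regular expressions, which the look-around assertions require, and -o prints only the matched part of each line. Both are GNU grep options. The -x option is not suitable here because it requires the pattern to consume the whole line.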

  • If you want to use grep, it would be more appropriate to use egrep:

    About egrep
    
    Search a file for a pattern using full regular expressions.
    

    Plain grep does not always offer as complete regex functionality.

    bortzmeyer : He already uses egrep since he uses -E. Nothing to do with the problem, which is controlling output.
    RandomNickName42 : What are you talking about? If you see the "tag", he's asking about *UNIX* grep, which is not (as your answer suggests) GNU-Everywhere, refer to http://www.softpanorama.org/Tools/Grep/grep_reference.shtml for some review of the various grep versions on UNIX (NOT GNU GREP), what you will see in black and white, "Limited regex - grep", "Extended regex - egrep". So _REGARDLESS_ of the fact that GNU grep may be (is) better, it's not going to be something you can *always* count on, being deployed and being available for all your scripts. My entire point is that you cannot count on grep "basic"
    bortzmeyer : I fail to see the point. The OP said nothing about the OS he uses except "Unix". So it can be a Unix where GNU grep is the default (Debian, for instance) or a Unix where GNU grep could be installed immediately with one command (NetBSD with pkg_add textproc/grep).
  • Recent versions of GNU grep have a -o option which does exactly what you want (-o is short for --only-matching).

    bortzmeyer : Two downvotes without one comment. There are really stupid monkeys on SO.
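
The -o answer above comes without an example, so here is a minimal sketch of how it might be applied. Note that -o prints everything the pattern matched, so the original pattern alone would still print the whole "Failed to add ... to database" text. The sketch therefore filters first and extracts second. The function name, the sample data, and the assumption that addresses contain an @ and no spaces are illustrative and not from the thread:

```shell
# Hypothetical helper: print only the address from each failure line.
# Assumes a grep that supports -o (GNU and BSD grep do) and
# addresses of the form something@something with no spaces.
extract_failed_emails() {
    grep -E 'Failed to add .* to database' "$1" | grep -oE '[^ ]+@[^ ]+'
}

# Sample data, for illustration only
log=$(mktemp)
printf '%s\n' \
    'Failed to add email@test.com to database' \
    'Added someone@example.org to database' \
    'Failed to add other@example.org to database' > "$log"

extract_failed_emails "$log"
# prints:
#   email@test.com
#   other@example.org
rm -f "$log"
```

The first grep selects the failure lines, and the second prints only the part of each selected line that looks like an address.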

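One of the comments above mentions awk without showing a command. A minimal sketch of that route, assuming any POSIX awk; the function name is hypothetical:

```shell
# Hypothetical helper: keep only the text between the fixed prefix and suffix.
extract_with_awk() {
    awk '/Failed to add .* to database/ {
        sub(/.*Failed to add /, "")   # drop everything up to and including the prefix
        sub(/ to database.*/, "")     # drop the suffix and anything after it
        print
    }' "$1"
}

# Usage: extract_with_awk filename
```

Stripping the prefix and suffix with sub(), rather than printing a fixed field such as $4, keeps this working even if the captured text contains spaces.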