Monday 14 October 2019

Regular Expressions

Regular Expressions (RE)

Write a program to check the password. (Conditions: it should be a minimum of 8 characters and should contain a number, a special character and letters.)

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern.

re.compile():
We can compile a regular expression pattern into a pattern object, which can be used for pattern matching. It also lets us search with the same pattern again without rewriting it.
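As a minimal sketch of that reuse (the name digits and the sample strings are only illustrative), one compiled pattern object can be applied to several target strings:

```python
import re

digits = re.compile('[0-9]+')   # compile once

# The same pattern object is reused on different target strings
print(digits.search('room 42').group())        # 42
print(digits.findall('10 apples, 20 pears'))   # ['10', '20']
print(digits.search('no numbers here'))        # None
```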

# User-defined exceptions and regular expressions
import re
class Error(Exception):
    """Super class for the other exceptions"""
    pass
class NoSpecialChar(Error):
    """Raised when the password has no special character"""
    pass
class PasswrdSize(Error):
    """Raised when the password is too short"""
    pass
class AlphaNumeric(Error):
    """Raised when the password has no digit"""
    pass
class NoCapitalLetter(Error):
    """Raised when the password has no capital letter"""
    pass
class NoSmallLetter(Error):
    """Raised when the password has no small letter"""
    pass
n=8  # minimum password length
rex=re.compile('[@_!#$%^&*()<>?/|}{~:]')  # accepted special characters
while True:
    try:
        a=input("enter pwd: ")
        if len(a)<n:
            raise PasswrdSize
        elif rex.search(a) is None:
            raise NoSpecialChar
        elif not any(c.isdigit() for c in a):
            raise AlphaNumeric
        elif not any(c.isupper() for c in a):
            raise NoCapitalLetter
        elif not any(c.islower() for c in a):
            raise NoSmallLetter
        break  # every check passed, stop asking
    except PasswrdSize:
        print("Password length should be at least 8")
    except NoSpecialChar:
        print("pwd must have at least one special char")
    except AlphaNumeric:
        print("Pwd should be alphanumeric: at least one digit is required")
    except NoCapitalLetter:
        print("At least one capital letter must be there in pwd")
    except NoSmallLetter:
        print("At least one small letter should be there in pwd")
print("Password created successfully")

"""output:
case1:
enter pwd: rajendra@123
At least one capital letter must be there in pwd
case2:
enter pwd: rajendra123
pwd must have at least one special char
case3:
enter pwd: Rajendra@123
Password created successfully
case4:
enter pwd: rAjen$@123
Password created successfully
case5:
enter pwd: RAJEnDRA@123
Password created successfully
"""
       
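The same rules can also be written as one pattern. This is only a sketch using lookahead assertions, a different technique from the exception-based program; the names PASSWORD_RE and is_valid_password are hypothetical, and the special-character set is assumed to be the one used in the program.

```python
import re

# Hypothetical single-pattern version of the password rules
PASSWORD_RE = re.compile(
    r'(?=.*\d)'                        # at least one digit
    r'(?=.*[A-Z])'                     # at least one capital letter
    r'(?=.*[a-z])'                     # at least one small letter
    r'(?=.*[@_!#$%^&*()<>?/|}{~:])'    # at least one special character
    r'.{8,}'                           # at least 8 characters in total
)

def is_valid_password(pwd):
    """Return True when pwd satisfies every rule."""
    return PASSWORD_RE.fullmatch(pwd) is not None

for pwd in ['rajendra@123', 'rajendra123', 'Rajendra@123', 'rAjen$@123']:
    print(pwd, is_valid_password(pwd))
```

A single pattern is shorter, but it cannot say which rule failed, which is what the separate exceptions are for.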
Regular Expressions:
If you want to represent a group of words (strings) according to a particular pattern, then you should go for regular expressions.
They are used to perform:
1.   Validations
Ex: email validation, mobile number validation, validating passwords and OTPs
Mobile number validation:
xxxxxxxxxx (10 digits)
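A minimal sketch of such a mobile number check, assuming a valid number is exactly 10 digits and nothing else (MOBILE_RE and is_valid_mobile are illustrative names):

```python
import re

# Assumption: a valid mobile number is exactly 10 digits
MOBILE_RE = re.compile(r'[0-9]{10}')

def is_valid_mobile(number):
    """Return True when number consists of exactly 10 digits."""
    return MOBILE_RE.fullmatch(number) is not None

print(is_valid_mobile('9876543210'))   # True
print(is_valid_mobile('98765'))        # False: too short
print(is_valid_mobile('98765x3210'))   # False: contains a letter
```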

Most people perform validation with JavaScript, so why do we learn regular expressions in Python? Inside JavaScript, too, people use regular expressions. Hence regular expressions are a language-independent concept. We can use the same concept in Java as well.

Applications of Regular Expressions:
2.   To develop pattern matching applications
i.e. in MS Word we use the Find command, which searches for a particular pattern; in a Unix/Linux environment we use the grep and egrep commands
3.   Regular expressions play a key role when we develop translators such as assemblers, interpreters and compilers. The important phases when we design a compiler are:
·       Lexical analysis: scanning or tokenisation
·       Syntax analysis: i.e. parsing
·       Semantic analysis, intermediate code generation, code optimisation and target code generation
4.   To develop digital circuits: finite automata with output, i.e. the Moore machine and the Mealy machine, binary adder

5.   To develop communication protocols. Ex: TCP/IP,…
Python has a special predefined module for working with regular expressions (RE), i.e. the “re” module

re module:
This module has several predefined functions to apply regular expressions in our programs (applications).

1.   compile() function:
This function converts the desired search pattern into a regular expression (RegEx) object.

Ex:
import re #Step 1
pattern=re.compile('corona') #Step 2: pattern is just a variable name, any valid identifier can be used instead
print(type(pattern))
output:
<class 're.Pattern'> #on Python 3.7 and later; older versions print <class '_sre.SRE_Pattern'>

2.   finditer() function: returns an iterator that yields a match object for every match of the pattern in the target string, so we can also count the matches.
Ex:
import re
pattern=re.compile('corona')
matcher=pattern.finditer('Entire world is suffering from corona virus, due to corona virus 20k people died with corona positive...')
#for each match object given by the matcher, we can call other predefined methods:
(2.1) start() method: returns the start index of the match
(2.2) end() method: returns the end+1 index of the match (one past the last matched character)
(2.3) group() method: returns the matched string/word

Write a pattern matching application:
Write a Python program to find whether a particular pattern is available or not. If it is available, where is it available and how many times?
import re
ctr=0
pattern=re.compile("or")
matcher=pattern.finditer('corona 20k in . world')
for i in matcher:
    ctr+=1 #no of occurrences
    print('match is available at index:',i.start())
print('no of occurrences:',ctr)
"""output:
index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
char:   c  o  r  o  n  a     2  0  k     i  n     .     w  o  r  l  d

match is available at index: 1
match is available at index: 17
no of occurrences: 2
"""
Example 2:
import re
ctr=0
pattern=re.compile("or")
matcher=pattern.finditer('corona 20k in . world')
for i in matcher:
    ctr+=1 #no of occurrences
    print('start:{},end:{},group:{}'.format(i.start(),i.end(),i.group()))
print('no of occurrences:',ctr)
"""output:
start:1,end:3,group:or
start:17,end:19,group:or
no of occurrences: 2
"""
Character Classes:

Characters      Searches for
[abc]           a or b or c
[^abc]          Any character except a, b and c
[a-z]           Any lowercase letter
[A-Z]           Any uppercase letter
[a-zA-Z]        Any letter
[0-9]           Any digit
[a-zA-Z0-9]     Alphanumerics
[^a-zA-Z0-9]    Special characters

Example program:
import re
a=re.finditer('[abc]','a7b@k9z')
for i in a:
    print(i.start(),'....',i.group())
"""
index     char name
start()    group()
0     .... a
2     .... b

Char 'a' is available at index 0
Char 'b' is available at index 2
i.group() returns the matched string

"""
Example 2:
#search every character except a, b and c (digits, special characters, d..z) in 'a7b@k9z'
import re
a=re.finditer('[^abc]','a7b@k9z')
for i in a:
    print(i.start(),'....',i.group())
""":
index     char name
start()    group()
1 .... 7
3 .... @
4 .... k
5 .... 9
6 .... z
"""
Example 3:
#search letters and digits in 'a7b@k9z'
import re
a=re.finditer('[a-zA-Z0-9]','a7b@k9z')
for i in a:
    print(i.start(),'....',i.group())
""":
index     char name
start()    group()
0      .... a
1      .... 7
2      .... b
4      .... k
5      .... 9
6      .... z
"""
Example 4:
#search special characters in the 'a7b@k9z'
import re
a=re.finditer('[^a-zA-Z0-9]','a7b@k9z')
for i in a:
    print(i.start(),'....',i.group())
""":
index     char name
start()    group()
3 .... @
"""
Predefined character classes:

Characters      Searches for
\s              Whitespace character (space, tab, newline)
\S              Any character except whitespace
\d              Any digit
\D              Any character except digits
\w              Any word character (alphanumeric or underscore), i.e. [a-zA-Z0-9_] for ASCII text
\W              Any character except word characters (special characters), i.e. [^a-zA-Z0-9_] for ASCII text
.               Any character except a newline
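Three details of the predefined classes are easy to miss: \w also matches the underscore, \s matches tabs and newlines as well as the space, and . does not match a newline by default. A small sketch (the sample text is made up for illustration):

```python
import re

text = 'a_b\tc\nd'   # 7 characters, including a tab and a newline

print(re.findall(r'\w', text))        # ['a', '_', 'b', 'c', 'd']
print(re.findall(r'\s', text))        # ['\t', '\n']
print(len(re.findall(r'.', text)))    # 6: every character except the newline
```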

Example 1:
#search whitespace in 'rajendra19 @k9z'
import re
a=re.finditer(r'\s','rajendra19 @k9z') #raw string, so the backslash reaches the re module unchanged
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
10 ........ (space)
"""
Example 2:
#search every character except whitespace in 'rajendra19 @k9z'
import re
a=re.finditer(r'\S','rajendra19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........ r
1 ........ a
2 ........ j
3 ........ e
4 ........ n
5 ........ d
6 ........ r
7 ........ a
8 ........ 1
9 ........ 9
11 ........ @
12 ........ k
13 ........ 9
14 ........ z
"""
#Ex3: search only digits in 'rajendra19 @k9z'
import re
a=re.finditer(r'\d','rajendra19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
8 ........ 1
9 ........ 9
13 ........ 9
"""
#Ex4: search every character except digits in 'rajendra19 @k9z'
import re
a=re.finditer(r'\D','rajendra19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........ r
1 ........ a
2 ........ j
3 ........ e
4 ........ n
5 ........ d
6 ........ r
7 ........ a
10 ........ 
11 ........ @
12 ........ k
14 ........ z
"""
#Ex5: search word (alphanumeric) characters in 'rajendra19 @k9z'
import re
a=re.finditer(r'\w','rajendra19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........ r
1 ........ a
2 ........ j
3 ........ e
4 ........ n
5 ........ d
6 ........ r
7 ........ a
8 ........ 1
9 ........ 9
12 ........ k
13 ........ 9
14 ........ z
"""
#Ex6: search every character except word (alphanumeric) characters in 'rajendra19 @k9z'
import re
a=re.finditer(r'\W','rajendra19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
10 ........ 
11 ........ @
"""
#Ex7: search all the characters in 'rajendra19 @k9z'
import re
a=re.finditer('.','rajendra19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........ r
1 ........ a
2 ........ j
3 ........ e
4 ........ n
5 ........ d
6 ........ r
7 ........ a
8 ........ 1
9 ........ 9
10 ........ 
11 ........ @
12 ........ k
13 ........ 9
14 ........ z
"""
Quantifiers:
Quantifiers specify the number of occurrences to match (quantity means the number of occurrences).

Characters      Searches for
a               Exactly one 'a'
a+              At least one 'a'
a*              Any number of 'a's, including zero
a?              At most one 'a': either one 'a' or none
a{n}            Exactly n 'a's
a{m,n}          Minimum m and maximum n 'a's. Ex: a{2,3}
[^a]            Any character except 'a'
^a              Checks whether the target string starts with 'a'
a$              Checks whether the target string ends with 'a'
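The range and anchor forms can be tried out directly. The following is a small sketch with made-up sample strings; note that the end anchor is written after the character (a$), and that a minimum greater than the maximum is rejected when the pattern is compiled.

```python
import re

# a{m,n}: between m and n repetitions
print(re.findall('a{2,3}', 'a aa aaa aaaa'))   # ['aa', 'aaa', 'aaa']

# A minimum greater than the maximum is an invalid pattern
try:
    re.compile('a{2,1}')
    compiled = True
except re.error as exc:
    compiled = False
    print('invalid pattern:', exc)

# ^ anchors the match at the start, $ at the end of the target string
target = 'rajendra'
starts_with_r = re.search('^r', target) is not None   # True
starts_with_a = re.search('^a', target) is not None   # False
ends_with_a = re.search('a$', target) is not None     # True
print(starts_with_r, starts_with_a, ends_with_a)
```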


#Ex8: Quantifiers: search exactly 'a' in 'rajendra19 @k9z'
import re
a=re.finditer('a','rajendra19 @k9z') #exactly 'a'
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
1 ........ a
7 ........ a
"""
#Ex8: Quantifiers: search one or more 'a's in 'raajendraaa19 @k9z'
import re
a=re.finditer('a+','raajendraaa19 @k9z') #at least one 'a' in the sequence
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
1 ........ aa
8 ........ aaa
"""
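#Ex9: 'a*' also matches the empty string, so an empty match is reported at every index without an 'a'
import re
a=re.finditer('a*','rajendra19 @k9z') #any number of 'a's, including zero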
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........
1 ........ a
2 ........
3 ........
4 ........
5 ........
6 ........
7 ........ a
8 ........
9 ........
10 ........
11 ........
12 ........
13 ........
14 ........
15 ........
"""
#Ex10:
import re
a=re.finditer('a?','rajendra19 @k9z') #at most one 'a': one 'a' or none
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........
1 ........ a
2 ........
3 ........
4 ........
5 ........
6 ........
7 ........ a
8 ........
9 ........
10 ........
11 ........
12 ........
13 ........
14 ........
15 ........
"""
#Ex11:
import re
a=re.finditer('a{2}','rajendraa19 @k9z')#2 'a's in the sequence
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
7 ........ aa
"""
#Ex12:
import re
#a{m,n}: minimum m and maximum n 'a's in the sequence
a=re.finditer('a{1,2}','rajendraa19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
case1: a{1,2}
1 ........ a
7 ........ aa
case2: a{2,3}
7 ........ aa
case3: a{2,1}
error: min repeat greater than max repeat (the minimum must not be greater than the maximum)
"""
#Ex13:
import re
#every character except 'a'
a=re.finditer('[^a]','rajendraa19 @k9z')
for i in a:
    print(i.start(),'........',i.group())
""":
index     char name
start()    group()
0 ........ r
2 ........ j
3 ........ e
4 ........ n
5 ........ d
6 ........ r
9 ........ 1
10 ........ 9
11 ........ 
12 ........ @
13 ........ k
14 ........ 9
15 ........ z
"""
Functions available in the re module:
1.   match()
2.   fullmatch()
3.   search()
4.   findall()
5.   finditer()
6.   sub()
7.   subn()
8.   split()
9.   compile()
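Of the functions listed, sub(), subn() and split() replace or split at the places where the pattern matches. A short sketch with a made-up target string:

```python
import re

text = 'a7b9c5'

print(re.sub('[0-9]', '#', text))    # a#b#c#            every digit replaced
print(re.subn('[0-9]', '#', text))   # ('a#b#c#', 3)     result plus number of replacements
print(re.split('[0-9]', text))       # ['a', 'b', 'c', '']  split at every digit
```

The trailing empty string from split() appears because the target string ends with a match.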
match(): checks whether the given pattern is available at the beginning of the target string. If it is, match() returns a match object, otherwise None.
Example 1
import re
s=input("enter pattern to check: ")
m=re.match(s,'Blore is a green city')
if m is not None:
    print("match is available at the beginning of the string")
    print('start index:{} and end index:{}'.format(m.start(),m.end()))
else:
    print('Match is not available at the beginning of the string')
"""output:
case1:
enter pattern to check: is
Match is not available at the beginning of the string
case2:
enter pattern to check: B
match is available at the beginning of the string
start index:0 and end index:1
case3:
enter pattern to check: Blo
match is available at the beginning of the string
start index:0 and end index:3
"""
3.   fullmatch() method:
It checks whether the complete target string matches the pattern.
import re
s=input("enter pattern to check: ")
m=re.fullmatch(s,'Blore is a green city')
if m is not None:
    print("full string matched")
    print('start index:{} and end index:{}'.format(m.start(),m.end()))
else:
    print('full string match is not available')
"""output:
case1:
enter pattern to check: Blore is a green city
full string matched
start index:0 and end index:21

case2:
enter pattern to check: is
full string match is not available

case3:
enter pattern to check: Blore
full string match is not available

"""
4.   search():
It searches for the pattern everywhere in the target string and returns a match object for the first occurrence. It returns None if the pattern is not found.
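To see how search() differs from match() and fullmatch(), the three can be run side by side on the same target string. This is only an illustrative sketch; the three patterns are chosen by hand.

```python
import re

target = 'Blore is a green city'

results = {}
for pat in ['Blore', 'is', 'Blore is a green city']:
    results[pat] = (
        re.match(pat, target) is not None,      # at the beginning?
        re.fullmatch(pat, target) is not None,  # the whole string?
        re.search(pat, target) is not None,     # anywhere?
    )
    print(pat, results[pat])
```

Only search() finds 'is', because that pattern occurs in the middle of the string.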
Example:
import re
s=input("enter pattern to check: ")
m=re.search(s,'rajendraa')
if m is not None:
    print("match is available")
    print('first occurrence with start index:{} and end index:{}'.format(m.start(),m.end()))
else:
    print('match is not available')
"""output:
case1:
enter pattern to check: aa
match is available
first occurrence with start index:7 and end index:9
case2:
enter pattern to check: rajendra
match is available
first occurrence with start index:0 and end index:8

"""
5.   findall() method: returns all the matches as a list of strings
import re
m=re.findall('[0-9]','abc123')
print(m)
"""output:
['1', '2', '3']
"""

https://youtu.be/TeVaBIo-WQo
