I want to extract all the numbers contained in a string. Which is better suited to the purpose: regular expressions or the isdigit() method?
Example:
line = "hello 12 hi 89"
Result:
[12, 89]
#1 building
@jmnas, I love your answer, but it doesn't find floating-point numbers. I am working on a script to parse code that will be fed to a CNC milling machine, and I need to find X and Y dimensions that can be integers or floats, so I changed the code to the following. It finds ints and floats, with positive and negative values. It still doesn't find values in hexadecimal format, but you could add "x" and "A" through "F" to the num_char tuple, and I think it would then pick up something like "0x23AC" (note that float() does not accept such a string, so the conversion would need int(num, 16) instead).
s = 'hello X42 I\'m a Y-32.35 string Z30'
xy = ("X", "Y")
num_char = (".", "+", "-")

l = []

tokens = s.split()
for token in tokens:
    if token.startswith(xy):
        num = ""
        for char in token:
            # print(char)
            if char.isdigit() or (char in num_char):
                num = num + char

        try:
            l.append(float(num))
        except ValueError:
            pass

print(l)
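The same extraction can also be sketched with a regular expression, which avoids the character-by-character loop. This is only a sketch under assumptions: the helper name extract_axis_values and the rule that an axis letter is immediately followed by an optional sign and a decimal number are mine, not part of the answer above.

```python
import re


def extract_axis_values(text, axes="XY"):
    """Hypothetical helper: return (axis, value) pairs such as ('X', 42.0).

    Assumes each coordinate looks like X42 or Y-32.35, i.e. an axis letter
    at the start of a word, followed by an optional sign and a number.
    """
    pattern = re.compile(
        rf"\b([{re.escape(axes)}])([-+]?(?:\d+\.?\d*|\.\d+))"
    )
    return [(axis, float(num)) for axis, num in pattern.findall(text)]


print(extract_axis_values("hello X42 I'm a Y-32.35 string Z30"))
# [('X', 42.0), ('Y', -32.35)]
```

Keeping the axis letter next to each value makes it harder to mix up X and Y later on; passing axes="XYZ" would also pick up the Z30 token.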
#2 building
This is a bit late, but you can also extend the regular expression to account for scientific notation.
import re

# Format is [(<string>, <expected output>), ...]
ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
       ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
      ('hello X42 I\'m a Y-32.35 string Z30',
       ['42', '-32.35', '30']),
      ('he33llo 42 I\'m a 32 string -30',
       ['33', '42', '32', '-30']),
      ('h3110 23 cat 444.4 rabbit 11 2 dog',
       ['3110', '23', '444.4', '11', '2']),
      ('hello 12 hi 89',
       ['12', '89']),
      ('4',
       ['4']),
      ('I like 74,600 commas not,500',
       ['74,600', '500']),
      ('I like bad math 1+2=.001',
       ['1', '+2', '.001'])]

for s, r in ss:
    # Raw string, so that \d reaches the regex engine unchanged
    rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
    if rr == r:
        print('GOOD')
    else:
        print('WRONG', rr, 'should be', r)
All of the test cases pass!
In addition, you can look at the AWS Glue built-in regular expressions.
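The pattern above returns strings, so a conversion step is still needed to get numeric values. A minimal sketch of that step follows; the names NUMBER_RE and find_numbers are made up for illustration, and treating every comma as a thousands separator is an assumption.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
NUMBER_RE = re.compile(r"[-+]?[.]?\d+(?:,\d\d\d)*[.]?\d*(?:[eE][-+]?\d+)?")


def find_numbers(text):
    """Hypothetical helper: return every match of NUMBER_RE as a float."""
    # Assumption: commas inside a match are thousands separators,
    # so they are stripped before conversion.
    return [float(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


print(find_numbers("I like 74,600 commas not,500"))  # [74600.0, 500.0]
print(find_numbers("hello 12 hi 89"))                # [12.0, 89.0]
```

The pattern can match strings that float() rejects, such as ".5.5", so real input may need a try/except around the conversion.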
#3 building
The best option I have found is below. It extracts a number and discards every other kind of character.
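def extract_nbr(input_str):
    if input_str is None or input_str == '':
        return 0

    # Keep only the digit characters; signs and decimal points are dropped
    out_number = ''
    for ele in input_str:
        if ele.isdigit():
            out_number += ele
    # float('') raises ValueError, so return 0 when no digit was found
    return float(out_number) if out_number else 0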
#4 building
This answer also covers the case where the number in the string is a float.
def get_first_nbr_from_str(input_str):
    '''
    :param input_str: string that contains digits and words
    :return: the first number extracted from input_str, or 0 if there is none
    demo:
    'ab324.23.123xyz': 324.23
    '.5abc44': 0.5
    '''
    # Reject empty input and non-string input
    if not input_str or not isinstance(input_str, str):
        return 0
    out_number = ''
    for ele in input_str:
        if (ele == '.' and '.' not in out_number) or ele.isdigit():
            out_number += ele
        elif out_number:
            # The first number has ended; stop scanning
            break
    try:
        return float(out_number)
    except ValueError:
        # No digits were collected (out_number is '' or '.')
        return 0
#5 building
If you know that there will be only one number in the string, e.g. "hello 12 hi", you can try filter.
For example:
In [1]: int(''.join(filter(str.isdigit, '200 grams')))
Out[1]: 200
In [2]: int(''.join(filter(str.isdigit, 'Counters: 55')))
Out[2]: 55
In [3]: int(''.join(filter(str.isdigit, 'more than 23 times')))
Out[3]: 23
But be careful!
In [4]: int(''.join(filter(str.isdigit, '200 grams 5')))
Out[4]: 2005